Add files using upload-large-folder tool
- audio_soundfile-mixer_12bece30.txt +5 -0
- client_rtvi-standard_0baf08ac.txt +5 -0
- client_rtvi-standard_1c78a449.txt +5 -0
- client_rtvi-standard_6334db93.txt +5 -0
- daily_rest-helpers_a3648078.txt +5 -0
- deployment_fly_7f6b1819.txt +5 -0
- deployment_pipecat-cloud_104d38f9.txt +5 -0
- features_pipecat-flows_9f18b554.txt +5 -0
- filters_function-filter_badd26ec.txt +5 -0
- filters_stt-mute_582aa4e6.txt +5 -0
- flows_pipecat-flows_7e77192f.txt +5 -0
- fundamentals_custom-frame-processor_1883c610.txt +5 -0
- fundamentals_function-calling_ddda5fcd.txt +5 -0
- fundamentals_recording-audio_b24720b6.txt +5 -0
- fundamentals_user-input-muting_ce884159.txt +5 -0
- getting-started_next-steps_8fb394a8.txt +5 -0
- getting-started_overview_eeef50f1.txt +5 -0
- image-generation_openai_6fec2f22.txt +5 -0
- links_server-reference_d423e4b9.txt +5 -0
- llm_aws_6a0dbbf2.txt +5 -0
- llm_fireworks_d900a6ea.txt +5 -0
- llm_google-vertex_251a3b87.txt +5 -0
- llm_groq_574a7686.txt +5 -0
- llm_groq_d45aa023.txt +5 -0
- llm_nim_a940263f.txt +5 -0
- llm_openrouter_fe03d51e.txt +5 -0
- llm_perplexity_762eff8f.txt +5 -0
- llm_sambanova_7c015e95.txt +5 -0
- memory_mem0_a1309820.txt +5 -0
- memory_mem0_b45c279e.txt +5 -0
- observers_debug-observer_4a2d52de.txt +5 -0
- pipecat-client-android_indexhtml_b4bd768d.txt +5 -0
- pipecat-transport-gemini-live-websocket_indexhtml_2f6300ba.txt +5 -0
- pipeline_pipeline-idle-detection_5ab87df7.txt +5 -0
- pipeline_pipeline-params_32384788.txt +5 -0
- pipeline_pipeline-task_f3dd5190.txt +5 -0
- react_components_0aff756c.txt +5 -0
- rtvi_rtvi-observer_0ca25fde.txt +5 -0
- s2s_gemini_469bd240.txt +5 -0
- s2s_gemini_6a3001c2.txt +5 -0
- serializers_exotel_51ec3374.txt +5 -0
- serializers_plivo_97adcda6.txt +5 -0
- server_utilities_76f22f96.txt +5 -0
- smart-turn_fal-smart-turn_960d4280.txt +5 -0
- stt_aws_8bf978a2.txt +5 -0
- stt_cartesia_fd324549.txt +5 -0
- stt_gladia_a7cb568d.txt +5 -0
- stt_google_8b354826.txt +5 -0
- stt_groq_c74a892d.txt +5 -0
- stt_sambanova_7965bf86.txt +5 -0
audio_soundfile-mixer_12bece30.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/utilities/audio/soundfile-mixer#param-mixer-update-settings-frame
Title: SoundfileMixer - Pipecat
==================================================

SoundfileMixer

Overview

SoundfileMixer is an audio mixer that combines incoming audio with audio from files. It supports multiple audio file formats through the soundfile library and can handle runtime volume adjustments and sound switching.

Installation

The soundfile mixer requires additional dependencies:

    pip install "pipecat-ai[soundfile]"

Constructor Parameters

- sound_files (Mapping[str, str], required): Dictionary mapping sound names to file paths. Files must be mono (single channel).
- default_sound (str, required): Name of the default sound to play (must be a key in sound_files).
- volume (float, default 0.4): Initial volume for the mixed sound. Values typically range from 0.0 to 1.0, but can go higher.
- loop (bool, default true): Whether to loop the sound file when it reaches the end.

Control Frames

MixerUpdateSettingsFrame - updates mixer settings at runtime:
- sound (str): Changes the current playing sound (must be a key in sound_files)
- volume (float): Updates the mixing volume
- loop (bool): Updates whether the sound should loop

MixerEnableFrame - enables or disables the mixer:
- enable (bool): Whether mixing should be enabled

Usage Example

    # Initialize mixer with sound files
    mixer = SoundfileMixer(
        sound_files={"office": "office_ambience.wav"},
        default_sound="office",
        volume=2.0,
    )

    # Add to transport
    transport = DailyTransport(
        room_url,
        token,
        "Audio Bot",
        DailyParams(
            audio_out_enabled=True,
            audio_out_mixer=mixer,
        ),
    )

    # Control mixer at runtime
    await task.queue_frame(MixerUpdateSettingsFrame({"volume": 0.5}))
    await task.queue_frame(MixerEnableFrame(False))  # Disable mixing
    await task.queue_frame(MixerEnableFrame(True))   # Enable mixing

Notes

- Supports any audio format that soundfile can read
- Automatically resamples audio files to match output sample rate
- Files must be mono (single channel)
- Thread-safe for pipeline processing
- Can dynamically switch between multiple sound files
- Volume can be adjusted in real-time
- Mixing can be enabled/disabled on demand
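The sound field of MixerUpdateSettingsFrame can also switch which file is playing at runtime. Below is a minimal sketch, assuming a mixer built with two hypothetical sound files ("office" and "cafe"), an existing PipelineTask named task as in the usage example above, and import paths that are assumptions and may differ by Pipecat version:

    # Sketch only: the import paths below are assumptions, not taken from the page above.
    from pipecat.audio.mixers.soundfile_mixer import SoundfileMixer
    from pipecat.frames.frames import MixerEnableFrame, MixerUpdateSettingsFrame

    mixer = SoundfileMixer(
        sound_files={"office": "office_ambience.wav", "cafe": "cafe_ambience.wav"},
        default_sound="office",
        volume=0.4,
    )

    # Later, while the pipeline task is running: switch the background sound
    # and lower the volume in a single settings update.
    await task.queue_frame(MixerUpdateSettingsFrame({"sound": "cafe", "volume": 0.3}))

    # Silence the background entirely without tearing down the mixer.
    await task.queue_frame(MixerEnableFrame(False))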
client_rtvi-standard_0baf08ac.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/client/rtvi-standard#bot-started-speaking-%F0%9F%A4%96
Title: The RTVI Standard - Pipecat
==================================================

The RTVI Standard

The RTVI (Real-Time Voice and Video Inference) standard defines a set of message types and structures sent between clients and servers. It is designed to facilitate real-time interactions between clients and AI applications that require voice, video, and text communication, and it provides a consistent framework for building applications that can communicate with AI models and the backends running those models in real time. This page documents version 1.0 of the RTVI standard, released in June 2025.

Key Features

- Connection Management: RTVI provides a flexible connection model that allows clients to connect to AI services and coordinate state.
- Transcriptions: The standard includes built-in support for real-time transcription of audio streams.
- Client-Server Messaging: The standard defines a messaging protocol for sending and receiving messages between clients and servers, allowing for efficient communication of requests and responses.
- Advanced LLM Interactions: The standard supports advanced interactions with large language models (LLMs), including context management, function call handling, and search results.
- Service-Specific Insights: RTVI supports events that provide insight into the input/output and state of the typical services in speech-to-speech workflows.
- Metrics and Monitoring: RTVI provides mechanisms for collecting metrics and monitoring the performance of server-side services.

Terms

- Client: The front-end application or user interface that interacts with the RTVI server.
- Server: The back-end service that runs the AI framework and processes requests from the client.
- User: The end user interacting with the client application.
- Bot: The AI interacting with the user; technically an amalgamation of a large language model (LLM) and a text-to-speech (TTS) service.

RTVI Message Format

Messages defined as part of the RTVI protocol adhere to the following format:

    {
      "id": string,
      "label": "rtvi-ai",
      "type": string,
      "data": unknown
    }

- id (string): A unique identifier for the message, used to correlate requests and responses.
- label (string, required, always "rtvi-ai"): A label that identifies this message as an RTVI message.
- type (string, required): The type of message being sent; one of the predefined RTVI message types listed below.
- data (unknown): The payload of the message, which can be any data structure relevant to the message type.

RTVI Message Types

Following the above format, this section describes the various message types defined by the RTVI standard. Each message type has a specific purpose and structure, allowing for clear communication between clients and servers. Each message type below includes either a 🤖 or 🏄 emoji to denote whether the message is sent from the bot (🤖) or the client (🏄).

Connection Management

client-ready 🏄
Indicates that the client is ready to receive messages and interact with the server. Typically sent after the transport media channels have connected.
type: 'client-ready'
data:
- version (string): The version of the RTVI standard being used. This is useful for ensuring compatibility between client and server implementations.
- about (AboutClient object): An object containing information about the client, such as its RTVI version, client library, and any other relevant metadata. The AboutClient object follows this structure:
  - library (string, required)
  - library_version (string)
  - platform (string)
  - platform_version (string)
  - platform_details (any): Any platform-specific details that may be relevant to the server, such as information about the browser, operating system, or other environment-specific data. This field is optional and open-ended, so be mindful of the data you include here and any security concerns that may arise from exposing sensitive or personally identifiable information.

bot-ready 🤖
Indicates that the bot is ready to receive messages and interact with the client. Typically sent after the transport media channels have connected.
type: 'bot-ready'
data:
- version (string): The version of the RTVI standard being used. This is useful for ensuring compatibility between client and server implementations.
- about (any, optional): An object containing information about the server or bot. Its structure and value are both undefined by default. This provides flexibility to include any relevant metadata your client may need to know about the server at connection time. Be mindful of the data you include here and any security concerns that may arise from exposing sensitive information.

disconnect-bot 🏄
Indicates that the client wishes to disconnect from the bot. Typically used when the client is shutting down or no longer needs to interact with the bot. Note: disconnects should happen automatically when either the client or bot disconnects from the transport, so this message is intended for the case where a client wants to remain connected to the transport but no longer wishes to interact with the bot.
type: 'disconnect-bot'
data: undefined

error 🤖
Indicates an error occurred during bot initialization or runtime.
type: 'error'
data:
- message (string): Description of the error.
- fatal (boolean): Indicates if the error is fatal to the session.

Transcription

user-started-speaking 🤖
Emitted when the user begins speaking.
type: 'user-started-speaking'
data: None

user-stopped-speaking 🤖
Emitted when the user stops speaking.
type: 'user-stopped-speaking'
data: None

bot-started-speaking 🤖
Emitted when the bot begins speaking.
type: 'bot-started-speaking'
data: None

bot-stopped-speaking 🤖
Emitted when the bot stops speaking.
type: 'bot-stopped-speaking'
data: None

user-transcription 🤖
Real-time transcription of user speech, including both partial and final results.
type: 'user-transcription'
data:
- text (string): The transcribed text of the user.
- final (boolean): Indicates if this is a final transcription or a partial result.
- timestamp (string): The timestamp when the transcription was generated.
- user_id (string): Identifier for the user who spoke.

bot-transcription 🤖
Transcription of the bot's speech. Note: this protocol currently does not match the user transcription format, which supports real-time timestamping; instead, the event is typically sent for each sentence of the bot's response. This difference is due to limitations in TTS services, most of which do not support (or do not support well) accurate timing information. If and when this changes, the protocol may be updated to include the necessary timing information. For now, if you want to approximate real-time transcription matching your bot's speech, you can try using the bot-tts-text message type.
type: 'bot-transcription'
data:
- text (string): The transcribed text from the bot, typically aggregated at a per-sentence level.

Client-Server Messaging

server-message 🤖
An arbitrary message sent from the server to the client. This can be used for custom interactions or commands. This message may be coupled with the client-message message type to handle responses from the client.
type: 'server-message'
data: any JSON-serializable object, formatted according to your own specifications.

client-message 🏄
An arbitrary message sent from the client to the server. This can be used for custom interactions or commands. This message may be coupled with the server-response message type to handle responses from the server.
type: 'client-message'
data:
- t (string)
- d (unknown, optional)
The data payload should contain a t field indicating the type of message and an optional d field containing any custom, corresponding data needed for the message.

server-response 🤖
A message sent from the server to the client in response to a client-message. IMPORTANT: the id should match the id of the original client-message to correlate the response with the request.
type: 'server-response'
data:
- t (string)
- d (unknown, optional)
The data payload should contain a t field indicating the type of message and an optional d field containing any custom, corresponding data needed for the message.

error-response 🤖
Error response to a specific client message. IMPORTANT: the id should match the id of the original client-message to correlate the response with the request.
type: 'error-response'
data:
- error (string)

Advanced LLM Interactions

append-to-context 🏄
A message sent from the client to the server to append data to the context of the current LLM conversation. This is useful for providing text-based content for the user or augmenting the context for the assistant.
type: 'append-to-context'
data:
- role ("user" | "assistant"): The role the context should be appended to. Currently only supports "user" and "assistant".
- content (unknown): The content to append to the context. This can be any data structure the LLM understands.
- run_immediately (boolean, optional): Indicates whether the context should be run immediately after appending. Defaults to false. If set to false, the context will be appended but not executed until the next LLM run.

llm-function-call 🤖
A function call request from the LLM, sent from the bot to the client. Note that in most cases an LLM function call is handled completely server-side. However, when the call requires input from the client or the client needs to be aware of the function call, this message/response schema is required.
type: 'llm-function-call'
data:
- function_name (string): Name of the function to be called.
- tool_call_id (string): Unique identifier for this function call.
- args (Record<string, unknown>): Arguments to be passed to the function.

llm-function-call-result 🏄
The result of the function call requested by the LLM, returned from the client.
type: 'llm-function-call-result'
data:
- function_name (string): Name of the called function.
- tool_call_id (string): Identifier matching the original function call.
- args (Record<string, unknown>): Arguments that were passed to the function.
- result (Record<string, unknown> | string): The result returned by the function.

bot-llm-search-response 🤖
Search results from the LLM's knowledge base. Currently, Google Gemini is the only LLM that supports built-in search. However, we expect other LLMs to follow suit, which is why this message type is defined as part of the RTVI standard. As more LLMs add support for this feature, the format of this message type may evolve to accommodate discrepancies.
type: 'bot-llm-search-response'
data:
- search_result (string, optional): Raw search result text.
- rendered_content (string, optional): Formatted version of the search results.
- origins (Array<Origin object>): Source information and confidence scores for search results.

The Origin object follows this structure:

    {
      "site_uri": string (optional),
      "site_title": string (optional),
      "results": Array<{ "text": string, "confidence": number[] }>
    }

Example:

    "id": undefined
    "label": "rtvi-ai"
    "type": "bot-llm-search-response"
    "data": {
      "origins": [
        {
          "results": [
            {
              "confidence": [0.9881149530410768],
              "text": "* Juneteenth: A Freedom Celebration is scheduled for June 18th from 12:00 pm to 2:00 pm."
            },
            {
              "confidence": [0.9692034721374512],
              "text": "* A Juneteenth celebration at Fort Negley Park will take place on June 19th from 5:00 pm to 9:30 pm."
            }
          ],
          "site_title": "vanderbilt.edu",
          "site_uri": "https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQHwif83VK9KAzrbMSGSBsKwL8vWfSfC9pgEWYKmStHyqiRoV1oe8j1S0nbwRg_iWgqAr9wUkiegu3ATC8Ll-cuE-vpzwElRHiJ2KgRYcqnOQMoOeokVpWqi"
        },
        {
          "results": [
            {
              "confidence": [0.6554043292999268],
              "text": "In addition to these events, Vanderbilt University is a large research institution with ongoing activities across many fields."
            }
          ],
          "site_title": "wikipedia.org",
          "site_uri": "https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQESbF-ijx78QbaglrhflHCUWdPTD4M6tYOQigW5hgsHNctRlAHu9ktfPmJx7DfoP5QicE0y-OQY1cRl9w4Id0btiFgLYSKIm2-SPtOHXeNrAlgA7mBnclaGrD7rgnLIbrjl8DgUEJrrvT0CKzuo"
        }
      ],
      "rendered_content": "<style> \n .container ... </div> \n </div> \n ",
      "search_result": "Several events are happening at Vanderbilt University: \n\n * Juneteenth: A Freedom Celebration is scheduled for June 18th from 12:00 pm to 2:00 pm. \n * A Juneteenth celebration at Fort Negley Park will take place on June 19th from 5:00 pm to 9:30 pm. \n\n In addition to these events, Vanderbilt University is a large research institution with ongoing activities across many fields. For the most recent news, you should check Vanderbilt's official news website. \n "
    }

Service-Specific Insights

bot-llm-started 🤖
Indicates LLM processing has begun.
type: 'bot-llm-started'
data: None

bot-llm-stopped 🤖
Indicates LLM processing has completed.
type: 'bot-llm-stopped'
data: None

user-llm-text 🤖
Aggregated user input text that is sent to the LLM.
type: 'user-llm-text'
data:
- text (string): The user's input text to be processed by the LLM.

bot-llm-text 🤖
Individual tokens streamed from the LLM as they are generated.
type: 'bot-llm-text'
data:
- text (string): The token text from the LLM.

bot-tts-started 🤖
Indicates text-to-speech (TTS) processing has begun.
type: 'bot-tts-started'
data: None

bot-tts-stopped 🤖
Indicates text-to-speech (TTS) processing has completed.
type: 'bot-tts-stopped'
data: None

bot-tts-text 🤖
The per-token text output of the text-to-speech (TTS) service (what the TTS actually says).
type: 'bot-tts-text'
data:
- text (string): The text representation of the generated bot speech.

Metrics and Monitoring

metrics 🤖
Performance metrics for various processing stages and services. Each message contains entries for one or more of the metric types: processing, ttfb, characters.
type: 'metrics'
data:
- processing (optional): Processing time metrics.
- ttfb (optional): Time-to-first-byte metrics.
- characters (optional): Character processing metrics.

For each metric type, the data structure is an array of objects with the following fields:
- processor (string): The name of the processor or service that generated the metric.
- value (number): The value of the metric, typically in milliseconds or a character count.
- model (string, optional): The model of the service that generated the metric, if applicable.

Example:

    {
      "type": "metrics",
      "data": {
        "processing": [
          { "model": "eleven_flash_v2_5", "processor": "ElevenLabsTTSService#0", "value": 0.0005140304565429688 }
        ],
        "ttfb": [
          { "model": "eleven_flash_v2_5", "processor": "ElevenLabsTTSService#0", "value": 0.1573178768157959 }
        ],
        "characters": [
          { "model": "eleven_flash_v2_5", "processor": "ElevenLabsTTSService#0", "value": 43 }
        ]
      }
    }
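As a quick illustration of the envelope format and the id-based request/response correlation described above, here is a minimal Python sketch. The helper functions and the uuid-based id are assumptions for illustration only; they are not part of any Pipecat SDK, but the field names and values follow the standard as documented on this page:

    import json
    import uuid

    # Hypothetical helper: builds an RTVI envelope as described above.
    def make_client_message(msg_type: str, data: dict | None = None) -> dict:
        return {
            "id": str(uuid.uuid4()),   # unique id used to correlate the response
            "label": "rtvi-ai",        # always "rtvi-ai" per the standard
            "type": msg_type,
            "data": data,
        }

    # Hypothetical helper: a server-response (or error-response) answers a
    # client-message when its id matches the id of the original request.
    def is_response_to(request: dict, incoming: dict) -> bool:
        return (
            incoming.get("label") == "rtvi-ai"
            and incoming.get("type") in ("server-response", "error-response")
            and incoming.get("id") == request["id"]
        )

    # Example: a custom client-message whose payload uses the t/d convention.
    request = make_client_message(
        "client-message", {"t": "get-weather", "d": {"city": "Nashville"}}
    )
    print(json.dumps(request, indent=2))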
client_rtvi-standard_1c78a449.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/client/rtvi-standard#client-message-%F0%9F%8F%84
Title: The RTVI Standard - Pipecat
==================================================

client_rtvi-standard_6334db93.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/client/rtvi-standard#param-label
Title: The RTVI Standard - Pipecat
==================================================

daily_rest-helpers_a3648078.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/utilities/daily/rest-helpers#param-expiry-time
Title: Daily REST Helper - Pipecat
==================================================

Daily REST Helper

See the Daily REST API documentation for the complete Daily REST API reference and additional details.

Classes

DailyRoomSipParams
Configuration for SIP (Session Initiation Protocol) parameters.
- display_name (string, default "sw-sip-dialin"): Display name for the SIP endpoint
- video (boolean, default false): Whether video is enabled for SIP
- sip_mode (string, default "dial-in"): SIP connection mode
- num_endpoints (integer, default 1): Number of SIP endpoints

    from pipecat.transports.services.helpers.daily_rest import DailyRoomSipParams

    sip_params = DailyRoomSipParams(
        display_name="conference-line",
        video=True,
        num_endpoints=2
    )

RecordingsBucketConfig
Configuration for storing Daily recordings in a custom S3 bucket.
- bucket_name (string, required): Name of the S3 bucket for storing recordings
- bucket_region (string, required): AWS region where the S3 bucket is located
- assume_role_arn (string, required): ARN of the IAM role to assume for S3 access
- allow_api_access (boolean, default false): Whether to allow API access to the recordings

    from pipecat.transports.services.helpers.daily_rest import RecordingsBucketConfig

    bucket_config = RecordingsBucketConfig(
        bucket_name="my-recordings-bucket",
        bucket_region="us-west-2",
        assume_role_arn="arn:aws:iam::123456789012:role/DailyRecordingsRole",
        allow_api_access=True
    )

DailyRoomProperties
Properties that configure a Daily room's behavior and features.
- exp (float): Room expiration time as a Unix timestamp (e.g., time.time() + 300 for 5 minutes)
- enable_chat (boolean, default false): Whether chat is enabled in the room
- enable_prejoin_ui (boolean, default false): Whether the prejoin lobby UI is enabled
- enable_emoji_reactions (boolean, default false): Whether emoji reactions are enabled
- eject_at_room_exp (boolean, default false): Whether to eject participants when the room expires
- enable_dialout (boolean): Whether dial-out is enabled
- enable_recording (string): Recording settings ("cloud", "local", or "raw-tracks")
- geo (string): Geographic region for the room
- max_participants (number): Maximum number of participants allowed in the room
- recordings_bucket (RecordingsBucketConfig): Configuration for custom S3 bucket recordings
- sip (DailyRoomSipParams): SIP configuration parameters
- sip_uri (dict): SIP URI configuration (returned by Daily)
- start_video_off (boolean, default false): Whether the camera video is turned off by default

The class also includes a sip_endpoint property that returns the SIP endpoint URI if available.

    import time
    from pipecat.transports.services.helpers.daily_rest import (
        DailyRoomProperties,
        DailyRoomSipParams,
        RecordingsBucketConfig,
    )

    properties = DailyRoomProperties(
        exp=time.time() + 3600,  # 1 hour from now
        enable_chat=True,
        enable_emoji_reactions=True,
        enable_recording="cloud",
        geo="us-west",
        max_participants=50,
        sip=DailyRoomSipParams(display_name="conference"),
        recordings_bucket=RecordingsBucketConfig(
            bucket_name="my-bucket",
            bucket_region="us-west-2",
            assume_role_arn="arn:aws:iam::123456789012:role/DailyRole"
        )
    )

    # Access SIP endpoint if available
    if properties.sip_endpoint:
        print(f"SIP endpoint: {properties.sip_endpoint}")

DailyRoomParams
Parameters for creating a new Daily room.
- name (string): Room name (if not provided, one will be generated)
- privacy (string, default "public"): Room privacy setting ("private" or "public")
- properties (DailyRoomProperties): Room configuration properties

    import time
    from pipecat.transports.services.helpers.daily_rest import (
        DailyRoomParams,
        DailyRoomProperties,
    )

    params = DailyRoomParams(
        name="team-meeting",
        privacy="private",
        properties=DailyRoomProperties(
            enable_chat=True,
            exp=time.time() + 7200  # 2 hours from now
        )
    )

DailyRoomObject
Response object representing a Daily room.
- id (string): Unique room identifier
- name (string): Room name
- api_created (boolean): Whether the room was created via API
- privacy (string): Room privacy setting
- url (string): Complete room URL
- created_at (string): Room creation timestamp in ISO 8601 format
- config (DailyRoomProperties): Room configuration

    from pipecat.transports.services.helpers.daily_rest import (
        DailyRoomObject,
        DailyRoomProperties,
    )

    # Example of what a DailyRoomObject looks like when received
    room = DailyRoomObject(
        id="abc123",
        name="team-meeting",
        api_created=True,
        privacy="private",
        url="https://your-domain.daily.co/team-meeting",
        created_at="2024-01-20T10:00:00.000Z",
        config=DailyRoomProperties(
            enable_chat=True,
            exp=1705743600
        )
    )

DailyMeetingTokenProperties
Properties for configuring a Daily meeting token.
- room_name (string): The room this token is valid for. If not set, the token is valid for all rooms.
- eject_at_token_exp (boolean): Whether to eject the user when the token expires
- eject_after_elapsed (integer): Eject the user after this many seconds
- nbf (integer): "Not before" timestamp - users cannot join before this time
- exp (integer): Expiration timestamp - users cannot join after this time
- is_owner (boolean): Whether the token grants owner privileges
- user_name (string): User's display name in the meeting
- user_id (string): Unique identifier for the user (36 character limit)
- enable_screenshare (boolean): Whether the user can share their screen
- start_video_off (boolean): Whether to join with video off
- start_audio_off (boolean): Whether to join with audio off
- enable_recording (string): Recording settings ("cloud", "local", or "raw-tracks")
- enable_prejoin_ui (boolean): Whether to show the prejoin UI
- start_cloud_recording (boolean): Whether to start cloud recording when the user joins
- permissions (dict): Initial default permissions for a non-meeting-owner participant

DailyMeetingTokenParams
Parameters for creating a Daily meeting token.
- properties (DailyMeetingTokenProperties): Token configuration properties

    from pipecat.transports.services.helpers.daily_rest import (
        DailyMeetingTokenParams,
        DailyMeetingTokenProperties,
    )

    token_params = DailyMeetingTokenParams(
        properties=DailyMeetingTokenProperties(
            user_name="John Doe",
            enable_screenshare=True,
            start_video_off=True,
            permissions={"canSend": ["video", "audio"]}
        )
    )

Initialize DailyRESTHelper
Create a new instance of the Daily REST helper.
- daily_api_key (string, required): Your Daily API key
- daily_api_url (string, default "https://api.daily.co/v1"): The Daily API base URL
- aiohttp_session (aiohttp.ClientSession, required): An aiohttp client session for making HTTP requests

    helper = DailyRESTHelper(
        daily_api_key="your-api-key",
        aiohttp_session=session
    )

Create Room
Creates a new Daily room with the specified parameters.
- params (DailyRoomParams, required): Room configuration parameters including name, privacy, and properties

    # Create a room that expires in 1 hour
    params = DailyRoomParams(
        name="my-room",
        privacy="private",
        properties=DailyRoomProperties(
            exp=time.time() + 3600,
            enable_chat=True
        )
    )

    room = await helper.create_room(params)
    print(f"Room URL: {room.url}")

Get Room From URL
Retrieves room information using a Daily room URL.
- room_url (string, required): The complete Daily room URL

    room = await helper.get_room_from_url("https://your-domain.daily.co/my-room")
    print(f"Room name: {room.name}")

Get Token
Generates a meeting token for a specific room.
- room_url (string, required): The complete Daily room URL
- expiry_time (float, default 3600): Token expiration time in seconds
- eject_at_token_exp (bool, default False): Whether to eject the user when the token expires
- owner (bool, default True): Whether the token should have owner privileges (overrides any setting in params)
- params (DailyMeetingTokenParams): Additional token configuration. Note that room_name, exp, eject_at_token_exp, and is_owner will be set based on the other function parameters.

    # Basic token generation
    token = await helper.get_token(
        room_url="https://your-domain.daily.co/my-room",
        expiry_time=1800,  # 30 minutes
        owner=True,
        eject_at_token_exp=True
    )

    # Advanced token generation with additional properties
    token_params = DailyMeetingTokenParams(
        properties=DailyMeetingTokenProperties(
            user_name="John Doe",
            start_video_off=True
        )
    )

    token = await helper.get_token(
        room_url="https://your-domain.daily.co/my-room",
        expiry_time=1800,
        owner=False,
        eject_at_token_exp=True,
        params=token_params
    )

Delete Room By URL
Deletes a room using its URL.
- room_url (string, required): The complete Daily room URL

    success = await helper.delete_room_by_url("https://your-domain.daily.co/my-room")
    if success:
        print("Room deleted successfully")

Delete Room By Name
Deletes a room using its name.
- room_name (string, required): The name of the Daily room

    success = await helper.delete_room_by_name("my-room")
    if success:
        print("Room deleted successfully")

Get Name From URL
Extracts the room name from a Daily room URL.
room_url string required The complete Daily room URL Copy Ask AI room_name = helper.get_name_from_url( "https://your-domain.daily.co/my-room" ) print ( f "Room name: { room_name } " ) # Outputs: "my-room" Turn Tracking Observer Smart Turn Overview On this page Classes DailyRoomSipParams RecordingsBucketConfig DailyRoomProperties DailyRoomParams DailyRoomObject DailyMeetingTokenProperties DailyMeetingTokenParams Initialize DailyRESTHelper Create Room Get Room From URL Get Token Delete Room By URL Delete Room By Name Get Name From URL Assistant Responses are generated using AI and may contain mistakes.
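Putting the pieces together, a rough end-to-end sketch of the helper's typical lifecycle might look like the following; the room expiry, token lifetime, and environment variable handling are illustrative assumptions rather than requirements.

import asyncio
import os
import time

import aiohttp

from pipecat.transports.services.helpers.daily_rest import (
    DailyRESTHelper,
    DailyRoomParams,
    DailyRoomProperties,
)


async def main():
    async with aiohttp.ClientSession() as session:
        helper = DailyRESTHelper(
            daily_api_key=os.getenv("DAILY_API_KEY", ""),
            aiohttp_session=session,
        )

        # Create a private room that expires in 15 minutes
        room = await helper.create_room(
            DailyRoomParams(
                privacy="private",
                properties=DailyRoomProperties(exp=time.time() + 900),
            )
        )

        # Issue a non-owner token for a participant
        token = await helper.get_token(room_url=room.url, expiry_time=900, owner=False)
        print(f"Room: {room.url}\nToken: {token}")

        # Clean up when the session is over
        await helper.delete_room_by_name(room.name)


if __name__ == "__main__":
    asyncio.run(main())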
|
deployment_fly_7f6b1819.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/guides/deployment/fly#creating-the-pipecat-project
|
2 |
+
Title: Example: Fly.io - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Example: Fly.io - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Deploying your bot Example: Fly.io Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal Project setup Let’s explore how we can use fly.io to make our app scalable for production by spawning our Pipecat bots on virtual machines with their own resources. We mentioned before that you would ideally containerize the bot_runner.py web service and the bot.py separately. To keep this example simple, we’ll use the same container image for both services. Install the Fly CLI You can find instructions for creating and setting up your fly account here . Creating the Pipecat project We have created a template project here which you can clone. Since we’re targeting production use-cases, this example uses Daily (WebRTC) as a transport, but you can configure your bot however you like. Adding a fly.toml Add a fly.toml to the root of your project directory. Here is a basic example: fly.toml Copy Ask AI app = 'some-unique-app-name' primary_region = 'sjc' [ build ] [ env ] FLY_APP_NAME = 'some-unique-app-name' [ http_service ] internal_port = 7860 force_https = true auto_stop_machines = true auto_start_machines = true min_machines_running = 0 processes = [ 'app' ] [[ vm ]] memory = 512 cpu_kind = 'shared' cpus = 1 For apps with lots of users, consider what resources your HTTP service will require to meet load. We’ll define our bot.py resources later, so you can set and scale these as you like ( fly scale ... ) Environment setup Our bot requires some API keys and configuration, so create a .env in your project root: .env Copy Ask AI DAILY_API_KEY = OPENAI_API_KEY = ELEVENLABS_API_KEY = ELEVENLABS_VOICE_ID = FLY_API_KEY = FLY_APP_NAME = Of course, the exact keys you need will depend on which services you are using within your bot.py . Important: your FLY_APP_NAME should match the name of your fly instance, such as that declared in your fly.toml. The .env will allow us to test in local development, but is not included in the deployment. You’ll need to set them as Fly app secrets, which you can do via the Fly dashboard or cli. fly secrets set ... Containerize our app Our Fly deployment will need a container image; let’s create a simple Dockerfile in the root of the project: Dockerfile .dockerignore Copy Ask AI FROM python:3.11-slim-bookworm # Open port 7860 for http service ENV FAST_API_PORT= 7860 EXPOSE 7860 # Install Python dependencies COPY \* .py . COPY ./requirements.txt requirements.txt RUN pip3 install --no-cache-dir --upgrade -r requirements.txt # Install models RUN python3 install_deps.py # Start the FastAPI server CMD python3 bot_runner.py --port ${ FAST_API_PORT } You can use any base image as long as Python is available Our container does the following: Opens port 7860 to serve our bot_runner.py FastAPI service. Downloads the necessary python dependencies. Download / cache the model dependencies the bot.py requires. 
Runs the bot_runner.py and listens for web requests. What models are we downloading? To support voice activity detection, we’re using Silero VAD. Whilst the filesize is not huge, having each new machine download the Silero model at runtime will impact bootup time. Instead, we include the model as part of the Docker image so it’s cached and available. You could, of course, also attach a network volume to each instance if you plan to include larger files as part of your deployment and don’t want to bloat the size of your image. Launching new machines in bot_runner.py When a user starts a session with our Pipecat bot, we want to launch a new machine on fly.io with its own system resources. Let’s grab the bot_runner.py from the example repo here . This runner differs from others in the Pipecat repo; we’ve added a new method that sends a REST request to Fly to provision a new machine for the session. This method is invoked as part of the /start_bot endpoint: bot_runner.py Copy Ask AI FLY_API_HOST = os.getenv( "FLY_API_HOST" , "https://api.machines.dev/v1" ) FLY_APP_NAME = os.getenv( "FLY_APP_NAME" , "your-fly-app-name" ) FLY_API_KEY = os.getenv( "FLY_API_KEY" , "" ) FLY_HEADERS = { 'Authorization' : f "Bearer { FLY_API_KEY } " , 'Content-Type' : 'application/json' } def spawn_fly_machine ( room_url : str , token : str ): # Use the same image as the bot runner res = requests.get( f " { FLY_API_HOST } /apps/ { FLY_APP_NAME } /machines" , headers = FLY_HEADERS ) if res.status_code != 200 : raise Exception ( f "Unable to get machine info from Fly: { res.text } " ) image = res.json()[ 0 ][ 'config' ][ 'image' ] # Machine configuration cmd = f "python3 bot.py -u { room_url } -t { token } " cmd = cmd.split() worker_props = { "config" : { "image" : image, "auto_destroy" : True , "init" : { "cmd" : cmd }, "restart" : { "policy" : "no" }, "guest" : { "cpu_kind" : "shared" , "cpus" : 1 , "memory_mb" : 1024 # Note: 512 is just enough to run VAD, but 1gb is better } }, } # Spawn a new machine instance res = requests.post( f " { FLY_API_HOST } /apps/ { FLY_APP_NAME } /machines" , headers = FLY_HEADERS , json = worker_props) if res.status_code != 200 : raise Exception ( f "Problem starting a bot worker: { res.text } " ) # Wait for the machine to enter the started state vm_id = res.json()[ 'id' ] res = requests.get( f " { FLY_API_HOST } /apps/ { FLY_APP_NAME } /machines/ { vm_id } /wait?state=started" , headers = FLY_HEADERS ) if res.status_code != 200 : raise Exception ( f "Bot was unable to enter started state: { res.text } " ) We want to make sure the machine started ok before returning any data to the user. Fly launches machines pretty fast, but will time out if things take longer than they should. Depending on your transport method, you may want to optimistically return a response to the user, so they can join the room and poll for the status of their bot. Launch the Fly project Getting your bot on Fly is as simple as: fly launch or fly launch --org orgname if you’re part of a team. This will step you through some configuration, and build and deploy your Docker image. Be sure to configure your app secrets with the necessary environment variables once the deployment has completed. Assuming all goes well, you can deploy any subsequent changes with fly deploy . Test it out Start a new bot instance by sending a POST request to https://your-fly-url.fly.dev/start_bot . All being well, this will return a room URL and token.
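For a quick smoke test, something like the sketch below should work; the exact JSON fields in the response depend on how your /start_bot handler is written, so treat them as assumptions.

import requests

# Replace with your deployed Fly app hostname
resp = requests.post("https://your-fly-url.fly.dev/start_bot", timeout=60)
resp.raise_for_status()

# The runner in this example responds with the Daily room URL and token;
# the exact JSON keys depend on your /start_bot implementation.
print(resp.json())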
A nice feature of Fly is the ability to monitor your machines (with live logs) via their dashboard: https://fly.io/apps/YOUR-APP_NAME/machines This is really helpful for monitoring the status of your spawned machine, and debugging if things do not work as expected. The room in this example is configured to expire after 5 minutes. The bot process is also configured to exit after the user leaves the room. This is a good way to ensure we don’t have any hanging VMs, although you’ll likely need to configure this behaviour to meet your own needs. You’ll also notice that we set the restart policy to no . This prevents the machine attempting to restart after the session has concluded and the process exits. Important considerations This example does little in the way of load balancing or app security. Indeed, a user can spawn a new machine on your account simply by sending a POST request to the bot_runner.py . Be sure to configure a maximum number of instances, or authenticate requests to avoid costs getting out of control. We also deployed our bot.py on a machine with the same image as our bot_runner.py . To optimize container file sizes and increase security, consider individual images that only deploy the resources they require. Example: Pipecat Cloud Example: Cerebrium On this page Project setup Install the Fly CLI Creating the Pipecat project Adding a fly.toml Environment setup Containerize our app What models are we downloading? Launching new machines in bot_runner.py Launch the Fly project Test it out Important considerations Assistant Responses are generated using AI and may contain mistakes.
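One possible way to cap costs is to count started machines before spawning a new one, reusing the same Machines API endpoint as the runner above. This is a rough sketch: the MAX_BOTS value is illustrative, and the "state" field check assumes the shape of the Fly Machines list response.

import os
import requests

FLY_API_HOST = os.getenv("FLY_API_HOST", "https://api.machines.dev/v1")
FLY_APP_NAME = os.getenv("FLY_APP_NAME", "your-fly-app-name")
FLY_HEADERS = {
    "Authorization": f"Bearer {os.getenv('FLY_API_KEY', '')}",
    "Content-Type": "application/json",
}
MAX_BOTS = 10  # illustrative cap on concurrent bot machines


def can_spawn_machine() -> bool:
    # List the app's machines and refuse to spawn past the cap.
    res = requests.get(f"{FLY_API_HOST}/apps/{FLY_APP_NAME}/machines", headers=FLY_HEADERS)
    res.raise_for_status()
    started = [m for m in res.json() if m.get("state") == "started"]
    return len(started) < MAX_BOTS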
|
deployment_pipecat-cloud_104d38f9.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/guides/deployment/pipecat-cloud#project-structure
|
2 |
+
Title: Example: Pipecat Cloud - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Example: Pipecat Cloud - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Deploying your bot Example: Pipecat Cloud Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal Pipecat Cloud is a managed platform for hosting and scaling Pipecat agents in production. Prerequisites Before you begin, you’ll need: A Pipecat Cloud account Docker installed Python 3.10+ The Pipecat Cloud CLI: pip install pipecatcloud Quickstart Guide Follow a step-by-step guided experience to deploy your first agent Choosing a starting point Pipecat Cloud offers several ways to get started: Use a starter template - Pre-built agent configurations for common use cases Build from the base image - Create a custom agent using the official base image Clone the starter project - A bare-bones project template to customize Starter templates Pipecat Cloud provides several ready-made templates for common agent types: Template Description voice Voice conversation agent with STT, LLM and TTS twilio Telephony agent that works with Twilio natural_conversation Agent focused on natural dialogue flow, allowing a user time to think openai_realtime Agent using OpenAI’s Realtime API gemini_multimodal_live Multimodal agent using Google’s Gemini Multimodal Live API vision Computer vision agent that can analyze images These templates include a functioning implementation and Dockerfile. You can use them directly: Copy Ask AI # Clone the repository git clone https://github.com/daily-co/pipecat-cloud-images.git # Navigate to a starter template cd pipecat-cloud-images/pipecat-starters/voice # Customize the agent for your needs Project structure Whether using a starter template or building from scratch, a basic Pipecat Cloud project typically includes: Copy Ask AI my-agent/ ├── bot.py # Your Pipecat pipeline ├── Dockerfile # Container definition ├── requirements.txt # Python dependencies └── pcc-deploy.toml # Deployment config (optional) Agent implementation with bot.py Your agent’s bot.py code must include a specific bot() function that serves as the entry point for Pipecat Cloud. This function has different signatures depending on the transport method: For WebRTC/Daily transports Copy Ask AI async def bot ( args : DailySessionArguments): """Main bot entry point compatible with the FastAPI route handler. Args: config: The configuration object from the request body room_url: The Daily room URL token: The Daily room token session_id: The session ID for logging """ logger.info( f "Bot process initialized { args.room_url } { args.token } " ) try : await main(args.room_url, args.token) logger.info( "Bot process completed" ) except Exception as e: logger.exception( f "Error in bot process: { str (e) } " ) raise For WebSocket transports (e.g., Twilio) Copy Ask AI async def bot ( args : WebSocketSessionArguments): """Main bot entry point for WebSocket connections. 
Args: ws: The WebSocket connection session_id: The session ID for logging """ logger.info( "WebSocket bot process initialized" ) try : await main(args.websocket) logger.info( "WebSocket bot process completed" ) except Exception as e: logger.exception( f "Error in WebSocket bot process: { str (e) } " ) raise Complete Example: Voice Bot View a complete WebRTC voice bot implementation Complete Example: Twilio Bot View a complete Twilio WebSocket bot implementation Example Dockerfile Pipecat Cloud provides base images that include common dependencies for Pipecat agents: Copy Ask AI FROM dailyco/pipecat-base:latest COPY ./requirements.txt requirements.txt RUN pip install --no-cache-dir --upgrade -r requirements.txt COPY ./bot.py bot.py This Dockerfile: Uses the official Pipecat base image Installs Python dependencies from requirements.txt Copies your bot.py file to the container The base image (dailyco/pipecat-base) includes the HTTP API server, session management, and platform integration required to run on Pipecat Cloud. See the base image source code for details. Building and pushing your Docker image With your project structure in place, build and push your Docker image: Copy Ask AI # Build the image docker build --platform=linux/arm64 -t my-first-agent:latest . # Tag the image for your repository docker tag my-first-agent:latest your-username/my-first-agent:0.1 # Push the tagged image to the repository docker push your-username/my-first-agent:0.1 Pipecat Cloud requires ARM64 images. Make sure to specify the --platform=linux/arm64 flag when building. Managing secrets Your agent likely requires API keys and other credentials. Create a secret set to store them securely: Copy Ask AI # Create an .env file with your credentials touch .env # Create a secret set from the file pcc secrets set my-first-agent-secrets --file .env Deploying your agent Deploy your agent with the CLI: Copy Ask AI pcc deploy my-first-agent your-username/my-first-agent:0.1 --secret-set my-first-agent-secrets For a more maintainable approach, create a pcc-deploy.toml file: Copy Ask AI agent_name = "my-first-agent" image = "your-username/my-first-agent:0.1" secret_set = "my-first-agent-secrets" image_credentials = "my-first-agent-image-credentials" # For private repos [ scaling ] min_instances = 0 Then deploy using: Copy Ask AI pcc deploy Starting a session Once deployed, you can start a session with your agent: Copy Ask AI # Create and set a public access key if needed pcc organizations keys create pcc organizations keys use # Start a session using Daily for WebRTC pcc agent start my-first-agent --use-daily This will open a Daily room where you can interact with your agent. Checking deployment status Monitor your agent deployment: Copy Ask AI # Check deployment status pcc agent status my-first-agent # View deployment logs pcc agent logs my-first-agent Next steps Agent Images Learn about containerizing your agent Secrets Managing sensitive information Scaling Configure scaling for production workloads Active Sessions Understand how sessions work Deployment pattern Example: Fly.io On this page Prerequisites Choosing a starting point Starter templates Project structure Agent implementation with bot.py For WebRTC/Daily transports For WebSocket transports (e.g., Twilio) Example Dockerfile Building and pushing your Docker image Managing secrets Deploying your agent Starting a session Checking deployment status Next steps Assistant Responses are generated using AI and may contain mistakes.
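The starter templates provide a full main() implementation; as a rough illustration of what bot() ultimately calls in the Daily case, a minimal sketch might look like this. The pipeline is intentionally bare, and the import paths reflect current Pipecat releases, so double-check them against your installed version.

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.transports.services.daily import DailyParams, DailyTransport


async def main(room_url: str, token: str):
    # Bare-bones pipeline: audio in and out only; insert STT/LLM/TTS
    # services between input() and output() for a real agent.
    transport = DailyTransport(
        room_url,
        token,
        "My Agent",
        DailyParams(audio_in_enabled=True, audio_out_enabled=True),
    )

    pipeline = Pipeline([transport.input(), transport.output()])
    task = PipelineTask(pipeline)

    await PipelineRunner().run(task)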
|
features_pipecat-flows_9f18b554.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/guides/features/pipecat-flows#when-to-use-static-vs-dynamic-flows
|
2 |
+
Title: Pipecat Flows - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Pipecat Flows - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Features Pipecat Flows Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal Pipecat Flows provides a framework for building structured conversations in your AI applications. It enables you to create both predefined conversation paths and dynamically generated flows while handling the complexities of state management and LLM interactions. The framework consists of: A Python module for building conversation flows with Pipecat A visual editor for designing and exporting flow configurations Key Concepts Nodes : Represent conversation states with specific messages and available functions Messages : Set the role and tasks for each node Functions : Define actions and transitions (Node functions for operations, Edge functions for transitions) Actions : Execute operations during state transitions (pre/post actions) State Management : Handle conversation state and data persistence Example Flows Movie Explorer (Static) A static flow demonstrating movie exploration using OpenAI. Shows real API integration with TMDB, structured data collection, and state management. Insurance Policy (Dynamic) A dynamic flow using Google Gemini that adapts policy recommendations based on user responses. Demonstrates runtime node creation and conditional paths. These examples are fully functional and can be run locally. Make sure you have the required dependencies installed and API keys configured. When to Use Static vs Dynamic Flows Static Flows are ideal when: Conversation structure is known upfront Paths follow predefined patterns Flow can be fully configured in advance Example: Customer service scripts, intake forms Dynamic Flows are better when: Paths depend on external data Flow structure needs runtime modification Complex decision trees are involved Example: Personalized recommendations, adaptive workflows Installation If you’re already using Pipecat: Copy Ask AI pip install pipecat-ai-flows If you’re starting fresh: Copy Ask AI # Basic installation pip install pipecat-ai-flows # Install Pipecat with specific LLM provider options: pip install "pipecat-ai[daily,openai,deepgram]" # For OpenAI pip install "pipecat-ai[daily,anthropic,deepgram]" # For Anthropic pip install "pipecat-ai[daily,google,deepgram]" # For Google 💡 Want to design your flows visually? Try the online Flow Editor Core Concepts Designing Conversation Flows Functions in Pipecat Flows serve two key purposes: Processing data (likely by interfacing with external systems and APIs) Advancing the conversation to the next node Each function can do one or both. LLMs decide when to run each function, via their function calling (or tool calling) mechanism. Defining a Function A function is expected to return a (result, next_node) tuple. 
More precisely, it’s expected to return: Copy Ask AI # (result, next_node) Tuple[Optional[FlowResult], Optional[Union[NodeConfig, str ]]] If the function processes data, it should return a non- None value for the first element of the tuple. This value should be a FlowResult or subclass. If the function advances the conversation to the next node, it should return a non- None value for the second element of the tuple. This value can be either: A NodeConfig defining the next node (for dynamic flows) A string identifying the next node (for static flows) Example Function Copy Ask AI from pipecat.frames.frames import TTSSpeakFrame from pipecat_flows import FlowArgs, FlowManager, FlowResult, NodeConfig async def check_availability ( args : FlowArgs, flow_manager : FlowManager) -> tuple[FlowResult, NodeConfig]: # Read arguments date = args[ "date" ] time = args[ "time" ] # Read previously-stored data party_size = flow_manager.state.get( "party_size" ) # Use flow_manager for immediate user feedback await flow_manager.task.queue_frame(TTSSpeakFrame( "Checking our reservation system..." )) # Store data in flow state for later use flow_manager.state[ "requested_date" ] = date # Interface with reservation system is_available = await reservation_system.check_availability(date, time, party_size) # Assemble result result = { "status" : "success" , "available" : is_available } # Decide which node to go to next if is_available: next_node = create_confirmation_node() else : next_node = create_no_availability_node() # Return both result and next node return result, next_node Node Structure Each node in your flow represents a conversation state and consists of three main components: Messages Nodes use two types of messages to control the conversation: Role Messages : Define the bot’s personality or role (optional) Copy Ask AI "role_messages" : [ { "role" : "system" , "content" : "You are a friendly pizza ordering assistant. Keep responses casual and upbeat." } ] Task Messages : Define what the bot should do in the current node Copy Ask AI "task_messages" : [ { "role" : "system" , "content" : "Ask the customer which pizza size they'd like: small, medium, or large." } ] Role messages are typically defined in your initial node and inherited by subsequent nodes, while task messages are specific to each node’s purpose. Functions Functions in Pipecat Flows can: Process data Specify node transitions Do both This leads to two conceptual types of functions: Node functions , which only process data. Edge functions , which also (or only) transition to the next node. The function itself ( which you can read more about here ) is usually wrapped in a function configuration, which also contains some metadata about the function. Function Configuration Pipecat Flows supports three ways of specifying function configuration: Provider-specific dictionary format Copy Ask AI # Dictionary format { "type" : "function" , "function" : { "name" : "select_size" , "handler" : select_size, "description" : "Select pizza size" , "parameters" : { "type" : "object" , "properties" : { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} } }, } } FlowsFunctionSchema Copy Ask AI # Using FlowsFunctionSchema from pipecat_flows import FlowsFunctionSchema size_function = FlowsFunctionSchema( name = "select_size" , description = "Select pizza size" , properties = { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} }, required = [ "size" ], handler = select_size ) # Use in node configuration node_config = { "task_messages" : [ ...
], "functions" : [size_function] } The FlowsFunctionSchema approach provides some advantages over the provider-specific dictionary format: Consistent structure across LLM providers Simplified parameter definition Cleaner, more readable code Both dictionary and FlowsFunctionSchema approaches are fully supported. FlowsFunctionSchema is recommended for new projects as it provides better type checking and a provider-independent format. Direct function usage (auto-configuration) This approach lets you bypass specifying a standalone function configuration. Instead, relevant function metadata is automatically extracted from the function’s signature and docstring: name description properties (including individual property description s) required Note that the function signature is a bit different when using direct functions. The first parameter is the FlowManager , followed by any others necessary for the function. Copy Ask AI from pipecat_flows import FlowManager, FlowResult async def select_pizza_order ( flow_manager : FlowManager, size : str , pizza_type : str , additional_toppings : list[ str ] = [], ) -> tuple[FlowResult, str ]: """ Record the pizza order details. Args: size (str): Size of the pizza. Must be one of "small", "medium", or "large". pizza_type (str): Type of pizza. Must be one of "pepperoni", "cheese", "supreme", or "vegetarian". additional_toppings (list[str]): List of additional toppings. Defaults to empty list. """ ... # Use in node configuration node_config = { "task_messages" : [ ... ], "functions" : [select_pizza_order] } Node Functions Functions that process data within a single conversational state, without switching nodes. When called, they: Execute their handler to do the data processing (typically by interfacing with an external system or API) Trigger an immediate LLM completion with the result Copy Ask AI from pipecat_flows import FlowArgs, FlowManager, FlowResult async def select_size ( args : FlowArgs, flow_manager : FlowManager) -> tuple[FlowResult, None ]: """Process pizza size selection.""" size = args[ "size" ] await ordering_system.record_size_selection(size) return { "status" : "success" , "size" : size }, None # Function configuration { "type" : "function" , "function" : { "name" : "select_size" , "handler" : select_size, "description" : "Select pizza size" , "parameters" : { "type" : "object" , "properties" : { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} } }, } } Edge Functions Functions that specify a transition between nodes (optionally processing data first). 
When called, they: Execute their handler to do any data processing (optional) and determine the next node Add the function result to the LLM context Trigger LLM completion after both the function result and the next node’s messages are in the context Copy Ask AI from pipecat_flows import FlowArgs, FlowManager, FlowResult async def select_size ( args : FlowArgs, flow_manager : FlowManager) -> tuple[FlowResult, NodeConfig]: """Process pizza size selection.""" size = args[ "size" ] await ordering_system.record_size_selection(size) result = { "status" : "success" , "size" : size } next_node = create_confirmation_node() return result, next_node # Function configuration { "type" : "function" , "function" : { "name" : "select_size" , "handler" : select_size, "description" : "Select pizza size" , "parameters" : { "type" : "object" , "properties" : { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} } }, } } Actions Actions are operations that execute as part of the lifecycle of a node, with two distinct timing options: Pre-actions: execute when entering the node, before the LLM completion Post-actions: execute after the LLM completion Pre-Actions Execute when entering the node, before LLM inference. Useful for: Providing immediate feedback while waiting for LLM responses Bridging gaps during longer function calls Setting up state or context Copy Ask AI "pre_actions" : [ { "type" : "tts_say" , "text" : "Hold on a moment..." # Immediate feedback during processing } ], Note that when the node is configured with respond_immediately: False , the pre_actions still run when entering the node, which may be well before LLM inference, depending on how long the user takes to speak first. Avoid mixing tts_say actions with chat completions as this may result in a conversation flow that feels unnatural. tts_say are best used as filler words when the LLM will take time to generate an completion. Post-Actions Execute after LLM inference completes. Useful for: Cleanup operations State finalization Ensuring proper sequence of operations Copy Ask AI "post_actions" : [ { "type" : "end_conversation" # Ensures TTS completes before ending } ] Note that when the node is configured with respond_immediately: False , the post_actions still only run after the first LLM inference, which may be a while depending on how long the user takes to speak first. Timing Considerations Pre-actions : Execute immediately, before any LLM processing begins LLM Inference : Processes the node’s messages and functions Post-actions : Execute after LLM processing and TTS completion For example, when using end_conversation as a post-action, the sequence is: LLM generates response TTS speaks the response End conversation action executes This ordering ensures proper completion of all operations. Action Types Flows comes equipped with pre-canned actions and you can also define your own action behavior. See the reference docs for more information. Deciding Who Speaks First For each node in the conversation, you can decide whether the LLM should respond immediately upon entering the node (the default behavior) or whether the LLM should wait for the user to speak first before responding. You do this using the respond_immediately field. respond_immediately=False may be particularly useful in the very first node, especially in outbound-calling cases where the user has to first answer the phone to trigger the conversation. 
Copy Ask AI NodeConfig( task_messages = [ { "role" : "system" , "content" : "Warmly greet the customer and ask how many people are in their party. This is your only job for now; if the customer asks for something else, politely remind them you can't do it." , } ], respond_immediately = False , # ... other fields ) Keep in mind that if you specify respond_immediately=False , the user may not be aware of the conversational task at hand when entering the node (the bot hasn’t told them yet). While it’s always important to have guardrails in your node messages to keep the conversation on topic, letting the user speak first makes it even more so. Context Management Pipecat Flows provides three strategies for managing conversation context during node transitions: Context Strategies APPEND (default): Adds new messages to the existing context, maintaining the full conversation history RESET : Clears the context and starts fresh with the new node’s messages RESET_WITH_SUMMARY : Resets the context but includes an AI-generated summary of the previous conversation Configuration Context strategies can be configured globally or per-node: Copy Ask AI from pipecat_flows import ContextStrategy, ContextStrategyConfig # Global strategy configuration flow_manager = FlowManager( task = task, llm = llm, context_aggregator = context_aggregator, context_strategy = ContextStrategyConfig( strategy = ContextStrategy. RESET_WITH_SUMMARY , summary_prompt = "Summarize the key points discussed so far, focusing on decisions made and important information collected." ) ) # Per-node strategy configuration node_config = { "task_messages" : [ ... ], "functions" : [ ... ], "context_strategy" : ContextStrategyConfig( strategy = ContextStrategy. RESET_WITH_SUMMARY , summary_prompt = "Provide a concise summary of the customer's order details and preferences." ) } Strategy Selection Choose your strategy based on your conversation needs: Use APPEND when full conversation history is important Use RESET when previous context might confuse the current node’s purpose Use RESET_WITH_SUMMARY for long conversations where key points need to be preserved When using RESET_WITH_SUMMARY, if summary generation fails or times out, the system automatically falls back to RESET strategy for resilience. State Management The state variable in FlowManager is a shared dictionary that persists throughout the conversation. 
Think of it as a conversation memory that lets you: Store user information Track conversation progress Share data between nodes Inform decision-making Here’s a practical example of a pizza ordering flow: Copy Ask AI # Store user choices as they're made async def select_size ( args : FlowArgs) -> tuple[FlowResult, str ]: """Handle pizza size selection.""" size = args[ "size" ] # Initialize order in state if it doesn't exist if "order" not in flow_manager.state: flow_manager.state[ "order" ] = {} # Store the selection flow_manager.state[ "order" ][ "size" ] = size return { "status" : "success" , "size" : size}, "toppings" async def select_toppings ( args : FlowArgs) -> tuple[FlowResult, str ]: """Handle topping selection.""" topping = args[ "topping" ] # Get existing order and toppings order = flow_manager.state.get( "order" , {}) toppings = order.get( "toppings" , []) # Add new topping toppings.append(topping) order[ "toppings" ] = toppings flow_manager.state[ "order" ] = order return { "status" : "success" , "toppings" : toppings}, "finalize" async def finalize_order ( args : FlowArgs) -> tuple[FlowResult, str ]: """Process the complete order.""" order = flow_manager.state.get( "order" , {}) # Validate order has required information if "size" not in order: return { "status" : "error" , "error" : "No size selected" } # Calculate price based on stored selections size = order[ "size" ] toppings = order.get( "toppings" , []) price = calculate_price(size, len (toppings)) return { "status" : "success" , "summary" : f "Ordered: { size } pizza with { ', ' .join(toppings) } " , "price" : price }, "end" In this example: select_size initializes the order and stores the size select_toppings builds a list of toppings finalize_order uses the stored information to process the complete order The state variable makes it easy to: Build up information across multiple interactions Access previous choices when needed Validate the complete order Calculate final results This is particularly useful when information needs to be collected across multiple conversation turns or when later decisions depend on earlier choices. LLM Provider Support Pipecat Flows automatically handles format differences between LLM providers: OpenAI Format Copy Ask AI "functions" : [{ "type" : "function" , "function" : { "name" : "function_name" , "handler" : select_size, "description" : "description" , "parameters" : { ... } } }] Anthropic Format Copy Ask AI "functions" : [{ "name" : "function_name" , "handler" : select_size, "description" : "description" , "input_schema" : { ... } }] Google (Gemini) Format Copy Ask AI "functions" : [{ "function_declarations" : [{ "name" : "function_name" , "handler" : select_size, "description" : "description" , "parameters" : { ... } }] }] You don’t need to handle these differences manually - Pipecat Flows adapts your configuration to the correct format based on your LLM provider. Implementation Approaches Static Flows Static flows use a configuration-driven approach where the entire conversation structure is defined upfront. Basic Setup Copy Ask AI from pipecat_flows import FlowManager # Define flow configuration flow_config = { "initial_node" : "greeting" , "nodes" : { "greeting" : { "role_messages" : [ ... ], "task_messages" : [ ... ], "functions" : [ ... 
] } } } # Initialize flow manager with static configuration flow_manager = FlowManager( task = task, llm = llm, context_aggregator = context_aggregator, flow_config = flow_config ) @transport.event_handler ( "on_first_participant_joined" ) async def on_first_participant_joined ( transport , participant ): await transport.capture_participant_transcription(participant[ "id" ]) await flow_manager.initialize() Example FlowConfig Copy Ask AI flow_config = { "initial_node" : "start" , "nodes" : { "start" : { "role_messages" : [ { "role" : "system" , "content" : "You are an order-taking assistant. You must ALWAYS use the available functions to progress the conversation. This is a phone conversation and your responses will be converted to audio. Keep the conversation friendly, casual, and polite. Avoid outputting special characters and emojis." , } ], "task_messages" : [ { "role" : "system" , "content" : "You are an order-taking assistant. Ask if they want pizza or sushi." } ], "functions" : [ { "type" : "function" , "function" : { "name" : "choose_pizza" , "handler" : choose_pizza, # Returns [None, "pizza_order"] "description" : "User wants pizza" , "parameters" : { "type" : "object" , "properties" : {}} } } ] }, "pizza_order" : { "task_messages" : [ ... ], "functions" : [ { "type" : "function" , "function" : { "name" : "select_size" , "handler" : select_size, # Returns [FlowResult, "toppings"] "description" : "Select pizza size" , "parameters" : { "type" : "object" , "properties" : { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} } } } } ] } } } Dynamic Flows Dynamic flows create and modify conversation paths at runtime based on data or business logic. Example Implementation Here’s a complete example of a dynamic insurance quote flow: Copy Ask AI from pipecat_flows import FlowManager, FlowArgs, FlowResult # Define handlers and transitions async def collect_age ( args : FlowArgs, flow_manager : FlowManager) -> tuple[AgeResult, NodeConfig]: """Process age collection.""" age = args[ "age" ] # Assemble result result = AgeResult( status = "success" , age = age) # Decide which node to go to next if age < 25 : await flow_manager.set_node_from_config(create_young_adult_node()) else : await flow_manager.set_node_from_config(create_standard_node()) return result, age # Node creation functions def create_initial_node () -> NodeConfig: """Create initial age collection node.""" return { "name" : "initial" , "role_messages" : [ { "role" : "system" , "content" : "You are an insurance quote assistant." } ], "task_messages" : [ { "role" : "system" , "content" : "Ask for the customer's age." } ], "functions" : [ { "type" : "function" , "function" : { "name" : "collect_age" , "handler" : collect_age, "description" : "Collect customer age" , "parameters" : { "type" : "object" , "properties" : { "age" : { "type" : "integer" } } } } } ] } def create_young_adult_node () -> Dict[ str , Any]: """Create node for young adult quotes.""" return { "name" : "young_adult" , "task_messages" : [ { "role" : "system" , "content" : "Explain our special young adult coverage options." } ], "functions" : [ ... ] # Additional quote-specific functions } def create_standard_node () -> Dict[ str , Any]: """Create node for standard quotes.""" return { "name" : "standard" , "task_messages" : [ { "role" : "system" , "content" : "Present our standard coverage options." } ], "functions" : [ ... 
] # Additional quote-specific functions } # Initialize flow manager flow_manager = FlowManager( task = task, llm = llm, context_aggregator = context_aggregator, ) @transport.event_handler ( "on_first_participant_joined" ) async def on_first_participant_joined ( transport , participant ): await transport.capture_participant_transcription(participant[ "id" ]) await flow_manager.initialize(create_initial_node()) Best Practices Store shared data in flow_manager.state Create separate functions for node creation Flow Editor The Pipecat Flow Editor provides a visual interface for creating and managing conversation flows. It offers a node-based interface that makes it easier to design, visualize, and modify your flows. Visual Design Node Types Start Node (Green): Entry point of your flow Copy Ask AI "greeting" : { "role_messages" : [ ... ], "task_messages" : [ ... ], "functions" : [ ... ] } Flow Nodes (Blue): Intermediate states Copy Ask AI "collect_info" : { "task_messages" : [ ... ], "functions" : [ ... ], "pre_actions" : [ ... ] } End Node (Red): Final state Copy Ask AI "end" : { "task_messages" : [ ... ], "functions" : [], "post_actions" : [{ "type" : "end_conversation" }] } Function Nodes : Edge Functions (Purple): Create transitions Copy Ask AI { "name" : "next_node" , "description" : "Transition to next state" } Node Functions (Orange): Perform operations Copy Ask AI { "name" : "process_data" , "handler" : process_data_handler, "description" : "Process user data" } Naming Conventions Start Node : Use descriptive names (e.g., “greeting”, “welcome”) Flow Nodes : Name based on purpose (e.g., “collect_info”, “verify_data”) End Node : Conventionally named “end” Functions : Use clear, action-oriented names Function Configuration Copy Ask AI { "type" : "function" , "function" : { "name" : "process_data" , "handler" : process_handler, "description" : "Process user data" , "parameters" : { ... } } } When using the Flow Editor, function handlers can be specified using the __function__: token: Copy Ask AI { "type" : "function" , "function" : { "name" : "process_data" , "handler" : "__function__:process_data" , # References function in main script "description" : "Process user data" , "parameters" : { ... } } } The handler will be looked up in your main script when the flow is executed. When function handlers are specified in the flow editor, they will be exported with the __function__: token. Using the Editor Creating a New Flow Start with a descriptively named Start Node Add Flow Nodes for each conversation state Connect nodes using Edge Functions Add Node Functions for operations Include an End Node Import/Export Copy Ask AI # Export format { "initial_node" : "greeting" , "nodes" : { "greeting" : { "role_messages" : [ ... ], "task_messages" : [ ... ], "functions" : [ ... ], "pre_actions" : [ ... ] }, "process" : { "task_messages" : [ ... ], "functions" : [ ... ], }, "end" : { "task_messages" : [ ... ], "functions" : [], "post_actions" : [ ... 
] } } } Tips Use the visual preview to verify flow logic Test exported configurations Document node purposes and transitions Keep flows modular and maintainable Try the editor at flows.pipecat.ai OpenAI Audio Models and APIs Overview On this page Key Concepts Example Flows When to Use Static vs Dynamic Flows Installation Core Concepts Designing Conversation Flows Defining a Function Example Function Node Structure Messages Functions Function Configuration Node Functions Edge Functions Actions Pre-Actions Post-Actions Timing Considerations Action Types Deciding Who Speaks First Context Management Context Strategies Configuration Strategy Selection State Management LLM Provider Support OpenAI Format Anthropic Format Google (Gemini) Format Implementation Approaches Static Flows Basic Setup Example FlowConfig Dynamic Flows Example Implementation Best Practices Flow Editor Visual Design Node Types Naming Conventions Function Configuration Using the Editor Creating a New Flow Import/Export Tips Assistant Responses are generated using AI and may contain mistakes.
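As a small addition to the examples above, the create_confirmation_node() helper referenced in the check_availability function could be sketched like this; the node name and messages are illustrative only.

from pipecat_flows import NodeConfig


def create_confirmation_node() -> NodeConfig:
    """Illustrative node: confirm the reservation details collected so far."""
    return {
        "name": "confirmation",
        "task_messages": [
            {
                "role": "system",
                "content": "Confirm the requested date, time, and party size, then ask the customer if everything is correct.",
            }
        ],
        "functions": [],  # add confirm/decline functions here in a real flow
    }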
|
filters_function-filter_badd26ec.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/utilities/filters/function-filter#usage-example
|
2 |
+
Title: FunctionFilter - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
FunctionFilter - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Frame Filters FunctionFilter Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters FrameFilter FunctionFilter IdentityFilter NullFilter STTMuteFilter WakeCheckFilter WakeNotifierFilter Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview FunctionFilter is a flexible processor that uses a custom async function to determine which frames to pass through. This allows for complex, dynamic filtering logic beyond simple type checking. Constructor Parameters filter Callable[[Frame], Awaitable[bool]] required Async function that examines each frame and returns True to allow it or False to filter it out direction FrameDirection default: "FrameDirection.DOWNSTREAM" Which direction of frames to filter (DOWNSTREAM or UPSTREAM) Functionality When a frame passes through the processor: System frames and end frames are always passed through Frames moving in a different direction than specified are always passed through Other frames are passed to the filter function If the filter function returns True, the frame is passed through Output Frames The processor conditionally passes through frames based on: Frame type (system frames and end frames always pass) Frame direction (only filters in the specified direction) Result of the custom filter function Usage Example Copy Ask AI from pipecat.frames.frames import TextFrame, Frame from pipecat.processors.filters import FunctionFilter from pipecat.processors.frame_processor import FrameDirection # Create filter that only allows TextFrames with more than 10 characters async def long_text_filter ( frame : Frame) -> bool : if isinstance (frame, TextFrame): return len (frame.text) > 10 return False # Apply filter to downstream frames only text_length_filter = FunctionFilter( filter = long_text_filter, direction = FrameDirection. DOWNSTREAM ) # Add to pipeline pipeline = Pipeline([ source, text_length_filter, # Filters out short text frames destination ]) Frame Flow Notes Provides maximum flexibility for complex filtering logic Can incorporate dynamic conditions that change at runtime Only filters frames moving in the specified direction Always passes through system frames for proper pipeline operation Can be used to create sophisticated content-based filters Supports async filter functions for complex processing FrameFilter IdentityFilter On this page Overview Constructor Parameters Functionality Output Frames Usage Example Frame Flow Notes Assistant Responses are generated using AI and may contain mistakes.
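Because the filter function is just an async callable, it can also consult mutable application state. Here is a minimal sketch (the allow_text flag is illustrative) of a gate that other code can toggle at runtime:

from pipecat.frames.frames import Frame, TextFrame
from pipecat.processors.filters import FunctionFilter
from pipecat.processors.frame_processor import FrameDirection

# Mutable flag that other parts of the application can flip at runtime
allow_text = True


async def toggleable_text_filter(frame: Frame) -> bool:
    # Pass TextFrames only while the flag is enabled; other frame types are
    # dropped here, but system and end frames still pass through automatically.
    return isinstance(frame, TextFrame) and allow_text


text_gate = FunctionFilter(filter=toggleable_text_filter, direction=FrameDirection.DOWNSTREAM)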
|
filters_stt-mute_582aa4e6.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/utilities/filters/stt-mute#constructor-parameters
|
2 |
+
Title: STTMuteFilter - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
STTMuteFilter - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Frame Filters STTMuteFilter Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters FrameFilter FunctionFilter IdentityFilter NullFilter STTMuteFilter WakeCheckFilter WakeNotifierFilter Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview STTMuteFilter is a general-purpose processor that combines STT muting and interruption control. When active, it prevents both transcription and interruptions during specified conditions (e.g., bot speech, function calls), providing a cleaner conversation flow. The processor supports multiple simultaneous strategies for when to mute the STT service, making it flexible for different use cases. Want to try it out? Check out the STTMuteFilter foundational demo Constructor Parameters config STTMuteConfig required Configuration object that defines the muting strategies and optional custom logic stt_service Optional[STTService] required The STT service to control (deprecated, will be removed in a future version) Configuration The processor is configured using STTMuteConfig , which determines when and how the STT service should be muted: strategies set[STTMuteStrategy] Set of muting strategies to apply should_mute_callback Callable[[STTMuteFilter], Awaitable[bool]] default: "None" Optional callback for custom muting logic (required when strategy is CUSTOM ) Muting Strategies STTMuteConfig accepts a set of these STTMuteStrategy values: FIRST_SPEECH STTMuteStrategy Mute only during the bot’s first speech (typically during introduction) MUTE_UNTIL_FIRST_BOT_COMPLETE STTMuteStrategy Start muted and remain muted until first bot speech completes. Useful when bot speaks first and you want to ensure its first response cannot be interrupted. FUNCTION_CALL STTMuteStrategy Mute during LLM function calls (e.g., API requests, external service calls) ALWAYS STTMuteStrategy Mute during all bot speech CUSTOM STTMuteStrategy Use custom logic provided via callback to determine when to mute. The callback is invoked when the bot is speaking and can use application state to decide whether to mute. When the bot stops speaking, unmuting occurs automatically if no other strategy requires muting. MUTE_UNTIL_FIRST_BOT_COMPLETE and FIRST_SPEECH strategies should not be used together as they handle the first bot speech differently. 
Input Frames BotStartedSpeakingFrame Frame Indicates bot has started speaking BotStoppedSpeakingFrame Frame Indicates bot has stopped speaking FunctionCallInProgressFrame Frame Indicates a function call has started FunctionCallResultFrame Frame Indicates a function call has completed StartInterruptionFrame Frame User interruption start event (suppressed when muted) StopInterruptionFrame Frame User interruption stop event (suppressed when muted) UserStartedSpeakingFrame Frame Indicates user has started speaking (suppressed when muted) UserStoppedSpeakingFrame Frame Indicates user has stopped speaking (suppressed when muted) Output Frames STTMuteFrame Frame Control frame to mute/unmute the STT service All input frames are passed through except VAD-related frames (interruptions and user speaking events) when muted. Usage Examples Basic Usage (Mute During Bot’s First Speech) Copy Ask AI stt = DeepgramSTTService( api_key = os.getenv( "DEEPGRAM_API_KEY" )) stt_mute_filter = STTMuteFilter( config = STTMuteConfig( strategies = { STTMuteStrategy. FIRST_SPEECH }) ) pipeline = Pipeline([ transport.input(), stt_mute_filter, # Add before STT service stt, # ... rest of pipeline ]) Mute Until First Bot Response Completes Copy Ask AI stt_mute_filter = STTMuteFilter( config = STTMuteConfig( strategies = {STTMuteStrategy. MUTE_UNTIL_FIRST_BOT_COMPLETE }) ) This ensures no user speech is processed until after the bot’s first complete response. Always Mute During Bot Speech Copy Ask AI stt_mute_filter = STTMuteFilter( config = STTMuteConfig( strategies = {STTMuteStrategy. ALWAYS }) ) Custom Muting Logic The CUSTOM strategy allows you to control muting based on application state when the bot is speaking. The callback will be invoked whenever the bot is speaking, and your logic decides whether to mute: Copy Ask AI # Create a state manager class SessionState : def __init__ ( self ): self .session_ending = False session_state = SessionState() # Callback function that determines whether to mute async def session_state_mute_logic ( stt_filter : STTMuteFilter) -> bool : # Return True to mute, False otherwise # This is called when the bot is speaking return session_state.session_ending # Configure filter with CUSTOM strategy stt_mute_filter = STTMuteFilter( config = STTMuteConfig( strategies = {STTMuteStrategy. CUSTOM }, should_mute_callback = session_state_mute_logic ) ) # Later, when you want to trigger muting (e.g., during session timeout): async def handle_session_timeout (): # Update state that will be checked by the callback session_state.session_ending = True # Send goodbye message goodbye_message = "Thank you for using our service. This session is now ending." await pipeline.push_frame(TTSSpeakFrame( text = goodbye_message)) # The system will automatically mute during this message because: # 1. Bot starts speaking, triggering the callback # 2. Callback returns True (session_ending is True) # 3. When bot stops speaking, unmuting happens automatically Combining Multiple Strategies Copy Ask AI async def custom_mute_logic ( processor : STTMuteFilter) -> bool : # Example: Mute during business hours only current_hour = datetime.now().hour return 9 <= current_hour < 17 stt_mute_filter = STTMuteFilter( config = STTMuteConfig( strategies = { STTMuteStrategy. FUNCTION_CALL , # Mute during function calls STTMuteStrategy. CUSTOM , # And during business hours STTMuteStrategy. 
MUTE_UNTIL_FIRST_BOT_COMPLETE # And until first bot speech completes }, should_mute_callback = custom_mute_logic ) ) Frame Flow Notes Combines STT muting and interruption control into a single concept Muting prevents both transcription and interruptions Multiple strategies can be active simultaneously CUSTOM strategy callback is only invoked when the bot is speaking Unmuting happens automatically when bot speech ends (if no other strategy requires muting) Placed before STT service in pipeline Maintains conversation flow during bot speech and function calls Efficient state tracking for minimal overhead
flows_pipecat-flows_7e77192f.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/frameworks/flows/pipecat-flows#param-flow-error
Title: Pipecat Flows - Pipecat
==================================================
Pipecat Flows - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Frameworks Pipecat Flows Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline New to building conversational flows? Check out our Pipecat Flows guide first. Installation Existing Pipecat installation Fresh Pipecat installation Copy Ask AI pip install pipecat-ai-flows Core Types FlowArgs FlowArgs Dict[str, Any] Type alias for function handler arguments. FlowResult FlowResult TypedDict Base type for function handler results. Additional fields can be included as needed. Show Fields status str Optional status field error str Optional error message FlowConfig FlowConfig TypedDict Configuration for the entire conversation flow. Show Fields initial_node str required Starting node identifier nodes Dict[str, NodeConfig] required Map of node names to configurations NodeConfig NodeConfig TypedDict Configuration for a single node in the flow. Show Fields name str The name of the node, used in debug logging in dynamic flows. If no name is specified, an automatically-generated UUID is used. Copy Ask AI # Example name "name" : "greeting" role_messages List[dict] Defines the role or persona of the LLM. Required for the initial node and optional for subsequent nodes. Copy Ask AI # Example role messages "role_messages" : [ { "role" : "system" , "content" : "You are a helpful assistant..." } ], task_messages List[dict] required Defines the task for a given node. Required for all nodes. Copy Ask AI # Example task messages "task_messages" : [ { "role" : "system" , # May be `user` depending on the LLM "content" : "Ask the user for their name..." } ], context_strategy ContextStrategyConfig Strategy for managing context during transitions to this node. Copy Ask AI # Example context strategy configuration "context_strategy" : ContextStrategyConfig( strategy = ContextStrategy. RESET_WITH_SUMMARY , summary_prompt = "Summarize the key points discussed so far." ) functions List[Union[dict, FlowsFunctionSchema]] required LLM function / tool call configurations, defined in one of the supported formats . Copy Ask AI # Using provider-specific dictionary format "functions" : [ { "type" : "function" , "function" : { "name" : "get_current_movies" , "handler" : get_movies, "description" : "Fetch movies currently playing" , "parameters" : { ... } }, } ] # Using FlowsFunctionSchema "functions" : [ FlowsFunctionSchema( name = "get_current_movies" , description = "Fetch movies currently playing" , properties = { ... }, required = [ ... ], handler = get_movies ) ] # Using direct functions (auto-configuration) "functions" : [get_movies] pre_actions List[dict] Actions that execute before the LLM inference. For example, you can send a message to the TTS to speak a phrase (e.g. “Hold on a moment…”), which may be effective if an LLM function call takes time to execute. 
Copy Ask AI # Example pre_actions "pre_actions" : [ { "type" : "tts_say" , "text" : "Hold on a moment..." } ], post_actions List[dict] Actions that execute after the LLM inference. For example, you can end the conversation. Copy Ask AI # Example post_actions "post_actions" : [ { "type" : "end_conversation" } ] respond_immediately bool If set to False , the LLM will not respond immediately when the node is set, but will instead wait for the user to speak first before responding. Defaults to True . Copy Ask AI # Example usage "respond_immediately" : False Function Handler Types LegacyFunctionHandler Callable[[FlowArgs], Awaitable[FlowResult | ConsolidatedFunctionResult]] Legacy function handler that only receives arguments. Returns either: A FlowResult (⚠️ deprecated) A “consolidated” result tuple (result, next node) where: result is an optional FlowResult next node is an optional NodeConfig (for dynamic flows) or string (for static flows) FlowFunctionHandler Callable[[FlowArgs, FlowManager], Awaitable[FlowResult | ConsolidatedFunctionResult]] Modern function handler that receives both arguments and FlowManager . Returns either: A FlowResult (⚠️ deprecated) A “consolidated” result tuple (result, next node) where: result is an optional FlowResult next node is an optional NodeConfig (for dynamic flows) or string (for static flows) DirectFunction DirectFunction Function that is meant to be passed directly into a NodeConfig rather than into the handler field of a function configuration. It must be an async function with flow_manager: FlowManager as its first parameter. It must return a ConsolidatedFunctionResult , which is a tuple (result, next node) where: result is an optional FlowResult next node is an optional NodeConfig (for dynamic flows) or string (for static flows) ContextStrategy ContextStrategy Enum Strategy for managing conversation context during node transitions. Show Values APPEND str Default strategy. Adds new messages to existing context. RESET str Clears context and starts fresh with new messages. RESET_WITH_SUMMARY str Resets context but includes an AI-generated summary. ContextStrategyConfig ContextStrategyConfig dataclass Configuration for context management strategy. Show Fields strategy ContextStrategy required The strategy to use for context management summary_prompt Optional[str] Required when using RESET_WITH_SUMMARY. Prompt text for generating the conversation summary. Copy Ask AI # Example usage config = ContextStrategyConfig( strategy = ContextStrategy. RESET_WITH_SUMMARY , summary_prompt = "Summarize the key points discussed so far." ) FlowsFunctionSchema FlowsFunctionSchema class A standardized schema for defining functions in Pipecat Flows with flow-specific properties. Show Constructor Parameters name str required Name of the function description str required Description of the function’s purpose properties Dict[str, Any] required Dictionary defining properties types and descriptions required List[str] required List of required parameter names handler Optional[FunctionHandler] Function handler to process the function call transition_to Optional[str] deprecated Target node to transition to after function execution Deprecated: instead of transition_to , use a “consolidated” handler that returns a tuple (result, next node). transition_callback Optional[Callable] deprecated Callback function for dynamic transitions Deprecated: instead of transition_callback , use a “consolidated” handler that returns a tuple (result, next node). 
You cannot specify both transition_to and transition_callback in the same function schema. Example usage: Copy Ask AI from pipecat_flows import FlowsFunctionSchema # Define a function schema collect_name_schema = FlowsFunctionSchema( name = "collect_name" , description = "Record the user's name" , properties = { "name" : { "type" : "string" , "description" : "The user's name" } }, required = [ "name" ], handler = collect_name_handler ) # Use in node configuration node_config = { "name" : "greeting" , "task_messages" : [ { "role" : "system" , "content" : "Ask the user for their name." } ], "functions" : [collect_name_schema] } # Pass to flow manager await flow_manager.set_node_from_config(node_config) FlowManager FlowManager class Main class for managing conversation flows, supporting both static (configuration-driven) and dynamic (runtime-determined) flows. Show Constructor Parameters task PipelineTask required Pipeline task for frame queueing llm LLMService required LLM service instance (OpenAI, Anthropic, or Google). Must be initialized with the corresponding pipecat-ai provider dependency installed. context_aggregator Any required Context aggregator used for pushing messages to the LLM service tts Optional[Any] deprecated Optional TTS service for voice actions. Deprecated: No need to explicitly pass tts to FlowManager in order to use tts_say actions. flow_config Optional[FlowConfig] Optional static flow configuration context_strategy Optional[ContextStrategyConfig] Optional configuration for how context should be managed during transitions. Defaults to APPEND strategy if not specified. Methods initialize method Initialize the flow with starting messages. Show Parameters initial_node NodeConfig The initial conversation node (needed for dynamic flows only). If not specified, you’ll need to call set_node_from_config() to kick off the conversation. Show Raises FlowInitializationError If initialization fails set_node method deprecated Set up a new conversation node programmatically (dynamic flows only). In dynamic flows, the application advances the conversation using set_node to set up each next node. In static flows, set_node is triggered under the hood when a node contains a transition_to field. Deprecated: use the following patterns instead of set_node : Prefer “consolidated” function handlers that return a tuple (result, next node), which implicitly sets up the next node Prefer passing your initial node to FlowManager.initialize() If you really need to set a node explicitly, use set_node_from_config() (note: its name will be read from its NodeConfig ) Show Parameters node_id str required Identifier for the new node node_config NodeConfig required Node configuration including messages, functions, and actions Show Raises FlowError If node setup fails set_node_from_config method Set up a new conversation node programmatically (dynamic flows only). Note that this method should only be used in rare circumstances. Most often, you should: Prefer “consolidated” function handlers that return a tuple (result, next node), which implicitly sets up the next node Prefer passing your initial node to FlowManager.initialize() Show Parameters node_config NodeConfig required Node configuration including messages, functions, and actions Show Raises FlowError If node setup fails register_action method Register a handler for a custom action type. 
Show Parameters action_type str required String identifier for the action handler Callable required Async or sync function that handles the action get_current_context method Get the current conversation context. Returns a list of messages in the current context, including system messages, user messages, and assistant responses. Show Returns messages List[dict] List of messages in the current context Show Raises FlowError If context aggregator is not available Example usage: Copy Ask AI # Access current conversation context context = flow_manager.get_current_context() # Use in handlers async def process_response ( args : FlowArgs) -> tuple[FlowResult, str ]: context = flow_manager.get_current_context() # Process conversation history return { "status" : "success" }, "next" State Management The FlowManager provides a state dictionary for storing conversation data: Access state Access in transitions Copy Ask AI flow_manager.state: Dict[ str , Any] # Store data flow_manager.state[ "user_age" ] = 25 Usage Examples Static Flow Dynamic Flow Copy Ask AI flow_config: FlowConfig = { "initial_node" : "greeting" , "nodes" : { "greeting" : { "role_messages" : [ { "role" : "system" , "content" : "You are a helpful assistant. Your responses will be converted to audio." } ], "task_messages" : [ { "role" : "system" , "content" : "Start by greeting the user and asking for their name." } ], "functions" : [{ "type" : "function" , "function" : { "name" : "collect_name" , "handler" : collect_name_handler, "description" : "Record user's name" , "parameters" : { ... } } }] } } } # Create and initialize the FlowManager flow_manager = FlowManager( task = task, llm = llm, context_aggregator = context_aggregator, flow_config = flow_config ) # Initialize the flow_manager to start the conversation await flow_manager.initialize() Node Functions concept Functions that execute operations within a single conversational state, without switching nodes. Copy Ask AI Copy Ask AI from pipecat_flows import FlowArgs, FlowResult async def process_data ( args : FlowArgs) -> tuple[FlowResult, None ]: """Handle data processing within a node.""" data = args[ "data" ] result = await process(data) return { "status" : "success" , "processed_data" : result }, None # Function configuration { "type" : "function" , "function" : { "name" : "process_data" , "handler" : process_data, "description" : "Process user data" , "parameters" : { "type" : "object" , "properties" : { "data" : { "type" : "string" } } } } } Edge Functions concept Functions that specify a transition between nodes (optionally processing data first). Copy Ask AI Copy Ask AI from pipecat_flows import FlowArgs, FlowResult async def next_step ( args : FlowArgs) -> tuple[ None , str ]: """Specify the next node to transition to.""" return None , "target_node" # Return NodeConfig instead of str for dynamic flows # Function configuration { "type" : "function" , "function" : { "name" : "next_step" , "handler" : next_step, "description" : "Transition to next node" , "parameters" : { "type" : "object" , "properties" : {}} } } Function Properties handler Optional[Callable] Async function that processes data within a node and/or specifies the next node ( more details here ). 
Can be specified as: Direct function reference Either a Callable function or a string with __function__: prefix (e.g., "__function__:process_data" ) to reference a function in the main script Direct Reference Function Token Copy Ask AI { "type" : "function" , "function" : { "name" : "process_data" , "handler" : process_data, # Callable function "parameters" : { ... } } } transition_callback Optional[Callable] deprecated Handler for dynamic flow transitions. Deprecated: instead of transition_callback , use a “consolidated” handler that returns a tuple (result, next node). Must be an async function with one of these signatures: Copy Ask AI # New style (recommended) async def handle_transition ( args : Dict[ str , Any], result : FlowResult, flow_manager : FlowManager ) -> None : """Handle transition to next node.""" if result.available: # Type-safe access to result await flow_manager.set_node_from_config(create_confirmation_node()) else : await flow_manager.set_node_from_config( create_no_availability_node(result.alternative_times) ) # Legacy style (supported for backwards compatibility) async def handle_transition ( args : Dict[ str , Any], flow_manager : FlowManager ) -> None : """Handle transition to next node.""" await flow_manager.set_node_from_config(create_next_node()) The callback receives: args : Arguments from the function call result : Typed result from the function handler (new style only) flow_manager : Reference to the FlowManager instance Example usage: Copy Ask AI async def handle_availability_check ( args : Dict, result : TimeResult, # Typed result flow_manager : FlowManager ): """Handle availability check and transition based on result.""" if result.available: await flow_manager.set_node_from_config(create_confirmation_node()) else : await flow_manager.set_node_from_config( create_no_availability_node(result.alternative_times) ) # Use in function configuration { "type" : "function" , "function" : { "name" : "check_availability" , "handler" : check_availability, "parameters" : { ... }, "transition_callback" : handle_availability_check } } Note: A function cannot have both transition_to and transition_callback . Handler Signatures Function handlers passed as a handler in a function configuration can be defined with three different signatures: Modern (Args + FlowManager) Legacy (Args Only) No Arguments Copy Ask AI async def handler_with_flow_manager ( args : FlowArgs, flow_manager : FlowManager) -> tuple[FlowResult, NodeConfig]: """Modern handler that receives both arguments and FlowManager access.""" # Access state previous_data = flow_manager.state.get( "stored_data" ) # Access pipeline resources await flow_manager.task.queue_frame(TTSSpeakFrame( "Processing your request..." )) # Store data in state for later flow_manager.state[ "new_data" ] = args[ "input" ] return { "status" : "success" , "result" : "Processed with flow access" }, create_next_node() The framework automatically detects which signature your handler is using and calls it appropriately. If you’re passing your function directly into your NodeConfig rather than as a handler in a function configuration, you’d use a somewhat different signature: Direct Copy Ask AI async def do_something ( flow_manager : FlowManager, foo : int , bar : str = "" ) -> tuple[FlowResult, NodeConfig]: """ Do something interesting. Args: foo (int): The foo to do something interesting with. bar (string): The bar to do something interesting with. Defaults to empty string. 
""" result = await fetch_data(foo, bar) next_node = create_end_node() return result, next_node Return Types Success Response Error Response Copy Ask AI { "status" : "success" , "data" : "some data" # Optional additional data } Provider-Specific Formats You don’t need to handle these format differences manually - use the standard format and the FlowManager will adapt it for your chosen provider. OpenAI Anthropic Google (Gemini) Copy Ask AI { "type" : "function" , "function" : { "name" : "function_name" , "handler" : handler, "description" : "Description" , "parameters" : { ... } } } Actions pre_actions and post_actions are used to manage conversation flow. They are included in the NodeConfig and executed before and after the LLM completion, respectively. Three kinds of actions are available: Pre-canned actions: These actions perform common tasks and require little configuration. Function actions: These actions run developer-defined functions at the appropriate time. Custom actions: These are fully developer-defined actions, providing flexibility at the expense of complexity. Pre-canned Actions Common actions shipped with Flows for managing conversation flow. To use them, just add them to your NodeConfig . tts_say action Speaks text immediately using the TTS service. Copy Ask AI Copy Ask AI "pre_actions" : [ { "type" : "tts_say" , "text" : "Processing your request..." # Required } ] end_conversation action Ends the conversation and closes the connection. Copy Ask AI Copy Ask AI "post_actions" : [ { "type" : "end_conversation" , "text" : "Goodbye!" # Optional farewell message } ] Function Actions Actions that run developer-defined functions at the appropriate time. For example, if used in post_actions , they’ll run after the bot has finished talking and after any previous post_actions have finished. function action Calls the developer-defined function at the appropriate time. Copy Ask AI Copy Ask AI "post_actions" : [ { "type" : "function" , "handler" : bot_turn_ended # Required } ] Custom Actions Fully developer-defined actions, providing flexibility at the expense of complexity. Here’s the complexity: because these actions aren’t queued in the Pipecat pipeline, they may execute seemingly early if used in post_actions ; they’ll run immediately after the LLM completion is triggered but won’t wait around for the bot to finish talking. Why would you want this behavior? You might be writing an action that: Itself just queues another Frame into the Pipecat pipeline (meaning there would no benefit to waiting around for sequencing purposes) Does work that can be done a bit sooner, like logging that the LLM was updated Custom actions are composed of at least: type str required String identifier for the action handler Callable required Async or sync function that handles the action Example: Copy Ask AI Copy Ask AI # Define custom action handler async def custom_notification ( action : dict , flow_manager : FlowManager): """Custom action handler.""" message = action.get( "message" , "" ) await notify_user(message) # Use in node configuration "pre_actions" : [ { "type" : "notify" , "handler" : send_notification, "message" : "Attention!" , } ] Exceptions FlowError exception Base exception for all flow-related errors. Copy Ask AI Copy Ask AI from pipecat_flows import FlowError try : await flow_manager.set_node_from_config(config) except FlowError as e: print ( f "Flow error: { e } " ) FlowInitializationError exception Raised when flow initialization fails. 
Copy Ask AI from pipecat_flows import FlowInitializationError try : await flow_manager.initialize() except FlowInitializationError as e: print ( f "Initialization failed: { e } " ) FlowTransitionError exception Raised when a state transition fails. Copy Ask AI from pipecat_flows import FlowTransitionError try : await flow_manager.set_node_from_config(node_config) except FlowTransitionError as e: print ( f "Transition failed: { e } " ) InvalidFunctionError exception Raised when an invalid or unavailable function is specified. Copy Ask AI from pipecat_flows import InvalidFunctionError try : await flow_manager.set_node_from_config({ "functions" : [{ "type" : "function" , "function" : { "name" : "invalid_function" } }] }) except InvalidFunctionError as e: print ( f "Invalid function: { e } " )
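The Usage Examples section above shows a static flow. For a dynamic flow, the same pieces are wired together at runtime: you pass an initial NodeConfig to initialize() and let consolidated handlers return the next node. A minimal sketch, assuming task, llm, and context_aggregator exist as in the static example; collect_name, create_greeting_node, and create_farewell_node are illustrative names, not part of the library:

from pipecat_flows import FlowManager, FlowResult, NodeConfig

async def collect_name(flow_manager: FlowManager, name: str) -> tuple[FlowResult, NodeConfig]:
    """Record the user's name.

    Args:
        name (str): The user's name.
    """
    flow_manager.state["name"] = name
    return {"status": "success"}, create_farewell_node()

def create_greeting_node() -> NodeConfig:
    return NodeConfig(
        name="greeting",
        role_messages=[{"role": "system", "content": "You are a helpful assistant."}],
        task_messages=[{"role": "system", "content": "Ask the user for their name."}],
        functions=[collect_name],  # direct function: schema derived from signature and docstring
    )

def create_farewell_node() -> NodeConfig:
    return NodeConfig(
        name="farewell",
        task_messages=[{"role": "system", "content": "Thank the user by name and say goodbye."}],
        functions=[],
        post_actions=[{"type": "end_conversation"}],
    )

flow_manager = FlowManager(task=task, llm=llm, context_aggregator=context_aggregator)
await flow_manager.initialize(create_greeting_node())

Because the direct function returns a (result, next node) tuple, the next node is set up implicitly and there is no need to call set_node_from_config() explicitly.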
fundamentals_custom-frame-processor_1883c610.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/guides/fundamentals/custom-frame-processor#critical-responsibility%3A-frame-forwarding
Title: Custom FrameProcessor - Pipecat
==================================================
Custom FrameProcessor - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Fundamentals Custom FrameProcessor Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal Pipecat’s architecture is made up of a Pipeline, FrameProcessors, and Frames. See the Core Concepts for a full review. From that architecture, recall that FrameProcessors are the workers in the pipeline that receive frames and complete actions based on the frames received. Pipecat comes with many FrameProcessors built in. These consist of services, like OpenAILLMService or CartesiaTTSService , utilities, like UserIdleProcessor , and other things. Largely, you can build most of your application with these built-in FrameProcessors, but commonly, your application code may require custom frame processing logic. For example, you may want to perform an action as a result of a frame that’s pushed in the pipeline. Example: ImageSyncAggregator Let’s look at an example custom FrameProcessor that synchronizes images with bot speech: Copy Ask AI class ImageSyncAggregator ( FrameProcessor ): def __init__ ( self , speaking_path : str , waiting_path : str ): super (). __init__ () self ._speaking_image = Image.open(speaking_path) self ._speaking_image_format = self ._speaking_image.format self ._speaking_image_bytes = self ._speaking_image.tobytes() self ._waiting_image = Image.open(waiting_path) self ._waiting_image_format = self ._waiting_image.format self ._waiting_image_bytes = self ._waiting_image.tobytes() async def process_frame ( self , frame : Frame, direction : FrameDirection): await super ().process_frame(frame, direction) if isinstance (frame, BotStartedSpeakingFrame): await self .push_frame( OutputImageRawFrame( image = self ._speaking_image_bytes, size = ( 1024 , 1024 ), format = self ._speaking_image_format, ) ) elif isinstance (frame, BotStoppedSpeakingFrame): await self .push_frame( OutputImageRawFrame( image = self ._waiting_image_bytes, size = ( 1024 , 1024 ), format = self ._waiting_image_format, ) ) await self .push_frame(frame) This example custom FrameProcessor looks for BotStartedSpeakingFrame and BotStoppedSpeakingFrame . When it sees a BotStartedSpeakingFrame , it will show an image that says the bot is speaking. When it sees a BotStoppedSpeakingFrame , it will show an image that says the bot is not speaking. 
See this working example using the ImageSyncAggregator FrameProcessor Adding to a Pipeline This custom FrameProcessor can be added to a Pipeline just before the transport output: Copy Ask AI # Create and initialize the custom FrameProcessor image_sync_aggregator = ImageSyncAggregator( os.path.join(os.path.dirname( __file__ ), "assets" , "speaking.png" ), os.path.join(os.path.dirname( __file__ ), "assets" , "waiting.png" ), ) pipeline = Pipeline( [ transport.input(), stt, context_aggregator.user(), llm, tts, image_sync_aggregator, # Our custom FrameProcessor transport.output(), context_aggregator.assistant(), ] ) With this positioning, the ImageSyncAggregator FrameProcessor will receive the BotStartedSpeakingFrame and BotStoppedSpeakingFrame outputted by the TTS processor and then push its own frame— OutputImageRawFrame —to the output transport. Key Requirements FrameProcessors must inherit from the base FrameProcessor class. This ensures that your custom FrameProcessor will correctly handle frames like StartFrame , EndFrame , StartInterruptionFrame without having to write custom logic for those frames. This inheritance also provides it with the ability to process_frame() and push_frame() : process_frame() is what allows the FrameProcessor to receive frames and add custom conditional logic based on the frames that are received. push_frame() allows the FrameProcessor to push frames to the pipeline. Normally, frames are pushed DOWNSTREAM, but based on which processors need the output, you can also push UPSTREAM or in both directions. Essential Implementation Details To ensure proper base class inheritance, it’s critical to include: super().__init__() in your __init__ method await super().process_frame(frame, direction) in your process_frame() method Copy Ask AI class MyCustomProcessor ( FrameProcessor ): def __init__ ( self , ** kwargs ): super (). __init__ ( ** kwargs) # ✅ Required # Your initialization code here async def process_frame ( self , frame : Frame, direction : FrameDirection): await super ().process_frame(frame, direction) # ✅ Required # Your custom frame processing logic here if isinstance (frame, SomeSpecificFrame): # Handle the frame pass await self .push_frame(frame) # ✅ Required - pass frame through Critical Responsibility: Frame Forwarding FrameProcessors receive all frames that are pushed through the pipeline. This gives them a lot of power, but also a great responsibility. Critically, they must push all frames through the pipeline; if they don’t, they block frames from moving through the Pipeline, which will cause issues in how your application functions. You can see this at work in the ImageSyncAggregator ’s process_frame() method. It handles both bot speaking frames and also has an await self.push_frame(frame) which pushes the frame through to the next processor in the pipeline. Frame Direction When pushing frames, you can specify the direction: Copy Ask AI # Push downstream (default) await self .push_frame(frame) await self .push_frame(frame, FrameDirection. DOWNSTREAM ) # Push upstream await self .push_frame(frame, FrameDirection. UPSTREAM ) Most custom FrameProcessors will push frames downstream, but upstream can be useful for sending control frames or error notifications back up the pipeline. 
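For instance, here is a minimal sketch (not taken from the docs above) of a processor that reports a failure upstream while still forwarding the original frame; RiskyOperationProcessor and _do_risky_work are hypothetical names, and ErrorFrame is assumed to be importable from pipecat.frames.frames:

from pipecat.frames.frames import ErrorFrame, Frame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor

class RiskyOperationProcessor(FrameProcessor):
    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)  # required base-class call
        try:
            await self._do_risky_work(frame)  # hypothetical operation that may raise
        except Exception as e:
            # Send an error notification back up the pipeline
            await self.push_frame(ErrorFrame(f"Risky operation failed: {e}"), FrameDirection.UPSTREAM)
        # Always forward the original frame so the pipeline keeps flowing
        await self.push_frame(frame, direction)

    async def _do_risky_work(self, frame: Frame):
        pass  # placeholder for the real operation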
Best Practices Always call the parent methods : Use super().__init__() and await super().process_frame() Forward all frames : Make sure every frame is pushed through with await self.push_frame(frame) Handle frames conditionally : Use isinstance() checks to handle specific frame types Use proper error handling : Wrap risky operations in try/except blocks Position carefully in pipeline : Consider where in the pipeline your processor needs to be to receive the right frames With these patterns, you can create powerful custom FrameProcessors that extend Pipecat's capabilities for your specific use case.
fundamentals_function-calling_ddda5fcd.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/guides/fundamentals/function-calling#next-steps
Title: Function Calling - Pipecat
==================================================
Function Calling - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Fundamentals Function Calling Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal Understanding Function Calling Function calling (also known as tool calling) allows LLMs to request information from external services and APIs. This enables your bot to access real-time data and perform actions that aren’t part of its training data. For example, you could give your bot the ability to: Check current weather conditions Look up stock prices Query a database Control smart home devices Schedule appointments Here’s how it works: You define functions the LLM can use and register them to the LLM service used in your pipeline When needed, the LLM requests a function call Your application executes any corresponding functions The result is sent back to the LLM The LLM uses this information in its response Implementation 1. Define Functions Pipecat provides a standardized FunctionSchema that works across all supported LLM providers. This makes it easy to define functions once and use them with any provider. As a shorthand, you could also bypass specifying a function configuration at all and instead use “direct” functions. Under the hood, these are converted to FunctionSchema s. Using the Standard Schema (Recommended) Copy Ask AI from pipecat.adapters.schemas.function_schema import FunctionSchema from pipecat.adapters.schemas.tools_schema import ToolsSchema # Define a function using the standard schema weather_function = FunctionSchema( name = "get_current_weather" , description = "Get the current weather in a location" , properties = { "location" : { "type" : "string" , "description" : "The city and state, e.g. San Francisco, CA" , }, "format" : { "type" : "string" , "enum" : [ "celsius" , "fahrenheit" ], "description" : "The temperature unit to use." , }, }, required = [ "location" , "format" ] ) # Create a tools schema with your functions tools = ToolsSchema( standard_tools = [weather_function]) # Pass this to your LLM context context = OpenAILLMContext( messages = [{ "role" : "system" , "content" : "You are a helpful assistant." }], tools = tools ) The ToolsSchema will be automatically converted to the correct format for your LLM provider through adapters. Using Direct Functions (Shorthand) You can bypass specifying a function configuration (as a FunctionSchema or in a provider-specific format) and instead pass the function directly to your ToolsSchema . Pipecat will auto-configure the function, gathering relevant metadata from its signature and docstring. Metadata includes: name description properties (including individual property descriptions) list of required properties Note that the function signature is a bit different when using direct functions. The first parameter is FunctionCallParams , followed by any others necessary for the function. 
Copy Ask AI from pipecat.adapters.schemas.tools_schema import ToolsSchema from pipecat.services.llm_service import FunctionCallParams # Define a direct function async def get_current_weather ( params : FunctionCallParams, location : str , format : str ): """Get the current weather. Args: location: The city and state, e.g. "San Francisco, CA". format: The temperature unit to use. Must be either "celsius" or "fahrenheit". """ weather_data = { "conditions" : "sunny" , "temperature" : "75" } await params.result_callback(weather_data) # Create a tools schema, passing your function directly to it tools = ToolsSchema( standard_tools = [get_current_weather]) # Pass this to your LLM context context = OpenAILLMContext( messages = [{ "role" : "system" , "content" : "You are a helpful assistant." }], tools = tools ) Using Provider-Specific Formats (Alternative) You can also define functions in the provider-specific format if needed: OpenAI Anthropic Gemini Copy Ask AI from openai.types.chat import ChatCompletionToolParam # OpenAI native format tools = [ ChatCompletionToolParam( type = "function" , function = { "name" : "get_current_weather" , "description" : "Get the current weather" , "parameters" : { "type" : "object" , "properties" : { "location" : { "type" : "string" , "description" : "The city and state, e.g. San Francisco, CA" , }, "format" : { "type" : "string" , "enum" : [ "celsius" , "fahrenheit" ], "description" : "The temperature unit to use." , }, }, "required" : [ "location" , "format" ], }, }, ) ] Provider-Specific Custom Tools Some providers support unique tools that don’t fit the standard function schema. For these cases, you can add custom tools: Copy Ask AI from pipecat.adapters.schemas.tools_schema import AdapterType, ToolsSchema # Standard functions weather_function = FunctionSchema( name = "get_current_weather" , description = "Get the current weather" , properties = { "location" : { "type" : "string" }}, required = [ "location" ] ) # Custom Gemini search tool gemini_search_tool = { "web_search" : { "description" : "Search the web for information" } } # Create a tools schema with both standard and custom tools tools = ToolsSchema( standard_tools = [weather_function], custom_tools = { AdapterType. GEMINI : [gemini_search_tool] } ) See the provider-specific documentation for details on custom tools and their formats. 2. Register Function Handlers Register handlers for your functions using one of these LLM service methods : register_function register_direct_function Which one you use depends on whether your function is a “direct” function . Non-Direct Function Direct Function Copy Ask AI from pipecat.services.llm_service import FunctionCallParams llm = OpenAILLMService( api_key = "your-api-key" ) # Main function handler - called to execute the function async def fetch_weather_from_api ( params : FunctionCallParams): # Fetch weather data from your API weather_data = { "conditions" : "sunny" , "temperature" : "75" } await params.result_callback(weather_data) # Register the function llm.register_function( "get_current_weather" , fetch_weather_from_api, ) 3. Create the Pipeline Include your LLM service in your pipeline with the registered functions: Copy Ask AI # Initialize the LLM context with your function schemas context = OpenAILLMContext( messages = [{ "role" : "system" , "content" : "You are a helpful assistant." 
}], tools = tools ) # Create the context aggregator to collect the user and assistant context context_aggregator = llm.create_context_aggregator(context) # Create the pipeline pipeline = Pipeline([ transport.input(), # Input from the transport stt, # STT processing context_aggregator.user(), # User context aggregation llm, # LLM processing tts, # TTS processing transport.output(), # Output to the transport context_aggregator.assistant(), # Assistant context aggregation ]) Function Handler Details FunctionCallParams The FunctionCallParams object contains all the information needed for handling function calls: params : FunctionCallParams function_name : Name of the called function arguments : Arguments passed by the LLM tool_call_id : Unique identifier for the function call llm : Reference to the LLM service context : Current conversation context result_callback : Async function to return results function_name str Name of the function being called tool_call_id str Unique identifier for the function call arguments Mapping[str, Any] Arguments passed by the LLM to the function llm LLMService Reference to the LLM service that initiated the call context OpenAILLMContext Current conversation context result_callback FunctionCallResultCallback Async callback function to return results Handler Structure Your function handler should: Receive necessary arguments, either: From params.arguments Directly From function arguments, if using direct functions Process data or call external services Return results via params.result_callback(result) Non-Direct Function Direct Function Copy Ask AI async def fetch_weather_from_api ( params : FunctionCallParams): try : # Extract arguments location = params.arguments.get( "location" ) format_type = params.arguments.get( "format" , "celsius" ) # Call external API api_result = await weather_api.get_weather(location, format_type) # Return formatted result await params.result_callback({ "location" : location, "temperature" : api_result[ "temp" ], "conditions" : api_result[ "conditions" ], "unit" : format_type }) except Exception as e: # Handle errors await params.result_callback({ "error" : f "Failed to get weather: { str (e) } " }) Controlling Function Call Behavior (Advanced) When returning results from a function handler, you can control how the LLM processes those results using a FunctionCallResultProperties object passed to the result callback. It can be handy to skip a completion when you have back-to-back function calls. Note, if you skip a completion, you must manually trigger one from the context. 
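A rough sketch of that pattern, assuming the task and context_aggregator created in the pipeline example above are in scope; queueing the user aggregator's context frame is a common way to kick off a completion in Pipecat examples, but treat the exact trigger as an assumption for your setup:

from pipecat.frames.frames import FunctionCallResultProperties
from pipecat.services.llm_service import FunctionCallParams

async def fetch_weather_no_completion(params: FunctionCallParams):
    weather_data = {"conditions": "sunny", "temperature": "75"}

    # Skip the automatic completion for this result...
    properties = FunctionCallResultProperties(run_llm=False)
    await params.result_callback(weather_data, properties=properties)

    # ...then trigger one manually from the current context when ready.
    await task.queue_frames([context_aggregator.user().get_context_frame()])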
Properties run_llm Optional[bool] Controls whether the LLM should generate a response after the function call: True : Run LLM after function call (default if no other function calls in progress) False : Don’t run LLM after function call None : Use default behavior on_context_updated Optional[Callable[[], Awaitable[None]]] Optional callback that runs after the function result is added to the context Example Usage Copy Ask AI from pipecat.frames.frames import FunctionCallResultProperties from pipecat.services.llm_service import FunctionCallParams async def fetch_weather_from_api ( params : FunctionCallParams): # Fetch weather data weather_data = { "conditions" : "sunny" , "temperature" : "75" } # Don't run LLM after this function call properties = FunctionCallResultProperties( run_llm = False ) await params.result_callback(weather_data, properties = properties) async def query_database ( params : FunctionCallParams): # Query database results = await db.query(params.arguments[ "query" ]) async def on_update (): await notify_system( "Database query complete" ) # Run LLM after function call and notify when context is updated properties = FunctionCallResultProperties( run_llm = True , on_context_updated = on_update ) await params.result_callback(results, properties = properties) Next steps Check out the function calling examples to see a complete example for specific LLM providers. Refer to your LLM provider’s documentation to learn more about their function calling capabilities. Ending a Pipeline Muting User Input On this page Understanding Function Calling Implementation 1. Define Functions Using the Standard Schema (Recommended) Using Direct Functions (Shorthand) Using Provider-Specific Formats (Alternative) Provider-Specific Custom Tools 2. Register Function Handlers 3. Create the Pipeline Function Handler Details FunctionCallParams Handler Structure Controlling Function Call Behavior (Advanced) Properties Example Usage Next steps Assistant Responses are generated using AI and may contain mistakes.
fundamentals_recording-audio_b24720b6.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/guides/fundamentals/recording-audio#option-1%3A-record-using-your-transport-service-provider
Title: Recording Conversation Audio - Pipecat
==================================================
Recording Conversation Audio Overview Recording audio from conversations provides valuable data for analysis, debugging, and quality control. You have two options for how to record with Pipecat: Option 1: Record using your transport service provider Record without writing custom code by using your transport provider's recording capabilities. In addition to saving you development time, some providers offer unique recording features. Refer to your service provider's documentation to learn more. Option 2: Create your own recording pipeline Pipecat's AudioBufferProcessor makes it easy to capture high-quality audio recordings of both the user and bot during interactions. Opt for this approach if you want more control over your recording. This guide focuses on how to record using the AudioBufferProcessor , including high-level guidance for how to set up post-processing jobs for longer recordings. How the AudioBufferProcessor Works The AudioBufferProcessor captures audio by: Collecting audio frames from both the user (input) and bot (output) Emitting events with recorded audio data Providing options for composite or separate track recordings Add the processor to your pipeline after the transport.output() to capture both the user audio and the bot audio as it's spoken.
Audio Recording Options The AudioBufferProcessor offers several configuration options: Composite recording : Combined audio from both user and bot Track-level recording : Separate audio files for user and bot Turn-based recording : Individual audio clips for each speaking turn Mono or stereo output : Single channel mixing or two-channel separation Basic Implementation Step 1: Create an Audio Buffer Processor Initialize the audio buffer processor with your desired configuration: Copy Ask AI from pipecat.processors.audio.audio_buffer_processor import AudioBufferProcessor # Create audio buffer processor with default settings audiobuffer = AudioBufferProcessor( num_channels = 1 , # 1 for mono, 2 for stereo (user left, bot right) enable_turn_audio = False , # Enable per-turn audio recording user_continuous_stream = True , # User has continuous audio stream ) Step 2: Add to Your Pipeline Place the processor in your pipeline after all audio-producing components: Copy Ask AI pipeline = Pipeline( [ transport.input(), stt, context_aggregator.user(), llm, tts, transport.output(), audiobuffer, # Add after all audio components context_aggregator.assistant(), ] ) Step 3: Start Recording Explicitly start recording when needed, typically when a session begins: Copy Ask AI @transport.event_handler ( "on_client_connected" ) async def on_client_connected ( transport , client ): logger.info( f "Client connected" ) # Important: Start recording explicitly await audiobuffer.start_recording() # Continue with session initialization... You must call start_recording() explicitly to begin capturing audio. The processor won’t record automatically when initialized. Step 4: Handle Audio Data Register an event handler to process audio data: Copy Ask AI @audiobuffer.event_handler ( "on_audio_data" ) async def on_audio_data ( buffer , audio , sample_rate , num_channels ): # Save or process the composite audio timestamp = datetime.datetime.now().strftime( "%Y%m %d _%H%M%S" ) filename = f "recordings/conversation_ { timestamp } .wav" # Create the WAV file with wave.open(filename, "wb" ) as wf: wf.setnchannels(num_channels) wf.setsampwidth( 2 ) # 16-bit audio wf.setframerate(sample_rate) wf.writeframes(audio) logger.info( f "Saved recording to { filename } " ) If recording separate tracks, you can use the on_track_audio_data event handler to save user and bot audio separately. Recording Longer Conversations For conversations that last a few minutes, it may be sufficient to just buffer the audio in memory. However, for longer sessions, storing audio in memory poses two challenges: Memory Usage : Long recordings can consume significant memory, leading to potential crashes or performance issues. Conversation Loss : If the application crashes or the connection drops, you may lose all recorded audio. Instead, consider using a chunked approach to record audio in manageable segments. This allows you to periodically save audio data to disk or upload it to cloud storage, reducing memory usage and ensuring data persistence. 
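If you record separate tracks, a handler along these lines saves each track as its own mono WAV file. This is a sketch that assumes the audiobuffer processor from Step 1; the handler signature matches the on_track_audio_data event used in the chunked example below, and the file naming is illustrative:

import datetime
import wave

@audiobuffer.event_handler("on_track_audio_data")
async def on_track_audio_data(buffer, user_audio, bot_audio, sample_rate, num_channels):
    timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")

    def save_track(filename, audio):
        with wave.open(filename, "wb") as wf:
            wf.setnchannels(1)   # each track is mono
            wf.setsampwidth(2)   # 16-bit samples
            wf.setframerate(sample_rate)
            wf.writeframes(audio)

    save_track(f"recordings/user_{timestamp}.wav", user_audio)
    save_track(f"recordings/bot_{timestamp}.wav", bot_audio)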
Chunked Recording Set a reasonable buffer_size to trigger periodic uploads: Copy Ask AI # 30-second chunks (recommended for most use cases) SAMPLE_RATE = 24000 CHUNK_DURATION = 30 # seconds audiobuffer = AudioBufferProcessor( sample_rate = SAMPLE_RATE , buffer_size = SAMPLE_RATE * 2 * CHUNK_DURATION # 2 bytes per sample (16-bit) ) chunk_counter = 0 @audiobuffer.event_handler ( "on_track_audio_data" ) async def on_chunk_ready ( buffer , user_audio , bot_audio , sample_rate , num_channels ): global chunk_counter # Upload or save individual chunks await upload_audio_chunk( f "user_chunk_ { chunk_counter :03d} .wav" , user_audio, sample_rate, 1 ) await upload_audio_chunk( f "bot_chunk_ { chunk_counter :03d} .wav" , bot_audio, sample_rate, 1 ) chunk_counter += 1 Multipart Upload Strategy For cloud storage, consider using multipart uploads to stream audio chunks: Conceptual Approach: Initialize multipart upload when recording starts Upload chunks as parts when buffers fill (every ~30 seconds) Complete multipart upload when recording ends Post-process to create final WAV file(s) Benefits: Memory efficient for long sessions Fault tolerant (no data loss if connection drops) Enables real-time processing and analysis Parallel upload of multiple tracks Post-Processing Pipeline After uploading chunks, create final audio files using tools like FFmpeg: Concatenating Audio Files: Copy Ask AI # Method 1: Simple concatenation (same format) ffmpeg -i "concat:chunk_001.wav|chunk_002.wav|chunk_003.wav" -acodec copy final.wav # Method 2: Using file list (recommended for many chunks) # Create filelist.txt with format: # file 'chunk_001.wav' # file 'chunk_002.wav' # ... ffmpeg -f concat -safe 0 -i filelist.txt -c copy final_recording.wav Automation Considerations: Use sequence numbers in chunk filenames for proper ordering Include metadata (sample rate, channels, duration) with each chunk Implement retry logic for failed uploads Consider using cloud functions/lambdas for automatic post-processing Next Steps Try the Audio Recording Example Explore a complete working example that demonstrates how to record and save both composite and track-level audio with Pipecat. AudioBufferProcessor Reference Read the complete API reference documentation for advanced configuration options and event handlers. Consider implementing audio recording in your application for quality assurance, training data collection, or creating conversation archives. The recorded audio can be stored locally, uploaded to cloud storage, or processed in real-time for further analysis. Muting User Input Recording Transcripts On this page Overview Option 1: Record using your transport service provider Option 2: Create your own recording pipeline How the AudioBufferProcessor Works Audio Recording Options Basic Implementation Step 1: Create an Audio Buffer Processor Step 2: Add to Your Pipeline Step 3: Start Recording Step 4: Handle Audio Data Recording Longer Conversations Chunked Recording Multipart Upload Strategy Post-Processing Pipeline Next Steps Assistant Responses are generated using AI and may contain mistakes.
fundamentals_user-input-muting_ce884159.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/guides/fundamentals/user-input-muting#step-2%3A-add-to-your-pipeline
Title: User Input Muting with STTMuteFilter - Pipecat
==================================================
User Input Muting with STTMuteFilter - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Fundamentals User Input Muting with STTMuteFilter Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal Overview In conversational applications, there are moments when you don’t want to process user speech, such as during bot introductions or while executing function calls. Pipecat’s STTMuteFilter lets you selectively “mute” user input based on different conversation states. When to Use STTMuteFilter Common scenarios for muting user input include: During introductions : Prevent the bot from being interrupted during its initial greeting While processing functions : Block input while the bot is retrieving external data During bot speech : Reduce false transcriptions while the bot is speaking For guided conversations : Create more structured interactions with clear turn-taking How It Works The STTMuteFilter works by blocking specific user-related frames from flowing through your pipeline. When muted, it filters: Voice activity detection (VAD) events Interruption signals Raw audio input frames This prevents the Speech-to-Text service from receiving and processing the user’s speech during muted periods. The filter must be placed between your Transport and STT service in the pipeline to work correctly. Mute Strategies The STTMuteFilter supports several strategies for determining when to mute user input: FIRST_SPEECH Mute only during the bot’s first speech utterance. Useful for introductions when you want the bot to complete its greeting before the user can speak. MUTE_UNTIL_FIRST_BOT_COMPLETE Start muted and remain muted until the first bot utterance completes. Ensures the bot’s initial instructions are fully delivered. FUNCTION_CALL Mute during function calls. Prevents users from speaking while the bot is processing external data requests. ALWAYS Mute whenever the bot is speaking. Creates a strict turn-taking conversation pattern. CUSTOM Use custom logic via callback to determine when to mute. Provides maximum flexibility for complex muting rules. The FIRST_SPEECH and MUTE_UNTIL_FIRST_BOT_COMPLETE strategies should not be used together as they handle the first bot speech differently. Basic Implementation Step 1: Configure the Filter First, create a configuration for the STTMuteFilter : Copy Ask AI from pipecat.processors.filters.stt_mute_filter import STTMuteConfig, STTMuteFilter, STTMuteStrategy # Configure with one or more strategies stt_mute_processor = STTMuteFilter( config = STTMuteConfig( strategies = { STTMuteStrategy. MUTE_UNTIL_FIRST_BOT_COMPLETE , STTMuteStrategy. 
FUNCTION_CALL , } ), ) Step 2: Add to Your Pipeline Place the filter between your transport input and STT service: Copy Ask AI pipeline = Pipeline( [ transport.input(), # Transport user input stt_mute_processor, # Add the mute processor before STT stt, # Speech-to-text service context_aggregator.user(), # User responses llm, # LLM tts, # Text-to-speech transport.output(), # Transport bot output context_aggregator.assistant(), # Assistant spoken responses ] ) Best Practices Place the filter correctly : Always position STTMuteFilter between transport input and STT Choose strategies wisely : Select the minimal set of strategies needed for your use case Test user experience : Excessive muting can frustrate users; balance control with usability Consider feedback : Provide visual cues when the user is muted to improve the experience Next Steps Try the STTMuteFilter Example Explore a complete working example that demonstrates how to use STTMuteFilter to control user input during bot speech and function calls. STTMuteFilter Reference Read the complete API reference documentation for advanced configuration options and muting strategies. Experiment with different muting strategies to find the right balance for your application. For advanced scenarios, try implementing custom muting logic based on specific conversation states or content.
getting-started_next-steps_8fb394a8.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/getting-started/next-steps#foundational-examples
Title: Next Steps & Examples - Pipecat
==================================================
Next Steps & Examples - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Get Started Next Steps & Examples Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Get Started Overview Installation & Setup Quickstart Core Concepts Next Steps & Examples After completing the quickstart, you’re ready to explore more advanced Pipecat features. This guide will help you choose the right pathway based on your project goals. Choose Your Development Path Learn the Fundamentals Master Pipecat’s core concepts through progressive examples Build Web Applications Create client/server voice applications with web interfaces Develop Phone Bots Build voice bots accessible through regular phone calls Structure Conversations Design complex conversational flows with Pipecat Flows Explore Use Cases See how others are using Pipecat in production Join the Community Connect with other developers building with Pipecat Foundational Examples If you’re new to Pipecat, we recommend exploring our foundational examples that demonstrate core concepts in a progressive manner: Foundational Examples Repository A series of step-by-step examples that build upon each other to teach Pipecat fundamentals. Web & Client/Server Applications For building web-based conversational applications with separate client and server components: Simple Chatbot Example A complete client/server example demonstrating how to connect your Pipecat bot to JS, React, iOS, and Android clients. Telephony Integrations To make your bots accessible via regular phone calls: Twilio Chatbot Build phone-accessible bots using Twilio’s telephony platform Telnyx Chatbot Create voice bots using Telnyx’s communication API Structured Conversations with Pipecat Flows For complex applications requiring structured dialogs: Pipecat Flows A powerful extension to Pipecat that allows you to build structured conversations and manage the conversation context. Try the Flow Editor to visually design your conversation flows. Real-world Examples Get inspired by applications others have built with Pipecat: Storytelling Bot AI-powered storyteller that creates and narrates interactive stories Patient Intake System Automated form-filling system for healthcare applications StudyPal Interactive study assistant that helps with learning Vision-enabled Bot Multimodal assistant that can see and discuss images Community Resources Join the Pipecat Discord Connect with other Pipecat developers to get help, share your projects, and stay updated on new features and releases. Core Concepts On this page Choose Your Development Path Foundational Examples Web & Client/Server Applications Telephony Integrations Structured Conversations with Pipecat Flows Real-world Examples Community Resources Assistant Responses are generated using AI and may contain mistakes.
|
getting-started_overview_eeef50f1.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/getting-started/overview#what-you-can-build
|
2 |
+
Title: Overview - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Overview - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Get Started Overview Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Get Started Overview Installation & Setup Quickstart Core Concepts Next Steps & Examples Pipecat is an open source Python framework that handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions. “Multimodal” means you can use any combination of audio, video, images, and/or text in your interactions. And “real-time” means that things are happening quickly enough that it feels conversational—a “back-and-forth” with a bot, not submitting a query and waiting for results. What You Can Build Voice Assistants Natural, real-time conversations with AI using speech recognition and synthesis Interactive Agents Personal coaches and meeting assistants that can understand context and provide guidance Multimodal Apps Applications that combine voice, video, images, and text for rich interactions Creative Tools Storytelling experiences and social companions that engage users Business Solutions Customer intake flows and support bots for automated business processes Complex Flows Structured conversations using Pipecat Flows for managing complex interactions How It Works The flow of interactions in a Pipecat application is typically straightforward: The bot says something The user says something The bot says something The user says something This continues until the conversation naturally ends. While this flow seems simple, making it feel natural requires sophisticated real-time processing. Real-time Processing Pipecat’s pipeline architecture handles both simple voice interactions and complex multimodal processing. Let’s look at how data flows through the system: Voice app Multimodal app 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio and Video Transmit and capture audio, video, and image inputs simultaneously 2 Process Streams Handle multiple input streams in parallel 3 Model Processing Send combined inputs to multimodal models (like GPT-4V) 4 Generate Outputs Create various outputs (text, images, audio, etc.) 5 Coordinate Presentation Synchronize and present multiple output types In both cases, Pipecat: Processes responses as they stream in Handles multiple input/output modalities concurrently Manages resource allocation and synchronization Coordinates parallel processing tasks This architecture creates fluid, natural interactions without noticeable delays, whether you’re building a simple voice assistant or a complex multimodal application. Pipecat’s pipeline architecture is particularly valuable for managing the complexity of real-time, multimodal interactions, ensuring smooth data flow and proper synchronization regardless of the input/output types involved. 
Pipecat handles all this complexity for you, letting you focus on building your application rather than managing the underlying infrastructure. Next Steps Ready to build your first Pipecat application? Installation & Setup Prepare your environment and install required dependencies Quickstart Build and run your first Pipecat application Core Concepts Learn about pipelines, frames, and real-time processing Use Cases Explore example implementations and patterns Join Our Community Discord Community Connect with other developers, share your projects, and get support from the Pipecat team. Installation & Setup On this page What You Can Build How It Works Real-time Processing Next Steps Join Our Community Assistant Responses are generated using AI and may contain mistakes.
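To make the five processing steps above concrete, here is a minimal voice-pipeline sketch in the same style as the examples elsewhere in these docs; the transport, stt, llm, tts, and context_aggregator objects are placeholders for whichever services you configure.

# Minimal voice pipeline sketch: audio in -> STT -> LLM -> TTS -> audio out.
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineParams, PipelineTask

pipeline = Pipeline(
    [
        transport.input(),               # 1. receive streamed user audio
        stt,                             # 2. transcribe speech as the user talks
        context_aggregator.user(),       #    add user messages to the context
        llm,                             # 3. generate a response with an LLM
        tts,                             # 4. convert the response text to speech
        transport.output(),              # 5. stream the audio back to the user
        context_aggregator.assistant(),  #    capture the bot's responses
    ]
)

task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))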
|
image-generation_openai_6fec2f22.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/services/image-generation/openai#constructor-parameters
|
2 |
+
Title: OpenAI Image Generation - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
OpenAI Image Generation - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Image Generation OpenAI Image Generation Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation fal Google Imagen OpenAI Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview OpenAIImageGenService provides high-quality image generation capabilities using OpenAI’s DALL-E models. It transforms text prompts into images with various size options and model configurations. Installation No additional installation is required for the OpenAIImageGenService as it is part of the Pipecat AI package. You’ll also need an OpenAI API key for authentication. Configuration Constructor Parameters api_key str required OpenAI API key for authentication base_url str default: "None" Optional base URL for OpenAI API requests aiohttp_session aiohttp.ClientSession required HTTP session for making requests image_size str required Image dimensions - one of “256x256”, “512x512”, “1024x1024”, “1792x1024”, “1024x1792” model str default: "dall-e-3" OpenAI model identifier for image generation Input The service accepts text prompts through its image generation pipeline. Output Frames URLImageRawFrame url string Generated image URL from OpenAI image bytes Raw image data size tuple Image dimensions (width, height) format string Image format (e.g., ‘JPEG’) ErrorFrame error string Error information if generation fails Usage Example Copy Ask AI import aiohttp from pipecat.pipeline.pipeline import Pipeline from pipecat.services.openai.image import OpenAIImageGenService # Create an aiohttp session aiohttp_session = aiohttp.ClientSession() # Configure service image_gen = OpenAIImageGenService( api_key = "your-openai-api-key" , aiohttp_session = aiohttp_session, image_size = "1024x1024" , model = "dall-e-3" ) # Use in pipeline main_pipeline = Pipeline( [ transport.input(), context_aggregator.user(), llm_service, image_gen, tts_service, transport.output(), context_aggregator.assistant(), ] ) Frame Flow Metrics Support The service supports metrics collection: Time to First Byte (TTFB) Processing duration API response metrics Model Support OpenAI’s image generation service offers different model variants: Model ID Description dall-e-3 Latest DALL-E model with higher quality and better prompt following dall-e-2 Previous generation model with good quality and lower cost Image Size Options Size Option Aspect Ratio Description 256x256 1:1 Small square image 512x512 1:1 Medium square image 1024x1024 1:1 Large square image 1792x1024 16:9 Horizontal/landscape orientation 1024x1792 9:16 Vertical/portrait orientation Error Handling Copy Ask AI try : async for frame in image_gen.run_image_gen(prompt): if isinstance (frame, ErrorFrame): logger.error( f "Image generation error: { frame.error } " ) else : # Process successful image generation pass except Exception as e: logger.error( f "Unexpected error during image generation: { e } " ) Google Imagen Simli On this page Overview 
Installation Configuration Constructor Parameters Input Output Frames URLImageRawFrame ErrorFrame Usage Example Frame Flow Metrics Support Model Support Image Size Options Error Handling Assistant Responses are generated using AI and may contain mistakes.
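As a hedged illustration of consuming the URLImageRawFrame output described above, a small downstream frame processor might look like the sketch below. ImageLogger is a hypothetical name, and the imports follow standard Pipecat conventions rather than anything shown on this page.

# Hypothetical downstream processor that logs generated images (sketch only).
from pipecat.frames.frames import Frame, URLImageRawFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor

class ImageLogger(FrameProcessor):
    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, URLImageRawFrame):
            print(f"Generated {frame.format} image {frame.size} at {frame.url}")
        # Always pass frames along so the rest of the pipeline keeps flowing.
        await self.push_frame(frame, direction)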
|
links_server-reference_d423e4b9.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/links/server-reference
|
2 |
+
Title: Pipecat API Reference — pipecat-ai documentation
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Pipecat API Reference — pipecat-ai documentation Pipecat API Reference View page source Pipecat API Reference Welcome to the Pipecat API reference. Use the navigation on the left to browse modules, or search using the search box. New to Pipecat? Check out the main documentation for tutorials, guides, and client SDK information. Quick Links GitHub Repository Join our Community
|
llm_aws_6a0dbbf2.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/services/llm/aws#overview
|
2 |
+
Title: AWS Bedrock - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
AWS Bedrock - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation LLM AWS Bedrock Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Anthropic AWS Bedrock Azure Cerebras DeepSeek Fireworks AI Google Gemini Google Vertex AI Grok Groq NVIDIA NIM Ollama OpenAI OpenPipe OpenRouter Perplexity Qwen SambaNova Together AI Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview AWS Bedrock LLM service provides access to Amazon’s foundation models including Anthropic Claude and Amazon Nova, with streaming responses, function calling, and multimodal capabilities through Amazon’s managed AI service. API Reference Complete API documentation and method details AWS Bedrock Docs Official AWS Bedrock documentation and features Example Code Working example with function calling Installation To use AWS Bedrock services, install the required dependencies: Copy Ask AI pip install "pipecat-ai[aws]" You’ll also need to set up your AWS credentials as environment variables: AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN (if using temporary credentials) AWS_REGION (defaults to “us-east-1”) Set up an IAM user with Amazon Bedrock access in your AWS account to obtain credentials. Frames Input OpenAILLMContextFrame - Conversation context and history LLMMessagesFrame - Direct message list VisionImageRawFrame - Images for vision processing LLMUpdateSettingsFrame - Runtime parameter updates Output LLMFullResponseStartFrame / LLMFullResponseEndFrame - Response boundaries LLMTextFrame - Streamed completion chunks FunctionCallInProgressFrame / FunctionCallResultFrame - Function call lifecycle ErrorFrame - API or processing errors Function Calling Function Calling Guide Learn how to implement function calling with standardized schemas, register handlers, manage context properly, and control execution flow in your conversational AI applications. Context Management Context Management Guide Learn how to manage conversation context, handle message history, and integrate context aggregators for consistent conversational experiences. Usage Example Copy Ask AI import os from pipecat.services.aws.llm import AWSBedrockLLMService from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.adapters.schemas.function_schema import FunctionSchema from pipecat.adapters.schemas.tools_schema import ToolsSchema # Configure the service llm = AWSBedrockLLMService( aws_region = "us-west-2" , model = "us.anthropic.claude-3-5-haiku-20241022-v1:0" , params = AWSBedrockLLMService.InputParams( temperature = 0.7 , ) ) # Define function for tool calling weather_function = FunctionSchema( name = "get_current_weather" , description = "Get current weather information" , properties = { "location" : { "type" : "string" , "description" : "City and state, e.g. 
San Francisco, CA" }, "format" : { "type" : "string" , "enum" : [ "celsius" , "fahrenheit" ], "description" : "Temperature unit to use" } }, required = [ "location" , "format" ] ) tools = ToolsSchema( standard_tools = [weather_function]) # Register function handler async def get_current_weather ( params ): location = params.arguments[ "location" ] format_type = params.arguments[ "format" ] result = { "conditions" : "sunny" , "temperature" : "75" , "unit" : format_type} await params.result_callback(result) llm.register_function( "get_current_weather" , get_current_weather) # Create context with system message messages = [ { "role" : "system" , "content" : "You are a helpful assistant with access to weather information." } ] context = OpenAILLMContext(messages, tools) context_aggregator = llm.create_context_aggregator(context) # Use in pipeline pipeline = Pipeline([ transport.input(), stt, context_aggregator.user(), # Handles user messages llm, # Processes with AWS Bedrock tts, transport.output(), context_aggregator.assistant() # Captures responses ]) Metrics The service provides comprehensive AWS Bedrock metrics: Time to First Byte (TTFB) - Latency from request to first response token Processing Duration - Total request processing time Token Usage - Input tokens, output tokens, and total usage Enable with: Copy Ask AI task = PipelineTask( pipeline, params = PipelineParams( enable_metrics = True , enable_usage_metrics = True ) ) Additional Notes Streaming Responses : All responses are streamed for low latency Context Persistence : Use context aggregators to maintain conversation history Error Handling : Automatic retry logic for rate limits and transient errors Message Format : Automatically converts between OpenAI and AWS Bedrock message formats Performance Modes : Choose “standard” or “optimized” latency based on your needs Regional Availability : Different models available in different AWS regions Vision Support : Image processing available with compatible models like Claude 3 Anthropic Azure On this page Overview Installation Frames Input Output Function Calling Context Management Usage Example Metrics Additional Notes Assistant Responses are generated using AI and may contain mistakes.
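As an example of the LLMUpdateSettingsFrame input listed above, runtime parameter updates can be queued onto the running pipeline task. The sketch below assumes the frame takes a settings mapping; the specific keys and values are illustrative, not a definitive list of supported parameters.

# Sketch: adjust LLM sampling parameters at runtime via the pipeline task.
from pipecat.frames.frames import LLMUpdateSettingsFrame

await task.queue_frame(LLMUpdateSettingsFrame(settings={"temperature": 0.3}))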
|
llm_fireworks_d900a6ea.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/services/llm/fireworks#input
|
2 |
+
Title: Fireworks AI - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Fireworks AI - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation LLM Fireworks AI Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Anthropic AWS Bedrock Azure Cerebras DeepSeek Fireworks AI Google Gemini Google Vertex AI Grok Groq NVIDIA NIM Ollama OpenAI OpenPipe OpenRouter Perplexity Qwen SambaNova Together AI Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview FireworksLLMService provides access to Fireworks AI’s language models through an OpenAI-compatible interface. It inherits from OpenAILLMService and supports streaming responses, function calling, and context management. API Reference Complete API documentation and method details Fireworks Docs Official Fireworks AI API documentation and features Example Code Working example with function calling Installation To use Fireworks AI services, install the required dependency: Copy Ask AI pip install "pipecat-ai[fireworks]" You’ll also need to set up your Fireworks API key as an environment variable: FIREWORKS_API_KEY . Get your API key from Fireworks AI Console . Frames Input OpenAILLMContextFrame - Conversation context and history LLMMessagesFrame - Direct message list VisionImageRawFrame - Images for vision processing LLMUpdateSettingsFrame - Runtime parameter updates Output LLMFullResponseStartFrame / LLMFullResponseEndFrame - Response boundaries LLMTextFrame - Streamed completion chunks FunctionCallInProgressFrame / FunctionCallResultFrame - Function call lifecycle ErrorFrame - API or processing errors Function Calling Function Calling Guide Learn how to implement function calling with standardized schemas, register handlers, manage context properly, and control execution flow in your conversational AI applications. Context Management Context Management Guide Learn how to manage conversation context, handle message history, and integrate context aggregators for consistent conversational experiences. Usage Example Copy Ask AI import os from pipecat.services.fireworks.llm import FireworksLLMService from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.adapters.schemas.function_schema import FunctionSchema from pipecat.adapters.schemas.tools_schema import ToolsSchema # Configure Fireworks service llm = FireworksLLMService( api_key = os.getenv( "FIREWORKS_API_KEY" ), model = "accounts/fireworks/models/firefunction-v2" , # Optimized for function calling params = FireworksLLMService.InputParams( temperature = 0.7 , max_tokens = 1000 ) ) # Define function for tool calling weather_function = FunctionSchema( name = "get_current_weather" , description = "Get current weather information" , properties = { "location" : { "type" : "string" , "description" : "City and state, e.g. 
San Francisco, CA" }, "format" : { "type" : "string" , "enum" : [ "celsius" , "fahrenheit" ], "description" : "Temperature unit to use" } }, required = [ "location" , "format" ] ) tools = ToolsSchema( standard_tools = [weather_function]) # Create context context = OpenAILLMContext( messages = [ { "role" : "system" , "content" : """You are a helpful assistant optimized for voice interactions. Keep responses concise and avoid special characters for audio output.""" } ], tools = tools ) # Create context aggregators context_aggregator = llm.create_context_aggregator(context) # Register function handler with feedback async def fetch_weather ( params ): location = params.arguments[ "location" ] await params.result_callback({ "conditions" : "sunny" , "temperature" : "75°F" }) llm.register_function( "get_current_weather" , fetch_weather) # Optional: Add function call feedback @llm.event_handler ( "on_function_calls_started" ) async def on_function_calls_started ( service , function_calls ): await tts.queue_frame(TTSSpeakFrame( "Let me check on that." )) # Use in pipeline pipeline = Pipeline([ transport.input(), stt, context_aggregator.user(), llm, tts, transport.output(), context_aggregator.assistant() ]) Metrics Inherits all OpenAI metrics capabilities: Time to First Byte (TTFB) - Response latency measurement Processing Duration - Total request processing time Token Usage - Prompt tokens, completion tokens, and totals Enable with: Copy Ask AI task = PipelineTask( pipeline, params = PipelineParams( enable_metrics = True , enable_usage_metrics = True ) ) Additional Notes OpenAI Compatibility : Full compatibility with OpenAI API features and parameters Function Calling : Specialized firefunction models optimized for tool use Cost Effective : Competitive pricing for open-source model inference DeepSeek Google Gemini On this page Overview Installation Frames Input Output Function Calling Context Management Usage Example Metrics Additional Notes Assistant Responses are generated using AI and may contain mistakes.
|
llm_google-vertex_251a3b87.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/services/llm/google-vertex#param-credentials-path
|
2 |
+
Title: Google Vertex AI - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Google Vertex AI - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation LLM Google Vertex AI Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Anthropic AWS Bedrock Azure Cerebras DeepSeek Fireworks AI Google Gemini Google Vertex AI Grok Groq NVIDIA NIM Ollama OpenAI OpenPipe OpenRouter Perplexity Qwen SambaNova Together AI Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview GoogleVertexLLMService provides access to Google’s language models through Vertex AI while maintaining an OpenAI-compatible interface. It inherits from OpenAILLMService and supports all the features of the OpenAI interface while connecting to Google’s AI services. Installation To use GoogleVertexLLMService , install the required dependencies: Copy Ask AI pip install "pipecat-ai[google]" You’ll also need to set up Google Cloud credentials. You can either: Set the GOOGLE_APPLICATION_CREDENTIALS environment variable pointing to your service account JSON file Provide credentials directly to the service constructor Configuration Constructor Parameters credentials Optional[str] JSON string of Google service account credentials credentials_path Optional[str] Path to the Google service account JSON file model str default: "google/gemini-2.0-flash-001" Model identifier params InputParams Vertex AI specific parameters Input Parameters Extends the OpenAI input parameters with Vertex AI specific options: location str default: "us-east4" Google Cloud region where the model is deployed project_id str required Google Cloud project ID Also inherits all OpenAI-compatible parameters: frequency_penalty Optional[float] Reduces likelihood of repeating tokens based on their frequency. Range: [-2.0, 2.0] max_tokens Optional[int] Maximum number of tokens to generate. Must be greater than or equal to 1 presence_penalty Optional[float] Reduces likelihood of repeating any tokens that have appeared. Range: [-2.0, 2.0] temperature Optional[float] Controls randomness in the output. Range: [0.0, 2.0] top_p Optional[float] Controls diversity via nucleus sampling. Range: [0.0, 1.0] Usage Example Copy Ask AI from pipecat.services.google.llm_vertex import GoogleVertexLLMService from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.task import PipelineParams, PipelineTask # Configure service llm = GoogleVertexLLMService( credentials_path = "/path/to/service-account.json" , model = "google/gemini-2.0-flash-001" , params = GoogleVertexLLMService.InputParams( project_id = "your-google-cloud-project-id" , location = "us-east4" ) ) # Create context with system message context = OpenAILLMContext( messages = [ { "role" : "system" , "content" : "You are a helpful assistant in a voice conversation. Keep responses concise." 
} ] ) # Create context aggregator for message handling context_aggregator = llm.create_context_aggregator(context) # Set up pipeline pipeline = Pipeline([ transport.input(), context_aggregator.user(), llm, tts, transport.output(), context_aggregator.assistant() ]) # Create and configure task task = PipelineTask( pipeline, params = PipelineParams( allow_interruptions = True , enable_metrics = True , enable_usage_metrics = True , ), ) Authentication The service supports multiple authentication methods: Direct credentials string - Pass the JSON credentials as a string to the constructor Credentials file path - Provide a path to the service account JSON file Environment variable - Set GOOGLE_APPLICATION_CREDENTIALS to the path of your service account file The service automatically handles token refresh, with tokens having a 1-hour lifetime. Methods See the LLM base class methods for additional functionality. Function Calling This service supports function calling (also known as tool calling) through the OpenAI-compatible interface, which allows the LLM to request information from external services and APIs. Function Calling Guide Learn how to implement function calling with standardized schemas, register handlers, manage context properly, and control execution flow in your conversational AI applications. Available Models Model Name Description google/gemini-2.0-flash-001 Fast, efficient text generation model google/gemini-2.0-pro-001 Comprehensive, high-quality model google/gemini-1.5-pro-001 Versatile multimodal model google/gemini-1.5-flash-001 Fast, efficient multimodal model See Google Vertex AI documentation for a complete list of supported models and their capabilities. Frame Flow Inherits the OpenAI LLM Service frame flow: Metrics Support The service collects standard LLM metrics: Token usage (prompt and completion) Processing duration Time to First Byte (TTFB) Function call metrics Notes Uses Google Cloud’s Vertex AI API Maintains OpenAI-compatible interface Supports streaming responses Handles function calling Manages conversation context Includes token usage tracking Thread-safe processing Automatic token refresh Requires Google Cloud project setup Google Gemini Grok On this page Overview Installation Configuration Constructor Parameters Input Parameters Usage Example Authentication Methods Function Calling Available Models Frame Flow Metrics Support Notes Assistant Responses are generated using AI and may contain mistakes.
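For the first authentication method (passing credentials directly as a JSON string), a hedged sketch based on the documented credentials parameter might look like this; the file path, project ID, and region are placeholders.

# Sketch: pass service account credentials as a JSON string instead of a file path.
with open("/path/to/service-account.json") as f:
    credentials_json = f.read()

llm = GoogleVertexLLMService(
    credentials=credentials_json,
    model="google/gemini-2.0-flash-001",
    params=GoogleVertexLLMService.InputParams(
        project_id="your-google-cloud-project-id",
        location="us-east4",
    ),
)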
|
llm_groq_574a7686.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/services/llm/groq#input
|
2 |
+
Title: Groq - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Groq - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation LLM Groq Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Anthropic AWS Bedrock Azure Cerebras DeepSeek Fireworks AI Google Gemini Google Vertex AI Grok Groq NVIDIA NIM Ollama OpenAI OpenPipe OpenRouter Perplexity Qwen SambaNova Together AI Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview GroqLLMService provides access to Groq’s language models through an OpenAI-compatible interface. It inherits from OpenAILLMService and supports streaming responses, function calling, and context management. API Reference Complete API documentation and method details Groq Docs Official Groq API documentation and features Example Code Working example with function calling Installation To use Groq services, install the required dependency: Copy Ask AI pip install "pipecat-ai[groq]" You’ll also need to set up your Groq API key as an environment variable: GROQ_API_KEY . Get your API key for free from Groq Console . Frames Input OpenAILLMContextFrame - Conversation context and history LLMMessagesFrame - Direct message list VisionImageRawFrame - Images for vision processing (select models) LLMUpdateSettingsFrame - Runtime parameter updates Output LLMFullResponseStartFrame / LLMFullResponseEndFrame - Response boundaries LLMTextFrame - Streamed completion chunks FunctionCallInProgressFrame / FunctionCallResultFrame - Function call lifecycle ErrorFrame - API or processing errors Function Calling Function Calling Guide Learn how to implement function calling with standardized schemas, register handlers, manage context properly, and control execution flow in your conversational AI applications. Context Management Context Management Guide Learn how to manage conversation context, handle message history, and integrate context aggregators for consistent conversational experiences. Usage Example Copy Ask AI import os from pipecat.services.groq.llm import GroqLLMService from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.adapters.schemas.function_schema import FunctionSchema from pipecat.adapters.schemas.tools_schema import ToolsSchema # Configure Groq service for speed llm = GroqLLMService( api_key = os.getenv( "GROQ_API_KEY" ), model = "llama-3.3-70b-versatile" , # Fast, capable model params = GroqLLMService.InputParams( temperature = 0.7 , max_tokens = 1000 ) ) # Define function for tool calling weather_function = FunctionSchema( name = "get_current_weather" , description = "Get current weather information" , properties = { "location" : { "type" : "string" , "description" : "City and state, e.g. 
San Francisco, CA" }, "format" : { "type" : "string" , "enum" : [ "celsius" , "fahrenheit" ], "description" : "Temperature unit to use" } }, required = [ "location" , "format" ] ) tools = ToolsSchema( standard_tools = [weather_function]) # Create context optimized for voice interaction context = OpenAILLMContext( messages = [ { "role" : "system" , "content" : """You are a helpful assistant optimized for voice conversations. Keep responses concise and avoid special characters that don't work well in speech.""" } ], tools = tools ) # Create context aggregators with fast timeout for speed from pipecat.processors.aggregators.llm_response import LLMUserAggregatorParams context_aggregator = llm.create_context_aggregator( context, user_params = LLMUserAggregatorParams( aggregation_timeout = 0.05 ) # Fast aggregation ) # Register function handler with feedback async def fetch_weather ( params ): location = params.arguments[ "location" ] await params.result_callback({ "conditions" : "sunny" , "temperature" : "75°F" }) llm.register_function( "get_current_weather" , fetch_weather) # Optional: Add function call feedback for better UX @llm.event_handler ( "on_function_calls_started" ) async def on_function_calls_started ( service , function_calls ): await tts.queue_frame(TTSSpeakFrame( "Let me check on that." )) # Use in pipeline with Groq STT for full Groq stack pipeline = Pipeline([ transport.input(), groq_stt, # GroqSTTService for consistent ecosystem context_aggregator.user(), llm, tts, transport.output(), context_aggregator.assistant() ]) Metrics Inherits all OpenAI metrics capabilities: Time to First Byte (TTFB) - Ultra-low latency measurements Processing Duration - Hardware-accelerated processing times Token Usage - Prompt tokens, completion tokens, and totals Function Call Metrics - Tool usage and execution tracking Enable with: Copy Ask AI task = PipelineTask( pipeline, params = PipelineParams( enable_metrics = True , enable_usage_metrics = True ) ) Additional Notes OpenAI Compatibility : Full compatibility with OpenAI API features and parameters Real-time Optimized : Ideal for conversational AI and streaming applications Open Source Models : Access to Llama, Mixtral, and other open-source models Vision Support : Select models support image understanding capabilities Free Tier : Generous free tier available for development and testing Grok NVIDIA NIM On this page Overview Installation Frames Input Output Function Calling Context Management Usage Example Metrics Additional Notes Assistant Responses are generated using AI and may contain mistakes.
|
llm_groq_d45aa023.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/services/llm/groq
|
2 |
+
Title: Groq - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Groq - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation LLM Groq Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Anthropic AWS Bedrock Azure Cerebras DeepSeek Fireworks AI Google Gemini Google Vertex AI Grok Groq NVIDIA NIM Ollama OpenAI OpenPipe OpenRouter Perplexity Qwen SambaNova Together AI Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview GroqLLMService provides access to Groq’s language models through an OpenAI-compatible interface. It inherits from OpenAILLMService and supports streaming responses, function calling, and context management. API Reference Complete API documentation and method details Groq Docs Official Groq API documentation and features Example Code Working example with function calling Installation To use Groq services, install the required dependency: Copy Ask AI pip install "pipecat-ai[groq]" You’ll also need to set up your Groq API key as an environment variable: GROQ_API_KEY . Get your API key for free from Groq Console . Frames Input OpenAILLMContextFrame - Conversation context and history LLMMessagesFrame - Direct message list VisionImageRawFrame - Images for vision processing (select models) LLMUpdateSettingsFrame - Runtime parameter updates Output LLMFullResponseStartFrame / LLMFullResponseEndFrame - Response boundaries LLMTextFrame - Streamed completion chunks FunctionCallInProgressFrame / FunctionCallResultFrame - Function call lifecycle ErrorFrame - API or processing errors Function Calling Function Calling Guide Learn how to implement function calling with standardized schemas, register handlers, manage context properly, and control execution flow in your conversational AI applications. Context Management Context Management Guide Learn how to manage conversation context, handle message history, and integrate context aggregators for consistent conversational experiences. Usage Example Copy Ask AI import os from pipecat.services.groq.llm import GroqLLMService from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.adapters.schemas.function_schema import FunctionSchema from pipecat.adapters.schemas.tools_schema import ToolsSchema # Configure Groq service for speed llm = GroqLLMService( api_key = os.getenv( "GROQ_API_KEY" ), model = "llama-3.3-70b-versatile" , # Fast, capable model params = GroqLLMService.InputParams( temperature = 0.7 , max_tokens = 1000 ) ) # Define function for tool calling weather_function = FunctionSchema( name = "get_current_weather" , description = "Get current weather information" , properties = { "location" : { "type" : "string" , "description" : "City and state, e.g. 
San Francisco, CA" }, "format" : { "type" : "string" , "enum" : [ "celsius" , "fahrenheit" ], "description" : "Temperature unit to use" } }, required = [ "location" , "format" ] ) tools = ToolsSchema( standard_tools = [weather_function]) # Create context optimized for voice interaction context = OpenAILLMContext( messages = [ { "role" : "system" , "content" : """You are a helpful assistant optimized for voice conversations. Keep responses concise and avoid special characters that don't work well in speech.""" } ], tools = tools ) # Create context aggregators with fast timeout for speed from pipecat.processors.aggregators.llm_response import LLMUserAggregatorParams context_aggregator = llm.create_context_aggregator( context, user_params = LLMUserAggregatorParams( aggregation_timeout = 0.05 ) # Fast aggregation ) # Register function handler with feedback async def fetch_weather ( params ): location = params.arguments[ "location" ] await params.result_callback({ "conditions" : "sunny" , "temperature" : "75°F" }) llm.register_function( "get_current_weather" , fetch_weather) # Optional: Add function call feedback for better UX @llm.event_handler ( "on_function_calls_started" ) async def on_function_calls_started ( service , function_calls ): await tts.queue_frame(TTSSpeakFrame( "Let me check on that." )) # Use in pipeline with Groq STT for full Groq stack pipeline = Pipeline([ transport.input(), groq_stt, # GroqSTTService for consistent ecosystem context_aggregator.user(), llm, tts, transport.output(), context_aggregator.assistant() ]) Metrics Inherits all OpenAI metrics capabilities: Time to First Byte (TTFB) - Ultra-low latency measurements Processing Duration - Hardware-accelerated processing times Token Usage - Prompt tokens, completion tokens, and totals Function Call Metrics - Tool usage and execution tracking Enable with: Copy Ask AI task = PipelineTask( pipeline, params = PipelineParams( enable_metrics = True , enable_usage_metrics = True ) ) Additional Notes OpenAI Compatibility : Full compatibility with OpenAI API features and parameters Real-time Optimized : Ideal for conversational AI and streaming applications Open Source Models : Access to Llama, Mixtral, and other open-source models Vision Support : Select models support image understanding capabilities Free Tier : Generous free tier available for development and testing Grok NVIDIA NIM On this page Overview Installation Frames Input Output Function Calling Context Management Usage Example Metrics Additional Notes Assistant Responses are generated using AI and may contain mistakes.
|
llm_nim_a940263f.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/services/llm/nim
|
2 |
+
Title: NVIDIA NIM - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
NVIDIA NIM - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation LLM NVIDIA NIM Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Anthropic AWS Bedrock Azure Cerebras DeepSeek Fireworks AI Google Gemini Google Vertex AI Grok Groq NVIDIA NIM Ollama OpenAI OpenPipe OpenRouter Perplexity Qwen SambaNova Together AI Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview NimLLMService provides access to NVIDIA’s NIM language models through an OpenAI-compatible interface. It inherits from OpenAILLMService and supports streaming responses, function calling, and context management, with special handling for NVIDIA’s incremental token reporting. API Reference Complete API documentation and method details NVIDIA NIM Docs Official NVIDIA NIM documentation and setup Example Code Working example with function calling Installation To use NVIDIA NIM services, install the required dependencies: Copy Ask AI pip install "pipecat-ai[nim]" You’ll also need to set up your NVIDIA API key as an environment variable: NVIDIA_API_KEY . Get your API key from NVIDIA Build . Frames Input OpenAILLMContextFrame - Conversation context and history LLMMessagesFrame - Direct message list VisionImageRawFrame - Images for vision processing LLMUpdateSettingsFrame - Runtime parameter updates Output LLMFullResponseStartFrame / LLMFullResponseEndFrame - Response boundaries LLMTextFrame - Streamed completion chunks FunctionCallInProgressFrame / FunctionCallResultFrame - Function call lifecycle ErrorFrame - API or processing errors Function Calling Function Calling Guide Learn how to implement function calling with standardized schemas, register handlers, manage context properly, and control execution flow in your conversational AI applications. Context Management Context Management Guide Learn how to manage conversation context, handle message history, and integrate context aggregators for consistent conversational experiences. Usage Example Copy Ask AI import os from pipecat.services.nim.llm import NimLLMService from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.adapters.schemas.function_schema import FunctionSchema from pipecat.adapters.schemas.tools_schema import ToolsSchema # Configure NVIDIA NIM service llm = NimLLMService( api_key = os.getenv( "NVIDIA_API_KEY" ), model = "nvidia/llama-3.1-nemotron-70b-instruct" , params = NimLLMService.InputParams( temperature = 0.7 , max_tokens = 1000 ) ) # Define function for tool calling weather_function = FunctionSchema( name = "get_current_weather" , description = "Get current weather information" , properties = { "location" : { "type" : "string" , "description" : "City and state, e.g. 
San Francisco, CA" }, "format" : { "type" : "string" , "enum" : [ "celsius" , "fahrenheit" ], "description" : "Temperature unit to use" } }, required = [ "location" , "format" ] ) tools = ToolsSchema( standard_tools = [weather_function]) # Create context optimized for voice context = OpenAILLMContext( messages = [ { "role" : "system" , "content" : """You are a helpful assistant optimized for voice interactions. Keep responses concise and avoid special characters for better speech synthesis.""" } ], tools = tools ) # Create context aggregators context_aggregator = llm.create_context_aggregator(context) # Register function handler with feedback async def fetch_weather ( params ): location = params.arguments[ "location" ] await params.result_callback({ "conditions" : "sunny" , "temperature" : "75°F" }) llm.register_function( "get_current_weather" , fetch_weather) # Optional: Add function call feedback @llm.event_handler ( "on_function_calls_started" ) async def on_function_calls_started ( service , function_calls ): await tts.queue_frame(TTSSpeakFrame( "Let me check on that." )) # Use in pipeline pipeline = Pipeline([ transport.input(), stt, context_aggregator.user(), llm, tts, transport.output(), context_aggregator.assistant() ]) Metrics Includes specialized token usage tracking for NIM’s incremental reporting: Time to First Byte (TTFB) - Response latency measurement Processing Duration - Total request processing time Token Usage - Tracks tokens used per request, compatible with NIM’s incremental reporting Enable with: Copy Ask AI task = PipelineTask( pipeline, params = PipelineParams( enable_metrics = True , enable_usage_metrics = True ) ) Additional Notes OpenAI Compatibility : Full compatibility with OpenAI API features and parameters NVIDIA Optimization : Hardware-accelerated inference on NVIDIA infrastructure Token Reporting : Custom handling for NIM’s incremental vs. OpenAI’s final token reporting Model Variety : Access to Nemotron and other NVIDIA-optimized model variants Groq Ollama On this page Overview Installation Frames Input Output Function Calling Context Management Usage Example Metrics Additional Notes Assistant Responses are generated using AI and may contain mistakes.
|
llm_openrouter_fe03d51e.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/services/llm/openrouter#output
|
2 |
+
Title: OpenRouter - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
OpenRouter - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation LLM OpenRouter Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Anthropic AWS Bedrock Azure Cerebras DeepSeek Fireworks AI Google Gemini Google Vertex AI Grok Groq NVIDIA NIM Ollama OpenAI OpenPipe OpenRouter Perplexity Qwen SambaNova Together AI Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview OpenRouterLLMService provides access to OpenRouter’s language models through an OpenAI-compatible interface. It inherits from OpenAILLMService and supports streaming responses, function calling, and context management. API Reference Complete API documentation and method details OpenRouter Docs Official OpenRouter API documentation and features Example Code Working example with function calling Installation To use OpenRouterLLMService , install the required dependencies: Copy Ask AI pip install "pipecat-ai[openrouter]" You’ll also need to set up your OpenRouter API key as an environment variable: OPENROUTER_API_KEY . Get your API key from OpenRouter . Free tier includes $1 of credits. Frames Input OpenAILLMContextFrame - Conversation context and history LLMMessagesFrame - Direct message list VisionImageRawFrame - Images for vision processing (select models) LLMUpdateSettingsFrame - Runtime parameter updates Output LLMFullResponseStartFrame / LLMFullResponseEndFrame - Response boundaries LLMTextFrame - Streamed completion chunks FunctionCallInProgressFrame / FunctionCallResultFrame - Function call lifecycle ErrorFrame - API or processing errors Function Calling Function Calling Guide Learn how to implement function calling with standardized schemas, register handlers, manage context properly, and control execution flow in your conversational AI applications. Context Management Context Management Guide Learn how to manage conversation context, handle message history, and integrate context aggregators for consistent conversational experiences. Usage Example Copy Ask AI import os from pipecat.services.openrouter.llm import OpenRouterLLMService from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.adapters.schemas.function_schema import FunctionSchema from pipecat.adapters.schemas.tools_schema import ToolsSchema # Configure OpenRouter service llm = OpenRouterLLMService( api_key = os.getenv( "OPENROUTER_API_KEY" ), model = "openai/gpt-4o-2024-11-20" , # Easy model switching params = OpenRouterLLMService.InputParams( temperature = 0.7 , max_tokens = 1000 ) ) # Define function for tool calling weather_function = FunctionSchema( name = "get_current_weather" , description = "Get current weather information" , properties = { "location" : { "type" : "string" , "description" : "City and state, e.g. 
San Francisco, CA" }, "format" : { "type" : "string" , "enum" : [ "celsius" , "fahrenheit" ], "description" : "Temperature unit to use" } }, required = [ "location" , "format" ] ) tools = ToolsSchema( standard_tools = [weather_function]) # Create context context = OpenAILLMContext( messages = [ { "role" : "system" , "content" : """You are a helpful assistant optimized for voice conversations. Keep responses concise and avoid special characters for better speech synthesis.""" } ], tools = tools ) # Create context aggregators context_aggregator = llm.create_context_aggregator(context) # Register function handler with feedback async def fetch_weather ( params ): location = params.arguments[ "location" ] await params.result_callback({ "conditions" : "sunny" , "temperature" : "75°F" }) llm.register_function( "get_current_weather" , fetch_weather) # Optional: Add function call feedback @llm.event_handler ( "on_function_calls_started" ) async def on_function_calls_started ( service , function_calls ): await tts.queue_frame(TTSSpeakFrame( "Let me check on that." )) # Use in pipeline pipeline = Pipeline([ transport.input(), stt, context_aggregator.user(), llm, tts, transport.output(), context_aggregator.assistant() ]) # Easy model switching for different use cases # llm.set_model_name("anthropic/claude-3.5-sonnet") # Switch to Claude # llm.set_model_name("meta-llama/llama-3.1-70b-instruct") # Switch to Llama Metrics Inherits all OpenAI metrics capabilities: Time to First Byte (TTFB) - Response latency measurement Processing Duration - Total request processing time Token Usage - Prompt tokens, completion tokens, and totals Enable with: Copy Ask AI task = PipelineTask( pipeline, params = PipelineParams( enable_metrics = True , enable_usage_metrics = True ) ) Additional Notes Model Variety : Access 70+ models from OpenAI, Anthropic, Meta, Google, and more OpenAI Compatibility : Full compatibility with existing OpenAI code Easy Switching : Change models with a single parameter update Fallback Support : Built-in model fallbacks for high availability OpenPipe Perplexity On this page Overview Installation Frames Input Output Function Calling Context Management Usage Example Metrics Additional Notes Assistant Responses are generated using AI and may contain mistakes.
|
llm_perplexity_762eff8f.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/services/llm/perplexity#context-management
|
2 |
+
Title: Perplexity - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Perplexity - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation LLM Perplexity Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Anthropic AWS Bedrock Azure Cerebras DeepSeek Fireworks AI Google Gemini Google Vertex AI Grok Groq NVIDIA NIM Ollama OpenAI OpenPipe OpenRouter Perplexity Qwen SambaNova Together AI Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview PerplexityLLMService provides access to Perplexity’s language models through an OpenAI-compatible interface. It inherits from OpenAILLMService and supports streaming responses and context management, with special handling for Perplexity’s incremental token reporting. API Reference Complete API documentation and method details Perplexity Docs Official Perplexity API documentation and features Example Code Working example with search capabilities Unlike other LLM services, Perplexity does not support function calling. Instead, they offer native internet search built in without requiring special function calls. Installation To use PerplexityLLMService , install the required dependencies: Copy Ask AI pip install "pipecat-ai[perplexity]" You’ll also need to set up your Perplexity API key as an environment variable: PERPLEXITY_API_KEY . Get your API key from Perplexity API . Frames Input OpenAILLMContextFrame - Conversation context and history LLMMessagesFrame - Direct message list LLMUpdateSettingsFrame - Runtime parameter updates Output LLMFullResponseStartFrame / LLMFullResponseEndFrame - Response boundaries LLMTextFrame - Streamed completion chunks with citations ErrorFrame - API or processing errors Context Management Context Management Guide Learn how to manage conversation context, handle message history, and integrate context aggregators for consistent conversational experiences. Usage Example Copy Ask AI import os from pipecat.services.perplexity.llm import PerplexityLLMService from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext # Configure Perplexity service llm = PerplexityLLMService( api_key = os.getenv( "PERPLEXITY_API_KEY" ), model = "sonar-pro" , # Pro model for enhanced capabilities params = PerplexityLLMService.InputParams( temperature = 0.7 , max_tokens = 1000 ) ) # Create context optimized for search and current information context = OpenAILLMContext( messages = [ { "role" : "system" , "content" : """You are a knowledgeable assistant with access to real-time information. When answering questions, use your search capabilities to provide current, accurate information. Always cite your sources when possible. 
Keep responses concise for voice output.""" } ] ) # Create context aggregators context_aggregator = llm.create_context_aggregator(context) # Use in pipeline for information-rich conversations pipeline = Pipeline([ transport.input(), stt, context_aggregator.user(), llm, # Will automatically search and cite sources tts, transport.output(), context_aggregator.assistant() ]) # Enable metrics with special TTFB reporting for Perplexity task = PipelineTask( pipeline, params = PipelineParams( enable_metrics = True , enable_usage_metrics = True , report_only_initial_ttfb = True , # Optimized for Perplexity's response pattern ) ) Metrics The service provides specialized token tracking for Perplexity’s incremental reporting: Time to First Byte (TTFB) - Response latency measurement Processing Duration - Total request processing time Token Usage - Accumulated prompt and completion tokens Enable with: Copy Ask AI task = PipelineTask( pipeline, params = PipelineParams( enable_metrics = True , enable_usage_metrics = True , ) ) Additional Notes No Function Calling : Perplexity doesn’t support traditional function calling but provides superior built-in search Real-time Data : Access to current information without complex function orchestration Source Citations : Automatic citation of web sources in responses OpenAI Compatible : Uses familiar OpenAI-style interface and parameters OpenRouter Qwen On this page Overview Installation Frames Input Output Context Management Usage Example Metrics Additional Notes Assistant Responses are generated using AI and may contain mistakes.
|
llm_sambanova_7c015e95.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/services/llm/sambanova#param-api-key
|
2 |
+
Title: SambaNova - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
SambaNova - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation LLM SambaNova Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Anthropic AWS Bedrock Azure Cerebras DeepSeek Fireworks AI Google Gemini Google Vertex AI Grok Groq NVIDIA NIM Ollama OpenAI OpenPipe OpenRouter Perplexity Qwen SambaNova Together AI Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview SambaNovaLLMService provides access to SambaNova’s language models through an OpenAI-compatible interface. It inherits from OpenAILLMService and supports streaming responses, function calling, and context management. Installation To use SambaNovaLLMService , install the required dependencies: Copy Ask AI pip install "pipecat-ai[sambanova]" You also need to set up your SambaNova API key as an environment variable: SAMBANOVA_API_KEY . Get your SambaNova API key here . Configuration Constructor Parameters api_key str required Your SambaNova API key model str default: "Llama-4-Maverick-17B-128E-Instruct" Model identifier base_url str default: "https://api.sambanova.ai/v1" SambaNova API endpoint Input Parameters Inherits OpenAI-compatible parameters: max_tokens Optional[int] Maximum number of tokens to generate. Must be greater than or equal to 1. temperature Optional[float] Controls randomness in the output. Range: [0.0, 1.0]. top_p Optional[float] Controls diversity via nucleus sampling. Range: [0.0, 1.0] Usage Example Copy Ask AI from typing import Any from pipecat.adapters.schemas.function_schema import FunctionSchema from pipecat.adapters.schemas.tools_schema import ToolsSchema from pipecat.services.sambanova.llm import SambaNovaLLMService from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.services.llm_service import FunctionCallParams # Configure service llm = SambaNovaLLMService( api_key = 'your-sambanova-api-key' , model = 'Llama-4-Maverick-17B-128E-Instruct' , params = SambaNovaLLMService.InputParams( temperature = 0.7 , max_tokens = 1024 ), ) # Define function to call async def fetch_weather ( params : FunctionCallParams) -> Any: """Mock function that fetches the weather forecast from an API.""" await params.result_callback({ 'conditions' : 'nice' , 'temperature' : '20 Degrees Celsius' }) # Register function handlers llm.register_function( 'get_current_weather' , fetch_weather) # Define weather function using standardized schema weather_function = FunctionSchema( name = 'get_current_weather' , description = 'Get the current weather' , properties = { 'location' : { 'type' : 'string' , 'description' : 'The city and state.' , }, 'format' : { 'type' : 'string' , 'enum' : [ 'celsius' , 'fahrenheit' ], 'description' : "The temperature unit to use. Infer this from the user's location."
, }, }, required = [ 'location' , 'format' ], ) # Create tools schema tools = ToolsSchema( standard_tools = [weather_function]) # Define system message messages = [ { 'role' : 'system' , 'content' : 'You are a helpful LLM in a WebRTC call. ' 'Your goal is to demonstrate your capabilities of weather forecasting in a succinct way. ' 'Introduce yourself to the user and then wait for their question. ' 'Elaborate your response into a conversational answer in a creative and helpful way. ' 'Your output will be converted to audio so do not include special characters in your answer. ' 'Once the final answer has been provided, please stop, unless the user asks another question. ' , }, ] # Create context with system message and tools context = OpenAILLMContext(messages, tools) # Context aggregator context_aggregator = llm.create_context_aggregator(context) # Create context aggregator for message handling context_aggregator = llm.create_context_aggregator(context) # Set up pipeline pipeline = Pipeline( [ transport.input(), stt, context_aggregator.user(), llm, tts, transport.output(), context_aggregator.assistant(), ] ) # Create and configure task task = PipelineTask( pipeline, params = PipelineParams( allow_interruptions = True , enable_metrics = True , enable_usage_metrics = True , ), ) Methods See the LLM base class methods for additional functionality. Function Calling This service supports function calling (also known as tool calling) which allows the LLM to request information from external services and APIs. For example, you can enable your bot to: Check current weather conditions. Query databases. Access external APIs. Perform custom actions. Function Calling Guide Learn how to implement function calling with standardized schemas, register handlers, manage context properly, and control execution flow in your conversational AI applications. Available Models Model Name Description DeepSeek-R1 deepseek-ai/DeepSeek-R1 DeepSeek-R1-Distill-Llama-70B deepseek-ai/DeepSeek-R1-Distill-Llama-70B DeepSeek-V3-0324 deepseek-ai/DeepSeek-V3-0324 Llama-4-Maverick-17B-128E-Instruct meta-llama/Llama-4-Maverick-17B-128E-Instruct Llama-4-Scout-17B-16E-Instruct meta-llama/Llama-4-Scout-17B-16E-Instruct Meta-Llama-3.3-70B-Instruct meta-llama/Llama-3.3-70B-Instruct Meta-Llama-3.2-3B-Instruct meta-llama/Llama-3.2-3B-Instruct Meta-Llama-3.2-1B-Instruct meta-llama/Llama-3.2-1B-Instruct Meta-Llama-3.1-405B-Instruct meta-llama/Llama-3.1-405B-Instruct Meta-Llama-3.1-8B-Instruct meta-llama/Llama-3.1-8B-Instruct Meta-Llama-Guard-3-8B meta-llama/Llama-Guard-3-8B QwQ-32B Qwen/QwQ-32B Qwen3-32B Qwen/Qwen3-32B Llama-3.3-Swallow-70B-Instruct-v0.4 Tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4 See SambaNova’s docs for a complete list of supported models. Frame Flow Inherits the OpenAI LLM Service frame flow: Metrics Support The service collects standard LLM metrics: Token usage (prompt and completion). Processing duration. Time to First Byte (TTFB). Function call metrics. Notes OpenAI-compatible interface. Supports streaming responses. Handles function calling. Manages conversation context. Includes token usage tracking. Thread-safe processing. Automatic error handling. Qwen Together AI On this page Overview Installation Configuration Constructor Parameters Input Parameters Usage Example Methods Function Calling Available Models Frame Flow Metrics Support Notes Assistant Responses are generated using AI and may contain mistakes.
memory_mem0_a1309820.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/services/memory/mem0#param-params
Title: Mem0 - Pipecat
==================================================
Mem0 - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Memory Mem0 Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Mem0 Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview Mem0MemoryService provides long-term memory capabilities for conversational agents by integrating with Mem0’s API. It automatically stores conversation history and retrieves relevant past context based on the current conversation, enhancing LLM responses with persistent memory across sessions. Installation To use the Mem0 memory service, install the required dependencies: Copy Ask AI pip install "pipecat-ai[mem0]" You’ll also need to set up your Mem0 API key as an environment variable: MEM0_API_KEY . You can obtain a Mem0 API key by signing up at mem0.ai . Mem0MemoryService Constructor Parameters api_key str required Mem0 API key for accessing the service user_id str Unique identifier for the end user to associate with memories agent_id str Identifier for the agent using the memory service run_id str Identifier for the specific conversation session params InputParams Configuration parameters for memory retrieval (see below) local_config dict Configuration for using local LLMs and embedders instead of Mem0’s cloud API (see Local Configuration section) At least one of user_id , agent_id , or run_id must be provided to organize memories. Input Parameters The params object accepts the following configuration settings: search_limit int default: "10" Maximum number of relevant memories to retrieve per query search_threshold float default: "0.1" Relevance threshold for memory retrieval (0.0 to 1.0) api_version str default: "v2" Mem0 API version to use system_prompt str Prefix text to add before retrieved memories add_as_system_message bool default: "True" Whether to add memories as a system message (True) or user message (False) position int default: "1" Position in the context where memories should be inserted Input Frames The service processes the following input frames: OpenAILLMContextFrame Frame Contains OpenAI-specific conversation context LLMMessagesFrame Frame Contains conversation messages in standard format Output Frames The service may produce the following output frames: LLMMessagesFrame Frame Enhanced messages with relevant memories included OpenAILLMContextFrame Frame Enhanced OpenAI context with memories included ErrorFrame Frame Contains error information if memory operations fail Memory Operations The service performs two main operations automatically: Message Storage All conversation messages are stored in Mem0 for future reference. 
The service: Captures full message history from context frames Associates messages with the specified user/agent/run IDs Stores metadata to enable efficient retrieval Memory Retrieval When a new user message is detected, the service: Uses the message as a search query Retrieves relevant past memories from Mem0 Formats memories with the configured system prompt Adds the formatted memories to the conversation context Passes the enhanced context downstream in the pipeline Pipeline Positioning The memory service should be positioned after the user context aggregator but before the LLM service: Copy Ask AI context_aggregator.user() → memory_service → llm This ensures that: The user’s latest message is included in the context The memory service can enhance the context before the LLM processes it The LLM receives the enhanced context with relevant memories Usage Examples Basic Integration Copy Ask AI from pipecat.services.mem0.memory import Mem0MemoryService from pipecat.pipeline.pipeline import Pipeline # Create the memory service memory = Mem0MemoryService( api_key = os.getenv( "MEM0_API_KEY" ), user_id = "user123" , # Unique user identifier ) # Position the memory service between context aggregator and LLM pipeline = Pipeline([ transport.input(), context_aggregator.user(), memory, # <-- Memory service enhances context here llm, tts, transport.output(), context_aggregator.assistant() ]) Using Local Configuration The local_config parameter allows you to use your own LLM and embedding providers instead of Mem0’s cloud API. This is useful for self-hosted deployments or when you want more control over the memory processing. Copy Ask AI local_config = { "llm" : { "provider" : str , # LLM provider name (e.g., "anthropic", "openai") "config" : { # Provider-specific configuration "model" : str , # Model name "api_key" : str , # API key for the provider # Other provider-specific parameters } }, "embedder" : { "provider" : str , # Embedding provider name (e.g., "openai") "config" : { # Provider-specific configuration "model" : str , # Model name # Other provider-specific parameters } } } # Initialize Mem0 memory service with local configuration memory = Mem0MemoryService( local_config = local_config, # Use local LLM for memory processing user_id = "user123" , # Unique identifier for the user ) When using local_config do not provide the api_key parameter. Frame Flow Error Handling The service includes basic error handling to ensure conversation flow continues even when memory operations fail: Exceptions during memory storage and retrieval are caught and logged If an error occurs during frame processing, an ErrorFrame is emitted with error details The original frame is still passed downstream to prevent the pipeline from stalling Connection and authentication errors from the Mem0 API will be logged but won’t interrupt the conversation While the service attempts to handle errors gracefully, memory operations that fail may result in missing context in conversations. Monitor your application logs for memory-related errors. Tavus Moondream On this page Overview Installation Mem0MemoryService Constructor Parameters Input Parameters Input Frames Output Frames Memory Operations Message Storage Memory Retrieval Pipeline Positioning Usage Examples Basic Integration Using Local Configuration Frame Flow Error Handling Assistant Responses are generated using AI and may contain mistakes.
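For reference, a minimal sketch of tuning memory retrieval with the params object documented above. It assumes the configuration class is exposed as Mem0MemoryService.InputParams, and the tuning values shown are illustrative only; check the service reference for the exact import path before relying on it.

import os

from pipecat.services.mem0.memory import Mem0MemoryService

# Assumed: InputParams is available on the service class.
memory = Mem0MemoryService(
    api_key=os.getenv("MEM0_API_KEY"),
    user_id="user123",
    params=Mem0MemoryService.InputParams(
        search_limit=5,              # Return at most 5 memories per query
        search_threshold=0.3,        # Require a higher relevance score than the 0.1 default
        system_prompt="Relevant details from earlier conversations:",
        add_as_system_message=True,  # Inject memories as a system message
        position=1,                  # Insert near the top of the context
    ),
)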
memory_mem0_b45c279e.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/services/memory/mem0#param-user-id
Title: Mem0 - Pipecat
==================================================
Mem0 - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Memory Mem0 Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Mem0 Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview Mem0MemoryService provides long-term memory capabilities for conversational agents by integrating with Mem0’s API. It automatically stores conversation history and retrieves relevant past context based on the current conversation, enhancing LLM responses with persistent memory across sessions. Installation To use the Mem0 memory service, install the required dependencies: Copy Ask AI pip install "pipecat-ai[mem0]" You’ll also need to set up your Mem0 API key as an environment variable: MEM0_API_KEY . You can obtain a Mem0 API key by signing up at mem0.ai . Mem0MemoryService Constructor Parameters api_key str required Mem0 API key for accessing the service user_id str Unique identifier for the end user to associate with memories agent_id str Identifier for the agent using the memory service run_id str Identifier for the specific conversation session params InputParams Configuration parameters for memory retrieval (see below) local_config dict Configuration for using local LLMs and embedders instead of Mem0’s cloud API (see Local Configuration section) At least one of user_id , agent_id , or run_id must be provided to organize memories. Input Parameters The params object accepts the following configuration settings: search_limit int default: "10" Maximum number of relevant memories to retrieve per query search_threshold float default: "0.1" Relevance threshold for memory retrieval (0.0 to 1.0) api_version str default: "v2" Mem0 API version to use system_prompt str Prefix text to add before retrieved memories add_as_system_message bool default: "True" Whether to add memories as a system message (True) or user message (False) position int default: "1" Position in the context where memories should be inserted Input Frames The service processes the following input frames: OpenAILLMContextFrame Frame Contains OpenAI-specific conversation context LLMMessagesFrame Frame Contains conversation messages in standard format Output Frames The service may produce the following output frames: LLMMessagesFrame Frame Enhanced messages with relevant memories included OpenAILLMContextFrame Frame Enhanced OpenAI context with memories included ErrorFrame Frame Contains error information if memory operations fail Memory Operations The service performs two main operations automatically: Message Storage All conversation messages are stored in Mem0 for future reference. 
The service: Captures full message history from context frames Associates messages with the specified user/agent/run IDs Stores metadata to enable efficient retrieval Memory Retrieval When a new user message is detected, the service: Uses the message as a search query Retrieves relevant past memories from Mem0 Formats memories with the configured system prompt Adds the formatted memories to the conversation context Passes the enhanced context downstream in the pipeline Pipeline Positioning The memory service should be positioned after the user context aggregator but before the LLM service: Copy Ask AI context_aggregator.user() → memory_service → llm This ensures that: The user’s latest message is included in the context The memory service can enhance the context before the LLM processes it The LLM receives the enhanced context with relevant memories Usage Examples Basic Integration Copy Ask AI from pipecat.services.mem0.memory import Mem0MemoryService from pipecat.pipeline.pipeline import Pipeline # Create the memory service memory = Mem0MemoryService( api_key = os.getenv( "MEM0_API_KEY" ), user_id = "user123" , # Unique user identifier ) # Position the memory service between context aggregator and LLM pipeline = Pipeline([ transport.input(), context_aggregator.user(), memory, # <-- Memory service enhances context here llm, tts, transport.output(), context_aggregator.assistant() ]) Using Local Configuration The local_config parameter allows you to use your own LLM and embedding providers instead of Mem0’s cloud API. This is useful for self-hosted deployments or when you want more control over the memory processing. Copy Ask AI local_config = { "llm" : { "provider" : str , # LLM provider name (e.g., "anthropic", "openai") "config" : { # Provider-specific configuration "model" : str , # Model name "api_key" : str , # API key for the provider # Other provider-specific parameters } }, "embedder" : { "provider" : str , # Embedding provider name (e.g., "openai") "config" : { # Provider-specific configuration "model" : str , # Model name # Other provider-specific parameters } } } # Initialize Mem0 memory service with local configuration memory = Mem0MemoryService( local_config = local_config, # Use local LLM for memory processing user_id = "user123" , # Unique identifier for the user ) When using local_config do not provide the api_key parameter. Frame Flow Error Handling The service includes basic error handling to ensure conversation flow continues even when memory operations fail: Exceptions during memory storage and retrieval are caught and logged If an error occurs during frame processing, an ErrorFrame is emitted with error details The original frame is still passed downstream to prevent the pipeline from stalling Connection and authentication errors from the Mem0 API will be logged but won’t interrupt the conversation While the service attempts to handle errors gracefully, memory operations that fail may result in missing context in conversations. Monitor your application logs for memory-related errors. Tavus Moondream On this page Overview Installation Mem0MemoryService Constructor Parameters Input Parameters Input Frames Output Frames Memory Operations Message Storage Memory Retrieval Pipeline Positioning Usage Examples Basic Integration Using Local Configuration Frame Flow Error Handling Assistant Responses are generated using AI and may contain mistakes.
observers_debug-observer_4a2d52de.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/utilities/observers/debug-observer#frameendpoint-enum
Title: Debug Log Observer - Pipecat
==================================================
Debug Log Observer - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Observers Debug Log Observer Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Observer Pattern Debug Observer LLM Observer Transcription Observer Turn Tracking Observer Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline The DebugLogObserver provides detailed logging of frame activity in your Pipecat pipeline, with full visibility into frame content and flexible filtering options. Features Log all frame types and their content Filter by specific frame types Filter by source or destination components Automatic formatting of frame fields Special handling for complex data structures Usage Log All Frames Log all frames passing through the pipeline: Copy Ask AI from pipecat.observers.loggers.debug_log_observer import DebugLogObserver task = PipelineTask( pipeline, params = PipelineParams( observers = [DebugLogObserver()], ), ) Filter by Frame Types Log only specific frame types: Copy Ask AI from pipecat.frames.frames import TranscriptionFrame, InterimTranscriptionFrame from pipecat.observers.loggers.debug_log_observer import DebugLogObserver task = PipelineTask( pipeline, params = PipelineParams( observers = [ DebugLogObserver( frame_types = ( TranscriptionFrame, InterimTranscriptionFrame )) ], ), ) Advanced Source/Destination Filtering Filter frames based on their type and source/destination: Copy Ask AI from pipecat.frames.frames import StartInterruptionFrame, UserStartedSpeakingFrame, LLMTextFrame from pipecat.observers.loggers.debug_log_observer import DebugLogObserver, FrameEndpoint from pipecat.transports.base_output_transport import BaseOutputTransport from pipecat.services.stt_service import STTService task = PipelineTask( pipeline, params = PipelineParams( observers = [ DebugLogObserver( frame_types = { # Only log StartInterruptionFrame when source is BaseOutputTransport StartInterruptionFrame: (BaseOutputTransport, FrameEndpoint. SOURCE ), # Only log UserStartedSpeakingFrame when destination is STTService UserStartedSpeakingFrame: (STTService, FrameEndpoint. DESTINATION ), # Log LLMTextFrame regardless of source or destination LLMTextFrame: None }) ], ), ) Log Output Format The observer logs each frame with its complete details: Copy Ask AI [Source] → [Destination]: [FrameType] [field1: value1, field2: value2, ...] at [timestamp]s For example: Copy Ask AI OpenAILLMService#0 → DailyTransport#0: LLMTextFrame text: 'Hello, how can I help you today?' at 1.24s Configuration Options Parameter Type Description frame_types Tuple[Type[Frame], ...] 
or Dict[Type[Frame], Optional[Tuple[Type, FrameEndpoint]]] Frame types to log, with optional source/destination filtering exclude_fields Set[str] Field names to exclude from logging (defaults to binary fields) FrameEndpoint Enum The FrameEndpoint enum is used for source/destination filtering: FrameEndpoint.SOURCE : Filter by source component FrameEndpoint.DESTINATION : Filter by destination component Observer Pattern LLM Observer On this page Features Usage Log All Frames Filter by Frame Types Advanced Source/Destination Filtering Log Output Format Configuration Options FrameEndpoint Enum Assistant Responses are generated using AI and may contain mistakes.
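The exclude_fields option listed above is not shown in the examples, so here is a minimal sketch. The excluded field names are illustrative, and pipeline is assumed to be an already-constructed Pipeline.

from pipecat.frames.frames import TranscriptionFrame
from pipecat.observers.loggers.debug_log_observer import DebugLogObserver
from pipecat.pipeline.task import PipelineParams, PipelineTask

# Log transcription frames but hide selected fields from the log output.
task = PipelineTask(
    pipeline,
    params=PipelineParams(
        observers=[
            DebugLogObserver(
                frame_types=(TranscriptionFrame,),
                exclude_fields={"user_id", "timestamp"},  # illustrative field names
            )
        ],
    ),
)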
pipecat-client-android_indexhtml_b4bd768d.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/client/android/pipecat-client-android/index.html#join-our-community
Title: Overview - Pipecat
==================================================
Overview - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Get Started Overview Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Get Started Overview Installation & Setup Quickstart Core Concepts Next Steps & Examples Pipecat is an open source Python framework that handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions. “Multimodal” means you can use any combination of audio, video, images, and/or text in your interactions. And “real-time” means that things are happening quickly enough that it feels conversational—a “back-and-forth” with a bot, not submitting a query and waiting for results. What You Can Build Voice Assistants Natural, real-time conversations with AI using speech recognition and synthesis Interactive Agents Personal coaches and meeting assistants that can understand context and provide guidance Multimodal Apps Applications that combine voice, video, images, and text for rich interactions Creative Tools Storytelling experiences and social companions that engage users Business Solutions Customer intake flows and support bots for automated business processes Complex Flows Structured conversations using Pipecat Flows for managing complex interactions How It Works The flow of interactions in a Pipecat application is typically straightforward: The bot says something The user says something The bot says something The user says something This continues until the conversation naturally ends. While this flow seems simple, making it feel natural requires sophisticated real-time processing. Real-time Processing Pipecat’s pipeline architecture handles both simple voice interactions and complex multimodal processing. Let’s look at how data flows through the system: Voice app Multimodal app 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio and Video Transmit and capture audio, video, and image inputs simultaneously 2 Process Streams Handle multiple input streams in parallel 3 Model Processing Send combined inputs to multimodal models (like GPT-4V) 4 Generate Outputs Create various outputs (text, images, audio, etc.) 5 Coordinate Presentation Synchronize and present multiple output types In both cases, Pipecat: Processes responses as they stream in Handles multiple input/output modalities concurrently Manages resource allocation and synchronization Coordinates parallel processing tasks This architecture creates fluid, natural interactions without noticeable delays, whether you’re building a simple voice assistant or a complex multimodal application. Pipecat’s pipeline architecture is particularly valuable for managing the complexity of real-time, multimodal interactions, ensuring smooth data flow and proper synchronization regardless of the input/output types involved. 
Pipecat handles all this complexity for you, letting you focus on building your application rather than managing the underlying infrastructure. Next Steps Ready to build your first Pipecat application? Installation & Setup Prepare your environment and install required dependencies Quickstart Build and run your first Pipecat application Core Concepts Learn about pipelines, frames, and real-time processing Use Cases Explore example implementations and patterns Join Our Community Discord Community Connect with other developers, share your projects, and get support from the Pipecat team. Installation & Setup On this page What You Can Build How It Works Real-time Processing Next Steps Join Our Community Assistant Responses are generated using AI and may contain mistakes.
pipecat-transport-gemini-live-websocket_indexhtml_2f6300ba.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/client/android/pipecat-transport-gemini-live-websocket/index.html#next-steps
Title: Overview - Pipecat
==================================================
Overview - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Get Started Overview Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Get Started Overview Installation & Setup Quickstart Core Concepts Next Steps & Examples Pipecat is an open source Python framework that handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions. “Multimodal” means you can use any combination of audio, video, images, and/or text in your interactions. And “real-time” means that things are happening quickly enough that it feels conversational—a “back-and-forth” with a bot, not submitting a query and waiting for results. What You Can Build Voice Assistants Natural, real-time conversations with AI using speech recognition and synthesis Interactive Agents Personal coaches and meeting assistants that can understand context and provide guidance Multimodal Apps Applications that combine voice, video, images, and text for rich interactions Creative Tools Storytelling experiences and social companions that engage users Business Solutions Customer intake flows and support bots for automated business processes Complex Flows Structured conversations using Pipecat Flows for managing complex interactions How It Works The flow of interactions in a Pipecat application is typically straightforward: The bot says something The user says something The bot says something The user says something This continues until the conversation naturally ends. While this flow seems simple, making it feel natural requires sophisticated real-time processing. Real-time Processing Pipecat’s pipeline architecture handles both simple voice interactions and complex multimodal processing. Let’s look at how data flows through the system: Voice app Multimodal app 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio and Video Transmit and capture audio, video, and image inputs simultaneously 2 Process Streams Handle multiple input streams in parallel 3 Model Processing Send combined inputs to multimodal models (like GPT-4V) 4 Generate Outputs Create various outputs (text, images, audio, etc.) 5 Coordinate Presentation Synchronize and present multiple output types In both cases, Pipecat: Processes responses as they stream in Handles multiple input/output modalities concurrently Manages resource allocation and synchronization Coordinates parallel processing tasks This architecture creates fluid, natural interactions without noticeable delays, whether you’re building a simple voice assistant or a complex multimodal application. Pipecat’s pipeline architecture is particularly valuable for managing the complexity of real-time, multimodal interactions, ensuring smooth data flow and proper synchronization regardless of the input/output types involved. 
Pipecat handles all this complexity for you, letting you focus on building your application rather than managing the underlying infrastructure. Next Steps Ready to build your first Pipecat application? Installation & Setup Prepare your environment and install required dependencies Quickstart Build and run your first Pipecat application Core Concepts Learn about pipelines, frames, and real-time processing Use Cases Explore example implementations and patterns Join Our Community Discord Community Connect with other developers, share your projects, and get support from the Pipecat team. Installation & Setup On this page What You Can Build How It Works Real-time Processing Next Steps Join Our Community Assistant Responses are generated using AI and may contain mistakes.
pipeline_pipeline-idle-detection_5ab87df7.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/pipeline/pipeline-idle-detection#example-implementation
Title: Pipeline Idle Detection - Pipecat
==================================================
Pipeline Idle Detection - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Pipeline Pipeline Idle Detection Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview Pipeline idle detection is a feature that monitors activity in your pipeline and can automatically cancel tasks when no meaningful bot interactions are occurring. This helps prevent pipelines from running indefinitely when a conversation has naturally ended but wasn’t properly terminated. How It Works The system monitors specific “activity frames” that indicate the bot is actively engaged in the conversation. By default, these are: BotSpeakingFrame - When the bot is speaking LLMFullResponseEndFrame - When the LLM has completed a response If no activity frames are detected within the configured timeout period (5 minutes by default), the system considers the pipeline idle and can automatically terminate it. Idle detection only starts after the pipeline has begun processing frames. The idle timer resets whenever an activity frame (as specified in idle_timeout_frames ) is received. Configuration You can configure idle detection behavior when creating a PipelineTask : Copy Ask AI from pipecat.pipeline.task import PipelineParams, PipelineTask # Default configuration - cancel after 5 minutes of inactivity task = PipelineTask(pipeline) # Custom configuration task = PipelineTask( pipeline, params = PipelineParams( allow_interruptions = True ), idle_timeout_secs = 600 , # 10 minute timeout idle_timeout_frames = (BotSpeakingFrame,), # Only monitor bot speaking cancel_on_idle_timeout = False , # Don't auto-cancel, just notify ) Configuration Parameters idle_timeout_secs Optional[float] default: "300" Timeout in seconds before considering the pipeline idle. Set to None to disable idle detection. idle_timeout_frames Tuple[Type[Frame], ...] default: "(BotSpeakingFrame, LLMFullResponseEndFrame)" Frame types that should prevent the pipeline from being considered idle. cancel_on_idle_timeout bool default: "True" Whether to automatically cancel the pipeline task when idle timeout is reached. Handling Idle Timeouts You can respond to idle timeout events by adding an event handler: Copy Ask AI @task.event_handler ( "on_idle_timeout" ) async def on_idle_timeout ( task ): logger.info( "Pipeline has been idle for too long" ) # Perform any custom cleanup or logging # Note: If cancel_on_idle_timeout=True, the pipeline will be cancelled after this handler runs Example Implementation Here’s a complete example showing how to configure idle detection with custom handling: Copy Ask AI from pipecat.frames.frames import BotSpeakingFrame, LLMFullResponseEndFrame, TTSSpeakFrame from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.task import PipelineParams, PipelineTask # Create pipeline pipeline = Pipeline([ ... 
]) # Configure task with custom idle settings task = PipelineTask( pipeline, params = PipelineParams( allow_interruptions = True ), idle_timeout_secs = 180 , # 3 minutes cancel_on_idle_timeout = False # Don't auto-cancel ) # Add event handler for idle timeout @task.event_handler ( "on_idle_timeout" ) async def on_idle_timeout ( task ): logger.info( "Conversation has been idle for 3 minutes" ) # Add a farewell message await task.queue_frame(TTSSpeakFrame( "I haven't heard from you in a while. Goodbye!" )) # Then end the conversation gracefully await task.stop_when_done() runner = PipelineRunner() await runner.run(task) PipelineTask Pipeline Heartbeats On this page Overview How It Works Configuration Configuration Parameters Handling Idle Timeouts Example Implementation Assistant Responses are generated using AI and may contain mistakes.
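As a complement to the example above, idle detection can also be switched off entirely by passing None for the timeout, as noted in the configuration parameters. A minimal sketch, assuming pipeline is already constructed:

from pipecat.pipeline.task import PipelineParams, PipelineTask

# Disable idle detection: the task will never be considered idle.
task = PipelineTask(
    pipeline,
    params=PipelineParams(allow_interruptions=True),
    idle_timeout_secs=None,
)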
pipeline_pipeline-params_32384788.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/pipeline/pipeline-params#param-audio-out-sample-rate
Title: PipelineParams - Pipecat
==================================================
PipelineParams - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Pipeline PipelineParams Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview The PipelineParams class provides a structured way to configure various aspects of pipeline execution. These parameters control behaviors like audio settings, metrics collection, heartbeat monitoring, and interruption handling. Basic Usage Copy Ask AI from pipecat.pipeline.task import PipelineParams, PipelineTask # Create with default parameters params = PipelineParams() # Or customize specific parameters params = PipelineParams( allow_interruptions = True , audio_in_sample_rate = 16000 , enable_metrics = True ) # Pass to PipelineTask pipeline = Pipeline([ ... ]) task = PipelineTask(pipeline, params = params) Available Parameters allow_interruptions bool default: "False" Whether to allow pipeline interruptions. When enabled, a user’s speech will immediately interrupt the bot’s response. audio_in_sample_rate int default: "16000" Input audio sample rate in Hz. Setting the audio_in_sample_rate as a PipelineParam sets the input sample rate for all corresponding services in the pipeline. audio_out_sample_rate int default: "24000" Output audio sample rate in Hz. Setting the audio_out_sample_rate as a PipelineParam sets the output sample rate for all corresponding services in the pipeline. enable_heartbeats bool default: "False" Whether to enable heartbeat monitoring to detect pipeline stalls. See Heartbeats for details. heartbeats_period_secs float default: "1.0" Period between heartbeats in seconds (when heartbeats are enabled). enable_metrics bool default: "False" Whether to enable metrics collection for pipeline performance. enable_usage_metrics bool default: "False" Whether to enable usage metrics tracking. report_only_initial_ttfb bool default: "False" Whether to report only initial time to first byte metric. send_initial_empty_metrics bool default: "True" Whether to send initial empty metrics frame at pipeline start. start_metadata Dict[str, Any] default: "{}" Additional metadata to include in the StartFrame. Common Configurations Audio Processing Configuration You can set the audio input and output sample rates in the PipelineParams to set the sample rate for all input and output services in the pipeline. This acts as a convenience to avoid setting the sample rate for each service individually. Note, if services are set individually, they will supersede the values set in PipelineParams . Copy Ask AI params = PipelineParams( audio_in_sample_rate = 8000 , # Lower quality input audio audio_out_sample_rate = 8000 # High quality output audio ) Performance Monitoring Configuration Pipeline heartbeats provide a way to monitor the health of your pipeline by sending periodic heartbeat frames through the system. 
When enabled, the pipeline will send heartbeat frames every second and monitor their progress through the pipeline. Copy Ask AI params = PipelineParams( enable_heartbeats = True , heartbeats_period_secs = 2.0 , # Send heartbeats every 2 seconds enable_metrics = True ) How Parameters Are Used The parameters you set in PipelineParams are passed to various components of the pipeline: StartFrame : Many parameters are included in the StartFrame that initializes the pipeline Metrics Collection : Metrics settings configure what performance data is gathered Heartbeat Monitoring : Controls the pipeline’s health monitoring system Audio Processing : Sample rates affect how audio is processed throughout the pipeline Complete Example Copy Ask AI from pipecat.frames.frames import TTSSpeakFrame from pipecat.observers.file_observer import FileObserver from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.pipeline.runner import PipelineRunner # Create comprehensive parameters params = PipelineParams( allow_interruptions = True , audio_in_sample_rate = 8000 , audio_out_sample_rate = 8000 , enable_heartbeats = True , enable_metrics = True , enable_usage_metrics = True , heartbeats_period_secs = 1.0 , report_only_initial_ttfb = False , start_metadata = { "conversation_id" : "conv-123" , "session_data" : { "user_id" : "user-456" , "start_time" : "2023-10-25T14:30:00Z" } } ) # Create pipeline and task pipeline = Pipeline([ ... ]) task = PipelineTask( pipeline, params = params, observers = [FileObserver( "pipeline_logs.jsonl" )] ) # Run the pipeline runner = PipelineRunner() await runner.run(task) Additional Information Parameters are immutable once the pipeline starts The start_metadata dictionary can contain any serializable data For metrics collection to work properly, enable_metrics must be set to True Pipecat Flows PipelineTask On this page Overview Basic Usage Available Parameters Common Configurations Audio Processing Configuration Performance Monitoring Configuration How Parameters Are Used Complete Example Additional Information Assistant Responses are generated using AI and may contain mistakes.
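To illustrate the note above about service-level settings superseding PipelineParams, here is a sketch. It assumes a TTS service that accepts a sample_rate argument and uses the Cartesia import path as an example; check the specific service's reference for the exact parameter and module names.

import os

from pipecat.pipeline.task import PipelineParams
from pipecat.services.cartesia.tts import CartesiaTTSService  # assumed import path

# The service-level sample rate takes precedence for this service...
tts = CartesiaTTSService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    voice_id="your-voice-id",
    sample_rate=16000,  # assumed parameter; overrides the pipeline-wide output rate
)

# ...while other audio services fall back to the pipeline-wide values.
params = PipelineParams(
    audio_in_sample_rate=8000,
    audio_out_sample_rate=8000,
)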
pipeline_pipeline-task_f3dd5190.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/pipeline/pipeline-task#param-enable-turn-tracking
Title: PipelineTask - Pipecat
==================================================
PipelineTask - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Pipeline PipelineTask Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview PipelineTask is the central class for managing pipeline execution. It handles the lifecycle of the pipeline, processes frames in both directions, manages task cancellation, and provides event handlers for monitoring pipeline activity. Basic Usage Copy Ask AI from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.task import PipelineParams, PipelineTask # Create a pipeline pipeline = Pipeline([ ... ]) # Create a task with the pipeline task = PipelineTask(pipeline) # Queue frames for processing await task.queue_frame(TTSSpeakFrame( "Hello, how can I help you today?" )) # Run the pipeline runner = PipelineRunner() await runner.run(task) Constructor Parameters pipeline BasePipeline required The pipeline to execute. params PipelineParams default: "PipelineParams()" Configuration parameters for the pipeline. See PipelineParams for details. observers List[BaseObserver] default: "[]" List of observers for monitoring pipeline execution. See Observers for details. clock BaseClock default: "SystemClock()" Clock implementation for timing operations. task_manager Optional[BaseTaskManager] default: "None" Custom task manager for handling asyncio tasks. If None, a default TaskManager is used. check_dangling_tasks bool default: "True" Whether to check for processors’ tasks finishing properly. idle_timeout_secs Optional[float] default: "300" Timeout in seconds before considering the pipeline idle. Set to None to disable idle detection. See Pipeline Idle Detection for details. idle_timeout_frames Tuple[Type[Frame], ...] default: "(BotSpeakingFrame, LLMFullResponseEndFrame)" Frame types that should prevent the pipeline from being considered idle. See Pipeline Idle Detection for details. cancel_on_idle_timeout bool default: "True" Whether to automatically cancel the pipeline task when idle timeout is reached. See Pipeline Idle Detection for details. enable_tracing bool default: "False" Whether to enable OpenTelemetry tracing. See The OpenTelemetry guide for details. enable_turn_tracking bool default: "False" Whether to enable turn tracking. See The OpenTelemetry guide for details. conversation_id Optional[str] default: "None" Custom ID for the conversation. If not provided, a UUID will be generated. See The OpenTelemetry guide for details. additional_span_attributes Optional[dict] default: "None" Any additional attributes to add to top-level OpenTelemetry conversation span. See The OpenTelemetry guide for details. Methods Task Lifecycle Management run() async Starts and manages the pipeline execution until completion or cancellation. 
Copy Ask AI await task.run() stop_when_done() async Sends an EndFrame to the pipeline to gracefully stop the task after all queued frames have been processed. Copy Ask AI await task.stop_when_done() cancel() async Stops the running pipeline immediately by sending a CancelFrame. Copy Ask AI await task.cancel() has_finished() bool Returns whether the task has finished (all processors have stopped). Copy Ask AI if task.has_finished(): print ( "Task is complete" ) Frame Management queue_frame() async Queues a single frame to be pushed down the pipeline. Copy Ask AI await task.queue_frame(TTSSpeakFrame( "Hello!" )) queue_frames() async Queues multiple frames to be pushed down the pipeline. Copy Ask AI frames = [TTSSpeakFrame( "Hello!" ), TTSSpeakFrame( "How are you?" )] await task.queue_frames(frames) Event Handlers PipelineTask provides an event handler that can be registered using the event_handler decorator: on_idle_timeout Triggered when no activity frames (as specified by idle_timeout_frames ) have been received within the idle timeout period. Copy Ask AI @task.event_handler ( "on_idle_timeout" ) async def on_idle_timeout ( task ): print ( "Pipeline has been idle too long" ) await task.queue_frame(TTSSpeakFrame( "Are you still there?" )) PipelineParams Pipeline Idle Detection On this page Overview Basic Usage Constructor Parameters Methods Task Lifecycle Management Frame Management Event Handlers on_idle_timeout Assistant Responses are generated using AI and may contain mistakes.
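Pulling the lifecycle and frame-management methods above together, here is a minimal sketch of a complete task run. It assumes pipeline is an already-constructed Pipeline and that the function is awaited from an async entry point.

from pipecat.frames.frames import TTSSpeakFrame
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask


async def run_conversation(pipeline):
    task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))

    # Queue a couple of frames, then stop gracefully once they have been processed.
    await task.queue_frames([
        TTSSpeakFrame("Hello!"),
        TTSSpeakFrame("Goodbye!"),
    ])
    await task.stop_when_done()

    runner = PipelineRunner()
    await runner.run(task)

    if task.has_finished():
        print("Task is complete")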
react_components_0aff756c.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/client/react/components#voicevisualizer
Title: Components - Pipecat
==================================================
Components - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation API Reference Components Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Client SDKs The RTVI Standard RTVIClient Migration Guide Javascript SDK SDK Introduction API Reference Transport packages React SDK SDK Introduction API Reference Components Hooks React Native SDK SDK Introduction API Reference iOS SDK SDK Introduction API Reference Transport packages Android SDK SDK Introduction API Reference Transport packages C++ SDK SDK Introduction Daily WebRTC Transport The Pipecat React SDK provides several components for handling audio, video, and visualization in your application. PipecatClientProvider The root component for providing Pipecat client context to your application. Copy Ask AI < PipecatClientProvider client = { pcClient } > { /* Child components */ } </ PipecatClientProvider > Props client PipecatClient required A singleton instance of PipecatClient PipecatClientAudio Creates a new <audio> element that mounts the bot’s audio track. Copy Ask AI < PipecatClientAudio /> Props No props required PipecatClientVideo Creates a new <video> element that renders either the bot or local participant’s video track. Copy Ask AI < PipecatClientVideo participant = "local" fit = "cover" mirror onResize = { ({ aspectRatio , height , width }) => { console . log ( "Video dimensions changed:" , { aspectRatio , height , width }); } } /> Props participant ('local' | 'bot') required Defines which participant’s video track is rendered fit ('contain' | 'cover') Defines whether the video should be fully contained or cover the box. Default: ‘contain’ mirror boolean Forces the video to be mirrored, if set onResize(dimensions: object) function Triggered whenever the video’s rendered width or height changes PipecatClientCamToggle A headless component to read and set the local participant’s camera state. Copy Ask AI < PipecatClientCamToggle onCamEnabledChanged = { ( enabled ) => console . log ( "Camera enabled:" , enabled ) } disabled = { false } > { ({ disabled , isCamEnabled , onClick }) => ( < button disabled = { disabled } onClick = { onClick } > { isCamEnabled ? "Disable Camera" : "Enable Camera" } </ button > ) } </ PipecatClientCamToggle > Props onCamEnabledChanged(enabled: boolean) function Triggered whenever the local participant’s camera state changes disabled boolean If true, the component will not allow toggling the camera state. Default: false children function A render prop that provides state and handlers to the children PipecatClientMicToggle A headless component to read and set the local participant’s microphone state. Copy Ask AI < PipecatClientMicToggle onMicEnabledChanged = { ( enabled ) => console . log ( "Microphone enabled:" , enabled ) } disabled = { false } > { ({ disabled , isMicEnabled , onClick }) => ( < button disabled = { disabled } onClick = { onClick } > { isMicEnabled ? "Disable Microphone" : "Enable Microphone" } </ button > ) } </ PipecatClientMicToggle > Props onMicEnabledChanged(enabled: boolean) function Triggered whenever the local participant’s microphone state changes disabled boolean If true, the component will not allow toggling the microphone state. Default: false children function A render prop that provides state and handlers to the children VoiceVisualizer Renders a visual representation of audio input levels on a <canvas> element. 
Copy Ask AI < VoiceVisualizer participantType = "local" backgroundColor = "white" barColor = "black" barGap = { 1 } barWidth = { 4 } barMaxHeight = { 24 } /> Props participantType string required The participant type to visualize audio for backgroundColor string The background color of the canvas. Default: ‘transparent’ barColor string The color of the audio level bars. Default: ‘black’ barCount number The number of bars to display. Default: 5 barGap number The gap between bars in pixels. Default: 12 barWidth number The width of each bar in pixels. Default: 30 barMaxHeight number The maximum height at full volume of each bar in pixels. Default: 120 SDK Introduction Hooks On this page PipecatClientProvider PipecatClientAudio PipecatClientVideo PipecatClientCamToggle PipecatClientMicToggle VoiceVisualizer Assistant Responses are generated using AI and may contain mistakes.
rtvi_rtvi-observer_0ca25fde.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/frameworks/rtvi/rtvi-observer#purpose
Title: RTVI Observer - Pipecat
==================================================
RTVI Observer - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation RTVI RTVI Observer Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Overview RTVIProcessor RTVI Observer Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline The RTVIObserver translates Pipecat’s internal pipeline events into standardized RTVI protocol messages. It monitors frame flow through the pipeline and generates corresponding client messages based on event types. Purpose The RTVIObserver serves two main functions: Converting internal pipeline frames to client-compatible RTVI messages Managing aggregated state for multi-frame events (like bot transcriptions) Adding to a Pipeline The observer is attached to a pipeline task along with the RTVI processor: Copy Ask AI # Create the RTVIProcessor rtvi = RTVIProcessor( config = RTVIConfig( config = [])) # Add to pipeline pipeline = Pipeline([ transport.input(), rtvi, # Other processors... ]) # Create pipeline task with observer task = PipelineTask( pipeline, params = PipelineParams( allow_interruptions = True ), observers = [RTVIObserver(rtvi)], # Add the observer here ) Frame Translation The observer maps Pipecat’s internal frames to RTVI protocol messages: Pipeline Frame RTVI Message Speech Events UserStartedSpeakingFrame RTVIUserStartedSpeakingMessage UserStoppedSpeakingFrame RTVIUserStoppedSpeakingMessage BotStartedSpeakingFrame RTVIBotStartedSpeakingMessage BotStoppedSpeakingFrame RTVIBotStoppedSpeakingMessage Transcription TranscriptionFrame RTVIUserTranscriptionMessage(final=true) InterimTranscriptionFrame RTVIUserTranscriptionMessage(final=false) LLM Processing LLMFullResponseStartFrame RTVIBotLLMStartedMessage LLMFullResponseEndFrame RTVIBotLLMStoppedMessage LLMTextFrame RTVIBotLLMTextMessage TTS Events TTSStartedFrame RTVIBotTTSStartedMessage TTSStoppedFrame RTVIBotTTSStoppedMessage TTSTextFrame RTVIBotTTSTextMessage Context/Metrics OpenAILLMContextFrame RTVIUserLLMTextMessage MetricsFrame RTVIMetricsMessage RTVIServerMessageFrame RTVIServerMessage RTVIProcessor Pipecat Flows On this page Purpose Adding to a Pipeline Frame Translation Assistant Responses are generated using AI and may contain mistakes.
s2s_gemini_469bd240.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/services/s2s/gemini#param-open-aillm-context-frame
Title: Gemini Multimodal Live - Pipecat
==================================================
Gemini Multimodal Live - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Speech-to-Speech Gemini Multimodal Live Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech AWS Nova Sonic Gemini Multimodal Live OpenAI Realtime Beta Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline The GeminiMultimodalLiveLLMService enables natural, real-time conversations with Google’s Gemini model. It provides built-in audio transcription, voice activity detection, and context management for creating interactive AI experiences. It provides: Real-time Interaction Stream audio and video in real-time with low latency response times Speech Processing Built-in speech-to-text and text-to-speech capabilities with multiple voice options Voice Activity Detection Automatic detection of speech start/stop for natural conversations Context Management Intelligent handling of conversation history and system instructions Want to start building? Check out our Gemini Multimodal Live Guide . Installation To use GeminiMultimodalLiveLLMService , install the required dependencies: Copy Ask AI pip install "pipecat-ai[google]" You’ll need to set up your Google API key as an environment variable: GOOGLE_API_KEY . Basic Usage Here’s a simple example of setting up a conversational AI bot with Gemini Multimodal Live: Copy Ask AI from pipecat.services.gemini_multimodal_live.gemini import ( GeminiMultimodalLiveLLMService, InputParams, GeminiMultimodalModalities ) llm = GeminiMultimodalLiveLLMService( api_key = os.getenv( "GOOGLE_API_KEY" ), voice_id = "Aoede" , # Voices: Aoede, Charon, Fenrir, Kore, Puck params = InputParams( temperature = 0.7 , # Set model input params language = Language. EN_US , # Set language (30+ languages supported) modalities = GeminiMultimodalModalities. AUDIO # Response modality ) ) Configuration Constructor Parameters api_key str required Your Google API key base_url str API endpoint URL model str Gemini model to use (upgraded to new v1beta model) voice_id str default: "Charon" Voice for text-to-speech (options: Aoede, Charon, Fenrir, Kore, Puck) Copy Ask AI llm = GeminiMultimodalLiveLLMService( api_key = os.getenv( "GOOGLE_API_KEY" ), voice_id = "Puck" , # Choose your preferred voice ) system_instruction str High-level instructions that guide the model’s behavior Copy Ask AI llm = GeminiMultimodalLiveLLMService( api_key = os.getenv( "GOOGLE_API_KEY" ), system_instruction = "Talk like a pirate." , ) start_audio_paused bool default: "False" Whether to start with audio input paused start_video_paused bool default: "False" Whether to start with video input paused tools Union[List[dict], ToolsSchema] Tools/functions available to the model inference_on_context_initialization bool default: "True" Whether to generate a response when context is first set Input Parameters frequency_penalty float default: "None" Penalizes repeated tokens. 
Range: 0.0 to 2.0 max_tokens int default: "4096" Maximum number of tokens to generate modalities GeminiMultimodalModalities default: "AUDIO" Response modalities to include (options: AUDIO , TEXT ). presence_penalty float default: "None" Penalizes tokens based on their presence in the text. Range: 0.0 to 2.0 temperature float default: "None" Controls randomness in responses. Range: 0.0 to 2.0 language Language default: "Language.EN_US" Language for generation. Over 30 languages are supported. media_resolution GeminiMediaResolution default: "UNSPECIFIED" Controls image processing quality and token usage: LOW : Uses 64 tokens MEDIUM : Uses 256 tokens HIGH : Zoomed reframing with 256 tokens vad GeminiVADParams Voice Activity Detection configuration: disabled : Toggle VAD on/off start_sensitivity : How quickly speech is detected (HIGH/LOW) end_sensitivity : How quickly turns end after pauses (HIGH/LOW) prefix_padding_ms : Milliseconds of audio to keep before speech silence_duration_ms : Milliseconds of silence to end a turn Copy Ask AI from pipecat.services.gemini_multimodal_live.events import ( StartSensitivity, EndSensitivity ) from pipecat.services.gemini_multimodal_live.gemini import ( GeminiVADParams, GeminiMediaResolution, ) llm = GeminiMultimodalLiveLLMService( api_key = os.getenv( "GOOGLE_API_KEY" ), params = InputParams( temperature = 0.7 , language = Language. ES , # Spanish language media_resolution = GeminiMediaResolution. HIGH , # Higher quality image processing vad = GeminiVADParams( start_sensitivity = StartSensitivity. HIGH , # Detect speech quickly end_sensitivity = EndSensitivity. LOW , # Allow longer pauses prefix_padding_ms = 300 , # Keep 300ms before speech silence_duration_ms = 1000 , # End turn after 1s silence ) ) ) top_k int default: "None" Limits vocabulary to k most likely tokens. Minimum: 0 top_p float default: "None" Cumulative probability cutoff for token selection. 
Range: 0.0 to 1.0 context_window_compression ContextWindowCompressionParams Parameters for managing the context window: - enabled : Enable/disable compression (default: False) - trigger_tokens : Number of tokens that trigger compression (default: None, uses 80% of context window) Copy Ask AI from pipecat.services.gemini_multimodal_live.gemini import ( ContextWindowCompressionParams ) llm = GeminiMultimodalLiveLLMService( api_key = os.getenv( "GOOGLE_API_KEY" ), params = InputParams( top_p = 0.9 , # More focused token selection top_k = 40 , # Limit vocabulary options context_window_compression = ContextWindowCompressionParams( enabled = True , trigger_tokens = 8000 # Compress when reaching 8000 tokens ) ) ) Methods set_audio_input_paused(paused: bool) method Pause or unpause audio input processing set_video_input_paused(paused: bool) method Pause or unpause video input processing set_model_modalities(modalities: GeminiMultimodalModalities) method Change the response modality (TEXT or AUDIO) set_language(language: Language) method Change the language for generation set_context(context: OpenAILLMContext) method Set the conversation context explicitly create_context_aggregator(context: OpenAILLMContext, user_params: LLMUserAggregatorParams, assistant_params: LLMAssistantAggregatorParams) method Create context aggregators for managing conversation state Frame Types Input Frames InputAudioRawFrame Frame Raw audio data for speech input InputImageRawFrame Frame Raw image data for visual input StartInterruptionFrame Frame Signals start of user interruption UserStartedSpeakingFrame Frame Signals user started speaking UserStoppedSpeakingFrame Frame Signals user stopped speaking OpenAILLMContextFrame Frame Contains conversation context LLMMessagesAppendFrame Frame Adds messages to the conversation LLMUpdateSettingsFrame Frame Updates LLM settings LLMSetToolsFrame Frame Sets available tools for the LLM Output Frames TTSAudioRawFrame Frame Generated speech audio TTSStartedFrame Frame Signals start of speech synthesis TTSStoppedFrame Frame Signals end of speech synthesis LLMTextFrame Frame Generated text responses from the LLM TTSTextFrame Frame Text used for speech synthesis TranscriptionFrame Frame Speech transcriptions from user audio LLMFullResponseStartFrame Frame Signals the start of a complete LLM response LLMFullResponseEndFrame Frame Signals the end of a complete LLM response Function Calling This service supports function calling (also known as tool calling) which allows the LLM to request information from external services and APIs. For example, you can enable your bot to: Check current weather conditions Query databases Access external APIs Perform custom actions See the Function Calling guide for: Detailed implementation instructions Provider-specific function definitions Handler registration examples Control over function call behavior Complete usage examples Token Usage Tracking Gemini Multimodal Live automatically tracks token usage metrics, providing: Prompt token counts Completion token counts Total token counts Detailed token breakdowns by modality (text, audio) These metrics can be used for monitoring usage, optimizing costs, and understanding model performance. 
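The sketch below shows one way to wire a tool into GeminiMultimodalLiveLLMService using the generic FunctionSchema/ToolsSchema helpers and register_function. The weather function, its parameters, and the handler signature are illustrative assumptions; see the Function Calling guide for the exact handler interface in your Pipecat version.
import os
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema

# Hypothetical weather tool, used only for illustration.
weather_function = FunctionSchema(
    name="get_current_weather",
    description="Get the current weather for a location",
    properties={"location": {"type": "string", "description": "City and state"}},
    required=["location"],
)
tools = ToolsSchema(standard_tools=[weather_function])

llm = GeminiMultimodalLiveLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    tools=tools,
)

# Handler signature varies by Pipecat version; consult the Function Calling guide.
async def fetch_weather(params):
    # Return the tool result back to the model.
    await params.result_callback({"conditions": "sunny", "temperature": "22C"})

llm.register_function("get_current_weather", fetch_weather)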
Language Support Gemini Multimodal Live supports the following languages: Language Code Description Gemini Code Language.AR Arabic ar-XA Language.BN_IN Bengali (India) bn-IN Language.CMN_CN Chinese (Mandarin) cmn-CN Language.DE_DE German (Germany) de-DE Language.EN_US English (US) en-US Language.EN_AU English (Australia) en-AU Language.EN_GB English (UK) en-GB Language.EN_IN English (India) en-IN Language.ES_ES Spanish (Spain) es-ES Language.ES_US Spanish (US) es-US Language.FR_FR French (France) fr-FR Language.FR_CA French (Canada) fr-CA Language.GU_IN Gujarati (India) gu-IN Language.HI_IN Hindi (India) hi-IN Language.ID_ID Indonesian id-ID Language.IT_IT Italian (Italy) it-IT Language.JA_JP Japanese (Japan) ja-JP Language.KN_IN Kannada (India) kn-IN Language.KO_KR Korean (Korea) ko-KR Language.ML_IN Malayalam (India) ml-IN Language.MR_IN Marathi (India) mr-IN Language.NL_NL Dutch (Netherlands) nl-NL Language.PL_PL Polish (Poland) pl-PL Language.PT_BR Portuguese (Brazil) pt-BR Language.RU_RU Russian (Russia) ru-RU Language.TA_IN Tamil (India) ta-IN Language.TE_IN Telugu (India) te-IN Language.TH_TH Thai (Thailand) th-TH Language.TR_TR Turkish (Turkey) tr-TR Language.VI_VN Vietnamese (Vietnam) vi-VN You can set the language using the language parameter: Copy Ask AI from pipecat.transcriptions.language import Language from pipecat.services.gemini_multimodal_live.gemini import ( GeminiMultimodalLiveLLMService, InputParams ) # Set language during initialization llm = GeminiMultimodalLiveLLMService( api_key = os.getenv( "GOOGLE_API_KEY" ), params = InputParams( language = Language. ES_ES ) # Spanish (Spain) ) Next Steps Examples Foundational Example Basic implementation showing core features and transcription Simple Chatbot A client/server example showing how to build a Pipecat JS or React client that connects to a Gemini Live Pipecat bot. Learn More Check out our Gemini Multimodal Live Guide for detailed explanations and best practices. AWS Nova Sonic OpenAI Realtime Beta On this page Installation Basic Usage Configuration Constructor Parameters Input Parameters Methods Frame Types Input Frames Output Frames Function Calling Token Usage Tracking Language Support Next Steps Examples Learn More Assistant Responses are generated using AI and may contain mistakes.
|
s2s_gemini_6a3001c2.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/services/s2s/gemini#param-temperature
|
2 |
+
Title: Gemini Multimodal Live - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Gemini Multimodal Live - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Speech-to-Speech Gemini Multimodal Live Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech AWS Nova Sonic Gemini Multimodal Live OpenAI Realtime Beta Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline The GeminiMultimodalLiveLLMService enables natural, real-time conversations with Google’s Gemini model. It provides built-in audio transcription, voice activity detection, and context management for creating interactive AI experiences. It provides: Real-time Interaction Stream audio and video in real-time with low latency response times Speech Processing Built-in speech-to-text and text-to-speech capabilities with multiple voice options Voice Activity Detection Automatic detection of speech start/stop for natural conversations Context Management Intelligent handling of conversation history and system instructions Want to start building? Check out our Gemini Multimodal Live Guide . Installation To use GeminiMultimodalLiveLLMService , install the required dependencies: Copy Ask AI pip install "pipecat-ai[google]" You’ll need to set up your Google API key as an environment variable: GOOGLE_API_KEY . Basic Usage Here’s a simple example of setting up a conversational AI bot with Gemini Multimodal Live: Copy Ask AI from pipecat.services.gemini_multimodal_live.gemini import ( GeminiMultimodalLiveLLMService, InputParams, GeminiMultimodalModalities ) llm = GeminiMultimodalLiveLLMService( api_key = os.getenv( "GOOGLE_API_KEY" ), voice_id = "Aoede" , # Voices: Aoede, Charon, Fenrir, Kore, Puck params = InputParams( temperature = 0.7 , # Set model input params language = Language. EN_US , # Set language (30+ languages supported) modalities = GeminiMultimodalModalities. AUDIO # Response modality ) ) Configuration Constructor Parameters api_key str required Your Google API key base_url str API endpoint URL model str Gemini model to use (upgraded to new v1beta model) voice_id str default: "Charon" Voice for text-to-speech (options: Aoede, Charon, Fenrir, Kore, Puck) Copy Ask AI llm = GeminiMultimodalLiveLLMService( api_key = os.getenv( "GOOGLE_API_KEY" ), voice_id = "Puck" , # Choose your preferred voice ) system_instruction str High-level instructions that guide the model’s behavior Copy Ask AI llm = GeminiMultimodalLiveLLMService( api_key = os.getenv( "GOOGLE_API_KEY" ), system_instruction = "Talk like a pirate." , ) start_audio_paused bool default: "False" Whether to start with audio input paused start_video_paused bool default: "False" Whether to start with video input paused tools Union[List[dict], ToolsSchema] Tools/functions available to the model inference_on_context_initialization bool default: "True" Whether to generate a response when context is first set Input Parameters frequency_penalty float default: "None" Penalizes repeated tokens. 
Range: 0.0 to 2.0 max_tokens int default: "4096" Maximum number of tokens to generate modalities GeminiMultimodalModalities default: "AUDIO" Response modalities to include (options: AUDIO , TEXT ). presence_penalty float default: "None" Penalizes tokens based on their presence in the text. Range: 0.0 to 2.0 temperature float default: "None" Controls randomness in responses. Range: 0.0 to 2.0 language Language default: "Language.EN_US" Language for generation. Over 30 languages are supported. media_resolution GeminiMediaResolution default: "UNSPECIFIED" Controls image processing quality and token usage: LOW : Uses 64 tokens MEDIUM : Uses 256 tokens HIGH : Zoomed reframing with 256 tokens vad GeminiVADParams Voice Activity Detection configuration: disabled : Toggle VAD on/off start_sensitivity : How quickly speech is detected (HIGH/LOW) end_sensitivity : How quickly turns end after pauses (HIGH/LOW) prefix_padding_ms : Milliseconds of audio to keep before speech silence_duration_ms : Milliseconds of silence to end a turn Copy Ask AI from pipecat.services.gemini_multimodal_live.events import ( StartSensitivity, EndSensitivity ) from pipecat.services.gemini_multimodal_live.gemini import ( GeminiVADParams, GeminiMediaResolution, ) llm = GeminiMultimodalLiveLLMService( api_key = os.getenv( "GOOGLE_API_KEY" ), params = InputParams( temperature = 0.7 , language = Language. ES , # Spanish language media_resolution = GeminiMediaResolution. HIGH , # Higher quality image processing vad = GeminiVADParams( start_sensitivity = StartSensitivity. HIGH , # Detect speech quickly end_sensitivity = EndSensitivity. LOW , # Allow longer pauses prefix_padding_ms = 300 , # Keep 300ms before speech silence_duration_ms = 1000 , # End turn after 1s silence ) ) ) top_k int default: "None" Limits vocabulary to k most likely tokens. Minimum: 0 top_p float default: "None" Cumulative probability cutoff for token selection. 
Range: 0.0 to 1.0 context_window_compression ContextWindowCompressionParams Parameters for managing the context window: - enabled : Enable/disable compression (default: False) - trigger_tokens : Number of tokens that trigger compression (default: None, uses 80% of context window) Copy Ask AI from pipecat.services.gemini_multimodal_live.gemini import ( ContextWindowCompressionParams ) llm = GeminiMultimodalLiveLLMService( api_key = os.getenv( "GOOGLE_API_KEY" ), params = InputParams( top_p = 0.9 , # More focused token selection top_k = 40 , # Limit vocabulary options context_window_compression = ContextWindowCompressionParams( enabled = True , trigger_tokens = 8000 # Compress when reaching 8000 tokens ) ) ) Methods set_audio_input_paused(paused: bool) method Pause or unpause audio input processing set_video_input_paused(paused: bool) method Pause or unpause video input processing set_model_modalities(modalities: GeminiMultimodalModalities) method Change the response modality (TEXT or AUDIO) set_language(language: Language) method Change the language for generation set_context(context: OpenAILLMContext) method Set the conversation context explicitly create_context_aggregator(context: OpenAILLMContext, user_params: LLMUserAggregatorParams, assistant_params: LLMAssistantAggregatorParams) method Create context aggregators for managing conversation state Frame Types Input Frames InputAudioRawFrame Frame Raw audio data for speech input InputImageRawFrame Frame Raw image data for visual input StartInterruptionFrame Frame Signals start of user interruption UserStartedSpeakingFrame Frame Signals user started speaking UserStoppedSpeakingFrame Frame Signals user stopped speaking OpenAILLMContextFrame Frame Contains conversation context LLMMessagesAppendFrame Frame Adds messages to the conversation LLMUpdateSettingsFrame Frame Updates LLM settings LLMSetToolsFrame Frame Sets available tools for the LLM Output Frames TTSAudioRawFrame Frame Generated speech audio TTSStartedFrame Frame Signals start of speech synthesis TTSStoppedFrame Frame Signals end of speech synthesis LLMTextFrame Frame Generated text responses from the LLM TTSTextFrame Frame Text used for speech synthesis TranscriptionFrame Frame Speech transcriptions from user audio LLMFullResponseStartFrame Frame Signals the start of a complete LLM response LLMFullResponseEndFrame Frame Signals the end of a complete LLM response Function Calling This service supports function calling (also known as tool calling) which allows the LLM to request information from external services and APIs. For example, you can enable your bot to: Check current weather conditions Query databases Access external APIs Perform custom actions See the Function Calling guide for: Detailed implementation instructions Provider-specific function definitions Handler registration examples Control over function call behavior Complete usage examples Token Usage Tracking Gemini Multimodal Live automatically tracks token usage metrics, providing: Prompt token counts Completion token counts Total token counts Detailed token breakdowns by modality (text, audio) These metrics can be used for monitoring usage, optimizing costs, and understanding model performance. 
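As a complement to the methods listed above, here is a minimal sketch of using create_context_aggregator to manage conversation state in a pipeline. The seed message is illustrative and the transport is assumed to be created elsewhere, as in the other examples on this page.
import os
from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.gemini_multimodal_live.gemini import GeminiMultimodalLiveLLMService

llm = GeminiMultimodalLiveLLMService(api_key=os.getenv("GOOGLE_API_KEY"))

# Seed the conversation; messages follow the OpenAI-style context format.
context = OpenAILLMContext(
    [{"role": "user", "content": "Say hello and introduce yourself."}]
)
context_aggregator = llm.create_context_aggregator(context)

pipeline = Pipeline([
    transport.input(),               # transport created elsewhere
    context_aggregator.user(),       # aggregate user turns into the context
    llm,
    transport.output(),
    context_aggregator.assistant(),  # aggregate bot responses into the context
])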
Language Support Gemini Multimodal Live supports the following languages: Language Code Description Gemini Code Language.AR Arabic ar-XA Language.BN_IN Bengali (India) bn-IN Language.CMN_CN Chinese (Mandarin) cmn-CN Language.DE_DE German (Germany) de-DE Language.EN_US English (US) en-US Language.EN_AU English (Australia) en-AU Language.EN_GB English (UK) en-GB Language.EN_IN English (India) en-IN Language.ES_ES Spanish (Spain) es-ES Language.ES_US Spanish (US) es-US Language.FR_FR French (France) fr-FR Language.FR_CA French (Canada) fr-CA Language.GU_IN Gujarati (India) gu-IN Language.HI_IN Hindi (India) hi-IN Language.ID_ID Indonesian id-ID Language.IT_IT Italian (Italy) it-IT Language.JA_JP Japanese (Japan) ja-JP Language.KN_IN Kannada (India) kn-IN Language.KO_KR Korean (Korea) ko-KR Language.ML_IN Malayalam (India) ml-IN Language.MR_IN Marathi (India) mr-IN Language.NL_NL Dutch (Netherlands) nl-NL Language.PL_PL Polish (Poland) pl-PL Language.PT_BR Portuguese (Brazil) pt-BR Language.RU_RU Russian (Russia) ru-RU Language.TA_IN Tamil (India) ta-IN Language.TE_IN Telugu (India) te-IN Language.TH_TH Thai (Thailand) th-TH Language.TR_TR Turkish (Turkey) tr-TR Language.VI_VN Vietnamese (Vietnam) vi-VN You can set the language using the language parameter: Copy Ask AI from pipecat.transcriptions.language import Language from pipecat.services.gemini_multimodal_live.gemini import ( GeminiMultimodalLiveLLMService, InputParams ) # Set language during initialization llm = GeminiMultimodalLiveLLMService( api_key = os.getenv( "GOOGLE_API_KEY" ), params = InputParams( language = Language. ES_ES ) # Spanish (Spain) ) Next Steps Examples Foundational Example Basic implementation showing core features and transcription Simple Chatbot A client/server example showing how to build a Pipecat JS or React client that connects to a Gemini Live Pipecat bot. Learn More Check out our Gemini Multimodal Live Guide for detailed explanations and best practices. AWS Nova Sonic OpenAI Realtime Beta On this page Installation Basic Usage Configuration Constructor Parameters Input Parameters Methods Frame Types Input Frames Output Frames Function Calling Token Usage Tracking Language Support Next Steps Examples Learn More Assistant Responses are generated using AI and may contain mistakes.
|
serializers_exotel_51ec3374.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/services/serializers/exotel#inputparams-configuration
|
2 |
+
Title: ExotelFrameSerializer - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
ExotelFrameSerializer - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Serializers ExotelFrameSerializer Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Frame Serializer Overview ExotelFrameSerializer PlivoFrameSerializer TwilioFrameSerializer TelnyxFrameSerializer Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview ExotelFrameSerializer enables integration with Exotel’s WebSocket media streaming protocol, allowing your Pipecat application to handle phone calls via Exotel’s voice services. Features Bidirectional audio conversion between Pipecat and Exotel DTMF (touch-tone) event handling Installation The ExotelFrameSerializer does not require any additional dependencies beyond the core Pipecat library. Configuration Constructor Parameters stream_id str required The Stream ID for Exotel call_sid Optional[str] default: "None" The associated Exotel Call SID. params InputParams default: "InputParams()" Configuration parameters InputParams Configuration exotel_sample_rate int default: "8000" Sample rate used by Exotel (typically 8kHz) sample_rate int | None default: "None" Optional override for pipeline input sample rate Basic Usage Copy Ask AI from pipecat.serializers.exotel import ExotelFrameSerializer from pipecat.transports.network.fastapi_websocket import ( FastAPIWebsocketTransport, FastAPIWebsocketParams ) # Extract required values from Exotel WebSocket connection stream_id = call_data[ "stream_id" ] call_sid = call_data[ "start" ][ "call_sid" ] # Create serializer serializer = ExotelFrameSerializer( stream_id = stream_id, call_sid = call_sid, ) # Use with FastAPIWebsocketTransport transport = FastAPIWebsocketTransport( websocket = websocket, params = FastAPIWebsocketParams( audio_in_enabled = True , audio_out_enabled = True , vad_analyzer = SileroVADAnalyzer(), serializer = serializer, ) ) Server Code Example Here’s a complete example of handling an Exotel WebSocket connection: Copy Ask AI from fastapi import FastAPI, WebSocket from pipecat.serializers.exotel import ExotelFrameSerializer import json import os app = FastAPI() @app.websocket ( "/ws" ) async def websocket_endpoint ( websocket : WebSocket): await websocket.accept() # Read initial messages from Exotel start_data = websocket.iter_text() await start_data. __anext__ () # Skip first message # Parse the second message to get call details call_data = json.loads( await start_data. __anext__ ()) # Extract Exotel-specific IDs and encoding stream_id = call_data[ "stream_id" ] call_sid = call_data[ "start" ][ "call_sid" ] # Create serializer serializer = ExotelFrameSerializer( stream_id = stream_id, call_sid = call_sid, ) # Continue with transport and pipeline setup... Frame Serializer Overview PlivoFrameSerializer On this page Overview Features Installation Configuration Constructor Parameters InputParams Configuration Basic Usage Server Code Example Assistant Responses are generated using AI and may contain mistakes.
|
serializers_plivo_97adcda6.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/services/serializers/plivo#key-differences-from-twilio
|
2 |
+
Title: PlivoFrameSerializer - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
PlivoFrameSerializer - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Serializers PlivoFrameSerializer Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Frame Serializer Overview ExotelFrameSerializer PlivoFrameSerializer TwilioFrameSerializer TelnyxFrameSerializer Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview PlivoFrameSerializer enables integration with Plivo’s Audio Streaming WebSocket protocol, allowing your Pipecat application to handle phone calls via Plivo’s voice services. Features Bidirectional audio conversion between Pipecat and Plivo DTMF (touch-tone) event handling Automatic call termination via Plivo’s REST API μ-law audio encoding/decoding Installation The PlivoFrameSerializer does not require any additional dependencies beyond the core Pipecat library. Configuration Constructor Parameters stream_id str required The Plivo Stream ID call_id Optional[str] default: "None" The associated Plivo Call ID (required for auto hang-up) auth_id Optional[str] default: "None" Plivo auth ID (required for auto hang-up) auth_token Optional[str] default: "None" Plivo auth token (required for auto hang-up) params InputParams default: "InputParams()" Configuration parameters InputParams Configuration plivo_sample_rate int default: "8000" Sample rate used by Plivo (typically 8kHz) sample_rate int | None default: "None" Optional override for pipeline input sample rate auto_hang_up bool default: "True" Whether to automatically terminate call on EndFrame Basic Usage Copy Ask AI from pipecat.serializers.plivo import PlivoFrameSerializer from pipecat.transports.network.fastapi_websocket import ( FastAPIWebsocketTransport, FastAPIWebsocketParams ) # Extract required values from Plivo WebSocket connection stream_id = start_message[ "start" ][ "streamId" ] call_id = start_message[ "start" ][ "callId" ] # Create serializer serializer = PlivoFrameSerializer( stream_id = stream_id, call_id = call_id, auth_id = "your_plivo_auth_id" , auth_token = "your_plivo_auth_token" ) # Use with FastAPIWebsocketTransport transport = FastAPIWebsocketTransport( websocket = websocket, params = FastAPIWebsocketParams( audio_in_enabled = True , audio_out_enabled = True , vad_analyzer = SileroVADAnalyzer(), serializer = serializer, ) ) Hang-up Functionality When auto_hang_up is enabled, the serializer will automatically hang up the Plivo call when an EndFrame or CancelFrame is processed, using Plivo’s REST API: Copy Ask AI # Properly configured with hang-up support serializer = PlivoFrameSerializer( stream_id = stream_id, call_id = call_id, # Required for auto hang-up auth_id = os.getenv( "PLIVO_AUTH_ID" ), # Required for auto hang-up auth_token = os.getenv( "PLIVO_AUTH_TOKEN" ), # Required for auto hang-up ) Server Code Example Here’s a complete example of handling a Plivo WebSocket connection: Copy Ask AI from fastapi import FastAPI, WebSocket from pipecat.serializers.plivo import PlivoFrameSerializer import json import os 
app = FastAPI() @app.websocket ( "/ws" ) async def websocket_endpoint ( websocket : WebSocket): await websocket.accept() # Read the start message from Plivo start_data = websocket.iter_text() start_message = json.loads( await start_data. __anext__ ()) # Extract Plivo-specific IDs from the start event start_info = start_message.get( "start" , {}) stream_id = start_info.get( "streamId" ) call_id = start_info.get( "callId" ) # Create serializer with authentication for auto hang-up serializer = PlivoFrameSerializer( stream_id = stream_id, call_id = call_id, auth_id = os.getenv( "PLIVO_AUTH_ID" ), auth_token = os.getenv( "PLIVO_AUTH_TOKEN" ), ) # Continue with transport and pipeline setup... Plivo XML Configuration To enable audio streaming with Plivo, you’ll need to configure your Plivo application to return appropriate XML: Copy Ask AI <? xml version = "1.0" encoding = "UTF-8" ?> < Response > < Stream keepCallAlive = "true" bidirectional = "true" contentType = "audio/x-mulaw;rate=8000" > wss://your-websocket-url/ws </ Stream > </ Response > The bidirectional="true" attribute is required for two-way audio communication, and keepCallAlive="true" prevents the call from being disconnected after XML execution. Key Differences from Twilio Stream Identifier : Plivo uses streamId instead of streamSid Call Identifier : Plivo uses callId instead of callSid XML Structure : Plivo uses <Stream> element directly instead of <Connect><Stream> Authentication : Plivo uses Auth ID and Auth Token instead of Account SID and Auth Token See the Plivo Chatbot example for a complete implementation. ExotelFrameSerializer TwilioFrameSerializer On this page Overview Features Installation Configuration Constructor Parameters InputParams Configuration Basic Usage Hang-up Functionality Server Code Example Plivo XML Configuration Key Differences from Twilio Assistant Responses are generated using AI and may contain mistakes.
|
server_utilities_76f22f96.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/utilities#param-producer
|
2 |
+
Title: Producer & Consumer Processors - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Producer & Consumer Processors - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Advanced Frame Processors Producer & Consumer Processors Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Producer & Consumer Processors UserIdleProcessor Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview The Producer and Consumer processors work as a pair to route frames between different parts of a pipeline, particularly useful when working with ParallelPipeline . They allow you to selectively capture frames from one pipeline branch and inject them into another. ProducerProcessor ProducerProcessor examines frames flowing through the pipeline, applies a filter to decide which frames to share, and optionally transforms these frames before sending them to connected consumers. Constructor Parameters filter Callable[[Frame], Awaitable[bool]] required An async function that determines which frames should be sent to consumers. Should return True for frames to be shared. transformer Callable[[Frame], Awaitable[Frame]] default: "identity_transformer" Optional async function that transforms frames before sending to consumers. By default, passes frames unchanged. passthrough bool default: "True" When True , passes all frames through the normal pipeline flow. When False , only passes through frames that don’t match the filter. ConsumerProcessor ConsumerProcessor receives frames from a ProducerProcessor and injects them into its pipeline branch. Constructor Parameters producer ProducerProcessor required The producer processor that will send frames to this consumer. transformer Callable[[Frame], Awaitable[Frame]] default: "identity_transformer" Optional async function that transforms frames before injecting them into the pipeline. direction FrameDirection default: "FrameDirection.DOWNSTREAM" The direction in which to push received frames. Usually DOWNSTREAM to send frames forward in the pipeline. Usage Examples Basic Usage: Moving TTS Audio Between Branches Copy Ask AI # Create a producer that captures TTS audio frames async def is_tts_audio ( frame : Frame) -> bool : return isinstance (frame, TTSAudioRawFrame) # Define an async transformer function async def tts_to_input_audio_transformer ( frame : Frame) -> Frame: if isinstance (frame, TTSAudioRawFrame): # Convert TTS audio to input audio format return InputAudioRawFrame( audio = frame.audio, sample_rate = frame.sample_rate, num_channels = frame.num_channels ) return frame producer = ProducerProcessor( filter = is_tts_audio, transformer = tts_to_input_audio_transformer, passthrough = True # Keep these frames in original pipeline ) # Create a consumer to receive the frames consumer = ConsumerProcessor( producer = producer, direction = FrameDirection.
DOWNSTREAM ) # Use in a ParallelPipeline pipeline = Pipeline([ transport.input(), ParallelPipeline( # Branch 1: LLM for bot responses [ llm, tts, producer, # Capture TTS audio here ], # Branch 2: Audio processing branch [ consumer, # Receive TTS audio here llm, # Speech-to-Speech LLM (audio in) ] ), transport.output(), ]) Sentry Metrics UserIdleProcessor On this page Overview ProducerProcessor Constructor Parameters ConsumerProcessor Constructor Parameters Usage Examples Basic Usage: Moving TTS Audio Between Branches Assistant Responses are generated using AI and may contain mistakes.
|
smart-turn_fal-smart-turn_960d4280.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/utilities/smart-turn/fal-smart-turn#constructor-parameters
|
2 |
+
Title: Fal Smart Turn - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Fal Smart Turn - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Smart Turn Detection Fal Smart Turn Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Smart Turn Overview Fal Smart Turn Local CoreML Smart Turn Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview FalSmartTurnAnalyzer provides an easy way to use Smart Turn detection via Fal.ai’s cloud infrastructure. This implementation requires minimal setup - just an API key - and offers scalable inference without having to manage your own servers. Installation Copy Ask AI pip install "pipecat-ai[remote-smart-turn]" Requirements A Fal.ai account and API key (get one at Fal.ai ) Internet connectivity for making API calls Configuration Constructor Parameters api_key Optional[str] default: "None" Your Fal.ai API key for authentication (required unless using a custom deployment) url str default: "https://fal.run/fal-ai/smart-turn/raw" URL endpoint for the Smart Turn API (defaults to the official Fal deployment) aiohttp_session aiohttp.ClientSession required An aiohttp client session for making HTTP requests sample_rate Optional[int] default: "None" Audio sample rate (will be set by the transport if not provided) params SmartTurnParams default: "SmartTurnParams()" Configuration parameters for turn detection. See SmartTurnParams for details. Example Copy Ask AI import os import aiohttp from pipecat.audio.turn.smart_turn.fal_smart_turn import FalSmartTurnAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.vad_analyzer import VADParams from pipecat.transports.base_transport import TransportParams async def setup_transport (): async with aiohttp.ClientSession() as session: transport = SmallWebRTCTransport( webrtc_connection = webrtc_connection, params = TransportParams( audio_in_enabled = True , audio_out_enabled = True , vad_analyzer = SileroVADAnalyzer( params = VADParams( stop_secs = 0.2 )), turn_analyzer = FalSmartTurnAnalyzer( api_key = os.getenv( "FAL_SMART_TURN_API_KEY" ), aiohttp_session = session ), ), ) # Continue with pipeline setup... 
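To tune turn detection, the params argument accepts a SmartTurnParams instance. A minimal sketch, assuming SmartTurnParams is importable from the base smart-turn module; the value shown for stop_secs (the session timeout mentioned in the Notes below) is illustrative.
import os
import aiohttp
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams  # assumed import path
from pipecat.audio.turn.smart_turn.fal_smart_turn import FalSmartTurnAnalyzer

async def make_turn_analyzer(session: aiohttp.ClientSession) -> FalSmartTurnAnalyzer:
    # Build a Fal Smart Turn analyzer with a custom session timeout.
    return FalSmartTurnAnalyzer(
        api_key=os.getenv("FAL_SMART_TURN_API_KEY"),
        aiohttp_session=session,
        params=SmartTurnParams(
            stop_secs=3.0,  # illustrative timeout; controls when the turn session ends
        ),
    )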
Custom Deployment You can also deploy the Smart Turn model yourself on Fal.ai and point to your custom deployment: Copy Ask AI turn_analyzer = FalSmartTurnAnalyzer( url = "https://fal.run/your-username/your-deployment/raw" , api_key = os.getenv( "FAL_API_KEY" ), aiohttp_session = session ) Performance Considerations Latency : While Fal provides global infrastructure, there will be network latency compared to local inference Reliability : Depends on network connectivity and Fal.ai service availability Scalability : Handles scaling automatically based on your usage Notes Fal handles the model hosting, scaling, and infrastructure management The session timeout is controlled by the stop_secs parameter For high-throughput applications, consider deploying your own inference service Smart Turn Overview Local CoreML Smart Turn On this page Overview Installation Requirements Configuration Constructor Parameters Example Custom Deployment Performance Considerations Notes Assistant Responses are generated using AI and may contain mistakes.
|
stt_aws_8bf978a2.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/services/stt/aws#param-text
|
2 |
+
Title: AWS Transcribe - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
AWS Transcribe - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Speech-to-Text AWS Transcribe Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text AssemblyAI AWS Transcribe Azure Cartesia Deepgram Fal (Wizper) Gladia Google Groq (Whisper) NVIDIA Riva OpenAI SambaNova (Whisper) Speechmatics Ultravox Whisper LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview AWSTranscribeSTTService provides real-time speech-to-text capabilities using Amazon Transcribe’s WebSocket API. It supports interim results, adjustable quality levels, and can handle continuous audio streams. Installation To use AWSTranscribeSTTService , install the required dependencies: Copy Ask AI pip install "pipecat-ai[aws]" You’ll also need to set up your AWS credentials as environment variables: AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN (if using temporary credentials) AWS_REGION (defaults to “us-east-1”) You can obtain AWS credentials by setting up an IAM user with access to Amazon Transcribe in your AWS account. Configuration Constructor Parameters api_key str Your AWS secret access key (can also use environment variable) aws_access_key_id str Your AWS access key ID (can also use environment variable) aws_session_token str Your AWS session token for temporary credentials (can also use environment variable) region str default: "us-east-1" AWS region to use for Transcribe service sample_rate int default: "16000" Audio sample rate in Hz (only 8000 Hz or 16000 Hz are supported) language Language default: "Language.EN" Language for transcription Default Settings Copy Ask AI { "sample_rate" : 16000 , "language" : Language. EN , "media_encoding" : "linear16" , # AWS expects raw PCM "number_of_channels" : 1 , "show_speaker_label" : False , "enable_channel_identification" : False } Input The service processes InputAudioRawFrame instances containing: Raw PCM audio data 16-bit depth 8kHz or 16kHz sample rate (will convert to 16kHz if another rate is provided) Single channel (mono) Output Frames The service produces two types of frames during transcription: TranscriptionFrame Generated for final transcriptions, containing: text string Transcribed text user_id string User identifier timestamp string ISO 8601 formatted timestamp language Language Language used for transcription InterimTranscriptionFrame Generated during ongoing speech, containing same fields as TranscriptionFrame but with preliminary results. Methods See the STT base class methods for additional functionality. Language Setting Copy Ask AI await service.set_language(Language. FR ) Usage Example Copy Ask AI from pipecat.services.aws.stt import AWSTranscribeSTTService # Configure service using environment variables for credentials stt = AWSTranscribeSTTService( region = "us-west-2" , sample_rate = 16000 , language = Language. 
EN ) # Or provide credentials directly stt = AWSTranscribeSTTService( aws_access_key_id = "YOUR_ACCESS_KEY_ID" , api_key = "YOUR_SECRET_ACCESS_KEY" , region = "us-west-2" , sample_rate = 16000 , language = Language. EN ) # Use in pipeline pipeline = Pipeline([ transport.input(), stt, llm, ... ]) Language Support AWS Transcribe STT supports the following languages: Language Code Description Service Codes Language.EN English (US) en-US Language.ES Spanish es-US Language.FR French fr-FR Language.DE German de-DE Language.IT Italian it-IT Language.PT Portuguese (Brazil) pt-BR Language.JA Japanese ja-JP Language.KO Korean ko-KR Language.ZH Chinese (Mandarin) zh-CN AWS Transcribe supports additional languages and regional variants. See the AWS Transcribe documentation for a complete list. Frame Flow Metrics Support The service supports the following metrics: Time to First Byte (TTFB) Processing duration Notes Requires valid AWS credentials with access to Amazon Transcribe Supports real-time transcription with interim results Handles WebSocket connection management and reconnection Only supports mono audio (single channel) Automatically handles audio format conversion to PCM Manages connection lifecycle (start, stop, cancel) AssemblyAI Azure On this page Overview Installation Configuration Constructor Parameters Default Settings Input Output Frames TranscriptionFrame InterimTranscriptionFrame Methods Language Setting Usage Example Language Support Frame Flow Metrics Support Notes Assistant Responses are generated using AI and may contain mistakes.
|
stt_cartesia_fd324549.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/services/stt/cartesia#transcriptionframe
|
2 |
+
Title: Cartesia - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Cartesia - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Speech-to-Text Cartesia Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text AssemblyAI AWS Transcribe Azure Cartesia Deepgram Fal (Wizper) Gladia Google Groq (Whisper) NVIDIA Riva OpenAI SambaNova (Whisper) Speechmatics Ultravox Whisper LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview CartesiaSTTService provides real-time speech-to-text capabilities using Cartesia’s WebSocket API. It supports streaming transcription with both interim and final results using the ink-whisper model. Installation To use CartesiaSTTService , install the required dependencies: Copy Ask AI pip install "pipecat-ai[cartesia]" You’ll also need to set up your Cartesia API key as an environment variable: CARTESIA_API_KEY . You can obtain a Cartesia API key by signing up at Cartesia . Configuration Constructor Parameters api_key str required Your Cartesia API key base_url str default: "api.cartesia.ai" Custom Cartesia API endpoint URL sample_rate int default: "16000" Audio sample rate in Hz live_options CartesiaLiveOptions Custom transcription options CartesiaLiveOptions model str default: "ink-whisper" The Cartesia transcription model to use language str default: "en" Language code for transcription encoding str default: "pcm_s16le" Audio encoding format sample_rate int default: "16000" Audio sample rate in Hz Default Options Copy Ask AI CartesiaLiveOptions( model = "ink-whisper" , language = "en" , encoding = "pcm_s16le" , sample_rate = 16000 ) Input The service processes raw audio data with the following requirements: PCM audio format ( pcm_s16le ) 16-bit depth 16kHz sample rate (default) Single channel (mono) Output Frames The service produces two types of frames during transcription: TranscriptionFrame Generated for final transcriptions, containing: text string Final transcribed text user_id string User identifier timestamp string ISO 8601 formatted timestamp language Language Detected or configured language InterimTranscriptionFrame Generated during ongoing speech, containing the same fields as TranscriptionFrame but with preliminary results. Methods See the STT base class methods for additional functionality. Language Setting The service supports language configuration through the CartesiaLiveOptions : Copy Ask AI live_options = CartesiaLiveOptions( language = "es" ) Model Selection Copy Ask AI live_options = CartesiaLiveOptions( model = "ink-whisper" ) Usage Example Copy Ask AI from pipecat.services.cartesia.stt import CartesiaSTTService, CartesiaLiveOptions from pipecat.transcriptions.language import Language # Basic configuration stt = CartesiaSTTService( api_key = os.getenv( "CARTESIA_API_KEY" ) ) # Advanced configuration live_options = CartesiaLiveOptions( model = "ink-whisper" , language = Language. 
ES .value, sample_rate = 16000 , encoding = "pcm_s16le" ) stt = CartesiaSTTService( api_key = os.getenv( "CARTESIA_API_KEY" ), live_options = live_options ) # Use in pipeline pipeline = Pipeline([ transport.input(), stt, llm, ... ]) Frame Flow Connection Management The service automatically manages WebSocket connections: Auto-reconnect : Reconnects automatically when the connection is closed due to timeout Finalization : Sends a “finalize” command when user stops speaking to flush the transcription session Error handling : Gracefully handles connection errors and WebSocket exceptions Metrics Support The service supports comprehensive metrics collection: Time to First Byte (TTFB) Processing duration Speech detection events Connection status Notes Requires valid Cartesia API key Supports real-time streaming transcription Handles automatic WebSocket connection management Includes comprehensive error handling Manages connection lifecycle automatically Azure Deepgram On this page Overview Installation Configuration Constructor Parameters CartesiaLiveOptions Default Options Input Output Frames TranscriptionFrame InterimTranscriptionFrame Methods Language Setting Model Selection Usage Example Frame Flow Connection Management Metrics Support Notes Assistant Responses are generated using AI and may contain mistakes.
|
stt_gladia_a7cb568d.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/services/stt/gladia#param-confidence
|
2 |
+
Title: Gladia - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Gladia - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Speech-to-Text Gladia Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text AssemblyAI AWS Transcribe Azure Cartesia Deepgram Fal (Wizper) Gladia Google Groq (Whisper) NVIDIA Riva OpenAI SambaNova (Whisper) Speechmatics Ultravox Whisper LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview GladiaSTTService is a speech-to-text (STT) service that integrates with Gladia’s API to provide real-time transcription capabilities. It processes audio input and produces transcription frames in real-time with support for multiple languages, custom vocabulary, and various processing options. Installation To use GladiaSTTService , you need to install the Gladia dependencies: Copy Ask AI pip install "pipecat-ai[gladia]" You’ll also need to set up your Gladia API key as an environment variable: GLADIA_API_KEY . Configuration Service Parameters api_key string required Your Gladia API key for authentication url string default: "https://api.gladia.io/v2/live" Gladia API endpoint URL confidence float default: "0.5" Minimum confidence threshold to create interim and final transcriptions. Values range from 0 to 1. sample_rate integer default: "None" Audio sample rate in Hz model string default: "solaria-1" Model to use for transcription. Options include solaria-1 solaria-mini-1 fast accurate See Gladia’s docs for the latest supported models. params GladiaInputParams default: "GladiaInputParams()" Additional configuration parameters for the service GladiaInputParams encoding string default: "wav/pcm" Audio encoding format bit_depth integer default: "16" Audio bit depth channels integer default: "1" Number of audio channels custom_metadata Dict[str, Any] Additional metadata to include with requests endpointing float Silence duration in seconds to mark end of speech maximum_duration_without_endpointing integer default: "10" Maximum utterance duration without silence language Language deprecated Primary language for transcription. Deprecated: use language_config instead. language_config LanguageConfig Detailed language configuration pre_processing PreProcessingConfig Audio pre-processing options realtime_processing RealtimeProcessingConfig Real-time processing features messages_config MessagesConfig WebSocket message filtering options LanguageConfig languages List[str] Specify language(s) for transcription. If one language is set, it will be used for all transcription. If multiple languages are provided or none, language will be auto-detected by the model. code_switching boolean default: "false" If true, language will be auto-detected on each utterance. Otherwise, language will be auto-detected on first utterance and then used for the rest of the transcription. If one language is set, this option will be ignored. PreProcessingConfig speech_threshold float default: "0.8" Sensitivity configuration for Speech Threshold. 
A value close to 1 will apply stricter thresholds, making it less likely to detect background sounds as speech. Must be between 0 and 1. CustomVocabularyConfig vocabulary List[Union[str, CustomVocabularyItem]] required Specific vocabulary list to feed the transcription model with. Can be a list of strings or CustomVocabularyItem objects. default_intensity float Default intensity for the custom vocabulary. Must be between 0 and 1. CustomSpellingConfig spelling_dictionary Dict[str, List[str]] required The list of spelling rules applied on the audio transcription. Keys are the correct spellings and values are lists of phonetic variations. TranslationConfig target_languages List[str] required The target language(s) in ISO639-1 format (e.g., “en”, “fr”, “es”) model string default: "base" Translation model to use. Options: “base” or “enhanced” match_original_utterances boolean default: "true" Align translated utterances with the original ones RealtimeProcessingConfig words_accurate_timestamps boolean Whether to provide per-word timestamps custom_vocabulary boolean Whether to enable custom vocabulary custom_vocabulary_config CustomVocabularyConfig Custom vocabulary configuration custom_spelling boolean Whether to enable custom spelling custom_spelling_config CustomSpellingConfig Custom spelling configuration translation boolean Whether to enable translation translation_config TranslationConfig Translation configuration named_entity_recognition boolean Whether to enable named entity recognition sentiment_analysis boolean Whether to enable sentiment analysis MessagesConfig receive_partial_transcripts boolean default: "true" If true, partial utterances will be sent via WebSocket receive_final_transcripts boolean default: "true" If true, final utterances will be sent via WebSocket receive_speech_events boolean default: "true" If true, begin and end speech events will be sent via WebSocket receive_pre_processing_events boolean default: "true" If true, pre-processing events will be sent via WebSocket receive_realtime_processing_events boolean default: "true" If true, realtime processing events will be sent via WebSocket receive_post_processing_events boolean default: "true" If true, post-processing events will be sent via WebSocket receive_acknowledgments boolean default: "true" If true, acknowledgments will be sent via WebSocket receive_errors boolean default: "true" If true, errors will be sent via WebSocket receive_lifecycle_events boolean default: "false" If true, lifecycle events will be sent via WebSocket Input The service processes raw audio data with the following requirements: PCM audio format 16-bit depth 16kHz sample rate (default) Single channel (mono) Output The service produces two types of frames during transcription: TranscriptionFrame Generated for final transcriptions, containing: text string Transcribed text user_id string User identifier timestamp string ISO 8601 formatted timestamp language Language Transcription language InterimTranscriptionFrame Generated during ongoing speech, containing the same fields as TranscriptionFrame but with preliminary results. ErrorFrame Generated when transcription errors occur, containing error details. Methods See the STT base class methods for additional functionality. Language Setting Copy Ask AI await service.set_language(Language. FR ) Language Support Gladia STT supports a wide range of languages. 
Here’s a partial list: Language Code Description Service Code Language.AF Afrikaans af Language.AM Amharic am Language.AR Arabic ar Language.AS Assamese as Language.AZ Azerbaijani az Language.BA Bashkir ba Language.BE Belarusian be Language.BG Bulgarian bg Language.BN Bengali bn Language.BO Tibetan bo Language.BR Breton br Language.BS Bosnian bs Language.CA Catalan ca Language.CS Czech cs Language.CY Welsh cy Language.DA Danish da Language.DE German de Language.EL Greek el Language.EN English en Language.ES Spanish es Language.ET Estonian et Language.EU Basque eu Language.FA Persian fa Language.FI Finnish fi Language.FO Faroese fo Language.FR French fr Language.GL Galician gl Language.GU Gujarati gu Language.HA Hausa ha Language.HAW Hawaiian haw Language.HE Hebrew he Language.HI Hindi hi Language.HR Croatian hr Language.HT Haitian Creole ht Language.HU Hungarian hu Language.HY Armenian hy Language.ID Indonesian id Language.IS Icelandic is Language.IT Italian it Language.JA Japanese ja Language.JV Javanese jv Language.KA Georgian ka Language.KK Kazakh kk Language.KM Khmer km Language.KN Kannada kn Language.KO Korean ko Language.LA Latin la Language.LB Luxembourgish lb Language.LN Lingala ln Language.LO Lao lo Language.LT Lithuanian lt Language.LV Latvian lv Language.MG Malagasy mg Language.MI Maori mi Language.MK Macedonian mk Language.ML Malayalam ml Language.MN Mongolian mn Language.MR Marathi mr Language.MS Malay ms Language.MT Maltese mt Language.MY_MR Burmese mymr Language.NE Nepali ne Language.NL Dutch nl Language.NN Norwegian (Nynorsk) nn Language.NO Norwegian no Language.OC Occitan oc Language.PA Punjabi pa Language.PL Polish pl Language.PS Pashto ps Language.PT Portuguese pt Language.RO Romanian ro Language.RU Russian ru Language.SA Sanskrit sa Language.SD Sindhi sd Language.SI Sinhala si Language.SK Slovak sk Language.SL Slovenian sl Language.SN Shona sn Language.SO Somali so Language.SQ Albanian sq Language.SR Serbian sr Language.SU Sundanese su Language.SV Swedish sv Language.SW Swahili sw Language.TA Tamil ta Language.TE Telugu te Language.TG Tajik tg Language.TH Thai th Language.TK Turkmen tk Language.TL Tagalog tl Language.TR Turkish tr Language.TT Tatar tt Language.UK Ukrainian uk Language.UR Urdu ur Language.UZ Uzbek uz Language.VI Vietnamese vi Language.YI Yiddish yi Language.YO Yoruba yo Language.ZH Chinese zh For a complete list of supported languages, refer to Gladia’s documentation . 
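Beyond language selection, GladiaInputParams also exposes endpointing and WebSocket message filtering. A minimal sketch; the MessagesConfig import path is assumed to match the other config classes, and the parameter values are illustrative.
import os
from pipecat.services.gladia.stt import GladiaSTTService
from pipecat.services.gladia.config import GladiaInputParams, MessagesConfig  # MessagesConfig path assumed

stt = GladiaSTTService(
    api_key=os.getenv("GLADIA_API_KEY"),
    params=GladiaInputParams(
        endpointing=0.3,  # seconds of silence that end an utterance (illustrative)
        maximum_duration_without_endpointing=10,
        messages_config=MessagesConfig(
            receive_partial_transcripts=True,
            receive_speech_events=False,  # drop begin/end speech events if not needed
        ),
    ),
)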
Advanced Features Custom Vocabulary You can provide custom vocabulary items with bias intensity: Copy Ask AI from pipecat.services.gladia.config import CustomVocabularyItem, CustomVocabularyConfig, RealtimeProcessingConfig custom_vocab = CustomVocabularyConfig( vocabulary = [ CustomVocabularyItem( value = "Pipecat" , intensity = 0.8 ), CustomVocabularyItem( value = "Daily" , intensity = 0.7 ), ], default_intensity = 0.5 ) realtime_config = RealtimeProcessingConfig( custom_vocabulary = True , custom_vocabulary_config = custom_vocab ) Translation Enable real-time translation: Copy Ask AI from pipecat.services.gladia.config import TranslationConfig, RealtimeProcessingConfig translation_config = TranslationConfig( target_languages = [ "fr" , "es" , "de" ], model = "enhanced" , match_original_utterances = True ) realtime_config = RealtimeProcessingConfig( translation = True , translation_config = translation_config ) Multi-language Support Configure multiple languages with automatic language switching: Copy Ask AI from pipecat.services.gladia.config import LanguageConfig, GladiaInputParams language_config = LanguageConfig( languages = [ "en" , "fr" , "es" ], code_switching = True ) params = GladiaInputParams( language_config = language_config ) Usage Example Copy Ask AI from pipecat.pipeline.pipeline import Pipeline from pipecat.services.gladia.stt import GladiaSTTService from pipecat.services.gladia.config import ( GladiaInputParams, LanguageConfig, RealtimeProcessingConfig ) from pipecat.transcriptions.language import Language # Configure the service stt = GladiaSTTService( api_key = "your-api-key" , model = "solaria-1" , params = GladiaInputParams( language_config = LanguageConfig( languages = [Language. EN , Language. FR ], code_switching = True ), ) ) # Use in pipeline pipeline = Pipeline([ transport.input(), stt, llm, ... ]) Frame Flow Metrics Support The service collects processing metrics: Time to First Byte (TTFB) Processing duration Connection status Notes Audio input must be in PCM format Transcription frames are only generated when confidence threshold is met Service automatically handles websocket connections and cleanup Real-time processing occurs in parallel for natural conversation flow Fal (Wizper) Google On this page Overview Installation Configuration Service Parameters GladiaInputParams LanguageConfig PreProcessingConfig CustomVocabularyConfig CustomSpellingConfig TranslationConfig RealtimeProcessingConfig MessagesConfig Input Output TranscriptionFrame InterimTranscriptionFrame ErrorFrame Methods Language Setting Language Support Advanced Features Custom Vocabulary Translation Multi-language Support Usage Example Frame Flow Metrics Support Notes Assistant Responses are generated using AI and may contain mistakes.
stt_google_8b354826.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/services/stt/google#param-enable-word-time-offsets
Title: Google - Pipecat
==================================================

Google - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Speech-to-Text Google Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text AssemblyAI AWS Transcribe Azure Cartesia Deepgram Fal (Wizper) Gladia Google Groq (Whisper) NVIDIA Riva OpenAI SambaNova (Whisper) Speechmatics Ultravox Whisper LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview GoogleSTTService provides real-time speech-to-text capabilities using Google Cloud’s Speech-to-Text V2 API. It supports interim results, multiple languages, and voice activity detection (VAD). Installation To use GoogleSTTService , install the required dependencies: Copy Ask AI pip install "pipecat-ai[google]" You’ll need Google Cloud credentials either as a JSON string or file. You can obtain Google Cloud credentials by creating a service account in the Google Cloud Console . Configuration Constructor Parameters credentials str Google Cloud service account credentials as JSON string credentials_path str Path to service account credentials JSON file location str default: "global" Google Cloud location for the service sample_rate int Audio sample rate in Hertz params InputParams Configuration parameters for the service InputParams The InputParams class provides configuration options for the Google STT service. languages Language | List[Language] default: "Language.EN_US" Single language or list of recognition languages. First language is primary. Examples: Language.EN_US [Language.EN_US, Language.ES_US] The first language in the list is considered primary. Recognition accuracy may vary with multiple languages. When using multiple languages, list them in order of expected usage frequency for optimal recognition results. model str default: "latest_long" Speech recognition model to use. use_separate_recognition_per_channel bool default: "False" Process each audio channel separately for multi-channel audio. enable_automatic_punctuation bool default: "True" Automatically add punctuation marks to transcriptions. enable_spoken_punctuation bool default: "False" Include spoken punctuation (e.g., “period”, “comma”) in transcript. enable_spoken_emojis bool default: "False" Include spoken emojis (e.g., “smiley face”) in transcript. profanity_filter bool default: "False" Filter profanity from transcriptions. enable_word_time_offsets bool default: "False" Include timing information for each word. enable_word_confidence bool default: "False" Include confidence scores for each word. enable_interim_results bool default: "True" Stream partial recognition results as they become available. enable_voice_activity_events bool default: "False" Enable voice activity detection events. 
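As a concrete illustration, these options can be combined freely. A short sketch using only the constructor and InputParams fields documented in this section (credentials path and values are placeholders):

from pipecat.services.google.stt import GoogleSTTService
from pipecat.transcriptions.language import Language

# Multi-language recognition with word-level timing and confidence enabled.
stt = GoogleSTTService(
    credentials_path="path/to/credentials.json",  # placeholder path
    location="global",
    params=GoogleSTTService.InputParams(
        languages=[Language.EN_US, Language.ES_US],
        model="latest_long",
        enable_word_time_offsets=True,
        enable_word_confidence=True,
        enable_interim_results=True,
    ),
)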
Not all features are available for all models or languages Some combinations of options may affect latency or accuracy Model selection should match your use case for best results Input The service processes raw audio data with: Linear16 PCM encoding 16-bit depth Configurable sample rate Single channel (mono) Output Frames The service produces two types of frames: TranscriptionFrame Generated for final transcriptions, containing: text string Transcribed text user_id string User identifier timestamp string ISO 8601 formatted timestamp language Language Recognition language InterimTranscriptionFrame Generated during ongoing speech, containing same fields as TranscriptionFrame but with preliminary results. Methods set_languages method Updates the service’s recognition language. Copy Ask AI async def set_languages ( language : List[Language]) -> None Example: Copy Ask AI await service.set_languages([Language. FR_FR ]) set_model method Updates the service’s recognition model. Copy Ask AI async def set_model ( model : str ) -> None Example: Copy Ask AI await service.set_model( "medical_dictation" ) update_options method Updates multiple service options dynamically. Copy Ask AI async def update_options ( * , languages : Optional[List[Language]] = None , model : Optional[ str ] = None , enable_automatic_punctuation : Optional[ bool ] = None , enable_spoken_punctuation : Optional[ bool ] = None , enable_spoken_emojis : Optional[ bool ] = None , profanity_filter : Optional[ bool ] = None , enable_word_time_offsets : Optional[ bool ] = None , enable_word_confidence : Optional[ bool ] = None , enable_interim_results : Optional[ bool ] = None , enable_voice_activity_events : Optional[ bool ] = None , location : Optional[ str ] = None , ) -> None Example: Copy Ask AI await service.update_options( languages = [Language. ES_ES , Language. EN_US ], enable_interim_results = True , profanity_filter = True ) See the STT base class methods for additional functionality. Usage Example Copy Ask AI from pipecat.services.google.stt import GoogleSTTService from pipecat.transcriptions.language import Language # Configure service stt = GoogleSTTService( credentials_path = "path/to/credentials.json" , params = GoogleSTTService.InputParams( languages = Language. EN_US , model = "latest_long" , enable_automatic_punctuation = True , enable_interim_results = True ) ) # Use in pipeline pipeline = Pipeline([ transport.input(), stt, context_aggregator.user(), llm, ... ]) Regional Support Google Cloud Speech-to-Text V2 supports different regional endpoints for improved latency and data residency requirements. Available Regions See supported languages, models, and features for each region in Google’s Speech-to-Text documentation . 
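Before moving on to regional configuration: to consume the TranscriptionFrame and InterimTranscriptionFrame output described above, a small processor can sit directly after the STT service. This is a sketch that assumes pipecat's FrameProcessor base class and push_frame method; adjust to the frame-processor API of your installed version:

from pipecat.frames.frames import Frame, InterimTranscriptionFrame, TranscriptionFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor

class TranscriptLogger(FrameProcessor):
    """Logs transcription results and forwards every frame unchanged."""

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, TranscriptionFrame):
            print(f"[final {frame.timestamp}] {frame.text}")
        elif isinstance(frame, InterimTranscriptionFrame):
            print(f"[interim] {frame.text}")
        await self.push_frame(frame, direction)

# Usage: Pipeline([transport.input(), stt, TranscriptLogger(), context_aggregator.user(), llm, ...])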
Configuration Specify the region during service initialization: Copy Ask AI stt = GoogleSTTService( credentials = credentials, location = "us-central1" , # Use us-central1 endpoint params = GoogleSTTService.InputParams( model = "chirp_2" ) ) Dynamic Region Updates The region can be updated during runtime: Copy Ask AI await stt.update_options( location = "asia" ) Notes The global endpoint is used by default Regional endpoints may provide lower latency for users in those regions Some features or models might only be available in specific regions Regional selection may affect pricing Data residency requirements may dictate region selection Models Model Name Description Best For chirp_2 Google’s latest ASR model General use cases latest_long Latest model optimized for long-form speech Conversations, meetings latest_short Latest model optimized for short-form speech Short messages, notes telephony Optimized for phone calls Call centers medical_dictation Optimized for medical terminology Healthcare dictation medical_conversation Optimized for doctor-patient interactions Medical consultations See Google Cloud’s Speech-to-Text documentation for more details. Language Support Language Code Description Service Codes Language.AF Afrikaans af-ZA Language.SQ Albanian sq-AL Language.AM Amharic am-ET Language.AR Arabic (Default: Egypt) ar-EG Language.AR_AE Arabic (UAE) ar-AE Language.AR_BH Arabic (Bahrain) ar-BH Language.AR_DZ Arabic (Algeria) ar-DZ Language.AR_EG Arabic (Egypt) ar-EG Language.AR_IQ Arabic (Iraq) ar-IQ Language.AR_JO Arabic (Jordan) ar-JO Language.AR_KW Arabic (Kuwait) ar-KW Language.AR_LB Arabic (Lebanon) ar-LB Language.AR_MA Arabic (Morocco) ar-MA Language.AR_OM Arabic (Oman) ar-OM Language.AR_QA Arabic (Qatar) ar-QA Language.AR_SA Arabic (Saudi Arabia) ar-SA Language.AR_SY Arabic (Syria) ar-SY Language.AR_TN Arabic (Tunisia) ar-TN Language.AR_YE Arabic (Yemen) ar-YE Language.HY Armenian hy-AM Language.AZ Azerbaijani az-AZ Language.EU Basque eu-ES Language.BN Bengali (Default: India) bn-IN Language.BN_BD Bengali (Bangladesh) bn-BD Language.BN_IN Bengali (India) bn-IN Language.BS Bosnian bs-BA Language.BG Bulgarian bg-BG Language.MY Burmese my-MM Language.CA Catalan ca-ES Language.ZH Chinese (Default: Simplified) cmn-Hans-CN Language.ZH_CN Chinese (Simplified) cmn-Hans-CN Language.ZH_HK Chinese (Hong Kong) cmn-Hans-HK Language.ZH_TW Chinese (Traditional) cmn-Hant-TW Language.YUE Chinese (Cantonese) yue-Hant-HK Language.HR Croatian hr-HR Language.CS Czech cs-CZ Language.DA Danish da-DK Language.NL Dutch (Default: Netherlands) nl-NL Language.NL_BE Dutch (Belgium) nl-BE Language.NL_NL Dutch (Netherlands) nl-NL Language.EN English (Default: US) en-US Language.EN_AU English (Australia) en-AU Language.EN_CA English (Canada) en-CA Language.EN_GB English (UK) en-GB Language.EN_GH English (Ghana) en-GH Language.EN_HK English (Hong Kong) en-HK Language.EN_IN English (India) en-IN Language.EN_IE English (Ireland) en-IE Language.EN_KE English (Kenya) en-KE Language.EN_NG English (Nigeria) en-NG Language.EN_NZ English (New Zealand) en-NZ Language.EN_PH English (Philippines) en-PH Language.EN_SG English (Singapore) en-SG Language.EN_TZ English (Tanzania) en-TZ Language.EN_US English (US) en-US Language.EN_ZA English (South Africa) en-ZA Language.ET Estonian et-EE Language.FIL Filipino fil-PH Language.FI Finnish fi-FI Language.FR French (Default: France) fr-FR Language.FR_BE French (Belgium) fr-BE Language.FR_CA French (Canada) fr-CA Language.FR_CH French (Switzerland) fr-CH Language.GL Galician gl-ES 
Language.KA Georgian ka-GE Language.DE German (Default: Germany) de-DE Language.DE_AT German (Austria) de-AT Language.DE_CH German (Switzerland) de-CH Language.EL Greek el-GR Language.GU Gujarati gu-IN Language.HE Hebrew iw-IL Language.HI Hindi hi-IN Language.HU Hungarian hu-HU Language.IS Icelandic is-IS Language.ID Indonesian id-ID Language.IT Italian it-IT Language.IT_CH Italian (Switzerland) it-CH Language.JA Japanese ja-JP Language.JV Javanese jv-ID Language.KN Kannada kn-IN Language.KK Kazakh kk-KZ Language.KM Khmer km-KH Language.KO Korean ko-KR Language.LO Lao lo-LA Language.LV Latvian lv-LV Language.LT Lithuanian lt-LT Language.MK Macedonian mk-MK Language.MS Malay ms-MY Language.ML Malayalam ml-IN Language.MR Marathi mr-IN Language.MN Mongolian mn-MN Language.NE Nepali ne-NP Language.NO Norwegian no-NO Language.FA Persian fa-IR Language.PL Polish pl-PL Language.PT Portuguese (Default: Portugal) pt-PT Language.PT_BR Portuguese (Brazil) pt-BR Language.PT_PT Portuguese (Portugal) pt-PT Language.PA Punjabi pa-Guru-IN Language.RO Romanian ro-RO Language.RU Russian ru-RU Language.SR Serbian sr-RS Language.SI Sinhala si-LK Language.SK Slovak sk-SK Language.SL Slovenian sl-SI Language.ES Spanish (Default: Spain) es-ES Language.ES_AR Spanish (Argentina) es-AR Language.ES_BO Spanish (Bolivia) es-BO Language.ES_CL Spanish (Chile) es-CL Language.ES_CO Spanish (Colombia) es-CO Language.ES_CR Spanish (Costa Rica) es-CR Language.ES_DO Spanish (Dominican Republic) es-DO Language.ES_EC Spanish (Ecuador) es-EC Language.ES_GT Spanish (Guatemala) es-GT Language.ES_HN Spanish (Honduras) es-HN Language.ES_MX Spanish (Mexico) es-MX Language.ES_NI Spanish (Nicaragua) es-NI Language.ES_PA Spanish (Panama) es-PA Language.ES_PE Spanish (Peru) es-PE Language.ES_PR Spanish (Puerto Rico) es-PR Language.ES_PY Spanish (Paraguay) es-PY Language.ES_SV Spanish (El Salvador) es-SV Language.ES_US Spanish (US) es-US Language.ES_UY Spanish (Uruguay) es-UY Language.ES_VE Spanish (Venezuela) es-VE Language.SU Sundanese su-ID Language.SW Swahili (Default: Tanzania) sw-TZ Language.SW_KE Swahili (Kenya) sw-KE Language.SW_TZ Swahili (Tanzania) sw-TZ Language.SV Swedish sv-SE Language.TA Tamil (Default: India) ta-IN Language.TA_IN Tamil (India) ta-IN Language.TA_MY Tamil (Malaysia) ta-MY Language.TA_SG Tamil (Singapore) ta-SG Language.TA_LK Tamil (Sri Lanka) ta-LK Language.TE Telugu te-IN Language.TH Thai th-TH Language.TR Turkish tr-TR Language.UK Ukrainian uk-UA Language.UR Urdu (Default: India) ur-IN Language.UR_IN Urdu (India) ur-IN Language.UR_PK Urdu (Pakistan) ur-PK Language.UZ Uzbek uz-UZ Language.VI Vietnamese vi-VN Language.XH Xhosa xh-ZA Language.ZU Zulu zu-ZA Special Features Supports multiple languages simultaneously Provides regional variants for many languages Handles different Chinese scripts (simplified/traditional) Supports medical-specific models Frame Flow Notes Requires Google Cloud credentials Supports real-time transcription Handles streaming connection management Provides dynamic configuration updates Supports model switching Includes VAD capabilities Manages connection lifecycle Gladia Groq (Whisper) On this page Overview Installation Configuration Constructor Parameters InputParams Input Output Frames TranscriptionFrame InterimTranscriptionFrame Methods Usage Example Regional Support Available Regions Configuration Dynamic Region Updates Notes Models Language Support Special Features Frame Flow Notes Assistant Responses are generated using AI and may contain mistakes.
stt_groq_c74a892d.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/services/stt/groq#set-model
Title: Groq (Whisper) - Pipecat
==================================================

Groq (Whisper) - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Speech-to-Text Groq (Whisper) Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text AssemblyAI AWS Transcribe Azure Cartesia Deepgram Fal (Wizper) Gladia Google Groq (Whisper) NVIDIA Riva OpenAI SambaNova (Whisper) Speechmatics Ultravox Whisper LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview GroqSTTService provides speech-to-text capabilities using Groq’s hosted Whisper API. It offers high-accuracy transcription with minimal setup requirements. The service uses Voice Activity Detection (VAD) to process only speech segments, optimizing API usage and improving response time. Installation To use GroqSTTService , install the required dependencies: Copy Ask AI pip install "pipecat-ai[groq]" You’ll need to set up your Groq API key as an environment variable: GROQ_API_KEY . You can obtain a Groq API key from the Groq Console . Configuration Constructor Parameters model str default: "whisper-large-v3-turbo" Whisper model to use. Currently only “whisper-large-v3-turbo” is available. api_key str Your Groq API key. If not provided, will use environment variable. base_url str default: "https://api.groq.com/openai/v1" Custom API base URL for Groq API requests. language Language default: "Language.EN" Language of the audio input. Defaults to English. prompt str Optional text to guide the model’s style or continue a previous segment. temperature float Sampling temperature between 0 and 1. Lower values are more deterministic, higher values more creative. Defaults to 0.0. sample_rate int Audio sample rate in Hz. If not provided, uses the pipeline’s sample rate. Input The service processes audio data with the following requirements: PCM audio format 16-bit depth Single channel (mono) Output Frames The service produces two types of frames during transcription: TranscriptionFrame Generated for final transcriptions, containing: text string Transcribed text user_id string User identifier timestamp string ISO 8601 formatted timestamp language Language Detected language (if available) ErrorFrame Generated when transcription errors occur, containing error details. Methods Set Model Copy Ask AI await service.set_model( "whisper-large-v3-turbo" ) See the STT base class methods for additional functionality. Language Support Groq’s Whisper API supports a wide range of languages. The service automatically maps Language enum values to the appropriate Whisper language codes. 
Language Code Description Whisper Code Language.AF Afrikaans af Language.AR Arabic ar Language.HY Armenian hy Language.AZ Azerbaijani az Language.BE Belarusian be Language.BS Bosnian bs Language.BG Bulgarian bg Language.CA Catalan ca Language.ZH Chinese zh Language.HR Croatian hr Language.CS Czech cs Language.DA Danish da Language.NL Dutch nl Language.EN English en Language.ET Estonian et Language.FI Finnish fi Language.FR French fr Language.GL Galician gl Language.DE German de Language.EL Greek el Language.HE Hebrew he Language.HI Hindi hi Language.HU Hungarian hu Language.IS Icelandic is Language.ID Indonesian id Language.IT Italian it Language.JA Japanese ja Language.KN Kannada kn Language.KK Kazakh kk Language.KO Korean ko Language.LV Latvian lv Language.LT Lithuanian lt Language.MK Macedonian mk Language.MS Malay ms Language.MR Marathi mr Language.MI Maori mi Language.NE Nepali ne Language.NO Norwegian no Language.FA Persian fa Language.PL Polish pl Language.PT Portuguese pt Language.RO Romanian ro Language.RU Russian ru Language.SR Serbian sr Language.SK Slovak sk Language.SL Slovenian sl Language.ES Spanish es Language.SW Swahili sw Language.SV Swedish sv Language.TL Tagalog tl Language.TA Tamil ta Language.TH Thai th Language.TR Turkish tr Language.UK Ukrainian uk Language.UR Urdu ur Language.VI Vietnamese vi Language.CY Welsh cy Groq’s Whisper implementation supports language variants (like en-US , fr-CA ) by mapping them to their base language. For example, Language.EN_US and Language.EN_GB will both map to en . The service will automatically detect the language if none is specified, but specifying the language typically improves transcription accuracy. For the most up-to-date list of supported languages, refer to the Groq documentation. Usage Example Copy Ask AI from pipecat.services.groq.stt import GroqSTTService from pipecat.transcriptions.language import Language # Configure service stt = GroqSTTService( model = "whisper-large-v3-turbo" , api_key = "your-api-key" , language = Language. EN , prompt = "Transcribe the following conversation" , temperature = 0.0 ) # Use in pipeline pipeline = Pipeline([ transport.input(), stt, llm, ... ]) Voice Activity Detection Integration This service inherits from SegmentedSTTService , which uses Voice Activity Detection (VAD) to identify speech segments for processing. This approach: Processes only actual speech, not silence or background noise Maintains a small audio buffer (default 1 second) to capture speech that occurs slightly before VAD detection Receives UserStartedSpeakingFrame and UserStoppedSpeakingFrame from a VAD component in the pipeline Only sends complete utterances to the API when speech has ended Ensure your transport includes a VAD component (like SileroVADAnalyzer ) to properly detect speech segments. Metrics Support The service collects the following metrics: Time to First Byte (TTFB) Processing duration API response time Notes Requires valid Groq API key Uses Groq’s hosted Whisper model Requires VAD component in transport Processes complete utterances, not continuous audio Handles API rate limiting Automatic error handling Thread-safe processing Error Handling The service handles common API errors including: Authentication errors Rate limiting Invalid audio format Network connectivity issues API timeouts Errors are propagated through ErrorFrames with descriptive messages. 
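For the VAD requirement described above, the analyzer lives on the transport rather than on the STT service. A rough sketch, assuming a Daily transport whose params accept a vad_analyzer (import paths follow the current pipecat layout; room_url and token are placeholders):

import os

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.services.groq.stt import GroqSTTService
from pipecat.transports.services.daily import DailyParams, DailyTransport

# The VAD analyzer emits the UserStartedSpeakingFrame/UserStoppedSpeakingFrame
# pairs that the segmented Groq STT service needs to cut utterances.
transport = DailyTransport(
    room_url,  # placeholder
    token,     # placeholder
    "Transcription Bot",
    DailyParams(
        audio_in_enabled=True,
        vad_analyzer=SileroVADAnalyzer(),  # parameter name assumed
    ),
)

stt = GroqSTTService(api_key=os.getenv("GROQ_API_KEY"))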
Google NVIDIA Riva On this page Overview Installation Configuration Constructor Parameters Input Output Frames TranscriptionFrame ErrorFrame Methods Set Model Language Support Usage Example Voice Activity Detection Integration Metrics Support Notes Error Handling Assistant Responses are generated using AI and may contain mistakes.
stt_sambanova_7965bf86.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/services/stt/sambanova#overview
Title: SambaNova (Whisper) - Pipecat
==================================================

SambaNova (Whisper) - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Speech-to-Text SambaNova (Whisper) Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text AssemblyAI AWS Transcribe Azure Cartesia Deepgram Fal (Wizper) Gladia Google Groq (Whisper) NVIDIA Riva OpenAI SambaNova (Whisper) Speechmatics Ultravox Whisper LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview SambaNovaSTTService provides speech-to-text capabilities using SambaNova’s hosted Whisper API. It offers high-accuracy transcription with minimal setup requirements. The service uses Voice Activity Detection (VAD) to process only speech segments, optimizing API usage and improving response time. Installation To use SambaNovaSTTService , install the required dependencies: Copy Ask AI pip install "pipecat-ai[sambanova]" You need to set up your SambaNova API key as an environment variable: SAMBANOVA_API_KEY . Get your SambaNova API key here . Configuration Constructor Parameters model str default: "Whisper-Large-v3" Whisper model to use. Currently only “Whisper-Large-v3” is available. api_key str Your SambaNova API key. If not provided, will use environment variable. base_url str default: "https://api.sambanova.ai/v1" Custom API base URL for SambaNova API requests. language Language default: "Language.EN" Language of the audio input. Defaults to English. prompt str Optional text to guide the model’s style or continue a previous segment. temperature float Sampling temperature between 0 and 1. Lower values are more deterministic, higher values more creative. Defaults to 0.0. sample_rate int Audio sample rate in Hz. If not provided, uses the pipeline’s sample rate. Input The service processes audio data with the following requirements: PCM audio format. 16-bit depth. Single channel (mono). Output Frames The service produces two types of frames during transcription: TranscriptionFrame Generated for final transcriptions, containing: text string Transcribed text user_id string User identifier timestamp string ISO 8601 formatted timestamp language Language Detected language (if available) ErrorFrame Generated when transcription errors occur, containing error details. Methods Set Model Copy Ask AI await service.set_model( "Whisper-Large-v3" ) See the STT base class methods for additional functionality. Language Support SambaNova’s Whisper API supports a wide range of languages. The service automatically maps Language enum values to the appropriate Whisper language codes. 
Language Code Description Whisper Code Language.AF Afrikaans af Language.AR Arabic ar Language.HY Armenian hy Language.AZ Azerbaijani az Language.BE Belarusian be Language.BS Bosnian bs Language.BG Bulgarian bg Language.CA Catalan ca Language.ZH Chinese zh Language.HR Croatian hr Language.CS Czech cs Language.DA Danish da Language.NL Dutch nl Language.EN English en Language.ET Estonian et Language.FI Finnish fi Language.FR French fr Language.GL Galician gl Language.DE German de Language.EL Greek el Language.HE Hebrew he Language.HI Hindi hi Language.HU Hungarian hu Language.IS Icelandic is Language.ID Indonesian id Language.IT Italian it Language.JA Japanese ja Language.KN Kannada kn Language.KK Kazakh kk Language.KO Korean ko Language.LV Latvian lv Language.LT Lithuanian lt Language.MK Macedonian mk Language.MS Malay ms Language.MR Marathi mr Language.MI Maori mi Language.NE Nepali ne Language.NO Norwegian no Language.FA Persian fa Language.PL Polish pl Language.PT Portuguese pt Language.RO Romanian ro Language.RU Russian ru Language.SR Serbian sr Language.SK Slovak sk Language.SL Slovenian sl Language.ES Spanish es Language.SW Swahili sw Language.SV Swedish sv Language.TL Tagalog tl Language.TA Tamil ta Language.TH Thai th Language.TR Turkish tr Language.UK Ukrainian uk Language.UR Urdu ur Language.VI Vietnamese vi Language.CY Welsh cy SambaNova’s Whisper implementation supports language variants (like en-US , fr-CA ) by mapping them to their base language. For example, Language.EN_US and Language.EN_GB will both map to en . The service will automatically detect the language if none is specified, but specifying the language typically improves transcription accuracy. For the most up-to-date list of supported languages, refer to the SambaNova’s docs . Usage Example Copy Ask AI from pipecat.services.sambanova.stt import SambaNovaSTTService from pipecat.transcriptions.language import Language # Configure service stt = SambaNovaSTTService( model = "Whisper-Large-v3" , api_key = "your-sambanova-api-key" , language = Language. EN , prompt = "Transcribe the following conversation" , temperature = 0.0 ) # Use in pipeline pipeline = Pipeline([ transport.input(), stt, llm, ... ]) Voice Activity Detection Integration This service inherits from SegmentedSTTService , which uses Voice Activity Detection (VAD) to identify speech segments for processing. This approach: Processes only actual speech, not silence or background noise. Maintains a small audio buffer (default 1 second) to capture speech that occurs slightly before VAD detection. Receives UserStartedSpeakingFrame and UserStoppedSpeakingFrame from a VAD component in the pipeline. Only sends complete utterances to the API when speech has ended. Ensure your transport includes a VAD component (like SileroVADAnalyzer ) to properly detect speech segments. Metrics Support The service collects the following metrics: Time to First Byte (TTFB). Processing duration. API response time. Notes Requires valid SambaNova API key. Uses SambaNova’s hosted Whisper model. Requires VAD component in transport. Processes complete utterances, not continuous audio. Handles API rate limiting. Automatic error handling. Thread-safe processing. Error Handling The service handles common API errors including: Authentication errors. Rate limiting. Invalid audio format. Network connectivity issues. API timeouts. Errors are propagated through ErrorFrames with descriptive messages. 
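As a small illustration of the environment-variable fallback and the variant mapping noted above (a sketch, not a canonical setup):

import os

from pipecat.services.sambanova.stt import SambaNovaSTTService
from pipecat.transcriptions.language import Language

# Fail fast if the key is missing; per the constructor notes, the service
# otherwise reads SAMBANOVA_API_KEY from the environment itself.
assert os.getenv("SAMBANOVA_API_KEY"), "export SAMBANOVA_API_KEY before starting the bot"

# Language.EN_GB is a variant, so it is mapped to the base Whisper code "en".
stt = SambaNovaSTTService(language=Language.EN_GB)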
OpenAI Speechmatics On this page Overview Installation Configuration Constructor Parameters Input Output Frames TranscriptionFrame ErrorFrame Methods Set Model Language Support Usage Example Voice Activity Detection Integration Metrics Support Notes Error Handling Assistant Responses are generated using AI and may contain mistakes.