from groq import Groq
from pydantic import BaseModel, ValidationError
from typing import List, Literal
import os
import tiktoken
import json
import re
import tempfile
import requests
from bs4 import BeautifulSoup
import subprocess
import pyttsx3
from pydub import AudioSegment

groq_client = Groq(api_key=os.environ["GROQ_API_KEY"])

# cl100k_base is an OpenAI tokenizer. Llama models tokenize differently, so the
# token counts used for truncation are only an approximation.
tokenizer = tiktoken.get_encoding("cl100k_base")


class DialogueItem(BaseModel):
    speaker: Literal["Maria", "Sarah"]
    text: str


class Dialogue(BaseModel):
    dialogue: List[DialogueItem]


def truncate_text(text, max_tokens=2048):
    """Trim text to at most max_tokens tokens (as counted by cl100k_base)."""
    tokens = tokenizer.encode(text)
    if len(tokens) > max_tokens:
        return tokenizer.decode(tokens[:max_tokens])
    return text


def extract_text_from_url(url):
    """Download a web page and return its visible text, one phrase per line."""
    try:
        # The 30 s timeout is an arbitrary choice so a dead host cannot hang the call.
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')
        # Drop <script> and <style> contents, which are not readable text.
        for script in soup(["script", "style"]):
            script.decompose()
        text = soup.get_text()
        # Strip each line, split on double spaces (splitting on a single space
        # would put every word on its own line), and drop empty fragments.
        lines = (line.strip() for line in text.splitlines())
        chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
        text = '\n'.join(chunk for chunk in chunks if chunk)
        return text
    except Exception as e:
        raise ValueError(f"Error extracting text from URL: {e}") from e


def generate_script(system_prompt: str, input_text: str, tone: str, target_length: str):
    """Ask the LLM for a two-host podcast script and parse it into a Dialogue.

    The parsing below assumes system_prompt instructs the model to reply with
    JSON matching the Dialogue schema; the prompt built here does not request
    JSON on its own.
    """
    input_text = truncate_text(input_text)
    word_limit = 300 if target_length == "Short (1-2 min)" else 750
    prompt = f"""
{system_prompt}

TONE: {tone}

TARGET LENGTH: {target_length} (approximately {word_limit} words)

INPUT TEXT: {input_text}

Generate a complete, well-structured podcast script that:
1. Starts with a proper introduction
2. Covers the main points from the input text
3. Has a natural flow of conversation between Maria and Sarah
4. Concludes with a summary and sign-off
5. Fits within the {word_limit} word limit for the target length of {target_length}

Ensure the script is not abruptly cut off and forms a complete conversation.
"""
    response = groq_client.chat.completions.create(
        messages=[
            {"role": "system", "content": prompt},
        ],
        model="llama-3.1-70b-versatile",
        max_tokens=2048,
        temperature=0.7,
    )
    content = response.choices[0].message.content
    # Remove Markdown code fences (```json ... ```) the model may wrap around its JSON.
    content = re.sub(r'```json\s*|\s*```', '', content)
    try:
        json_data = json.loads(content)
        dialogue = Dialogue.model_validate(json_data)
    except json.JSONDecodeError:
        # Fall back to the outermost {...} span in case the model added prose around it.
        match = re.search(r'\{.*\}', content, re.DOTALL)
        if match:
            try:
                json_data = json.loads(match.group())
                dialogue = Dialogue.model_validate(json_data)
            except (json.JSONDecodeError, ValidationError) as e:
                raise ValueError(f"Failed to parse dialogue JSON: {e}\nContent: {content}") from e
        else:
            raise ValueError(f"Failed to find valid JSON in the response: {content}")
    except ValidationError as e:
        raise ValueError(f"Failed to validate dialogue structure: {e}\nContent: {content}") from e
    return dialogue


def generate_audio_espeak(text: str, speaker: str) -> str:
    """Synthesize text with espeak-ng and return the path of a temporary WAV file."""
    voice = "en-us+f3" if speaker == "Maria" else "en-gb+f3"
    # Create and close the temp file first so espeak-ng can write to it on every OS.
    with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as temp_audio:
        path = temp_audio.name
    try:
        # check=True raises CalledProcessError on a non-zero exit, so a failed
        # run reaches the pyttsx3 fallback in generate_audio instead of
        # silently returning an empty file.
        subprocess.run(['espeak-ng', '-v', voice, '-w', path, text], check=True)
    except Exception:
        os.remove(path)
        raise
    return path


def generate_audio_pyttsx3(text: str, speaker: str) -> str:
    """Fallback synthesizer using the platform's pyttsx3 voices."""
    engine = pyttsx3.init()
    voices = engine.getProperty('voices')
    # Use the second installed voice for Maria and the first for Sarah,
    # falling back to the first voice if only one is installed.
    voice_index = 1 if speaker == "Maria" and len(voices) > 1 else 0
    engine.setProperty('voice', voices[voice_index].id)
    # Close the temp file before pyttsx3 writes to its path.
    with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as temp_audio:
        path = temp_audio.name
    engine.save_to_file(text, path)
    engine.runAndWait()
    return path


def generate_audio(text: str, speaker: str) -> str:
    """Try espeak-ng first and fall back to pyttsx3 if it is missing or fails."""
    try:
        return generate_audio_espeak(text, speaker)
    except Exception:
        return generate_audio_pyttsx3(text, speaker)
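

# Illustrative sketch (hypothetical helper, not part of the original module):
# the JSON-recovery steps from generate_script, pulled into a pure function so
# they can be tested without calling the Groq API. It strips ```json fences,
# tries json.loads on the whole reply, then falls back to the span from the
# first "{" to the last "}". Schema validation is left to the caller, e.g.
# Dialogue.model_validate(extract_json_object(content)).
import json
import re


def extract_json_object(content: str) -> dict:
    """Parse the JSON object in an LLM reply (hypothetical helper)."""
    # Same fence-stripping pattern as generate_script.
    cleaned = re.sub(r"```json\s*|\s*```", "", content)
    try:
        data = json.loads(cleaned)
    except json.JSONDecodeError:
        # Greedy DOTALL match: first "{" through last "}", across newlines.
        match = re.search(r"\{.*\}", cleaned, re.DOTALL)
        if match is None:
            raise ValueError(f"No JSON object found in: {content!r}")
        # Still raises json.JSONDecodeError (a ValueError subclass) if the span is invalid.
        data = json.loads(match.group())
    if not isinstance(data, dict):
        raise ValueError(f"Expected a JSON object, got {type(data).__name__}")
    return data


# Because the fallback regex is greedy, a reply containing two separate JSON
# objects yields one invalid span and raises, rather than silently picking one.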