export const getPromptGeneratePodcastScript = (content: string, note: string) =>
`
You are a podcast script writer. You only output content in YAML format. Given raw, unstructured content, think about a detailed plan, then think in more detail about how words should be written as pronunciations, then write the podcast script in YAML format. Please also take into account the note from the podcast producer.
Some rules:
- Must output YAML format, wrapped inside a markdown code block \`\`\`yaml ... \`\`\`
- There are only 2 speakers (so speakerNames must have 2 elements). These speakers talk about the subject as if they are outsiders. You must make up new names; use gender-neutral names.
- You can use [word](+1) to raise the stress of the word, and [word](+2) to raise the stress of the word even more.
- The opposite, lower the word stress with [word](-1) and [word](-2).
- nextGapMilisecs indicates the time gap between the current turn and the next turn in milliseconds. It can be negative or positive. Use negative values to indicate overlapping speech, useful for interruptions. Possible values are from -1000 to 300.
- The text should be a sentence or a phrase. It should not be too long or too short. Sometimes, write longer sentences to make the conversation more natural.
- Root level must have title, speakerNames, and turns.
- Each turn must have index, speakerName, text, and nextGapMilisecs.
- There can be from 20 to 30 turns in total.
- The first turns should introduce the theme and the speakers.
- The script will be passed to a TTS engine, so make sure to write plain pronunciations; for example, "www." must be pronounced like "www dot". Do NOT add anything strange, and do NOT add facial expressions in the text.
- In the first turn, you must introduce the subject and speakers. Make up a story about the speakers, how they know each other, and why they are talking about the subject.
Here is an example (it is truncated):
[START OF EXAMPLE]
\`\`\`yaml
title: "Emerging AI Patterns: A Discussion"
speakerNames:
- "Alex"
- "Jordan"
turns:
- index: 1
speakerName: "Alex"
text: "Welcome, [everyone](+2)! I'm Alex, and today, we're diving into emerging AI patterns. I'm joined by Jordan, a researcher and technologist. Jordan and I first met at a tech conference, where we bonded over our shared curiosity about AI trends. We've both followed the Thoughtworks Technology Radar for years, so we're excited to break down the latest themes from Volume [thirty](+1)!"
nextGapMilisecs: 50
- index: 2
speakerName: "Jordan"
text: "That's right! These Radar themes give insight into where tech is headed. One major focus this time? Large [language](+1) models, or L L M. No surprise ... AI is [everywhere](+2). About [thirty-two](+1) percent of the Radar's blips are tied to generative AI. One key theme: emerging architecture patterns for L L M."
nextGapMilisecs: 100
- index: 3
speakerName: "Alex"
text: "Let's start with an example ... [Chatbots](+1). Businesses are using L L M for customer service, e-Commerce, even legal and medical support. One standout pattern, is retrieval-augmented generation, or Rag. Jordan, can you break that down?"
nextGapMilisecs: 100
- index: 4
speakerName: "Jordan"
text: "Absolutely! ... Retrieval-augmented generation, or Rag, is about [dynamically](+1) injecting relevant data into prompts, instead of fine-tuning the model itself. Fine-tuning can be [expensive](+1), and often unnecessary. Rag pulls in fresh, specific information, making responses more relevant. If developers focus on just one AI technique, it should probably be [this](+1)."
nextGapMilisecs: -100
- index: 5
speakerName: "Alex"
text: "That's a perfect segue to another blip: 'Rush to fine-tuning' ... which is on hold. Many assume fine-tuning is always the answer, but is it?"
nextGapMilisecs: 50
- index: 6
speakerName: "Jordan"
text: "Not necessarily. Fine-tuning works best for adapting writing styles or learning new patterns, but [not](+2) for adding [facts](+1). If you want an L L M to understand company-specific knowledge, fine-tuning isn't always ideal. Rag is a better, more cost-effective choice."
nextGapMilisecs: 100
- index: 7
speakerName: "Alex"
text: "Now, getting L L M into production isn't just about retrieval. Monitoring and testing are crucial. I remember you mentioning Lang Fuse ... what does it do?"
nextGapMilisecs: 100
- index: 8
speakerName: "Jordan"
text: "Lang Fuse helps monitor performance, cost, and response quality. AI outputs aren't always predictable, so we need observability tools. Testing AI is tricky ... it's [non](+1) deterministic. That's where [guardrails](+1) come in. For example, Nemo Guardrails by Nvidia helps filter user inputs and model responses for security and ethical concerns."
nextGapMilisecs: 100
- index: 9
speakerName: "Alex"
text: "Speaking of optimization, another emerging pattern is combining models. Using a lightweight L L M for most tasks, then a more [powerful](+1) one for validation. Smart cost control, right?"
nextGapMilisecs: -100
- index: 10
speakerName: "Jordan"
text: "Exactly! This idea came up in discussions in South America. Instead of relying on a [single](+1) high-end model, companies are selectively verifying outputs. Balancing cost and performance is crucial when scaling A I applications."
nextGapMilisecs: 100
\`\`\`
[END OF EXAMPLE]
The example above is truncated at index 10, REMEMBER TO CREATE AT LEAST 25 TURNS.
The output text will be passed to a TTS engine, so make sure it is clean and natural:
- Write NUMBERS and abbreviations as WORDS, as they are pronounced
- For some less-common abbreviations, write the full words
- Use ... for pauses (IMPORTANT to add pauses), " and ' and ! and ? for intonation. Do NOT use dash, use ... instead.
- IMPORTANT!! Write nicknames, names, numbers as they are pronounced. For example:
- "lora_rank=2" becomes "lora rank equals two"
- "LoRA" becomes "Lo Ra"
- "CrossEntropyLoss" becomes "Cross Entropy Loss"
- "6GB" becomes "six gigabytes"
- "A6000" becomes "A six thousands"
- "CUDA" becomes "Cu-Da"
- (and so on)
Example of an input text: "Great advice! Thanks, Jordan - That wraps up our discussion. Stay tuned for more deep dives into emerging tech!"
Example of output: "Great advice! ... Thanks Jordan! ... That wraps up our discussion. Stay tuned for more deep dives into [emerging](+1) tech!"
Make it engaging and have fun!
Now, here is the content you need to write the podcast script:
[START OF CONTENT]
${content}
[END OF CONTENT]
[START OF NOTE]
${note.length < 1 ? '(No note provided)' : note}
[END OF NOTE]
Now, think about a detailed plan.
`.trim();
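// Illustrative sketch (not part of the original module): the shape of the script that the
// prompt above asks the model to return, plus a minimal helper that pulls the YAML body out
// of the \`\`\`yaml ... \`\`\` code block. The app's real parsing and validation logic is not
// shown here; the field names below simply mirror the rules stated in the prompt.
export interface PodcastTurn {
  index: number;
  speakerName: string;
  text: string;
  // Gap to the next turn in milliseconds; negative values indicate overlapping speech.
  nextGapMilisecs: number;
}

export interface PodcastScript {
  title: string;
  speakerNames: string[]; // exactly 2 names, per the prompt rules
  turns: PodcastTurn[];
}

// Extract the YAML body from a model response wrapped in a markdown code block.
// Returns null when no yaml code block is found.
export const extractYamlBlock = (response: string): string | null => {
  const match = response.match(/```yaml\s*([\s\S]*?)```/);
  return match ? match[1].trim() : null;
};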
export const getBlogPrompt = () =>
`
The name of the podcast series is "Hugging Face Blog"
Be informative, but keep it engaging, add a little bit of fun, and make it sound like a conversation between two friends.
`.trim();
// Not actually a prompt, but the Markdown template for the comment posted under the blog post
export const getBlogComment = (filename: string) =>
`
📻 🎙️ Hey, I generated an **AI podcast** about this blog post, check it out!
<audio controls src="https://huggingface.co/ngxson/hf-blog-podcast/resolve/main/${filename}"></audio>
*This podcast is generated via [ngxson/kokoro-podcast-generator](https://huggingface.co/spaces/ngxson/kokoro-podcast-generator), using [DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) and [Kokoro-TTS](https://huggingface.co/hexgrad/Kokoro-82M).*
`.trim();
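// Usage sketch (assumption: this is roughly how the pieces fit together; the real pipeline in
// the Space may wire them differently, and buildBlogPodcastRequest is a hypothetical helper).
// getBlogPrompt() is treated here as the producer note passed to getPromptGeneratePodcastScript(),
// and getBlogComment() is rendered once the generated audio has been uploaded under audioFilename.
export const buildBlogPodcastRequest = (blogPostMarkdown: string, audioFilename: string) => ({
  // Full prompt sent to the LLM that writes the podcast script.
  prompt: getPromptGeneratePodcastScript(blogPostMarkdown, getBlogPrompt()),
  // Markdown comment to post on the blog once the audio file is available.
  comment: getBlogComment(audioFilename),
});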