MiniMaxAI/MiniMax-Speech-Tech-Report

sriting committed on May 14

Commit

184b86a

1 Parent(s): ee03a71

feat: update link in tech report

Files changed (1)

index.html +13 -9

@@ -10,7 +10,7 @@
 	<meta name="keywords" content="latex.css,css library,class-less css,latex css" />
 	<meta property="og:title"
 		content="MiniMax-Speech Tech Report | Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder" />
-	<meta property="og:url" content="https://huggingface.co/spaces/MiniMaxAI/MiniMax-Speech-Tech-Report" />
+	<meta property="og:url" content="https://minimax-ai.github.io/tts_tech_report" />
 	<meta property="og:description"
 		content=" MiniMax-Speech, an autoregressive Transformer-based Text-to-Speech (TTS) model that generates high-quality speech" />
 	<meta property="og:type" content="website" />
@@ -28,9 +28,11 @@
 			Encoder</h4>
 		<p class="author">
 			MiniMax Team <span class="date">May 2025</span><br />
-			<a style="font-size: 1.1rem;" target="_blank"
-				href="https://huggingface.co/spaces/MiniMaxAI/MiniMax-Speech-Tech-Report/blob/main/MiniMax_Speech.pdf">[Tech
+			<a style="font-size: 1.1rem;" target="_blank" href="https://arxiv.org/abs/2505.07916">[Tech
 				Report]</a>
+			<a style="font-size: 1.1rem; margin-left: 1rem;" target="_blank"
+				href="https://huggingface.co/datasets/MiniMaxAI/TTS-Multilingual-Test-Set">[Multilingual Test Set]</a>
+			<a style="font-size: 1.1rem; margin-left: 1rem;" target="_blank" href="https://github.com/MiniMax-AI">[GitHub]</a>
 		</p>
 	</header>
@@ -57,13 +59,16 @@
 			control
 			via LoRA; text to voice (T2V) by synthesizing timbre features directly from text description; and professional
 			voice
-			cloning (PVC) by fine-tuning timbre features with additional data. Welcome to visit
-			<a href="https://www.minimax.io/audio">MiniMax Audio</a> and
-			explore our powerful TTS features.
+			cloning (PVC) by fine-tuning timbre features with additional data.
 		</p>
 	</div>
 	<nav role="navigation" class="toc">
+		<h2>Explore MiniMax-Speech</h2>
+		<p>Welcome to visit
+			<a href="https://www.minimax.io/audio">MiniMax Audio</a> and
+			explore our powerful TTS features.
+		</p>
 		<h2>Contents</h2>
 		<ol>
 			<li>
@@ -232,9 +237,8 @@
 					features based
 					on the text content, whereas OneShot adheres more strictly to the speaker characteristics (prosody, speech
 					rate,
-					emotions, etc.) demonstrated in the audio prompt (The additional input that OneShot has compared to ZeroShot,
-					see
-					technical report for details).
+					emotions, etc.). For details of Zero-Shot and One-Shot, refer to the <a
+						href="https://arxiv.org/abs/2505.07916" target="_blank">technical report</a>.
 				</p>
 				<div class="scroll-wrapper" style="margin-top: 2rem;">
 					<table style="width: 100%;">