<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Speech-to-Speech Model Comparison</title>
<link href="https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css" rel="stylesheet">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.0.0-beta3/css/all.min.css">
<style>
body {
background-color: #f0f8ff;
font-family: 'Arial', sans-serif;
}
.container {
background-color: #fff;
border-radius: 15px;
box-shadow: 0 6px 15px rgba(0, 0, 0, 0.15);
padding: 40px;
max-width: 800px;
margin: 30px auto;
}
h3 {
font-size: 2rem;
font-weight: bold;
color: #333;
text-align: center;
margin-bottom: 20px;
}
p {
color: #555;
font-size: 1rem;
line-height: 1.8;
}
.btn {
border-radius: 25px;
font-size: 1.1rem;
padding: 12px 25px;
font-weight: bold;
transition: background-color 0.3s ease, transform 0.2s ease;
}
.btn-primary {
background-color: #007bff;
border: none;
}
.btn-primary:hover {
background-color: #0056b3;
transform: scale(1.05);
}
.icon {
color: #f39c12;
margin-right: 5px;
}
.section-title {
font-size: 1.2rem;
font-weight: bold;
color: #007bff;
display: flex;
align-items: center;
margin-top: 20px;
}
.section-title .fas {
margin-right: 10px;
}
.audio-container {
text-align: center;
margin-top: 20px;
}
.audio-container .audio-item {
display: flex;
justify-content: center;
align-items: center;
margin-bottom: 15px;
}
.audio-container .audio-item span {
margin-right: 10px;
font-weight: bold;
}
audio {
display: inline-block;
}
</style>
</head>
<body>
<div class="container py-5">
<h3 class="mb-4">⚖️ Speech-to-Speech Model Comparison</h3>
<div id="evaluation-info" class="mb-5">
<p class="text-start">
<span class="section-title"><i class="fas fa-info-circle"></i> Welcome to the Speech-to-Speech (S2S)
Model Evaluation! 👏</span>
In this evaluation, you will assess the performance of different S2S models, such as
<strong>ChatGPT-4o</strong>, <strong>FunAudioLLM</strong>, <strong>SpeechGPT</strong>,
<strong>Mini-Omni</strong>, <strong>Cascade</strong>, and <strong>LLaMA-Omni</strong>.
<br>
<span>🎯 <strong>Goal:</strong> Test how well these models handle speech tasks across different domains.</span>
<span class="section-title"><i class="fas fa-tasks"></i> How It Works</span>
Once you select a specific domain and task (e.g., <em>Educational Tutoring</em> and <em>Rhythm Control</em>),
you will proceed to the evaluation stage. In each round, you will be presented with an audio input.
<br>
<span><strong>🌰 Example:</strong></span>
</p>
<div class="audio-container">
<div class="audio-item">
<span>Audio Sample:</span>
<audio controls>
<source src="/static/audio/sample/input_audio.wav" type="audio/wav">
</audio>
</div>
</div>
<p class="text-start">
The corresponding text is:
<em>"Say the following sentence at my speed first, then say it again very slowly:
'Artificial intelligence is changing the world in many ways.'"</em> 🧠
<small>(Note: the audio plays at 1.5x the normal speed.)</small>
</p>
<span class="section-title"><i class="fas fa-star"></i> Model Performance</span>
<div class="audio-container">
<div class="audio-item">
<span>ChatGPT-4o:</span>
<audio controls>
<source src="/static/audio/sample/4o_audio.wav" type="audio/wav">
</audio>
</div>
<p style="margin: 0; text-align: left;">
🎙️ <strong>Speech:</strong> Partially followed the instruction on speed.
</p>
<p style="margin: 0; text-align: left;">
🧾 <strong>Semantics:</strong> Accurately followed the instruction, with no semantic deviation or
missing information.
</p>
<br>
<div class="audio-item">
<span>FunAudioLLM:</span>
<audio controls>
<source src="/static/audio/sample/FunAudio_audio.wav" type="audio/wav">
</audio>
</div>
<p style="margin: 0; text-align: left;">
🎙️ <strong>Speech:</strong> Partially followed the instruction on speed.
</p>
<p style="margin: 0; text-align: left;">
🧾 <strong>Semantics:</strong> Accurately followed the instruction, with no semantic deviation or
missing information.
</p>
<br>
<div class="audio-item">
<span>SpeechGPT:</span>
<audio controls>
<source src="/static/audio/sample/SpeechGPT.wav" type="audio/wav">
</audio>
</div>
<p style="margin: 0; text-align: left;">
🎙️ <strong>Speech:</strong> Did not follow the instruction on speed.
</p>
<p style="margin: 0; text-align: left;">
🧾 <strong>Semantics:</strong> Partially followed the instruction, with minor semantic deviation and
missing information.
</p>
<br>
<div class="audio-item">
<span>Mini-Omni:</span>
<audio controls>
<source src="/static/audio/sample/mini-omni.wav" type="audio/wav">
</audio>
</div>
<p style="margin: 0; text-align: left;">
🎙️ <strong>Speech:</strong> Did not follow the instruction on speed.
</p>
<p style="margin: 0; text-align: left;">
🧾 <strong>Semantics:</strong> Did not follow the instruction, with significant semantic deviation
and missing information.
</p>
</div>
<p class="text-start">
After making your choice, you'll proceed to the next round. 🔄
</p>
<p class="text-start">
<strong>Click the button below to start the evaluation! 🚀</strong>
</p>
</div>
<div class="text-center">
<a href="http://71.132.14.167:6002/" target="_blank" rel="noopener noreferrer" class="btn btn-primary"><i class="fas fa-play"></i>
Start Evaluation</a>
</div>
</div>
</body>
</html>