<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Speech-to-Speech Model Comparison</title>
    <link href="https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css" rel="stylesheet">
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.0.0-beta3/css/all.min.css">
    <style>
        body {
            background-color: #f0f8ff;
            font-family: 'Arial', sans-serif;
        }

        .container {
            background-color: #fff;
            border-radius: 15px;
            box-shadow: 0 6px 15px rgba(0, 0, 0, 0.15);
            padding: 40px;
            max-width: 800px;
            margin: 30px auto;
        }

        h3 {
            font-size: 2rem;
            font-weight: bold;
            color: #333;
            text-align: center;
            margin-bottom: 20px;
        }

        p {
            color: #555;
            font-size: 1rem;
            line-height: 1.8;
        }

        .btn {
            border-radius: 25px;
            font-size: 1.1rem;
            padding: 12px 25px;
            font-weight: bold;
            transition: background-color 0.3s ease, transform 0.2s ease;
        }

        .btn-primary {
            background-color: #007bff;
            border: none;
        }

        .btn-primary:hover {
            background-color: #0056b3;
            transform: scale(1.05);
        }

        .icon {
            color: #f39c12;
            margin-right: 5px;
        }

        .section-title {
            font-size: 1.2rem;
            font-weight: bold;
            color: #007bff;
            display: flex;
            align-items: center;
            margin-top: 20px;
        }

        .section-title .fas {
            margin-right: 10px;
        }

        .audio-container {
            text-align: center;
            margin-top: 20px;
        }

        .audio-container .audio-item {
            display: flex;
            justify-content: center;
            align-items: center;
            margin-bottom: 15px;
        }

        .audio-container .audio-item span {
            margin-right: 10px;
            font-weight: bold;
        }

        audio {
            display: inline-block;
        }
    </style>
</head>

<body>
    <div class="container py-5">
        <h3 class="mb-4">⚖️ Speech-to-Speech Model Comparison</h3>

        <div id="evaluation-info" class="mb-5">
            <p class="text-start">
                <span class="section-title"><i class="fas fa-info-circle"></i> Welcome to the Speech-to-Speech (S2S)
                    Model Evaluation! 👏</span>
                In this evaluation, you will assess the performance of different S2S models, such as
                <strong>ChatGPT-4o</strong>, <strong>FunAudioLLM</strong>, <strong>SpeechGPT</strong>,
                <strong>Mini-Omni</strong>, <strong>Cascade</strong>, and <strong>LLaMA-Omni</strong>.
                <br>
                <span>🎯 <strong>Goal:</strong> Test how well these models handle speech tasks across different domains.</span>
                <span class="section-title"><i class="fas fa-tasks"></i> How It Works</span>
                Once you select a specific domain and task (e.g., <em>Educational Tutoring</em> and <em>Rhythm
                    Control</em>),
                you will proceed to the evaluation stage. In each round, you will be presented with an audio input.
                <br>
                <span><strong>🌰 Example:</strong></span>
            </p>

            <div class="audio-container">
                <div class="audio-item">
                    <span>Audio Sample:</span>
                    <audio controls>
                        <source src="/static/audio/sample/input_audio.wav" type="audio/wav">
                    </audio>
                </div>
            </div>
            The corresponding text is:
            <em>"Say the following sentence at my speed first, then say it again very slowly:
                'Artificial intelligence is changing the world in many ways.'"</em> 🧠
            <small>(Note: the audio plays at 1.5x normal speed.)</small>

            <span class="section-title"><i class="fas fa-star"></i> Model Performance</span>

            <div class="audio-container">
                <div class="audio-item">
                    <span>ChatGPT-4o:</span>
                    <audio controls>
                        <source src="/static/audio/sample/4o_audio.wav" type="audio/wav">
                    </audio>
                </div>
                <p style="margin: 0; text-align: left;">
                    🎙️ <strong>Speech:</strong> Partially followed the instruction on speed.
                </p>
                <p style="margin: 0; text-align: left;">
                    🧾 <strong>Semantics:</strong> Accurately followed the instruction, with no semantic deviation or
                    missing information.
                </p>
                <br>

                <div class="audio-item">
                    <span>FunAudioLLM:</span>
                    <audio controls>
                        <source src="/static/audio/sample/FunAudio_audio.wav" type="audio/wav">
                    </audio>
                </div>
                <p style="margin: 0; text-align: left;">
                    🎙️ <strong>Speech:</strong> Partially followed the instruction on speed.
                </p>
                <p style="margin: 0; text-align: left;">
                    🧾 <strong>Semantics:</strong> Accurately followed the instruction, with no semantic deviation or
                    missing information.
                </p>
                <br>

                <div class="audio-item">
                    <span>SpeechGPT:</span>
                    <audio controls>
                        <source src="/static/audio/sample/SpeechGPT.wav" type="audio/wav">
                    </audio>
                </div>
                <p style="margin: 0; text-align: left;">
                    🎙️ <strong>Speech:</strong> Did not follow the instruction on speed.
                </p>
                <p style="margin: 0; text-align: left;">
                    🧾 <strong>Semantics:</strong> Partially followed the instruction, with minor semantic deviation and
                    missing information.
                </p>

                <br>

                <div class="audio-item">
                    <span>Mini-Omni:</span>
                    <audio controls>
                        <source src="/static/audio/sample/mini-omni.wav" type="audio/wav">
                    </audio>
                </div>
                <p style="margin: 0; text-align: left;">
                    🎙️ <strong>Speech:</strong> Did not follow the instruction on speed.
                </p>
                <p style="margin: 0; text-align: left;">
                    🧾 <strong>Semantics:</strong> Did not follow the instruction, with significant semantic deviation
                    and missing information.
                </p>

            </div>

            <p class="text-start">
                After making your choice, you'll proceed to the next round. 🔄
            </p>
            <p class="text-start">
            <strong>Click the button below to start the evaluation! 🚀</strong>
            </p>
        </div>

        <div class="text-center">
            <a href="http://71.132.14.167:6002/" target="_blank" rel="noopener noreferrer" class="btn btn-primary"><i class="fas fa-play"></i>
                Start Evaluation</a>
        </div>
    </div>
</body>

</html>