<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Audio Recording and Translation</title>
    <link rel="stylesheet" href="styles.css">
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.4/css/all.min.css">
    <link rel="stylesheet" href="https://fonts.googleapis.com/css2?family=Roboto:wght@400;500;700&display=swap">
</head>
<body>
    <div class="container">
        <header>
            <h1>Seamless Speech-to-Speech Translation with Voice Replication (S3TVR)</h1>
            <p class="description">S3TVR is a cascaded AI framework for real-time speech-to-speech translation that preserves the speaker's voice characteristics in a zero-shot fashion. The project balances latency against output quality, focuses on English and Spanish, and combines multiple open-source models and algorithms. The system is optimized for local execution, with an average latency of about 3 seconds per sentence. For the optimized version, see the GitHub repo linked below.</p>
            <p class="description">NOTE: Local execution is streamed and fully optimized (unlike this demo).</p>
            <div class="links">
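                <a href="https://github.com/yalsaffar/S3TVR" target="_blank" rel="noopener noreferrer" aria-label="GitHub repository"><i class="fab fa-github"></i></a>
                <a href="https://yousifalsaffar.com/" target="_blank" rel="noopener noreferrer" aria-label="Personal website"><i class="fas fa-globe"></i></a>
                <a href="https://www.linkedin.com/in/yousif-alsaffar-7621b5142/" target="_blank" rel="noopener noreferrer" aria-label="LinkedIn profile"><i class="fab fa-linkedin"></i></a>
                <a href="https://huggingface.co/yalsaffar" target="_blank" rel="noopener noreferrer" aria-label="Hugging Face profile"><i class="fas fa-robot"></i></a>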
            </div>
        </header>
        <div class="circle-button" id="record">
            <i class="fas fa-microphone"></i>
        </div>
        <p id="label">Press and hold until the sentence is no longer red</p>
        <p id="status"> </p>
        <div id="transcription" class="text-output"></div>
        <div id="translation" class="text-output"></div>
        <audio id="audio" controls></audio>
    </div>
    <script src="app.js"></script>
</body>
</html>