File size: 12,679 Bytes
6f25f68 66fa322 6f25f68 66fa322 6f25f68 66fa322 6f25f68 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 |
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Anachrovox</title>
<!-- Cloud assets -->
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Roboto:ital,wght@0,100;0,300;0,400;0,500;0,700;0,900;1,100;1,300;1,400;1,500;1,700;1,900&family=Meie+Script&family=Lekton:ital,wght@0,400&display=swap" rel="stylesheet">
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/ort.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/taproot.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/hey-buddy.min.js"></script>
<!-- Local assets -->
<link rel="stylesheet" href="index.css">
</head>
<body>
<main>
<section id="transcription">
<div id="bezel"></div>
<div class="reflection dodge"></div>
<div class="reflection"></div>
<div id="content">
<div id="welcome">Welcome to Anachrovox. Speak your command to 'Vox' or type it in below.</div>
<div id="history"></div>
<div id="input">
<textarea id="prompt" rows=3></textarea>
</div>
</div>
</section>
<section id="waveform">
<div class="circle-reflection dodge"></div>
<div class="circle-reflection"></div>
<div class="oscilloscope-grid"></div>
<canvas width=256 height=256></canvas>
<div class="circle-shading"></div>
</section>
<section id="voice">
<input
id="voice-id"
class="solari-board"
type="text"
value=""
maxlength="7"
/>
<input
id="voice-id-wheel",
class="wheel"
type="button"
/>
<input
id="volume"
class="dial"
type="number"
min="0"
max="1"
step="0.01"
value="1.0"
data-angle-start="-45"
data-angle-end="45"
data-labels='{"0":"Off", "1":"Max"}'
/>
<input
id="speed"
class="dial"
type="number"
min="0.75"
max="1.75"
step="0.01"
value="1.25"
data-angle-start="-90"
data-angle-end="90"
data-labels='{"0.75":"Slw", "1.25":"Nrm", "1.75":"Fst"}'
/>
</section>
<section id="brain">
<input
id="top-k-display"
type="number"
class="seven-segment"
min=1
max=100
value=40
/>
<input
id="top-k"
class="dial decoupled"
type="number"
min=1
max=100
step=1
value=40
/>
<input
id="min-p"
class="slider"
type="number"
min=0
max=1
step=0.01
value=0.5
data-labels=6
/>
<input
id="top-p"
class="slider"
type="number"
min=0
max=1
step=0.01
value=0.95
data-labels=6
/>
<input
id="temperature"
class="slider"
type="number"
min=0
max=1
step=0.01
value=0.20
data-labels=6
/>
</section>
<section id="microphone">
<div id="listen-button">
<button id="listen"></button>
<label>Call</label>
</div>
<div id="speaker"></div>
<div id="power-switch">
<input
id="power-switch-input"
type="checkbox"
class="power"
checked
/>
</div>
<div class="bulb-indicator" id="recording"><label>RECORD</label></div>
<div class="bulb-indicator" id="listening"><label>LISTEN</label></div>
<div class="bulb-indicator" id="power"><label>POWER</label></div>
</section>
<section id="logo">
Anachrovox
</section>
<section id="credits">
Released under the <a href="https://www.apache.org/licenses/LICENSE-2.0.txt" target="_blank">Apache License v2.0</a> in 2024 by <a href="mailto:[email protected]">Benjamin Paine</a>.
<div class="smaller">Anachrovox can make mistakes. Check all outputs.</div>
</section>
</main>
<aside id="info">
<button id="info-button">i</button>
<div id="info-content">
<h1 id="info-logo">Anachrovox</h1>
<h2 id="info-version">Alpha Release v0.1.3</h2>
<h3>Instructions</h3>
<p>To issue a voice command in a hands-free fashion, start your command with one of the supported wake phrases, then issue your command naturally (i.e. you do not need to pause.) There are many supported wake phrases but they all include 'Vox' - for example, 'Hey Vox', 'Hi Vox', or just 'Vox'.</p>
<p>There are numerous ways to shortcut the speech-to-speech workflow:</p>
<ul>
<li>Turning the volume all the way down will disable the text-to-speech step, effectively creating a text-only mode.</li>
<li>Directly type your query into the text box and press enter to issue commands without needing to speak.</li>
<li>Press and hold the call button to issue a voice command without needing to use a wake phrase.</li>
</ul>
<h3>About</h3>
<p>Anachrovox is a real-time voice assistant that combines two of my other projects:</p>
<ol>
<li><a href="https://github.com/painebenjamin/taproot" href="_blank">Taproot</a>, a scalable and lightning-fast task-centric backend inference engine, enabling easy installation and deployment of models for speech recognition, natural language understanding, and text-to-speech synthesis.</li>
<li><a href="https://github.com/painebenjamin/hey-buddy" href="_blank">Hey Buddy</a>, a real-time in-browser audio wake-word detector and training library to listen for wake phrases to trigger the assistant, enabling hands-free, always-on voice interaction without the need to use the backend until specifically requested.</li>
</ol>
<p>This is an <strong>alpha</strong> release of both Anachrovox and the underlying libraries, so bugs in all aspects of operation are expected. As it uses a large language model at it's heart, you should take the same precautions as you would with any other language model, such as not using it for sensitive tasks and not trusting it's output as fact without verification.</p>
<h3>High Availability</h3>
<p>In addition to Taproot's low-overhead design, it is designed to be clustered and highly available. This means that you can run multiple instances of Anachrovox and they will automatically load-balance and fail-over between each other, ensuring that the assistant is always available and responsive. You will need to perform some networking and configuration to enable this, see the GitHub repository for more information. If you are using Anachrovox on one of the official HuggingFace spaces, this is happening automatically between them.</p>
<h3>Features</h3>
<p>The main feature of Anachrovox is real-time, hands-free speech-to-speech large language model interaction. This is achieved through the following components, all of which are open-source and available for use in your own projects:</p>
<ol>
<li>Wake-word detection using <a href="https://github.com/painebenjamin/hey-buddy" target="_blank">Hey Buddy</a></li>
<li>Speech-to-text using <a href="https://huggingface.co/distil-whisper/distil-large-v3" target="_blank">Distil-Whisper Large</a></li>
<li>Text generation using <a href="https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct" target="_blank">Llama 3.2 3B Instruct</a></li>
<li>Text-to-speech using <a href="https://huggingface.co/hexgrad/Kokoro-82M" target="_blank">Kokoro</a></li>
<li>Audio enhancement using <a href="https://github.com/Rikorose/DeepFilterNet" target="_blank">DeepFilterNet</a></li>
</ol>
<p>All backend models are ran through Taproot, which is made to be as low-overhead as possible, allowing for real-time operation on consumer hardware. These are just a small selection of the supported model set, but were chosen for their balance of speed, size and capability. Visit <a href="https://github.com/painebenjamin/anachrovox" target="_blank">the Anachrovox GitHub</a> to see how to build with different supported components and/or visit <a href="https://github.com/painebenjamin/taproot" target="_blank">the Taproot GitHub</a> to request a new supported component or learn how to build your own (and hopefully contribute it back to the community!)</p>
<p>To improve the usefulness of the assistant, the following tools are available to use. These are invoked conversationally, either when you ask directly or sometimes when the assistant thinks it can help.</p>
<ul>
<li>DuckDuckGo news headlines, blurb search, and deep-dive.</li>
<li>Wikipedia search</li>
<li>Fandom (formerly Wikia) search</li>
<li>Date and time by timezone or location</li>
<li>Current and forecasted weather by location</li>
</ul>
<h3>Citations</h3>
<cite>@misc{gandhi2023distilwhisper,
title = {Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling},
author = {Sanchit Gandhi and Patrick von Platen and Alexander M. Rush},
year = {2023},
eprint = {2311.00430},
archivePrefix = {arXiv},
primaryClass = {cs.CL}
}</cite>
<cite>@misc{dubey2024llama3herdmodels,
title = {The Llama 3 Herd of Models},
author = {Llama Team, AI @ Meta},
year = {2024}
eprint = {2407.21783},
archivePrefix = {arXiv},
primaryClass = {cs.AI},
url = {https://arxiv.org/abs/2407.21783}
}</cite>
<cite>@misc{kokoro82m,
title = {Kokoro-82M}
author = {@rzvzn}
year = {2024}
url = {https://huggingface.co/hexgrad/Kokoro-82M}
}</cite>
<cite>@inproceedings{schroeter2023deepfilternet3,
title = {{DeepFilterNet}: Perceptually Motivated Real-Time Speech Enhancement},
author = {Schröter, Hendrik and Rosenkranz, Tobias and Escalante-B., Alberto N. and Maier, Andreas},
booktitle = {INTERSPEECH},
year = {2023},
}</cite>
</div>
</aside>
<a id="github" href="https://github.com/painebenjamin/anachrovox" target="_blank" title="GitHub"><img src="static/github.svg" alt="GitHub"></a>
</body>
<script type="module" src="inputs.js"></script>
<script type="module" src="index.js"></script>
</html>
|