First project of 2025: Vision Transformer Explorer
I built a web app to interactively explore the self-attention maps produced by ViTs. These maps reveal what the model focuses on when making predictions, offering insight into its inner workings!
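For a flavor of what "exploring attention maps" means computationally: take one layer's attention tensor, average it over heads, and read off the CLS-token row to get a patch-level saliency grid. Here is a minimal TypeScript sketch of that reduction; the flat-array layout and shapes are illustrative assumptions, not the app's actual code:

```ts
// Turn one ViT attention layer into a CLS-token heatmap over image patches.
// Assumed layout: `attn` is a flat, row-major Float32Array of shape
// [heads, tokens, tokens], where token 0 is CLS and tokens 1..N are patches.
function clsAttentionHeatmap(
  attn: Float32Array,
  numHeads: number,
  numTokens: number, // 1 CLS token + gridSize * gridSize patches
): number[][] {
  const gridSize = Math.sqrt(numTokens - 1); // e.g. 14 for 224px input, 16px patches
  const clsRow = new Float64Array(numTokens);

  // Average the CLS-token attention row (row 0 of each head's matrix) across heads.
  for (let h = 0; h < numHeads; ++h) {
    const headOffset = h * numTokens * numTokens; // start of this head's matrix
    for (let t = 0; t < numTokens; ++t) {
      clsRow[t] += attn[headOffset + t] / numHeads;
    }
  }

  // Drop the CLS->CLS entry and reshape the patch scores into a 2D grid.
  const heatmap: number[][] = [];
  for (let r = 0; r < gridSize; ++r) {
    const row: number[] = [];
    for (let c = 0; c < gridSize; ++c) {
      row.push(clsRow[1 + r * gridSize + c]);
    }
    heatmap.push(row);
  }
  return heatmap;
}
```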
Introducing Moonshine Web: real-time speech recognition running 100% locally in your browser!
- Faster and more accurate than Whisper
- Privacy-focused (no data leaves your device)
- WebGPU accelerated (with WASM fallback)
- Powered by ONNX Runtime Web and Transformers.js
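In Transformers.js terms, an app like this boils down to an automatic-speech-recognition pipeline pointed at a Moonshine checkpoint. A minimal sketch; the model ID below is my assumption, so check the Hugging Face Hub for the actual ONNX conversion:

```ts
import { pipeline } from "@huggingface/transformers";

// Load a speech-recognition pipeline on the GPU via WebGPU.
// NOTE: "onnx-community/moonshine-tiny-ONNX" is an assumed model ID.
const transcriber = await pipeline(
  "automatic-speech-recognition",
  "onnx-community/moonshine-tiny-ONNX",
  { device: "webgpu" }, // per the post, the app uses WASM as a fallback
);

// Transcribe audio (a URL/path to an audio file, or a Float32Array of samples).
const output = await transcriber("audio.wav");
console.log(output); // { text: "..." }
```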
Introducing TTS WebGPU: the first-ever text-to-speech web app built with WebGPU acceleration! High-quality, natural speech generation that runs 100% locally in your browser, powered by OuteTTS and Transformers.js. Try it out yourself!
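The Transformers.js side of browser TTS follows the same pipeline pattern. A sketch using the generic text-to-speech task with the SpeechT5 checkpoint from the Transformers.js docs; note this is not the OuteTTS model the app actually uses, whose setup differs:

```ts
import { pipeline } from "@huggingface/transformers";

// Generic text-to-speech pipeline (SpeechT5 example from the docs;
// the TTS WebGPU app itself is powered by OuteTTS instead).
const synthesizer = await pipeline("text-to-speech", "Xenova/speecht5_tts");

// SpeechT5 conditions on a speaker embedding to select a voice.
const speaker_embeddings =
  "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/speaker_embeddings.bin";

const result = await synthesizer("Hello, my dog is cute", { speaker_embeddings });
console.log(result); // { audio: Float32Array, sampling_rate: 16000 }
```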
We just released Transformers.js v3.1 and you're not going to believe what's now possible in the browser with WebGPU! Let's take a look:
- Janus from DeepSeek for unified multimodal understanding and generation (text-to-image and image-text-to-text)
- Qwen2-VL from Qwen for dynamic-resolution image understanding
- JinaCLIP from Jina AI for general-purpose multilingual multimodal embeddings
- LLaVA-OneVision from ByteDance for image-text-to-text generation
- ViTPose for pose estimation
- MGP-STR for optical character recognition (OCR)
- PatchTST and PatchTSMixer for time series forecasting
That's right, everything runs 100% locally in your browser (no data sent to a server)! Huge for privacy!
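To give a flavor of the v3.1 API, here is a sketch of dynamic-resolution image understanding with Qwen2-VL, adapted from the release-notes pattern; treat the model ID and exact class names as assumptions to verify against the current docs:

```ts
import {
  AutoProcessor,
  Qwen2VLForConditionalGeneration,
  RawImage,
} from "@huggingface/transformers";

// Assumed model ID for an ONNX conversion of Qwen2-VL-2B-Instruct.
const model_id = "onnx-community/Qwen2-VL-2B-Instruct";
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await Qwen2VLForConditionalGeneration.from_pretrained(model_id);

// Build a chat-style prompt containing an image placeholder plus a question.
const conversation = [
  {
    role: "user",
    content: [
      { type: "image" },
      { type: "text", text: "Describe this image." },
    ],
  },
];
const text = processor.apply_chat_template(conversation, {
  add_generation_prompt: true,
});

// Load the image (placeholder URL) and prepare model inputs.
const image = await RawImage.read("https://example.com/cat.jpg");
const inputs = await processor(text, image);

// Generate, then decode only the newly produced tokens.
const output_ids = await model.generate({ ...inputs, max_new_tokens: 128 });
const decoded = processor.batch_decode(
  output_ids.slice(null, [inputs.input_ids.dims.at(-1), null]),
  { skip_special_tokens: true },
);
console.log(decoded[0]);
```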
Have you tried out Transformers.js v3? Here are the new features:
- WebGPU support (up to 100x faster than WASM)
- New quantization formats (dtypes)
- 120 supported architectures in total
- 25 new example projects and templates
- Over 1200 pre-converted models
- Node.js (ESM + CJS), Deno, and Bun compatibility
- A new home on GitHub and NPM
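The two headline features, WebGPU and dtypes, both surface as options on the pipeline factory. A short sketch using the embedding model from the v3 announcement:

```ts
import { pipeline } from "@huggingface/transformers";

// Run a feature-extraction (embedding) pipeline on WebGPU with quantized weights.
const extractor = await pipeline(
  "feature-extraction",
  "mixedbread-ai/mxbai-embed-xsmall-v1",
  {
    device: "webgpu", // new in v3; omit (or use "wasm") for the CPU path
    dtype: "q8",      // one of the new quantization formats, e.g. "fp16", "q8", "q4"
  },
);

// Compute mean-pooled, normalized sentence embeddings.
const embeddings = await extractor(
  ["Hello world!", "This is an example sentence."],
  { pooling: "mean", normalize: true },
);
console.log(embeddings.tolist());
```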