NTU Speech Processing & Machine Learning Lab

university

https://twitter.com/ntu_spml

AI & ML interests

Speech Processing, Self-Supervised Learning, ASR, TTS, Voice Conversion, Spoken Question Answering

Recent Activity

dcml0714

authored 2 papers 27 days ago

DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment

Paper • 2507.02768 • Published Jul 3 • 3

STITCH: Simultaneous Thinking and Talking with Chunked Reasoning for Spoken Language Models

Paper • 2507.15375 • Published 28 days ago • 25

kehanlu

authored 2 papers about 1 month ago

DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment

Paper • 2507.02768 • Published Jul 3 • 3

Analyzing Mitigation Strategies for Catastrophic Forgetting in End-to-End Training of Spoken Language Models

Paper • 2505.17496 • Published May 23

vectominist

authored 3 papers about 2 months ago

DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT

Paper • 2110.01900 • Published Oct 5, 2021

SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model

Paper • 2210.00705 • Published Oct 3, 2022

USAD: Universal Speech and Audio Representation via Distillation

Paper • 2506.18843 • Published Jun 23 • 11

andybi7676

authored a paper 2 months ago

A Self-Refining Framework for Enhancing ASR Using TTS-Synthesized Data

Paper • 2506.11130 • Published Jun 10 • 6

dcml0714

authored a paper 2 months ago

Audio-Aware Large Language Models as Judges for Speaking Styles

Paper • 2506.05984 • Published Jun 6 • 15

kehanlu

authored 4 papers 3 months ago

Investigating Zero-Shot Generalizability on Mandarin-English Code-Switched ASR and Speech-to-text Translation of Recent Foundation Models with Self-Supervision and Weak Supervision

Paper • 2401.00273 • Published Dec 30, 2023

A context-aware knowledge transferring strategy for CTC-based ASR

Paper • 2210.06244 • Published Oct 12, 2022

Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data

Paper • 2409.20007 • Published Sep 30, 2024 • 1

Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks

Paper • 2411.05361 • Published Nov 8, 2024 • 1

Splend1dchan

authored 7 papers 3 months ago

Extending the Pre-Training of BLOOM for Improved Support of Traditional Chinese: Models, Methods and Results

Paper • 2303.04715 • Published Mar 8, 2023

Advancing the Evaluation of Traditional Chinese Language Models: Towards a Comprehensive Benchmark Suite

Paper • 2309.08448 • Published Sep 15, 2023

Breeze-7B Technical Report

Paper • 2403.02712 • Published Mar 5, 2024

Let's Fuse Step by Step: A Generative Fusion Decoding Algorithm with LLMs for Multi-modal Text Recognition

Paper • 2405.14259 • Published May 23, 2024 • 2

Enhancing Function-Calling Capabilities in LLMs: Strategies for Prompt Formats, Data Integration, and Multilingual Translation

Paper • 2412.01130 • Published Dec 2, 2024 • 1

The Breeze 2 Herd of Models: Traditional Chinese LLMs Based on Llama with Vision-Aware and Function-Calling Capabilities

Paper • 2501.13921 • Published Jan 23 • 3

BreezyVoice: Adapting TTS for Taiwanese Mandarin with Enhanced Polyphone Disambiguation -- Challenges and Insights

Paper • 2501.17790 • Published Jan 29 • 3

Team members: 9
