Yifan Peng

pyf98

https://pyf98.github.io

AI & ML interests

Multimodal LLMs, Speech-to-Speech, Speech Recognition

Recent Activity

liked a dataset 5 days ago

nvidia/Llama-Nemotron-Post-Training-Dataset

new activity 15 days ago

nvidia/Nemotron-H-8B-Reasoning-128K:Errors in HybridMambaAttentionDynamicCache

upvoted an article about 2 months ago

Gotchas in Tokenizer Behavior Every Developer Should Know

upvoted an article about 2 months ago

Article

Gotchas in Tokenizer Behavior Every Developer Should Know

Apr 18 • 40

upvoted a collection about 2 months ago

OLMo 2

Artifacts for the OLMo 2 release. • 35 items • Updated May 1 • 138

upvoted a paper 3 months ago

OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning

Paper • 2506.00338 • Published May 31 • 10

upvoted 2 collections 6 months ago

OWSM-CTC: Ultra-Fast Speech Foundation Models

CTC-based models from the OWSM project, designed for fast non-autoregressive inference: https://www.wavlab.org/activities/2024/owsm/ • 2 items • Updated Mar 8 • 1

OWSM: Fully Open Speech Recognition and Translation Models

A collection of models related to the Open Whisper-style Speech Models (OWSM) project from CMU: https://www.wavlab.org/activities/2024/owsm/ • 21 items • Updated Mar 8 • 2

upvoted a paper 6 months ago

E-Branchformer: Branchformer with Enhanced merging for speech recognition

Paper • 2210.00077 • Published Sep 30, 2022 • 2

upvoted a collection 12 months ago

Open Whisper-style Speech Models (OWSM)

Fully open Whisper-style speech foundation models developed by CMU WAVLab: https://www.wavlab.org/activities/2024/owsm/ • 21 items • Updated Jun 3 • 6

upvoted a collection about 1 year ago

Magpie-Llama3.1 Datasets

Dataset built with Meta Llama 3.1 70B. • 6 items • Updated Jan 13 • 4

upvoted 2 papers over 1 year ago

OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification

Paper • 2402.12654 • Published Feb 20, 2024 • 1

OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer

Paper • 2401.16658 • Published Jan 30, 2024 • 14