Apply for community grant: Academic project (gpu)

#1
by ASLP-lab - opened

We have developed OSUM (Open Speech Understanding Model), an approach to building Speech Understanding Language Models (SULMs) with limited academic resources. OSUM aims to bridge the gap between industry-scale models and academic research by providing a transparent and efficient framework for training SULMs. It is trained on tens of thousands of hours of multi-task data using a multi-stage training process, producing a model capable of multi-level audio understanding. We see this model as a foundation for more comprehensive audio understanding capabilities.

We would like to host a public demo page where people can try OSUM and see its performance for themselves. Inference requires approximately 18 GB of GPU memory, since the base model is a 7B-parameter LLM. For more details, please refer to the paper: https://arxiv.org/abs/2501.13306.
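For a rough sense of where the ~18 GB figure comes from, here is a back-of-the-envelope sketch. It assumes the weights are held in 16-bit precision (2 bytes per parameter), which the post does not state. The helper name `estimate_weight_memory_gb` is purely illustrative, and the breakdown of the remaining memory is a guess, not a measurement.

```python
def estimate_weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed for the model weights alone, in GB (10^9 bytes).

    bytes_per_param=2 assumes fp16/bf16 weights (an assumption, not from the post).
    """
    return num_params * bytes_per_param / 1e9


# 7B-parameter base LLM, as stated in the post.
weights_gb = estimate_weight_memory_gb(7e9)

# Total reported in the post; the difference is whatever is not LLM weights
# (presumably the speech encoder, activations, and KV cache).
reported_total_gb = 18.0
overhead_gb = reported_total_gb - weights_gb

print(f"LLM weights: {weights_gb:.1f} GB, remaining: {overhead_gb:.1f} GB")
```

Under that assumption the LLM weights alone account for about 14 GB, leaving roughly 4 GB for everything else.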

Another contribution of this project is the comprehensive open-sourcing of training and inference code to facilitate further development in this field. The official code repository can be found at: https://github.com/ASLP-lab/OSUM.

Hi @ASLP-lab, we've assigned ZeroGPU to this Space. Please check the compatibility and usage sections of this page so your Space can run on ZeroGPU.
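As a minimal sketch of what ZeroGPU usage typically looks like: the GPU-bound function is wrapped with the `spaces.GPU` decorator from the `spaces` package. The function `run_osum` and its body are hypothetical placeholders, not OSUM's actual inference code, and the import fallback is only there so the snippet also runs outside a Space.

```python
try:
    import spaces  # Hugging Face package used by ZeroGPU Spaces
    gpu = spaces.GPU
except ImportError:
    # Fallback for local runs where `spaces` is not installed: a no-op decorator.
    def gpu(fn):
        return fn


@gpu  # on ZeroGPU, a GPU is attached only while this function runs
def run_osum(audio_path: str, prompt: str) -> str:
    # Hypothetical placeholder: the real Space would call the OSUM model here.
    return f"[OSUM output for {audio_path!r} with prompt {prompt!r}]"


print(run_osum("sample.wav", "Transcribe the audio."))
```

The real model loading and inference call would replace the placeholder body, and the decorated function would be the one wired to the demo UI.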
