lf
lfnothing
AI & ML interests
None yet
Recent Activity
reacted to KaiChen1998's post 16 days ago
Our EMOVA paper has been accepted by CVPR 2025, and we are glad to release all resources, including code (training & inference), datasets (training & evaluation), and checkpoints (EMOVA-3B/7B/72B)!
EMOVA is a novel end-to-end omni-modal LLM that can see, hear and speak. Given omni-modal (i.e., textual, visual and speech) inputs, EMOVA can generate both textual and speech responses with vivid emotional controls by utilizing the speech decoder and a style controller.
EMOVA Highlights
- State-of-the-art omni-modality: EMOVA achieves SoTA-comparable results on both vision-language and speech benchmarks simultaneously.
- Device adaptation: our codebase supports training/inference on both NVIDIA GPUs (e.g., A800 & H20) and Ascend NPUs (e.g., 910B3)!
- Modular design: we integrate multiple implementations of the vision encoder, vision projector, and language model, even including the most recent DeepSeekMoE-tiny!
You are all welcome to try and star!
- Project page: https://emova-ollm.github.io/
- GitHub: https://github.com/emova-ollm/EMOVA
- Demo: https://huggingface.co/spaces/Emova-ollm/EMOVA-demo
liked a dataset 5 months ago: Skylion007/openwebtext
liked a model 5 months ago: Salesforce/blip2-opt-2.7b
Organizations
None yet
Collections
1
Models
7
lfnothing/kapai-man • Text-to-Image • Updated • 4
lfnothing/audio-diffusion-electronic • Updated • 3
lfnothing/sd-class-butterflies-32 • Unconditional Image Generation • Updated • 5
lfnothing/whisper-small-dv • Automatic Speech Recognition • Updated • 12
lfnothing/opt-125m-gptq • Text Generation • Updated • 5
lfnothing/distilbert-base-uncased-finetuned-imdb • Updated
lfnothing/code-search-net-tokenizer • Updated