Baifeng Shi

bfshi

https://bfshi.github.io

AI & ML interests

computer vision

bfshi's activity

liked 2 models 7 days ago

Efficient-Large-Model/NVILA-15B

Updated 22 days ago • 14.3k • 1

Efficient-Large-Model/NVILA-8B

Updated 16 days ago • 1.07k • 3

upvoted a collection 7 days ago

NVILA

Collection

7 items • Updated 7 days ago • 3

liked a Space 10 days ago

Running

🏆

VILA

VILA Playground.

authored a paper 17 days ago

NVILA: Efficient Frontier Visual Language Models

Paper • 2412.04468 • Published 21 days ago • 54

upvoted a paper 20 days ago

NVILA: Efficient Frontier Visual Language Models

Paper • 2412.04468 • Published 21 days ago • 54

upvoted a paper about 2 months ago

Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Dataset

Paper • 2410.22325 • Published Oct 29 • 10

upvoted a paper 2 months ago

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

Paper • 2410.16268 • Published Oct 21 • 65

upvoted a paper 3 months ago

PHI-S: Distribution Balancing for Label-Free Multi-Teacher Distillation

Paper • 2410.01680 • Published Oct 2 • 32

upvoted 3 papers 4 months ago

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

Paper • 2408.13257 • Published Aug 23 • 25

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22 • 124

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Paper • 2408.10188 • Published Aug 19 • 51

upvoted 3 papers 5 months ago

upvoted a paper 6 months ago

OpenVLA: An Open-Source Vision-Language-Action Model

Paper • 2406.09246 • Published Jun 13 • 36

updated 3 models 8 months ago

bfshi/mm_projector

Updated May 3

bfshi/llava-v1.5-13b-s2-lora

Text Generation • Updated May 3 • 10 • 2

bfshi/llava-v1.5-7b-s2-lora

Text Generation • Updated May 3 • 71 • 1

authored a paper 9 months ago

When Do We Not Need Larger Vision Models?

Paper • 2403.13043 • Published Mar 19 • 25