Shizhe Diao

shizhediao

https://shizhediao.github.io/

shizhediao's activity

authored a paper about 1 month ago

Hymba: A Hybrid-head Architecture for Small Language Models

Paper • 2411.13676 • Published Nov 20 • 39

authored a paper 3 months ago

Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models

Paper • 2410.03290 • Published Oct 4 • 6

updated a dataset 3 months ago

Post-training-Data-Flywheel/function-calling-1.0

Updated Sep 20 • 39

updated a collection 4 months ago

flywheel

Collection • 2 items • Updated Aug 29

updated a Space 4 months ago

📊 README

Space • Running

upvoted a paper 4 months ago

LLM Pruning and Distillation in Practice: The Minitron Approach

Paper • 2408.11796 • Published Aug 21 • 57

updated a model 5 months ago

shizhediao/hf-lora

Updated Aug 4

liked a Space 5 months ago

🏃 Berkeley Function Calling Leaderboard

Space • Configuration error

liked a model 5 months ago

nvidia/Minitron-4B-Base

Updated Aug 22 • 54 • 127

upvoted a paper 5 months ago

Compact Language Models via Pruning and Knowledge Distillation

Paper • 2407.14679 • Published Jul 19 • 38

upvoted an article 5 months ago

SmolLM - blazingly fast and remarkably powerful

Article • Jul 16 • 292

liked a Space 5 months ago

💻 Merging Competition

Space • Running

upvoted a paper 6 months ago

TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts

Paper • 2407.03203 • Published Jul 3 • 11

New activity in shizhediao/lmflow-sft 6 months ago

Dataset Viewer issue: JobManagerCrashedError

#1 opened 6 months ago by shizhediao

liked a dataset 6 months ago

sablo/oasst2_curated

Viewer • Updated Jan 12 • 4.94k • 63 • 15

authored 3 papers 6 months ago

LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models

Paper • 2306.12420 • Published Jun 21, 2023 • 2

RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment

Paper • 2304.06767 • Published Apr 13, 2023 • 2

DetGPT: Detect What You Need via Reasoning

Paper • 2305.14167 • Published May 23, 2023