Donkey Small

DonkeySmall

AI & ML interests

None yet


Organizations

DonkeySmall's activity

liked a Space about 2 months ago

BiRefNet Demo (ZhengPeng7/BiRefNet_demo)

reacted to lorraine2's post with 👍 about 2 months ago


New NVIDIA paper: ⚡ Multi-student Diffusion Distillation for Better One-step Generators ⚡

Do you want to make your diffusion models (a) run in a single step, (b) run with a smaller model, and (c) have improved quality simultaneously? Check out our multi-student distillation (MSD) method, which is simple and applicable to most diffusion models! The only catch is that we now have to distill (and store) a mixture of expert student generators.

Explore the MSD project page to learn more: https://research.nvidia.com/labs/toronto-ai/MSD/

Work led by Yanke Song, along with Weili Nie, Karsten Kreis and James Lucas.

Check out more work from the Toronto AI Lab here: https://research.nvidia.com/labs/toronto-ai/


reacted to rwightman's post with 🚀 2 months ago


New MobileNetV4 weights were uploaded a few days ago -- more ImageNet-12k training at 384x384 for the speedy 'Conv Medium' models.

There are 3 weight variants here for those who like to tinker. On my hold-out eval they rank in the order below. The differences are small, but Adopt at 180 epochs lands closer to AdamW at 250 epochs than to AdamW at 180 epochs.
* AdamW for 250 epochs - https://huggingface.co/timm/mobilenetv4_conv_medium.e250_r384_in12k
* Adopt for 180 epochs - https://huggingface.co/timm/mobilenetv4_conv_medium.e180_ad_r384_in12k
* AdamW for 180 epochs - https://huggingface.co/timm/mobilenetv4_conv_medium.e180_r384_in12k

These were uploaded by request, after a user reported impressive results using the 'Conv Large' ImageNet-12k pretrains as object detection backbones. ImageNet-1k fine-tunes are pending; the weights do behave differently across 180 vs. 250 epochs and across the Adopt vs. AdamW optimizers.
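
For anyone who wants to try the three variants side by side, here is a minimal sketch of how they might be selected and loaded. The model ids come from the list above; the short keys (`adamw_e250` and so on) and the helper names `hub_url` and `load_variant` are hypothetical labels made up for this example. Loading assumes a `timm` release recent enough to include MobileNetV4, plus network access to download the weights.

```python
"""Sketch: pick one of the three MobileNetV4 'Conv Medium' checkpoints by name."""

HUB_BASE = "https://huggingface.co"

# Model ids as listed in the post. The short keys are hypothetical labels
# introduced only for this example.
VARIANTS = {
    "adamw_e250": "timm/mobilenetv4_conv_medium.e250_r384_in12k",
    "adopt_e180": "timm/mobilenetv4_conv_medium.e180_ad_r384_in12k",
    "adamw_e180": "timm/mobilenetv4_conv_medium.e180_r384_in12k",
}


def hub_url(model_id: str) -> str:
    """Return the Hugging Face Hub page URL for a model id."""
    return f"{HUB_BASE}/{model_id}"


def load_variant(key: str):
    """Load pretrained weights for one variant through timm's Hub integration.

    timm is imported lazily so the rest of this sketch runs without it.
    Calling this function downloads the checkpoint from the Hub.
    """
    import timm

    return timm.create_model(f"hf_hub:{VARIANTS[key]}", pretrained=True)


if __name__ == "__main__":
    # Print the Hub page for each variant; no weights are downloaded here.
    for key, model_id in VARIANTS.items():
        print(key, hub_url(model_id))
```

Calling `load_variant("adopt_e180")` would return the Adopt-trained model as a regular PyTorch module. Since these are ImageNet-12k pretrains, the classifier head covers the 12k label set rather than the usual 1k classes.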