Diffusers Pipelines Library for Stable Diffusion


AI & ML interests: None defined yet.

Recent Activity


rizavelioglu 
posted an update about 1 month ago
Introducing Virtual Try-Off (VTOFF), a novel task focused on generating standardized garment images from single photos of clothed individuals. Unlike traditional Virtual Try-On (VTON), which digitally dresses models, VTOFF aims to extract a canonical garment image, posing unique challenges in capturing garment shape, texture, and intricate patterns.

Try it out: rizavelioglu/tryoffdiff
Paper: TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models (2411.18350)
Project page: https://rizavelioglu.github.io/tryoffdiff
pcuenq 
posted an update 9 months ago
OpenELM in Core ML

Apple recently released a set of efficient LLMs in sizes varying between 270M and 3B parameters. Their quality, according to benchmarks, is similar to OLMo models of comparable size, but they required half the pre-training tokens because they use layer-wise scaling, where the number of attention heads increases in deeper layers.

I converted these models to Core ML, for use on Apple Silicon, using this script: https://gist.github.com/pcuenca/23cd08443460bc90854e2a6f0f575084. The converted models were uploaded to this community on the Hub for anyone who wants to integrate them inside their apps: corenet-community/openelm-core-ml-6630c6b19268a5d878cfd194

The conversion was done with the following parameters (a rough sketch of what such a conversion call looks like is shown after the list):
- Precision: float32.
- Sequence length: fixed to 128.
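
For illustration, here is a minimal sketch of what such a conversion can look like with coremltools. It is not the exact gist linked above; the model id, the tracing wrapper, and the deployment target are assumptions.

```python
# A minimal sketch of a float32, fixed-length Core ML conversion along the
# lines described above. This is NOT the exact gist linked in the post; the
# model id and the tracing wrapper are assumptions.
import numpy as np
import torch
import coremltools as ct
from transformers import AutoModelForCausalLM

model_id = "apple/OpenELM-270M"  # assumed checkpoint
seq_len = 128                    # fixed sequence length, as in the post

hf_model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.float32
)
hf_model.eval()
hf_model.config.use_cache = False  # keep the traced graph simple (no kv cache)

class Wrapper(torch.nn.Module):
    """Return plain logits so the model can be traced with torch.jit."""
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, input_ids):
        return self.model(input_ids=input_ids).logits

example_input = torch.zeros((1, seq_len), dtype=torch.int64)
traced = torch.jit.trace(Wrapper(hf_model), example_input)

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="input_ids", shape=(1, seq_len), dtype=np.int32)],
    outputs=[ct.TensorType(name="logits")],
    compute_precision=ct.precision.FLOAT32,  # float16 still shows precision issues
    minimum_deployment_target=ct.target.macOS14,
)
mlmodel.save("OpenELM-270M.mlpackage")
```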

With swift-transformers (https://github.com/huggingface/swift-transformers), I'm getting about 56 tok/s with the 270M on my M1 Max, and about 6.5 tok/s with the largest 3B model. These speeds could be improved by converting to float16. However, there's some precision loss somewhere and generation doesn't work in float16 mode yet. I'm looking into this and will keep you posted! Or take a look at this issue if you'd like to help: https://github.com/huggingface/swift-transformers/issues/95

I'm also looking at optimizing inference using an experimental kv cache in swift-transformers. It's a bit tricky because the layers have a varying number of attention heads, but I'm curious to see how much this feature can speed up inference in this model family :)

Regarding the instruct fine-tuned models, I don't know the chat template that was used. The models use the Llama 2 tokenizer, but neither the Llama 2 chat template nor the default Alignment Handbook template used during training is recognized. Any ideas on this welcome!
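
For context, this is the kind of quick check that exposes the problem; it is generic transformers usage (using the Llama 2 tokenizer mentioned above), not anything OpenELM-specific.

```python
# A quick way to inspect what a chat template renders to. Generic transformers
# usage, shown only to illustrate the chat-template question above; the
# tokenizer id follows the "Llama 2 tokenizer" mentioned in the post.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

messages = [{"role": "user", "content": "Hello, who are you?"}]

# If the tokenizer has no chat_template set, apply_chat_template either raises
# or falls back to a default, depending on the transformers version -- which is
# exactly the ambiguity described above.
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```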
smangrul 
posted an update 9 months ago
Unlocking the Power of locally running Llama-3 8B Model Agents with Chat-UI! 🔥🚀✨

I'm thrilled to share my hackathon-style side project:
1. Finetuning Llama-3 8B for function calling using PEFT QLoRA, since the instruct Llama-3 model doesn't support it out of the box (a rough sketch of this setup is included at the end of this post). The Colab notebook for it is here: https://lnkd.in/ggJMzqh2. 🛠️
2. Finetuned model along with the 4-bit quants here: https://lnkd.in/gNpFKY6V
3. Cloning Hugging Face Chat-UI (https://lnkd.in/gKBKuUBQ) and making it compatible with function calling by building upon the PR https://lnkd.in/gnqFuAd4, adapted for my model and a local inference use case using Ollama. This was a steep learning curve; I stayed awake the whole night to get it working. 💪🏽
4. For the above, I used SerpAPI for web browsing and the MongoDB Atlas free tier for persisting conversations and assistant configs. 🔎
5. More work is required on switching between using tools and responding directly; this is where I see the model break. 🧐

How cool is it that we're approaching a ChatGPT-like experience while using a locally hosted agent model running on your laptop! 💻
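
For reference, here is a rough, hypothetical sketch of the PEFT QLoRA setup from step 1. The model id, LoRA hyperparameters, and training details are assumptions; see the linked notebook for the actual configuration.

```python
# A rough, hypothetical sketch of the PEFT QLoRA setup from step 1 above.
# Model id, LoRA hyperparameters, and training details are assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed base checkpoint

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# ...then train on a function-calling dataset, e.g. with trl's SFTTrainer.
```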
smangrul 
posted an update 10 months ago
🤗 PEFT v0.10.0 release! 🔥🚀✨

Some highlights: 📝
1. FSDP+QLoRA and DeepSpeed Stage-3+QLoRA
2. Layer expansion + LoRA
3. DoRA support for Conv2D layers and quantized bitsandbytes layers
4. New LoftQ utility
5. Batched inference for mixed LoRA adapters.

The Answer.AI team, in collaboration with bitsandbytes and Hugging Face 🤗, open-sourced code enabling the use of FSDP+QLoRA and explained the whole process in their insightful blog post: https://lnkd.in/g6jgfXyv. This is now integrated into the Hugging Face ecosystem.

For an end-to-end example on FSDP+QLoRA, please refer to https://lnkd.in/gT3yY-Rx.

For an end-to-end example on DeepSpeed Stage-3+QLoRA, please refer to https://lnkd.in/gkt-xZRE.

With the PR https://lnkd.in/g5F348MN, these changes are now upstreamed in https://lnkd.in/g5_MxYtY, thanks to Wing Lian! 🚀

Kudos to the Answer.AI team, Titus von Köller, Younes Belkada, Benjamin Bossan, and Zachary Mueller for all the help, without which this couldn't have been possible. 🤗

For efficient depth-wise layer expansion, akin to mergekit's passthrough method but without using additional memory, and for attaching LoRAs to the expanded layers, refer to the details below! 🔥 https://lnkd.in/ge95ztjA

DoRA is now supported for Conv2D layers as well as bitsandbytes quantized layers ✨. For more details, please refer to the thread below:
https://lnkd.in/gsJbuWPD
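
As a minimal sketch, enabling DoRA is a single flag on LoraConfig; the toy model and target module names below are made up for illustration.

```python
# Minimal sketch: enabling DoRA on Conv2D layers via LoraConfig(use_dora=True).
# The toy model and module names are made up for illustration.
import torch.nn as nn
from peft import LoraConfig, get_peft_model

class TinyConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.head = nn.Linear(32, 10)

    def forward(self, x):
        x = self.conv2(self.conv1(x)).mean(dim=(2, 3))
        return self.head(x)

config = LoraConfig(
    r=8,
    lora_alpha=16,
    use_dora=True,                      # DoRA decomposition on top of LoRA
    target_modules=["conv1", "conv2"],  # Conv2D layers are now supported
)
model = get_peft_model(TinyConvNet(), config)
model.print_trainable_parameters()
```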

You can now mix different LoRA adapters in a batch during inference, which speeds things up by avoiding multiple passes through the base model, as would otherwise be the case for adapter-by-adapter inference with batch_size=1! ⚡️
Details below: https://lnkd.in/gD-pcX_B
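
A hedged sketch of what mixed-adapter batched inference can look like; the base model and adapter repo names below are placeholders.

```python
# Hedged sketch of batched inference with mixed LoRA adapters: each row of the
# batch can use a different adapter (or the base model). The checkpoint and
# adapter repo names are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(base, "user/lora-adapter-a", adapter_name="adapter_a")  # placeholder
model.load_adapter("user/lora-adapter-b", adapter_name="adapter_b")                        # placeholder

prompts = ["Write a haiku about GPUs.", "Summarize LoRA in one line.", "Hello!"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(base.device)

# One batched generate call serves all three requests; "__base__" means "no adapter".
outputs = model.generate(
    **inputs,
    adapter_names=["adapter_a", "adapter_b", "__base__"],
    max_new_tokens=64,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```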

LoftQ reduces quantization error by appropriately initializing the LoRA adapter weights. Normally this is a two-step process; Benjamin Bossan added a new utility, replace_lora_weights_loftq, that applies LoftQ on the fly with bnb.
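
A rough sketch of the on-the-fly usage with bitsandbytes, going by the release notes; the base model id is a placeholder.

```python
# Hedged sketch of on-the-fly LoftQ initialization with bitsandbytes; the base
# model id is a placeholder, and details may differ from the release notes.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, replace_lora_weights_loftq

bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",  # placeholder base model
    quantization_config=bnb_config,
    device_map="auto",
)
model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM"))

# Re-initialize the LoRA weights so they compensate for the 4-bit quantization
# error, skipping the usual separate LoftQ preprocessing step.
replace_lora_weights_loftq(model)
```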

For more details, refer to the release notes. 📝
https://lnkd.in/gg7-AmHA. As always, make sure the losses go down and enjoy watching your model train!
smangrul 
posted an update 10 months ago
🚨 Now you can run Starcoder-2 models locally on a Mac M1 Pro (Apple Silicon) with 16GB of memory! 🧑🏽‍💻⚡️✨

Below is the UX with the Twinny extension, using bigcode/starcoder2-3b for FIM and codellama/CodeLlama-7b-Instruct-hf for chat. The dev tools show the prompt being sent to the Ollama server.

Starcoder-2 is now supported in llama.cpp https://github.com/ggerganov/llama.cpp/pull/5795!
cd llama.cpp
python convert-hf-to-gguf.py ../starcoder2-3b/ --outfile models/starcoder2-3b.gguf --outtype "f16"
./quantize models/starcoder2-3b.gguf models/starcoder2-3b-Q4_K_M.gguf Q4_K_M

For more details, please go through the following tweet thread: https://x.com/sourab_m/status/1764583139798823235?s=20