
xziayro


AI & ML interests

None yet


Organizations

None yet

xziayro's activity

upvoted 2 papers about 7 hours ago

Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space

Paper • 2503.09419 • Published about 24 hours ago • 2

TPDiff: Temporal Pyramid Video Diffusion Model

Paper • 2503.09566 • Published about 20 hours ago • 29

liked a model about 24 hours ago

Beckham808/LightGen

Text-to-Image • Updated about 7 hours ago • 3

upvoted 3 papers 1 day ago

upvoted a paper 2 days ago

EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer

Paper • 2503.07027 • Published 3 days ago • 21

upvoted 2 papers 3 days ago

Unified Reward Model for Multimodal Understanding and Generation

Paper • 2503.05236 • Published 6 days ago • 103

TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models

Paper • 2503.05638 • Published 6 days ago • 16

reacted to freddyaboulton's post with 🚀 15 days ago

Post


Getting WebRTC and WebSockets right in Python is very tricky. If you've tried to wrap an LLM in a real-time audio layer, then you know what I'm talking about.

That's where FastRTC comes in! It makes WebRTC and WebSocket streams super easy, with minimal code and overhead.

Check out our org: hf.co/fastrtc

upvoted 4 papers 15 days ago

K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs

Paper • 2502.18461 • Published 16 days ago • 15

ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation

Paper • 2502.18364 • Published 16 days ago • 34

KV-Edit: Training-Free Image Editing for Precise Background Preservation

Paper • 2502.17363 • Published 17 days ago • 33

SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference

Paper • 2502.18137 • Published 16 days ago • 53

liked a model 17 days ago

ZhengPeng7/BiRefNet_HR

Image Segmentation • Updated 17 days ago • 28.7k • 62

liked a model 18 days ago

fffiloni/deep-blue-v2

Text-to-Image • Updated 20 days ago • 549 • 8

upvoted a paper 21 days ago

Craw4LLM: Efficient Web Crawling for LLM Pretraining

Paper • 2502.13347 • Published 23 days ago • 27

liked a model 22 days ago

fffiloni/sweet-brush

Text-to-Image • Updated 17 days ago • 775 • 5

liked a dataset 22 days ago

google/smol

Viewer • Updated 10 days ago • 811k • 5.09k • 44

reacted to sayakpaul's post with 🔥 23 days ago

Post


Inference-time scaling meets Flux.1-Dev (and others) 🔥

Presenting a simple re-implementation of "Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps" by Ma et al.

I implemented the simplest search strategy, random search, but results could likely be improved with better-guided search methods.

Supports Gemini 2 Flash & Qwen2.5 as verifiers for "LLMGrading" 🤗

The steps are simple:

For each round:

1. Sample 2 starting noises with different seeds.
2. Score the generations with respect to a metric.
3. Keep the best generation from the current round.

If you have more compute budget, move on to the next search round: scale the noise pool to 2 ** search_round noises and repeat steps 1-3.

This constitutes the random search method as done in the paper by Google DeepMind.
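The round structure above can be sketched in plain Python. This is a minimal toy sketch under stated assumptions, not code from the tt-scale-flux repository. `generate` and `score` are hypothetical stand-ins for the diffusion pipeline and the verifier (for example, an LLM grader). A "noise" here is just a short Gaussian vector drawn from a seed, and `num_rounds` plays the role of the compute budget.

```python
import random
from typing import Callable, List, Tuple


def sample_noise(seed: int, dim: int = 4) -> List[float]:
    """Toy 'starting noise': a Gaussian vector drawn from a fixed seed (hypothetical stand-in for a latent)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def random_search(
    generate: Callable[[List[float]], List[float]],  # hypothetical: noise -> generation
    score: Callable[[List[float]], float],           # hypothetical verifier: higher is better
    num_rounds: int,                                 # compute budget, in search rounds
    base_seed: int = 0,
) -> Tuple[float, int]:
    """Random search over starting noises; returns (best score, seed of the best noise)."""
    best_score, best_seed = float("-inf"), -1
    next_seed = base_seed
    for search_round in range(1, num_rounds + 1):
        pool_size = 2 ** search_round  # 2 noises in round 1, then 4, 8, ...
        round_score, round_seed = float("-inf"), -1
        for _ in range(pool_size):
            seed = next_seed
            next_seed += 1  # step 1: every noise gets a distinct seed
            s = score(generate(sample_noise(seed)))  # step 2: score the generation
            if s > round_score:
                round_score, round_seed = s, seed
        # Step 3: keep the best generation of this round (and track the overall best).
        if round_score > best_score:
            best_score, best_seed = round_score, round_seed
    return best_score, best_seed


if __name__ == "__main__":
    # Toy verifier (assumption): prefer generations with a small squared norm.
    best, seed = random_search(lambda z: z, lambda x: -sum(v * v for v in x), num_rounds=3)
    print(f"best score {best:.4f} from seed {seed}")
```

With `num_rounds=3`, the sketch scores 2 + 4 + 8 = 14 noises in total. Each extra round doubles the pool, so the cost of the search grows roughly exponentially with the number of rounds.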

Code, more results, and a bunch of other stuff are in the repository. Check it out here: https://github.com/sayakpaul/tt-scale-flux/ 🤗