Infrastructure - a zzfive Collection

Infrastructure

updated about 2 hours ago

CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data

Paper • 2404.15653 • Published Apr 24 • 26
MoDE: CLIP Data Experts via Clustering

Paper • 2404.16030 • Published Apr 24 • 12
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning

Paper • 2405.12130 • Published May 20 • 45
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

Paper • 2405.12981 • Published May 21 • 28
LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models

Paper • 2405.14477 • Published May 23 • 16
Thermodynamic Natural Gradient Descent

Paper • 2405.13817 • Published May 22 • 13
Tele-Aloha: A Low-budget and High-authenticity Telepresence System Using Sparse RGB Cameras

Paper • 2405.14866 • Published May 23 • 5
Transformers Can Do Arithmetic with the Right Embeddings

Paper • 2405.17399 • Published May 27 • 51
2BP: 2-Stage Backpropagation

Paper • 2405.18047 • Published May 28 • 23
VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections

Paper • 2405.17991 • Published May 28 • 11
Jina CLIP: Your CLIP Model Is Also Your Text Retriever

Paper • 2405.20204 • Published May 30 • 30
Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning

Paper • 2406.00392 • Published Jun 1 • 12
Block Transformer: Global-to-Local Language Modeling for Fast Inference

Paper • 2406.02657 • Published Jun 4 • 36
PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs

Paper • 2406.02886 • Published Jun 5 • 7
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms

Paper • 2406.02900 • Published Jun 5 • 10
GenAI Arena: An Open Evaluation Platform for Generative Models

Paper • 2406.04485 • Published Jun 6 • 19
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

Paper • 2406.04770 • Published Jun 7 • 26
Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?

Paper • 2406.04391 • Published Jun 6 • 6
Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach

Paper • 2406.04594 • Published Jun 7 • 4
The Prompt Report: A Systematic Survey of Prompting Techniques

Paper • 2406.06608 • Published Jun 6 • 53
DiTFastAttn: Attention Compression for Diffusion Transformer Models

Paper • 2406.08552 • Published Jun 12 • 22
Interpreting the Weight Space of Customized Diffusion Models

Paper • 2406.09413 • Published Jun 13 • 18
Cognitively Inspired Energy-Based World Models

Paper • 2406.08862 • Published Jun 13 • 9
MLKV: Multi-Layer Key-Value Heads for Memory Efficient Transformer Decoding

Paper • 2406.09297 • Published Jun 13 • 4
A Simple and Effective L_2 Norm-Based Strategy for KV Cache Compression

Paper • 2406.11430 • Published Jun 17 • 23
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding

Paper • 2406.14515 • Published Jun 20 • 32
Model Merging and Safety Alignment: One Bad Model Spoils the Bunch

Paper • 2406.14563 • Published Jun 20 • 29
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges

Paper • 2406.12624 • Published Jun 18 • 36
Efficient World Models with Context-Aware Tokenization

Paper • 2406.19320 • Published Jun 27 • 7
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?

Paper • 2407.01284 • Published Jul 1 • 76
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation

Paper • 2407.00468 • Published Jun 29 • 35
μ-Bench: A Vision-Language Benchmark for Microscopy Understanding

Paper • 2407.01791 • Published Jul 1 • 5
No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models

Paper • 2407.02687 • Published Jul 2 • 22
Learning to (Learn at Test Time): RNNs with Expressive Hidden States

Paper • 2407.04620 • Published Jul 5 • 27
HEMM: Holistic Evaluation of Multimodal Foundation Models

Paper • 2407.03418 • Published Jul 3 • 8
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

Paper • 2407.04842 • Published Jul 5 • 52
ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation

Paper • 2407.06135 • Published Jul 8 • 20
An accurate detection is not all you need to combat label noise in web-noisy datasets

Paper • 2407.05528 • Published Jul 8 • 3
CosmoCLIP: Generalizing Large Vision-Language Models for Astronomical Imaging

Paper • 2407.07315 • Published Jul 10 • 6
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning

Paper • 2407.07523 • Published Jul 10 • 4
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?

Paper • 2407.11963 • Published Jul 16 • 43
Efficient Training with Denoised Neural Weights

Paper • 2407.11966 • Published Jul 16 • 8
From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients

Paper • 2407.11239 • Published Jul 15 • 7
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models

Paper • 2407.11062 • Published Jul 10 • 8
NNsight and NDIF: Democratizing Access to Foundation Model Internals

Paper • 2407.14561 • Published Jul 18 • 34
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding

Paper • 2407.15754 • Published Jul 22 • 19
KAN or MLP: A Fairer Comparison

Paper • 2407.16674 • Published Jul 23 • 40
SIGMA: Sinkhorn-Guided Masked Video Modeling

Paper • 2407.15447 • Published Jul 22 • 6
PERSONA: A Reproducible Testbed for Pluralistic Alignment

Paper • 2407.17387 • Published Jul 24 • 17
Longhorn: State Space Models are Amortized Online Learners

Paper • 2407.14207 • Published Jul 19 • 16
VSSD: Vision Mamba with Non-Causal State Space Duality

Paper • 2407.18559 • Published Jul 26 • 17
Diffusion Feedback Helps CLIP See Better

Paper • 2407.20171 • Published Jul 29 • 34
Finch: Prompt-guided Key-Value Cache Compression

Paper • 2408.00167 • Published Jul 31 • 13
POA: Pre-training Once for Models of All Sizes

Paper • 2408.01031 • Published Aug 2 • 26
The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines

Paper • 2408.01050 • Published Aug 2 • 8
UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling

Paper • 2408.04810 • Published Aug 9 • 22
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents

Paper • 2408.06327 • Published Aug 12 • 13
Heavy Labels Out! Dataset Distillation with Label Space Lightening

Paper • 2408.08201 • Published Aug 15 • 17
Towards flexible perception with visual memory

Paper • 2408.08172 • Published Aug 15 • 19
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

Paper • 2408.13257 • Published Aug 23 • 25
K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences

Paper • 2408.14468 • Published Aug 26 • 33
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models

Paper • 2408.15518 • Published Aug 28 • 41
SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation

Paper • 2409.06633 • Published Sep 10 • 14
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval

Paper • 2409.10516 • Published Sep 16 • 34
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning

Paper • 2409.12568 • Published 29 days ago • 46
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution

Paper • 2409.12961 • Published 29 days ago • 23
Prithvi WxC: Foundation Model for Weather and Climate

Paper • 2409.13598 • Published 28 days ago • 35
Making Text Embedders Few-Shot Learners

Paper • 2409.15700 • Published 24 days ago • 29
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts

Paper • 2409.16040 • Published 24 days ago • 10
VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models

Paper • 2409.17066 • Published 23 days ago • 25
Differential Transformer

Paper • 2410.05258 • Published 11 days ago • 152
Addition is All You Need for Energy-efficient Language Models

Paper • 2410.00907 • Published 17 days ago • 131
Selective Attention Improves Transformer

Paper • 2410.02703 • Published 15 days ago • 22
A Comprehensive Survey of Mamba Architectures for Medical Image Analysis: Classification, Segmentation, Restoration and Beyond

Paper • 2410.02362 • Published 15 days ago • 16
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark

Paper • 2410.03051 • Published 14 days ago • 3
MLP-KAN: Unifying Deep Representation and Function Learning

Paper • 2410.03027 • Published 14 days ago • 28
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

Paper • 2410.05363 • Published 11 days ago • 43
Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning

Paper • 2410.06373 • Published 10 days ago • 33
One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation

Paper • 2410.07170 • Published 9 days ago • 15
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration

Paper • 2410.02367 • Published 15 days ago • 45
Contrastive Localized Language-Image Pre-Training

Paper • 2410.02746 • Published 15 days ago • 29
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling

Paper • 2409.19291 • Published 20 days ago • 18
PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs

Paper • 2410.05265 • Published 11 days ago • 29
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models

Paper • 2410.09732 • Published 5 days ago • 53
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free

Paper • 2410.10814 • Published 4 days ago • 39
MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures

Paper • 2410.13754 • Published about 17 hours ago • 37
Harnessing Webpage UIs for Text-Rich Visual Understanding

Paper • 2410.13824 • Published about 17 hours ago • 16
