An open source real-time AI inference engine for seamless scaling

About

Taproot is a seamlessly scalable AI/ML inference engine designed for deployment across hardware clusters with disparate capabilities.

Why Taproot?

Most AI/ML inference engines are built for either large-scale cloud infrastructure or constrained edge devices. Taproot is designed for medium-scale deployments, offering flexible, distributed on-premise or pay-as-you-go (PAYG) setups. It makes efficient use of older or consumer-grade hardware, which makes it suitable for small networks or ad-hoc clusters that do not rely on centralized, hyperscale architectures.

Available Models

There are more than 150 models available across 18 task categories. See the Task Catalog for the complete list, licenses, requirements and citations. Many more models have yet to be added; if you're looking for a particular enhancement, don't hesitate to open an issue on this repository to request it.

Roadmap

  1. IP Adapter Models for Diffusers Image Generation Pipelines
  2. ControlNet Models for Diffusers Image Generation Pipelines
  3. Additional quantization backends for large models
    • Currently, bitsandbytes (Int8/NF4) and GGUF (through llama.cpp) are supported, with pre-quantized checkpoints available.
    • FP8 support through Optimum-Quanto, TorchAO and custom kernels is in development.
  4. Improved multi-GPU support
    • This is currently supported through manual configuration, but usability can be improved.
  5. Additional annotators/detectors for image and video
    • E.g. Marigold, SAM2
  6. Additional audio generation models
    • E.g. Stable Audio, AudioLDM, MusicGen

Installation

pip install taproot

Some additional packages can be installed with the square-bracket syntax (e.g. pip install taproot[a,b,c]). These are:

  • tools - Additional packages for LLM tools like DuckDuckGo Search, BeautifulSoup (for web scraping), etc.
  • console - Additional packages for prettifying console output.
  • av - Additional packages for reading and writing video.

Installing Tasks

Some tasks are available immediately, but most tasks require additional packages and files. Install these tasks with taproot install [task:model]+, e.g.:

taproot install image-generation:stable-diffusion-xl
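When several tasks need to be installed from a script, the task:model syntax above can be assembled programmatically. The following is a minimal sketch; the helper name build_install_command is hypothetical and not part of Taproot, and it only builds the argument list without running anything.

```python
import shlex


def build_install_command(specs):
    """Build the argv list for `taproot install` from (task, model) pairs.

    `build_install_command` is a hypothetical helper for illustration only.
    """
    if not specs:
        raise ValueError("at least one (task, model) pair is required")
    # Each target uses the task:model form shown in the README.
    targets = [f"{task}:{model}" for task, model in specs]
    return ["taproot", "install", *targets]


command = build_install_command([
    ("image-generation", "stable-diffusion-xl"),
    ("audio-transcription", "whisper-tiny"),
])
# Print the command as it would be typed in a shell.
print(shlex.join(command))
```

The resulting list can be handed to subprocess.run(command, check=True) on a machine where taproot is installed.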

Usage

Command-Line

Introspecting Tasks

From the command line, execute taproot tasks to see all tasks and their availability status, or taproot info for individual task information. For example:

taproot info image-generation stable-diffusion-xl

Stable Diffusion XL Image Generation (image-generation:stable-diffusion-xl, available)
    Generate an image from text and/or images using a stable diffusion XL model.
Hardware Requirements:                  
    GPU Required for Optimal Performance                                           
    Floating Point Precision: half                                                 
    Minimum Memory (CPU RAM) Required: 231.71 MB     
    Minimum Memory (GPU VRAM) Required: 7.58 GB               
Author:                          
    Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
    Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
    https://arxiv.org/abs/2307.01952                                               
License:
    OpenRAIL++-M License (https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
    ✅ Attribution Required
    ✅ Derivatives Allowed
    ✅ Redistribution Allowed
    ✅ Copyleft (Share-Alike) Required
    ✅ Commercial Use Allowed
    ✅ Hosting Allowed
Files:                                                                             
    image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) [downloaded]
    image-generation-stable-diffusion-xl-base-unet.fp16.safetensors (5.14 GB) [downloaded]
    text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) [downloaded]
    text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB) [downloaded]
    text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) [downloaded]
    text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) [downloaded]
    text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) [downloaded]
    text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) [downloaded]
    text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) [downloaded]
    text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) [downloaded]
    Total File Size: 7.11 GB
Required packages:
    pil~=9.5 [installed]
    torch<2.5,>=2.4 [installed]
    numpy~=1.22 [installed]
    diffusers>=0.29 [installed]
    torchvision<0.20,>=0.19 [installed]
    transformers>=4.41 [installed]
    safetensors~=0.4 [installed]
    accelerate~=1.0 [installed]
    sentencepiece~=0.2 [installed]
    compel~=2.0 [installed]
    peft~=0.13 [installed]
Signature:
    prompt: Union[str, List[str]], required
    prompt_2: Union[str, List[str]], default: None
    negative_prompt: Union[str, List[str]], default: None
    negative_prompt_2: Union[str, List[str]], default: None
    image: ImageType, default: None
    mask_image: ImageType, default: None
    guidance_scale: float, default: 5.0
    guidance_rescale: float, default: 0.0
    num_inference_steps: int, default: 20
    num_images_per_prompt: int, default: 1
    height: int, default: None
    width: int, default: None
    timesteps: List[int], default: None
    sigmas: List[float], default: None
    denoising_end: float, default: None
    strength: float, default: None
    latents: torch.Tensor, default: None
    prompt_embeds: torch.Tensor, default: None
    negative_prompt_embeds: torch.Tensor, default: None
    pooled_prompt_embeds: torch.Tensor, default: None
    negative_pooled_prompt_embeds: torch.Tensor, default: None
    clip_skip: int, default: None
    seed: SeedType, default: None
    pag_scale: float, default: None
    pag_adaptive_scale: float, default: None
    scheduler: Literal[ddim, ddpm, ddpm_wuerstchen, deis_multistep, dpm_cogvideox, dpmsolver_multistep, dpmsolver_multistep_karras, dpmsolver_sde, dpmsolver_sde_multistep, dpmsolver_sde_multistep_karras, dpmsolver_singlestep, dpmsolver_singlestep_karras, edm_dpmsolver_multistep, edm_euler, euler_ancestral_discrete, euler_discrete, euler_discrete_karras, flow_match_euler_discrete, flow_match_heun_discrete, heun_discrete, ipndm, k_dpm_2_ancestral_discrete, k_dpm_2_ancestral_discrete_karras, k_dpm_2_discrete, k_dpm_2_discrete_karras, lcm, lms_discrete, lms_discrete_karras, pndm, tcd, unipc], default: None
    output_format: Literal[png, jpeg, float, int, latent], default: png
    output_upload: bool, default: False
    highres_fix_factor: float, default: 1.0
    highres_fix_strength: float, default: None
    spatial_prompts: SpatialPromptInputType, default: None
Returns:
    ImageResultType
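The memory figures in this output can be compared against the hardware at hand before installing a task. The sketch below parses size strings such as "7.58 GB" and checks them against an available amount. The function names are hypothetical, and it assumes the units are 1024-based, which the output above does not state.

```python
# Assumption: KB/MB/GB are powers of 1024; the CLI output does not say so.
UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_size(text):
    """Convert a string such as '7.58 GB' into a number of bytes."""
    value, unit = text.split()
    return float(value) * UNITS[unit.upper()]


def meets_requirement(required, available_bytes):
    """Return True when the available memory covers the stated minimum."""
    return available_bytes >= parse_size(required)


# Minimum GPU VRAM reported for stable-diffusion-xl is 7.58 GB.
print(meets_requirement("7.58 GB", 8 * 1024 ** 3))  # a card with 8 GiB
print(meets_requirement("7.58 GB", 6 * 1024 ** 3))  # a card with 6 GiB
```

The first check passes and the second fails, since the stated minimum falls between the two capacities.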

Invoking Tasks

Use taproot invoke to run any task from the command line. All task parameters can be passed as flags using kebab-case, e.g.:

taproot invoke image-generation:stable-diffusion-xl \
    --prompt "a photograph of a golden retriever at the park" \
    --negative-prompt "fall, autumn, blurry, out-of-focus" \
    --seed 12345
Loading task.
100%|███████████████████████████████████████████████████████████████████████████| 7/7 [00:03<00:00,  2.27it/s]
Task loaded in 4.0 s.
Invoking task.
100%|█████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  4.34it/s]
Task invoked in 6.5 s. Result:
8940aa12-66a7-4233-bfd6-f19da339b71b.png
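The kebab-case convention means a Python parameter name such as negative_prompt becomes the flag --negative-prompt. A minimal sketch of that mapping follows; to_cli_flags is a hypothetical helper, and how booleans or list values should be passed is not covered by the example above, so the sketch handles only simple scalar values.

```python
def to_cli_flags(**params):
    """Turn keyword arguments into kebab-case command-line flags.

    Hypothetical helper: only scalar values are handled, and parameters
    set to None are skipped so the task's own defaults apply.
    """
    flags = []
    for name, value in params.items():
        if value is None:
            continue
        flags.append("--" + name.replace("_", "-"))
        flags.append(str(value))
    return flags


args = ["taproot", "invoke", "image-generation:stable-diffusion-xl"]
args += to_cli_flags(
    prompt="a photograph of a golden retriever at the park",
    negative_prompt="fall, autumn, blurry, out-of-focus",
    seed=12345,
)
print(args)
```

Passing the arguments as a list avoids shell-quoting problems with prompts that contain spaces or commas.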

Python

Direct Task Usage

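from taproot import Task
# Look up the task class by task name and model name
sdxl = Task.get("image-generation", "stable-diffusion-xl")
# Instantiate the task and load it before invoking
pipeline = sdxl()
pipeline.load()
# Invoke with parameters from the task signature and save the result
pipeline(prompt="Hello, world!").save("./output.png")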

With a Remote Server

from taproot import Tap
tap = Tap()
# Address of a running overseer
tap.remote_address = "ws://127.0.0.1:32189"
# Invoke the task on the remote server and save the returned image
result = tap.call("image-generation", model="stable-diffusion-xl", prompt="Hello, world!")
result.save("./output.png")

With a Local Server

This example also shows asynchronous usage.

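import asyncio
from taproot import Tap
# Tap.local() is used as a context manager around a local server
with Tap.local() as tap:
    loop = asyncio.get_event_loop()
    # Calling tap(...) returns an awaitable; run it to completion on the loop
    result = loop.run_until_complete(tap("image-generation", model="stable-diffusion-xl", prompt="Hello, world!"))
    result.save("./output.png")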
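Because the call is awaitable, several requests can in principle be awaited together. The sketch below shows the general asyncio pattern using a placeholder coroutine, fake_call, in place of a real tap; whether a Tap accepts concurrent requests in this way is an assumption and is not confirmed by this README.

```python
import asyncio


async def fake_call(task, **kwargs):
    """Hypothetical stand-in for awaiting tap(task, ...)."""
    await asyncio.sleep(0.01)  # simulate inference latency
    return f"{task}:{kwargs.get('model')}:{kwargs.get('prompt')}"


async def run_batch(call, prompts):
    """Start one call per prompt and wait for all of them."""
    pending = [
        call("image-generation", model="stable-diffusion-xl", prompt=prompt)
        for prompt in prompts
    ]
    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*pending)


results = asyncio.run(run_batch(fake_call, ["Hello, world!", "Goodbye, world!"]))
print(results)
```

With a real server, the tap object would be passed as call in place of fake_call, inside the with block shown above.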

Running Servers

Taproot uses a cluster structure with three roles:

  1. Overseers are entry points into clusters, routing requests to one or more dispatchers.
  2. Dispatchers are machines capable of running tasks by spawning executors.
  3. Executors are servers ready to execute a task.
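One way to picture the three roles is the toy model below. It is only an illustration: the class names, the round-robin routing and the on-demand creation of executors are assumptions made for this sketch and do not describe Taproot's actual implementation.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Executor:
    """A server ready to execute one task (toy version)."""
    host: str
    task: str

    def execute(self, payload):
        return f"{self.host} ran {self.task} with {payload!r}"


@dataclass
class Dispatcher:
    """A machine that runs tasks by spawning executors (toy version)."""
    name: str
    executors: dict = field(default_factory=dict)

    def dispatch(self, task, payload):
        # Spawn an executor the first time a task is seen, then reuse it.
        if task not in self.executors:
            self.executors[task] = Executor(self.name, task)
        return self.executors[task].execute(payload)


class Overseer:
    """Entry point that routes requests to dispatchers (toy version)."""

    def __init__(self, dispatchers):
        # Assumption: simple round-robin routing between dispatchers.
        self._dispatchers = cycle(dispatchers)

    def request(self, task, payload):
        return next(self._dispatchers).dispatch(task, payload)


overseer = Overseer([Dispatcher("gpu-a"), Dispatcher("gpu-b")])
first = overseer.request("echo", "hi")
second = overseer.request("echo", "hi")
print(first)
print(second)
```

The two requests land on different dispatchers, which mirrors the idea that one overseer can spread work over machines with different capabilities.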

The simplest way to run a server is to run an overseer simultaneously with a local dispatcher like so:

taproot overseer --local

This will run on the default address of ws://127.0.0.1:32189, suitable for interaction from Python or the browser.

There are many deployment possibilities across networks, with configuration available for encryption, listening addresses, and more. See the wiki for details (coming soon).

Outside Python

  • taproot.js - for the browser and Node.js, available in ESM, UMD and IIFE
  • taproot.php - coming soon

Task Catalog

18 tasks available with 171 models.

echo

Name: Echo
Author: Benjamin Paine
Taproot
https://github.com/painebenjamin/taproot
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files: N/A
Minimum VRAM: N/A

image-similarity

(default)

Name: Traditional Image Similarity
Author: Benjamin Paine
Taproot
https://github.com/painebenjamin/taproot
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files: N/A
Minimum VRAM: N/A

inception-v3

Name: Inception Image Similarity (FID)
Author: Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens and Zbigniew Wojna
Google Research and University College London
Published in CoRR, vol. 1512.00567, “Rethinking the Inception Architecture for Computer Vision”, 2015
https://arxiv.org/abs/1512.00567
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files: image-similarity-inception.fp16.safetensors
Minimum VRAM: 50.28 MB

text-similarity

Name: Traditional Text Similarity
Author: Benjamin Paine
Taproot
https://github.com/painebenjamin/taproot
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files: N/A
Minimum VRAM: N/A

speech-enhancement

deep-filter-net-v3 (default)

Name: DeepFilterNet V3 Speech Enhancement
Author: Hendrik Schröter, Tobias Rosenkranz, Alberto N. Escalante-B and Andreas Maier
Published in INTERSPEECH, “DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement”, 2023
https://arxiv.org/abs/2305.08227
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files: speech-enhancement-deep-filter-net-3.safetensors
Minimum VRAM: 87.89 MB

image-interpolation

film (default)

Name: Frame Interpolation for Large Motion (FiLM) Image Interpolation
Author: Fitsum Reda, Janne Kontkanen, Eric Tabellion, Deqing Sun, Caroline Pantofaru and Brian Curless
Google Research and University of Washington
Published in ECCV, “FiLM: Frame Interpolation for Large Motion”, 2022
https://arxiv.org/abs/2202.04901
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files: image-interpolation-film-net.fp16.pt
Minimum VRAM: 70.00 MB

rife

Name: Real-Time Intermediate Flow Estimation (RIFE) Image Interpolation
Author: Zhewei Huang, Tianyuan Zhang, Wen Heng, Boxin Shi and Shuchang Zhou
Megvii Research, NERCVT, School of Computer Science, Peking University, Institute for Artificial Intelligence, Peking University and Beijing Academy of Artificial Intelligence
Published in ECCV, “Real-Time Intermediate Flow Estimation for Video Frame Interpolation”, 2022
https://arxiv.org/abs/2011.06294
License: MIT License (https://opensource.org/licenses/MIT)
Files: image-interpolation-rife-flownet.safetensors
Minimum VRAM: 22.68 MB

background-removal

backgroundremover (default)

Name: BackgroundRemover
Author: Johnathan Nader, Lucas Nestler, Dr. Tim Scarfe and Daniel Gatis
https://github.com/nadermx/backgroundremover
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files: background-removal-u2net.safetensors
Minimum VRAM: 217.62 MB

super-resolution

aura

Name: Aura Super Resolution
Author: fal.ai
Published in fal.ai blog, “Introducing AuraSR - An open reproduction of the GigaGAN Upscaler”, 2024
https://blog.fal.ai/introducing-aurasr-an-open-reproduction-of-the-gigagan-upscaler-2/
License: CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/)
Files: super-resolution-aura.fp16.safetensors
Minimum VRAM: 1.24 GB

aura-v2 (default)

Name: Aura Super Resolution V2
Author: fal.ai
Published in fal.ai blog, “AuraSR V2”, 2024
https://blog.fal.ai/aurasr-v2/
License: CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/)
Files: super-resolution-aura-v2.fp16.safetensors
Minimum VRAM: 1.24 GB

speech-synthesis

xtts-v2 (default)

Name: XTTS2 Speech Synthesis
Author: Coqui AI
Published in Coqui AI Blog, “XTTS: Open Model Release Announcement”, 2023
https://coqui.ai/blog/tts/open_xtts
License: Mozilla Public License 2.0 (https://www.mozilla.org/en-US/MPL/2.0/)
Files:
  1. speech-synthesis-xtts-v2.safetensors (1.87 GB)
  2. speech-synthesis-xtts-v2-speakers.pth (7.75 MB)
  3. speech-synthesis-xtts-v2-vocab.json (361.22 KB)

Total Size: 1.88 GB

Minimum VRAM: 1.91 GB

f5tts

Name: F5TTS Speech Synthesis
Author: Yushen Chen, Zhikang Niu, Ziyang Ma, Keqi Deng, Chunhui Wang, Jian Zhao, Kai Yu and Xie Chen
Published in arXiv, vol. 2410.06885, “F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching”, 2024
https://arxiv.org/abs/2410.06885
License: CC BY-NC 4.0 (https://creativecommons.org/licenses/by-nc/4.0/)
Files:
  1. speech-synthesis-f5tts.safetensors (1.35 GB)
  2. speech-synthesis-f5tts-vocab.txt (11.26 KB)
  3. audio-vocoder-vocos-mel-24khz.safetensors (54.35 MB)
  4. audio-vocoder-vocos-mel-24khz-config.yaml (461.00 B)

Total Size: 1.40 GB

Minimum VRAM: 3.94 GB

audio-transcription

whisper-tiny

Name: Whisper Tiny Audio Transcription
Author: Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever
OpenAI
Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision”
https://arxiv.org/abs/2212.04356
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files:
  1. audio-transcription-whisper-tiny.safetensors (151.06 MB)
  2. audio-transcription-whisper-tokenizer-vocab.json (835.55 KB)
  3. audio-transcription-whisper-tokenizer-merges.txt (493.87 KB)
  4. audio-transcription-whisper-tokenizer-normalizer.json (52.67 KB)
  5. audio-transcription-whisper-tokenizer.json (2.48 MB)

Total Size: 154.92 MB

Minimum VRAM: 147.85 MB

whisper-base

Name: Whisper Base Audio Transcription
Author: Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever
OpenAI
Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision”
https://arxiv.org/abs/2212.04356
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files:
  1. audio-transcription-whisper-base.safetensors (290.40 MB)
  2. audio-transcription-whisper-tokenizer-vocab.json (835.55 KB)
  3. audio-transcription-whisper-tokenizer-merges.txt (493.87 KB)
  4. audio-transcription-whisper-tokenizer-normalizer.json (52.67 KB)
  5. audio-transcription-whisper-tokenizer.json (2.48 MB)

Total Size: 294.27 MB

Minimum VRAM: 285.74 MB

whisper-small

Name: Whisper Small Audio Transcription
Author: Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever
OpenAI
Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision”
https://arxiv.org/abs/2212.04356
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files:
  1. audio-transcription-whisper-small.safetensors (967.00 MB)
  2. audio-transcription-whisper-tokenizer-vocab.json (835.55 KB)
  3. audio-transcription-whisper-tokenizer-merges.txt (493.87 KB)
  4. audio-transcription-whisper-tokenizer-normalizer.json (52.67 KB)
  5. audio-transcription-whisper-tokenizer.json (2.48 MB)

Total Size: 970.86 MB

Minimum VRAM: 945.03 MB

whisper-medium

Name: Whisper Medium Audio Transcription
Author: Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever
OpenAI
Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision”
https://arxiv.org/abs/2212.04356
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files:
  1. audio-transcription-whisper-medium.safetensors (3.06 GB)
  2. audio-transcription-whisper-tokenizer-vocab.json (835.55 KB)
  3. audio-transcription-whisper-tokenizer-merges.txt (493.87 KB)
  4. audio-transcription-whisper-tokenizer-normalizer.json (52.67 KB)
  5. audio-transcription-whisper-tokenizer.json (2.48 MB)

Total Size: 3.06 GB

Minimum VRAM: 3.06 GB

whisper-large-v3

Name: Whisper Large V3 Audio Transcription
Author: Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever
OpenAI
Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision”
https://arxiv.org/abs/2212.04356
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files:
  1. audio-transcription-whisper-large-v3.fp16.safetensors (3.09 GB)
  2. audio-transcription-whisper-tokenizer-v3-vocab.json (1.04 MB)
  3. audio-transcription-whisper-tokenizer-v3-merges.txt (493.87 KB)
  4. audio-transcription-whisper-tokenizer-v3-normalizer.json (52.67 KB)
  5. audio-transcription-whisper-tokenizer-v3.json (2.48 MB)

Total Size: 3.09 GB

Minimum VRAM: 3.09 GB

distilled-whisper-small-english

Name: Distilled Whisper Small (English) Audio Transcription
Author: Sanchit Gandhi, Patrick von Platen and Alexander M. Rush
Hugging Face
Published in arXiv, vol. 2311.00430, “Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling”, 2023
https://arxiv.org/abs/2311.00430
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files:
  1. audio-transcription-distilled-whisper-small-english.safetensors (332.30 MB)
  2. audio-transcription-distilled-whisper-english-tokenizer-vocab.json (999.19 KB)
  3. audio-transcription-distilled-whisper-english-tokenizer-merges.txt (456.32 KB)
  4. audio-transcription-distilled-whisper-english-tokenizer-normalizer.json (52.67 KB)
  5. audio-transcription-distillled-whisper-english-tokenizer.json (2.41 MB)

Total Size: 336.21 MB

Minimum VRAM: 649.01 MB

distilled-whisper-medium-english

Name: Distilled Whisper Medium (English) Audio Transcription
Author: Sanchit Gandhi, Patrick von Platen and Alexander M. Rush
Hugging Face
Published in arXiv, vol. 2311.00430, “Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling”, 2023
https://arxiv.org/abs/2311.00430
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files:
  1. audio-transcription-distilled-whisper-medium-english.safetensors (788.80 MB)
  2. audio-transcription-distilled-whisper-english-tokenizer-vocab.json (999.19 KB)
  3. audio-transcription-distilled-whisper-english-tokenizer-merges.txt (456.32 KB)
  4. audio-transcription-distilled-whisper-english-tokenizer-normalizer.json (52.67 KB)
  5. audio-transcription-distillled-whisper-english-tokenizer.json (2.41 MB)

Total Size: 792.71 MB

Minimum VRAM: 1.58 GB

distilled-whisper-large-v3 (default)

Name: Distilled Whisper Large V3 Audio Transcription
Author: Sanchit Gandhi, Patrick von Platen and Alexander M. Rush
Hugging Face
Published in arXiv, vol. 2311.00430, “Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling”, 2023
https://arxiv.org/abs/2311.00430
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files:
  1. audio-transcription-distilled-whisper-large-v3.fp16.safetensors (1.51 GB)
  2. audio-transcription-whisper-tokenizer-v3-vocab.json (1.04 MB)
  3. audio-transcription-whisper-tokenizer-v3-merges.txt (493.87 KB)
  4. audio-transcription-whisper-tokenizer-v3-normalizer.json (52.67 KB)
  5. audio-transcription-whisper-tokenizer-v3.json (2.48 MB)

Total Size: 1.52 GB

Minimum VRAM: 1.51 GB

turbo-whisper-large-v3

Name: Turbo Whisper Large V3 Audio Transcription
Author: Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever
OpenAI
Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision”
https://arxiv.org/abs/2212.04356
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files:
  1. audio-transcription-whisper-large-v3-turbo.fp16.safetensors (1.62 GB)
  2. audio-transcription-whisper-tokenizer-v3-vocab.json (1.04 MB)
  3. audio-transcription-whisper-tokenizer-v3-merges.txt (493.87 KB)
  4. audio-transcription-whisper-tokenizer-v3-normalizer.json (52.67 KB)
  5. audio-transcription-whisper-tokenizer-v3.json (2.48 MB)

Total Size: 1.62 GB

Minimum VRAM: 1.62 GB

depth-detection

midas (default)

Name: MiDaS Depth Detection
Author: René Ranftl, Alexey Bochkovskiy and Vladlen Koltun
Published in arXiv, vol. 2103.13413, “Vision Transformers for Dense Prediction”, 2021
https://arxiv.org/abs/2103.13413
License: MIT License (https://opensource.org/licenses/MIT)
Files: depth-detection-midas.fp16.safetensors
Minimum VRAM: 255.65 MB

line-detection

informative-drawings (default)

Name: Informative Drawings Line Art Detection
Author: Caroline Chan, Fredo Durand and Phillip Isola
Massachusetts Institute of Technology
Published in arXiv, vol. 2203.12691, “Informative Drawings: Learning to Generate Line Drawings that Convey Geometry and Semantics”, 2022
https://arxiv.org/abs/2203.12691
License: MIT License (https://opensource.org/licenses/MIT)
Files: line-detection-informative-drawings.fp16.safetensors
Minimum VRAM: 8.58 MB

informative-drawings-coarse

Name: Informative Drawings Coarse Line Art Detection
Author: Caroline Chan, Fredo Durand and Phillip Isola
Massachusetts Institute of Technology
Published in arXiv, vol. 2203.12691, “Informative Drawings: Learning to Generate Line Drawings that Convey Geometry and Semantics”, 2022
https://arxiv.org/abs/2203.12691
License: MIT License (https://opensource.org/licenses/MIT)
Files: line-detection-informative-drawings-coarse.fp16.safetensors
Minimum VRAM: 8.58 MB

informative-drawings-anime

Name: Informative Drawings Anime Line Art Detection
Author: Caroline Chan, Fredo Durand and Phillip Isola
Massachusetts Institute of Technology
Published in arXiv, vol. 2203.12691, “Informative Drawings: Learning to Generate Line Drawings that Convey Geometry and Semantics”, 2022
https://arxiv.org/abs/2203.12691
License: MIT License (https://opensource.org/licenses/MIT)
Files: line-detection-informative-drawings-anime.fp16.safetensors
Minimum VRAM: 108.81 MB

mlsd

Name: Mobile Line Segment Detection
Author: Geonmo Gu, Byungsoo Ko, SeongHyun Go, Sung-Hyun Lee, Jingeun Lee and Minchul Shin
NAVER/LINE Vision
Published in arXiv, vol. 2106.00186, “Towards Light-weight and Real-time Line Segment Detection”, 2022
https://arxiv.org/abs/2106.00186
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files: line-detection-mlsd.fp16.safetensors
Minimum VRAM: 3.22 MB

edge-detection

canny (default)

Name: Canny Edge Detection
Author: John Canny
Published in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, pp. 679-698, “A Computational Approach to Edge Detection”, 1986
https://ieeexplore.ieee.org/document/4767851
Implementation by OpenCV (https://opencv.org/)
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files: N/A
Minimum VRAM: N/A

hed

Name: Holistically-Nested Edge Detection
Author: Saining Xie and Zhuowen Tu
University of California, San Diego
Published in arXiv, vol. 1504.06375, “Holistically-Nested Edge Detection”, 2015
https://arxiv.org/abs/1504.06375
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files: edge-detection-hed.fp16.safetensors
Minimum VRAM: 29.44 MB

pidi

Name: Soft Edge (PIDI) Detection
Author: Zhuo Su, Wenzhe Liu, Zitong Yu, Dewen Hu, Qing Liao, Qi Tian, Matti Pietikäinen and Li Liu
Published in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5117-5127, “Pixel Difference Networks for Efficient Edge Detection”, 2021
License: MIT License with Non-Commercial Clause (https://github.com/hellozhuo/pidinet/blob/master/LICENSE)
Files: edge-detection-pidi.fp16.safetensors
Minimum VRAM: 1.40 MB

pose-detection

openpose

Name: OpenPose Pose Detection
Author: Zhe Cao, Gines Hidalgo, Tomas Simon, Shih-En Wei and Yaser Sheikh
Published in arXiv, vol. 1812.08008, “OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields”, 2018
https://arxiv.org/abs/1812.08008
License: OpenPose Academic or Non-Profit Non-Commercial Research License (https://github.com/CMU-Perceptual-Computing-Lab/openpose/blob/master/LICENSE)
Files: pose-detection-openpose.fp16.safetensors
Minimum VRAM: 259.96 MB

dwpose (default)

Name: DWPose Pose Detection
Author: Zhendong Yang, Ailing Zeng, Chun Yuan and Yu Li
Tsinghua Shenzhen International Graduate School and International Digital Economy Academy (IDEA)
Published in arXiv, vol. 2307.15880, “Effective Whole-body Pose Estimation with Two-stages Distillation”, 2023
https://arxiv.org/abs/2307.15880
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files:
  1. pose-detection-dwpose-estimation.safetensors (134.65 MB)
  2. pose-detection-dwpose-detection.safetensors (217.20 MB)

Total Size: 351.85 MB

Minimum VRAM: 354.64 MB

image-generation

stable-diffusion-v1-5

Name: Stable Diffusion v1.5 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
License: OpenRAIL-M License (https://bigscience.huggingface.co/blog/bigscience-openrail-m)
Files:
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB

stable-diffusion-v1-5-abyssorange-mix-v3

Name: AbyssOrange Mix V3 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by liudinglin (https://civitai.com/user/liudinglin)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/17233)
Files:
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-abyssorange-mix-v3-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-abyssorange-mix-v3-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB

stable-diffusion-v1-5-chillout-mix-ni

Name: Chillout Mix Ni Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Dreamlike Art (https://dreamlike.art)
License: OpenRAIL-M License with Restrictions (https://huggingface.co/dreamlike-art/dreamlike-diffusion-1.0/blob/main/LICENSE.md)
Files:
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-chillout-mix-ni-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-chillout-mix-ni-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB

stable-diffusion-v1-5-clarity-v3

Name: Clarity V3 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by ndimensional (https://civitai.com/user/ndimensional)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/142125)
Files:
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-clarity-v3-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-clarity-v3-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB

stable-diffusion-v1-5-dark-sushi-mix-v2-25d

Name: Dark Sushi Mix V2 2.5D Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Aitasai (https://civitai.com/user/Aitasai)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/93208)
Files:
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-dark-sushi-mix-v2-25d-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-dark-sushi-mix-v2-25d-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB

stable-diffusion-v1-5-divine-elegance-mix-v10

Name: Divine Elegance Mix V10 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by TroubleDarkness (https://civitai.com/user/TroubleDarkness)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/432048)
Files:
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-divine-elegance-mix-v10-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-divine-elegance-mix-v10-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB

stable-diffusion-v1-5-dreamshaper-v8

Name: DreamShaper V8 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Lykon (https://civitai.com/user/Lykon)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/128713)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-dreamshaper-v8-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-dreamshaper-v8-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB

stable-diffusion-v1-5-epicrealism-v5

Name: epiCRealism V5 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by epinikion (https://civitai.com/user/epinikion)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/143906)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-epicrealism-v5-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-epicrealism-v5-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB

stable-diffusion-v1-5-epicphotogasm-ultimate-fidelity

Name: epiCPhotoGasm Ultimate Fidelity Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by epinikion (https://civitai.com/user/epinikion)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/429454)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-epic-photogasm-ultimate-fidelity-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-epic-photogasm-ultimate-fidelity-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB

stable-diffusion-v1-5-ghostmix-v2

Name: GhostMix V2 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by _GhostInShell_ (https://civitai.com/user/_GhostInShell_)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/76907)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-ghostmix-v2-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-ghostmix-v2-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB

stable-diffusion-v1-5-lyriel-v1-6

Name: Lyriel V1.6 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Lyriel (https://civitai.com/user/Lyriel)
License: OpenRAIL-M License (https://civitai.com/models/license/72396)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-lyriel-v1-6-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-lyriel-v1-6-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB

stable-diffusion-v1-5-majicmix-realistic-v7

Name: MajicMix Realistic V7 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Merjic (https://civitai.com/user/Merjic)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/176425)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-majicmix-realistic-v7-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-majicmix-realistic-v7-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB

stable-diffusion-v1-5-meinamix-v12

Name: MeinaMix V12 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Meina (https://civitai.com/user/Meina)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/948574)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-meinamix-v12-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-meinamix-v12-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB

stable-diffusion-v1-5-mistoon-anime-v3

Name: Mistoon Anime V3 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Inzaniak (https://civitai.com/user/Inzaniak)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/348981)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-mistoon-anime-v3-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-mistoon-anime-v3-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB

stable-diffusion-v1-5-perfect-world-v6

Name: Perfect World V6 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Bloodsuga (https://civitai.com/user/Bloodsuga)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/179446)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-perfect-world-v6-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-perfect-world-v6-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB

stable-diffusion-v1-5-photon-v1

Name: Photon V1 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Photographer (https://civitai.com/user/Photographer)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/900072)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-photon-v1-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-photon-v1-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB

stable-diffusion-v1-5-realcartoon3d-v17

Name: RealCartoon3D V17 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by 7whitefire7 (https://civitai.com/user/7whitefire7)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/637156)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-realcartoon3d-v17-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-realcartoon3d-v17-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB

stable-diffusion-v1-5-realistic-vision-v5-1

Name: Realistic Vision V5.1 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by SG_161222 (https://civitai.com/user/SG_161222)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/130072)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-realistic-vision-v5-1-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-realistic-vision-v5-1-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB

stable-diffusion-v1-5-realistic-vision-v6-0

Name: Realistic Vision V6.0 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by SG_161222 (https://civitai.com/user/SG_161222)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/245592)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-realistic-vision-v6-0-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-realistic-vision-v6-0-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB

stable-diffusion-v1-5-rev-animated-v2

Name: ReV Animated V2 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Zovya (https://civitai.com/user/Zovya)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/425083)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-rev-animated-v2-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-rev-animated-v2-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB

stable-diffusion-v1-5-toonyou-beta-v6

Name: ToonYou Beta V6 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Bradcatt (https://civitai.com/user/Bradcatt)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/125771)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-toonyou-beta-v6-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-toonyou-beta-v6-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB

stable-diffusion-xl

Name: Stable Diffusion XL Image Generation
Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License: OpenRAIL++-M License (https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-base-unet.fp16.safetensors (5.14 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB

stable-diffusion-xl-albedobase-v3-1

Name: AlbedoBase XL V3.1 Image Generation
Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/1041855)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-albedo-base-v3-1-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-albedo-base-v3-1-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-albedo-base-v3-1-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB

stable-diffusion-xl-anything

Name: Anything XL Image Generation
Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License: OpenRAIL++-M License (https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-anything-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-anything-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-anything-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB

stable-diffusion-xl-animagine-v3-1

Name: Animagine XL V3.1 Image Generation
Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/403131)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-animagine-v3-1-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-animagine-v3-1-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-animagine-v3-1-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB

stable-diffusion-xl-copax-timeless-v13

Name: Copax TimeLess V13 Image Generation
Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/724334)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-copax-timeless-v13-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-copax-timeless-v13-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-copax-timeless-v13-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB

stable-diffusion-xl-counterfeit-v2-5

Name: CounterfeitXL V2.5 Image Generation
Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/265012)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-counterfeit-v2-5-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-counterfeit-v2-5-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-counterfeit-v2-5-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB

stable-diffusion-xl-dreamshaper-alpha-v2

Name: DreamShaper XL Alpha V2 Image Generation
Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/126688)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-dreamshaper-alpha-v2-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-dreamshaper-alpha-v2-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-dreamshaper-alpha-v2-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB

stable-diffusion-xl-helloworld-v7

Name: LEOSAM's HelloWorld XL Image Generation
Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/570138)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-hello-world-v7-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-hello-world-v7-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-hello-world-v7-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB

stable-diffusion-xl-juggernaut-v11 (default)

Name: Juggernaut XL V11 Image Generation
Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/782002)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-juggernaut-v11-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-juggernaut-v11-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-juggernaut-v11-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB

stable-diffusion-xl-lightning-8-step

Name: Stable Diffusion XL Lightning (8-Step)
Author: Shanchuan Lin, Anran Wang and Xiao Yang
ByteDance Inc.
Published in arXiv, vol. 2402.13929, “SDXL-Lightning: Progressive Adversarial Diffusion Distillation”, 2024
https://arxiv.org/abs/2402.13929
License: OpenRAIL++-M License (https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-lightning-unet-8-step.fp16.safetensors (5.14 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB

stable-diffusion-xl-lightning-4-step

Name: Stable Diffusion XL Lightning (4-Step)
Author: Shanchuan Lin, Anran Wang and Xiao Yang
ByteDance Inc.
Published in arXiv, vol. 2402.13929, “SDXL-Lightning: Progressive Adversarial Diffusion Distillation”, 2024
https://arxiv.org/abs/2402.13929
License: OpenRAIL++-M License (https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-lightning-unet-4-step.fp16.safetensors (5.14 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB

stable-diffusion-xl-lightning-2-step

Name: Stable Diffusion XL Lightning (2-Step)
Author: Shanchuan Lin, Anran Wang and Xiao Yang
ByteDance Inc.
Published in arXiv, vol. 2402.13929, “SDXL-Lightning: Progressive Adversarial Diffusion Distillation”, 2024
https://arxiv.org/abs/2402.13929
License: OpenRAIL++-M License (https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-lightning-unet-2-step.fp16.safetensors (5.14 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB

stable-diffusion-xl-nightvision-v9

Name: NightVision XL V9 Image Generation
Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/577919)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-nightvision-v9-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-nightvision-v9-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-nightvision-v9-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB

stable-diffusion-xl-realvis-v5

Name: RealVisXL V5 Image Generation
Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
LicenseOpenRAIL++-M License with Restrictions (https://civitai.com/models/license/789646)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-realvis-v5-0-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-realvis-v5-0-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-realvis-v5-0-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB

stable-diffusion-xl-stoiqo-newreality-pro

Name: Stoiqo New Reality XL Pro Image Generation
Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/690310)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-stoiqo-newreality-pro-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-stoiqo-newreality-pro-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-stoiqo-newreality-pro-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB

stable-diffusion-xl-turbo

Name: Stable Diffusion XL Turbo Image Generation
Author: Axel Sauer, Dominik Lorenz, Andreas Blattmann and Robin Rombach
Stability AI
Published in Stability AI Blog, “Adversarial Diffusion Distillation”, 2024
https://stability.ai/research/adversarial-diffusion-distillation
License: Stability AI Community License (https://huggingface.co/stabilityai/sdxl-turbo/blob/main/LICENSE.md)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-turbo-unet.fp16.safetensors (5.14 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB

stable-diffusion-xl-unstable-diffusers-nihilmania

Name: SDXL Unstable Diffusers NihilMania Image Generation
Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/395107)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-unstable-diffusers-nihilmania-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-unstable-diffusers-nihilmania-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-unstable-diffusers-nihilmania-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB

stable-diffusion-xl-zavychroma-v10

Name: ZavyChromaXL V10 Image Generation
Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/916744)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-zavychroma-v10-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-zavychroma-v10-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-zavychroma-v10-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB

stable-diffusion-v3-medium

Name: Stable Diffusion V3 (Medium) Image Generation
Author: Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach
Stability AI
Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024
https://arxiv.org/abs/2403.03206
License: Stability AI Community License Agreement (https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/LICENSE.md)
Files
  1. image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB)
  2. image-generation-stable-diffusion-v3-transformer.fp16.safetensors (4.17 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  6. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  7. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  8. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  9. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  10. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  11. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)
  12. text-encoding-t5-xxl-vocab.model (791.66 KB)
  13. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)

Total Size: 15.50 GB

Minimum VRAM: 17.86 GB

stable-diffusion-v3-5-medium

Name: Stable Diffusion V3.5 (Medium) Image Generation
Author: Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach
Stability AI
Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024
https://arxiv.org/abs/2403.03206
License: Stability AI Community License Agreement (https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/LICENSE.md)
Files
  1. image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB)
  2. image-generation-stable-diffusion-v3-5-medium-transformer.bf16.safetensors (4.94 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  6. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  7. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  8. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  9. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  10. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  11. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)
  12. text-encoding-t5-xxl-vocab.model (791.66 KB)
  13. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)

Total Size: 16.27 GB

Minimum VRAM: 18.36 GB

stable-diffusion-v3-5-large

Name: Stable Diffusion V3.5 (Large) Image Generation
Author: Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach
Stability AI
Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024
https://arxiv.org/abs/2403.03206
License: Stability AI Community License Agreement (https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/LICENSE.md)
Files
  1. image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB)
  2. image-generation-stable-diffusion-v3-5-large-transformer.part-1.bf16.safetensors (9.99 GB)
  3. image-generation-stable-diffusion-v3-5-large-transformer.part-2.bf16.safetensors (6.31 GB)
  4. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  5. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  6. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  7. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  8. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  9. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  10. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  11. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  12. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)
  13. text-encoding-t5-xxl-vocab.model (791.66 KB)
  14. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)

Total Size: 27.62 GB

Minimum VRAM: 31.36 GB

stable-diffusion-v3-5-large-int8

Name: Stable Diffusion V3.5 (Large) Image Generation (Int8)
Author: Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach
Stability AI
Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024
https://arxiv.org/abs/2403.03206
License: Stability AI Community License Agreement (https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/LICENSE.md)
Files
  1. image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB)
  2. image-generation-stable-diffusion-v3-5-large-transformer.int8.bf16.safetensors (8.25 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  6. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  7. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  8. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  9. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  10. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  11. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)
  12. text-encoding-t5-xxl-vocab.model (791.66 KB)
  13. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)

Total Size: 15.96 GB

Minimum VRAM: 16.85 GB

stable-diffusion-v3-5-large-nf4

Name: Stable Diffusion V3.5 (Large) Image Generation (NF4)
Author: Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach
Stability AI
Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024
https://arxiv.org/abs/2403.03206
License: Stability AI Community License Agreement (https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/LICENSE.md)
Files
  1. image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB)
  2. image-generation-stable-diffusion-v3-5-large-transformer.nf4.bf16.safetensors (4.72 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
  6. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  7. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  8. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  9. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  10. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  11. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)
  12. text-encoding-t5-xxl-vocab.model (791.66 KB)
  13. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)

Total Size: 12.85 GB

Minimum VRAM: 12.99 GB

flux-v1-dev

Name: FluxDev
Author: Black Forest Labs
Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024
https://blackforestlabs.ai/announcing-black-forest-labs/
License: FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
Files
  1. image-generation-flux-v1-vae.bf16.safetensors (167.67 MB)
  2. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  4. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  5. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  6. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  7. text-encoding-t5-xxl-vocab.model (791.66 KB)
  8. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  9. image-generation-flux-v1-dev-transformer.bf16.safetensors (23.80 GB)

Total Size: 33.74 GB

Minimum VRAM: 29.50 GB

flux-v1-dev-int8

Name: FluxDevInt8
Author: Black Forest Labs
Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024
https://blackforestlabs.ai/announcing-black-forest-labs/
License: FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
Files
  1. image-generation-flux-v1-vae.bf16.safetensors (167.67 MB)
  2. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  4. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  5. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  6. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  7. text-encoding-t5-xxl-vocab.model (791.66 KB)
  8. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  9. image-generation-flux-v1-dev-transformer.int8.bf16.safetensors (11.92 GB)

Total Size: 18.24 GB

Minimum VRAM: 21.22 GB

flux-v1-dev-stoiqo-newreality-alpha-v2-int8

Name: Stoiqo NewReality F1.D Alpha V2 (Int8) Image Generation
Author: Black Forest Labs
Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024
https://blackforestlabs.ai/announcing-black-forest-labs/
License: FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
Files
  1. image-generation-flux-v1-vae.bf16.safetensors (167.67 MB)
  2. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  4. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  5. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  6. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  7. text-encoding-t5-xxl-vocab.model (791.66 KB)
  8. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  9. image-generation-flux-v1-dev-stoiqo-newreality-alpha-v2-transformer.int8.fp16.safetensors (11.92 GB)

Total Size: 18.24 GB

Minimum VRAM: 21.22 GB

flux-v1-dev-nf4

Name: FluxDevNF4
Author: Black Forest Labs
Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024
https://blackforestlabs.ai/announcing-black-forest-labs/
License: FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
Files
  1. image-generation-flux-v1-vae.bf16.safetensors (167.67 MB)
  2. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
  4. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  5. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  6. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  7. text-encoding-t5-xxl-vocab.model (791.66 KB)
  8. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  9. image-generation-flux-v1-dev-transformer.nf4.bf16.safetensors (6.70 GB)

Total Size: 13.44 GB

Minimum VRAM: 14.36 GB

flux-v1-dev-stoiqo-newreality-alpha-v2-nf4

Name: Stoiqo NewReality F1.D Alpha V2 (NF4) Image Generation
Author: Black Forest Labs
Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024
https://blackforestlabs.ai/announcing-black-forest-labs/
License: FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
Files
  1. image-generation-flux-v1-vae.bf16.safetensors (167.67 MB)
  2. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
  4. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  5. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  6. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  7. text-encoding-t5-xxl-vocab.model (791.66 KB)
  8. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  9. image-generation-flux-v1-dev-stoiqo-newreality-alpha-v2-transformer.nf4.fp16.safetensors (6.70 GB)

Total Size: 13.44 GB

Minimum VRAM: 14.36 GB

flux-v1-schnell

Name: FluxSchnell
Author: Black Forest Labs
Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024
https://blackforestlabs.ai/announcing-black-forest-labs/
License: FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
Files
  1. image-generation-flux-v1-vae.bf16.safetensors (167.67 MB)
  2. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  4. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  5. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  6. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  7. text-encoding-t5-xxl-vocab.model (791.66 KB)
  8. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  9. image-generation-flux-v1-schnell-transformer.bf16.safetensors (23.78 GB)

Total Size: 33.72 GB

Minimum VRAM: 29.50 GB

flux-v1-schnell-int8

Name: FluxSchnellInt8
Author: Black Forest Labs
Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024
https://blackforestlabs.ai/announcing-black-forest-labs/
License: FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
Files
  1. image-generation-flux-v1-vae.bf16.safetensors (167.67 MB)
  2. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  4. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  5. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  6. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  7. text-encoding-t5-xxl-vocab.model (791.66 KB)
  8. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  9. image-generation-flux-v1-schnell-transformer.int8.bf16.safetensors (11.91 GB)

Total Size: 18.23 GB

Minimum VRAM: 21.22 GB

flux-v1-schnell-nf4

Name: FluxSchnellNF4
Author: Black Forest Labs
Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024
https://blackforestlabs.ai/announcing-black-forest-labs/
License: FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
Files
  1. image-generation-flux-v1-vae.bf16.safetensors (167.67 MB)
  2. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
  4. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  5. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  6. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  7. text-encoding-t5-xxl-vocab.model (791.66 KB)
  8. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  9. image-generation-flux-v1-schnell-transformer.nf4.bf16.safetensors (6.69 GB)

Total Size: 13.44 GB

Minimum VRAM: 14.36 GB
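
The Minimum VRAM figures can be used to choose between the full-precision and quantized variants of a model for a given GPU. The following sketch does this for the three flux-v1-dev variants listed above. `pick_variant` is a hypothetical helper written for illustration and is not part of Taproot's API. It also assumes the listed minimum is the only constraint.

```python
from typing import Optional

# Minimum VRAM (GB) copied from the flux-v1-dev entries in this catalog,
# ordered from least to most aggressively quantized.
FLUX_DEV_VARIANTS = [
    ("flux-v1-dev", 29.50),
    ("flux-v1-dev-int8", 21.22),
    ("flux-v1-dev-nf4", 14.36),
]

def pick_variant(available_vram_gb: float) -> Optional[str]:
    """Return the first (least quantized) variant whose minimum VRAM fits."""
    for name, min_vram_gb in FLUX_DEV_VARIANTS:
        if min_vram_gb <= available_vram_gb:
            return name
    return None  # no listed variant fits in the available VRAM

print(pick_variant(24.0))  # flux-v1-dev-int8
print(pick_variant(16.0))  # flux-v1-dev-nf4
print(pick_variant(12.0))  # None
```

Ordering the list from least to most quantized makes the first match the highest-precision variant that still fits.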

video-generation

cogvideox-2b

Name: CogVideoX 2B Video Generation
Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  4. video-generation-cog-transformer-2b.fp16.safetensors (3.39 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 13.34 GB

Minimum VRAM: 13.48 GB

cogvideox-2b-int8

Name: CogVideoX 2B Video Generation (Int8)
Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  4. video-generation-cog-transformer-2b.int8.fp16.safetensors (1.70 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 8.04 GB

Minimum VRAM: 11.48 GB

cogvideox-5b

Name: CogVideoX 5B Video Generation
Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  4. video-generation-cog-transformer-5b.fp16.safetensors (11.14 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 21.10 GB

Minimum VRAM: 21.48 GB

cogvideox-5b-int8

Name: CogVideoX 5B Video Generation (Int8)
Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  4. video-generation-cog-transformer-5b.int8.fp16.safetensors (5.58 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 11.92 GB

Minimum VRAM: 17.48 GB

cogvideox-5b-nf4

Name: CogVideoX 5B Video Generation (NF4)
Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
  4. video-generation-cog-transformer-5b.nf4.fp16.safetensors (3.14 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 9.90 GB

Minimum VRAM: 12.48 GB

cogvideox-i2v-5b

Name: CogVideoX 5B Image-to-Video Generation
Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  4. video-generation-cog-i2v-transformer-5b.fp16.safetensors (11.25 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 21.21 GB

Minimum VRAM: 21.48 GB

cogvideox-i2v-5b-int8

Name: CogVideoX 5B Image-to-Video Generation (Int8)
Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  4. video-generation-cog-i2v-transformer-5b.fp16.safetensors (11.25 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 17.59 GB

Minimum VRAM: 17.48 GB

cogvideox-i2v-5b-nf4

Name: CogVideoX 5B Image-to-Video Generation (NF4)
Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
  4. video-generation-cog-i2v-transformer-5b.nf4.fp16.safetensors (3.25 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 10.01 GB

Minimum VRAM: 12.48 GB

cogvideox-v1-5-5b

Name: CogVideoX V1.5 5B Video Generation
Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  4. video-generation-cog-v1-5-transformer-5b.fp16.safetensors (11.14 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 21.10 GB

Minimum VRAM: 21.48 GB

cogvideox-v1-5-5b-int8

Name: CogVideoX V1.5 5B Video Generation (Int8)
Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  4. video-generation-cog-v1-5-transformer-5b.int8.fp16.safetensors (5.59 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 11.92 GB

Minimum VRAM: 17.48 GB

cogvideox-v1-5-5b-nf4

Name: CogVideoX V1.5 5B Video Generation (NF4)
Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
  4. video-generation-cog-v1-5-transformer-5b.nf4.fp16.safetensors (3.14 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 9.90 GB

Minimum VRAM: 12.48 GB

cogvideox-v1-5-i2v-5b

Name: CogVideoX V1.5 5B Image-to-Video Generation
Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  4. video-generation-cog-v1-5-i2v-transformer-5b.fp16.safetensors (11.14 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 21.10 GB

Minimum VRAM21.48 GB

cogvideox-v1-5-i2v-5b-int8

NameCogVideoX V1.5 5B Image-to-Video Generation (Int8)
AuthorZhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
LicenseCogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  4. video-generation-cog-v1-5-i2v-transformer-5b.int8.fp16.safetensors (5.59 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 11.92 GB

Minimum VRAM17.48 GB

cogvideox-v1-5-i2v-5b-nf4

NameCogVideoX V1.5 5B Image-to-Video Generation (NF4)
AuthorZhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
LicenseCogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
  4. video-generation-cog-v1-5-i2v-transformer-5b.nf4.fp16.safetensors (3.14 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 9.90 GB

Minimum VRAM12.48 GB

hunyuan

NameHunyuan Video Generation
AuthorHunyuan Foundation Model Team
Tencent
Published in arXiv, vol. 2412.03603, “HunyuanVideo: A Systematic Framework for Large Video Generation Models”, 2024
https://arxiv.org/abs/2412.03603
LicenseTencent Hunyuan Community License (https://github.com/Tencent/HunyuanVideo/blob/main/LICENSE.txt)
Files
  1. video-generation-hunyuan-vae.safetensors (985.94 MB)
  2. video-generation-hunyuan-transformer.bf16.safetensors (25.64 GB)
  3. text-encoding-llava-llama-tokenizer-vocab.json (17.21 MB)
  4. text-encoding-llava-llama-tokenizer-special-tokens-map.json (577.00 B)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-llava-llama-text-encoder.fp16.safetensors (15.01 GB)
  9. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)

Total Size: 41.90 GB

Minimum VRAM38.30 GB

hunyuan-int8

NameHunyuan Video Generation (Int8)
AuthorHunyuan Foundation Model Team
Tencent
Published in arXiv, vol. 2412.03603, “HunyuanVideo: A Systematic Framework for Large Video Generation Models”, 2024
https://arxiv.org/abs/2412.03603
LicenseTencent Hunyuan Community License (https://github.com/Tencent/HunyuanVideo/blob/main/LICENSE.txt)
Files
  1. video-generation-hunyuan-vae.safetensors (985.94 MB)
  2. video-generation-hunyuan-transformer.int8.bf16.safetensors (12.84 GB)
  3. text-encoding-llava-llama-tokenizer-vocab.json (17.21 MB)
  4. text-encoding-llava-llama-tokenizer-special-tokens-map.json (577.00 B)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-llava-llama-text-encoder.int8.fp16.safetensors (8.04 GB)
  9. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)

Total Size: 22.13 GB

Minimum VRAM23.30 GB

hunyuan-nf4

NameHunyuan Video Generation (NF4)
AuthorHunyuan Foundation Model Team
Tencent
Published in arXiv, vol. 2412.03603, “HunyuanVideo: A Systematic Framework for Large Video Generation Models”, 2024
https://arxiv.org/abs/2412.03603
LicenseTencent Hunyuan Community License (https://github.com/Tencent/HunyuanVideo/blob/main/LICENSE.txt)
Files
  1. video-generation-hunyuan-vae.safetensors (985.94 MB)
  2. video-generation-hunyuan-transformer.nf4.bf16.safetensors (7.22 GB)
  3. text-encoding-llava-llama-tokenizer-vocab.json (17.21 MB)
  4. text-encoding-llava-llama-tokenizer-special-tokens-map.json (577.00 B)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-llava-llama-text-encoder.nf4.fp16.safetensors (4.98 GB)
  9. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)

Total Size: 13.45 GB

Minimum VRAM14.78 GB

ltx (default)

NameLTX Video Generation
AuthorLightricks
https://github.com/Lightricks/LTX-Video
LicenseOpenRAIL-M License (https://bigscience.huggingface.co/blog/bigscience-openrail-m)
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  4. video-generation-ltx-transformer.bf16.safetensors (3.85 GB)
  5. video-generation-ltx-vae.safetensors (1.87 GB)

Total Size: 15.24 GB

Minimum VRAM15.28 GB

ltx-int8

NameLTX Video Generation (Int8)
AuthorLightricks
https://github.com/Lightricks/LTX-Video
LicenseOpenRAIL-M License (https://bigscience.huggingface.co/blog/bigscience-openrail-m)
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  4. video-generation-ltx-transformer.int8.bf16.safetensors (1.93 GB)
  5. video-generation-ltx-vae.safetensors (1.87 GB)

Total Size: 9.70 GB

Minimum VRAM9.72 GB

ltx-nf4

NameLTX Video Generation (NF4)
AuthorLightricks
https://github.com/Lightricks/LTX-Video
LicenseOpenRAIL-M License (https://bigscience.huggingface.co/blog/bigscience-openrail-m)
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
  4. video-generation-ltx-transformer.nf4.bf16.safetensors (1.08 GB)
  5. video-generation-ltx-vae.safetensors (1.87 GB)

Total Size: 9.28 GB

Minimum VRAM7.29 GB

mochi-v1

NameMochi Video Generation
AuthorGenmo AI
Published in Genmo AI Blog, “Mochi 1: A new SOTA in open-source video generation models”, 2024
https://www.genmo.ai/blog
License
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  4. video-generation-mochi-v1-preview-transformer.bf16.safetensors (20.06 GB)
  5. video-generation-mochi-v1-preview-vae.bf16.safetensors (919.55 MB)

Total Size: 30.50 GB

Minimum VRAM22.95 GB

mochi-v1-int8

NameMochi Video Generation (Int8)
AuthorGenmo AI
Published in Genmo AI Blog, “Mochi 1: A new SOTA in open-source video generation models”, 2024
https://www.genmo.ai/blog
License
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  4. video-generation-mochi-v1-preview-transformer.int8.bf16.safetensors (10.04 GB)
  5. video-generation-mochi-v1-preview-vae.bf16.safetensors (919.55 MB)

Total Size: 16.87 GB

Minimum VRAM15.95 GB

mochi-v1-nf4

NameMochi Video Generation (NF4)
AuthorGenmo AI
Published in Genmo AI Blog, “Mochi 1: A new SOTA in open-source video generation models”, 2024
https://www.genmo.ai/blog
License
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
  4. video-generation-mochi-v1-preview-transformer.nf4.bf16.safetensors (5.64 GB)
  5. video-generation-mochi-v1-preview-vae.bf16.safetensors (919.55 MB)

Total Size: 12.89 GB

Minimum VRAM12.41 GB
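
Each "Total Size" figure in this catalog appears to be the sum of the listed file sizes. The sketch below reproduces the figure for the ltx-nf4 entry from its file lines. It is not part of Taproot: the helper names `parse_size_bytes` and `total_size_gb` are hypothetical, and the use of 1024-based units is an assumption. Because the listed sizes are already rounded, totals recomputed this way can differ from the catalog by about 0.01 GB for some entries.

```python
import re

# Assumption: the catalog reports sizes in 1024-based units.
UNIT_FACTORS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}

# Matches a trailing "(<number> <unit>)" such as "(791.66 KB)".
SIZE_PATTERN = re.compile(r"\(([\d.]+) (B|KB|MB|GB)\)\s*$")


def parse_size_bytes(line: str) -> float:
    """Hypothetical helper: convert the trailing size of a file line to bytes."""
    match = SIZE_PATTERN.search(line)
    if match is None:
        raise ValueError(f"no size found in: {line!r}")
    value, unit = match.groups()
    return float(value) * UNIT_FACTORS[unit]


def total_size_gb(lines) -> float:
    """Hypothetical helper: sum the sizes of all file lines, in GB."""
    return sum(parse_size_bytes(line) for line in lines) / UNIT_FACTORS["GB"]


# File lines copied from the ltx-nf4 entry in this catalog.
LTX_NF4_FILES = [
    "text-encoding-t5-xxl-vocab.model (791.66 KB)",
    "text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)",
    "text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)",
    "video-generation-ltx-transformer.nf4.bf16.safetensors (1.08 GB)",
    "video-generation-ltx-vae.safetensors (1.87 GB)",
]

print(f"Total Size: {total_size_gb(LTX_NF4_FILES):.2f} GB")
```

Note that "Minimum VRAM" is a separate figure and is not derived from the total. For ltx-nf4 it is lower than the total download size.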

text-generation

llama-v3-8b

NameLlama V3.0 8B Text Generation
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-8b-q8-0.gguf
Minimum VRAM9.64 GB

llama-v3-8b-q6-k

NameLlama V3.0 8B Text Generation (Q6-K)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-8b-q6-k.gguf
Minimum VRAM8.10 GB

llama-v3-8b-q5-k-m

NameLlama V3.0 8B Text Generation (Q5-K-M)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-8b-q5-k-m.gguf
Minimum VRAM7.30 GB

llama-v3-8b-q4-k-m

NameLlama V3.0 8B Text Generation (Q4-K-M)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-8b-q4-k-m.gguf
Minimum VRAM6.56 GB

llama-v3-8b-q3-k-m

NameLlama V3.0 8B Text Generation (Q3-K-M)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-8b-q3-k-m.gguf
Minimum VRAM5.72 GB

llama-v3-8b-instruct

NameLlama V3.0 8B Instruct Text Generation
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-8b-instruct-q8-0.gguf
Minimum VRAM9.64 GB

llama-v3-8b-instruct-q6-k

NameLlama V3.0 8B Instruct Text Generation (Q6-K)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-8b-instruct-q6-k.gguf
Minimum VRAM8.10 GB

llama-v3-8b-instruct-q5-k-m

NameLlama V3.0 8B Instruct Text Generation (Q5-K-M)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-8b-instruct-q5-k-m.gguf
Minimum VRAM7.30 GB

llama-v3-8b-instruct-q4-k-m

NameLlama V3.0 8B Instruct Text Generation (Q4-K-M)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-8b-instruct-q4-k-m.gguf
Minimum VRAM6.56 GB

llama-v3-8b-instruct-q3-k-m

NameLlama V3.0 8B Instruct Text Generation (Q3-K-M)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-8b-instruct-q3-k-m.gguf
Minimum VRAM5.72 GB

llama-v3-1-8b-instruct

NameLlama V3.1 8B Instruct Text Generation
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-1-8b-instruct-q8-0.gguf
Minimum VRAM9.64 GB

llama-v3-1-8b-instruct-q6-k (default)

NameLlama V3.1 8B Instruct Text Generation (Q6-K)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-1-8b-instruct-q6-k.gguf
Minimum VRAM8.10 GB

llama-v3-1-8b-instruct-q5-k-m

NameLlama V3.1 8B Instruct Text Generation (Q5-K-M)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-1-8b-instruct-q5-k-m.gguf
Minimum VRAM7.30 GB

llama-v3-1-8b-instruct-q4-k-m

NameLlama V3.1 8B Instruct Text Generation (Q4-K-M)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-1-8b-instruct-q4-k-m.gguf
Minimum VRAM6.56 GB

llama-v3-1-8b-instruct-q3-k-m

NameLlama V3.1 8B Instruct Text Generation (Q3-K-M)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-1-8b-instruct-q3-k-m.gguf
Minimum VRAM5.72 GB

llama-v3-2-3b-instruct

NameLlama V3.2 3B Instruct Text Generation
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-2-3b-instruct-f16.gguf
Minimum VRAM8.04 GB

llama-v3-2-3b-instruct-q8-0

NameLlama V3.2 3B Instruct Text Generation (Q8-0)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-2-3b-instruct-q8-0.gguf
Minimum VRAM5.02 GB

llama-v3-2-3b-instruct-q6-k

NameLlama V3.2 3B Instruct Text Generation (Q6-K)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-2-3b-instruct-q6-k.gguf
Minimum VRAM4.20 GB

llama-v3-2-3b-instruct-q5-k-m

NameLlama V3.2 3B Instruct Text Generation (Q5-K-M)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-2-3b-instruct-q5-k-m.gguf
Minimum VRAM3.90 GB

llama-v3-2-3b-instruct-q4-k-m

NameLlama V3.2 3B Instruct Text Generation (Q4-K-M)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-2-3b-instruct-q4-k-m.gguf
Minimum VRAM3.50 GB

llama-v3-2-3b-instruct-q3-k-l

NameLlama V3.2 3B Instruct Text Generation (Q3-K-L)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-2-3b-instruct-q3-k-l.gguf
Minimum VRAM3.10 GB

llama-v3-2-1b-instruct

NameLlama V3.2 1B Instruct Text Generation
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-2-1b-instruct-f16.gguf
Minimum VRAM3.60 GB

llama-v3-2-1b-instruct-q8-0

NameLlama V3.2 1B Instruct Text Generation (Q8-0)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-2-1b-instruct-q8-0.gguf
Minimum VRAM2.43 GB

llama-v3-2-1b-instruct-q6-k

NameLlama V3.2 1B Instruct Text Generation (Q6-K)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-2-1b-instruct-q6-k.gguf
Minimum VRAM2.15 GB

llama-v3-2-1b-instruct-q5-k-m

NameLlama V3.2 1B Instruct Text Generation (Q5-K-M)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-2-1b-instruct-q5-k-m.gguf
Minimum VRAM2.02 GB

llama-v3-2-1b-instruct-q4-k-m

NameLlama V3.2 1B Instruct Text Generation (Q4-K-M)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-2-1b-instruct-q4-k-m.gguf
Minimum VRAM1.64 GB

llama-v3-2-1b-instruct-q3-k-l

NameLlama V3.2 1B Instruct Text Generation (Q3-K-L)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-2-1b-instruct-q3-k-l.gguf
Minimum VRAM1.58 GB

zephyr-7b-alpha

NameZephyr 7B α Text Generation (Q8)
AuthorLewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
https://arxiv.org/abs/2310.16944
LicenseMIT License (https://opensource.org/licenses/MIT)
Filestext-generation-zephyr-alpha-7b-q8-0.gguf
Minimum VRAM9.40 GB

zephyr-7b-alpha-q6-k

NameZephyr 7B α Text Generation (Q6-K)
AuthorLewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
https://arxiv.org/abs/2310.16944
LicenseMIT License (https://opensource.org/licenses/MIT)
Filestext-generation-zephyr-alpha-7b-q6-k.gguf
Minimum VRAM8.20 GB

zephyr-7b-alpha-q5-k-m

NameZephyr 7B α Text Generation (Q5-K-M)
AuthorLewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
https://arxiv.org/abs/2310.16944
LicenseMIT License (https://opensource.org/licenses/MIT)
Filestext-generation-zephyr-alpha-7b-q5-k-m.gguf
Minimum VRAM7.25 GB

zephyr-7b-alpha-q4-k-m

NameZephyr 7B α Text Generation (Q4-K-M)
AuthorLewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
https://arxiv.org/abs/2310.16944
LicenseMIT License (https://opensource.org/licenses/MIT)
Filestext-generation-zephyr-alpha-7b-q4-k-m.gguf
Minimum VRAM6.30 GB

zephyr-7b-alpha-q3-k-m

NameZephyr 7B α Text Generation (Q3-K-M)
AuthorLewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
https://arxiv.org/abs/2310.16944
LicenseMIT License (https://opensource.org/licenses/MIT)
Filestext-generation-zephyr-alpha-7b-q3-k-m.gguf
Minimum VRAM5.35 GB

zephyr-7b-beta

NameZephyr 7B β Text Generation
AuthorLewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
https://arxiv.org/abs/2310.16944
LicenseMIT License (https://opensource.org/licenses/MIT)
Filestext-generation-zephyr-beta-7b-q8-0.gguf
Minimum VRAM9.40 GB

zephyr-7b-beta-q6-k

NameZephyr 7B β Text Generation (Q6-K)
AuthorLewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
https://arxiv.org/abs/2310.16944
LicenseMIT License (https://opensource.org/licenses/MIT)
Filestext-generation-zephyr-beta-7b-q6-k.gguf
Minimum VRAM8.20 GB

zephyr-7b-beta-q5-k-m

NameZephyr 7B β Text Generation (Q5-K-M)
AuthorLewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
https://arxiv.org/abs/2310.16944
LicenseMIT License (https://opensource.org/licenses/MIT)
Filestext-generation-zephyr-beta-7b-q5-k-m.gguf
Minimum VRAM7.25 GB

zephyr-7b-beta-q4-k-m

NameZephyr 7B β Text Generation (Q4-K-M)
AuthorLewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
https://arxiv.org/abs/2310.16944
LicenseMIT License (https://opensource.org/licenses/MIT)
Filestext-generation-zephyr-beta-7b-q4-k-m.gguf
Minimum VRAM6.30 GB

zephyr-7b-beta-q3-k-m

NameZephyr 7B β Text Generation (Q3-K-M)
AuthorLewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
https://arxiv.org/abs/2310.16944
LicenseMIT License (https://opensource.org/licenses/MIT)
Filestext-generation-zephyr-beta-7b-q3-k-m.gguf
Minimum VRAM5.35 GB
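
The quantized variants above trade precision for a lower "Minimum VRAM" figure. As a minimal sketch of how those figures can guide variant selection, the code below picks the highest-precision zephyr-7b-beta variant that fits a given VRAM budget. The function `pick_variant` is hypothetical and not a Taproot API, and it assumes the catalog's "Minimum VRAM" value is the only constraint. The variant names and figures are copied from the entries above.

```python
from typing import List, Optional, Tuple

# (model name, minimum VRAM in GB), ordered from highest to lowest precision.
ZEPHYR_BETA_VARIANTS: List[Tuple[str, float]] = [
    ("zephyr-7b-beta", 9.40),
    ("zephyr-7b-beta-q6-k", 8.20),
    ("zephyr-7b-beta-q5-k-m", 7.25),
    ("zephyr-7b-beta-q4-k-m", 6.30),
    ("zephyr-7b-beta-q3-k-m", 5.35),
]


def pick_variant(
    variants: List[Tuple[str, float]], available_vram_gb: float
) -> Optional[str]:
    """Hypothetical helper: return the first (highest-precision) variant
    whose minimum VRAM fits the budget, or None if nothing fits."""
    for name, min_vram_gb in variants:
        if min_vram_gb <= available_vram_gb:
            return name
    return None


# Example: choose a variant for a GPU with 8 GB of VRAM.
print(pick_variant(ZEPHYR_BETA_VARIANTS, 8.0))
```

In practice some headroom above the listed minimum is likely to be useful, since other processes may also hold GPU memory.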

visual-question-answering

llava-v1-5-7b

NameLLaVA V1.5 7B Visual Question Answering
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-7b.fp16.gguf (13.48 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 14.10 GB

Minimum VRAM15.80 GB

llava-v1-5-7b-q8

NameLLaVA V1.5 7B (Q8-0) Visual Question Answering
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-7b-q8-0.gguf (7.16 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 7.79 GB

Minimum VRAM9.90 GB

llava-v1-5-7b-q6-k

NameLLaVA V1.5 7B (Q6-K) Visual Question Answering
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-7b-q6-k.gguf (5.53 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 6.15 GB

Minimum VRAM8.40 GB

llava-v1-5-7b-q5-k-m

NameLLaVA V1.5 7B (Q5-K-M) Visual Question Answering
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-7b-q5-k-m.gguf (4.78 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 5.41 GB

Minimum VRAM7.71 GB

llava-v1-5-7b-q4-k-m

NameLLaVA V1.5 7B (Q4-K-M) Visual Question Answering
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-7b-q4-k-m.gguf (4.08 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 4.71 GB

Minimum VRAM7.04 GB

llava-v1-5-7b-q3-k-m

NameLLaVA V1.5 7B (Q3-K-M) Visual Question Answering
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-7b-q3-k-m.gguf (3.30 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 3.92 GB

Minimum VRAM6.33 GB

llava-v1-5-13b

NameLLaVA V1.5 13B (Q8-0) Visual Question Answering
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-13b-q8-0.gguf (13.83 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB)

Total Size: 14.48 GB

Minimum VRAM17.51 GB

llava-v1-5-13b-q6-k

NameLLaVA V1.5 13B (Q6-K) Visual Question Answering
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-13b-q6-k.gguf (10.68 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB)

Total Size: 11.32 GB

Minimum VRAM14.54 GB

llava-v1-5-13b-q5-k-m

NameLLaVA V1.5 13B (Q5-K-M) Visual Question Answering
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-13b-q5-k-m.gguf (9.23 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB)

Total Size: 9.88 GB

Minimum VRAM13.17 GB

llava-v1-5-13b-q4-0

NameLLaVA V1.5 13B (Q4-0) Visual Question Answering
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-13b-q4-0.gguf (7.37 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB)

Total Size: 8.01 GB

Minimum VRAM11.48 GB

llava-v1-6-34b-q5-k-m

NameLLaVA V1.6 34B (Q5-K-M) Visual Question Answering
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-6-34b-q5-k-m.gguf (24.32 GB)
  2. image-encoding-clip-llava-mmproj-v1-6-34b.fp16.gguf (699.99 MB)

Total Size: 25.02 GB

Minimum VRAM24.96 GB

llava-v1-6-34b-q4-k-m

NameLLaVA V1.6 34B (Q4-K-M) Visual Question Answering
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-6-34b-q4-k-m.gguf (20.66 GB)
  2. image-encoding-clip-llava-mmproj-v1-6-34b.fp16.gguf (699.99 MB)

Total Size: 21.36 GB

Minimum VRAM21.88 GB

llava-v1-6-34b-q3-k-m

NameLLaVA V1.6 34B (Q3-K-M) Visual Question Answering
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-6-34b-q3-k-m.gguf (16.65 GB)
  2. image-encoding-clip-llava-mmproj-v1-6-34b.fp16.gguf (699.99 MB)

Total Size: 17.35 GB

Minimum VRAM18.06 GB

moondream-v2 (default)

NameMoondream V2 Visual Question Answering
AuthorVikhyat Korrapati
Published in Hugging Face, vol. 10.57967/hf/3219, “Moondream2”, 2024
https://huggingface.co/vikhyatk/moondream2
LicenseApache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files
  1. visual-question-answering-moondream-v2.fp16.gguf (2.84 GB)
  2. image-encoding-clip-moondream-v2-mmproj.fp16.gguf (909.78 MB)

Total Size: 3.75 GB

Minimum VRAM4.44 GB

image-captioning

llava-v1-5-7b

NameLLaVA V1.5 7B Image Captioning
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-7b.fp16.gguf (13.48 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 14.10 GB

Minimum VRAM15.80 GB

llava-v1-5-7b-q8

NameLLaVA V1.5 7B (Q8-0) Image Captioning
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-7b-q8-0.gguf (7.16 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 7.79 GB

Minimum VRAM9.90 GB

llava-v1-5-7b-q6-k

NameLLaVA V1.5 7B (Q6-K) Image Captioning
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-7b-q6-k.gguf (5.53 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 6.15 GB

Minimum VRAM: 8.40 GB

llava-v1-5-7b-q5-k-m

Name: LLaVA V1.5 7B (Q5-K-M) Image Captioning
Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-7b-q5-k-m.gguf (4.78 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 5.41 GB

Minimum VRAM: 7.71 GB

llava-v1-5-7b-q4-k-m

Name: LLaVA V1.5 7B (Q4-K-M) Image Captioning
Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-7b-q4-k-m.gguf (4.08 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 4.71 GB

Minimum VRAM: 7.04 GB

llava-v1-5-7b-q3-k-m

Name: LLaVA V1.5 7B (Q3-K-M) Image Captioning
Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-7b-q3-k-m.gguf (3.30 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 3.92 GB

Minimum VRAM: 6.33 GB

llava-v1-5-13b

Name: LLaVA V1.5 13B (Q8-0) Image Captioning
Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-13b-q8-0.gguf (13.83 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB)

Total Size: 14.48 GB

Minimum VRAM: 17.51 GB

llava-v1-5-13b-q6-k

Name: LLaVA V1.5 13B (Q6-K) Image Captioning
Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-13b-q6-k.gguf (10.68 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB)

Total Size: 11.32 GB

Minimum VRAM: 14.54 GB

llava-v1-5-13b-q5-k-m

Name: LLaVA V1.5 13B (Q5-K-M) Image Captioning
Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-13b-q5-k-m.gguf (9.23 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB)

Total Size: 9.88 GB

Minimum VRAM: 13.17 GB

llava-v1-5-13b-q4-0

Name: LLaVA V1.5 13B (Q4-0) Image Captioning
Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-13b-q4-0.gguf (7.37 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB)

Total Size: 8.01 GB

Minimum VRAM: 11.48 GB

llava-v1-6-34b-q5-k-m

Name: LLaVA V1.6 34B (Q5-K-M) Image Captioning
Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-6-34b-q5-k-m.gguf (24.32 GB)
  2. image-encoding-clip-llava-mmproj-v1-6-34b.fp16.gguf (699.99 MB)

Total Size: 25.02 GB

Minimum VRAM: 24.96 GB

llava-v1-6-34b-q4-k-m

Name: LLaVA V1.6 34B (Q4-K-M) Image Captioning
Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-6-34b-q4-k-m.gguf (20.66 GB)
  2. image-encoding-clip-llava-mmproj-v1-6-34b.fp16.gguf (699.99 MB)

Total Size: 21.36 GB

Minimum VRAM: 21.88 GB

llava-v1-6-34b-q3-k-m

Name: LLaVA V1.6 34B (Q3-K-M) Image Captioning
Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-6-34b-q3-k-m.gguf (16.65 GB)
  2. image-encoding-clip-llava-mmproj-v1-6-34b.fp16.gguf (699.99 MB)

Total Size: 17.35 GB

Minimum VRAM: 18.06 GB

moondream-v2 (default)

Name: Moondream V2 Image Captioning
Author: Vikhyat Korrapati
Published in Hugging Face, vol. 10.57967/hf/3219, “Moondream2”, 2024
https://huggingface.co/vikhyatk/moondream2
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files
  1. visual-question-answering-moondream-v2.fp16.gguf (2.84 GB)
  2. image-encoding-clip-moondream-v2-mmproj.fp16.gguf (909.78 MB)

Total Size: 3.75 GB

Minimum VRAM: 4.44 GB
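
When choosing among the image-captioning variants above, it can help to filter them by the VRAM available on a given machine. The following is a minimal sketch, not part of Taproot's API: the `MIN_VRAM_GB` table is copied by hand from the Minimum VRAM figures listed above, and `models_that_fit` is a hypothetical helper name introduced only for illustration.

```python
# Minimum VRAM figures (GB), copied from the image-captioning entries above.
# This table is an illustration; it is not read from Taproot itself.
MIN_VRAM_GB = {
    "llava-v1-5-7b": 15.80,
    "llava-v1-5-7b-q8": 9.90,
    "llava-v1-5-7b-q6-k": 8.40,
    "llava-v1-5-7b-q5-k-m": 7.71,
    "llava-v1-5-7b-q4-k-m": 7.04,
    "llava-v1-5-7b-q3-k-m": 6.33,
    "llava-v1-5-13b": 17.51,
    "llava-v1-5-13b-q6-k": 14.54,
    "llava-v1-5-13b-q5-k-m": 13.17,
    "llava-v1-5-13b-q4-0": 11.48,
    "llava-v1-6-34b-q5-k-m": 24.96,
    "llava-v1-6-34b-q4-k-m": 21.88,
    "llava-v1-6-34b-q3-k-m": 18.06,
    "moondream-v2": 4.44,
}


def models_that_fit(budget_gb, table=MIN_VRAM_GB):
    """Hypothetical helper: names whose minimum VRAM is within the budget,
    ordered from the largest requirement to the smallest."""
    fitting = [(vram, name) for name, vram in table.items() if vram <= budget_gb]
    return [name for _, name in sorted(fitting, reverse=True)]


if __name__ == "__main__":
    # Example: candidates for a GPU with 8 GB of VRAM.
    print(models_that_fit(8.0))
```

Sorting by requirement is only a rough proxy for quality; the default model for this task remains moondream-v2, which has the smallest requirement in the list.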