---
license: apache-2.0
---

An open source real-time AI inference engine for seamless scaling

# About

Taproot is a seamlessly scalable AI/ML inference engine designed for deployment across hardware clusters with disparate capabilities.

## Why Taproot?

Most AI/ML inference engines are built for either large-scale cloud infrastructure or constrained edge devices. Taproot is designed for **medium-scale deployments**, offering flexible, distributed on-premise or pay-as-you-go (PAYG) setups. It makes efficient use of older or consumer-grade hardware, which suits small networks and ad-hoc clusters that do not rely on centralized, hyperscale architectures.

## Available Models

There are more than 150 models available across 18 task categories. See the [Task Catalog](#task-catalog) for the complete list, licenses, requirements and citations.

Many more models have yet to be added. If you're looking for a particular enhancement, don't hesitate to open an issue on this repository to request it.

### Roadmap

1. IP Adapter Models for Diffusers Image Generation Pipelines
2. ControlNet Models for Diffusers Image Generation Pipelines
3. Additional quantization backends for large models
    - Currently bitsandbytes (Int8/NF4) and GGUF (through llama.cpp) are supported, with pre-quantized checkpoints available.
    - FP8 support through Optimum-Quanto, TorchAO and custom kernels is in development.
4. Improved multi-GPU support
    - This is currently supported through manual configuration, but usability can be improved.
5. Additional annotators/detectors for image and video
    - E.g. Marigold, SAM2
6. Additional audio generation models
    - E.g. Stable Audio, AudioLDM, MusicGen

# Installation

```sh
pip install taproot
```

Some additional packages can be installed with the square-bracket syntax (e.g. `pip install taproot[a,b,c]`). These are:

- **tools** - Additional packages for LLM tools like DuckDuckGo Search, BeautifulSoup (for web scraping), etc.
- **console** - Additional packages for prettifying console output.
- **av** - Additional packages for reading and writing video.

## Installing Tasks

Some tasks are available immediately, but most tasks require additional packages and files. Install these tasks with `taproot install [task:model]+`, e.g.:

```sh
taproot install image-generation:stable-diffusion-xl
```

# Usage

## Command-Line

### Introspecting Tasks

From the command line, execute `taproot tasks` to see all tasks and their availability status, or `taproot info` for individual task information. For example:

```sh
taproot info image-generation stable-diffusion-xl

Stable Diffusion XL Image Generation (image-generation:stable-diffusion-xl, available)
    Generate an image from text and/or images using a stable diffusion XL model.
Hardware Requirements:
    GPU Required for Optimal Performance
    Floating Point Precision: half
    Minimum Memory (CPU RAM) Required: 231.71 MB
    Minimum Memory (GPU VRAM) Required: 7.58 GB
Author:
    Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
    Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
    https://arxiv.org/abs/2307.01952
License:
    OpenRAIL++-M License (https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
    ✅ Attribution Required
    ✅ Derivatives Allowed
    ✅ Redistribution Allowed
    ✅ Copyleft (Share-Alike) Required
    ✅ Commercial Use Allowed
    ✅ Hosting Allowed
Files:
    image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) [downloaded]
    image-generation-stable-diffusion-xl-base-unet.fp16.safetensors (5.14 GB) [downloaded]
    text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) [downloaded]
    text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB) [downloaded]
    text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) [downloaded]
    text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) [downloaded]
    text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) [downloaded]
    text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) [downloaded]
    text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) [downloaded]
    text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) [downloaded]
    Total File Size: 7.11 GB
Required packages:
    pil~=9.5 [installed]
    torch<2.5,>=2.4 [installed]
    numpy~=1.22 [installed]
    diffusers>=0.29 [installed]
    torchvision<0.20,>=0.19 [installed]
    transformers>=4.41 [installed]
    safetensors~=0.4 [installed]
    accelerate~=1.0 [installed]
    sentencepiece~=0.2 [installed]
    compel~=2.0 [installed]
    peft~=0.13 [installed]
Signature:
    prompt: Union[str, List[str]], required
    prompt_2: Union[str, List[str]], default: None
    negative_prompt: Union[str, List[str]], default: None
    negative_prompt_2: Union[str, List[str]], default: None
    image: ImageType, default: None
    mask_image: ImageType, default: None
    guidance_scale: float, default: 5.0
    guidance_rescale: float, default: 0.0
    num_inference_steps: int, default: 20
    num_images_per_prompt: int, default: 1
    height: int, default: None
    width: int, default: None
    timesteps: List[int], default: None
    sigmas: List[float], default: None
    denoising_end: float, default: None
    strength: float, default: None
    latents: torch.Tensor, default: None
    prompt_embeds: torch.Tensor, default: None
    negative_prompt_embeds: torch.Tensor, default: None
    pooled_prompt_embeds: torch.Tensor, default: None
    negative_pooled_prompt_embeds: torch.Tensor, default: None
    clip_skip: int, default: None
    seed: SeedType, default: None
    pag_scale: float, default: None
    pag_adaptive_scale: float, default: None
    scheduler: Literal[ddim, ddpm, ddpm_wuerstchen, deis_multistep, dpm_cogvideox, dpmsolver_multistep, dpmsolver_multistep_karras, dpmsolver_sde, dpmsolver_sde_multistep, dpmsolver_sde_multistep_karras, dpmsolver_singlestep, dpmsolver_singlestep_karras, edm_dpmsolver_multistep, edm_euler, euler_ancestral_discrete, euler_discrete, euler_discrete_karras, flow_match_euler_discrete, flow_match_heun_discrete, heun_discrete, ipndm, k_dpm_2_ancestral_discrete, k_dpm_2_ancestral_discrete_karras, k_dpm_2_discrete, k_dpm_2_discrete_karras, lcm, lms_discrete, lms_discrete_karras, pndm, tcd, unipc], default: None
    output_format: Literal[png, jpeg, float, int, latent], default: png
    output_upload: bool, default: False
    highres_fix_factor: float, default: 1.0
    highres_fix_strength: float, default: None
    spatial_prompts: SpatialPromptInputType, default: None
Returns:
    ImageResultType
```

### Invoking Tasks

Run `taproot invoke` to run any task from the command line. All parameters to the task can be passed as flags to the call using kebab-case, e.g.:

```sh
taproot invoke image-generation:stable-diffusion-xl \
    --prompt "a photograph of a golden retriever at the park" \
    --negative-prompt "fall, autumn, blurry, out-of-focus" \
    --seed 12345

Loading task.
100%|███████████████████████████████████████████████████████████████████████████| 7/7 [00:03<00:00, 2.27it/s]
Task loaded in 4.0 s.
Invoking task.
100%|█████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00, 4.34it/s]
Task invoked in 6.5 s. Result: 8940aa12-66a7-4233-bfd6-f19da339b71b.png
```

## Python

### Direct Task Usage

```py
from taproot import Task

# Look up the task class by task name and model name
sdxl = Task.get("image-generation", "stable-diffusion-xl")
pipeline = sdxl()
pipeline.load()  # load the model before the first call
pipeline(prompt="Hello, world!").save("./output.png")
```

### With a Remote Server

```py
from taproot import Tap

tap = Tap()
tap.remote_address = "ws://127.0.0.1:32189"
result = tap.call("image-generation", model="stable-diffusion-xl", prompt="Hello, world!")
result.save("./output.png")
```

### With a Local Server

This example also shows asynchronous usage.

```py
import asyncio
from taproot import Tap

with Tap.local() as tap:
    loop = asyncio.get_event_loop()
    # Calling the tap returns an awaitable; run it to completion on the loop
    result = loop.run_until_complete(tap("image-generation", model="stable-diffusion-xl", prompt="Hello, world!"))
    result.save("./output.png")
```

## Running Servers

Taproot uses a three-role cluster structure:

1. **Overseers** are entry points into clusters, routing requests to one or more dispatchers.
2. **Dispatchers** are machines capable of running tasks by spawning executors.
3. **Executors** are servers ready to execute a task.

The simplest way to run a server is to run an overseer together with a local dispatcher:

```sh
taproot overseer --local
```

This will run on the default address of `ws://127.0.0.1:32189`, suitable for interaction from Python or the browser.

There are many deployment possibilities across networks, with configuration available for encryption, listening addresses, and more. See the wiki for details (coming soon).

## Outside Python

- [taproot.js](https://github.com/painebenjamin/taproot.js) - for the browser and node.js, available in ESM, UMD and IIFE
- taproot.php - coming soon
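
The kebab-case flag convention described under Invoking Tasks can be sketched in a few lines of Python. This is a minimal illustration, not part of Taproot: `build_invoke_command` is a hypothetical helper, and it only handles scalar values (strings and numbers), since the README does not describe how the CLI expects lists or booleans to be passed.

```python
import shlex
from typing import Any, List


def build_invoke_command(task: str, model: str, **params: Any) -> List[str]:
    """Build an argument list for `taproot invoke` (hypothetical helper)."""
    argv = ["taproot", "invoke", f"{task}:{model}"]
    for name, value in params.items():
        # snake_case parameter names become kebab-case flags
        argv.append("--" + name.replace("_", "-"))
        # Assumption: scalar values are passed as plain strings
        argv.append(str(value))
    return argv


command = build_invoke_command(
    "image-generation",
    "stable-diffusion-xl",
    prompt="a photograph of a golden retriever at the park",
    negative_prompt="fall, autumn, blurry, out-of-focus",
    seed=12345,
)

# shlex.join quotes each argument so the line can be pasted into a shell
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command)` on a machine where the `taproot` CLI is installed.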

# Task Catalog

18 tasks available with 171 models.

echo: 1 model
image-similarity: 2 models
text-similarity: 1 model
speech-enhancement: 1 model
image-interpolation: 2 models
background-removal: 1 model
super-resolution: 2 models
speech-synthesis: 2 models
audio-transcription: 9 models
depth-detection: 1 model
line-detection: 4 models
edge-detection: 3 models
pose-detection: 2 models
image-generation: 52 models
video-generation: 23 models
text-generation: 37 models
visual-question-answering: 14 models
image-captioning: 14 models
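
As a quick consistency check, the per-task counts above can be tallied against the stated totals. The dictionary below simply copies the numbers from the list; the variable names are illustrative.

```python
# Per-task model counts, copied from the summary list above
TASK_COUNTS = {
    "echo": 1,
    "image-similarity": 2,
    "text-similarity": 1,
    "speech-enhancement": 1,
    "image-interpolation": 2,
    "background-removal": 1,
    "super-resolution": 2,
    "speech-synthesis": 2,
    "audio-transcription": 9,
    "depth-detection": 1,
    "line-detection": 4,
    "edge-detection": 3,
    "pose-detection": 2,
    "image-generation": 52,
    "video-generation": 23,
    "text-generation": 37,
    "visual-question-answering": 14,
    "image-captioning": 14,
}

total = sum(TASK_COUNTS.values())
print(f"{len(TASK_COUNTS)} tasks available with {total} models.")

# The three largest categories, by model count
largest = sorted(TASK_COUNTS, key=TASK_COUNTS.get, reverse=True)[:3]
print(largest)
```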

echo

Name	Echo
Author	Benjamin Paine Taproot https://github.com/painebenjamin/taproot
License	Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files	N/A
Minimum VRAM	N/A

image-similarity

(default)

Name	Traditional Image Similarity
Author	Benjamin Paine Taproot https://github.com/painebenjamin/taproot
License	Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files	N/A
Minimum VRAM	N/A

inception-v3

Name	Inception Image Similarity (FID)
Author	Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens and Zbigniew Wojna Google Research and University College London Published in CoRR, vol. 1512.00567, “Rethinking the Inception Architecture for Computer Vision”, 2015 https://arxiv.org/abs/1512.00567
License	Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files	image-similarity-inception.fp16.safetensors
Minimum VRAM	50.28 MB

text-similarity

Name	Traditional Text Similarity
Author	Benjamin Paine Taproot https://github.com/painebenjamin/taproot
License	Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files	N/A
Minimum VRAM	N/A

speech-enhancement

deep-filter-net-v3 (default)

Name	DeepFilterNet V3 Speech Enhancement
Author	Hendrik Schröter, Tobias Rosenkranz, Alberto N. Escalante-B and Andreas Maier Published in INTERSPEECH, “DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement”, 2023 https://arxiv.org/abs/2305.08227
License	Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files	speech-enhancement-deep-filter-net-3.safetensors
Minimum VRAM	87.89 MB

image-interpolation

film (default)

Name	Frame Interpolation for Large Motion (FiLM) Image Interpolation
Author	Fitsum Reda, Janne Kontkanen, Eric Tabellion, Deqing Sun, Caroline Pantofaru and Brian Curless Google Research and University of Washington Published in ECCV, “FiLM: Frame Interpolation for Large Motion”, 2022 https://arxiv.org/abs/2202.04901
License	Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files	image-interpolation-film-net.fp16.pt
Minimum VRAM	70.00 MB

rife

Name	Real-Time Intermediate Flow Estimation (RIFE) Image Interpolation
Author	Zhewei Huang, Tianyuan Zhang, Wen Heng, Boxin Shi and Shuchang Zhou Megvii Research, NERCVT, School of Computer Science, Peking University, Institute for Artificial Intelligence, Peking University and Beijing Academy of Artificial Intelligence Published in ECCV, “Real-Time Intermediate Flow Estimation for Video Frame Interpolation”, 2022 https://arxiv.org/abs/2011.06294
License	MIT License (https://opensource.org/licenses/MIT)
Files	image-interpolation-rife-flownet.safetensors
Minimum VRAM	22.68 MB

background-removal

backgroundremover (default)

Name	BackgroundRemover
Author	Johnathan Nader, Lucas Nestler, Dr. Tim Scarfe and Daniel Gatis https://github.com/nadermx/backgroundremover
License	Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files	background-removal-u2net.safetensors
Minimum VRAM	217.62 MB

super-resolution

aura

Name	Aura Super Resolution
Author	fal.ai Published in fal.ai blog, “Introducing AuraSR - An open reproduction of the GigaGAN Upscaler”, 2024 https://blog.fal.ai/introducing-aurasr-an-open-reproduction-of-the-gigagan-upscaler-2/
License	CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/)
Files	super-resolution-aura.fp16.safetensors
Minimum VRAM	1.24 GB

aura-v2 (default)

Name	Aura Super Resolution V2
Author	fal.ai Published in fal.ai blog, “AuraSR V2”, 2024 https://blog.fal.ai/aurasr-v2/
License	CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/)
Files	super-resolution-aura-v2.fp16.safetensors
Minimum VRAM	1.24 GB

speech-synthesis

xtts-v2 (default)

Name	XTTS2 Speech Synthesis
Author	Coqui AI Published in Coqui AI Blog, “XTTS: Open Model Release Announcement”, 2023 https://coqui.ai/blog/tts/open_xtts
License	Mozilla Public License 2.0 (https://www.mozilla.org/en-US/MPL/2.0/)
Files	speech-synthesis-xtts-v2.safetensors (1.87 GB) speech-synthesis-xtts-v2-speakers.pth (7.75 MB) speech-synthesis-xtts-v2-vocab.json (361.22 KB) Total Size: 1.88 GB
Minimum VRAM	1.91 GB

f5tts

Name	F5TTS Speech Synthesis
Author	Yushen Chen, Zhikang Niu, Ziyang Ma, Keqi Deng, Chunhui Wang, Jian Zhao, Kai Yu and Xie Chen Published in arXiv, vol. 2410.06885, “F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching”, 2024 https://arxiv.org/abs/2410.06885
License	CC BY-NC 4.0 (https://creativecommons.org/licenses/by-nc/4.0/)
Files	speech-synthesis-f5tts.safetensors (1.35 GB) speech-synthesis-f5tts-vocab.txt (11.26 KB) audio-vocoder-vocos-mel-24khz.safetensors (54.35 MB) audio-vocoder-vocos-mel-24khz-config.yaml (461.00 B) Total Size: 1.40 GB
Minimum VRAM	3.94 GB

audio-transcription

whisper-tiny

Name	Whisper Tiny Audio Transcription
Author	Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever OpenAI Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision” https://arxiv.org/abs/2212.04356
License	Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files	audio-transcription-whisper-tiny.safetensors (151.06 MB) audio-transcription-whisper-tokenizer-vocab.json (835.55 KB) audio-transcription-whisper-tokenizer-merges.txt (493.87 KB) audio-transcription-whisper-tokenizer-normalizer.json (52.67 KB) audio-transcription-whisper-tokenizer.json (2.48 MB) Total Size: 154.92 MB
Minimum VRAM	147.85 MB

whisper-base

Name	Whisper Base Audio Transcription
Author	Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever OpenAI Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision” https://arxiv.org/abs/2212.04356
License	Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files	audio-transcription-whisper-base.safetensors (290.40 MB) audio-transcription-whisper-tokenizer-vocab.json (835.55 KB) audio-transcription-whisper-tokenizer-merges.txt (493.87 KB) audio-transcription-whisper-tokenizer-normalizer.json (52.67 KB) audio-transcription-whisper-tokenizer.json (2.48 MB) Total Size: 294.27 MB
Minimum VRAM	285.74 MB

whisper-small

Name	Whisper Small Audio Transcription
Author	Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever OpenAI Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision” https://arxiv.org/abs/2212.04356
License	Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files	audio-transcription-whisper-small.safetensors (967.00 MB) audio-transcription-whisper-tokenizer-vocab.json (835.55 KB) audio-transcription-whisper-tokenizer-merges.txt (493.87 KB) audio-transcription-whisper-tokenizer-normalizer.json (52.67 KB) audio-transcription-whisper-tokenizer.json (2.48 MB) Total Size: 970.86 MB
Minimum VRAM	945.03 MB

whisper-medium

Name	Whisper Medium Audio Transcription
Author	Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever OpenAI Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision” https://arxiv.org/abs/2212.04356
License	Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files	audio-transcription-whisper-medium.safetensors (3.06 GB) audio-transcription-whisper-tokenizer-vocab.json (835.55 KB) audio-transcription-whisper-tokenizer-merges.txt (493.87 KB) audio-transcription-whisper-tokenizer-normalizer.json (52.67 KB) audio-transcription-whisper-tokenizer.json (2.48 MB) Total Size: 3.06 GB
Minimum VRAM	3.06 GB

whisper-large-v3

Name	Whisper Large V3 Audio Transcription
Author	Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever OpenAI Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision” https://arxiv.org/abs/2212.04356
License	Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files	audio-transcription-whisper-large-v3.fp16.safetensors (3.09 GB) audio-transcription-whisper-tokenizer-v3-vocab.json (1.04 MB) audio-transcription-whisper-tokenizer-v3-merges.txt (493.87 KB) audio-transcription-whisper-tokenizer-v3-normalizer.json (52.67 KB) audio-transcription-whisper-tokenizer-v3.json (2.48 MB) Total Size: 3.09 GB
Minimum VRAM	3.09 GB

distilled-whisper-small-english

Name	Distilled Whisper Small (English) Audio Transcription
Author	Sanchit Gandhi, Patrick von Platen and Alexander M. Rush Hugging Face Published in arXiv, vol. 2311.00430, “Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling”, 2023 https://arxiv.org/abs/2311.00430
License	Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files	audio-transcription-distilled-whisper-small-english.safetensors (332.30 MB) audio-transcription-distilled-whisper-english-tokenizer-vocab.json (999.19 KB) audio-transcription-distilled-whisper-english-tokenizer-merges.txt (456.32 KB) audio-transcription-distilled-whisper-english-tokenizer-normalizer.json (52.67 KB) audio-transcription-distillled-whisper-english-tokenizer.json (2.41 MB) Total Size: 336.21 MB
Minimum VRAM	649.01 MB

distilled-whisper-medium-english

Name	Distilled Whisper Medium (English) Audio Transcription
Author	Sanchit Gandhi, Patrick von Platen and Alexander M. Rush Hugging Face Published in arXiv, vol. 2311.00430, “Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling”, 2023 https://arxiv.org/abs/2311.00430
License	Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files	audio-transcription-distilled-whisper-medium-english.safetensors (788.80 MB) audio-transcription-distilled-whisper-english-tokenizer-vocab.json (999.19 KB) audio-transcription-distilled-whisper-english-tokenizer-merges.txt (456.32 KB) audio-transcription-distilled-whisper-english-tokenizer-normalizer.json (52.67 KB) audio-transcription-distillled-whisper-english-tokenizer.json (2.41 MB) Total Size: 792.71 MB
Minimum VRAM	1.58 GB

distilled-whisper-large-v3 (default)

Name	Distilled Whisper Large V3 Audio Transcription
Author	Sanchit Gandhi, Patrick von Platen and Alexander M. Rush Hugging Face Published in arXiv, vol. 2311.00430, “Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling”, 2023 https://arxiv.org/abs/2311.00430
License	Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files	audio-transcription-distilled-whisper-large-v3.fp16.safetensors (1.51 GB) audio-transcription-whisper-tokenizer-v3-vocab.json (1.04 MB) audio-transcription-whisper-tokenizer-v3-merges.txt (493.87 KB) audio-transcription-whisper-tokenizer-v3-normalizer.json (52.67 KB) audio-transcription-whisper-tokenizer-v3.json (2.48 MB) Total Size: 1.52 GB
Minimum VRAM	1.51 GB

turbo-whisper-large-v3

Name	Turbo Whisper Large V3 Audio Transcription
Author	Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever OpenAI Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision” https://arxiv.org/abs/2212.04356
License	Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files	audio-transcription-whisper-large-v3-turbo.fp16.safetensors (1.62 GB) audio-transcription-whisper-tokenizer-v3-vocab.json (1.04 MB) audio-transcription-whisper-tokenizer-v3-merges.txt (493.87 KB) audio-transcription-whisper-tokenizer-v3-normalizer.json (52.67 KB) audio-transcription-whisper-tokenizer-v3.json (2.48 MB) Total Size: 1.62 GB
Minimum VRAM	1.62 GB
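
The Minimum VRAM figures for the audio-transcription models above can be compared against a memory budget when choosing a model. The sketch below copies those figures from the entries above; the helper names are illustrative, and it assumes binary units (1 GB = 1024 MB), which the catalog does not state.

```python
from typing import Dict, List

# Minimum VRAM values copied from the audio-transcription entries above
MIN_VRAM: Dict[str, str] = {
    "whisper-tiny": "147.85 MB",
    "whisper-base": "285.74 MB",
    "whisper-small": "945.03 MB",
    "whisper-medium": "3.06 GB",
    "whisper-large-v3": "3.09 GB",
    "distilled-whisper-small-english": "649.01 MB",
    "distilled-whisper-medium-english": "1.58 GB",
    "distilled-whisper-large-v3": "1.51 GB",
    "turbo-whisper-large-v3": "1.62 GB",
}

# Assumption: sizes use binary units, so 1 GB = 1024 MB
UNIT_TO_MB = {"MB": 1.0, "GB": 1024.0}


def to_megabytes(size: str) -> float:
    """Convert a string such as '1.62 GB' to megabytes."""
    value, unit = size.split()
    return float(value) * UNIT_TO_MB[unit]


def models_within_budget(budget: str) -> List[str]:
    """Return the models whose minimum VRAM fits the budget, smallest first."""
    limit = to_megabytes(budget)
    fitting = [name for name, size in MIN_VRAM.items() if to_megabytes(size) <= limit]
    return sorted(fitting, key=lambda name: to_megabytes(MIN_VRAM[name]))


print(models_within_budget("2.00 GB"))
```

The listed values are minimums, so a real deployment would likely want some headroom above them.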

depth-detection

midas (default)

Name	MiDaS Depth Detection
Author	René Ranftl, Alexey Bochkovskiy and Vladlen Koltun Published in arXiv, vol. 2103.13413, “Vision Transformers for Dense Prediction”, 2021 https://arxiv.org/abs/2103.13413
License	MIT License (https://opensource.org/licenses/MIT)
Files	depth-detection-midas.fp16.safetensors
Minimum VRAM	255.65 MB

line-detection

informative-drawings (default)

Name	Informative Drawings Line Art Detection
Author	Caroline Chan, Fredo Durand and Phillip Isola Massachusetts Institute of Technology Published in arXiv, vol. 2203.12691, “Informative Drawings: Learning to Generate Line Drawings that Convey Geometry and Semantics”, 2022 https://arxiv.org/abs/2203.12691
License	MIT License (https://opensource.org/licenses/MIT)
Files	line-detection-informative-drawings.fp16.safetensors
Minimum VRAM	8.58 MB

informative-drawings-coarse

Name	Informative Drawings Coarse Line Art Detection
Author	Caroline Chan, Fredo Durand and Phillip Isola Massachusetts Institute of Technology Published in arXiv, vol. 2203.12691, “Informative Drawings: Learning to Generate Line Drawings that Convey Geometry and Semantics”, 2022 https://arxiv.org/abs/2203.12691
License	MIT License (https://opensource.org/licenses/MIT)
Files	line-detection-informative-drawings-coarse.fp16.safetensors
Minimum VRAM	8.58 MB

informative-drawings-anime

Name	Informative Drawings Anime Line Art Detection
Author	Caroline Chan, Fredo Durand and Phillip Isola Massachusetts Institute of Technology Published in arXiv, vol. 2203.12691, “Informative Drawings: Learning to Generate Line Drawings that Convey Geometry and Semantics”, 2022 https://arxiv.org/abs/2203.12691
License	MIT License (https://opensource.org/licenses/MIT)
Files	line-detection-informative-drawings-anime.fp16.safetensors
Minimum VRAM	108.81 MB

mlsd

Name	Mobile Line Segment Detection
Author	Geonmo Gu, Byungsoo Ko, SeongHyun Go, Sung-Hyun Lee, Jingeun Lee and Minchul Shin NAVER/LINE Vision Published in arXiv, vol. 2106.00186, “Towards Light-weight and Real-time Line Segment Detection”, 2022 https://arxiv.org/abs/2106.00186
License	Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files	line-detection-mlsd.fp16.safetensors
Minimum VRAM	3.22 MB

edge-detection

canny (default)

Name	Canny Edge Detection
Author	John Canny Published in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, pp. 679-698, “A Computational Approach to Edge Detection”, 1986 https://ieeexplore.ieee.org/document/4767851 Implementation by OpenCV (https://opencv.org/)
License	Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files	N/A
Minimum VRAM	N/A

hed

Name	Holistically-Nested Edge Detection
Author	Saining Xie and Zhuowen Tu University of California, San Diego Published in arXiv, vol. 1504.06375, “Holistically-Nested Edge Detection”, 2015 https://arxiv.org/abs/1504.06375
License	Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files	edge-detection-hed.fp16.safetensors
Minimum VRAM	29.44 MB

pidi

Name	Soft Edge (PIDI) Detection
Author	Zhuo Su, Wenzhe Liu, Zitong Yu, Dewen Hu, Qing Liao, Qi Tian, Matti Pietikäinen and Li Liu Published in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5117-5127, “Pixel Difference Networks for Efficient Edge Detection”, 2021
License	MIT License with Non-Commercial Clause (https://github.com/hellozhuo/pidinet/blob/master/LICENSE)
Files	edge-detection-pidi.fp16.safetensors
Minimum VRAM	1.40 MB

pose-detection

openpose

Name	OpenPose Pose Detection
Author	Zhe Cao, Gines Hidalgo, Tomas Simon, Shih-En Wei and Yaser Sheikh Published in arXiv, vol. 1812.08008, “OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields”, 2018 https://arxiv.org/abs/1812.08008
License	OpenPose Academic or Non-Profit Non-Commercial Research License (https://github.com/CMU-Perceptual-Computing-Lab/openpose/blob/master/LICENSE)
Files	pose-detection-openpose.fp16.safetensors
Minimum VRAM	259.96 MB

dwpose (default)

Name	DWPose Pose Detection
Author	Zhendong Yang, Ailing Zeng, Chun Yuan and Yu Li Tsinghua Shenzhen International Graduate School and International Digital Economy Academy (IDEA) Published in arXiv, vol. 2307.15880, “Effective Whole-body Pose Estimation with Two-stages Distillation”, 2023 https://arxiv.org/abs/2307.15880
License	Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files	pose-detection-dwpose-estimation.safetensors (134.65 MB) pose-detection-dwpose-detection.safetensors (217.20 MB) Total Size: 351.85 MB
Minimum VRAM	354.64 MB

image-generation

stable-diffusion-v1-5

Name	Stable Diffusion v1.5 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752
License	OpenRAIL-M License (https://bigscience.huggingface.co/blog/bigscience-openrail-m)
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-abyssorange-mix-v3

Name	AbyssOrange Mix V3 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by liudinglin (https://civitai.com/user/liudinglin)
License	OpenRAIL-M License with Restrictions (https://civitai.com/models/license/17233)
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-abyssorange-mix-v3-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-abyssorange-mix-v3-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-chillout-mix-ni

Name	Chillout Mix Ni Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by Dreamlike Art (https://dreamlike.art)
License	OpenRAIL-M License with Restrictions (https://huggingface.co/dreamlike-art/dreamlike-diffusion-1.0/blob/main/LICENSE.md)
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-chillout-mix-ni-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-chillout-mix-ni-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-clarity-v3

Name	Clarity V3 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by ndimensional (https://civitai.com/user/ndimensional)
License	OpenRAIL-M License with Restrictions (https://civitai.com/models/license/142125)
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-clarity-v3-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-clarity-v3-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-dark-sushi-mix-v2-25d

Name	Dark Sushi Mix V2 2.5D Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by Aitasai (https://civitai.com/user/Aitasai)
License	OpenRAIL-M License with Restrictions (https://civitai.com/models/license/93208)
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-dark-sushi-mix-v2-25d-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-dark-sushi-mix-v2-25d-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-divine-elegance-mix-v10

Name	Divine Elegance Mix V10 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by TroubleDarkness (https://civitai.com/user/TroubleDarkness)
License	OpenRAIL-M License with Restrictions (https://civitai.com/models/license/432048)
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-divine-elegance-mix-v10-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-divine-elegance-mix-v10-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-dreamshaper-v8

Name	DreamShaper V8 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by Lykon (https://civitai.com/user/Lykon)
License	OpenRAIL-M License with Restrictions (https://civitai.com/models/license/128713)
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-dreamshaper-v8-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-dreamshaper-v8-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-epicrealism-v5

Name	epiCRealism V5 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by epinikion (https://civitai.com/user/epinikion)
License	OpenRAIL-M License with Restrictions (https://civitai.com/models/license/143906)
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-epicrealism-v5-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-epicrealism-v5-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-epicphotogasm-ultimate-fidelity

Name	epiCPhotoGasm Ultimate Fidelity Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by epinikion (https://civitai.com/user/epinikion)
License	OpenRAIL-M License with Restrictions (https://civitai.com/models/license/429454)
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-epic-photogasm-ultimate-fidelity-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-epic-photogasm-ultimate-fidelity-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-ghostmix-v2

Name	GhostMix V2 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by _GhostInShell_ (https://civitai.com/user/_GhostInShell_)
License	OpenRAIL-M License with Restrictions (https://civitai.com/models/license/76907)
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-ghostmix-v2-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-ghostmix-v2-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-lyriel-v1-6

Name	Lyriel V1.6 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by Lyriel (https://civitai.com/user/Lyriel)
License	OpenRAIL-M License (https://civitai.com/models/license/72396)
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-lyriel-v1-6-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-lyriel-v1-6-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-majicmix-realistic-v7

Name	MajicMix Realistic V7 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by Merjic (https://civitai.com/user/Merjic)
License	OpenRAIL-M License with Restrictions (https://civitai.com/models/license/176425)
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-majicmix-realistic-v7-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-majicmix-realistic-v7-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-meinamix-v12

Name	MeinaMix V12 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by Meina (https://civitai.com/user/Meina)
License	OpenRAIL-M License with Restrictions (https://civitai.com/models/license/948574)
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-meinamix-v12-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-meinamix-v12-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-mistoon-anime-v3

Name	Mistoon Anime V3 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by Inzaniak (https://civitai.com/user/Inzaniak)
License	OpenRAIL-M License with Restrictions (https://civitai.com/models/license/348981)
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-mistoon-anime-v3-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-mistoon-anime-v3-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-perfect-world-v6

Name	Perfect World V6 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by Bloodsuga (https://civitai.com/user/Bloodsuga)
License	OpenRAIL-M License with Restrictions (https://civitai.com/models/license/179446)
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-perfect-world-v6-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-perfect-world-v6-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-photon-v1

Name	Photon V1 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by Photographer (https://civitai.com/user/Photographer)
License	OpenRAIL-M License with Restrictions (https://civitai.com/models/license/900072)
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-photon-v1-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-photon-v1-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-realcartoon3d-v17

Name	RealCartoon3D V17 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by 7whitefire7 (https://civitai.com/user/7whitefire7)
License	OpenRAIL-M License with Restrictions (https://civitai.com/models/license/637156)
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-realcartoon3d-v17-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-realcartoon3d-v17-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-realistic-vision-v5-1

Name	Realistic Vision V5.1 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by SG_161222 (https://civitai.com/user/SG_161222)
License	OpenRAIL-M License with Restrictions (https://civitai.com/models/license/130072)
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-realistic-vision-v5-1-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-realistic-vision-v5-1-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-realistic-vision-v6-0

Name	Realistic Vision V6.0 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by SG_161222 (https://civitai.com/user/SG_161222)
License	OpenRAIL-M License with Restrictions (https://civitai.com/models/license/245592)
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-realistic-vision-v6-0-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-realistic-vision-v6-0-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-rev-animated-v2

Name	ReV Animated V2 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by Zovya (https://civitai.com/user/Zovya)
License	OpenRAIL-M License with Restrictions (https://civitai.com/models/license/425083)
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-rev-animated-v2-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-rev-animated-v2-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-v1-5-toonyou-beta-v6

Name	ToonYou Beta V6 Image Generation
Author	Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by Bradcatt (https://civitai.com/user/Bradcatt)
License	OpenRAIL-M License with Restrictions (https://civitai.com/models/license/125771)
Files	image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB) image-generation-stable-diffusion-v1-5-toonyou-beta-v6-unet.fp16.safetensors (1.72 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) image-generation-stable-diffusion-v1-5-toonyou-beta-v6-text-encoder.fp16.safetensors (246.14 MB) Total Size: 2.13 GB
Minimum VRAM	2.58 GB

stable-diffusion-xl

Name	Stable Diffusion XL Image Generation
Author	Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023 https://arxiv.org/abs/2307.01952
License	OpenRAIL++-M License (https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-base-unet.fp16.safetensors (5.14 GB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-xl-albedobase-v3-1

Name	AlbedoBase XL V3.1 Image Generation
Author	Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023 https://arxiv.org/abs/2307.01952
License	OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/1041855)
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-albedo-base-v3-1-unet.fp16.safetensors (5.14 GB) image-generation-stable-diffusion-xl-albedo-base-v3-1-text-encoder.fp16.safetensors (246.14 MB) image-generation-stable-diffusion-xl-albedo-base-v3-1-text-encoder-2.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-xl-anything

Name	Anything XL Image Generation
Author	Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023 https://arxiv.org/abs/2307.01952
License	OpenRAIL++-M License (https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-anything-unet.fp16.safetensors (5.14 GB) image-generation-stable-diffusion-xl-anything-text-encoder.fp16.safetensors (246.14 MB) image-generation-stable-diffusion-xl-anything-text-encoder-2.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-xl-animagine-v3-1

Name	Animagine XL V3.1 Image Generation
Author	Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023 https://arxiv.org/abs/2307.01952
License	OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/403131)
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-animagine-v3-1-unet.fp16.safetensors (5.14 GB) image-generation-stable-diffusion-xl-animagine-v3-1-text-encoder.fp16.safetensors (246.14 MB) image-generation-stable-diffusion-xl-animagine-v3-1-text-encoder-2.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-xl-copax-timeless-v13

Name	Copax TimeLess V13 Image Generation
Author	Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023 https://arxiv.org/abs/2307.01952
License	OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/724334)
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-copax-timeless-v13-unet.fp16.safetensors (5.14 GB) image-generation-stable-diffusion-xl-copax-timeless-v13-text-encoder.fp16.safetensors (246.14 MB) image-generation-stable-diffusion-xl-copax-timeless-v13-text-encoder-2.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-xl-counterfeit-v2-5

Name	CounterfeitXL V2.5 Image Generation
Author	Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023 https://arxiv.org/abs/2307.01952
License	OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/265012)
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-counterfeit-v2-5-unet.fp16.safetensors (5.14 GB) image-generation-stable-diffusion-xl-counterfeit-v2-5-text-encoder.fp16.safetensors (246.14 MB) image-generation-stable-diffusion-xl-counterfeit-v2-5-text-encoder-2.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-xl-dreamshaper-alpha-v2

Name	DreamShaper XL Alpha V2 Image Generation
Author	Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023 https://arxiv.org/abs/2307.01952
License	OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/126688)
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-dreamshaper-alpha-v2-unet.fp16.safetensors (5.14 GB) image-generation-stable-diffusion-xl-dreamshaper-alpha-v2-text-encoder.fp16.safetensors (246.14 MB) image-generation-stable-diffusion-xl-dreamshaper-alpha-v2-text-encoder-2.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-xl-helloworld-v7

Name	LEOSAM's HelloWorld XL Image Generation
Author	Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023 https://arxiv.org/abs/2307.01952
License	OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/570138)
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-hello-world-v7-unet.fp16.safetensors (5.14 GB) image-generation-stable-diffusion-xl-hello-world-v7-text-encoder.fp16.safetensors (246.14 MB) image-generation-stable-diffusion-xl-hello-world-v7-text-encoder-2.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-xl-juggernaut-v11 (default)

Name	Juggernaut XL V11 Image Generation
Author	Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023 https://arxiv.org/abs/2307.01952
License	OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/782002)
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-juggernaut-v11-unet.fp16.safetensors (5.14 GB) image-generation-stable-diffusion-xl-juggernaut-v11-text-encoder.fp16.safetensors (246.14 MB) image-generation-stable-diffusion-xl-juggernaut-v11-text-encoder-2.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-xl-lightning-8-step

Name	Stable Diffusion XL Lightning (8-Step)
Author	Shanchuan Lin, Anran Wang and Xiao Yang ByteDance Inc. Published in arXiv, vol. 2402.13929, “SDXL-Lightning: Progressive Adversarial Diffusion Distillation”, 2024 https://arxiv.org/abs/2402.13929
License	OpenRAIL++-M License (https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-lightning-unet-8-step.fp16.safetensors (5.14 GB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-xl-lightning-4-step

Name	Stable Diffusion XL Lightning (4-Step)
Author	Shanchuan Lin, Anran Wang and Xiao Yang ByteDance Inc. Published in arXiv, vol. 2402.13929, “SDXL-Lightning: Progressive Adversarial Diffusion Distillation”, 2024 https://arxiv.org/abs/2402.13929
License	OpenRAIL++-M License (https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-lightning-unet-4-step.fp16.safetensors (5.14 GB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-xl-lightning-2-step

Name	Stable Diffusion XL Lightning (2-Step)
Author	Shanchuan Lin, Anran Wang and Xiao Yang ByteDance Inc. Published in arXiv, vol. 2402.13929, “SDXL-Lightning: Progressive Adversarial Diffusion Distillation”, 2024 https://arxiv.org/abs/2402.13929
License	OpenRAIL++-M License (https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-lightning-unet-2-step.fp16.safetensors (5.14 GB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-xl-nightvision-v9

Name	NightVision XL V9 Image Generation
Author	Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023 https://arxiv.org/abs/2307.01952
License	OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/577919)
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-nightvision-v9-unet.fp16.safetensors (5.14 GB) image-generation-stable-diffusion-xl-nightvision-v9-text-encoder.fp16.safetensors (246.14 MB) image-generation-stable-diffusion-xl-nightvision-v9-text-encoder-2.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-xl-realvis-v5

Name	RealVisXL V5 Image Generation
Author	Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023 https://arxiv.org/abs/2307.01952
License	OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/789646)
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-realvis-v5-0-unet.fp16.safetensors (5.14 GB) image-generation-stable-diffusion-xl-realvis-v5-0-text-encoder.fp16.safetensors (246.14 MB) image-generation-stable-diffusion-xl-realvis-v5-0-text-encoder-2.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-xl-stoiqo-newreality-pro

Name	Stoiqo New Reality XL Pro Image Generation
Author	Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023 https://arxiv.org/abs/2307.01952
License	OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/690310)
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-stoiqo-newreality-pro-unet.fp16.safetensors (5.14 GB) image-generation-stable-diffusion-xl-stoiqo-newreality-pro-text-encoder.fp16.safetensors (246.14 MB) image-generation-stable-diffusion-xl-stoiqo-newreality-pro-text-encoder-2.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-xl-turbo

Name	Stable Diffusion XL Turbo Image Generation
Author	Axel Sauer, Dominik Lorenz, Andreas Blattmann and Robin Rombach Stability AI Published in Stability AI Blog, “Adversarial Diffusion Distillation”, 2024 https://stability.ai/research/adversarial-diffusion-distillation
License	Stability AI Community License (https://huggingface.co/stabilityai/sdxl-turbo/blob/main/LICENSE.md)
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-turbo-unet.fp16.safetensors (5.14 GB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-xl-unstable-diffusers-nihilmania

Name	SDXL Unstable Diffusers NihilMania Image Generation
Author	Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023 https://arxiv.org/abs/2307.01952
License	OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/395107)
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-unstable-diffusers-nihilmania-unet.fp16.safetensors (5.14 GB) image-generation-stable-diffusion-xl-unstable-diffusers-nihilmania-text-encoder.fp16.safetensors (246.14 MB) image-generation-stable-diffusion-xl-unstable-diffusers-nihilmania-text-encoder-2.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-xl-zavychroma-v10

Name	ZavyChromaXL V10 Image Generation
Author	Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023 https://arxiv.org/abs/2307.01952
License	OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/916744)
Files	image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) image-generation-stable-diffusion-xl-zavychroma-v10-unet.fp16.safetensors (5.14 GB) image-generation-stable-diffusion-xl-zavychroma-v10-text-encoder.fp16.safetensors (246.14 MB) image-generation-stable-diffusion-xl-zavychroma-v10-text-encoder-2.fp16.safetensors (1.39 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) Total Size: 7.11 GB
Minimum VRAM	7.06 GB

stable-diffusion-v3-medium

Name	Stable Diffusion V3 (Medium) Image Generation
Author	Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach Stability AI Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024 https://arxiv.org/abs/2403.03206
License	Stability AI Community License Agreement (https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/LICENSE.md)
Files	image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB) image-generation-stable-diffusion-v3-transformer.fp16.safetensors (4.17 GB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB) text-encoding-t5-xxl.bf16.safetensors (9.52 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) Total Size: 15.50 GB
Minimum VRAM	17.86 GB

stable-diffusion-v3-5-medium

Name	Stable Diffusion V3.5 (Medium) Image Generation
Author	Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach Stability AI Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024 https://arxiv.org/abs/2403.03206
License	Stability AI Community License Agreement (https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/LICENSE.md)
Files	image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB) image-generation-stable-diffusion-v3-5-medium-transformer.bf16.safetensors (4.94 GB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB) text-encoding-t5-xxl.bf16.safetensors (9.52 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) Total Size: 16.27 GB
Minimum VRAM	18.36 GB

stable-diffusion-v3-5-large

Name	Stable Diffusion V3.5 (Large) Image Generation
Author	Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach Stability AI Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024 https://arxiv.org/abs/2403.03206
License	Stability AI Community License Agreement (https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/LICENSE.md)
Files	image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB) image-generation-stable-diffusion-v3-5-large-transformer.part-1.bf16.safetensors (9.99 GB) image-generation-stable-diffusion-v3-5-large-transformer.part-2.bf16.safetensors (6.31 GB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB) text-encoding-t5-xxl.bf16.safetensors (9.52 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) Total Size: 27.62 GB
Minimum VRAM	31.36 GB

stable-diffusion-v3-5-large-int8

Name	Stable Diffusion V3.5 (Large) Image Generation (Int8)
Author	Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach Stability AI Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024 https://arxiv.org/abs/2403.03206
License	Stability AI Community License Agreement (https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/LICENSE.md)
Files	image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB) image-generation-stable-diffusion-v3-5-large-transformer.int8.bf16.safetensors (8.25 GB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB) text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) Total Size: 15.96 GB
Minimum VRAM	16.85 GB

stable-diffusion-v3-5-large-nf4

Name	Stable Diffusion V3.5 (Large) Image Generation (NF4)
Author	Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach Stability AI Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024 https://arxiv.org/abs/2403.03206
License	Stability AI Community License Agreement (https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/LICENSE.md)
Files	image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB) image-generation-stable-diffusion-v3-5-large-transformer.nf4.bf16.safetensors (4.72 GB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB) text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) Total Size: 12.85 GB
Minimum VRAM	12.99 GB

flux-v1-dev

Name	FluxDev
Author	Black Forest Labs Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024 https://blackforestlabs.ai/announcing-black-forest-labs/
License	FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
Files	image-generation-flux-v1-vae.bf16.safetensors (167.67 MB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-t5-xxl.bf16.safetensors (9.52 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) image-generation-flux-v1-dev-transformer.bf16.safetensors (23.80 GB) Total Size: 33.74 GB
Minimum VRAM	29.50 GB

flux-v1-dev-int8

Name	FluxDevInt8
Author	Black Forest Labs Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024 https://blackforestlabs.ai/announcing-black-forest-labs/
License	FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
Files	image-generation-flux-v1-vae.bf16.safetensors (167.67 MB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) image-generation-flux-v1-dev-transformer.int8.bf16.safetensors (11.92 GB) Total Size: 18.24 GB
Minimum VRAM	21.22 GB

flux-v1-dev-stoiqo-newreality-alpha-v2-int8

Name	Stoiqo NewReality F1.D Alpha V2 (Int8) Image Generation
Author	Black Forest Labs Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024 https://blackforestlabs.ai/announcing-black-forest-labs/
License	FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
Files	image-generation-flux-v1-vae.bf16.safetensors (167.67 MB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) image-generation-flux-v1-dev-stoiqo-newreality-alpha-v2-transformer.int8.fp16.safetensors (11.92 GB) Total Size: 18.24 GB
Minimum VRAM	21.22 GB

flux-v1-dev-nf4

Name	FluxDevNF4
Author	Black Forest Labs Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024 https://blackforestlabs.ai/announcing-black-forest-labs/
License	FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
Files	image-generation-flux-v1-vae.bf16.safetensors (167.67 MB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) image-generation-flux-v1-dev-transformer.nf4.bf16.safetensors (6.70 GB) Total Size: 13.44 GB
Minimum VRAM	14.36 GB

flux-v1-dev-stoiqo-newreality-alpha-v2-nf4

Name	Stoiqo NewReality F1.D Alpha V2 (NF4) Image Generation
Author	Black Forest Labs Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024 https://blackforestlabs.ai/announcing-black-forest-labs/
License	FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
Files	image-generation-flux-v1-vae.bf16.safetensors (167.67 MB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) image-generation-flux-v1-dev-stoiqo-newreality-alpha-v2-transformer.nf4.fp16.safetensors (6.70 GB) Total Size: 13.44 GB
Minimum VRAM	14.36 GB

flux-v1-schnell

Name	FluxSchnell
Author	Black Forest Labs Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024 https://blackforestlabs.ai/announcing-black-forest-labs/
License	Apache License 2.0 (https://huggingface.co/black-forest-labs/FLUX.1-schnell)
Files	image-generation-flux-v1-vae.bf16.safetensors (167.67 MB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-t5-xxl.bf16.safetensors (9.52 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) image-generation-flux-v1-schnell-transformer.bf16.safetensors (23.78 GB) Total Size: 33.72 GB
Minimum VRAM	29.50 GB

flux-v1-schnell-int8

Name	FluxSchnellInt8
Author	Black Forest Labs Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024 https://blackforestlabs.ai/announcing-black-forest-labs/
License	Apache License 2.0 (https://huggingface.co/black-forest-labs/FLUX.1-schnell)
Files	image-generation-flux-v1-vae.bf16.safetensors (167.67 MB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) image-generation-flux-v1-schnell-transformer.int8.bf16.safetensors (11.91 GB) Total Size: 18.23 GB
Minimum VRAM	21.22 GB

flux-v1-schnell-nf4

Name	FluxSchnellNF4
Author	Black Forest Labs Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024 https://blackforestlabs.ai/announcing-black-forest-labs/
License	Apache License 2.0 (https://huggingface.co/black-forest-labs/FLUX.1-schnell)
Files	image-generation-flux-v1-vae.bf16.safetensors (167.67 MB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) image-generation-flux-v1-schnell-transformer.nf4.bf16.safetensors (6.69 GB) Total Size: 13.44 GB
Minimum VRAM	14.36 GB
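
The Minimum VRAM figures listed for each variant can guide which checkpoint to request for a given GPU. As a minimal sketch (not part of Taproot's API: the `pick_variant` helper and the dictionary name are hypothetical, and the values are copied from the `flux-v1-dev` entries in this catalog), the following picks the variant with the largest VRAM requirement that still fits a budget, treating that requirement as a rough proxy for the least aggressive quantization:

```python
# Minimum VRAM in GB, copied from the flux-v1-dev entries in this catalog.
# The dictionary and helper are illustrative only; they are not Taproot APIs.
FLUX_DEV_MIN_VRAM_GB = {
    "flux-v1-dev": 29.50,
    "flux-v1-dev-int8": 21.22,
    "flux-v1-dev-nf4": 14.36,
}


def pick_variant(available_gb, variants=FLUX_DEV_MIN_VRAM_GB):
    """Return the variant with the largest VRAM requirement that fits, or None."""
    fitting = {name: need for name, need in variants.items() if need <= available_gb}
    if not fitting:
        return None
    # Among the variants that fit, prefer the one needing the most memory.
    return max(fitting, key=fitting.get)


print(pick_variant(24.0))  # flux-v1-dev-int8
```

The same lookup applies to any model family in the catalog once its Minimum VRAM values are substituted.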

video-generation

cogvideox-2b

Name	CogVideoX 2B Video Generation
Author	Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072
License	CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.bf16.safetensors (9.52 GB) video-generation-cog-transformer-2b.fp16.safetensors (3.39 GB) video-generation-cog-vae.bf16.safetensors (431.22 MB) Total Size: 13.34 GB
Minimum VRAM	13.48 GB
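
The Total Size figure in each Files row is the sum of the individual file sizes. As a rough sketch of how to check that from a row (the `total_size_gb` helper is hypothetical, and binary units with 1 KB = 1024 B are an assumption, since the catalog does not state which convention it uses), the following parses the `cogvideox-2b` Files row and sums it:

```python
import re

# Files row copied from the cogvideox-2b entry in this catalog.
FILES = (
    "text-encoding-t5-xxl-vocab.model (791.66 KB) "
    "text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) "
    "text-encoding-t5-xxl.bf16.safetensors (9.52 GB) "
    "video-generation-cog-transformer-2b.fp16.safetensors (3.39 GB) "
    "video-generation-cog-vae.bf16.safetensors (431.22 MB)"
)

# Assumption: binary multiples; the catalog does not say which convention applies.
UNIT_BYTES = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def total_size_gb(files_row):
    """Sum every '(<number> <unit>)' size found in a Files row, in gigabytes."""
    total_bytes = 0.0
    for number, unit in re.findall(r"\((\d+(?:\.\d+)?) (B|KB|MB|GB)\)", files_row):
        total_bytes += float(number) * UNIT_BYTES[unit]
    return total_bytes / UNIT_BYTES["GB"]


print(f"{total_size_gb(FILES):.2f} GB")
```

Because every listed size is already rounded to two decimals, the computed sum can differ from the listed 13.34 GB in the last digit.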

cogvideox-2b-int8

Name	CogVideoX 2B Video Generation (Int8)
Author	Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072
License	CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB) video-generation-cog-transformer-2b.int8.fp16.safetensors (1.70 GB) video-generation-cog-vae.bf16.safetensors (431.22 MB) Total Size: 8.04 GB
Minimum VRAM	11.48 GB

cogvideox-5b

Name	CogVideoX 5B Video Generation
Author	Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072
License	CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.bf16.safetensors (9.52 GB) video-generation-cog-transformer-5b.fp16.safetensors (11.14 GB) video-generation-cog-vae.bf16.safetensors (431.22 MB) Total Size: 21.10 GB
Minimum VRAM	21.48 GB

cogvideox-5b-int8

Name	CogVideoX 5B Video Generation (Int8)
Author	Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072
License	CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB) video-generation-cog-transformer-5b.int8.fp16.safetensors (5.58 GB) video-generation-cog-vae.bf16.safetensors (431.22 MB) Total Size: 11.92 GB
Minimum VRAM	17.48 GB

cogvideox-5b-nf4

Name	CogVideoX 5B Video Generation (NF4)
Author	Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072
License	CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB) video-generation-cog-transformer-5b.nf4.fp16.safetensors (3.14 GB) video-generation-cog-vae.bf16.safetensors (431.22 MB) Total Size: 9.90 GB
Minimum VRAM	12.48 GB

cogvideox-i2v-5b

Name	CogVideoX 5B Image-to-Video Generation
Author	Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072
License	CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.bf16.safetensors (9.52 GB) video-generation-cog-i2v-transformer-5b.fp16.safetensors (11.25 GB) video-generation-cog-vae.bf16.safetensors (431.22 MB) Total Size: 21.21 GB
Minimum VRAM	21.48 GB

cogvideox-i2v-5b-int8

Name	CogVideoX 5B Image-to-Video Generation (Int8)
Author	Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072
License	CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB) video-generation-cog-i2v-transformer-5b.fp16.safetensors (11.25 GB) video-generation-cog-vae.bf16.safetensors (431.22 MB) Total Size: 17.59 GB
Minimum VRAM	17.48 GB

cogvideox-i2v-5b-nf4

Name	CogVideoX 5B Image-to-Video Generation (NF4)
Author	Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072
License	CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB) video-generation-cog-i2v-transformer-5b.nf4.fp16.safetensors (3.25 GB) video-generation-cog-vae.bf16.safetensors (431.22 MB) Total Size: 10.01 GB
Minimum VRAM	12.48 GB

cogvideox-v1-5-5b

Name	CogVideoX V1.5 5B Video Generation
Author	Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072
License	CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.bf16.safetensors (9.52 GB) video-generation-cog-v1-5-transformer-5b.fp16.safetensors (11.14 GB) video-generation-cog-vae.bf16.safetensors (431.22 MB) Total Size: 21.10 GB
Minimum VRAM	21.48 GB

cogvideox-v1-5-5b-int8

Name	CogVideoX V1.5 5B Video Generation (Int8)
Author	Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072
License	CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB) video-generation-cog-v1-5-transformer-5b.int8.fp16.safetensors (5.59 GB) video-generation-cog-vae.bf16.safetensors (431.22 MB) Total Size: 11.92 GB
Minimum VRAM	17.48 GB

cogvideox-v1-5-5b-nf4

Name	CogVideoX V1.5 5B Video Generation (NF4)
Author	Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072
License	CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB) video-generation-cog-v1-5-transformer-5b.nf4.fp16.safetensors (3.14 GB) video-generation-cog-vae.bf16.safetensors (431.22 MB) Total Size: 9.90 GB
Minimum VRAM	12.48 GB

cogvideox-v1-5-i2v-5b

Name	CogVideoX V1.5 5B Image-to-Video Generation
Author	Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072
License	CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.bf16.safetensors (9.52 GB) video-generation-cog-v1-5-i2v-transformer-5b.fp16.safetensors (11.14 GB) video-generation-cog-vae.bf16.safetensors (431.22 MB) Total Size: 21.10 GB
Minimum VRAM	21.48 GB

cogvideox-v1-5-i2v-5b-int8

Name	CogVideoX V1.5 5B Image-to-Video Generation (Int8)
Author	Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072
License	CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB) video-generation-cog-v1-5-i2v-transformer-5b.int8.fp16.safetensors (5.59 GB) video-generation-cog-vae.bf16.safetensors (431.22 MB) Total Size: 11.92 GB
Minimum VRAM	17.48 GB

cogvideox-v1-5-i2v-5b-nf4

Name	CogVideoX V1.5 5B Image-to-Video Generation (NF4)
Author	Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072
License	CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB) video-generation-cog-v1-5-i2v-transformer-5b.nf4.fp16.safetensors (3.14 GB) video-generation-cog-vae.bf16.safetensors (431.22 MB) Total Size: 9.90 GB
Minimum VRAM	12.48 GB

hunyuan

Name	Hunyuan Video Generation
Author	Hunyuan Foundation Model Team Tencent Published in arXiv, vol. 2412.03603, “HunyuanVideo: A Systematic Framework for Large Video Generation Models”, 2024 https://arxiv.org/abs/2412.03603
License	Tencent Hunyuan Community License (https://github.com/Tencent/HunyuanVideo/blob/main/LICENSE.txt)
Files	video-generation-hunyuan-vae.safetensors (985.94 MB) video-generation-hunyuan-transformer.bf16.safetensors (25.64 GB) text-encoding-llava-llama-tokenizer-vocab.json (17.21 MB) text-encoding-llava-llama-tokenizer-special-tokens-map.json (577.00 B) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-llava-llama-text-encoder.fp16.safetensors (15.01 GB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) Total Size: 41.90 GB
Minimum VRAM	38.30 GB

hunyuan-int8

Name	Hunyuan Video Generation
Author	Hunyuan Foundation Model Team Tencent Published in arXiv, vol. 2412.03603, “HunyuanVideo: A Systematic Framework for Large Video Generation Models”, 2024 https://arxiv.org/abs/2412.03603
License	Tencent Hunyuan Community License (https://github.com/Tencent/HunyuanVideo/blob/main/LICENSE.txt)
Files	video-generation-hunyuan-vae.safetensors (985.94 MB) video-generation-hunyuan-transformer.int8.bf16.safetensors (12.84 GB) text-encoding-llava-llama-tokenizer-vocab.json (17.21 MB) text-encoding-llava-llama-tokenizer-special-tokens-map.json (577.00 B) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-llava-llama-text-encoder.int8.fp16.safetensors (8.04 GB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) Total Size: 22.13 GB
Minimum VRAM	23.30 GB

hunyuan-nf4

Name	Hunyuan Video Generation
Author	Hunyuan Foundation Model Team Tencent Published in arXiv, vol. 2412.03603, “HunyuanVideo: A Systematic Framework for Large Video Generation Models”, 2024 https://arxiv.org/abs/2412.03603
License	Tencent Hunyuan Community License (https://github.com/Tencent/HunyuanVideo/blob/main/LICENSE.txt)
Files	video-generation-hunyuan-vae.safetensors (985.94 MB) video-generation-hunyuan-transformer.nf4.bf16.safetensors (7.22 GB) text-encoding-llava-llama-tokenizer-vocab.json (17.21 MB) text-encoding-llava-llama-tokenizer-special-tokens-map.json (577.00 B) text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) text-encoding-llava-llama-text-encoder.nf4.fp16.safetensors (4.98 GB) text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) Total Size: 13.45 GB
Minimum VRAM	14.78 GB

ltx (default)

Name	LTX Video Generation
Author	Lightricks https://github.com/Lightricks/LTX-Video
License	OpenRAIL-M License (https://bigscience.huggingface.co/blog/bigscience-openrail-m)
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.bf16.safetensors (9.52 GB) video-generation-ltx-transformer.bf16.safetensors (3.85 GB) video-generation-ltx-vae.safetensors (1.87 GB) Total Size: 15.24 GB
Minimum VRAM	15.28 GB

ltx-int8

Name	LTX Video Generation
Author	Lightricks https://github.com/Lightricks/LTX-Video
License	OpenRAIL-M License (https://bigscience.huggingface.co/blog/bigscience-openrail-m)
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB) video-generation-ltx-transformer.int8.bf16.safetensors (1.93 GB) video-generation-ltx-vae.safetensors (1.87 GB) Total Size: 9.70 GB
Minimum VRAM	9.72 GB

ltx-nf4

Name	LTX Video Generation
Author	Lightricks https://github.com/Lightricks/LTX-Video
License	OpenRAIL-M License (https://bigscience.huggingface.co/blog/bigscience-openrail-m)
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB) video-generation-ltx-transformer.nf4.bf16.safetensors (1.08 GB) video-generation-ltx-vae.safetensors (1.87 GB) Total Size: 9.28 GB
Minimum VRAM	7.29 GB

mochi-v1

Name	Mochi Video Generation
Author	Genmo AI Published in Genmo AI Blog, “Mochi 1: A new SOTA in open-source video generation models”, 2024 https://www.genmo.ai/blog
License
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.bf16.safetensors (9.52 GB) video-generation-mochi-v1-preview-transformer.bf16.safetensors (20.06 GB) video-generation-mochi-v1-preview-vae.bf16.safetensors (919.55 MB) Total Size: 30.50 GB
Minimum VRAM	22.95 GB

mochi-v1-int8

Name	Mochi Video Generation
Author	Genmo AI Published in Genmo AI Blog, “Mochi 1: A new SOTA in open-source video generation models”, 2024 https://www.genmo.ai/blog
License
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB) video-generation-mochi-v1-preview-transformer.int8.bf16.safetensors (10.04 GB) video-generation-mochi-v1-preview-vae.bf16.safetensors (919.55 MB) Total Size: 16.87 GB
Minimum VRAM	15.95 GB

mochi-v1-nf4

Name	Mochi Video Generation
Author	Genmo AI Published in Genmo AI Blog, “Mochi 1: A new SOTA in open-source video generation models”, 2024 https://www.genmo.ai/blog
License
Files	text-encoding-t5-xxl-vocab.model (791.66 KB) text-encoding-t5-xxl-special-tokens-map.json (2.54 KB) text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB) video-generation-mochi-v1-preview-transformer.nf4.bf16.safetensors (5.64 GB) video-generation-mochi-v1-preview-vae.bf16.safetensors (919.55 MB) Total Size: 12.89 GB
Minimum VRAM	12.41 GB
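
Each video model above is listed in several variants with different Minimum VRAM figures, so the practical question is which variant fits a given GPU. The following is a minimal Python sketch of that choice, using the figures listed for the ltx entries. The `pick_variant` helper and its `headroom_gb` parameter are hypothetical names, not part of Taproot, and preferring the variant with the largest requirement that still fits rests on the assumption that less aggressive quantization is the better choice when memory allows.

```python
# Minimum VRAM figures (GB) copied from the ltx entries in this catalog.
LTX_VARIANTS = {
    "ltx": 15.28,
    "ltx-int8": 9.72,
    "ltx-nf4": 7.29,
}


def pick_variant(variants, available_gb, headroom_gb=0.0):
    """Return the variant with the largest VRAM requirement that still fits.

    `headroom_gb` (hypothetical) reserves memory for other processes.
    Returns None when no variant fits the remaining budget.
    """
    budget = available_gb - headroom_gb
    fitting = {name: need for name, need in variants.items() if need <= budget}
    if not fitting:
        return None
    # Assumption: a larger requirement means less aggressive quantization.
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    for gpu_gb in (16.0, 12.0, 8.0, 6.0):
        print(gpu_gb, pick_variant(LTX_VARIANTS, gpu_gb))
```

The same dictionary shape works for any other family in this catalog; only the names and figures change.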

text-generation

llama-v3-8b

Name	Llama V3.0 8B Text Generation
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
Files	text-generation-llama-v3-8b-q8-0.gguf
Minimum VRAM	9.64 GB

llama-v3-8b-q6-k

Name	Llama V3.0 8B Text Generation (Q6-K)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
Files	text-generation-llama-v3-8b-q6-k.gguf
Minimum VRAM	8.10 GB

llama-v3-8b-q5-k-m

Name	Llama V3.0 8B Text Generation (Q5-K-M)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
Files	text-generation-llama-v3-8b-q5-k-m.gguf
Minimum VRAM	7.30 GB

llama-v3-8b-q4-k-m

Name	Llama V3.0 8B Text Generation (Q4-K-M)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
Files	text-generation-llama-v3-8b-q4-k-m.gguf
Minimum VRAM	6.56 GB

llama-v3-8b-q3-k-m

Name	Llama V3.0 8B Text Generation (Q3-K-M)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
Files	text-generation-llama-v3-8b-q3-k-m.gguf
Minimum VRAM	5.72 GB

llama-v3-8b-instruct

Name	Llama V3.0 8B Instruct Text Generation
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
Files	text-generation-llama-v3-8b-instruct-q8-0.gguf
Minimum VRAM	9.64 GB

llama-v3-8b-instruct-q6-k

Name	Llama V3.0 8B Instruct Text Generation (Q6-K)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
Files	text-generation-llama-v3-8b-instruct-q6-k.gguf
Minimum VRAM	8.10 GB

llama-v3-8b-instruct-q5-k-m

Name	Llama V3.0 8B Instruct Text Generation (Q5-K-M)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
Files	text-generation-llama-v3-8b-instruct-q5-k-m.gguf
Minimum VRAM	7.30 GB

llama-v3-8b-instruct-q4-k-m

Name	Llama V3.0 8B Instruct Text Generation (Q4-K-M)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
Files	text-generation-llama-v3-8b-instruct-q4-k-m.gguf
Minimum VRAM	6.56 GB

llama-v3-8b-instruct-q3-k-m

Name	Llama V3.0 8B Instruct Text Generation (Q3-K-M)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
Files	text-generation-llama-v3-8b-instruct-q3-k-m.gguf
Minimum VRAM	5.72 GB

llama-v3-1-8b-instruct

Name	Llama V3.1 8B Instruct Text Generation
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
Files	text-generation-llama-v3-1-8b-instruct-q8-0.gguf
Minimum VRAM	9.64 GB

llama-v3-1-8b-instruct-q6-k (default)

Name	Llama V3.1 8B Instruct Text Generation (Q6-K)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
Files	text-generation-llama-v3-1-8b-instruct-q6-k.gguf
Minimum VRAM	8.10 GB

llama-v3-1-8b-instruct-q5-k-m

Name	Llama V3.1 8B Instruct Text Generation (Q5-K-M)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
Files	text-generation-llama-v3-1-8b-instruct-q5-k-m.gguf
Minimum VRAM	7.30 GB

llama-v3-1-8b-instruct-q4-k-m

Name	Llama V3.1 8B Instruct Text Generation (Q4-K-M)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
Files	text-generation-llama-v3-1-8b-instruct-q4-k-m.gguf
Minimum VRAM	6.56 GB

llama-v3-1-8b-instruct-q3-k-m

Name	Llama V3.1 8B Instruct Text Generation (Q3-K-M)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
Files	text-generation-llama-v3-1-8b-instruct-q3-k-m.gguf
Minimum VRAM	5.72 GB

llama-v3-2-3b-instruct

Name	Llama V3.2 3B Instruct Text Generation
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
Files	text-generation-llama-v3-2-3b-instruct-f16.gguf
Minimum VRAM	8.04 GB

llama-v3-2-3b-instruct-q8-0

Name	Llama V3.2 3B Instruct Text Generation (Q8-0)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
Files	text-generation-llama-v3-2-3b-instruct-q8-0.gguf
Minimum VRAM	5.02 GB

llama-v3-2-3b-instruct-q6-k

Name	Llama V3.2 3B Instruct Text Generation (Q6-K)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
Files	text-generation-llama-v3-2-3b-instruct-q6-k.gguf
Minimum VRAM	4.20 GB

llama-v3-2-3b-instruct-q5-k-m

Name	Llama V3.2 3B Instruct Text Generation (Q5-K-M)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
Files	text-generation-llama-v3-2-3b-instruct-q5-k-m.gguf
Minimum VRAM	3.90 GB

llama-v3-2-3b-instruct-q4-k-m

Name	Llama V3.2 3B Instruct Text Generation (Q4-K-M)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
Files	text-generation-llama-v3-2-3b-instruct-q4-k-m.gguf
Minimum VRAM	3.50 GB

llama-v3-2-3b-instruct-q3-k-l

Name	Llama V3.2 3B Instruct Text Generation (Q3-K-L)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
Files	text-generation-llama-v3-2-3b-instruct-q3-k-l.gguf
Minimum VRAM	3.10 GB

llama-v3-2-1b-instruct

Name	Llama V3.2 1B Instruct Text Generation
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
Files	text-generation-llama-v3-2-1b-instruct-f16.gguf
Minimum VRAM	3.60 GB

llama-v3-2-1b-instruct-q8-0

Name	Llama V3.2 1B Instruct Text Generation (Q8-0)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
Files	text-generation-llama-v3-2-1b-instruct-q8-0.gguf
Minimum VRAM	2.43 GB

llama-v3-2-1b-instruct-q6-k

Name	Llama V3.2 1B Instruct Text Generation (Q6-K)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
Files	text-generation-llama-v3-2-1b-instruct-q6-k.gguf
Minimum VRAM	2.15 GB

llama-v3-2-1b-instruct-q5-k-m

Name	Llama V3.2 1B Instruct Text Generation (Q5-K-M)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
Files	text-generation-llama-v3-2-1b-instruct-q5-k-m.gguf
Minimum VRAM	2.02 GB

llama-v3-2-1b-instruct-q4-k-m

Name	Llama V3.2 1B Instruct Text Generation (Q4-K-M)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
Files	text-generation-llama-v3-2-1b-instruct-q4-k-m.gguf
Minimum VRAM	1.64 GB

llama-v3-2-1b-instruct-q3-k-l

Name	Llama V3.2 1B Instruct Text Generation (Q3-K-L)
Author	Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783
License	Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
Files	text-generation-llama-v3-2-1b-instruct-q3-k-l.gguf
Minimum VRAM	1.58 GB

zephyr-7b-alpha

Name	Zephyr 7B α Text Generation (Q8)
Author	Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023 https://arxiv.org/abs/2310.16944
License	MIT License (https://opensource.org/licenses/MIT)
Files	text-generation-zephyr-alpha-7b-q8-0.gguf
Minimum VRAM	9.40 GB

zephyr-7b-alpha-q6-k

Name	Zephyr 7B α Text Generation (Q6-K)
Author	Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023 https://arxiv.org/abs/2310.16944
License	MIT License (https://opensource.org/licenses/MIT)
Files	text-generation-zephyr-alpha-7b-q6-k.gguf
Minimum VRAM	8.20 GB

zephyr-7b-alpha-q5-k-m

Name	Zephyr 7B α Text Generation (Q5-K-M)
Author	Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023 https://arxiv.org/abs/2310.16944
License	MIT License (https://opensource.org/licenses/MIT)
Files	text-generation-zephyr-alpha-7b-q5-k-m.gguf
Minimum VRAM	7.25 GB

zephyr-7b-alpha-q4-k-m

Name	Zephyr 7B α Text Generation (Q4-K-M)
Author	Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023 https://arxiv.org/abs/2310.16944
License	MIT License (https://opensource.org/licenses/MIT)
Files	text-generation-zephyr-alpha-7b-q4-k-m.gguf
Minimum VRAM	6.30 GB

zephyr-7b-alpha-q3-k-m

Name	Zephyr 7B α Text Generation (Q3-K-M)
Author	Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023 https://arxiv.org/abs/2310.16944
License	MIT License (https://opensource.org/licenses/MIT)
Files	text-generation-zephyr-alpha-7b-q3-k-m.gguf
Minimum VRAM	5.35 GB

zephyr-7b-beta

Name	Zephyr 7B β Text Generation
Author	Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023 https://arxiv.org/abs/2310.16944
License	MIT License (https://opensource.org/licenses/MIT)
Files	text-generation-zephyr-beta-7b-q8-0.gguf
Minimum VRAM	9.40 GB

zephyr-7b-beta-q6-k

Name	Zephyr 7B β Text Generation (Q6-K)
Author	Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023 https://arxiv.org/abs/2310.16944
License	MIT License (https://opensource.org/licenses/MIT)
Files	text-generation-zephyr-beta-7b-q6-k.gguf
Minimum VRAM	8.20 GB

zephyr-7b-beta-q5-k-m

Name	Zephyr 7B β Text Generation (Q5-K-M)
Author	Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023 https://arxiv.org/abs/2310.16944
License	MIT License (https://opensource.org/licenses/MIT)
Files	text-generation-zephyr-beta-7b-q5-k-m.gguf
Minimum VRAM	7.25 GB

zephyr-7b-beta-q4-k-m

Name	Zephyr 7B β Text Generation (Q4-K-M)
Author	Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023 https://arxiv.org/abs/2310.16944
License	MIT License (https://opensource.org/licenses/MIT)
Files	text-generation-zephyr-beta-7b-q4-k-m.gguf
Minimum VRAM	6.30 GB

zephyr-7b-beta-q3-k-m

Name	Zephyr 7B β Text Generation (Q3-K-M)
Author	Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023 https://arxiv.org/abs/2310.16944
License	MIT License (https://opensource.org/licenses/MIT)
Files	text-generation-zephyr-beta-7b-q3-k-m.gguf
Minimum VRAM	5.35 GB

visual-question-answering

llava-v1-5-7b

Name	LLaVA V1.5 7B Visual Question Answering
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files	visual-question-answering-llava-v1-5-7b.fp16.gguf (13.48 GB) image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB) Total Size: 14.10 GB
Minimum VRAM	15.80 GB

llava-v1-5-7b-q8

Name	LLaVA V1.5 7B (Q8-0) Visual Question Answering
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files	visual-question-answering-llava-v1-5-7b-q8-0.gguf (7.16 GB) image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB) Total Size: 7.79 GB
Minimum VRAM	9.90 GB

llava-v1-5-7b-q6-k

Name	LLaVA V1.5 7B (Q6-K) Visual Question Answering
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files	visual-question-answering-llava-v1-5-7b-q6-k.gguf (5.53 GB) image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB) Total Size: 6.15 GB
Minimum VRAM	8.40 GB

llava-v1-5-7b-q5-k-m

Name	LLaVA V1.5 7B (Q5-K-M) Visual Question Answering
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files	visual-question-answering-llava-v1-5-7b-q5-k-m.gguf (4.78 GB) image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB) Total Size: 5.41 GB
Minimum VRAM	7.71 GB

llava-v1-5-7b-q4-k-m

Name	LLaVA V1.5 7B (Q4-K-M) Visual Question Answering
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files	visual-question-answering-llava-v1-5-7b-q4-k-m.gguf (4.08 GB) image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB) Total Size: 4.71 GB
Minimum VRAM	7.04 GB

llava-v1-5-7b-q3-k-m

Name	LLaVA V1.5 7B (Q3-K-M) Visual Question Answering
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files	visual-question-answering-llava-v1-5-7b-q3-k-m.gguf (3.30 GB) image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB) Total Size: 3.92 GB
Minimum VRAM	6.33 GB

llava-v1-5-13b

Name	LLaVA V1.5 13B (Q8-0) Visual Question Answering
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files	visual-question-answering-llava-v1-5-13b-q8-0.gguf (13.83 GB) image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB) Total Size: 14.48 GB
Minimum VRAM	17.51 GB

llava-v1-5-13b-q6-k

Name	LLaVA V1.5 13B (Q6-K) Visual Question Answering
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files	visual-question-answering-llava-v1-5-13b-q6-k.gguf (10.68 GB) image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB) Total Size: 11.32 GB
Minimum VRAM	14.54 GB

llava-v1-5-13b-q5-k-m

Name	LLaVA V1.5 13B (Q5-K-M) Visual Question Answering
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files	visual-question-answering-llava-v1-5-13b-q5-k-m.gguf (9.23 GB) image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB) Total Size: 9.88 GB
Minimum VRAM	13.17 GB

llava-v1-5-13b-q4-0

Name	LLaVA V1.5 13B (Q4-0) Visual Question Answering
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files	visual-question-answering-llava-v1-5-13b-q4-0.gguf (7.37 GB) image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB) Total Size: 8.01 GB
Minimum VRAM	11.48 GB

llava-v1-6-34b-q5-k-m

Name	LLaVA V1.6 34B (Q5-K-M) Visual Question Answering
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files	visual-question-answering-llava-v1-6-34b-q5-k-m.gguf (24.32 GB) image-encoding-clip-llava-mmproj-v1-6-34b.fp16.gguf (699.99 MB) Total Size: 25.02 GB
Minimum VRAM	24.96 GB

llava-v1-6-34b-q4-k-m

Name	LLaVA V1.6 34B (Q4-K-M) Visual Question Answering
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files	visual-question-answering-llava-v1-6-34b-q4-k-m.gguf (20.66 GB) image-encoding-clip-llava-mmproj-v1-6-34b.fp16.gguf (699.99 MB) Total Size: 21.36 GB
Minimum VRAM	21.88 GB

llava-v1-6-34b-q3-k-m

Name	LLaVA V1.6 34B (Q3-K-M) Visual Question Answering
Author	Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744
License	Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files	visual-question-answering-llava-v1-6-34b-q3-k-m.gguf (16.65 GB) image-encoding-clip-llava-mmproj-v1-6-34b.fp16.gguf (699.99 MB) Total Size: 17.35 GB
Minimum VRAM	18.06 GB

moondream-v2 (default)

Name	Moondream V2 Visual Question Answering
Author	Vikhyat Korrapati Published in Hugging Face, vol. 10.57967/hf/3219, “Moondream2”, 2024 https://huggingface.co/vikhyatk/moondream2
License	Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files	visual-question-answering-moondream-v2.fp16.gguf (2.84 GB) image-encoding-clip-moondream-v2-mmproj.fp16.gguf (909.78 MB) Total Size: 3.75 GB
Minimum VRAM	4.44 GB
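
The Total Size figures in this catalog can be reproduced by summing the per-file sizes in each Files entry. The sketch below does this for the moondream-v2 entry. `total_size_gb` is a hypothetical helper, not part of Taproot, and it assumes decimal units (1 GB = 1000 MB), which is what the listed totals appear to use.

```python
import re

# Assumption: sizes use decimal units, i.e. 1 GB = 1000 MB = 1e9 B.
UNIT_TO_GB = {"B": 1e-9, "KB": 1e-6, "MB": 1e-3, "GB": 1.0}

# Matches a parenthesised size such as "(909.78 MB)" or "(577.00 B)".
SIZE_PATTERN = re.compile(r"\((\d+(?:\.\d+)?) (B|KB|MB|GB)\)")


def total_size_gb(files_field):
    """Sum every parenthesised file size in a Files entry, in GB."""
    return sum(
        float(value) * UNIT_TO_GB[unit]
        for value, unit in SIZE_PATTERN.findall(files_field)
    )


# Files entry copied from the moondream-v2 listing, without the total.
MOONDREAM_FILES = (
    "visual-question-answering-moondream-v2.fp16.gguf (2.84 GB) "
    "image-encoding-clip-moondream-v2-mmproj.fp16.gguf (909.78 MB)"
)

if __name__ == "__main__":
    print(f"{total_size_gb(MOONDREAM_FILES):.2f} GB")
```

Because the trailing "Total Size" figure is not parenthesised, the pattern skips it, so a whole Files entry can be passed in unchanged.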

image-captioning