diff --git "a/README.md" "b/README.md" --- "a/README.md" +++ "b/README.md" @@ -2,8 +2,607 @@ license: apache-2.0 --- -![image/png](https://cdn-uploads.huggingface.co/production/uploads/64429aaf7feb866811b12f73/LQw-K4ankMferB5QuE0x_.png) +
+
+An open-source real-time AI inference engine for seamless scaling
+
+


-A collection of AI models supported by Taproot, a massively parallel open-source inference engine tailored for building real-time experiences on consumer hardware. +# About -Coming soon! \ No newline at end of file +Taproot is a seamlessly scalable AI/ML inference engine designed for deployment across hardware clusters with disparate capabilities. + +## Why Taproot? + +Most AI/ML inference engines are built for either large-scale cloud infrastructures or constrained edge devices - Taproot is designed for **medium-scale deployments**, offering flexible and distributed on-premise or PAYG setups. It efficiently uses older or consumer-grade hardware, making it suitable for small networks or ad-hoc clusters, without relying on centralized, hyperscale architectures. + +## Available Models + +There are more than 150 models available across 18 task categories. See the [Task Catalog](#task-catalog) for the complete list, licenses, requirements and citations. Despite the large number of models available, there are many more yet to be added - if you're looking for a particular enhancement, don't hesitate to make an issue on this repository to request it. + +### Roadmap + +1. IP Adapter Models for Diffusers Image Generation Pipelines +2. ControlNet Models for Diffusers Image Generation Pipelines +3. Additional quantization backends for large models + - Currently BitsandBytes (Int8/NF4) and GGUF (through llama.cpp) are supported with pre-quantized checkpoints available. + - FP8 support through Optimum-Quanto, TorchAO and custom kernels is in development. +4. Improved multi-GPU support + - This is currently supported through manual configuration, but usability can be improved. +5. Additional annotators/detectors for image and video + - E.g. Marigold, SAM2 +6. Additional audio generation models + - E.g. Stable Audio, AudioLDM, MusicGen + +# Installation + +```sh +pip install taproot +``` + +Some additional packages are available to install with the square-bracket syntax (e.g. 
`pip install taproot[a,b,c]`), these are: +- **tools** - Additional packages for LLM tools like DuckDuckGo Search, BeautifulSoup (for web scraping), etc. +- **console** - Additional packages for prettifying console output. +- **av** - Additional packages for reading and writing video. + +## Installing Tasks + +Some tasks are available immediately, but most tasks require additional packages and files. Install these tasks with `taproot install [task:model]+`, e.g.: + +```sh +taproot install image-generation:stable-diffusion-xl +``` + +# Usage + +## Command-Line + +### Introspecting Tasks + +From the command line, execute `taproot tasks` to see all tasks and their availability status, or `taproot info` for individual task information. For example: + +```sh +taproot info image-generation stable-diffusion-xl + +Stable Diffusion XL Image Generation (image-generation:stable-diffusion-xl, available) + Generate an image from text and/or images using a stable diffusion XL model. +Hardware Requirements: + GPU Required for Optimal Performance + Floating Point Precision: half + Minimum Memory (CPU RAM) Required: 231.71 MB + Minimum Memory (GPU VRAM) Required: 7.58 GB +Author: + Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach + Published in arXiv, vol. 
2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023 + https://arxiv.org/abs/2307.01952 +License: + OpenRAIL++-M License (https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md) + ✅ Attribution Required + ✅ Derivatives Allowed + ✅ Redistribution Allowed + ✅ Copyleft (Share-Alike) Required + ✅ Commercial Use Allowed + ✅ Hosting Allowed +Files: + image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) [downloaded] + image-generation-stable-diffusion-xl-base-unet.fp16.safetensors (5.14 GB) [downloaded] + text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) [downloaded] + text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB) [downloaded] + text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) [downloaded] + text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) [downloaded] + text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) [downloaded] + text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) [downloaded] + text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) [downloaded] + text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) [downloaded] + Total File Size: 7.11 GB +Required packages: + pil~=9.5 [installed] + torch<2.5,>=2.4 [installed] + numpy~=1.22 [installed] + diffusers>=0.29 [installed] + torchvision<0.20,>=0.19 [installed] + transformers>=4.41 [installed] + safetensors~=0.4 [installed] + accelerate~=1.0 [installed] + sentencepiece~=0.2 [installed] + compel~=2.0 [installed] + peft~=0.13 [installed] +Signature: + prompt: Union[str, List[str]], required + prompt_2: Union[str, List[str]], default: None + negative_prompt: Union[str, List[str]], default: None + negative_prompt_2: Union[str, List[str]], default: None + image: ImageType, default: None + mask_image: ImageType, default: None + guidance_scale: float, default: 5.0 + guidance_rescale: float, default: 0.0 + num_inference_steps: int, default: 20 + 
num_images_per_prompt: int, default: 1 + height: int, default: None + width: int, default: None + timesteps: List[int], default: None + sigmas: List[float], default: None + denoising_end: float, default: None + strength: float, default: None + latents: torch.Tensor, default: None + prompt_embeds: torch.Tensor, default: None + negative_prompt_embeds: torch.Tensor, default: None + pooled_prompt_embeds: torch.Tensor, default: None + negative_pooled_prompt_embeds: torch.Tensor, default: None + clip_skip: int, default: None + seed: SeedType, default: None + pag_scale: float, default: None + pag_adaptive_scale: float, default: None + scheduler: Literal[ddim, ddpm, ddpm_wuerstchen, deis_multistep, dpm_cogvideox, dpmsolver_multistep, dpmsolver_multistep_karras, dpmsolver_sde, dpmsolver_sde_multistep, dpmsolver_sde_multistep_karras, dpmsolver_singlestep, dpmsolver_singlestep_karras, edm_dpmsolver_multistep, edm_euler, euler_ancestral_discrete, euler_discrete, euler_discrete_karras, flow_match_euler_discrete, flow_match_heun_discrete, heun_discrete, ipndm, k_dpm_2_ancestral_discrete, k_dpm_2_ancestral_discrete_karras, k_dpm_2_discrete, k_dpm_2_discrete_karras, lcm, lms_discrete, lms_discrete_karras, pndm, tcd, unipc], default: None + output_format: Literal[png, jpeg, float, int, latent], default: png + output_upload: bool, default: False + highres_fix_factor: float, default: 1.0 + highres_fix_strength: float, default: None + spatial_prompts: SpatialPromptInputType, default: None +Returns: + ImageResultType +``` + +### Invoking Tasks + +Run `taproot invoke` to run any task from the command line. All parameters to the task can be passed as flags to the call using kebab-case, e.g.: + +```sh +taproot invoke image-generation:stable-diffusion-xl \ + --prompt "a photograph of a golden retriever at the park" \ + --negative-prompt "fall, autumn, blurry, out-of-focus" \ + --seed 12345 +Loading task. 
+100%|███████████████████████████████████████████████████████████████████████████| 7/7 [00:03<00:00, 2.27it/s] +Task loaded in 4.0 s. +Invoking task. +100%|█████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00, 4.34it/s] +Task invoked in 6.5 s. Result: +8940aa12-66a7-4233-bfd6-f19da339b71b.png +``` + +## Python + +### Direct Task Usage + +```py +from taproot import Task +sdxl = Task.get("image-generation", "stable-diffusion-xl") +pipeline = sdxl() +pipeline.load() +pipeline(prompt="Hello, world!").save("./output.png") +``` + +### With a Remote Server + +```py +from taproot import Tap +tap = Tap() +tap.remote_address = "ws://127.0.0.1:32189" +result = tap.call("image-generation", model="stable-diffusion-xl", prompt="Hello, world!") +result.save("./output.png") +``` + +### With a Local Server + +Also shows asynchronous usage. + +```py +import asyncio +from taproot import Tap +with Tap.local() as tap: + loop = asyncio.get_event_loop() + result = loop.run_until_complete(tap("image-generation", model="stable-diffusion-xl", prompt="Hello, world!")) + result.save("./output.png") +``` + +## Running Servers + +Taproot uses a three-role cluster structure: +1. **Overseers** are entry points into clusters, routing requests to one or more dispatchers. +2. **Dispatchers** are machines capable of running tasks by spawning executors. +3. **Executors** are servers ready to execute a task. + +The simplest way to run a server is to run an overseer simultaneously with a local dispatcher like so: + +```sh +taproot overseer --local +``` + +This will run on the default address of `ws://127.0.0.1:32189`, suitable for interaction from Python or the browser. + +There are many deployment possibilities across networks, with configuration available for encryption, listening addresses, and more. See the wiki for details (coming soon). 
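
The three roles described under Running Servers can be pictured with a toy, in-process model. This is only an illustration of the routing, not Taproot's implementation: the class names, the `TASKS` registry, the `message` parameter and the round-robin choice of dispatcher are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import Any, Callable, Dict, Iterator, List

# Hypothetical task registry: task name -> plain Python callable.
# The `message` parameter is invented for this sketch.
TASKS: Dict[str, Callable[..., Any]] = {
    "echo": lambda message: message,
}


@dataclass
class Executor:
    """Stands in for a server that is ready to execute one task."""

    task: str

    def execute(self, **kwargs: Any) -> Any:
        return TASKS[self.task](**kwargs)


@dataclass
class Dispatcher:
    """Stands in for a machine that runs tasks by spawning executors."""

    name: str
    executors: Dict[str, Executor] = field(default_factory=dict)

    def dispatch(self, task: str, **kwargs: Any) -> Any:
        # Spawn an executor the first time a task is requested, then reuse it.
        if task not in self.executors:
            self.executors[task] = Executor(task)
        return self.executors[task].execute(**kwargs)


class Overseer:
    """Stands in for the cluster entry point that routes requests."""

    def __init__(self, dispatchers: List[Dispatcher]) -> None:
        # Round-robin routing is an assumption of this sketch only.
        self._next: Iterator[Dispatcher] = cycle(dispatchers)

    def call(self, task: str, **kwargs: Any) -> Any:
        return next(self._next).dispatch(task, **kwargs)


a, b = Dispatcher("a"), Dispatcher("b")
overseer = Overseer([a, b])
results = [overseer.call("echo", message=f"hello {i}") for i in range(3)]
print(results)
```

Each dispatcher ends up holding one `echo` executor, because the overseer alternates between the two and each spawns an executor on first use.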
+ +## Outside Python + +- [taproot.js](https://github.com/painebenjamin/taproot.js) - for the browser and node.js, available in ESM, UMD and IIFE +- taproot.php - coming soon + +
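
Environments without a dedicated client can still reach Taproot by spawning the `taproot invoke` command described under Invoking Tasks. Below is a minimal sketch, in Python for consistency with the examples above, of turning a parameter mapping into kebab-case flags. The helper name `build_invoke_args` is hypothetical, and this README does not say how non-scalar values (lists, images, booleans) are written on the command line, so only scalar values are handled.

```python
import shlex
from typing import Any, List, Mapping


def build_invoke_args(task: str, params: Mapping[str, Any]) -> List[str]:
    """Build an argument list for `taproot invoke` from snake_case parameters."""
    args = ["taproot", "invoke", task]
    for name, value in params.items():
        if value is None:
            # Skip unset parameters so the task's own defaults apply.
            continue
        # snake_case parameter names become kebab-case flags.
        args.append("--" + name.replace("_", "-"))
        args.append(str(value))
    return args


args = build_invoke_args(
    "image-generation:stable-diffusion-xl",
    {
        "prompt": "a photograph of a golden retriever at the park",
        "negative_prompt": "fall, autumn, blurry, out-of-focus",
        "seed": 12345,
    },
)
# Print a shell-quoted command line; pass `args` to subprocess.run to execute it.
print(shlex.join(args))
```

Building a list rather than a single string avoids shell-quoting problems when the prompt contains spaces or commas.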

# Task Catalog

+

18 tasks available with 171 models.

+ +

echo

+
NameEcho
AuthorBenjamin Paine
Taproot
https://github.com/painebenjamin/taproot
LicenseApache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
FilesN/A
Minimum VRAMN/A
+

image-similarity

+

(default)

+
NameTraditional Image Similarity
AuthorBenjamin Paine
Taproot
https://github.com/painebenjamin/taproot
LicenseApache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
FilesN/A
Minimum VRAMN/A
+

inception-v3

+
NameInception Image Similarity (FID)
AuthorChristian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens and Zbigniew Wojna
Google Research and University College London
Published in CoRR, vol. 1512.00567, “Rethinking the Inception Architecture for Computer Vision”, 2015
https://arxiv.org/abs/1512.00567
LicenseApache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Filesimage-similarity-inception.fp16.safetensors
Minimum VRAM50.28 MB
+

text-similarity

+
NameTraditional Text Similarity
AuthorBenjamin Paine
Taproot
https://github.com/painebenjamin/taproot
LicenseApache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
FilesN/A
Minimum VRAMN/A
+

speech-enhancement

+

deep-filter-net-v3 (default)

+
NameDeepFilterNet V3 Speech Enhancement
AuthorHendrik Schröter, Tobias Rosenkranz, Alberto N. Escalante-B. and Andreas Maier
Published in INTERSPEECH, “DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement”, 2023
https://arxiv.org/abs/2305.08227
LicenseApache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Filesspeech-enhancement-deep-filter-net-3.safetensors
Minimum VRAM87.89 MB
+

image-interpolation

+

film (default)

+
NameFrame Interpolation for Large Motion (FiLM) Image Interpolation
AuthorFitsum Reda, Janne Kontkanen, Eric Tabellion, Deqing Sun, Caroline Pantofaru and Brian Curless
Google Research and University of Washington
Published in ECCV, “FILM: Frame Interpolation for Large Motion”, 2022
https://arxiv.org/abs/2202.04901
LicenseApache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Filesimage-interpolation-film-net.fp16.pt
Minimum VRAM70.00 MB
+

rife

+
NameReal-Time Intermediate Flow Estimation (RIFE) Image Interpolation
AuthorZhewei Huang, Tianyuan Zhang, Wen Heng, Boxin Shi and Shuchang Zhou
Megvii Research, NERCVT, School of Computer Science, Peking University, Institute for Artificial Intelligence, Peking University and Beijing Academy of Artificial Intelligence
Published in ECCV, “Real-Time Intermediate Flow Estimation for Video Frame Interpolation”, 2022
https://arxiv.org/abs/2011.06294
LicenseMIT License (https://opensource.org/licenses/MIT)
Filesimage-interpolation-rife-flownet.safetensors
Minimum VRAM22.68 MB
+

background-removal

+

backgroundremover (default)

+
NameBackgroundRemover
AuthorJohnathan Nader, Lucas Nestler, Dr. Tim Scarfe and Daniel Gatis
https://github.com/nadermx/backgroundremover
LicenseApache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Filesbackground-removal-u2net.safetensors
Minimum VRAM217.62 MB
+

super-resolution

+

aura

+
NameAura Super Resolution
Authorfal.ai
Published in fal.ai blog, “Introducing AuraSR - An open reproduction of the GigaGAN Upscaler”, 2024
https://blog.fal.ai/introducing-aurasr-an-open-reproduction-of-the-gigagan-upscaler-2/
LicenseCC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/)
Filessuper-resolution-aura.fp16.safetensors
Minimum VRAM1.24 GB
+

aura-v2 (default)

+
NameAura Super Resolution V2
Authorfal.ai
Published in fal.ai blog, “AuraSR V2”, 2024
https://blog.fal.ai/aurasr-v2/
LicenseCC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/)
Filessuper-resolution-aura-v2.fp16.safetensors
Minimum VRAM1.24 GB
+

speech-synthesis

+

xtts-v2 (default)

+
NameXTTS2 Speech Synthesis
AuthorCoqui AI
Published in Coqui AI Blog, “XTTS: Open Model Release Announcement”, 2023
https://coqui.ai/blog/tts/open_xtts
LicenseMozilla Public License 2.0 (https://www.mozilla.org/en-US/MPL/2.0/)
Files
  1. speech-synthesis-xtts-v2.safetensors (1.87 GB)
  2. speech-synthesis-xtts-v2-speakers.pth (7.75 MB)
  3. speech-synthesis-xtts-v2-vocab.json (361.22 KB)

Total Size: 1.88 GB

Minimum VRAM1.91 GB
+

f5tts

+
NameF5TTS Speech Synthesis
AuthorYushen Chen, Zhikang Niu, Ziyang Ma, Keqi Deng, Chunhui Wang, Jian Zhao, Kai Yu and Xie Chen
Published in arXiv, vol. 2410.06885, “F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching”, 2024
https://arxiv.org/abs/2410.06885
LicenseCC BY-NC 4.0 (https://creativecommons.org/licenses/by-nc/4.0/)
Files
  1. speech-synthesis-f5tts.safetensors (1.35 GB)
  2. speech-synthesis-f5tts-vocab.txt (11.26 KB)
  3. audio-vocoder-vocos-mel-24khz.safetensors (54.35 MB)
  4. audio-vocoder-vocos-mel-24khz-config.yaml (461.00 B)

Total Size: 1.40 GB

Minimum VRAM3.94 GB
+

audio-transcription

+

whisper-tiny

+
NameWhisper Tiny Audio Transcription
AuthorAlec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever
OpenAI
Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision”, 2022
https://arxiv.org/abs/2212.04356
LicenseApache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files
  1. audio-transcription-whisper-tiny.safetensors (151.06 MB)
  2. audio-transcription-whisper-tokenizer-vocab.json (835.55 KB)
  3. audio-transcription-whisper-tokenizer-merges.txt (493.87 KB)
  4. audio-transcription-whisper-tokenizer-normalizer.json (52.67 KB)
  5. audio-transcription-whisper-tokenizer.json (2.48 MB)

Total Size: 154.92 MB

Minimum VRAM147.85 MB
+

whisper-base

+
NameWhisper Base Audio Transcription
AuthorAlec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever
OpenAI
Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision”, 2022
https://arxiv.org/abs/2212.04356
LicenseApache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files
  1. audio-transcription-whisper-base.safetensors (290.40 MB)
  2. audio-transcription-whisper-tokenizer-vocab.json (835.55 KB)
  3. audio-transcription-whisper-tokenizer-merges.txt (493.87 KB)
  4. audio-transcription-whisper-tokenizer-normalizer.json (52.67 KB)
  5. audio-transcription-whisper-tokenizer.json (2.48 MB)

Total Size: 294.27 MB

Minimum VRAM285.74 MB
+

whisper-small

+
NameWhisper Small Audio Transcription
AuthorAlec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever
OpenAI
Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision”, 2022
https://arxiv.org/abs/2212.04356
LicenseApache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files
  1. audio-transcription-whisper-small.safetensors (967.00 MB)
  2. audio-transcription-whisper-tokenizer-vocab.json (835.55 KB)
  3. audio-transcription-whisper-tokenizer-merges.txt (493.87 KB)
  4. audio-transcription-whisper-tokenizer-normalizer.json (52.67 KB)
  5. audio-transcription-whisper-tokenizer.json (2.48 MB)

Total Size: 970.86 MB

Minimum VRAM945.03 MB
+

whisper-medium

+
NameWhisper Medium Audio Transcription
AuthorAlec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever
OpenAI
Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision”, 2022
https://arxiv.org/abs/2212.04356
LicenseApache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files
  1. audio-transcription-whisper-medium.safetensors (3.06 GB)
  2. audio-transcription-whisper-tokenizer-vocab.json (835.55 KB)
  3. audio-transcription-whisper-tokenizer-merges.txt (493.87 KB)
  4. audio-transcription-whisper-tokenizer-normalizer.json (52.67 KB)
  5. audio-transcription-whisper-tokenizer.json (2.48 MB)

Total Size: 3.06 GB

Minimum VRAM3.06 GB
+

whisper-large-v3

+
NameWhisper Large V3 Audio Transcription
AuthorAlec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever
OpenAI
Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision”, 2022
https://arxiv.org/abs/2212.04356
LicenseApache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files
  1. audio-transcription-whisper-large-v3.fp16.safetensors (3.09 GB)
  2. audio-transcription-whisper-tokenizer-v3-vocab.json (1.04 MB)
  3. audio-transcription-whisper-tokenizer-v3-merges.txt (493.87 KB)
  4. audio-transcription-whisper-tokenizer-v3-normalizer.json (52.67 KB)
  5. audio-transcription-whisper-tokenizer-v3.json (2.48 MB)

Total Size: 3.09 GB

Minimum VRAM3.09 GB
+

distilled-whisper-small-english

+
NameDistilled Whisper Small (English) Audio Transcription
AuthorSanchit Gandhi, Patrick von Platen and Alexander M. Rush
Hugging Face
Published in arXiv, vol. 2311.00430, “Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling”, 2023
https://arxiv.org/abs/2311.00430
LicenseApache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files
  1. audio-transcription-distilled-whisper-small-english.safetensors (332.30 MB)
  2. audio-transcription-distilled-whisper-english-tokenizer-vocab.json (999.19 KB)
  3. audio-transcription-distilled-whisper-english-tokenizer-merges.txt (456.32 KB)
  4. audio-transcription-distilled-whisper-english-tokenizer-normalizer.json (52.67 KB)
  5. audio-transcription-distillled-whisper-english-tokenizer.json (2.41 MB)

Total Size: 336.21 MB

Minimum VRAM649.01 MB
+

distilled-whisper-medium-english

+
NameDistilled Whisper Medium (English) Audio Transcription
AuthorSanchit Gandhi, Patrick von Platen and Alexander M. Rush
Hugging Face
Published in arXiv, vol. 2311.00430, “Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling”, 2023
https://arxiv.org/abs/2311.00430
LicenseApache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files
  1. audio-transcription-distilled-whisper-medium-english.safetensors (788.80 MB)
  2. audio-transcription-distilled-whisper-english-tokenizer-vocab.json (999.19 KB)
  3. audio-transcription-distilled-whisper-english-tokenizer-merges.txt (456.32 KB)
  4. audio-transcription-distilled-whisper-english-tokenizer-normalizer.json (52.67 KB)
  5. audio-transcription-distillled-whisper-english-tokenizer.json (2.41 MB)

Total Size: 792.71 MB

Minimum VRAM1.58 GB
+

distilled-whisper-large-v3 (default)

+
NameDistilled Whisper Large V3 Audio Transcription
AuthorSanchit Gandhi, Patrick von Platen and Alexander M. Rush
Hugging Face
Published in arXiv, vol. 2311.00430, “Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling”, 2023
https://arxiv.org/abs/2311.00430
LicenseApache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files
  1. audio-transcription-distilled-whisper-large-v3.fp16.safetensors (1.51 GB)
  2. audio-transcription-whisper-tokenizer-v3-vocab.json (1.04 MB)
  3. audio-transcription-whisper-tokenizer-v3-merges.txt (493.87 KB)
  4. audio-transcription-whisper-tokenizer-v3-normalizer.json (52.67 KB)
  5. audio-transcription-whisper-tokenizer-v3.json (2.48 MB)

Total Size: 1.52 GB

Minimum VRAM1.51 GB
+

turbo-whisper-large-v3

+
NameTurbo Whisper Large V3 Audio Transcription
AuthorAlec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever
OpenAI
Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision”, 2022
https://arxiv.org/abs/2212.04356
LicenseApache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files
  1. audio-transcription-whisper-large-v3-turbo.fp16.safetensors (1.62 GB)
  2. audio-transcription-whisper-tokenizer-v3-vocab.json (1.04 MB)
  3. audio-transcription-whisper-tokenizer-v3-merges.txt (493.87 KB)
  4. audio-transcription-whisper-tokenizer-v3-normalizer.json (52.67 KB)
  5. audio-transcription-whisper-tokenizer-v3.json (2.48 MB)

Total Size: 1.62 GB

Minimum VRAM1.62 GB
+

depth-detection

+

midas (default)

+
NameMiDaS Depth Detection
AuthorRené Ranftl, Alexey Bochkovskiy and Vladlen Koltun
Published in arXiv, vol. 2103.13413, “Vision Transformers for Dense Prediction”, 2021
https://arxiv.org/abs/2103.13413
LicenseMIT License (https://opensource.org/licenses/MIT)
Filesdepth-detection-midas.fp16.safetensors
Minimum VRAM255.65 MB
+

line-detection

+

informative-drawings (default)

+
NameInformative Drawings Line Art Detection
AuthorCaroline Chan, Fredo Durand and Phillip Isola
Massachusetts Institute of Technology
Published in arXiv, vol. 2203.12691, “Informative Drawings: Learning to Generate Line Drawings that Convey Geometry and Semantics”, 2022
https://arxiv.org/abs/2203.12691
LicenseMIT License (https://opensource.org/licenses/MIT)
Filesline-detection-informative-drawings.fp16.safetensors
Minimum VRAM8.58 MB
+

informative-drawings-coarse

+
NameInformative Drawings Coarse Line Art Detection
AuthorCaroline Chan, Fredo Durand and Phillip Isola
Massachusetts Institute of Technology
Published in arXiv, vol. 2203.12691, “Informative Drawings: Learning to Generate Line Drawings that Convey Geometry and Semantics”, 2022
https://arxiv.org/abs/2203.12691
LicenseMIT License (https://opensource.org/licenses/MIT)
Filesline-detection-informative-drawings-coarse.fp16.safetensors
Minimum VRAM8.58 MB
+

informative-drawings-anime

+
NameInformative Drawings Anime Line Art Detection
AuthorCaroline Chan, Fredo Durand and Phillip Isola
Massachusetts Institute of Technology
Published in arXiv, vol. 2203.12691, “Informative Drawings: Learning to Generate Line Drawings that Convey Geometry and Semantics”, 2022
https://arxiv.org/abs/2203.12691
LicenseMIT License (https://opensource.org/licenses/MIT)
Filesline-detection-informative-drawings-anime.fp16.safetensors
Minimum VRAM108.81 MB
+

mlsd

+
NameMobile Line Segment Detection
AuthorGeonmo Gu, Byungsoo Ko, SeongHyun Go, Sung-Hyun Lee, Jingeun Lee and Minchul Shin
NAVER/LINE Vision
Published in arXiv, vol. 2106.00186, “Towards Light-weight and Real-time Line Segment Detection”, 2022
https://arxiv.org/abs/2106.00186
LicenseApache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Filesline-detection-mlsd.fp16.safetensors
Minimum VRAM3.22 MB
+

edge-detection

+

canny (default)

+
NameCanny Edge Detection
AuthorJohn Canny
Published in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, pp. 679-698, “A Computational Approach to Edge Detection”, 1986
https://ieeexplore.ieee.org/document/4767851
Implementation by OpenCV (https://opencv.org/)
LicenseApache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
FilesN/A
Minimum VRAMN/A
+

hed

+
NameHolistically-Nested Edge Detection
AuthorSaining Xie and Zhuowen Tu
University of California, San Diego
Published in arXiv, vol. 1504.06375, “Holistically-Nested Edge Detection”, 2015
https://arxiv.org/abs/1504.06375
LicenseApache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Filesedge-detection-hed.fp16.safetensors
Minimum VRAM29.44 MB
+

pidi

+
NameSoft Edge (PIDI) Detection
AuthorZhuo Su, Wenzhe Liu, Zitong Yu, Dewen Hu, Qing Liao, Qi Tian, Matti Pietikäinen and Li Liu
Published in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5117-5127, “Pixel Difference Networks for Efficient Edge Detection”, 2021
LicenseMIT License with Non-Commercial Clause (https://github.com/hellozhuo/pidinet/blob/master/LICENSE)
Filesedge-detection-pidi.fp16.safetensors
Minimum VRAM1.40 MB
+

pose-detection

+

openpose

+
NameOpenPose Pose Detection
AuthorZhe Cao, Gines Hidalgo, Tomas Simon, Shih-En Wei and Yaser Sheikh
Published in arXiv, vol. 1812.08008, “OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields”, 2018
https://arxiv.org/abs/1812.08008
LicenseOpenPose Academic or Non-Profit Non-Commercial Research License (https://github.com/CMU-Perceptual-Computing-Lab/openpose/blob/master/LICENSE)
Filespose-detection-openpose.fp16.safetensors
Minimum VRAM259.96 MB
+

dwpose (default)

+
NameDWPose Pose Detection
AuthorZhendong Yang, Ailing Zeng, Chun Yuan and Yu Li
Tsinghua Shenzhen International Graduate School and International Digital Economy Academy (IDEA)
Published in arXiv, vol. 2307.15880, “Effective Whole-body Pose Estimation with Two-stages Distillation”, 2023
https://arxiv.org/abs/2307.15880
LicenseApache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files
  1. pose-detection-dwpose-estimation.safetensors (134.65 MB)
  2. pose-detection-dwpose-detection.safetensors (217.20 MB)

Total Size: 351.85 MB

Minimum VRAM354.64 MB
+

image-generation

+

stable-diffusion-v1-5

+
NameStable Diffusion v1.5 Image Generation
AuthorRobin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
LicenseOpenRAIL-M License (https://bigscience.huggingface.co/blog/bigscience-openrail-m)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM2.58 GB
+

stable-diffusion-v1-5-abyssorange-mix-v3

+
NameAbyssOrange Mix V3 Image Generation
AuthorRobin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by liudinglin (https://civitai.com/user/liudinglin)
LicenseOpenRAIL-M License with Restrictions (https://civitai.com/models/license/17233)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-abyssorange-mix-v3-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-abyssorange-mix-v3-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM2.58 GB
+

stable-diffusion-v1-5-chillout-mix-ni

+
NameChillout Mix Ni Image Generation
AuthorRobin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Dreamlike Art (https://dreamlike.art)
LicenseOpenRAIL-M License with Restrictions (https://huggingface.co/dreamlike-art/dreamlike-diffusion-1.0/blob/main/LICENSE.md)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-chillout-mix-ni-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-chillout-mix-ni-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM2.58 GB
+

stable-diffusion-v1-5-clarity-v3

+
NameClarity V3 Image Generation
AuthorRobin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by ndimensional (https://civitai.com/user/ndimensional)
LicenseOpenRAIL-M License with Restrictions (https://civitai.com/models/license/142125)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-clarity-v3-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-clarity-v3-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM2.58 GB
+

stable-diffusion-v1-5-dark-sushi-mix-v2-25d

+
NameDark Sushi Mix V2 2.5D Image Generation
AuthorRobin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Aitasai (https://civitai.com/user/Aitasai)
LicenseOpenRAIL-M License with Restrictions (https://civitai.com/models/license/93208)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-dark-sushi-mix-v2-25d-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-dark-sushi-mix-v2-25d-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM2.58 GB
+

stable-diffusion-v1-5-divine-elegance-mix-v10

+
NameDivine Elegance Mix V10 Image Generation
AuthorRobin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by TroubleDarkness (https://civitai.com/user/TroubleDarkness)
LicenseOpenRAIL-M License with Restrictions (https://civitai.com/models/license/432048)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-divine-elegance-mix-v10-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-divine-elegance-mix-v10-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB
+

stable-diffusion-v1-5-dreamshaper-v8

+
Name: DreamShaper V8 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Lykon (https://civitai.com/user/Lykon)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/128713)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-dreamshaper-v8-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-dreamshaper-v8-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB
+

stable-diffusion-v1-5-epicrealism-v5

+
Name: epiCRealism V5 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by epinikion (https://civitai.com/user/epinikion)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/143906)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-epicrealism-v5-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-epicrealism-v5-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB
+

stable-diffusion-v1-5-epicphotogasm-ultimate-fidelity

+
Name: epiCPhotoGasm Ultimate Fidelity Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by epinikion (https://civitai.com/user/epinikion)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/429454)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-epic-photogasm-ultimate-fidelity-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-epic-photogasm-ultimate-fidelity-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB
+

stable-diffusion-v1-5-ghostmix-v2

+
Name: GhostMix V2 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by _GhostInShell_ (https://civitai.com/user/_GhostInShell_)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/76907)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-ghostmix-v2-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-ghostmix-v2-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB
+

stable-diffusion-v1-5-lyriel-v1-6

+
Name: Lyriel V1.6 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Lyriel (https://civitai.com/user/Lyriel)
License: OpenRAIL-M License (https://civitai.com/models/license/72396)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-lyriel-v1-6-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-lyriel-v1-6-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB
+

stable-diffusion-v1-5-majicmix-realistic-v7

+
Name: MajicMix Realistic V7 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Merjic (https://civitai.com/user/Merjic)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/176425)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-majicmix-realistic-v7-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-majicmix-realistic-v7-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB
+

stable-diffusion-v1-5-meinamix-v12

+
Name: MeinaMix V12 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Meina (https://civitai.com/user/Meina)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/948574)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-meinamix-v12-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-meinamix-v12-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB
+

stable-diffusion-v1-5-mistoon-anime-v3

+
Name: Mistoon Anime V3 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Inzaniak (https://civitai.com/user/Inzaniak)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/348981)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-mistoon-anime-v3-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-mistoon-anime-v3-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB
+

stable-diffusion-v1-5-perfect-world-v6

+
Name: Perfect World V6 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Bloodsuga (https://civitai.com/user/Bloodsuga)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/179446)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-perfect-world-v6-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-perfect-world-v6-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB
+

stable-diffusion-v1-5-photon-v1

+
Name: Photon V1 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Photographer (https://civitai.com/user/Photographer)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/900072)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-photon-v1-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-photon-v1-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB
+

stable-diffusion-v1-5-realcartoon3d-v17

+
Name: RealCartoon3D V17 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by 7whitefire7 (https://civitai.com/user/7whitefire7)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/637156)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-realcartoon3d-v17-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-realcartoon3d-v17-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB
+

stable-diffusion-v1-5-realistic-vision-v5-1

+
Name: Realistic Vision V5.1 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by SG_161222 (https://civitai.com/user/SG_161222)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/130072)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-realistic-vision-v5-1-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-realistic-vision-v5-1-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB
+

stable-diffusion-v1-5-realistic-vision-v6-0

+
Name: Realistic Vision V6.0 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by SG_161222 (https://civitai.com/user/SG_161222)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/245592)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-realistic-vision-v6-0-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-realistic-vision-v6-0-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB
+

stable-diffusion-v1-5-rev-animated-v2

+
Name: ReV Animated V2 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Zovya (https://civitai.com/user/Zovya)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/425083)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-rev-animated-v2-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-rev-animated-v2-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB
+

stable-diffusion-v1-5-toonyou-beta-v6

+
Name: ToonYou Beta V6 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Bradcatt (https://civitai.com/user/Bradcatt)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/125771)
Files
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-toonyou-beta-v6-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-toonyou-beta-v6-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB
+

stable-diffusion-xl

+
Name: Stable Diffusion XL Image Generation
Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License: OpenRAIL++-M License (https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-base-unet.fp16.safetensors (5.14 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB
+

stable-diffusion-xl-albedobase-v3-1

+
Name: AlbedoBase XL V3.1 Image Generation
Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/1041855)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-albedo-base-v3-1-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-albedo-base-v3-1-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-albedo-base-v3-1-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB
+

stable-diffusion-xl-anything

+
Name: Anything XL Image Generation
Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License: OpenRAIL++-M License (https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-anything-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-anything-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-anything-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB
+

stable-diffusion-xl-animagine-v3-1

+
Name: Animagine XL V3.1 Image Generation
Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/403131)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-animagine-v3-1-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-animagine-v3-1-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-animagine-v3-1-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB
+

stable-diffusion-xl-copax-timeless-v13

+
Name: Copax TimeLess V13 Image Generation
Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/724334)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-copax-timeless-v13-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-copax-timeless-v13-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-copax-timeless-v13-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB
+

stable-diffusion-xl-counterfeit-v2-5

+
Name: CounterfeitXL V2.5 Image Generation
Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/265012)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-counterfeit-v2-5-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-counterfeit-v2-5-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-counterfeit-v2-5-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB
+

stable-diffusion-xl-dreamshaper-alpha-v2

+
Name: DreamShaper XL Alpha V2 Image Generation
Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/126688)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-dreamshaper-alpha-v2-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-dreamshaper-alpha-v2-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-dreamshaper-alpha-v2-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB
+

stable-diffusion-xl-helloworld-v7

+
Name: LEOSAM's HelloWorld XL Image Generation
Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/570138)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-hello-world-v7-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-hello-world-v7-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-hello-world-v7-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB
+

stable-diffusion-xl-juggernaut-v11 (default)

+
Name: Juggernaut XL V11 Image Generation
Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/782002)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-juggernaut-v11-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-juggernaut-v11-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-juggernaut-v11-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB
+

stable-diffusion-xl-lightning-8-step

+
Name: Stable Diffusion XL Lightning (8-Step)
Author: Shanchuan Lin, Anran Wang and Xiao Yang
ByteDance Inc.
Published in arXiv, vol. 2402.13929, “SDXL-Lightning: Progressive Adversarial Diffusion Distillation”, 2024
https://arxiv.org/abs/2402.13929
License: OpenRAIL++-M License (https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-lightning-unet-8-step.fp16.safetensors (5.14 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB
+

stable-diffusion-xl-lightning-4-step

+
Name: Stable Diffusion XL Lightning (4-Step)
Author: Shanchuan Lin, Anran Wang and Xiao Yang
ByteDance Inc.
Published in arXiv, vol. 2402.13929, “SDXL-Lightning: Progressive Adversarial Diffusion Distillation”, 2024
https://arxiv.org/abs/2402.13929
License: OpenRAIL++-M License (https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-lightning-unet-4-step.fp16.safetensors (5.14 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB
+

stable-diffusion-xl-lightning-2-step

+
Name: Stable Diffusion XL Lightning (2-Step)
Author: Shanchuan Lin, Anran Wang and Xiao Yang
ByteDance Inc.
Published in arXiv, vol. 2402.13929, “SDXL-Lightning: Progressive Adversarial Diffusion Distillation”, 2024
https://arxiv.org/abs/2402.13929
License: OpenRAIL++-M License (https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-lightning-unet-2-step.fp16.safetensors (5.14 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB
+

stable-diffusion-xl-nightvision-v9

+
Name: NightVision XL V9 Image Generation
Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/577919)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-nightvision-v9-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-nightvision-v9-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-nightvision-v9-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB
+

stable-diffusion-xl-realvis-v5

+
Name: RealVisXL V5 Image Generation
Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/789646)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-realvis-v5-0-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-realvis-v5-0-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-realvis-v5-0-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB
+

stable-diffusion-xl-stoiqo-newreality-pro

+
Name: Stoiqo New Reality XL Pro Image Generation
Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/690310)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-stoiqo-newreality-pro-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-stoiqo-newreality-pro-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-stoiqo-newreality-pro-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB
+

stable-diffusion-xl-turbo

+
Name: Stable Diffusion XL Turbo Image Generation
Author: Axel Sauer, Dominik Lorenz, Andreas Blattmann and Robin Rombach
Stability AI
Published in Stability AI Blog, “Adversarial Diffusion Distillation”, 2024
https://stability.ai/research/adversarial-diffusion-distillation
License: Stability AI Community License (https://huggingface.co/stabilityai/sdxl-turbo/blob/main/LICENSE.md)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-turbo-unet.fp16.safetensors (5.14 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB
+

stable-diffusion-xl-unstable-diffusers-nihilmania

+
Name: SDXL Unstable Diffusers NihilMania Image Generation
Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/395107)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-unstable-diffusers-nihilmania-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-unstable-diffusers-nihilmania-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-unstable-diffusers-nihilmania-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB
+

stable-diffusion-xl-zavychroma-v10

+
Name: ZavyChromaXL V10 Image Generation
Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/916744)
Files
  1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
  2. image-generation-stable-diffusion-xl-zavychroma-v10-unet.fp16.safetensors (5.14 GB)
  3. image-generation-stable-diffusion-xl-zavychroma-v10-text-encoder.fp16.safetensors (246.14 MB)
  4. image-generation-stable-diffusion-xl-zavychroma-v10-text-encoder-2.fp16.safetensors (1.39 GB)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

Total Size: 7.11 GB

Minimum VRAM: 7.06 GB
+

stable-diffusion-v3-medium

+
Name: Stable Diffusion V3 (Medium) Image Generation
Author: Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach
Stability AI
Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024
https://arxiv.org/abs/2403.03206
License: Stability AI Community License Agreement (https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/LICENSE.md)
Files
  1. image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB)
  2. image-generation-stable-diffusion-v3-transformer.fp16.safetensors (4.17 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  6. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  7. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  8. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  9. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  10. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  11. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)
  12. text-encoding-t5-xxl-vocab.model (791.66 KB)
  13. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)

Total Size: 15.50 GB

Minimum VRAM: 17.86 GB
+

stable-diffusion-v3-5-medium

+
Name: Stable Diffusion V3.5 (Medium) Image Generation
Author: Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach
Stability AI
Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024
https://arxiv.org/abs/2403.03206
License: Stability AI Community License Agreement (https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/LICENSE.md)
Files
  1. image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB)
  2. image-generation-stable-diffusion-v3-5-medium-transformer.bf16.safetensors (4.94 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  6. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  7. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  8. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  9. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  10. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  11. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)
  12. text-encoding-t5-xxl-vocab.model (791.66 KB)
  13. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)

Total Size: 16.27 GB

Minimum VRAM: 18.36 GB
+

stable-diffusion-v3-5-large

+
Name: Stable Diffusion V3.5 (Large) Image Generation
Author: Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach
Stability AI
Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024
https://arxiv.org/abs/2403.03206
License: Stability AI Community License Agreement (https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/LICENSE.md)
Files
  1. image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB)
  2. image-generation-stable-diffusion-v3-5-large-transformer.part-1.bf16.safetensors (9.99 GB)
  3. image-generation-stable-diffusion-v3-5-large-transformer.part-2.bf16.safetensors (6.31 GB)
  4. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  5. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  6. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  7. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  8. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  9. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  10. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  11. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  12. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)
  13. text-encoding-t5-xxl-vocab.model (791.66 KB)
  14. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)

Total Size: 27.62 GB

Minimum VRAM: 31.36 GB
+

stable-diffusion-v3-5-large-int8

+
Name: Stable Diffusion V3.5 (Large) Image Generation (Int8)
Author: Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach
Stability AI
Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024
https://arxiv.org/abs/2403.03206
License: Stability AI Community License Agreement (https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/LICENSE.md)
Files
  1. image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB)
  2. image-generation-stable-diffusion-v3-5-large-transformer.int8.bf16.safetensors (8.25 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  6. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  7. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  8. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  9. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  10. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  11. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)
  12. text-encoding-t5-xxl-vocab.model (791.66 KB)
  13. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)

Total Size: 15.96 GB

Minimum VRAM: 16.85 GB
+

stable-diffusion-v3-5-large-nf4

+
Name: Stable Diffusion V3.5 (Large) Image Generation (NF4)
Author: Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach
Stability AI
Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024
https://arxiv.org/abs/2403.03206
License: Stability AI Community License Agreement (https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/LICENSE.md)
Files
  1. image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB)
  2. image-generation-stable-diffusion-v3-5-large-transformer.nf4.bf16.safetensors (4.72 GB)
  3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
  5. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
  6. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  7. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  8. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  9. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
  10. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
  11. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)
  12. text-encoding-t5-xxl-vocab.model (791.66 KB)
  13. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)

Total Size: 12.85 GB

Minimum VRAM: 12.99 GB
+

flux-v1-dev

+
Name: FluxDev
Author: Black Forest Labs
Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024
https://blackforestlabs.ai/announcing-black-forest-labs/
License: FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
Files
  1. image-generation-flux-v1-vae.bf16.safetensors (167.67 MB)
  2. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  4. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  5. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  6. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  7. text-encoding-t5-xxl-vocab.model (791.66 KB)
  8. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  9. image-generation-flux-v1-dev-transformer.bf16.safetensors (23.80 GB)

Total Size: 33.74 GB

Minimum VRAM: 29.50 GB
+

flux-v1-dev-int8

+
Name: FluxDevInt8
Author: Black Forest Labs
Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024
https://blackforestlabs.ai/announcing-black-forest-labs/
License: FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
Files
  1. image-generation-flux-v1-vae.bf16.safetensors (167.67 MB)
  2. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  4. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  5. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  6. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  7. text-encoding-t5-xxl-vocab.model (791.66 KB)
  8. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  9. image-generation-flux-v1-dev-transformer.int8.bf16.safetensors (11.92 GB)

Total Size: 18.24 GB

Minimum VRAM: 21.22 GB
+

flux-v1-dev-stoiqo-newreality-alpha-v2-int8

+
Name: Stoiqo NewReality F1.D Alpha V2 (Int8) Image Generation
Author: Black Forest Labs
Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024
https://blackforestlabs.ai/announcing-black-forest-labs/
License: FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
Files
  1. image-generation-flux-v1-vae.bf16.safetensors (167.67 MB)
  2. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  4. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  5. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  6. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  7. text-encoding-t5-xxl-vocab.model (791.66 KB)
  8. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  9. image-generation-flux-v1-dev-stoiqo-newreality-alpha-v2-transformer.int8.fp16.safetensors (11.92 GB)

Total Size: 18.24 GB

Minimum VRAM: 21.22 GB
+

flux-v1-dev-nf4

+
Name: FluxDevNF4
Author: Black Forest Labs
Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024
https://blackforestlabs.ai/announcing-black-forest-labs/
License: FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
Files
  1. image-generation-flux-v1-vae.bf16.safetensors (167.67 MB)
  2. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
  4. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  5. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  6. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  7. text-encoding-t5-xxl-vocab.model (791.66 KB)
  8. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  9. image-generation-flux-v1-dev-transformer.nf4.bf16.safetensors (6.70 GB)

Total Size: 13.44 GB

Minimum VRAM: 14.36 GB
+

flux-v1-dev-stoiqo-newreality-alpha-v2-nf4

+
Name: Stoiqo NewReality F1.D Alpha V2 (NF4) Image Generation
Author: Black Forest Labs
Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024
https://blackforestlabs.ai/announcing-black-forest-labs/
License: FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
Files
  1. image-generation-flux-v1-vae.bf16.safetensors (167.67 MB)
  2. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
  4. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  5. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  6. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  7. text-encoding-t5-xxl-vocab.model (791.66 KB)
  8. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  9. image-generation-flux-v1-dev-stoiqo-newreality-alpha-v2-transformer.nf4.fp16.safetensors (6.70 GB)

Total Size: 13.44 GB

Minimum VRAM: 14.36 GB
+

flux-v1-schnell

+
Name: FluxSchnell
Author: Black Forest Labs
Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024
https://blackforestlabs.ai/announcing-black-forest-labs/
License: FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
Files
  1. image-generation-flux-v1-vae.bf16.safetensors (167.67 MB)
  2. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  4. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  5. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  6. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  7. text-encoding-t5-xxl-vocab.model (791.66 KB)
  8. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  9. image-generation-flux-v1-schnell-transformer.bf16.safetensors (23.78 GB)

Total Size: 33.72 GB

Minimum VRAM: 29.50 GB
+

flux-v1-schnell-int8

+
Name: FluxSchnellInt8
Author: Black Forest Labs
Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024
https://blackforestlabs.ai/announcing-black-forest-labs/
License: FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
Files
  1. image-generation-flux-v1-vae.bf16.safetensors (167.67 MB)
  2. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  4. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  5. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  6. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  7. text-encoding-t5-xxl-vocab.model (791.66 KB)
  8. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  9. image-generation-flux-v1-schnell-transformer.int8.bf16.safetensors (11.91 GB)

Total Size: 18.23 GB

Minimum VRAM: 21.22 GB
+

flux-v1-schnell-nf4

+
Name: FluxSchnellNF4
Author: Black Forest Labs
Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024
https://blackforestlabs.ai/announcing-black-forest-labs/
License: FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
Files
  1. image-generation-flux-v1-vae.bf16.safetensors (167.67 MB)
  2. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
  3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
  4. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  5. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  6. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  7. text-encoding-t5-xxl-vocab.model (791.66 KB)
  8. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  9. image-generation-flux-v1-schnell-transformer.nf4.bf16.safetensors (6.69 GB)

Total Size: 13.44 GB

Minimum VRAM: 14.36 GB
+
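
The Minimum VRAM figures above can be used to choose between the full-precision and quantized variants of a model. The following is a minimal sketch in plain Python, not part of Taproot's API: the `pick_variant` helper is hypothetical, the numbers are copied from the three FLUX.1 dev entries in this catalog, and it assumes that the variant with the highest VRAM requirement that still fits is the one you want.

```python
# Minimum VRAM (GB) copied from the flux-v1-dev entries in this catalog.
FLUX_DEV_MIN_VRAM_GB = {
    "flux-v1-dev": 29.50,
    "flux-v1-dev-int8": 21.22,
    "flux-v1-dev-nf4": 14.36,
}


def pick_variant(available_gb, min_vram_by_model):
    """Hypothetical helper: return the most demanding model that fits, or None."""
    fitting = {
        name: needed
        for name, needed in min_vram_by_model.items()
        if needed <= available_gb
    }
    if not fitting:
        return None
    # Assumption: a higher requirement means less aggressive quantization.
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    for budget in (12.0, 16.0, 24.0, 32.0):
        print(budget, pick_variant(budget, FLUX_DEV_MIN_VRAM_GB))
```

With a 24 GB card this selects the Int8 variant, and with 16 GB it falls back to NF4.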

video-generation

+

cogvideox-2b

+
Name: CogVideoX 2B Video Generation
Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  4. video-generation-cog-transformer-2b.fp16.safetensors (3.39 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 13.34 GB

Minimum VRAM: 13.48 GB
+
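
Each Total Size is the sum of the file sizes listed for that model. As a rough cross-check, the sketch below adds up the five files of the `cogvideox-2b` entry. It assumes decimal units (1 GB = 1000 MB), which is a guess about how the sizes were formatted, and the `to_bytes` helper is hypothetical.

```python
# Assumed decimal unit multipliers; the catalog does not state its convention.
UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9}


def to_bytes(size):
    """Hypothetical helper: convert a string such as '431.22 MB' to bytes."""
    value, unit = size.split()
    return float(value) * UNITS[unit]


# File sizes copied from the cogvideox-2b entry.
cogvideox_2b_files = ["791.66 KB", "2.54 KB", "9.52 GB", "3.39 GB", "431.22 MB"]

total_gb = sum(to_bytes(size) for size in cogvideox_2b_files) / UNITS["GB"]

if __name__ == "__main__":
    print(f"{total_gb:.2f} GB")
```

Under that assumption the result agrees with the listed Total Size of 13.34 GB. Because the per-file sizes are themselves rounded, small differences in the last digit are possible for other entries.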

cogvideox-2b-int8

+
Name: CogVideoX 2B Video Generation (Int8)
Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  4. video-generation-cog-transformer-2b.int8.fp16.safetensors (1.70 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 8.04 GB

Minimum VRAM: 11.48 GB
+

cogvideox-5b

+
Name: CogVideoX 5B Video Generation
Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  4. video-generation-cog-transformer-5b.fp16.safetensors (11.14 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 21.10 GB

Minimum VRAM: 21.48 GB
+

cogvideox-5b-int8

+
Name: CogVideoX 5B Video Generation (Int8)
Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  4. video-generation-cog-transformer-5b.int8.fp16.safetensors (5.58 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 11.92 GB

Minimum VRAM: 17.48 GB
+

cogvideox-5b-nf4

+
Name: CogVideoX 5B Video Generation (NF4)
Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
  4. video-generation-cog-transformer-5b.nf4.fp16.safetensors (3.14 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 9.90 GB

Minimum VRAM: 12.48 GB
+

cogvideox-i2v-5b

+
Name: CogVideoX 5B Image-to-Video Generation
Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  4. video-generation-cog-i2v-transformer-5b.fp16.safetensors (11.25 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 21.21 GB

Minimum VRAM: 21.48 GB
+

cogvideox-i2v-5b-int8

+
Name: CogVideoX 5B Image-to-Video Generation (Int8)
Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  4. video-generation-cog-i2v-transformer-5b.fp16.safetensors (11.25 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 17.59 GB

Minimum VRAM: 17.48 GB
+

cogvideox-i2v-5b-nf4

+
Name: CogVideoX 5B Image-to-Video Generation (NF4)
Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
  4. video-generation-cog-i2v-transformer-5b.nf4.fp16.safetensors (3.25 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 10.01 GB

Minimum VRAM: 12.48 GB
+

cogvideox-v1-5-5b

+
Name: CogVideoX V1.5 5B Video Generation
Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  4. video-generation-cog-v1-5-transformer-5b.fp16.safetensors (11.14 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 21.10 GB

Minimum VRAM: 21.48 GB
+

cogvideox-v1-5-5b-int8

+
Name: CogVideoX V1.5 5B Video Generation (Int8)
Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  4. video-generation-cog-v1-5-transformer-5b.int8.fp16.safetensors (5.59 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 11.92 GB

Minimum VRAM: 17.48 GB
+

cogvideox-v1-5-5b-nf4

+
Name: CogVideoX V1.5 5B Video Generation (NF4)
Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
  4. video-generation-cog-v1-5-transformer-5b.nf4.fp16.safetensors (3.14 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 9.90 GB

Minimum VRAM12.48 GB
+

cogvideox-v1-5-i2v-5b

+
NameCogVideoX V1.5 5B Image-to-Video Generation
AuthorZhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
LicenseCogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  4. video-generation-cog-v1-5-i2v-transformer-5b.fp16.safetensors (11.14 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 21.10 GB

Minimum VRAM21.48 GB
+

cogvideox-v1-5-i2v-5b-int8

+
NameCogVideoX V1.5 5B Image-to-Video Generation (Int8)
AuthorZhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
LicenseCogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  4. video-generation-cog-v1-5-i2v-transformer-5b.int8.fp16.safetensors (5.59 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 11.92 GB

Minimum VRAM17.48 GB
+

cogvideox-v1-5-i2v-5b-nf4

+
NameCogVideoX V1.5 5B Image-to-Video Generation (NF4)
AuthorZhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
Zhipu AI and Tsinghua University
Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
https://arxiv.org/abs/2408.06072
LicenseCogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
  4. video-generation-cog-v1-5-i2v-transformer-5b.nf4.fp16.safetensors (3.14 GB)
  5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

Total Size: 9.90 GB

Minimum VRAM12.48 GB
+

hunyuan

+
NameHunyuan Video Generation
AuthorHunyuan Foundation Model Team
Tencent
Published in arXiv, vol. 2412.03603, “HunyuanVideo: A Systematic Framework for Large Video Generation Models”, 2024
https://arxiv.org/abs/2412.03603
LicenseTencent Hunyuan Community License (https://github.com/Tencent/HunyuanVideo/blob/main/LICENSE.txt)
Files
  1. video-generation-hunyuan-vae.safetensors (985.94 MB)
  2. video-generation-hunyuan-transformer.bf16.safetensors (25.64 GB)
  3. text-encoding-llava-llama-tokenizer-vocab.json (17.21 MB)
  4. text-encoding-llava-llama-tokenizer-special-tokens-map.json (577.00 B)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-llava-llama-text-encoder.fp16.safetensors (15.01 GB)
  9. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)

Total Size: 41.90 GB

Minimum VRAM38.30 GB
+

hunyuan-int8

+
NameHunyuan Video Generation (Int8)
AuthorHunyuan Foundation Model Team
Tencent
Published in arXiv, vol. 2412.03603, “HunyuanVideo: A Systematic Framework for Large Video Generation Models”, 2024
https://arxiv.org/abs/2412.03603
LicenseTencent Hunyuan Community License (https://github.com/Tencent/HunyuanVideo/blob/main/LICENSE.txt)
Files
  1. video-generation-hunyuan-vae.safetensors (985.94 MB)
  2. video-generation-hunyuan-transformer.int8.bf16.safetensors (12.84 GB)
  3. text-encoding-llava-llama-tokenizer-vocab.json (17.21 MB)
  4. text-encoding-llava-llama-tokenizer-special-tokens-map.json (577.00 B)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-llava-llama-text-encoder.int8.fp16.safetensors (8.04 GB)
  9. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)

Total Size: 22.13 GB

Minimum VRAM23.30 GB
+

hunyuan-nf4

+
NameHunyuan Video Generation (NF4)
AuthorHunyuan Foundation Model Team
Tencent
Published in arXiv, vol. 2412.03603, “HunyuanVideo: A Systematic Framework for Large Video Generation Models”, 2024
https://arxiv.org/abs/2412.03603
LicenseTencent Hunyuan Community License (https://github.com/Tencent/HunyuanVideo/blob/main/LICENSE.txt)
Files
  1. video-generation-hunyuan-vae.safetensors (985.94 MB)
  2. video-generation-hunyuan-transformer.nf4.bf16.safetensors (7.22 GB)
  3. text-encoding-llava-llama-tokenizer-vocab.json (17.21 MB)
  4. text-encoding-llava-llama-tokenizer-special-tokens-map.json (577.00 B)
  5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  8. text-encoding-llava-llama-text-encoder.nf4.fp16.safetensors (4.98 GB)
  9. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)

Total Size: 13.45 GB

Minimum VRAM14.78 GB
+

ltx (default)

+
NameLTX Video Generation
AuthorLightricks
https://github.com/Lightricks/LTX-Video
LicenseOpenRAIL-M License (https://bigscience.huggingface.co/blog/bigscience-openrail-m)
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  4. video-generation-ltx-transformer.bf16.safetensors (3.85 GB)
  5. video-generation-ltx-vae.safetensors (1.87 GB)

Total Size: 15.24 GB

Minimum VRAM15.28 GB
+

ltx-int8

+
NameLTX Video Generation (Int8)
AuthorLightricks
https://github.com/Lightricks/LTX-Video
LicenseOpenRAIL-M License (https://bigscience.huggingface.co/blog/bigscience-openrail-m)
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  4. video-generation-ltx-transformer.int8.bf16.safetensors (1.93 GB)
  5. video-generation-ltx-vae.safetensors (1.87 GB)

Total Size: 9.70 GB

Minimum VRAM9.72 GB
+

ltx-nf4

+
NameLTX Video Generation (NF4)
AuthorLightricks
https://github.com/Lightricks/LTX-Video
LicenseOpenRAIL-M License (https://bigscience.huggingface.co/blog/bigscience-openrail-m)
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
  4. video-generation-ltx-transformer.nf4.bf16.safetensors (1.08 GB)
  5. video-generation-ltx-vae.safetensors (1.87 GB)

Total Size: 9.28 GB

Minimum VRAM7.29 GB
+

mochi-v1

+
NameMochi Video Generation
AuthorGenmo AI
Published in Genmo AI Blog, “Mochi 1: A new SOTA in open-source video generation models”, 2024
https://www.genmo.ai/blog
License
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
  4. video-generation-mochi-v1-preview-transformer.bf16.safetensors (20.06 GB)
  5. video-generation-mochi-v1-preview-vae.bf16.safetensors (919.55 MB)

Total Size: 30.50 GB

Minimum VRAM22.95 GB
+

mochi-v1-int8

+
NameMochi Video Generation (Int8)
AuthorGenmo AI
Published in Genmo AI Blog, “Mochi 1: A new SOTA in open-source video generation models”, 2024
https://www.genmo.ai/blog
License
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
  4. video-generation-mochi-v1-preview-transformer.int8.bf16.safetensors (10.04 GB)
  5. video-generation-mochi-v1-preview-vae.bf16.safetensors (919.55 MB)

Total Size: 16.87 GB

Minimum VRAM15.95 GB
+

mochi-v1-nf4

+
NameMochi Video Generation (NF4)
AuthorGenmo AI
Published in Genmo AI Blog, “Mochi 1: A new SOTA in open-source video generation models”, 2024
https://www.genmo.ai/blog
License
Files
  1. text-encoding-t5-xxl-vocab.model (791.66 KB)
  2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
  3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
  4. video-generation-mochi-v1-preview-transformer.nf4.bf16.safetensors (5.64 GB)
  5. video-generation-mochi-v1-preview-vae.bf16.safetensors (919.55 MB)

Total Size: 12.89 GB

Minimum VRAM12.41 GB
+

text-generation

+

llama-v3-8b

+
NameLlama V3.0 8B Text Generation
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-8b-q8-0.gguf
Minimum VRAM9.64 GB
+

llama-v3-8b-q6-k

+
NameLlama V3.0 8B Text Generation (Q6-K)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-8b-q6-k.gguf
Minimum VRAM8.10 GB
+

llama-v3-8b-q5-k-m

+
NameLlama V3.0 8B Text Generation (Q5-K-M)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-8b-q5-k-m.gguf
Minimum VRAM7.30 GB
+

llama-v3-8b-q4-k-m

+
NameLlama V3.0 8B Text Generation (Q4-K-M)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-8b-q4-k-m.gguf
Minimum VRAM6.56 GB
+

llama-v3-8b-q3-k-m

+
NameLlama V3.0 8B Text Generation (Q3-K-M)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-8b-q3-k-m.gguf
Minimum VRAM5.72 GB
+

llama-v3-8b-instruct

+
NameLlama V3.0 8B Instruct Text Generation
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-8b-instruct-q8-0.gguf
Minimum VRAM9.64 GB
+

llama-v3-8b-instruct-q6-k

+
NameLlama V3.0 8B Instruct Text Generation (Q6-K)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-8b-instruct-q6-k.gguf
Minimum VRAM8.10 GB
+

llama-v3-8b-instruct-q5-k-m

+
NameLlama V3.0 8B Instruct Text Generation (Q5-K-M)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-8b-instruct-q5-k-m.gguf
Minimum VRAM7.30 GB
+

llama-v3-8b-instruct-q4-k-m

+
NameLlama V3.0 8B Instruct Text Generation (Q4-K-M)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-8b-instruct-q4-k-m.gguf
Minimum VRAM6.56 GB
+

llama-v3-8b-instruct-q3-k-m

+
NameLlama V3.0 8B Instruct Text Generation (Q3-K-M)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-8b-instruct-q3-k-m.gguf
Minimum VRAM5.72 GB
+

llama-v3-1-8b-instruct

+
NameLlama V3.1 8B Instruct Text Generation
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-1-8b-instruct-q8-0.gguf
Minimum VRAM9.64 GB
+

llama-v3-1-8b-instruct-q6-k (default)

+
NameLlama V3.1 8B Instruct Text Generation (Q6-K)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-1-8b-instruct-q6-k.gguf
Minimum VRAM8.10 GB
+

llama-v3-1-8b-instruct-q5-k-m

+
NameLlama V3.1 8B Instruct Text Generation (Q5-K-M)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-1-8b-instruct-q5-k-m.gguf
Minimum VRAM7.30 GB
+

llama-v3-1-8b-instruct-q4-k-m

+
NameLlama V3.1 8B Instruct Text Generation (Q4-K-M)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-1-8b-instruct-q4-k-m.gguf
Minimum VRAM6.56 GB
+

llama-v3-1-8b-instruct-q3-k-m

+
NameLlama V3.1 8B Instruct Text Generation (Q3-K-M)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-1-8b-instruct-q3-k-m.gguf
Minimum VRAM5.72 GB
+

llama-v3-2-3b-instruct

+
NameLlama V3.2 3B Instruct Text Generation
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-2-3b-instruct-f16.gguf
Minimum VRAM8.04 GB
+

llama-v3-2-3b-instruct-q8-0

+
NameLlama V3.2 3B Instruct Text Generation (Q8-0)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-2-3b-instruct-q8-0.gguf
Minimum VRAM5.02 GB
+

llama-v3-2-3b-instruct-q6-k

+
NameLlama V3.2 3B Instruct Text Generation (Q6-K)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-2-3b-instruct-q6-k.gguf
Minimum VRAM4.20 GB
+

llama-v3-2-3b-instruct-q5-k-m

+
NameLlama V3.2 3B Instruct Text Generation (Q5-K-M)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-2-3b-instruct-q5-k-m.gguf
Minimum VRAM3.90 GB
+

llama-v3-2-3b-instruct-q4-k-m

+
NameLlama V3.2 3B Instruct Text Generation (Q4-K-M)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-2-3b-instruct-q4-k-m.gguf
Minimum VRAM3.50 GB
+

llama-v3-2-3b-instruct-q3-k-l

+
NameLlama V3.2 3B Instruct Text Generation (Q3-K-L)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-2-3b-instruct-q3-k-l.gguf
Minimum VRAM3.10 GB
+

llama-v3-2-1b-instruct

+
NameLlama V3.2 1B Instruct Text Generation
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-2-1b-instruct-f16.gguf
Minimum VRAM3.60 GB
+

llama-v3-2-1b-instruct-q8-0

+
NameLlama V3.2 1B Instruct Text Generation (Q8-0)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-2-1b-instruct-q8-0.gguf
Minimum VRAM2.43 GB
+

llama-v3-2-1b-instruct-q6-k

+
NameLlama V3.2 1B Instruct Text Generation (Q6-K)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-2-1b-instruct-q6-k.gguf
Minimum VRAM2.15 GB
+

llama-v3-2-1b-instruct-q5-k-m

+
NameLlama V3.2 1B Instruct Text Generation (Q5-K-M)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-2-1b-instruct-q5-k-m.gguf
Minimum VRAM2.02 GB
+

llama-v3-2-1b-instruct-q4-k-m

+
NameLlama V3.2 1B Instruct Text Generation (Q4-K-M)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-2-1b-instruct-q4-k-m.gguf
Minimum VRAM1.64 GB
+

llama-v3-2-1b-instruct-q3-k-l

+
NameLlama V3.2 1B Instruct Text Generation (Q3-K-L)
AuthorMeta AI
Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
https://arxiv.org/abs/2407.21783
LicenseMeta Llama 3 Community License (https://www.llama.com/llama3/license/)
Filestext-generation-llama-v3-2-1b-instruct-q3-k-l.gguf
Minimum VRAM1.58 GB
+

zephyr-7b-alpha

+
NameZephyr 7B α Text Generation (Q8)
AuthorLewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
https://arxiv.org/abs/2310.16944
LicenseMIT License (https://opensource.org/licenses/MIT)
Filestext-generation-zephyr-alpha-7b-q8-0.gguf
Minimum VRAM9.40 GB
+

zephyr-7b-alpha-q6-k

+
NameZephyr 7B α Text Generation (Q6-K)
AuthorLewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
https://arxiv.org/abs/2310.16944
LicenseMIT License (https://opensource.org/licenses/MIT)
Filestext-generation-zephyr-alpha-7b-q6-k.gguf
Minimum VRAM8.20 GB
+

zephyr-7b-alpha-q5-k-m

+
NameZephyr 7B α Text Generation (Q5-K-M)
AuthorLewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
https://arxiv.org/abs/2310.16944
LicenseMIT License (https://opensource.org/licenses/MIT)
Filestext-generation-zephyr-alpha-7b-q5-k-m.gguf
Minimum VRAM7.25 GB
+

zephyr-7b-alpha-q4-k-m

+
NameZephyr 7B α Text Generation (Q4-K-M)
AuthorLewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
https://arxiv.org/abs/2310.16944
LicenseMIT License (https://opensource.org/licenses/MIT)
Filestext-generation-zephyr-alpha-7b-q4-k-m.gguf
Minimum VRAM6.30 GB
+

zephyr-7b-alpha-q3-k-m

+
NameZephyr 7B α Text Generation (Q3-K-M)
AuthorLewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
https://arxiv.org/abs/2310.16944
LicenseMIT License (https://opensource.org/licenses/MIT)
Filestext-generation-zephyr-alpha-7b-q3-k-m.gguf
Minimum VRAM5.35 GB
+

zephyr-7b-beta

+
NameZephyr 7B β Text Generation
AuthorLewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
https://arxiv.org/abs/2310.16944
LicenseMIT License (https://opensource.org/licenses/MIT)
Filestext-generation-zephyr-beta-7b-q8-0.gguf
Minimum VRAM9.40 GB
+

zephyr-7b-beta-q6-k

+
NameZephyr 7B β Text Generation (Q6-K)
AuthorLewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
https://arxiv.org/abs/2310.16944
LicenseMIT License (https://opensource.org/licenses/MIT)
Filestext-generation-zephyr-beta-7b-q6-k.gguf
Minimum VRAM8.20 GB
+

zephyr-7b-beta-q5-k-m

+
NameZephyr 7B β Text Generation (Q5-K-M)
AuthorLewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
https://arxiv.org/abs/2310.16944
LicenseMIT License (https://opensource.org/licenses/MIT)
Filestext-generation-zephyr-beta-7b-q5-k-m.gguf
Minimum VRAM7.25 GB
+

zephyr-7b-beta-q4-k-m

+
NameZephyr 7B β Text Generation (Q4-K-M)
AuthorLewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
https://arxiv.org/abs/2310.16944
LicenseMIT License (https://opensource.org/licenses/MIT)
Filestext-generation-zephyr-beta-7b-q4-k-m.gguf
Minimum VRAM6.30 GB
+

zephyr-7b-beta-q3-k-m

+
NameZephyr 7B β Text Generation (Q3-K-M)
AuthorLewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
https://arxiv.org/abs/2310.16944
LicenseMIT License (https://opensource.org/licenses/MIT)
Filestext-generation-zephyr-beta-7b-q3-k-m.gguf
Minimum VRAM5.35 GB
+

visual-question-answering

+

llava-v1-5-7b

+
NameLLaVA V1.5 7B Visual Question Answering
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-7b.fp16.gguf (13.48 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 14.10 GB

Minimum VRAM15.80 GB
+

llava-v1-5-7b-q8

+
NameLLaVA V1.5 7B (Q8-0) Visual Question Answering
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-7b-q8-0.gguf (7.16 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 7.79 GB

Minimum VRAM9.90 GB
+

llava-v1-5-7b-q6-k

+
NameLLaVA V1.5 7B (Q6-K) Visual Question Answering
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-7b-q6-k.gguf (5.53 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 6.15 GB

Minimum VRAM8.40 GB
+

llava-v1-5-7b-q5-k-m

+
NameLLaVA V1.5 7B (Q5-K-M) Visual Question Answering
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-7b-q5-k-m.gguf (4.78 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 5.41 GB

Minimum VRAM7.71 GB
+

llava-v1-5-7b-q4-k-m

+
NameLLaVA V1.5 7B (Q4-K-M) Visual Question Answering
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-7b-q4-k-m.gguf (4.08 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 4.71 GB

Minimum VRAM7.04 GB
+

llava-v1-5-7b-q3-k-m

+
NameLLaVA V1.5 7B (Q3-K-M) Visual Question Answering
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-7b-q3-k-m.gguf (3.30 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 3.92 GB

Minimum VRAM6.33 GB
+

llava-v1-5-13b

+
NameLLaVA V1.5 13B (Q8-0) Visual Question Answering
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-13b-q8-0.gguf (13.83 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB)

Total Size: 14.48 GB

Minimum VRAM17.51 GB
+

llava-v1-5-13b-q6-k

+
NameLLaVA V1.5 13B (Q6-K) Visual Question Answering
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-13b-q6-k.gguf (10.68 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB)

Total Size: 11.32 GB

Minimum VRAM14.54 GB
+

llava-v1-5-13b-q5-k-m

+
NameLLaVA V1.5 13B (Q5-K-M) Visual Question Answering
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-13b-q5-k-m.gguf (9.23 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB)

Total Size: 9.88 GB

Minimum VRAM13.17 GB
+

llava-v1-5-13b-q4-0

+
NameLLaVA V1.5 13B (Q4-0) Visual Question Answering
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-13b-q4-0.gguf (7.37 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB)

Total Size: 8.01 GB

Minimum VRAM11.48 GB
+

llava-v1-6-34b-q5-k-m

+
NameLLaVA V1.6 34B (Q5-K-M) Visual Question Answering
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-6-34b-q5-k-m.gguf (24.32 GB)
  2. image-encoding-clip-llava-mmproj-v1-6-34b.fp16.gguf (699.99 MB)

Total Size: 25.02 GB

Minimum VRAM24.96 GB
+

llava-v1-6-34b-q4-k-m

+
NameLLaVA V1.6 34B (Q4-K-M) Visual Question Answering
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-6-34b-q4-k-m.gguf (20.66 GB)
  2. image-encoding-clip-llava-mmproj-v1-6-34b.fp16.gguf (699.99 MB)

Total Size: 21.36 GB

Minimum VRAM21.88 GB
+

llava-v1-6-34b-q3-k-m

+
NameLLaVA V1.6 34B (Q3-K-M) Visual Question Answering
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-6-34b-q3-k-m.gguf (16.65 GB)
  2. image-encoding-clip-llava-mmproj-v1-6-34b.fp16.gguf (699.99 MB)

Total Size: 17.35 GB

Minimum VRAM18.06 GB
+

moondream-v2 (default)

+
NameMoondream V2 Visual Question Answering
AuthorVikhyat Korrapati
Published in Hugging Face, vol. 10.57967/hf/3219, “Moondream2”, 2024
https://huggingface.co/vikhyatk/moondream2
LicenseApache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files
  1. visual-question-answering-moondream-v2.fp16.gguf (2.84 GB)
  2. image-encoding-clip-moondream-v2-mmproj.fp16.gguf (909.78 MB)

Total Size: 3.75 GB

Minimum VRAM4.44 GB
+

image-captioning

+

llava-v1-5-7b

+
NameLLaVA V1.5 7B Image Captioning
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-7b.fp16.gguf (13.48 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 14.10 GB

Minimum VRAM15.80 GB
+

llava-v1-5-7b-q8

+
NameLLaVA V1.5 7B (Q8-0) Image Captioning
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-7b-q8-0.gguf (7.16 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 7.79 GB

Minimum VRAM9.90 GB
+

llava-v1-5-7b-q6-k

+
NameLLaVA V1.5 7B (Q6-K) Image Captioning
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-7b-q6-k.gguf (5.53 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 6.15 GB

Minimum VRAM8.40 GB
+

llava-v1-5-7b-q5-k-m

+
NameLLaVA V1.5 7B (Q5-K-M) Image Captioning
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-7b-q5-k-m.gguf (4.78 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 5.41 GB

Minimum VRAM7.71 GB
+

llava-v1-5-7b-q4-k-m

+
NameLLaVA V1.5 7B (Q4-K-M) Image Captioning
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-7b-q4-k-m.gguf (4.08 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 4.71 GB

Minimum VRAM7.04 GB
+

llava-v1-5-7b-q3-k-m

+
NameLLaVA V1.5 7B (Q3-K-M) Image Captioning
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-7b-q3-k-m.gguf (3.30 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

Total Size: 3.92 GB

Minimum VRAM6.33 GB
+

llava-v1-5-13b

+
NameLLaVA V1.5 13B (Q8-0) Image Captioning
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-13b-q8-0.gguf (13.83 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB)

Total Size: 14.48 GB

Minimum VRAM17.51 GB
+

llava-v1-5-13b-q6-k

+
NameLLaVA V1.5 13B (Q6-K) Image Captioning
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-13b-q6-k.gguf (10.68 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB)

Total Size: 11.32 GB

Minimum VRAM14.54 GB
+

llava-v1-5-13b-q5-k-m

+
NameLLaVA V1.5 13B (Q5-K-M) Image Captioning
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-13b-q5-k-m.gguf (9.23 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB)

Total Size: 9.88 GB

Minimum VRAM13.17 GB
+

llava-v1-5-13b-q4-0

+
NameLLaVA V1.5 13B (Q4-0) Image Captioning
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-5-13b-q4-0.gguf (7.37 GB)
  2. image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB)

Total Size: 8.01 GB

Minimum VRAM11.48 GB
+

llava-v1-6-34b-q5-k-m

+
NameLLaVA V1.6 34B (Q5-K-M) Image Captioning
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-6-34b-q5-k-m.gguf (24.32 GB)
  2. image-encoding-clip-llava-mmproj-v1-6-34b.fp16.gguf (699.99 MB)

Total Size: 25.02 GB

Minimum VRAM24.96 GB
+

llava-v1-6-34b-q4-k-m

+
NameLLaVA V1.6 34B (Q4-K-M) Image Captioning
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-6-34b-q4-k-m.gguf (20.66 GB)
  2. image-encoding-clip-llava-mmproj-v1-6-34b.fp16.gguf (699.99 MB)

Total Size: 21.36 GB

Minimum VRAM21.88 GB
+

llava-v1-6-34b-q3-k-m

+
NameLLaVA V1.6 34B (Q3-K-M) Image Captioning
AuthorHaotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
https://arxiv.org/abs/2310.03744
LicenseMeta Llama 2 Community License (https://www.llama.com/llama2/license/)
Files
  1. visual-question-answering-llava-v1-6-34b-q3-k-m.gguf (16.65 GB)
  2. image-encoding-clip-llava-mmproj-v1-6-34b.fp16.gguf (699.99 MB)

Total Size: 17.35 GB

Minimum VRAM18.06 GB
+

moondream-v2 (default)

+
NameMoondream V2 Image Captioning
AuthorVikhyat Korrapati
Published in Hugging Face, vol. 10.57967/hf/3219, “Moondream2”, 2024
https://huggingface.co/vikhyatk/moondream2
LicenseApache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files
  1. visual-question-answering-moondream-v2.fp16.gguf (2.84 GB)
  2. image-encoding-clip-moondream-v2-mmproj.fp16.gguf (909.78 MB)

Total Size: 3.75 GB

Minimum VRAM4.44 GB
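
The Minimum VRAM figures listed for each variant can help decide which model to request on a given GPU. The following is a minimal sketch and not part of Taproot's API: the `MIN_VRAM_GB` table is copied by hand from the image-captioning entries above, and `variants_that_fit` is a hypothetical helper name chosen for illustration.

```python
# Minimum VRAM (GB) per image-captioning variant, copied by hand from the
# catalog entries above. Illustrative only; Taproot does not expose this table.
MIN_VRAM_GB = {
    "llava-v1-5-7b": 15.80,
    "llava-v1-5-7b-q8": 9.90,
    "llava-v1-5-7b-q6-k": 8.40,
    "llava-v1-5-7b-q5-k-m": 7.71,
    "llava-v1-5-7b-q4-k-m": 7.04,
    "llava-v1-5-7b-q3-k-m": 6.33,
    "llava-v1-5-13b": 17.51,
    "llava-v1-5-13b-q6-k": 14.54,
    "llava-v1-5-13b-q5-k-m": 13.17,
    "llava-v1-5-13b-q4-0": 11.48,
    "llava-v1-6-34b-q5-k-m": 24.96,
    "llava-v1-6-34b-q4-k-m": 21.88,
    "llava-v1-6-34b-q3-k-m": 18.06,
    "moondream-v2": 4.44,
}


def variants_that_fit(available_vram_gb: float) -> list[str]:
    """Return the variants whose minimum VRAM is within the budget.

    Results are ordered from the largest requirement to the smallest.
    (Hypothetical helper; the ordering is a heuristic, not a quality ranking.)
    """
    fitting = [
        (required, name)
        for name, required in MIN_VRAM_GB.items()
        if required <= available_vram_gb
    ]
    return [name for required, name in sorted(fitting, reverse=True)]


if __name__ == "__main__":
    # List the variants that fit on a card with 8 GB of VRAM.
    for name in variants_that_fit(8.0):
        print(name, MIN_VRAM_GB[name])
```

Ordering by VRAM requirement is only a rough proxy for output quality; a smaller quantization of a larger model may still be preferable to a larger quantization of a smaller one, so treat the first result as a starting point rather than a recommendation.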