An open source real-time AI inference engine for seamless scaling
About
Taproot is a seamlessly scalable AI/ML inference engine designed for deployment across hardware clusters with disparate capabilities.
Why Taproot?
Most AI/ML inference engines are built for either large-scale cloud infrastructure or constrained edge devices. Taproot targets medium-scale deployments, offering flexible, distributed on-premise or pay-as-you-go (PAYG) setups. It makes efficient use of older or consumer-grade hardware, which makes it suitable for small networks or ad-hoc clusters that do not rely on centralized, hyperscale architectures.
Available Models
There are more than 150 models available across 18 task categories. See the Task Catalog for the complete list, licenses, requirements and citations. Many more models have yet to be added; if you're looking for a particular enhancement, don't hesitate to open an issue on this repository to request it.
Roadmap
- IP Adapter Models for Diffusers Image Generation Pipelines
- ControlNet Models for Diffusers Image Generation Pipelines
- Additional quantization backends for large models
  - Currently BitsandBytes (Int8/NF4) and GGUF (through llama.cpp) are supported with pre-quantized checkpoints available.
  - FP8 support through Optimum-Quanto, TorchAO and custom kernels is in development.
- Improved multi-GPU support
  - This is currently supported through manual configuration, but usability can be improved.
- Additional annotators/detectors for image and video
  - E.g. Marigold, SAM2
- Additional audio generation models
  - E.g. Stable Audio, AudioLDM, MusicGen
Installation
pip install taproot
Some additional packages can be installed with the square-bracket syntax (e.g. pip install taproot[a,b,c]). These are:
- tools - Additional packages for LLM tools like DuckDuckGo Search, BeautifulSoup (for web scraping), etc.
- console - Additional packages for prettifying console output.
- av - Additional packages for reading and writing video.
Installing Tasks
Some tasks are available immediately, but most tasks require additional packages and files. Install these tasks with taproot install [task:model]+, e.g.:
taproot install image-generation:stable-diffusion-xl
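The task:model identifiers accepted by taproot install can also be assembled programmatically. The following is a minimal sketch that assumes only the task:model syntax shown above; the helper names task_key and install_command are hypothetical and are not part of Taproot.

```python
import shlex
from typing import Iterable, List, Tuple


def task_key(task: str, model: str) -> str:
    """Join a task and a model name into the `task:model` form."""
    if ":" in task or ":" in model:
        raise ValueError("task and model names must not contain ':'")
    return f"{task}:{model}"


def install_command(pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Build an argument list for `taproot install` from (task, model) pairs."""
    keys = [task_key(task, model) for task, model in pairs]
    if not keys:
        raise ValueError("at least one task:model pair is required")
    return ["taproot", "install", *keys]


if __name__ == "__main__":
    command = install_command([("image-generation", "stable-diffusion-xl")])
    # Print a shell-safe form of the command for copy and paste.
    print(shlex.join(command))
```

The returned list can be passed to subprocess.run directly, which avoids shell quoting issues.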
Usage
Command-Line
Introspecting Tasks
From the command line, execute taproot tasks to see all tasks and their availability status, or taproot info for individual task information. For example:
taproot info image-generation stable-diffusion-xl
Stable Diffusion XL Image Generation (image-generation:stable-diffusion-xl, available)
Generate an image from text and/or images using a stable diffusion XL model.
Hardware Requirements:
GPU Required for Optimal Performance
Floating Point Precision: half
Minimum Memory (CPU RAM) Required: 231.71 MB
Minimum Memory (GPU VRAM) Required: 7.58 GB
Author:
Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
https://arxiv.org/abs/2307.01952
License:
OpenRAIL++-M License (https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
✅ Attribution Required
✅ Derivatives Allowed
✅ Redistribution Allowed
✅ Copyleft (Share-Alike) Required
✅ Commercial Use Allowed
✅ Hosting Allowed
Files:
image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) [downloaded]
image-generation-stable-diffusion-xl-base-unet.fp16.safetensors (5.14 GB) [downloaded]
text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) [downloaded]
text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB) [downloaded]
text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) [downloaded]
text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) [downloaded]
text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) [downloaded]
text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) [downloaded]
text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) [downloaded]
text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) [downloaded]
Total File Size: 7.11 GB
Required packages:
pil~=9.5 [installed]
torch<2.5,>=2.4 [installed]
numpy~=1.22 [installed]
diffusers>=0.29 [installed]
torchvision<0.20,>=0.19 [installed]
transformers>=4.41 [installed]
safetensors~=0.4 [installed]
accelerate~=1.0 [installed]
sentencepiece~=0.2 [installed]
compel~=2.0 [installed]
peft~=0.13 [installed]
Signature:
prompt: Union[str, List[str]], required
prompt_2: Union[str, List[str]], default: None
negative_prompt: Union[str, List[str]], default: None
negative_prompt_2: Union[str, List[str]], default: None
image: ImageType, default: None
mask_image: ImageType, default: None
guidance_scale: float, default: 5.0
guidance_rescale: float, default: 0.0
num_inference_steps: int, default: 20
num_images_per_prompt: int, default: 1
height: int, default: None
width: int, default: None
timesteps: List[int], default: None
sigmas: List[float], default: None
denoising_end: float, default: None
strength: float, default: None
latents: torch.Tensor, default: None
prompt_embeds: torch.Tensor, default: None
negative_prompt_embeds: torch.Tensor, default: None
pooled_prompt_embeds: torch.Tensor, default: None
negative_pooled_prompt_embeds: torch.Tensor, default: None
clip_skip: int, default: None
seed: SeedType, default: None
pag_scale: float, default: None
pag_adaptive_scale: float, default: None
scheduler: Literal[ddim, ddpm, ddpm_wuerstchen, deis_multistep, dpm_cogvideox, dpmsolver_multistep, dpmsolver_multistep_karras, dpmsolver_sde, dpmsolver_sde_multistep, dpmsolver_sde_multistep_karras, dpmsolver_singlestep, dpmsolver_singlestep_karras, edm_dpmsolver_multistep, edm_euler, euler_ancestral_discrete, euler_discrete, euler_discrete_karras, flow_match_euler_discrete, flow_match_heun_discrete, heun_discrete, ipndm, k_dpm_2_ancestral_discrete, k_dpm_2_ancestral_discrete_karras, k_dpm_2_discrete, k_dpm_2_discrete_karras, lcm, lms_discrete, lms_discrete_karras, pndm, tcd, unipc], default: None
output_format: Literal[png, jpeg, float, int, latent], default: png
output_upload: bool, default: False
highres_fix_factor: float, default: 1.0
highres_fix_strength: float, default: None
spatial_prompts: SpatialPromptInputType, default: None
Returns:
ImageResultType
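The memory and file-size figures in this output can be compared against available hardware before installing a task. The sketch below parses the size strings and sums the listed files; parse_size and fits_in_vram are hypothetical helpers, and the use of binary multiples (1 KB = 1024 B) is an assumption, since the output does not say which convention the tool uses. Because the listed values are rounded, the sum only approximates the reported total.

```python
import re

# Assumption: binary multiples. The tool may use decimal units instead.
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def parse_size(text: str) -> float:
    """Convert a string such as '7.58 GB' into a number of bytes."""
    match = re.fullmatch(r"\s*([\d.]+)\s*(B|KB|MB|GB)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    value, unit = match.groups()
    return float(value) * UNITS[unit]


def fits_in_vram(required: str, available: str) -> bool:
    """Return True when the required memory does not exceed what is available."""
    return parse_size(required) <= parse_size(available)


# File sizes copied from the listing above.
FILE_SIZES = [
    "334.64 MB", "5.14 GB", "246.14 MB", "1.39 GB", "1.06 MB",
    "588.00 B", "524.62 KB", "1.06 MB", "576.00 B", "524.62 KB",
]

if __name__ == "__main__":
    total_gb = sum(parse_size(size) for size in FILE_SIZES) / UNITS["GB"]
    print(f"Sum of listed files: {total_gb:.2f} GB")
    print("Fits on an 8 GB GPU:", fits_in_vram("7.58 GB", "8 GB"))
```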
Invoking Tasks
Use taproot invoke to run any task from the command line. All parameters to the task can be passed as flags to the call using kebab-case, e.g.:
taproot invoke image-generation:stable-diffusion-xl \
--prompt "a photograph of a golden retriever at the park" \
--negative-prompt "fall, autumn, blurry, out-of-focus" \
--seed 12345
Loading task.
100%|███████████████████████████████████████████████████████████████████████████| 7/7 [00:03<00:00, 2.27it/s]
Task loaded in 4.0 s.
Invoking task.
100%|█████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00, 4.34it/s]
Task invoked in 6.5 s. Result:
8940aa12-66a7-4233-bfd6-f19da339b71b.png
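The mapping from signature parameters to kebab-case flags can be sketched as follows. This is an illustration of the naming convention only; to_flags and invoke_command are hypothetical helpers, and how the CLI expects boolean or list values is not covered by the source, so those cases are left out.

```python
from typing import Any, Dict, List


def to_flags(params: Dict[str, Any]) -> List[str]:
    """Turn snake_case parameters into kebab-case command-line flags."""
    args: List[str] = []
    for name, value in params.items():
        if value is None:
            continue  # Skip parameters left at their default.
        # Boolean and list values are not handled here (format unknown).
        args.extend([f"--{name.replace('_', '-')}", str(value)])
    return args


def invoke_command(task_key: str, **params: Any) -> List[str]:
    """Build an argument list for `taproot invoke`."""
    return ["taproot", "invoke", task_key, *to_flags(params)]


if __name__ == "__main__":
    command = invoke_command(
        "image-generation:stable-diffusion-xl",
        prompt="a photograph of a golden retriever at the park",
        negative_prompt="fall, autumn, blurry, out-of-focus",
        seed=12345,
    )
    print(command)
```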
Python
Direct Task Usage
from taproot import Task
sdxl = Task.get("image-generation", "stable-diffusion-xl")
pipeline = sdxl()
pipeline.load()
pipeline(prompt="Hello, world!").save("./output.png")
With a Remote Server
from taproot import Tap
tap = Tap()
tap.remote_address = "ws://127.0.0.1:32189"
result = tap.call("image-generation", model="stable-diffusion-xl", prompt="Hello, world!")
result.save("./output.png")
With a Local Server
Also shows asynchronous usage.
import asyncio
from taproot import Tap
with Tap.local() as tap:
    loop = asyncio.get_event_loop()
    result = loop.run_until_complete(tap("image-generation", model="stable-diffusion-xl", prompt="Hello, world!"))
    result.save("./output.png")
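Because the client call is awaitable, several requests can be issued concurrently. The sketch below shows the general pattern with asyncio.gather; fake_tap is a hypothetical stand-in coroutine so that the example runs without Taproot installed, and whether a given server actually processes such requests in parallel is not stated here.

```python
import asyncio
from typing import Any, Dict, List


async def fake_tap(task: str, **params: Any) -> Dict[str, Any]:
    """Hypothetical stand-in for an awaitable client call."""
    await asyncio.sleep(0.01)  # Simulate network and inference latency.
    return {"task": task, **params}


async def run_batch(prompts: List[str]) -> List[Dict[str, Any]]:
    """Issue one request per prompt and wait for all of them."""
    calls = [
        fake_tap("image-generation", model="stable-diffusion-xl", prompt=prompt)
        for prompt in prompts
    ]
    # gather() runs the calls concurrently and preserves input order.
    return list(await asyncio.gather(*calls))


if __name__ == "__main__":
    results = asyncio.run(run_batch(["Hello, world!", "Goodbye, world!"]))
    for result in results:
        print(result["prompt"])
```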
Running Servers
Taproot uses a three-roled cluster structure:
- Overseers are entry points into clusters, routing requests to one or more dispatchers.
- Dispatchers are machines capable of running tasks by spawning executors.
- Executors are servers ready to execute a task.
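The relationship between the three roles can be pictured with a toy in-process model. This is a conceptual sketch only and not Taproot's implementation: the class and method names are hypothetical, real roles communicate over the network, and the round-robin routing used by the overseer here is an assumption chosen for simplicity.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import Any, Callable, Dict, Iterator, List


@dataclass
class Executor:
    """A server ready to execute one task."""
    task: str
    handler: Callable[..., Any]

    def execute(self, **params: Any) -> Any:
        return self.handler(**params)


@dataclass
class Dispatcher:
    """A machine that spawns executors for the tasks it supports."""
    handlers: Dict[str, Callable[..., Any]]
    executors: Dict[str, Executor] = field(default_factory=dict)

    def dispatch(self, task: str, **params: Any) -> Any:
        if task not in self.executors:  # Spawn an executor on first use.
            self.executors[task] = Executor(task, self.handlers[task])
        return self.executors[task].execute(**params)


class Overseer:
    """An entry point that routes requests to dispatchers in turn."""

    def __init__(self, dispatchers: List[Dispatcher]) -> None:
        self._next: Iterator[Dispatcher] = cycle(dispatchers)

    def call(self, task: str, **params: Any) -> Any:
        return next(self._next).dispatch(task, **params)


if __name__ == "__main__":
    handlers = {"echo": lambda message: message}
    overseer = Overseer([Dispatcher(handlers), Dispatcher(handlers)])
    print(overseer.call("echo", message="Hello, world!"))
```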
The simplest way to run a server is to run an overseer simultaneously with a local dispatcher like so:
taproot overseer --local
This will run on the default address of ws://127.0.0.1:32189, suitable for interaction from Python or the browser.
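A client script may want to check a server address before connecting. The sketch below splits the default address into its parts using only the standard library; split_address is a hypothetical helper, and accepting wss:// alongside ws:// is an assumption rather than something documented above.

```python
from typing import Tuple
from urllib.parse import urlparse

DEFAULT_ADDRESS = "ws://127.0.0.1:32189"


def split_address(address: str) -> Tuple[str, str, int]:
    """Split a WebSocket address into (scheme, host, port)."""
    parsed = urlparse(address)
    # Assumption: only WebSocket schemes are accepted in this sketch.
    if parsed.scheme not in ("ws", "wss"):
        raise ValueError(f"expected a ws:// or wss:// address, got {address!r}")
    if parsed.hostname is None or parsed.port is None:
        raise ValueError(f"address must include a host and port: {address!r}")
    return parsed.scheme, parsed.hostname, parsed.port


if __name__ == "__main__":
    scheme, host, port = split_address(DEFAULT_ADDRESS)
    print(scheme, host, port)
```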
There are many deployment possibilities across networks, with configuration available for encryption, listening addresses, and more. See the wiki for details (coming soon).
Outside Python
- taproot.js - for the browser and Node.js, available in ESM, UMD and IIFE
- taproot.php - coming soon
Task Catalog
18 tasks available with 171 models.
- echo: 1 model
- image-similarity: 2 models
- text-similarity: 1 model
- speech-enhancement: 1 model
- image-interpolation: 2 models
- background-removal: 1 model
- super-resolution: 2 models
- speech-synthesis: 2 models
- audio-transcription: 9 models
- depth-detection: 1 model
- line-detection: 4 models
- edge-detection: 3 models
- pose-detection: 2 models
- image-generation: 52 models
- video-generation: 23 models
- text-generation: 37 models
- visual-question-answering: 14 models
- image-captioning: 14 models
echo
Name | Echo |
Author | Benjamin Paine Taproot https://github.com/painebenjamin/taproot |
License | Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0) |
Files | N/A |
Minimum VRAM | N/A |
image-similarity
(default)
Name | Traditional Image Similarity |
Author | Benjamin Paine Taproot https://github.com/painebenjamin/taproot |
License | Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0) |
Files | N/A |
Minimum VRAM | N/A |
inception-v3
Name | Inception Image Similarity (FID) |
Author | Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens and Zbigniew Wojna Google Research and University College London Published in CoRR, vol. 1512.00567, “Rethinking the Inception Architecture for Computer Vision”, 2015 https://arxiv.org/abs/1512.00567 |
License | Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0) |
Files | image-similarity-inception.fp16.safetensors |
Minimum VRAM | 50.28 MB |
text-similarity
Name | Traditional Text Similarity |
Author | Benjamin Paine Taproot https://github.com/painebenjamin/taproot |
License | Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0) |
Files | N/A |
Minimum VRAM | N/A |
speech-enhancement
deep-filter-net-v3 (default)
Name | DeepFilterNet V3 Speech Enhancement |
Author | Hendrik Schröter, Tobias Rosenkranz, Alberto N. Escalante-B and Andreas Maier Published in INTERSPEECH, “DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement”, 2023 https://arxiv.org/abs/2305.08227 |
License | Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0) |
Files | speech-enhancement-deep-filter-net-3.safetensors |
Minimum VRAM | 87.89 MB |
image-interpolation
film (default)
Name | Frame Interpolation for Large Motion (FiLM) Image Interpolation |
Author | Fitsum Reda, Janne Kontkanen, Eric Tabellion, Deqing Sun, Caroline Pantofaru and Brian Curless Google Research and University of Washington Published in ECCV, “FiLM: Frame Interpolation for Large Motion”, 2022 https://arxiv.org/abs/2202.04901 |
License | Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0) |
Files | image-interpolation-film-net.fp16.pt |
Minimum VRAM | 70.00 MB |
rife
Name | Real-Time Intermediate Flow Estimation (RIFE) Image Interpolation |
Author | Zhewei Huang, Tianyuan Zhang, Wen Heng, Boxin Shi and Shuchang Zhou Megvii Research, NERCVT, School of Computer Science, Peking University, Institute for Artificial Intelligence, Peking University and Beijing Academy of Artificial Intelligence Published in ECCV, “Real-Time Intermediate Flow Estimation for Video Frame Interpolation”, 2022 https://arxiv.org/abs/2011.06294 |
License | MIT License (https://opensource.org/licenses/MIT) |
Files | image-interpolation-rife-flownet.safetensors |
Minimum VRAM | 22.68 MB |
background-removal
backgroundremover (default)
Name | BackgroundRemover |
Author | Johnathan Nader, Lucas Nestler, Dr. Tim Scarfe and Daniel Gatis https://github.com/nadermx/backgroundremover |
License | Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0) |
Files | background-removal-u2net.safetensors |
Minimum VRAM | 217.62 MB |
super-resolution
aura
Name | Aura Super Resolution |
Author | fal.ai Published in fal.ai blog, “Introducing AuraSR - An open reproduction of the GigaGAN Upscaler”, 2024 https://blog.fal.ai/introducing-aurasr-an-open-reproduction-of-the-gigagan-upscaler-2/ |
License | CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/) |
Files | super-resolution-aura.fp16.safetensors |
Minimum VRAM | 1.24 GB |
aura-v2 (default)
Name | Aura Super Resolution V2 |
Author | fal.ai Published in fal.ai blog, “AuraSR V2”, 2024 https://blog.fal.ai/aurasr-v2/ |
License | CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/) |
Files | super-resolution-aura-v2.fp16.safetensors |
Minimum VRAM | 1.24 GB |
speech-synthesis
xtts-v2 (default)
Name | XTTS2 Speech Synthesis |
Author | Coqui AI Published in Coqui AI Blog, “XTTS: Open Model Release Announcement”, 2023 https://coqui.ai/blog/tts/open_xtts |
License | Mozilla Public License 2.0 (https://www.mozilla.org/en-US/MPL/2.0/) |
Files | Total Size: 1.88 GB |
Minimum VRAM | 1.91 GB |
f5tts
Name | F5TTS Speech Synthesis |
Author | Yushen Chen, Zhikang Niu, Ziyang Ma, Keqi Deng, Chunhui Wang, Jian Zhao, Kai Yu and Xie Chen Published in arXiv, vol. 2410.06885, “F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching”, 2024 https://arxiv.org/abs/2410.06885 |
License | CC BY-NC 4.0 (https://creativecommons.org/licenses/by-nc/4.0/) |
Files | Total Size: 1.40 GB |
Minimum VRAM | 3.94 GB |
audio-transcription
whisper-tiny
Name | Whisper Tiny Audio Transcription |
Author | Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever OpenAI Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision” https://arxiv.org/abs/2212.04356 |
License | Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0) |
Files | Total Size: 154.92 MB |
Minimum VRAM | 147.85 MB |
whisper-base
Name | Whisper Base Audio Transcription |
Author | Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever OpenAI Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision” https://arxiv.org/abs/2212.04356 |
License | Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0) |
Files | Total Size: 294.27 MB |
Minimum VRAM | 285.74 MB |
whisper-small
Name | Whisper Small Audio Transcription |
Author | Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever OpenAI Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision” https://arxiv.org/abs/2212.04356 |
License | Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0) |
Files | Total Size: 970.86 MB |
Minimum VRAM | 945.03 MB |
whisper-medium
Name | Whisper Medium Audio Transcription |
Author | Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever OpenAI Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision” https://arxiv.org/abs/2212.04356 |
License | Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0) |
Files | Total Size: 3.06 GB |
Minimum VRAM | 3.06 GB |
whisper-large-v3
Name | Whisper Large V3 Audio Transcription |
Author | Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever OpenAI Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision” https://arxiv.org/abs/2212.04356 |
License | Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0) |
Files | Total Size: 3.09 GB |
Minimum VRAM | 3.09 GB |
distilled-whisper-small-english
Name | Distilled Whisper Small (English) Audio Transcription |
Author | Sanchit Gandhi, Patrick von Platen and Alexander M. Rush Hugging Face Published in arXiv, vol. 2311.00430, “Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling”, 2023 https://arxiv.org/abs/2311.00430 |
License | Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0) |
Files | Total Size: 336.21 MB |
Minimum VRAM | 649.01 MB |
distilled-whisper-medium-english
Name | Distilled Whisper Medium (English) Audio Transcription |
Author | Sanchit Gandhi, Patrick von Platen and Alexander M. Rush Hugging Face Published in arXiv, vol. 2311.00430, “Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling”, 2023 https://arxiv.org/abs/2311.00430 |
License | Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0) |
Files | Total Size: 792.71 MB |
Minimum VRAM | 1.58 GB |
distilled-whisper-large-v3 (default)
Name | Distilled Whisper Large V3 Audio Transcription |
Author | Sanchit Gandhi, Patrick von Platen and Alexander M. Rush Hugging Face Published in arXiv, vol. 2311.00430, “Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling”, 2023 https://arxiv.org/abs/2311.00430 |
License | Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0) |
Files | Total Size: 1.52 GB |
Minimum VRAM | 1.51 GB |
turbo-whisper-large-v3
Name | Turbo Whisper Large V3 Audio Transcription |
Author | Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever OpenAI Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision” https://arxiv.org/abs/2212.04356 |
License | Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0) |
Files | Total Size: 1.62 GB |
Minimum VRAM | 1.62 GB |
depth-detection
midas (default)
Name | MiDaS Depth Detection |
Author | René Ranftl, Alexey Bochkovskiy and Vladlen Koltun Published in arXiv, vol. 2103.13413, “Vision Transformers for Dense Prediction”, 2021 https://arxiv.org/abs/2103.13413 |
License | MIT License (https://opensource.org/licenses/MIT) |
Files | depth-detection-midas.fp16.safetensors |
Minimum VRAM | 255.65 MB |
line-detection
informative-drawings (default)
Name | Informative Drawings Line Art Detection |
Author | Caroline Chan, Fredo Durand and Phillip Isola Massachusetts Institute of Technology Published in arXiv, vol. 2203.12691, “Informative Drawings: Learning to Generate Line Drawings that Convey Geometry and Semantics”, 2022 https://arxiv.org/abs/2203.12691 |
License | MIT License (https://opensource.org/licenses/MIT) |
Files | line-detection-informative-drawings.fp16.safetensors |
Minimum VRAM | 8.58 MB |
informative-drawings-coarse
Name | Informative Drawings Coarse Line Art Detection |
Author | Caroline Chan, Fredo Durand and Phillip Isola Massachusetts Institute of Technology Published in arXiv, vol. 2203.12691, “Informative Drawings: Learning to Generate Line Drawings that Convey Geometry and Semantics”, 2022 https://arxiv.org/abs/2203.12691 |
License | MIT License (https://opensource.org/licenses/MIT) |
Files | line-detection-informative-drawings-coarse.fp16.safetensors |
Minimum VRAM | 8.58 MB |
informative-drawings-anime
Name | Informative Drawings Anime Line Art Detection |
Author | Caroline Chan, Fredo Durand and Phillip Isola Massachusetts Institute of Technology Published in arXiv, vol. 2203.12691, “Informative Drawings: Learning to Generate Line Drawings that Convey Geometry and Semantics”, 2022 https://arxiv.org/abs/2203.12691 |
License | MIT License (https://opensource.org/licenses/MIT) |
Files | line-detection-informative-drawings-anime.fp16.safetensors |
Minimum VRAM | 108.81 MB |
mlsd
Name | Mobile Line Segment Detection |
Author | Geonmo Gu, Byungsoo Ko, SeongHyun Go, Sung-Hyun Lee, Jingeun Lee and Minchul Shin NAVER/LINE Vision Published in arXiv, vol. 2106.00186, “Towards Light-weight and Real-time Line Segment Detection”, 2022 https://arxiv.org/abs/2106.00186 |
License | Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0) |
Files | line-detection-mlsd.fp16.safetensors |
Minimum VRAM | 3.22 MB |
edge-detection
canny (default)
Name | Canny Edge Detection |
Author | John Canny Published in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, pp. 679-698, “A Computational Approach to Edge Detection”, 1986 https://ieeexplore.ieee.org/document/4767851 Implementation by OpenCV (https://opencv.org/) |
License | Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0) |
Files | N/A |
Minimum VRAM | N/A |
hed
Name | Holistically-Nested Edge Detection |
Author | Saining Xie and Zhuowen Tu University of California, San Diego Published in arXiv, vol. 1504.06375, “Holistically-Nested Edge Detection”, 2015 https://arxiv.org/abs/1504.06375 |
License | Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0) |
Files | edge-detection-hed.fp16.safetensors |
Minimum VRAM | 29.44 MB |
pidi
Name | Soft Edge (PIDI) Detection |
Author | Zhuo Su, Wenzhe Liu, Zitong Yu, Dewen Hu, Qing Liao, Qi Tian, Matti Pietikäinen and Li Liu Published in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5117-5127, “Pixel Difference Networks for Efficient Edge Detection”, 2021 |
License | MIT License with Non-Commercial Clause (https://github.com/hellozhuo/pidinet/blob/master/LICENSE) |
Files | edge-detection-pidi.fp16.safetensors |
Minimum VRAM | 1.40 MB |
pose-detection
openpose
Name | OpenPose Pose Detection |
Author | Zhe Cao, Gines Hidalgo, Tomas Simon, Shih-En Wei and Yaser Sheikh Published in arXiv, vol. 1812.08008, “OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields”, 2018 https://arxiv.org/abs/1812.08008 |
License | OpenPose Academic or Non-Profit Non-Commercial Research License (https://github.com/CMU-Perceptual-Computing-Lab/openpose/blob/master/LICENSE) |
Files | pose-detection-openpose.fp16.safetensors |
Minimum VRAM | 259.96 MB |
dwpose (default)
Name | DWPose Pose Detection |
Author | Zhendong Yang, Ailing Zeng, Chun Yuan and Yu Li Tsinghua Shenzhen International Graduate School and International Digital Economy Academy (IDEA) Published in arXiv, vol. 2307.15880, “Effective Whole-body Pose Estimation with Two-stages Distillation”, 2023 https://arxiv.org/abs/2307.15880 |
License | Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0) |
Files | Total Size: 351.85 MB |
Minimum VRAM | 354.64 MB |
image-generation
stable-diffusion-v1-5
Name | Stable Diffusion v1.5 Image Generation |
Author | Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 |
License | OpenRAIL-M License (https://bigscience.huggingface.co/blog/bigscience-openrail-m) |
Files | Total Size: 2.13 GB |
Minimum VRAM | 2.58 GB |
stable-diffusion-v1-5-abyssorange-mix-v3
Name | AbyssOrange Mix V3 Image Generation |
Author | Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by liudinglin (https://civitai.com/user/liudinglin) |
License | OpenRAIL-M License with Restrictions (https://civitai.com/models/license/17233) |
Files | Total Size: 2.13 GB |
Minimum VRAM | 2.58 GB |
stable-diffusion-v1-5-chillout-mix-ni
Name | Chillout Mix Ni Image Generation |
Author | Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by Dreamlike Art (https://dreamlike.art) |
License | OpenRAIL-M License with Restrictions (https://huggingface.co/dreamlike-art/dreamlike-diffusion-1.0/blob/main/LICENSE.md) |
Files | Total Size: 2.13 GB |
Minimum VRAM | 2.58 GB |
stable-diffusion-v1-5-clarity-v3
Name | Clarity V3 Image Generation |
Author | Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by ndimensional (https://civitai.com/user/ndimensional) |
License | OpenRAIL-M License with Restrictions (https://civitai.com/models/license/142125) |
Files | Total Size: 2.13 GB |
Minimum VRAM | 2.58 GB |
stable-diffusion-v1-5-dark-sushi-mix-v2-25d
Name | Dark Sushi Mix V2 2.5D Image Generation |
Author | Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by Aitasai (https://civitai.com/user/Aitasai) |
License | OpenRAIL-M License with Restrictions (https://civitai.com/models/license/93208) |
Files | Total Size: 2.13 GB |
Minimum VRAM | 2.58 GB |
stable-diffusion-v1-5-divine-elegance-mix-v10
Name | Divine Elegance Mix V10 Image Generation |
Author | Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by TroubleDarkness (https://civitai.com/user/TroubleDarkness) |
License | OpenRAIL-M License with Restrictions (https://civitai.com/models/license/432048) |
Files | Total Size: 2.13 GB |
Minimum VRAM | 2.58 GB |
stable-diffusion-v1-5-dreamshaper-v8
Name | DreamShaper V8 Image Generation |
Author | Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by Lykon (https://civitai.com/user/Lykon) |
License | OpenRAIL-M License with Restrictions (https://civitai.com/models/license/128713) |
Files | Total Size: 2.13 GB |
Minimum VRAM | 2.58 GB |
stable-diffusion-v1-5-epicrealism-v5
Name | epiCRealism V5 Image Generation |
Author | Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by epinikion (https://civitai.com/user/epinikion) |
License | OpenRAIL-M License with Restrictions (https://civitai.com/models/license/143906) |
Files | Total Size: 2.13 GB |
Minimum VRAM | 2.58 GB |
stable-diffusion-v1-5-epicphotogasm-ultimate-fidelity
Name | epiCPhotoGasm Ultimate Fidelity Image Generation |
Author | Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by epinikion (https://civitai.com/user/epinikion) |
License | OpenRAIL-M License with Restrictions (https://civitai.com/models/license/429454) |
Files | Total Size: 2.13 GB |
Minimum VRAM | 2.58 GB |
stable-diffusion-v1-5-ghostmix-v2
Name | GhostMix V2 Image Generation |
Author | Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by _GhostInShell_ (https://civitai.com/user/_GhostInShell_) |
License | OpenRAIL-M License with Restrictions (https://civitai.com/models/license/76907) |
Files | Total Size: 2.13 GB |
Minimum VRAM | 2.58 GB |
stable-diffusion-v1-5-lyriel-v1-6
Name | Lyriel V1.6 Image Generation |
Author | Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by Lyriel (https://civitai.com/user/Lyriel) |
License | OpenRAIL-M License (https://civitai.com/models/license/72396) |
Files | Total Size: 2.13 GB |
Minimum VRAM | 2.58 GB |
stable-diffusion-v1-5-majicmix-realistic-v7
Name | MajicMix Realistic V7 Image Generation |
Author | Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by Merjic (https://civitai.com/user/Merjic) |
License | OpenRAIL-M License with Restrictions (https://civitai.com/models/license/176425) |
Files | Total Size: 2.13 GB |
Minimum VRAM | 2.58 GB |
stable-diffusion-v1-5-meinamix-v12
Name | MeinaMix V12 Image Generation |
Author | Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by Meina (https://civitai.com/user/Meina) |
License | OpenRAIL-M License with Restrictions (https://civitai.com/models/license/948574) |
Files | Total Size: 2.13 GB |
Minimum VRAM | 2.58 GB |
stable-diffusion-v1-5-mistoon-anime-v3
Name | Mistoon Anime V3 Image Generation |
Author | Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by Inzaniak (https://civitai.com/user/Inzaniak) |
License | OpenRAIL-M License with Restrictions (https://civitai.com/models/license/348981) |
Files | Total Size: 2.13 GB |
Minimum VRAM | 2.58 GB |
stable-diffusion-v1-5-perfect-world-v6
Name | Perfect World V6 Image Generation |
Author | Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by Bloodsuga (https://civitai.com/user/Bloodsuga) |
License | OpenRAIL-M License with Restrictions (https://civitai.com/models/license/179446) |
Files | Total Size: 2.13 GB |
Minimum VRAM | 2.58 GB |
stable-diffusion-v1-5-photon-v1
Name | Photon V1 Image Generation |
Author | Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by Photographer (https://civitai.com/user/Photographer) |
License | OpenRAIL-M License with Restrictions (https://civitai.com/models/license/900072) |
Files | Total Size: 2.13 GB |
Minimum VRAM | 2.58 GB |
stable-diffusion-v1-5-realcartoon3d-v17
Name | RealCartoon3D V17 Image Generation |
Author | Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by 7whitefire7 (https://civitai.com/user/7whitefire7) |
License | OpenRAIL-M License with Restrictions (https://civitai.com/models/license/637156) |
Files |
Total Size: 2.13 GB |
Minimum VRAM | 2.58 GB |
stable-diffusion-v1-5-realistic-vision-v5-1
Name | Realistic Vision V5.1 Image Generation |
Author | Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by SG_161222 (https://civitai.com/user/SG_161222) |
License | OpenRAIL-M License with Restrictions (https://civitai.com/models/license/130072) |
Files |
Total Size: 2.13 GB |
Minimum VRAM | 2.58 GB |
stable-diffusion-v1-5-realistic-vision-v6-0
Name | Realistic Vision V6.0 Image Generation |
Author | Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by SG_161222 (https://civitai.com/user/SG_161222) |
License | OpenRAIL-M License with Restrictions (https://civitai.com/models/license/245592) |
Files |
Total Size: 2.13 GB |
Minimum VRAM | 2.58 GB |
stable-diffusion-v1-5-rev-animated-v2
Name | ReV Animated V2 Image Generation |
Author | Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by Zovya (https://civitai.com/user/Zovya) |
License | OpenRAIL-M License with Restrictions (https://civitai.com/models/license/425083) |
Files |
Total Size: 2.13 GB |
Minimum VRAM | 2.58 GB |
stable-diffusion-v1-5-toonyou-beta-v6
Name | ToonYou Beta V6 Image Generation |
Author | Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022 https://arxiv.org/abs/2112.10752 Finetuned by Bradcatt (https://civitai.com/user/Bradcatt) |
License | OpenRAIL-M License with Restrictions (https://civitai.com/models/license/125771) |
Files |
Total Size: 2.13 GB |
Minimum VRAM | 2.58 GB |
stable-diffusion-xl
stable-diffusion-xl-albedobase-v3-1
stable-diffusion-xl-anything
stable-diffusion-xl-animagine-v3-1
stable-diffusion-xl-copax-timeless-v13
stable-diffusion-xl-counterfeit-v2-5
stable-diffusion-xl-dreamshaper-alpha-v2
stable-diffusion-xl-helloworld-v7
stable-diffusion-xl-juggernaut-v11 (default)
stable-diffusion-xl-lightning-8-step
stable-diffusion-xl-lightning-4-step
stable-diffusion-xl-lightning-2-step
stable-diffusion-xl-nightvision-v9
stable-diffusion-xl-realvis-v5
stable-diffusion-xl-stoiqo-newreality-pro
stable-diffusion-xl-turbo
stable-diffusion-xl-unstable-diffusers-nihilmania
stable-diffusion-xl-zavychroma-v10
stable-diffusion-v3-medium
stable-diffusion-v3-5-medium
stable-diffusion-v3-5-large
stable-diffusion-v3-5-large-int8
stable-diffusion-v3-5-large-nf4
flux-v1-dev
Name | FluxDev |
Author | Black Forest Labs Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024 https://blackforestlabs.ai/announcing-black-forest-labs/ |
License | FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md) |
Files |
Total Size: 33.74 GB |
Minimum VRAM | 29.50 GB |
flux-v1-dev-int8
Name | FluxDevInt8 |
Author | Black Forest Labs Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024 https://blackforestlabs.ai/announcing-black-forest-labs/ |
License | FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md) |
Files |
Total Size: 18.24 GB |
Minimum VRAM | 21.22 GB |
flux-v1-dev-stoiqo-newreality-alpha-v2-int8
Name | Stoiqo NewReality F1.D Alpha V2 (Int8) Image Generation |
Author | Black Forest Labs Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024 https://blackforestlabs.ai/announcing-black-forest-labs/ |
License | FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md) |
Files |
Total Size: 18.24 GB |
Minimum VRAM | 21.22 GB |
flux-v1-dev-nf4
Name | FluxDevNF4 |
Author | Black Forest Labs Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024 https://blackforestlabs.ai/announcing-black-forest-labs/ |
License | FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md) |
Files |
Total Size: 13.44 GB |
Minimum VRAM | 14.36 GB |
flux-v1-dev-stoiqo-newreality-alpha-v2-nf4
Name | Stoiqo NewReality F1.D Alpha V2 (NF4) Image Generation |
Author | Black Forest Labs Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024 https://blackforestlabs.ai/announcing-black-forest-labs/ |
License | FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md) |
Files |
Total Size: 13.44 GB |
Minimum VRAM | 14.36 GB |
flux-v1-schnell
Name | FluxSchnell |
Author | Black Forest Labs Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024 https://blackforestlabs.ai/announcing-black-forest-labs/ |
License | FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md) |
Files |
Total Size: 33.72 GB |
Minimum VRAM | 29.50 GB |
flux-v1-schnell-int8
Name | FluxSchnellInt8 |
Author | Black Forest Labs Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024 https://blackforestlabs.ai/announcing-black-forest-labs/ |
License | FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md) |
Files |
Total Size: 18.23 GB |
Minimum VRAM | 21.22 GB |
flux-v1-schnell-nf4
Name | FluxSchnellNF4 |
Author | Black Forest Labs Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024 https://blackforestlabs.ai/announcing-black-forest-labs/ |
License | FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md) |
Files |
Total Size: 13.44 GB |
Minimum VRAM | 14.36 GB |
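The Minimum VRAM column can guide the choice between a full-precision checkpoint and its Int8 or NF4 counterparts. The sketch below picks the least-quantized FLUX.1 dev variant that fits a given VRAM budget. The figures are copied from the entries above; the `pick_variant` helper and its preference order are assumptions made for illustration and are not part of Taproot's API.

```python
from typing import Optional

# Minimum VRAM in GB, copied from the catalog entries above.
# Ordered from least to most aggressively quantized.
FLUX_DEV_MIN_VRAM_GB = [
    ("flux-v1-dev", 29.50),
    ("flux-v1-dev-int8", 21.22),
    ("flux-v1-dev-nf4", 14.36),
]


def pick_variant(available_gb: float) -> Optional[str]:
    """Return the first (least quantized) variant whose minimum VRAM fits.

    Hypothetical helper for illustration only; returns None when no
    variant in this family fits the budget.
    """
    for name, min_vram_gb in FLUX_DEV_MIN_VRAM_GB:
        if min_vram_gb <= available_gb:
            return name
    return None
```

With a 24 GB card this selects `flux-v1-dev-int8`; with 16 GB it falls back to `flux-v1-dev-nf4`, and below 14.36 GB nothing in this family fits.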
video-generation
cogvideox-2b
Name | CogVideoX 2B Video Generation |
Author | Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072 |
License | CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE) |
Files |
Total Size: 13.34 GB |
Minimum VRAM | 13.48 GB |
cogvideox-2b-int8
Name | CogVideoX 2B Video Generation (Int8) |
Author | Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072 |
License | CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE) |
Files |
Total Size: 8.04 GB |
Minimum VRAM | 11.48 GB |
cogvideox-5b
Name | CogVideoX 5B Video Generation |
Author | Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072 |
License | CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE) |
Files |
Total Size: 21.10 GB |
Minimum VRAM | 21.48 GB |
cogvideox-5b-int8
Name | CogVideoX 5B Video Generation (Int8) |
Author | Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072 |
License | CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE) |
Files |
Total Size: 11.92 GB |
Minimum VRAM | 17.48 GB |
cogvideox-5b-nf4
Name | CogVideoX 5B Video Generation (NF4) |
Author | Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072 |
License | CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE) |
Files |
Total Size: 9.90 GB |
Minimum VRAM | 12.48 GB |
cogvideox-i2v-5b
Name | CogVideoX 5B Image-to-Video Generation |
Author | Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072 |
License | CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE) |
Files |
Total Size: 21.21 GB |
Minimum VRAM | 21.48 GB |
cogvideox-i2v-5b-int8
Name | CogVideoX 5B Image-to-Video Generation (Int8) |
Author | Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072 |
License | CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE) |
Files |
Total Size: 17.59 GB |
Minimum VRAM | 17.48 GB |
cogvideox-i2v-5b-nf4
Name | CogVideoX 5B Image-to-Video Generation (NF4) |
Author | Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072 |
License | CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE) |
Files |
Total Size: 10.01 GB |
Minimum VRAM | 12.48 GB |
cogvideox-v1-5-5b
Name | CogVideoX V1.5 5B Video Generation |
Author | Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072 |
License | CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE) |
Files |
Total Size: 21.10 GB |
Minimum VRAM | 21.48 GB |
cogvideox-v1-5-5b-int8
Name | CogVideoX V1.5 5B Video Generation (Int8) |
Author | Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072 |
License | CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE) |
Files |
Total Size: 11.92 GB |
Minimum VRAM | 17.48 GB |
cogvideox-v1-5-5b-nf4
Name | CogVideoX V1.5 5B Video Generation (NF4) |
Author | Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072 |
License | CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE) |
Files |
Total Size: 9.90 GB |
Minimum VRAM | 12.48 GB |
cogvideox-v1-5-i2v-5b
Name | CogVideoX V1.5 5B Image-to-Video Generation |
Author | Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072 |
License | CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE) |
Files |
Total Size: 21.10 GB |
Minimum VRAM | 21.48 GB |
cogvideox-v1-5-i2v-5b-int8
Name | CogVideoX V1.5 5B Image-to-Video Generation (Int8) |
Author | Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072 |
License | CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE) |
Files |
Total Size: 11.92 GB |
Minimum VRAM | 17.48 GB |
cogvideox-v1-5-i2v-5b-nf4
Name | CogVideoX V1.5 5B Image-to-Video Generation (NF4) |
Author | Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang Zhipu AI and Tsinghua University Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024 https://arxiv.org/abs/2408.06072 |
License | CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE) |
Files |
Total Size: 9.90 GB |
Minimum VRAM | 12.48 GB |
hunyuan
Name | Hunyuan Video Generation |
Author | Hunyuan Foundation Model Team Tencent Published in arXiv, vol. 2412.03603, “HunyuanVideo: A Systematic Framework for Large Video Generation Models”, 2024 https://arxiv.org/abs/2412.03603 |
License | Tencent Hunyuan Community License (https://github.com/Tencent/HunyuanVideo/blob/main/LICENSE.txt) |
Files |
Total Size: 41.90 GB |
Minimum VRAM | 38.30 GB |
hunyuan-int8
Name | Hunyuan Video Generation |
Author | Hunyuan Foundation Model Team Tencent Published in arXiv, vol. 2412.03603, “HunyuanVideo: A Systematic Framework for Large Video Generation Models”, 2024 https://arxiv.org/abs/2412.03603 |
License | Tencent Hunyuan Community License (https://github.com/Tencent/HunyuanVideo/blob/main/LICENSE.txt) |
Files |
Total Size: 22.13 GB |
Minimum VRAM | 23.30 GB |
hunyuan-nf4
Name | Hunyuan Video Generation |
Author | Hunyuan Foundation Model Team Tencent Published in arXiv, vol. 2412.03603, “HunyuanVideo: A Systematic Framework for Large Video Generation Models”, 2024 https://arxiv.org/abs/2412.03603 |
License | Tencent Hunyuan Community License (https://github.com/Tencent/HunyuanVideo/blob/main/LICENSE.txt) |
Files |
Total Size: 13.45 GB |
Minimum VRAM | 14.78 GB |
ltx (default)
Name | LTX Video Generation |
Author | Lightricks https://github.com/Lightricks/LTX-Video |
License | OpenRAIL-M License (https://bigscience.huggingface.co/blog/bigscience-openrail-m) |
Files |
Total Size: 15.24 GB |
Minimum VRAM | 15.28 GB |
ltx-int8
Name | LTX Video Generation |
Author | Lightricks https://github.com/Lightricks/LTX-Video |
License | OpenRAIL-M License (https://bigscience.huggingface.co/blog/bigscience-openrail-m) |
Files |
Total Size: 9.70 GB |
Minimum VRAM | 9.72 GB |
ltx-nf4
Name | LTX Video Generation |
Author | Lightricks https://github.com/Lightricks/LTX-Video |
License | OpenRAIL-M License (https://bigscience.huggingface.co/blog/bigscience-openrail-m) |
Files |
Total Size: 9.28 GB |
Minimum VRAM | 7.29 GB |
mochi-v1
Name | Mochi Video Generation |
Author | Genmo AI Published in Genmo AI Blog, “Mochi 1: A new SOTA in open-source video generation models”, 2024 https://www.genmo.ai/blog |
License | |
Files |
Total Size: 30.50 GB |
Minimum VRAM | 22.95 GB |
mochi-v1-int8
Name | Mochi Video Generation |
Author | Genmo AI Published in Genmo AI Blog, “Mochi 1: A new SOTA in open-source video generation models”, 2024 https://www.genmo.ai/blog |
License | |
Files |
Total Size: 16.87 GB |
Minimum VRAM | 15.95 GB |
mochi-v1-nf4
Name | Mochi Video Generation |
Author | Genmo AI Published in Genmo AI Blog, “Mochi 1: A new SOTA in open-source video generation models”, 2024 https://www.genmo.ai/blog |
License | |
Files |
Total Size: 12.89 GB |
Minimum VRAM | 12.41 GB |
text-generation
llama-v3-8b
Name | Llama V3.0 8B Text Generation |
Author | Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783 |
License | Meta Llama 3 Community License (https://www.llama.com/llama3/license/) |
Files | text-generation-llama-v3-8b-q8-0.gguf |
Minimum VRAM | 9.64 GB |
llama-v3-8b-q6-k
Name | Llama V3.0 8B Text Generation (Q6-K) |
Author | Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783 |
License | Meta Llama 3 Community License (https://www.llama.com/llama3/license/) |
Files | text-generation-llama-v3-8b-q6-k.gguf |
Minimum VRAM | 8.10 GB |
llama-v3-8b-q5-k-m
Name | Llama V3.0 8B Text Generation (Q5-K-M) |
Author | Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783 |
License | Meta Llama 3 Community License (https://www.llama.com/llama3/license/) |
Files | text-generation-llama-v3-8b-q5-k-m.gguf |
Minimum VRAM | 7.30 GB |
llama-v3-8b-q4-k-m
Name | Llama V3.0 8B Text Generation (Q4-K-M) |
Author | Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783 |
License | Meta Llama 3 Community License (https://www.llama.com/llama3/license/) |
Files | text-generation-llama-v3-8b-q4-k-m.gguf |
Minimum VRAM | 6.56 GB |
llama-v3-8b-q3-k-m
Name | Llama V3.0 8B Text Generation (Q3-K-M) |
Author | Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783 |
License | Meta Llama 3 Community License (https://www.llama.com/llama3/license/) |
Files | text-generation-llama-v3-8b-q3-k-m.gguf |
Minimum VRAM | 5.72 GB |
llama-v3-8b-instruct
Name | Llama V3.0 8B Instruct Text Generation |
Author | Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783 |
License | Meta Llama 3 Community License (https://www.llama.com/llama3/license/) |
Files | text-generation-llama-v3-8b-instruct-q8-0.gguf |
Minimum VRAM | 9.64 GB |
llama-v3-8b-instruct-q6-k
Name | Llama V3.0 8B Instruct Text Generation (Q6-K) |
Author | Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783 |
License | Meta Llama 3 Community License (https://www.llama.com/llama3/license/) |
Files | text-generation-llama-v3-8b-instruct-q6-k.gguf |
Minimum VRAM | 8.10 GB |
llama-v3-8b-instruct-q5-k-m
Name | Llama V3.0 8B Instruct Text Generation (Q5-K-M) |
Author | Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783 |
License | Meta Llama 3 Community License (https://www.llama.com/llama3/license/) |
Files | text-generation-llama-v3-8b-instruct-q5-k-m.gguf |
Minimum VRAM | 7.30 GB |
llama-v3-8b-instruct-q4-k-m
Name | Llama V3.0 8B Instruct Text Generation (Q4-K-M) |
Author | Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783 |
License | Meta Llama 3 Community License (https://www.llama.com/llama3/license/) |
Files | text-generation-llama-v3-8b-instruct-q4-k-m.gguf |
Minimum VRAM | 6.56 GB |
llama-v3-8b-instruct-q3-k-m
Name | Llama V3.0 8B Instruct Text Generation (Q3-K-M) |
Author | Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783 |
License | Meta Llama 3 Community License (https://www.llama.com/llama3/license/) |
Files | text-generation-llama-v3-8b-instruct-q3-k-m.gguf |
Minimum VRAM | 5.72 GB |
llama-v3-1-8b-instruct
Name | Llama V3.1 8B Instruct Text Generation |
Author | Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783 |
License | Meta Llama 3 Community License (https://www.llama.com/llama3/license/) |
Files | text-generation-llama-v3-1-8b-instruct-q8-0.gguf |
Minimum VRAM | 9.64 GB |
llama-v3-1-8b-instruct-q6-k (default)
Name | Llama V3.1 8B Instruct Text Generation (Q6-K) |
Author | Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783 |
License | Meta Llama 3 Community License (https://www.llama.com/llama3/license/) |
Files | text-generation-llama-v3-1-8b-instruct-q6-k.gguf |
Minimum VRAM | 8.10 GB |
llama-v3-1-8b-instruct-q5-k-m
Name | Llama V3.1 8B Instruct Text Generation (Q5-K-M) |
Author | Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783 |
License | Meta Llama 3 Community License (https://www.llama.com/llama3/license/) |
Files | text-generation-llama-v3-1-8b-instruct-q5-k-m.gguf |
Minimum VRAM | 7.30 GB |
llama-v3-1-8b-instruct-q4-k-m
Name | Llama V3.1 8B Instruct Text Generation (Q4-K-M) |
Author | Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783 |
License | Meta Llama 3 Community License (https://www.llama.com/llama3/license/) |
Files | text-generation-llama-v3-1-8b-instruct-q4-k-m.gguf |
Minimum VRAM | 6.56 GB |
llama-v3-1-8b-instruct-q3-k-m
Name | Llama V3.1 8B Instruct Text Generation (Q3-K-M) |
Author | Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783 |
License | Meta Llama 3 Community License (https://www.llama.com/llama3/license/) |
Files | text-generation-llama-v3-1-8b-instruct-q3-k-m.gguf |
Minimum VRAM | 5.72 GB |
llama-v3-2-3b-instruct
Name | Llama V3.2 3B Instruct Text Generation |
Author | Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783 |
License | Meta Llama 3 Community License (https://www.llama.com/llama3/license/) |
Files | text-generation-llama-v3-2-3b-instruct-f16.gguf |
Minimum VRAM | 8.04 GB |
llama-v3-2-3b-instruct-q8-0
Name | Llama V3.2 3B Instruct Text Generation (Q8-0) |
Author | Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783 |
License | Meta Llama 3 Community License (https://www.llama.com/llama3/license/) |
Files | text-generation-llama-v3-2-3b-instruct-q8-0.gguf |
Minimum VRAM | 5.02 GB |
llama-v3-2-3b-instruct-q6-k
Name | Llama V3.2 3B Instruct Text Generation (Q6-K) |
Author | Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783 |
License | Meta Llama 3 Community License (https://www.llama.com/llama3/license/) |
Files | text-generation-llama-v3-2-3b-instruct-q6-k.gguf |
Minimum VRAM | 4.20 GB |
llama-v3-2-3b-instruct-q5-k-m
Name | Llama V3.2 3B Instruct Text Generation (Q5-K-M) |
Author | Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783 |
License | Meta Llama 3 Community License (https://www.llama.com/llama3/license/) |
Files | text-generation-llama-v3-2-3b-instruct-q5-k-m.gguf |
Minimum VRAM | 3.90 GB |
llama-v3-2-3b-instruct-q4-k-m
Name | Llama V3.2 3B Instruct Text Generation (Q4-K-M) |
Author | Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783 |
License | Meta Llama 3 Community License (https://www.llama.com/llama3/license/) |
Files | text-generation-llama-v3-2-3b-instruct-q4-k-m.gguf |
Minimum VRAM | 3.50 GB |
llama-v3-2-3b-instruct-q3-k-l
Name | Llama V3.2 3B Instruct Text Generation (Q3-K-L) |
Author | Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783 |
License | Meta Llama 3 Community License (https://www.llama.com/llama3/license/) |
Files | text-generation-llama-v3-2-3b-instruct-q3-k-l.gguf |
Minimum VRAM | 3.10 GB |
llama-v3-2-1b-instruct
Name | Llama V3.2 1B Instruct Text Generation |
Author | Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783 |
License | Meta Llama 3 Community License (https://www.llama.com/llama3/license/) |
Files | text-generation-llama-v3-2-1b-instruct-f16.gguf |
Minimum VRAM | 3.60 GB |
llama-v3-2-1b-instruct-q8-0
Name | Llama V3.2 1B Instruct Text Generation (Q8-0) |
Author | Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783 |
License | Meta Llama 3 Community License (https://www.llama.com/llama3/license/) |
Files | text-generation-llama-v3-2-1b-instruct-q8-0.gguf |
Minimum VRAM | 2.43 GB |
llama-v3-2-1b-instruct-q6-k
Name | Llama V3.2 1B Instruct Text Generation (Q6-K) |
Author | Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783 |
License | Meta Llama 3 Community License (https://www.llama.com/llama3/license/) |
Files | text-generation-llama-v3-2-1b-instruct-q6-k.gguf |
Minimum VRAM | 2.15 GB |
llama-v3-2-1b-instruct-q5-k-m
Name | Llama V3.2 1B Instruct Text Generation (Q5-K-M) |
Author | Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783 |
License | Meta Llama 3 Community License (https://www.llama.com/llama3/license/) |
Files | text-generation-llama-v3-2-1b-instruct-q5-k-m.gguf |
Minimum VRAM | 2.02 GB |
llama-v3-2-1b-instruct-q4-k-m
Name | Llama V3.2 1B Instruct Text Generation (Q4-K-M) |
Author | Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783 |
License | Meta Llama 3 Community License (https://www.llama.com/llama3/license/) |
Files | text-generation-llama-v3-2-1b-instruct-q4-k-m.gguf |
Minimum VRAM | 1.64 GB |
llama-v3-2-1b-instruct-q3-k-l
Name | Llama V3.2 1B Instruct Text Generation (Q3-K-L) |
Author | Meta AI Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024 https://arxiv.org/abs/2407.21783 |
License | Meta Llama 3 Community License (https://www.llama.com/llama3/license/) |
Files | text-generation-llama-v3-2-1b-instruct-q3-k-l.gguf |
Minimum VRAM | 1.58 GB |
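The GGUF file names listed above appear to follow a `text-generation-<model>-<quantization>.gguf` pattern, with the quantization tag (`f16`, `q8-0`, `q6-k`, `q5-k-m`, `q3-k-l`, and so on) as the final component. The sketch below splits a file name on that assumed convention. `split_gguf_name` is a hypothetical helper, not a Taproot function, and the pattern is inferred only from the names in this list.

```python
import re
from typing import Tuple

# Assumed pattern, inferred from the file names in the catalog above:
#   text-generation-<model>-<quant>.gguf
# where <quant> is "f16" or a tag such as "q8-0", "q6-k", "q5-k-m", "q3-k-l".
_GGUF_NAME = re.compile(
    r"^text-generation-(?P<model>.+)-(?P<quant>f16|q\d-(?:0|k(?:-[a-z])?))\.gguf$"
)


def split_gguf_name(filename: str) -> Tuple[str, str]:
    """Split a catalog file name into (model, quantization tag).

    Raises ValueError when the name does not match the assumed pattern.
    """
    match = _GGUF_NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognized file name: {filename!r}")
    return match.group("model"), match.group("quant")
```

For example, `split_gguf_name("text-generation-llama-v3-1-8b-instruct-q6-k.gguf")` separates the model part `llama-v3-1-8b-instruct` from the tag `q6-k`, which can help when comparing variants of the same model by quantization level.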
zephyr-7b-alpha
Name | Zephyr 7B α Text Generation (Q8) |
Author | Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023 https://arxiv.org/abs/2310.16944 |
License | MIT License (https://opensource.org/licenses/MIT) |
Files | text-generation-zephyr-alpha-7b-q8-0.gguf |
Minimum VRAM | 9.40 GB |
zephyr-7b-alpha-q6-k
Name | Zephyr 7B α Text Generation (Q6-K) |
Author | Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023 https://arxiv.org/abs/2310.16944 |
License | MIT License (https://opensource.org/licenses/MIT) |
Files | text-generation-zephyr-alpha-7b-q6-k.gguf |
Minimum VRAM | 8.20 GB |
zephyr-7b-alpha-q5-k-m
Name | Zephyr 7B α Text Generation (Q5-K-M) |
Author | Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023 https://arxiv.org/abs/2310.16944 |
License | MIT License (https://opensource.org/licenses/MIT) |
Files | text-generation-zephyr-alpha-7b-q5-k-m.gguf |
Minimum VRAM | 7.25 GB |
zephyr-7b-alpha-q4-k-m
Name | Zephyr 7B α Text Generation (Q4-K-M) |
Author | Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023 https://arxiv.org/abs/2310.16944 |
License | MIT License (https://opensource.org/licenses/MIT) |
Files | text-generation-zephyr-alpha-7b-q4-k-m.gguf |
Minimum VRAM | 6.30 GB |
zephyr-7b-alpha-q3-k-m
Name | Zephyr 7B α Text Generation (Q3-K-M) |
Author | Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023 https://arxiv.org/abs/2310.16944 |
License | MIT License (https://opensource.org/licenses/MIT) |
Files | text-generation-zephyr-alpha-7b-q3-k-m.gguf |
Minimum VRAM | 5.35 GB |
zephyr-7b-beta
Name | Zephyr 7B β Text Generation |
Author | Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023 https://arxiv.org/abs/2310.16944 |
License | MIT License (https://opensource.org/licenses/MIT) |
Files | text-generation-zephyr-beta-7b-q8-0.gguf |
Minimum VRAM | 9.40 GB |
zephyr-7b-beta-q6-k
Name | Zephyr 7B β Text Generation (Q6-K) |
Author | Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023 https://arxiv.org/abs/2310.16944 |
License | MIT License (https://opensource.org/licenses/MIT) |
Files | text-generation-zephyr-beta-7b-q6-k.gguf |
Minimum VRAM | 8.20 GB |
zephyr-7b-beta-q5-k-m
Name | Zephyr 7B β Text Generation (Q5-K-M) |
Author | Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023 https://arxiv.org/abs/2310.16944 |
License | MIT License (https://opensource.org/licenses/MIT) |
Files | text-generation-zephyr-beta-7b-q5-k-m.gguf |
Minimum VRAM | 7.25 GB |
zephyr-7b-beta-q4-k-m
Name | Zephyr 7B β Text Generation (Q4-K-M) |
Author | Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023 https://arxiv.org/abs/2310.16944 |
License | MIT License (https://opensource.org/licenses/MIT) |
Files | text-generation-zephyr-beta-7b-q4-k-m.gguf |
Minimum VRAM | 6.30 GB |
zephyr-7b-beta-q3-k-m
Name | Zephyr 7B β Text Generation (Q3-K-M) |
Author | Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023 https://arxiv.org/abs/2310.16944 |
License | MIT License (https://opensource.org/licenses/MIT) |
Files | text-generation-zephyr-beta-7b-q3-k-m.gguf |
Minimum VRAM | 5.35 GB |
visual-question-answering
llava-v1-5-7b
Name | LLaVA V1.5 7B Visual Question Answering |
Author | Haotian Liu, Chunyuan Li, Li Yuheng and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744 |
License | Meta Llama 2 Community License (https://www.llama.com/llama2/license/) |
Files | Total Size: 14.10 GB |
Minimum VRAM | 15.80 GB |
llava-v1-5-7b-q8
Name | LLaVA V1.5 7B (Q8-0) Visual Question Answering |
Author | Haotian Liu, Chunyuan Li, Li Yuheng and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744 |
License | Meta Llama 2 Community License (https://www.llama.com/llama2/license/) |
Files | Total Size: 7.79 GB |
Minimum VRAM | 9.90 GB |
llava-v1-5-7b-q6-k
Name | LLaVA V1.5 7B (Q6-K) Visual Question Answering |
Author | Haotian Liu, Chunyuan Li, Li Yuheng and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744 |
License | Meta Llama 2 Community License (https://www.llama.com/llama2/license/) |
Files | Total Size: 6.15 GB |
Minimum VRAM | 8.40 GB |
llava-v1-5-7b-q5-k-m
Name | LLaVA V1.5 7B (Q5-K-M) Visual Question Answering |
Author | Haotian Liu, Chunyuan Li, Li Yuheng and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744 |
License | Meta Llama 2 Community License (https://www.llama.com/llama2/license/) |
Files | Total Size: 5.41 GB |
Minimum VRAM | 7.71 GB |
llava-v1-5-7b-q4-k-m
Name | LLaVA V1.5 7B (Q4-K-M) Visual Question Answering |
Author | Haotian Liu, Chunyuan Li, Li Yuheng and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744 |
License | Meta Llama 2 Community License (https://www.llama.com/llama2/license/) |
Files | Total Size: 4.71 GB |
Minimum VRAM | 7.04 GB |
llava-v1-5-7b-q3-k-m
Name | LLaVA V1.5 7B (Q3-K-M) Visual Question Answering |
Author | Haotian Liu, Chunyuan Li, Li Yuheng and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744 |
License | Meta Llama 2 Community License (https://www.llama.com/llama2/license/) |
Files | Total Size: 3.92 GB |
Minimum VRAM | 6.33 GB |
llava-v1-5-13b
Name | LLaVA V1.5 13B (Q8-0) Visual Question Answering |
Author | Haotian Liu, Chunyuan Li, Li Yuheng and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744 |
License | Meta Llama 2 Community License (https://www.llama.com/llama2/license/) |
Files | Total Size: 14.48 GB |
Minimum VRAM | 17.51 GB |
llava-v1-5-13b-q6-k
Name | LLaVA V1.5 13B (Q6-K) Visual Question Answering |
Author | Haotian Liu, Chunyuan Li, Li Yuheng and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744 |
License | Meta Llama 2 Community License (https://www.llama.com/llama2/license/) |
Files | Total Size: 11.32 GB |
Minimum VRAM | 14.54 GB |
llava-v1-5-13b-q5-k-m
Name | LLaVA V1.5 13B (Q5-K-M) Visual Question Answering |
Author | Haotian Liu, Chunyuan Li, Li Yuheng and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744 |
License | Meta Llama 2 Community License (https://www.llama.com/llama2/license/) |
Files | Total Size: 9.88 GB |
Minimum VRAM | 13.17 GB |
llava-v1-5-13b-q4-0
Name | LLaVA V1.5 13B (Q4-0) Visual Question Answering |
Author | Haotian Liu, Chunyuan Li, Li Yuheng and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744 |
License | Meta Llama 2 Community License (https://www.llama.com/llama2/license/) |
Files | Total Size: 8.01 GB |
Minimum VRAM | 11.48 GB |
llava-v1-6-34b-q5-k-m
Name | LLaVA V1.6 34B (Q5-K-M) Visual Question Answering |
Author | Haotian Liu, Chunyuan Li, Li Yuheng and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744 |
License | Meta Llama 2 Community License (https://www.llama.com/llama2/license/) |
Files | Total Size: 25.02 GB |
Minimum VRAM | 24.96 GB |
llava-v1-6-34b-q4-k-m
Name | LLaVA V1.6 34B (Q4-K-M) Visual Question Answering |
Author | Haotian Liu, Chunyuan Li, Li Yuheng and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744 |
License | Meta Llama 2 Community License (https://www.llama.com/llama2/license/) |
Files | Total Size: 21.36 GB |
Minimum VRAM | 21.88 GB |
llava-v1-6-34b-q3-k-m
Name | LLaVA V1.6 34B (Q3-K-M) Visual Question Answering |
Author | Haotian Liu, Chunyuan Li, Li Yuheng and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744 |
License | Meta Llama 2 Community License (https://www.llama.com/llama2/license/) |
Files | Total Size: 17.35 GB |
Minimum VRAM | 18.06 GB |
moondream-v2 (default)
Name | Moondream V2 Visual Question Answering |
Author | Vikhyat Korrapati Published in Hugging Face, vol. 10.57967/hf/3219, “Moondream2”, 2024 https://huggingface.co/vikhyatk/moondream2 |
License | Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0) |
Files | Total Size: 3.75 GB |
Minimum VRAM | 4.44 GB |
image-captioning
llava-v1-5-7b
Name | LLaVA V1.5 7B Image Captioning |
Author | Haotian Liu, Chunyuan Li, Li Yuheng and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744 |
License | Meta Llama 2 Community License (https://www.llama.com/llama2/license/) |
Files | Total Size: 14.10 GB |
Minimum VRAM | 15.80 GB |
llava-v1-5-7b-q8
Name | LLaVA V1.5 7B (Q8-0) Image Captioning |
Author | Haotian Liu, Chunyuan Li, Li Yuheng and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744 |
License | Meta Llama 2 Community License (https://www.llama.com/llama2/license/) |
Files | Total Size: 7.79 GB |
Minimum VRAM | 9.90 GB |
llava-v1-5-7b-q6-k
Name | LLaVA V1.5 7B (Q6-K) Image Captioning |
Author | Haotian Liu, Chunyuan Li, Li Yuheng and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744 |
License | Meta Llama 2 Community License (https://www.llama.com/llama2/license/) |
Files | Total Size: 6.15 GB |
Minimum VRAM | 8.40 GB |
llava-v1-5-7b-q5-k-m
Name | LLaVA V1.5 7B (Q5-K-M) Image Captioning |
Author | Haotian Liu, Chunyuan Li, Li Yuheng and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744 |
License | Meta Llama 2 Community License (https://www.llama.com/llama2/license/) |
Files | Total Size: 5.41 GB |
Minimum VRAM | 7.71 GB |
llava-v1-5-7b-q4-k-m
Name | LLaVA V1.5 7B (Q4-K-M) Image Captioning |
Author | Haotian Liu, Chunyuan Li, Li Yuheng and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744 |
License | Meta Llama 2 Community License (https://www.llama.com/llama2/license/) |
Files | Total Size: 4.71 GB |
Minimum VRAM | 7.04 GB |
llava-v1-5-7b-q3-k-m
Name | LLaVA V1.5 7B (Q3-K-M) Image Captioning |
Author | Haotian Liu, Chunyuan Li, Li Yuheng and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744 |
License | Meta Llama 2 Community License (https://www.llama.com/llama2/license/) |
Files | Total Size: 3.92 GB |
Minimum VRAM | 6.33 GB |
llava-v1-5-13b
Name | LLaVA V1.5 13B (Q8-0) Image Captioning |
Author | Haotian Liu, Chunyuan Li, Li Yuheng and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744 |
License | Meta Llama 2 Community License (https://www.llama.com/llama2/license/) |
Files | Total Size: 14.48 GB |
Minimum VRAM | 17.51 GB |
llava-v1-5-13b-q6-k
Name | LLaVA V1.5 13B (Q6-K) Image Captioning |
Author | Haotian Liu, Chunyuan Li, Li Yuheng and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744 |
License | Meta Llama 2 Community License (https://www.llama.com/llama2/license/) |
Files | Total Size: 11.32 GB |
Minimum VRAM | 14.54 GB |
llava-v1-5-13b-q5-k-m
Name | LLaVA V1.5 13B (Q5-K-M) Image Captioning |
Author | Haotian Liu, Chunyuan Li, Li Yuheng and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744 |
License | Meta Llama 2 Community License (https://www.llama.com/llama2/license/) |
Files | Total Size: 9.88 GB |
Minimum VRAM | 13.17 GB |
llava-v1-5-13b-q4-0
Name | LLaVA V1.5 13B (Q4-0) Image Captioning |
Author | Haotian Liu, Chunyuan Li, Li Yuheng and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744 |
License | Meta Llama 2 Community License (https://www.llama.com/llama2/license/) |
Files | Total Size: 8.01 GB |
Minimum VRAM | 11.48 GB |
llava-v1-6-34b-q5-k-m
Name | LLaVA V1.6 34B (Q5-K-M) Image Captioning |
Author | Haotian Liu, Chunyuan Li, Li Yuheng and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744 |
License | Meta Llama 2 Community License (https://www.llama.com/llama2/license/) |
Files | Total Size: 25.02 GB |
Minimum VRAM | 24.96 GB |
llava-v1-6-34b-q4-k-m
Name | LLaVA V1.6 34B (Q4-K-M) Image Captioning |
Author | Haotian Liu, Chunyuan Li, Li Yuheng and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744 |
License | Meta Llama 2 Community License (https://www.llama.com/llama2/license/) |
Files | Total Size: 21.36 GB |
Minimum VRAM | 21.88 GB |
llava-v1-6-34b-q3-k-m
Name | LLaVA V1.6 34B (Q3-K-M) Image Captioning |
Author | Haotian Liu, Chunyuan Li, Li Yuheng and Yong Jae Lee Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023 https://arxiv.org/abs/2310.03744 |
License | Meta Llama 2 Community License (https://www.llama.com/llama2/license/) |
Files | Total Size: 17.35 GB |
Minimum VRAM | 18.06 GB |
moondream-v2 (default)
Name | Moondream V2 Image Captioning |
Author | Vikhyat Korrapati Published in Hugging Face, vol. 10.57967/hf/3219, “Moondream2”, 2024 https://huggingface.co/vikhyatk/moondream2 |
License | Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0) |
Files | Total Size: 3.75 GB |
Minimum VRAM | 4.44 GB |