Models: 559

Active filters: compressed-tensors

nm-testing/Meta-Llama-3-8B-Instruct-FP8-Dynamic-IA-Per-Tensor-Weight-testing

Updated Dec 6, 2024 • 27

CalamitousFelicitousness/Llama-3.3-70B-Instruct-W8A8-INT8

Updated Dec 7, 2024 • 337 • 3

yejingfu/q-Llama-3.3-70B-Instruct-888

Updated Dec 7, 2024 • 17.6k

cortecs/Llama-3_1-Nemotron-51B-Instruct-FP8-Dynamic

Text Generation • Updated Dec 8, 2024 • 78

nm-testing/TinyLlama-1.1B-Chat-v1.0-2of4-Sparse-Dense-Compressor

Updated Dec 8, 2024 • 40

nm-testing/TinyLlama-1.1B-Chat-v1.0-INT8-Dynamic-IA-Per-Channel-Weight-testing

Updated Dec 8, 2024 • 24

nm-testing/TinyLlama-1.1B-Chat-v1.0-INT8-Dynamic-IA-Per-Tensor-Weight-testing

Updated Dec 8, 2024 • 24

nm-testing/TinyLlama-1.1B-Chat-v1.0-INT8-Static-testing

Updated Dec 8, 2024 • 24

nm-testing/SparseLlama-3.1-8B-gsm8k-pruned.2of4-W8A8-testing

Updated Dec 8, 2024 • 12

nm-testing/SparseLlama-3.1-8B-gsm8k-pruned.2of4-channel_weights_per_token_dynamic_act_fp8-BitMaskCompressed

Updated Dec 10, 2024 • 2

nm-testing/SparseLlama-3.1-8B-gsm8k-pruned.2of4-tensor_weights_per_token_dynamic_act_fp8-BitMaskCompressed

Updated Dec 10, 2024

nm-testing/SparseLlama-3.1-8B-gsm8k-pruned.2of4-channel_weights_tensor_act_fp8-BitMaskCompressed

Updated Dec 10, 2024

nm-testing/SparseLlama-3.1-8B-gsm8k-pruned.2of4-tensor_weights_tensor_act_fp8-BitMaskCompressed

Updated Dec 10, 2024

Infermatic/Llama-3.3-70B-Instruct-FP8-Dynamic

Text Generation • Updated Dec 9, 2024 • 2.06k

Infermatic/L3.3-70B-Euryale-v2.3-FP8-Dynamic

Text Generation • Updated Dec 9, 2024 • 923

BigHuggyD/Sao10K_L3.3-70B-Euryale-v2.3-FP8-Dynamic

Text Generation • Updated Dec 9, 2024 • 348

horheynm/TinyLlama-1.1B-Chat-v1.0-W4A16-G128

Updated Dec 9, 2024 • 2

nm-testing/llama2.c-stories15M-pruned_50.2of4-uncompressed-tensor_weights_tensor_act_fp8-BitMaskCompressed

Updated Dec 10, 2024 • 1

horheynm/d

Updated Dec 10, 2024 • 2

horheynm/llama2.c_stories15M_pruned_50.2of4_compressed

Updated Dec 10, 2024 • 3

horheynm/llama2.c_stories15M_pruned_50.2of4_uncompressed

Updated Dec 10, 2024 • 2

BigHuggyD/EVA-UNIT-01_EVA-LLaMA-3.33-70B-v0.0-FP8-Dynamic

Text Generation • Updated Dec 10, 2024 • 22

nm-testing/Meta-Llama-3-8B-Instruct-W8A8-Static-Per-Tensor-Sym

Updated Dec 10, 2024 • 685

nm-testing/Meta-Llama-3-8B-Instruct-W8A8-Dynamic-Asym

Updated Dec 10, 2024 • 679

nm-testing/Meta-Llama-3-8B-Instruct-W8A8-Static-Per-Tensor-Asym

Updated Dec 11, 2024 • 712

nm-testing/llama2.c-stories15M-pruned_50.0-2of4-BitMaskCompressed

Updated Dec 10, 2024

nm-testing/TinyLlama-1.1B-Chat-v1.0-pruned_50.0-2of4-BitMaskCompressed

Updated Dec 10, 2024 • 4

nm-testing/Llama-3.2-1B-Instruct-W8A8-Static-Per-Tensor-Asym

Updated Dec 11, 2024 • 21

horheynm/TinyLlama_1.1B_Chat_v1.0_FP8_Dynamic_compressed

Updated Dec 11, 2024 • 138

horheynm/TinyLlama_1.1B_Chat_v1.0_FP8_Dynamic_uncompressed

Updated Dec 11, 2024 • 57