Model Card for Llama3-8B-1.58-100B-tokens-GGUF


Llama3-8B-1.58-100B-tokens-q2b0 is a quantized version of Llama3-8B-1.58-100B-tokens, produced with the q2b0 quantization method from Candle. This enables extreme compression while maintaining strong performance across a range of NLP tasks.

Model Details

Model Sources

Quantization Details

The model has been quantized using the q2b0 method from Candle. This approach reduces model size significantly while preserving performance. For more details on this quantization technique, refer to the Candle PR #2683.
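The "1.58-bit" in the model name refers to ternary weights: each weight takes one of three values {-1, 0, +1}, which costs log2(3) ≈ 1.58 bits of information per weight. As an illustration only (this is not the actual q2b0 implementation from Candle), a minimal absmean-style ternary quantizer, in the spirit of BitNet b1.58, might look like:

```python
def quantize_ternary(weights):
    """Absmean ternary quantization (illustrative sketch, not Candle's q2b0):
    scale by the mean absolute weight, then round and clip to {-1, 0, +1}."""
    scale = sum(abs(w) for w in weights) / len(weights)
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from ternary codes and the scale."""
    return [q * scale for q in quantized]

weights = [0.42, -1.37, 0.05, 0.91, -0.23, 1.8, -0.67, 0.11]
q, s = quantize_ternary(weights)
approx = dequantize(q, s)
```

Storing only the ternary codes plus one scale per tensor is what allows the large size reduction; the real q2b0 format additionally packs the codes into a 2-bit on-disk layout inside the GGUF file.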

Training Details

For details on the dataset and training process, refer to the original Llama3-8B-1.58-100B-tokens.

Format: GGUF
Model size: 8.03B params
Architecture: llama
