---
base_model: JackFram/llama-160m
datasets:
- wikipedia
inference: false
language:
- en
license: other
model_creator: JackFram
model_name: llama-160m
pipeline_tag: text-generation
quantized_by: afrideva
tags:
- gguf
- ggml
- quantized
- q2_k
- q3_k_m
- q4_k_m
- q5_k_m
- q6_k
- q8_0
---
# JackFram/llama-160m-GGUF
Quantized GGUF model files for llama-160m from JackFram
Name | Quant method | Size |
---|---|---|
llama-160m.fp16.gguf | fp16 | 326.58 MB |
llama-160m.q2_k.gguf | q2_k | 77.23 MB |
llama-160m.q3_k_m.gguf | q3_k_m | 87.54 MB |
llama-160m.q4_k_m.gguf | q4_k_m | 104.03 MB |
llama-160m.q5_k_m.gguf | q5_k_m | 119.04 MB |
llama-160m.q6_k.gguf | q6_k | 135.00 MB |
llama-160m.q8_0.gguf | q8_0 | 174.33 MB |
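As a rough sanity check on the table, each file size can be turned into an effective number of bits per weight. The sketch below assumes the fp16 file stores every weight in exactly 16 bits and that GGUF metadata overhead is negligible, so bits per weight is roughly 16 × (file size / fp16 file size). Because this is a ratio, it does not matter whether the listed MB are 10^6 or 2^20 bytes. The helper name `effective_bits_per_weight` is hypothetical and not part of any GGUF tooling.

```python
# File sizes copied from the table above (all in the same unit, so the ratio is unit-free).
SIZES_MB = {
    "fp16": 326.58,
    "q2_k": 77.23,
    "q3_k_m": 87.54,
    "q4_k_m": 104.03,
    "q5_k_m": 119.04,
    "q6_k": 135.00,
    "q8_0": 174.33,
}

FP16_BITS = 16  # assumption: the fp16 file spends exactly 16 bits on every weight


def effective_bits_per_weight(size: float, fp16_size: float) -> float:
    """Estimate bits per weight from file sizes, ignoring GGUF metadata overhead."""
    return FP16_BITS * size / fp16_size


if __name__ == "__main__":
    for name, size in SIZES_MB.items():
        bpw = effective_bits_per_weight(size, SIZES_MB["fp16"])
        print(f"{name:>7}: {bpw:5.2f} bits/weight")
```

Under these assumptions q8_0 works out to roughly 8.5 bits per weight. The k-quant files come out above their nominal bit widths, which may reflect per-block scales or tensors stored at higher precision; the table alone does not show the per-tensor breakdown.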
## Original Model Card
## Model description
This is a LLaMA-like model with only 160M parameters trained on Wikipedia and part of the C4-en and C4-realnewslike datasets.
No evaluation has been conducted yet, so use it with care.
The model was developed mainly to serve as a base small speculative model (SSM) in the SpecInfer paper.
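To illustrate what a small speculative model is used for, here is a minimal sketch of greedy speculative decoding: a cheap draft model guesses several tokens ahead and a larger target model keeps only the prefix it agrees with. This is a simplified single-sequence variant, not SpecInfer's token-tree verification, and the "models" are hypothetical stand-in functions mapping a token prefix to a greedy next token. The key property shown is that the output is identical to decoding with the target model alone.

```python
from typing import Callable, List

# Hypothetical "model": any function mapping a token prefix to its greedy next token.
NextTokenFn = Callable[[List[int]], int]


def greedy_decode(model: NextTokenFn, prompt: List[int], max_new: int) -> List[int]:
    """Plain autoregressive greedy decoding with a single model."""
    tokens = list(prompt)
    for _ in range(max_new):
        tokens.append(model(tokens))
    return tokens


def speculative_greedy_decode(target: NextTokenFn, draft: NextTokenFn,
                              prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Draft up to k tokens with `draft`, then keep the prefix `target` agrees with."""
    tokens = list(prompt)
    goal = len(prompt) + max_new
    while len(tokens) < goal:
        # Draft phase: the small model guesses up to k tokens ahead.
        ctx = list(tokens)
        proposal = []
        for _ in range(min(k, goal - len(tokens))):
            guess = draft(ctx)
            proposal.append(guess)
            ctx.append(guess)
        # Verify phase: a real system scores all drafted positions in one batched
        # forward pass of the target; here it is called per position for clarity.
        all_accepted = True
        for guess in proposal:
            expected = target(tokens)
            tokens.append(expected)  # the target's token is always the one kept
            if expected != guess:
                all_accepted = False  # first mismatch: discard the rest of the draft
                break
        # If every guess was accepted, the target's next prediction is a free bonus token.
        if all_accepted and len(tokens) < goal:
            tokens.append(target(tokens))
    return tokens


if __name__ == "__main__":
    def target(toks: List[int]) -> int:  # hypothetical stand-in for a large model
        return (sum(toks) * 31 + 7) % 50

    def draft(toks: List[int]) -> int:  # hypothetical small model, wrong at every third length
        guess = target(toks)
        return guess if len(toks) % 3 else (guess + 1) % 50

    prompt = [1, 2, 3]
    assert speculative_greedy_decode(target, draft, prompt, 20) == greedy_decode(target, prompt, 20)
    print("speculative output matches target-only greedy decoding")
```

The speedup in practice comes from the target verifying several drafted tokens per forward pass; the more often the draft agrees with the target, the more tokens are accepted per step.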
## Citation
To cite the model, please use:

```bibtex
@misc{miao2023specinfer,
      title={SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification},
      author={Xupeng Miao and Gabriele Oliaro and Zhihao Zhang and Xinhao Cheng and Zeyu Wang and Rae Ying Yee Wong and Zhuoming Chen and Daiyaan Arfeen and Reyna Abhyankar and Zhihao Jia},
      year={2023},
      eprint={2305.09781},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```