---
license: apache-2.0
language:
- en
datasets:
- wikipedia
pipeline_tag: text-generation
---

[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)

# QuantFactory/llama-160m-GGUF
This is a quantized version of [JackFram/llama-160m](https://huggingface.co/JackFram/llama-160m) created using llama.cpp.
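
The GGUF files can be loaded with llama.cpp or any of its bindings. Below is a minimal sketch using the `llama-cpp-python` binding; the filename is illustrative and should be replaced with the quantization you actually download from this repo.

```python
# Minimal sketch: load a GGUF quantization of llama-160m with llama-cpp-python.
# The filename below is an assumption; use the file you downloaded from this repo.
from llama_cpp import Llama

llm = Llama(
    model_path="llama-160m.Q4_K_M.gguf",  # hypothetical quantized file name
    n_ctx=2048,                           # context window size
)

result = llm("The capital of France is", max_tokens=16, temperature=0.0)
print(result["choices"][0]["text"])
```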

# Original Model Card

## Model description
This is a LLaMA-like model with only 160M parameters, trained on Wikipedia and on parts of the C4-en and C4-realnewslike datasets.

No evaluation has been conducted yet, so use it with care.

The model was mainly developed as a base small speculative model in the [SpecInfer](https://arxiv.org/abs/2305.09781) paper.
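
Since the original checkpoint is intended as a draft model, one way to try it is speculative (assisted) decoding with the `transformers` library. The sketch below is an illustration, not part of the original card; the target model is a placeholder, and any larger model that shares the LLaMA tokenizer could be used instead.

```python
# Minimal sketch: use llama-160m as the draft model for assisted generation.
# The target model is a placeholder assumption; pick any larger, tokenizer-compatible model.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("JackFram/llama-160m")
target = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")  # placeholder target
draft = AutoModelForCausalLM.from_pretrained("JackFram/llama-160m")   # small speculative model

inputs = tokenizer("Speculative decoding works by", return_tensors="pt")
# assistant_model enables assisted generation: the draft proposes tokens,
# and the target model verifies them in parallel.
outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```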

## Citation
To cite the model, please use
```bibtex
@misc{miao2023specinfer,
      title={SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification},
      author={Xupeng Miao and Gabriele Oliaro and Zhihao Zhang and Xinhao Cheng and Zeyu Wang and Rae Ying Yee Wong and Zhuoming Chen and Daiyaan Arfeen and Reyna Abhyankar and Zhihao Jia},
      year={2023},
      eprint={2305.09781},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```