mwitiderrick committed
Commit e2bc685 · Parent: c405269

Update README.md
Files changed (1):
  1. README.md +52 -4
README.md CHANGED
@@ -1,11 +1,25 @@
 ---
+base_model: HuggingFaceH4/zephyr-7b-beta
+inference: false
+model_type: mistral
+prompt_template: |
+  ### Instruction:\n
+  {prompt}
+  ### Response:\n
+quantized_by: mwitiderrick
 tags:
 - deepsparse
 ---
-## DeepSparse One-Shot and Export of https://huggingface.co/HuggingFaceH4/zephyr-7b-beta
-
-## Usage
+## Zephyr 7B β - DeepSparse
+This repo contains model files for [Zephyr 7B β](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) optimized for [DeepSparse](https://github.com/neuralmagic/deepsparse), a CPU inference runtime for sparse models.
 
+This model was quantized and pruned with [SparseGPT](https://arxiv.org/abs/2301.00774), using [SparseML](https://github.com/neuralmagic/sparseml).
+## Inference
+Install [DeepSparse LLM](https://github.com/neuralmagic/deepsparse) for fast inference on CPUs:
+```bash
+pip install deepsparse-nightly[llm]
+```
+Run in a [Python pipeline](https://github.com/neuralmagic/deepsparse/blob/main/docs/llms/text-generation-pipeline.md):
 ```python
 from deepsparse import TextGeneration
 prompt='### Instruction:\nWrite a Perl script that processes a log file and counts the occurrences of different HTTP status codes. The script should accept the log file path as a command-line argument and print the results to the console in descending order of frequency.\n\n### Response:\n'
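The hunk above ends mid-example: README lines 26-68 (the generation call and the Perl output it prints) are elided by the diff. For reference only, a minimal sketch of how a DeepSparse `TextGeneration` pipeline is typically invoked; the model path and `max_new_tokens` value are illustrative assumptions, not lines from this commit:

```python
from deepsparse import TextGeneration

# Assumed path: the ONNX "deployment" directory produced by the
# Sparsification steps later in this README (an hf: or zoo: stub also works).
pipeline = TextGeneration(model="deployment")

prompt = "### Instruction:\nWrite a haiku about sparse models.\n\n### Response:\n"

# max_new_tokens is an illustrative choice, not taken from the commit.
result = pipeline(prompt=prompt, max_new_tokens=128)
print(result.generations[0].text)
```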
 
69
  }
70
  ```
71
  """
72
+ ```
73
+
74
+ ## Prompt template
75
+ ```
76
+
77
+ ### Instruction:\n
78
+ {prompt}
79
+ ### Response:\n
80
+ ```
81
+ ## Sparsification
82
+ For details on how this model was sparsified, see the `recipe.yaml` in this repo and follow the instructions below.
83
+
84
+ ```bash
85
+ git clone https://github.com/neuralmagic/sparseml
86
+ pip install -e "sparseml[transformers]"
87
+ python sparseml/src/sparseml/transformers/sparsification/obcq/obcq.py HuggingFaceH4/zephyr-7b-beta open_platypus --recipe recipe.yaml --save True
88
+ python sparseml/src/sparseml/transformers/sparsification/obcq/export.py --task text-generation --model_path obcq_deployment
89
+ cp deployment/model.onnx deployment/model-orig.onnx
90
+ ```
91
+ Run this kv-cache injection to speed up the model at inference by caching the Key and Value states:
92
+ ```python
93
+ import os
94
+ import onnx
95
+ from sparseml.exporters.kv_cache_injector import KeyValueCacheInjector
96
+ input_file = "deployment/model-orig.onnx"
97
+ output_file = "deployment/model.onnx"
98
+ model = onnx.load(input_file, load_external_data=False)
99
+ model = KeyValueCacheInjector(model_path=os.path.dirname(input_file)).apply(model)
100
+ onnx.save(model, output_file)
101
+ print(f"Modified model saved to: {output_file}")
102
+ ```
103
+ Follow the instructions on our [One Shot With SparseML](https://github.com/neuralmagic/sparseml/tree/main/src/sparseml/transformers/sparsification/obcq) page for a step-by-step guide for performing one-shot quantization of large language models.
104
+ ## Slack
105
+
106
+ For further support, and discussions on these models and AI in general, join [Neural Magic's Slack Community](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ)