---
tags:
- deepsparse
---
## Usage

Run inference with DeepSparse's `TextGeneration` pipeline:

```python
from deepsparse import TextGeneration

prompt = "How to get in a good university?"
formatted_prompt = f"<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n"

model = TextGeneration(model="hf:neuralmagic/TinyLlama-1.1B-Chat-v0.3-pruned50-quant-ds")
print(model(formatted_prompt, max_new_tokens=200).generations[0].text)

"""
Getting into a good university is a complex process that involves factors such as academic performance, financial aid, and personal qualifications. Here are some steps you can follow to get in a good university:

1. Academic performance:

- Look for a university that has a strong academic program, including a well-rounded curriculum that covers a wide range of subjects.
- Check if the university offers a clear curriculum that includes a clear sequence of courses.
- Check if the university offers a clear pathway to graduation, including clear dates and deadlines.

2. Financial aid:

- Look for a university that offers financial aid, such as scholarships, grants, or loans.
- Check if the university offers financial aid that fits your budget.
- Consider the university's financial aid package, including the cost of tuition, room and board, and other expenses.
"""
```
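The raw f-string above follows the ChatML-style template this chat model expects. A small helper (the function name is illustrative, not part of the DeepSparse API) keeps the formatting in one place:

```python
def format_chat_prompt(user_message: str) -> str:
    """Wrap a user message in the ChatML-style template used above.

    Hypothetical helper; the model card itself only shows the raw f-string.
    """
    return (
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

# Produces the same string as the formatted_prompt in the usage example.
print(format_chat_prompt("How to get in a good university?"))
```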

## One-shot and Export

The commands below sparsify the base model with SparseML's one-shot (OBCQ) flow and export it to ONNX:

```shell
git clone https://github.com/neuralmagic/sparseml
pip install -e "sparseml[transformers]" "torch<2"
python sparseml/src/sparseml/transformers/sparsification/obcq/obcq.py PY007/TinyLlama-1.1B-Chat-v0.3 open_platypus --recipe recipe.yaml --save True
python sparseml/src/sparseml/transformers/sparsification/obcq/export.py --task text-generation --model_path obcq_deployment --sequence_length 512
cp deployment/model.onnx deployment/model-orig.onnx
python onnx_kv_inject.py --input-file deployment/model-orig.onnx --output-file deployment/model.onnx
```
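Before running the KV-cache injection step, it can help to confirm the export produced the expected files. A minimal sketch, assuming the `deployment/` layout created by the commands above (the helper name is illustrative):

```python
from pathlib import Path


def missing_deployment_files(deployment_dir: str) -> list:
    """Return expected export artifacts that are absent from the directory.

    File names follow the export/copy commands above; adjust if your
    paths differ.
    """
    expected = ["model.onnx", "model-orig.onnx"]
    root = Path(deployment_dir)
    return [name for name in expected if not (root / name).exists()]
```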

`recipe.yaml`
```yaml
test_stage:
  obcq_modifiers:
    SparseGPTModifier:
      sparsity: 0.5
      block_size: 128
      sequential_update: false
      quantize:
        QuantizationModifier:
          ignore:
          - LlamaRotaryEmbedding
          - LlamaRMSNorm
          - SiLUActivation
          - model.layers.21.mlp.down_proj
          - model.layers.7.mlp.down_proj
          - model.layers.2.mlp.down_proj
          - model.layers.20.mlp.down_proj
          - model.layers.19.mlp.down_proj
          post_oneshot_calibration: false
          scheme_overrides:
            Embedding:
              input_activations: null
              weights:
                num_bits: 8
                symmetric: false
      percdamp: 0.01
      prunen: 0
      prunem: 0
      targets:
      - model.layers.0
      - model.layers.1
      - model.layers.2
      - model.layers.3
      - model.layers.4
      - model.layers.5
      - model.layers.6
      - model.layers.7
      - model.layers.8
      - model.layers.9
      - model.layers.10
      - model.layers.11
      - model.layers.12
      - model.layers.13
      - model.layers.14
      - model.layers.15
      - model.layers.16
      - model.layers.17
      - model.layers.18
      - model.layers.19
      - model.layers.20
      - model.layers.21
      target_ids:
      - attention_mask
      - position_ids
```
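The `targets` list enumerates every decoder layer of TinyLlama-1.1B (22 layers, `model.layers.0` through `model.layers.21`). When adapting the recipe to a model with a different depth, the list can be generated rather than typed out:

```python
# TinyLlama-1.1B has 22 decoder layers; the recipe lists each one explicitly.
num_layers = 22
targets = [f"model.layers.{i}" for i in range(num_layers)]

print(targets[0], targets[-1])  # model.layers.0 model.layers.21
```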