mwitiderrick committed
Commit eb012f3
1 Parent(s): eb8f072

Update README.md

Files changed (1): README.md (+70 -28)

README.md CHANGED
@@ -1,41 +1,83 @@
  ---
- base_model: GeneZC/MiniChat-2-3B
  inference: True
  model_type: Llama
  ---
- # MiniChat-2-3B
- This repo contains pruned model files for [MiniChat-2-3B](https://huggingface.co/GeneZC/MiniChat-2-3B).

  This model was pruned with [SparseGPT](https://arxiv.org/abs/2301.00774), using [SparseML](https://github.com/neuralmagic/sparseml).

  ```python
- import torch
- from transformers import AutoTokenizer, AutoModelForCausalLM

  prompt = "How to make banana bread?"
  formatted_prompt = f"<s> [|User|]\n{prompt}</s>[|Assistant|]\n"

- model_id = "nm-testing/MiniChat-2-3B-pruned50-24"
- model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype=torch.float16)
- tokenizer = AutoTokenizer.from_pretrained(model_id)
- inputs = tokenizer(formatted_prompt, return_tensors="pt")
- outputs = model.generate(**inputs, max_new_tokens=200)
- print(tokenizer.batch_decode(outputs)[0])

  """
- <s><s> [|User|]
- How to make banana bread?</s>[|Assistant|]
- To make banana bread, follow these steps:
-
- 1. Start by preparing the ingredients. You will need banana bread mix, flour, water, and salt.
- 2. Mix the ingredients together and mix the mixture thoroughly.
- 3. Pour the mixture into a pan to cook.
- 4. Cook the mixture until it is cooked.
- 5. Once the bread is cooked, you can use it as a base for making banana bread.
- 6. Add the banana bread mix to the pan and mix it thoroughly.
- 7. Pour the mixture into a pan to cook.
- 8. Cook the mixture until it is cooked.
- 9. Once the bread is cooked, you can use it as a base for making banana bread.
- 10. Add the banana bread mix to the pan and mix it thoroughly.
- 11. Pour the mixture into a pan to cook.
- 12.
  """
- ```
  ---
+ base_model: nm-testing/MiniChat-2-3B-pruned2.4
  inference: True
  model_type: Llama
+ tags:
+ - nm-vllm
+ - sparse
  ---
+ ## MiniChat-2-3B-pruned2.4
+ This repo contains model files for [MiniChat-2-3B-pruned2.4](https://huggingface.co/GeneZC/MiniChat-2-3B), optimized for [NM-vLLM](https://github.com/neuralmagic/nm-vllm), a high-throughput serving engine for compressed LLMs.

  This model was pruned with [SparseGPT](https://arxiv.org/abs/2301.00774), using [SparseML](https://github.com/neuralmagic/sparseml).
+
+ ## Inference
+ Install [NM-vLLM](https://github.com/neuralmagic/nm-vllm) for fast inference and low memory usage:
+ ```bash
+ pip install nm-vllm[sparse]
+ ```
+ Run in a Python pipeline for local inference:
  ```python
+ from vllm import LLM, SamplingParams
+
+ model = LLM("nm-testing/MiniChat-2-3B-pruned2.4", sparsity="sparse_w16a16")

  prompt = "How to make banana bread?"
  formatted_prompt = f"<s> [|User|]\n{prompt}</s>[|Assistant|]\n"

+ sampling_params = SamplingParams(max_tokens=100, temperature=0, repetition_penalty=1.3)
+ outputs = model.generate(formatted_prompt, sampling_params=sampling_params)
+ print(outputs[0].outputs[0].text)

  """
+ Answer: Create a recipe for making banana bread using ingredients like flour, water and sugar. Explain the process of mixing these materials together until they form an unpleasant mixture that can be used in cooking methods such as baking or boiling processes. Describe how you would create this dough by adding it into your kitchen's oven-based environment while describing its properties during each stage before creating them on topical forms. You will also describe what
  """
+ ```
+
+ ## Prompt template
+
+ ```
+ ### User:
+ {prompt}
+ ### Assistant:
+
+ ```
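As a sketch, the template above can be filled in with a small Python helper. The `build_prompt` name and the helper itself are illustrative assumptions, not part of this repo:

```python
# Illustrative helper (not part of this repo): fills the prompt
# template shown above for a single user turn.
PROMPT_TEMPLATE = """### User:
{prompt}
### Assistant:
"""

def build_prompt(user_message: str) -> str:
    """Return the full prompt string for one user message."""
    return PROMPT_TEMPLATE.format(prompt=user_message)

print(build_prompt("How to make banana bread?"))
```

The formatted string can then be passed to `model.generate(...)` in place of a hand-built prompt.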
+
+ ## Sparsification
+ For details on how this model was sparsified, see the `recipe.yaml` in this repo and follow the instructions below.
+
+ Install [SparseML](https://github.com/neuralmagic/sparseml):
+ ```bash
+ git clone https://github.com/neuralmagic/sparseml
+ pip install -e "sparseml[transformers]"
+ ```
+
+ Adjust the recipe as needed, then run this one-shot compression script to apply SparseGPT:
+ ```python
+ import sparseml.transformers
+
+ original_model_name = "nm-testing/MiniChat-2-3B-pruned2.4"
+ calibration_dataset = "open_platypus"
+ output_directory = "output/"
+
+ recipe = """
+ test_stage:
+   obcq_modifiers:
+     SparseGPTModifier:
+       sparsity: 0.5
+       sequential_update: true
+       mask_structure: '2:4'
+       targets: ['re:model.layers.\d*$']
+ """
+
+ # Apply SparseGPT to the model
+ sparseml.transformers.oneshot(
+     model=original_model_name,
+     dataset=calibration_dataset,
+     recipe=recipe,
+     output_dir=output_directory,
+ )
+ ```
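The `mask_structure: '2:4'` setting in the recipe above means each contiguous group of four weights keeps at most two nonzeros (semi-structured 2:4 sparsity). A minimal sketch of that invariant in plain Python, where `satisfies_2_4` is an illustrative helper and not part of SparseML:

```python
def satisfies_2_4(weights):
    """Check semi-structured 2:4 sparsity: every contiguous group of
    four weights contains at most two nonzeros (any trailing partial
    group is ignored in this sketch)."""
    for i in range(0, len(weights) - len(weights) % 4, 4):
        group = weights[i:i + 4]
        if sum(1 for w in group if w != 0) > 2:
            return False
    return True

print(satisfies_2_4([0.5, 0.0, -1.2, 0.0, 0.0, 0.3, 0.0, 0.7]))  # True
print(satisfies_2_4([0.5, 0.1, -1.2, 0.0]))  # False: 3 nonzeros in one group
```

This 2:4 pattern is what allows the `sparse_w16a16` kernels used in the inference example to skip pruned weights efficiently.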
+
+ ## Slack
+
+ For further support, and discussion of these models and AI in general, join [Neural Magic's Slack Community](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ).