iansotnek commited on
Commit
011319c
·
1 Parent(s): f2bd260

update to use instruct_pipeline

Browse files
Files changed (1) hide show
  1. README.md +25 -66
README.md CHANGED
@@ -44,85 +44,44 @@ Just as with any other LLM, we advise users of this technology to exercise good
44
 
45
  ## Usage
46
 
47
- The code below shows how to use `dlite-v1-774m` in the way which it was trained. While the model can be used "out of the box" using the
48
- `transformers` library, using the function defined below to create a response from the model will achieve better results.
49
-
50
- ### Load Model and Tokenizer from this Repository Using the `transformers` Package
51
 
52
  ```python
53
- from transformers import AutoModelForCausalLM, AutoTokenizer
54
- import numpy as np
55
- import re
56
-
57
- model_id = 'aisquared/dlite-v1-774m'
58
-
59
- tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side = 'left')
60
- model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code = True, device_map = 'auto')
61
  ```
62
 
63
-
64
- ### Create the Prompt Format and Other Variables
 
 
65
 
66
  ```python
67
- PROMPT = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
 
68
 
69
- ### Instruction:
70
- {instruction}
71
 
72
- ### Response:
73
- """
74
 
75
- END_KEY = '### End'
76
- RESPONSE_KEY = '### Response:\n'
 
77
  ```
78
 
79
-
80
- ### Create a Function to Retrieve a Response
81
 
82
  ```python
83
- def create_response(
84
- instruction,
85
- model,
86
- tokenizer,
87
- do_sample = True,
88
- max_new_tokens = 256,
89
- top_p = 0.92,
90
- top_k = 0,
91
- **kwargs
92
- ):
93
- """
94
- Create a response from the model by using a formatted prompt
95
- """
96
- input_ids = tokenizer(
97
- PROMPT.format(instruction=instruction), return_tensors="pt"
98
- ).input_ids
99
-
100
- gen_tokens = model.generate(
101
- input_ids,
102
- pad_token_id=tokenizer.pad_token_id,
103
- do_sample=do_sample,
104
- max_new_tokens=max_new_tokens,
105
- top_p=top_p,
106
- top_k=top_k,
107
- **kwargs,
108
- )
109
- decoded = tokenizer.batch_decode(gen_tokens)[0]
110
-
111
- # The response appears after "### Response:". The model has been trained to append "### End" at the end.
112
- m = re.search(r"#+\s*Response:\s*(.+?)#+\s*End", decoded, flags=re.DOTALL)
113
-
114
- response = None
115
- if m:
116
- response = m.group(1).strip()
117
- else:
118
- # The model might not generate the "### End" sequence before reaching the max tokens. In this case, return
119
- # everything after "### Response:".
120
- m = re.search(r"#+\s*Response:\s*(.+)", decoded, flags=re.DOTALL)
121
- if m:
122
- response = m.group(1).strip()
123
- else:
124
- pass
125
- return response
126
  ```
127
 
128
  ### Model Performance Metrics
 
44
 
45
  ## Usage
46
 
47
+ To use the model with the `transformers` library on a machine with GPUs, first make sure you have the `transformers` and `accelerate` libraries installed.
48
+ From your terminal, run:
 
 
49
 
50
  ```python
51
+ pip install "accelerate>=0.16.0,<1" "transformers[torch]>=4.28.1,<5" "torch>=1.13.1,<2"
 
 
 
 
 
 
 
52
  ```
53
 
54
+ The instruction following pipeline can be loaded using the `pipeline` function as shown below. This loads a custom `InstructionTextGenerationPipeline`
55
+ found in the model repo [here](https://huggingface.co/aisquared/dlite-v1-774m/blob/main/instruct_pipeline.py), which is why `trust_remote_code=True` is required.
56
+ Including `torch_dtype=torch.bfloat16` is generally recommended if this type is supported in order to reduce memory usage. It does not appear to impact output quality.
57
+ It is also fine to remove it if there is sufficient memory.
58
 
59
  ```python
60
+ from transformers import pipeline
61
+ import torch
62
 
63
+ generate_text = pipeline(model="aisquared/dlite-v1-774m", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")
64
+ ```
65
 
66
+ You can then use the pipeline to answer instructions:
 
67
 
68
+ ```python
69
+ res = generate_text("Who was George Washington?")
70
+ print(res[0]["generated_text"])
71
  ```
72
 
73
+ Alternatively, if you prefer to not use `trust_remote_code=True` you can download [instruct_pipeline.py](https://huggingface.co/aisquared/dlite-v1-774m/blob/main/instruct_pipeline.py),
74
+ store it alongside your notebook, and construct the pipeline yourself from the loaded model and tokenizer:
75
 
76
  ```python
77
+ from instruct_pipeline import InstructionTextGenerationPipeline
78
+ from transformers import AutoModelForCausalLM, AutoTokenizer
79
+ import torch
80
+
81
+ tokenizer = AutoTokenizer.from_pretrained("aisquared/dlite-v1-774m", padding_side="left")
82
+ model = AutoModelForCausalLM.from_pretrained("aisquared/dlite-v1-774m", device_map="auto", torch_dtype=torch.bfloat16)
83
+
84
+ generate_text = InstructionTextGenerationPipeline(model=model, tokenizer=tokenizer)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
85
  ```
86
 
87
  ### Model Performance Metrics