Joetib committed
Commit d693fb8 · 1 Parent(s): 096e7d1

Update README.md

Files changed (1): README.md (+1 -7)
README.md CHANGED
@@ -99,10 +99,7 @@ model_id = "ibleducation/ibl-neural-edu-content-7B"
 tokenizer = AutoTokenizer.from_pretrained(model_id)
 model = AutoModelForCausalLM.from_pretrained(
     model_id,
-    use_flash_attention_2=True,
-    torch_dtype=torch.bfloat16,
     device_map="auto",
-    trust_remote_code=True
 )
 pipeline = transformers.pipeline(
     "text-generation",
@@ -115,10 +112,7 @@ response = pipeline(prompt)
 print(response['generated_text'])
 ```
 
-> In cases where the runtime gpu does not support flash attention, `use_flash_attention_2` can be ignored
-> though at a possible performance cost
-
-**Important** - Use the prompt template below for ibl-tutoring-7B-128k :
+**Important** - Use the prompt template below:
 ```
 <s>[INST]{prompt}[/INST]
 ```
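For readers assembling the two hunks above, here is a minimal sketch of how the README example reads after this commit. The imports, the pipeline keyword arguments, and the sample question are not shown in the diff, so they are assumptions following the usual transformers pattern; note also that a text-generation pipeline returns a list of dicts, so the sketch indexes the first element where the README indexes the dict directly.

```python
# Minimal sketch of the README example as it reads after this commit.
# Imports, the pipeline keyword arguments, and the sample question are
# assumptions: they fall in lines the diff hunks skip over.
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibleducation/ibl-neural-edu-content-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # the flash-attention, dtype, and remote-code flags were dropped in this commit
)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,          # assumed: these keyword arguments sit in the
    tokenizer=tokenizer,  # context lines the hunks elide
)

# Wrap the input in the prompt template the README mandates.
prompt = "<s>[INST]What is photosynthesis?[/INST]"
response = pipeline(prompt)

# A text-generation pipeline returns a list of dicts, so index the first
# result; the README's `response['generated_text']` assumes an unwrapped dict.
print(response[0]["generated_text"])
```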