Joetib committed
Commit d693fb8 · 1 Parent(s): 096e7d1

Update README.md

Files changed (1): README.md (+1 -7)
README.md CHANGED
@@ -99,10 +99,7 @@ model_id = "ibleducation/ibl-neural-edu-content-7B"
 tokenizer = AutoTokenizer.from_pretrained(model_id)
 model = AutoModelForCausalLM.from_pretrained(
     model_id,
-    use_flash_attention_2=True,
-    torch_dtype=torch.bfloat16,
     device_map="auto",
-    trust_remote_code=True
 )
 pipeline = transformers.pipeline(
     "text-generation",
@@ -115,10 +112,7 @@ response = pipeline(prompt)
 print(response['generated_text'])
 ```
 
-> In cases where the runtime gpu does not support flash attention, `use_flash_attention_2` can be ignored
-> though at a possible performance cost
-
-**Important** - Use the prompt template below for ibl-tutoring-7B-128k :
+**Important** - Use the prompt template below:
 ```
 <s>[INST]{prompt}[/INST]
 ```
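For readers assembling the two hunks above, here is a minimal sketch of how the README example reads after this commit. The imports, the pipeline keyword arguments, and the sample question are not shown in the diff, so they are assumptions following the usual transformers pattern; note also that a text-generation pipeline returns a list of dicts, so the sketch indexes the first element where the README indexes the dict directly.

```python
# Minimal sketch of the README example as it reads after this commit.
# Imports, the pipeline keyword arguments, and the sample question are
# assumptions: they fall in lines the diff hunks skip over.
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibleducation/ibl-neural-edu-content-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # the flash-attention, dtype, and remote-code flags were dropped in this commit
)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,          # assumed: these keyword arguments sit in the
    tokenizer=tokenizer,  # context lines the hunks elide
)

# Wrap the input in the prompt template the README mandates.
prompt = "<s>[INST]What is photosynthesis?[/INST]"
response = pipeline(prompt)

# A text-generation pipeline returns a list of dicts, so index the first
# result; the README's `response['generated_text']` assumes an unwrapped dict.
print(response[0]["generated_text"])
```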