macadeliccc committed on
Commit 90184df · 1 Parent(s): bf128ed

changed use_flash_attention_2=True to attn_implementation="flash_attention_2"


I receive this warning when using `use_flash_attention_2=True`:

> The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use `attn_implementation="flash_attention_2"` instead.

Passing `attn_implementation="flash_attention_2"` instead resolves the warning:

```python
# model_id and torch_dtype as defined in the README example
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True, attn_implementation="flash_attention_2"
)
```
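
For reference, a minimal sketch of guarding the new argument so the load still works when `flash-attn` is not installed, falling back to PyTorch's SDPA attention otherwise. The `model_id` and `torch_dtype` values below are placeholders; use the ones from the README example:

```python
import importlib.util

import torch
from transformers import AutoModelForSpeechSeq2Seq

# Placeholder values; substitute the model_id and dtype from the README example.
model_id = "openai/whisper-large-v3"
torch_dtype = torch.float16

# Request Flash Attention 2 only when the flash-attn package is available,
# otherwise fall back to PyTorch's built-in SDPA attention.
attn_implementation = (
    "flash_attention_2"
    if importlib.util.find_spec("flash_attn") is not None
    else "sdpa"
)

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
    use_safetensors=True,
    attn_implementation=attn_implementation,
)
```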

Files changed (1): README.md (+1, -1)
README.md CHANGED

````diff
@@ -271,7 +271,7 @@ To do so, you first need to install [Flash Attention](https://github.com/Dao-AIL
 pip install flash-attn --no-build-isolation
 ```
 
-and then all you have to do is to pass `use_flash_attention_2=True` to `from_pretrained`:
+and then all you have to do is to pass `attn_implementation="flash_attention_2"` to `from_pretrained`:
 
 ```diff
 - model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True)
````