phi-1_5 / modeling_mixformer_sequential.py

Commit History

Fixes flash-attn import with a try/except statement
0254d42

gugarosa committed on
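The commit above guards the flash-attn import so the model still loads when the library is absent. A minimal sketch of that pattern follows; the exact symbols imported in modeling_mixformer_sequential.py may differ, and `FLASH_ATTN_AVAILABLE` is an illustrative name, not one taken from the source:

```python
# Guarded optional import: use flash-attn's fused kernel when it is
# installed, otherwise fall back to the standard attention path.
try:
    # flash_attn_func is the flash-attn v2 entry point; treated here as
    # an assumption about which symbol the model imports.
    from flash_attn.flash_attn_interface import flash_attn_func
    FLASH_ATTN_AVAILABLE = True
except ImportError:
    flash_attn_func = None  # caller checks this before taking the fast path
    FLASH_ATTN_AVAILABLE = False
```

The try/except keeps flash-attn a soft dependency: environments without a compatible GPU or wheel simply skip the fused kernel instead of failing at import time.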

Adds support for flash-attn rotary embedding and fused dense layers.
0bbd68a

gugarosa committed on

Adds support for MQA/GQA and attention mask during training.
de35f90

gugarosa committed on

Update modeling_mixformer_sequential.py
d38e6f9

gugarosa committed on

Adding `_set_gradient_checkpointing` for compatibility (#22)
8091327

gugarosa and vriveras committed on

Upload modeling_mixformer_sequential.py
b6a7e2f

gugarosa committed on

fix(phi-1_5): Checks length of `attention_mask` if it is passed as a direct tensor.
f9f2ac7

gugarosa committed on

Support for `attention_mask` in forward pass.
3128bb6

gugarosa committed on

Upload MixFormerSequentialForCausalLM
d655135

suriyagunasekar committed on

Upload MixFormerSequentialForCausalLM
1698206

suriyagunasekar committed on