Adds support for flash-attn rotary embedding and fused dense layers. 0bbd68a gugarosa committed on Nov 1, 2023
Adds support for MQA/GQA and attention mask during training. de35f90 gugarosa committed on Oct 30, 2023
Adds _set_gradient_checkpointing for compatibility (#22) 8091327 gugarosa vriveras committed on Oct 17, 2023
fix(phi-1_5): Checks length of `attention_mask` if it is passed as a direct tensor. f9f2ac7 gugarosa committed on Sep 26, 2023