Enables toggling fused_dense, flash_rotary and attn_pdrop in the configuration. 45f4b21 gugarosa committed on Nov 1, 2023
Adds support for MQA/GQA and attention mask during training. de35f90 gugarosa committed on Oct 30, 2023