---
license: bsd-3-clause
datasets:
- bookcorpus
- wikipedia
- openwebtext
---

# FlexiBERT-Mini model

Pretrained model on the English language using a masked language modeling (MLM) objective. It was found by executing a neural architecture search (NAS) over a design space of ~3.32 billion *flexible* and *heterogeneous* transformer architectures in [this paper](https://arxiv.org/abs/2205.11656). The model is case-sensitive.

# Model description

The model uses diverse attention operations, including traditional self-attention and the discrete cosine transform (DCT). The design space also supports weighted multiplicative attention (WMA), the discrete Fourier transform (DFT), and convolution operations within the same transformer model, along with different hidden dimensions for each encoder layer. An illustrative encoding of such a heterogeneous model is sketched below.
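
For illustration, a heterogeneous architecture from this design space can be described with a compact model dictionary (the same encoding used in the usage example below). The sketch here is hypothetical: the operation codes `'wma'` and `'dft'` are assumptions based on the paper's description of the design space, not a released checkpoint.

```python
# Hypothetical 2-layer model from the FlexiBERT design space: a WMA
# self-attention layer followed by a DFT-based linear-transform layer,
# with different hidden dimensions and head counts per layer.
hetero_dict = {'l': 2,
               'o': ['sa', 'l'],            # self-attention, then linear transform
               'h': [128, 256],             # per-layer hidden dimensions
               'n': [2, 4],                 # operation heads per layer
               'f': [[512], [1024, 1024]],  # feed-forward stacks
               'p': ['wma', 'dft']}         # operation parameters (assumed codes)
```

Such a dictionary would be passed to `FlexiBERTConfig.from_model_dict`, exactly as shown in the usage example below.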

# How to use

This model should be fine-tuned on a downstream task. Other models within the FlexiBERT design space can be generated using a model dictionary. See this [GitHub repo](https://github.com/JHA-Lab/txf_design-space) for more details. To instantiate a fresh FlexiBERT-Mini model (for pre-training using the MLM objective):

```python
from transformers import FlexiBERTConfig, FlexiBERTModel, FlexiBERTForMaskedLM

config = FlexiBERTConfig()

# FlexiBERT-Mini model dictionary: 'l' = number of encoder layers, 'o' = operation
# type per layer, 'h' = hidden dimension, 'n' = number of operation heads, 'f' =
# feed-forward dimensions, and 'p' = operation parameters ('sdp' = scaled
# dot-product self-attention, 'dct' = discrete cosine transform).
model_dict = {'l': 4, 'o': ['sa', 'sa', 'l', 'l'], 'h': [256, 256, 128, 128], 'n': [2, 2, 4, 4],
      'f': [[512, 512, 512], [512, 512, 512], [1024], [1024]], 'p': ['sdp', 'sdp', 'dct', 'dct']}
config.from_model_dict(model_dict)
model = FlexiBERTForMaskedLM(config)
```
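
Continuing from the snippet above, the instantiated model can be sanity-checked on dummy token ids. This is a minimal sketch, assuming the FlexiBERT fork keeps the standard `transformers` masked-LM interface (`input_ids` and `labels` in; an output object with `loss` and `logits`) and that `FlexiBERTConfig` exposes a `vocab_size` attribute as its BERT counterpart does. For real pre-training or fine-tuning, use ids from an actual tokenizer and a proper masking strategy.

```python
import torch

# Dummy (batch, sequence) token ids; replace with tokenizer output in practice.
input_ids = torch.randint(0, config.vocab_size, (2, 16))
labels = input_ids.clone()  # MLM targets; normally only masked positions are labeled

outputs = model(input_ids=input_ids, labels=labels)
print(outputs.loss, outputs.logits.shape)
```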

# Developer

[Shikhar Tuli](https://github.com/shikhartuli). For any questions, comments or suggestions, please reach me at [[email protected]](mailto:[email protected]).

# Cite this work

Cite our work using the following BibTeX entry:
```bibtex
@article{tuli2022jair,
      title={{FlexiBERT}: Are Current Transformer Architectures too Homogeneous and Rigid?}, 
      author={Tuli, Shikhar and Dedhia, Bhishma and Tuli, Shreshth and Jha, Niraj K.},
      year={2022},
      eprint={2205.11656},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```

# License

BSD-3-Clause. 
Copyright (c) 2022, Shikhar Tuli and Jha Lab.
All rights reserved.

See the LICENSE file for more details.