
FlexiBERT-Mini model

Pretrained model on the English language using a masked language modeling (MLM) objective. It was found by running a neural architecture search (NAS) over a design space of ~3.32 billion flexible and heterogeneous transformer architectures, as described in this paper. The model is case sensitive.
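If you are new to the MLM objective, the sketch below shows the standard BERT-style masking recipe in plain Python. About 15% of positions are selected. Of those, 80% become a mask token, 10% become a random vocabulary token, and 10% stay unchanged. This is an illustrative assumption: the card does not state FlexiBERT's exact masking ratios, and `mask_tokens` and `MASK_TOKEN` are hypothetical names.

```python
import random

MASK_TOKEN = "[MASK]"  # hypothetical mask token string for this sketch


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """BERT-style MLM masking (a sketch, not taken from the FlexiBERT code).

    Each position is selected with probability `mask_prob`. A selected token is
    replaced by MASK_TOKEN 80% of the time, by a random vocabulary token 10% of
    the time, and left unchanged 10% of the time. Returns the corrupted inputs
    and the labels: the original token at selected positions, None elsewhere
    (unselected positions do not contribute to the MLM loss).
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must predict the original token here
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK_TOKEN)
            elif r < 0.9:
                inputs.append(rng.choice(vocab))
            else:
                inputs.append(tok)
        else:
            labels.append(None)
            inputs.append(tok)
    return inputs, labels


sentence = "the model is pretrained on english text".split()
corrupted, targets = mask_tokens(sentence, vocab=sentence, mask_prob=0.5, seed=0)
print(corrupted)
print(targets)
```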

Model description

The model combines heterogeneous attention heads, including traditional self-attention and the discrete cosine transform (DCT). The design space also supports weighted multiplicative attention (WMA), the discrete Fourier transform (DFT), and convolution operations within the same transformer model, along with a different hidden dimension for each encoder layer.
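A linear-transform head such as DCT mixes tokens without learned attention weights. The sketch below illustrates the idea with an orthonormal DCT-II applied along the hidden and sequence axes of a (sequence length x hidden dimension) matrix, in the style of FNet's Fourier mixing. It is an assumption for illustration, not FlexiBERT's implementation, which may apply the transform per head or along different axes. `dct_ii` and `dct_mixing` are hypothetical names.

```python
import math


def dct_ii(x):
    """Orthonormal DCT-II of a 1-D sequence (direct O(N^2) form)."""
    n = len(x)
    out = []
    for k in range(n):
        # Orthonormal scaling makes the transform matrix orthogonal.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
        out.append(scale * total)
    return out


def dct_mixing(hidden):
    """Mix a (seq_len x hidden_dim) matrix with DCTs along both axes.

    Sketch of a parameter-free, FNet-style mixing step with the DCT in
    place of the DFT; FlexiBERT's own DCT operation may differ in detail.
    """
    # DCT over the hidden dimension, one token (row) at a time.
    rows = [dct_ii(row) for row in hidden]
    # DCT over the sequence dimension, one feature (column) at a time.
    cols = [dct_ii(list(col)) for col in zip(*rows)]
    # Transpose back to (seq_len x hidden_dim).
    return [list(row) for row in zip(*cols)]


x = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 0.0, 2.0], [3.0, 3.0, 3.0, 3.0]]
y = dct_mixing(x)
```

Because the orthonormal DCT is an orthogonal transform, this mixing step preserves the squared norm of its input and adds no trainable parameters.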

How to use

This model should be fine-tuned on a downstream task. Other models within the FlexiBERT design space can be generated using a model dictionary; see this GitHub repo for more details. To instantiate a fresh FlexiBERT-Mini model (for pre-training with the MLM objective):

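# FlexiBERT classes come from the FlexiBERT GitHub repo, not upstream transformers
from transformers import FlexiBERTConfig, FlexiBERTForMaskedLM

config = FlexiBERTConfig()
# Per-layer lists: 'o' operation ('sa' self-attention, 'l' linear transform), 'h' hidden dim,
# 'n' heads, 'f' feed-forward layer widths, 'p' operation parameter ('sdp' scaled dot-product, 'dct')
model_dict = {'l': 4,  # number of encoder layers
              'o': ['sa', 'sa', 'l', 'l'], 'h': [256, 256, 128, 128], 'n': [2, 2, 4, 4],
              'f': [[512, 512, 512], [512, 512, 512], [1024], [1024]], 'p': ['sdp', 'sdp', 'dct', 'dct']}
config.from_model_dict(model_dict)  # load the FlexiBERT-Mini architecture into the config
model = FlexiBERTForMaskedLM(config)  # randomly initialized, ready for MLM pre-training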
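Every per-layer list in the model dictionary needs one entry per encoder layer. The hypothetical helper below is not part of the FlexiBERT code base. It checks that invariant before a config is built and returns each layer's settings as an (operation, hidden, heads, feed_forward, parameter) tuple. The key meanings follow our reading of the FlexiBERT-Mini dictionary.

```python
# Hypothetical helper (not part of the FlexiBERT code base).
PER_LAYER_KEYS = ("o", "h", "n", "f", "p")


def layer_specs(model_dict):
    """Validate a FlexiBERT-style model dictionary and return per-layer tuples."""
    num_layers = model_dict["l"]
    for key in PER_LAYER_KEYS:
        if len(model_dict[key]) != num_layers:
            raise ValueError(
                f"model_dict[{key!r}] has {len(model_dict[key])} entries, "
                f"expected {num_layers} (one per encoder layer)"
            )
    # zip the per-layer lists so each tuple describes one encoder layer
    return list(zip(*(model_dict[key] for key in PER_LAYER_KEYS)))


model_dict = {'l': 4, 'o': ['sa', 'sa', 'l', 'l'], 'h': [256, 256, 128, 128], 'n': [2, 2, 4, 4],
              'f': [[512, 512, 512], [512, 512, 512], [1024], [1024]], 'p': ['sdp', 'sdp', 'dct', 'dct']}

for i, (op, hidden, heads, ff, param) in enumerate(layer_specs(model_dict)):
    print(f"layer {i}: op={op}, hidden={hidden}, heads={heads}, ff={ff}, param={param}")
```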

Developer

Shikhar Tuli. For any questions, comments or suggestions, please reach me at [email protected].

Cite this work

Cite our work using the following BibTeX entry:

@article{tuli2022jair,
      title={{FlexiBERT}: Are Current Transformer Architectures too Homogeneous and Rigid?}, 
      author={Tuli, Shikhar and Dedhia, Bhishma and Tuli, Shreshth and Jha, Niraj K.},
      year={2022},
      eprint={2205.11656},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

License

BSD-3-Clause. Copyright (c) 2022, Shikhar Tuli and Jha Lab. All rights reserved.

See License file for more details.
