---
license: apache-2.0
inference: false
datasets:
- c4
- wikipedia
language:
- en
pipeline_tag: fill-mask
---
# Perceiver IO masked language model
This model is a Perceiver IO model pretrained on the masked language modeling (MLM) task using a text corpus created
from [C4](https://huggingface.co/datasets/c4) and [English Wikipedia](https://huggingface.co/datasets/wikipedia). It
is weight-equivalent to the [deepmind/language-perceiver](https://huggingface.co/deepmind/language-perceiver) model
but based on implementation classes of the [perceiver-io](https://github.com/krasserm/perceiver-io) library. It can
be created from the `deepmind/language-perceiver` model with a library-specific [conversion utility](#model-conversion).
Both models generate equal output for the same input.
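The equal-output claim can be checked directly. The following is a minimal sketch (not part of the original card) that runs both checkpoints on the same masked input and compares their logits; it assumes the `perceiver-io` library is installed (see [usage examples](#usage-examples)) and calls the `deepmind/language-perceiver` model the way its own model card does:
```python
import torch

from perceiver.model.text import mlm  # auto-class registration
from transformers import AutoModelForMaskedLM, AutoTokenizer
from transformers import PerceiverForMaskedLM, PerceiverTokenizer

text = "This is an incomplete sentence where some words are" + "[MASK]" * 9

# library-based model (this repository)
model = AutoModelForMaskedLM.from_pretrained("krasserm/perceiver-io-mlm")
tokenizer = AutoTokenizer.from_pretrained("krasserm/perceiver-io-mlm")
encoding = tokenizer(text, return_tensors="pt")

# reference model released by the Perceiver IO authors
ref_model = PerceiverForMaskedLM.from_pretrained("deepmind/language-perceiver")
ref_tokenizer = PerceiverTokenizer.from_pretrained("deepmind/language-perceiver")
ref_encoding = ref_tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**encoding).logits
    ref_logits = ref_model(
        inputs=ref_encoding.input_ids, attention_mask=ref_encoding.attention_mask
    ).logits

# the tokenizers are expected to be identical, so the logits should
# agree up to numerical precision
print(torch.allclose(logits, ref_logits, atol=1e-4))
```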
The content of the `deepmind/language-perceiver` [model card](https://huggingface.co/deepmind/language-perceiver)
also applies to this model, except for the [usage examples](#usage-examples). Refer to the linked card for further
model and training details.
## Model description
The model is specified in Section 4 (Table 1) and Appendix F (Table 11) of the [Perceiver IO paper](https://arxiv.org/abs/2107.14795)
(UTF-8 bytes tokenization, vocabulary size of 262, 201M parameters).
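As a quick sanity check (not part of the original card), the stated parameter count can be reproduced by loading the model and summing over its parameters:
```python
from perceiver.model.text import mlm  # auto-class registration
from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained("krasserm/perceiver-io-mlm")
num_params = sum(p.numel() for p in model.parameters())

print(f"{num_params / 1e6:.0f}M parameters")  # expected: roughly 201M
```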
## Intended use
Although the raw model can be [used directly](#usage-examples) for masked language modeling, the main use case is
fine-tuning: either further masked language modeling with whole word masking on an unlabeled dataset
([example](https://huggingface.co/krasserm/perceiver-io-mlm-imdb)), or training on a labeled dataset that uses the
pretrained encoder of this model for weight initialization
([example](https://huggingface.co/krasserm/perceiver-io-txt-clf-imdb)).
## Usage examples
To use this model you first need to [install](https://github.com/krasserm/perceiver-io/blob/main/README.md#installation)
the `perceiver-io` library with extension `text`.
```shell
pip install perceiver-io[text]
```
Then the model can be used with PyTorch. Either use the model and tokenizer directly:
```python
from transformers import AutoModelForMaskedLM, AutoTokenizer
from perceiver.model.text import mlm  # auto-class registration

repo_id = "krasserm/perceiver-io-mlm"

model = AutoModelForMaskedLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

masked_text = "This is an incomplete sentence where some words are" \
              "[MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK]"

encoding = tokenizer(masked_text, return_tensors="pt")
outputs = model(**encoding)

# get predictions for 9 [MASK] tokens (exclude [SEP] token at the end)
masked_token_predictions = outputs.logits[0, -10:-1].argmax(dim=-1)

print(tokenizer.decode(masked_token_predictions))
```
```
missing.
```
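The model tokenizes text into UTF-8 bytes, so each `[MASK]` token stands for exactly one byte. The nine masks above therefore cover the nine bytes of ` missing.` (the leading space plus `missing.`), which is why the masked text is written without a space before the first `[MASK]` and why the decoded prediction begins with a space.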
or use a `fill-mask` pipeline:
```python
from transformers import pipeline
from perceiver.model.text import mlm  # auto-class registration

repo_id = "krasserm/perceiver-io-mlm"

masked_text = "This is an incomplete sentence where some words are" \
              "[MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK]"

filler_pipeline = pipeline("fill-mask", model=repo_id)
masked_token_predictions = filler_pipeline(masked_text)

print("".join([pred[0]["token_str"] for pred in masked_token_predictions]))
```
```
missing.
```
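The `fill-mask` pipeline also accepts the standard `top_k` argument if you want more than the single best byte per `[MASK]` position. A short sketch (not part of the original card):
```python
from perceiver.model.text import mlm  # auto-class registration
from transformers import pipeline

masked_text = "This is an incomplete sentence where some words are" \
              "[MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK]"

filler_pipeline = pipeline("fill-mask", model="krasserm/perceiver-io-mlm")

# top_k returns the k best candidates per [MASK] position,
# each with its predicted byte (token_str) and probability (score)
predictions = filler_pipeline(masked_text, top_k=3)

for position, candidates in enumerate(predictions):
    print(position, [(c["token_str"], round(c["score"], 3)) for c in candidates])
```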
## Model conversion
The `krasserm/perceiver-io-mlm` model has been created from the source `deepmind/language-perceiver` model with:
```python
from perceiver.model.text.mlm import convert_model

convert_model(
    save_dir="krasserm/perceiver-io-mlm",
    source_repo_id="deepmind/language-perceiver",
    push_to_hub=True,
)
```
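If you run the conversion yourself, the converted checkpoint can also be loaded from the local `save_dir` instead of the Hub. A sketch under the assumption that `convert_model` wrote both model and tokenizer files to that directory (not part of the original card):
```python
from perceiver.model.text import mlm  # auto-class registration
from transformers import AutoModelForMaskedLM, AutoTokenizer

# "krasserm/perceiver-io-mlm" here refers to the local save_dir used above;
# transformers loads from the local directory if it exists, otherwise from the Hub
model = AutoModelForMaskedLM.from_pretrained("krasserm/perceiver-io-mlm")
tokenizer = AutoTokenizer.from_pretrained("krasserm/perceiver-io-mlm")
```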
## Citation
```bibtex
@article{jaegle2021perceiver,
title={Perceiver IO: A General Architecture for Structured Inputs \& Outputs},
author={Jaegle, Andrew and Borgeaud, Sebastian and Alayrac, Jean-Baptiste and Doersch, Carl and Ionescu, Catalin and Ding, David and Koppula, Skanda and Zoran, Daniel and Brock, Andrew and Shelhamer, Evan and others},
journal={arXiv preprint arXiv:2107.14795},
year={2021}
}
```