---
license: apache-2.0
inference: false
datasets:
- c4
- wikipedia
language:
- en
pipeline_tag: fill-mask
---

# Perceiver IO masked language model

This model is a Perceiver IO model pretrained on the masked language modeling (MLM) task using a text corpus created
from [C4](https://huggingface.co/datasets/c4) and [English Wikipedia](https://huggingface.co/datasets/wikipedia). It
is weight-equivalent to the [deepmind/language-perceiver](https://huggingface.co/deepmind/language-perceiver) model
but based on implementation classes of the [perceiver-io](https://github.com/krasserm/perceiver-io) library. It can
be created from the `deepmind/language-perceiver` model with a library-specific [conversion utility](#model-conversion).
Both models generate equal output for the same input.

The content of the `deepmind/language-perceiver` [model card](https://huggingface.co/deepmind/language-perceiver)
also applies to this model, except for the [usage examples](#usage-examples). Refer to the linked card for further
model and training details.

## Model description

The model is specified in Section 4 (Table 1) and Appendix F (Table 11) of the [Perceiver IO paper](https://arxiv.org/abs/2107.14795)
(UTF-8 bytes tokenization, vocabulary size of 262, 201M parameters).
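
These figures can be verified from the loaded model. A minimal sanity-check sketch, assuming the checkpoint and
auto-class registration from the [usage examples](#usage-examples) below:

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer
from perceiver.model.text import mlm  # auto-class registration

repo_id = "krasserm/perceiver-io-mlm"

model = AutoModelForMaskedLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# vocabulary size: expected 262 (256 UTF-8 bytes plus special tokens)
print(len(tokenizer))

# parameter count: expected roughly 201M
print(sum(p.numel() for p in model.parameters()))
```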

## Intended use

Although the raw model can be [used directly](#usage-examples) for masked language modeling, the main use case is
fine-tuning: either further masked language modeling with whole word masking on an unlabeled dataset
([example](https://huggingface.co/krasserm/perceiver-io-mlm-imdb)), or training on a labeled dataset, using the
pretrained encoder of this model for weight initialization
([example](https://huggingface.co/krasserm/perceiver-io-txt-clf-imdb)).
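
The linked examples were produced with the `perceiver-io` library's own training scripts. For illustration only, the
first option can also be sketched with plain Hugging Face APIs, using standard random token masking instead of whole
word masking; the dataset choice and hyperparameters are placeholders, and the sketch assumes the model computes the
MLM loss from `labels`, as `AutoModelForMaskedLM` implementations typically do:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from perceiver.model.text import mlm  # auto-class registration

repo_id = "krasserm/perceiver-io-mlm"

model = AutoModelForMaskedLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# unlabeled text corpus (placeholder: unsupervised split of the IMDb dataset)
dataset = load_dataset("imdb", split="unsupervised")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=2048),
    batched=True,
    remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="perceiver-io-mlm-finetuned", num_train_epochs=1),
    train_dataset=dataset,
    # masks 15% of byte tokens at random (not whole word masking)
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
trainer.train()
```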

## Usage examples

To use this model you first need to [install](https://github.com/krasserm/perceiver-io/blob/main/README.md#installation)
the `perceiver-io` library with the `text` extra:

```shell
pip install perceiver-io[text]
```

The model can then be used with PyTorch. Either use the model and tokenizer directly

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer
from perceiver.model.text import mlm  # auto-class registration

repo_id = "krasserm/perceiver-io-mlm"

model = AutoModelForMaskedLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

masked_text = "This is an incomplete sentence where some words are" \
              "[MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK]"

encoding = tokenizer(masked_text, return_tensors="pt")
outputs = model(**encoding)

# get predictions for the 9 [MASK] tokens (excluding the [SEP] token at the end)
masked_token_predictions = outputs.logits[0, -10:-1].argmax(dim=-1)
print(tokenizer.decode(masked_token_predictions))
```
```
missing.
```
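
The prediction confidences can be inspected as well. An illustrative addition that continues the example above (not
part of the original example):

```python
import torch

# per-byte probabilities at the 9 [MASK] positions
probs = torch.softmax(outputs.logits[0, -10:-1], dim=-1)
top = probs.max(dim=-1)

for token_id, prob in zip(top.indices, top.values):
    print(f"{tokenizer.decode([token_id.item()])!r}: {prob.item():.3f}")
```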

or use a `fill-mask` pipeline:

```python
from transformers import pipeline
from perceiver.model.text import mlm  # auto-class registration

repo_id = "krasserm/perceiver-io-mlm"

masked_text = "This is an incomplete sentence where some words are" \
              "[MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK]"

filler_pipeline = pipeline("fill-mask", model=repo_id)
masked_token_predictions = filler_pipeline(masked_text)
print("".join([pred[0]["token_str"] for pred in masked_token_predictions]))
```
```
missing.
```
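
The pipeline can also return alternative predictions per `[MASK]` token via its `top_k` parameter (an illustrative
addition to the example above):

```python
# 3 most likely byte predictions for each of the 9 [MASK] tokens
masked_token_predictions = filler_pipeline(masked_text, top_k=3)

for position, preds in enumerate(masked_token_predictions):
    print(position, [pred["token_str"] for pred in preds])
```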

## Model conversion

The `krasserm/perceiver-io-mlm` model has been created from the source `deepmind/language-perceiver` model with:

```python
from perceiver.model.text.mlm import convert_model

convert_model(
    save_dir="krasserm/perceiver-io-mlm",
    source_repo_id="deepmind/language-perceiver",
    push_to_hub=True,
)
```
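
The claim that both models generate equal output for the same input can be checked with a sketch like the following;
the sample text and numerical tolerance are illustrative:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer
from perceiver.model.text import mlm  # auto-class registration

tokenizer = AutoTokenizer.from_pretrained("krasserm/perceiver-io-mlm")
src_model = AutoModelForMaskedLM.from_pretrained("deepmind/language-perceiver").eval()
tgt_model = AutoModelForMaskedLM.from_pretrained("krasserm/perceiver-io-mlm").eval()

encoding = tokenizer("This is an incomplete sentence.", return_tensors="pt")

with torch.no_grad():
    src_logits = src_model(**encoding).logits
    tgt_logits = tgt_model(**encoding).logits

# expected: True (both models generate equal output for the same input)
print(torch.allclose(src_logits, tgt_logits, atol=1e-6))
```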

## Citation

```bibtex
@article{jaegle2021perceiver,
  title={Perceiver IO: A General Architecture for Structured Inputs \& Outputs},
  author={Jaegle, Andrew and Borgeaud, Sebastian and Alayrac, Jean-Baptiste and Doersch, Carl and Ionescu, Catalin and Ding, David and Koppula, Skanda and Zoran, Daniel and Brock, Andrew and Shelhamer, Evan and others},
  journal={arXiv preprint arXiv:2107.14795},
  year={2021}
}
```