---
license: apache-2.0
inference: false
datasets:
- c4
- wikipedia
language:
- en
pipeline_tag: fill-mask
---
# Perceiver IO masked language model
This model is a Perceiver IO model pretrained on the masked language modeling (MLM) task using a text corpus created
from [C4](https://huggingface.co/datasets/c4) and [English Wikipedia](https://huggingface.co/datasets/wikipedia). It
is weight-equivalent to the [deepmind/language-perceiver](https://huggingface.co/deepmind/language-perceiver) model
but based on implementation classes of the [perceiver-io](https://github.com/krasserm/perceiver-io) library. It can
be created from the `deepmind/language-perceiver` model with a library-specific [conversion utility](#model-conversion).
Both models generate equal output for the same input.
The content of the `deepmind/language-perceiver` [model card](https://huggingface.co/deepmind/language-perceiver)
also applies to this model, except for the [usage examples](#usage-examples). Refer to the linked card for further
model and training details.
## Model description
The model is specified in Section 4 (Table 1) and Appendix F (Table 11) of the [Perceiver IO paper](https://arxiv.org/abs/2107.14795)
(UTF-8 bytes tokenization, vocabulary size of 262, 201M parameters).
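These numbers can be checked directly against the published checkpoint. The following is a minimal sketch that only
uses the `transformers` and `perceiver-io` APIs shown in the [usage examples](#usage-examples) below:

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer
from perceiver.model.text import mlm  # auto-class registration

repo_id = "krasserm/perceiver-io-mlm"
model = AutoModelForMaskedLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# total number of parameters, expected to be roughly 201M
num_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {num_params / 1e6:.0f}M")

# byte-level vocabulary (256 UTF-8 byte values plus special tokens), expected to be 262
print(f"vocabulary size: {tokenizer.vocab_size}")
```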
## Intended use
Although the raw model can be [used directly](#usage-examples) for masked language modeling, the main use case is
fine-tuning: either fine-tuning with masked language modeling and whole word masking on an unlabeled dataset
([example](https://huggingface.co/krasserm/perceiver-io-mlm-imdb)), or fine-tuning on a labeled dataset, using the
pretrained encoder of this model for weight initialization ([example](https://huggingface.co/krasserm/perceiver-io-txt-clf-imdb)).
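As an illustration only, the following sketch fine-tunes the model with plain masked language modeling via the
standard `transformers` `Trainer`. This is not the workflow used in the linked examples (those rely on the
`perceiver-io` training scripts and whole word masking); the IMDb unsupervised split, masking probability and other
hyperparameters are placeholders.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from perceiver.model.text import mlm  # auto-class registration

repo_id = "krasserm/perceiver-io-mlm"
model = AutoModelForMaskedLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# placeholder corpus; replace with your own unlabeled dataset
dataset = load_dataset("imdb", split="unsupervised")

def tokenize(batch):
    # UTF-8 bytes tokenization, truncated to the model's 2048-byte input length
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

# random byte masking; the linked MLM example uses whole word masking instead
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="perceiver-io-mlm-imdb", per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```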
## Usage examples
To use this model you first need to [install](https://github.com/krasserm/perceiver-io/blob/main/README.md#installation)
the `perceiver-io` library with the `text` extra:
```shell
pip install perceiver-io[text]
```
The model can then be used with PyTorch. Either use the model and tokenizer directly:
```python
from transformers import AutoModelForMaskedLM, AutoTokenizer
from perceiver.model.text import mlm # auto-class registration
repo_id = "krasserm/perceiver-io-mlm"
model = AutoModelForMaskedLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)
masked_text = "This is an incomplete sentence where some words are" \
"[MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK]"
encoding = tokenizer(masked_text, return_tensors="pt")
outputs = model(**encoding)
# get predictions for 9 [MASK] tokens (exclude [SEP] token at the end)
masked_token_predictions = outputs.logits[0, -10:-1].argmax(dim=-1)
print(tokenizer.decode(masked_token_predictions))
```
```
missing.
```
or use a `fill-mask` pipeline:
```python
from transformers import pipeline
from perceiver.model.text import mlm # auto-class registration
repo_id = "krasserm/perceiver-io-mlm"
masked_text = "This is an incomplete sentence where some words are" \
"[MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK]"
filler_pipeline = pipeline("fill-mask", model=repo_id)
masked_token_predictions = filler_pipeline(masked_text)
print("".join([pred[0]["token_str"] for pred in masked_token_predictions]))
```
```
missing.
```
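The pipeline can also return several candidates per masked byte. A small follow-up sketch, assuming the same model
and masked input as above:

```python
from transformers import pipeline
from perceiver.model.text import mlm  # auto-class registration

filler_pipeline = pipeline("fill-mask", model="krasserm/perceiver-io-mlm")
masked_text = "This is an incomplete sentence where some words are" \
              "[MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK]"

# top-5 byte candidates for the first [MASK] position
for pred in filler_pipeline(masked_text, top_k=5)[0]:
    print(pred["token_str"], round(pred["score"], 3))
```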
## Model conversion
The `krasserm/perceiver-io-mlm` model has been created from the source `deepmind/language-perceiver` model with:
```python
from perceiver.model.text.mlm import convert_model
convert_model(
    save_dir="krasserm/perceiver-io-mlm",
    source_repo_id="deepmind/language-perceiver",
    push_to_hub=True,
)
```
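Since the two checkpoints are weight-equivalent, the conversion can be checked by comparing model outputs on the same
input, as stated above. A minimal verification sketch; the input text and tolerance are arbitrary choices:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer, PerceiverForMaskedLM
from perceiver.model.text import mlm  # auto-class registration

tokenizer = AutoTokenizer.from_pretrained("krasserm/perceiver-io-mlm")
encoding = tokenizer(
    "This is a sample input used to compare both models.",
    padding="max_length",
    return_tensors="pt",
)

converted = AutoModelForMaskedLM.from_pretrained("krasserm/perceiver-io-mlm").eval()
source = PerceiverForMaskedLM.from_pretrained("deepmind/language-perceiver").eval()

with torch.no_grad():
    converted_logits = converted(**encoding).logits
    source_logits = source(inputs=encoding.input_ids, attention_mask=encoding.attention_mask).logits

# both models should generate (numerically) equal output for the same input
print(torch.allclose(converted_logits, source_logits, atol=1e-5))  # expected: True
```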
## Citation
```bibtex
@article{jaegle2021perceiver,
title={Perceiver IO: A General Architecture for Structured Inputs \& Outputs},
author={Jaegle, Andrew and Borgeaud, Sebastian and Alayrac, Jean-Baptiste and Doersch, Carl and Ionescu, Catalin and Ding, David and Koppula, Skanda and Zoran, Daniel and Brock, Andrew and Shelhamer, Evan and others},
journal={arXiv preprint arXiv:2107.14795},
year={2021}
}
``` |