---
license: apache-2.0
inference: false
datasets:
- c4
- wikipedia
language:
- en
pipeline_tag: fill-mask
---

# Perceiver IO masked language model

This model is a Perceiver IO model pretrained on the masked language modeling (MLM) task using a text corpus created
from [C4](https://huggingface.co/datasets/c4) and [English Wikipedia](https://huggingface.co/datasets/wikipedia). It
is weight-equivalent to the [deepmind/language-perceiver](https://huggingface.co/deepmind/language-perceiver) model
but is based on the implementation classes of the [perceiver-io](https://github.com/krasserm/perceiver-io) library. It can
be created from the `deepmind/language-perceiver` model with a library-specific [conversion utility](#model-conversion).
Both models produce identical outputs for the same input.
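
Output equivalence can be verified with a minimal sketch like the following (it requires the `perceiver-io` library
to be installed, see [usage examples](#usage-examples); the input text is arbitrary, and the comparison tolerance
accounts for numerical noise):

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer
from perceiver.model.text import mlm  # auto-class registration

tokenizer = AutoTokenizer.from_pretrained("krasserm/perceiver-io-mlm")
encoding = tokenizer("Perceiver IO processes raw UTF-8 bytes.", return_tensors="pt")

source = AutoModelForMaskedLM.from_pretrained("deepmind/language-perceiver").eval()
target = AutoModelForMaskedLM.from_pretrained("krasserm/perceiver-io-mlm").eval()

with torch.no_grad():
    source_logits = source(**encoding).logits
    target_logits = target(**encoding).logits

# weight-equivalent models should agree up to numerical noise
print(torch.allclose(source_logits, target_logits, atol=1e-6))
```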

The content of the `deepmind/language-perceiver` [model card](https://huggingface.co/deepmind/language-perceiver)
also applies to this model, except for the [usage examples](#usage-examples). Refer to the linked card for further
model and training details.

## Model description

The model is specified in Section 4 (Table 1) and Appendix F (Table 11) of the [Perceiver IO paper](https://arxiv.org/abs/2107.14795)
(UTF-8 bytes tokenization, vocabulary size of 262, 201M parameters).
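
Assuming the `perceiver-io` library is installed (see [usage examples](#usage-examples)), these numbers can be
sanity-checked directly from the loaded model and tokenizer:

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer
from perceiver.model.text import mlm  # auto-class registration

repo_id = "krasserm/perceiver-io-mlm"
model = AutoModelForMaskedLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# 262 = 256 UTF-8 byte values + 6 special tokens
print(tokenizer.vocab_size)

num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.0f}M parameters")  # ~201M
```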

## Intended use

Although the raw model can be [used directly](#usage-examples) for masked language modeling, the main use case is
fine-tuning: either continued masked language modeling with whole word masking on an unlabeled dataset
([example](https://huggingface.co/krasserm/perceiver-io-mlm-imdb)), or fine-tuning on a labeled dataset, using the
pretrained encoder of this model for weight initialization ([example](https://huggingface.co/krasserm/perceiver-io-txt-clf-imdb)).
A rough sketch of the first option is shown below.
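
The following is a minimal sketch, not the training setup of the linked examples: it assumes the registered model
follows the standard `transformers` masked-LM interface (accepting `labels` and returning a loss) and uses random
byte masking via `DataCollatorForLanguageModeling` instead of whole word masking. IMDb is chosen to match the
linked examples; the `perceiver-io` library provides its own training scripts.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from perceiver.model.text import mlm  # auto-class registration

repo_id = "krasserm/perceiver-io-mlm"
model = AutoModelForMaskedLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# unlabeled IMDb reviews, tokenized to UTF-8 bytes (max sequence length 2048)
dataset = load_dataset("imdb", split="unsupervised")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=2048),
    batched=True,
    remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="perceiver-io-mlm-imdb", per_device_train_batch_size=2),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
trainer.train()
```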

## Usage examples

To use this model you first need to [install](https://github.com/krasserm/perceiver-io/blob/main/README.md#installation) 
the `perceiver-io` library with extension `text`.

```shell
pip install perceiver-io[text]
```

The model can then be used with PyTorch. Either use the model and tokenizer directly

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer
from perceiver.model.text import mlm  # auto-class registration

repo_id = "krasserm/perceiver-io-mlm"

model = AutoModelForMaskedLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

masked_text = "This is an incomplete sentence where some words are" \
              "[MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK]"

encoding = tokenizer(masked_text, return_tensors="pt")
outputs = model(**encoding)

# get predictions for 9 [MASK] tokens (exclude [SEP] token at the end)
masked_token_predictions = outputs.logits[0, -10:-1].argmax(dim=-1)
print(tokenizer.decode(masked_token_predictions))
```
```
 missing.
```

or use a `fill-mask` pipeline:

```python
from transformers import pipeline
from perceiver.model.text import mlm  # auto-class registration

repo_id = "krasserm/perceiver-io-mlm"

masked_text = "This is an incomplete sentence where some words are" \
              "[MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK]"

filler_pipeline = pipeline("fill-mask", model=repo_id)
masked_token_predictions = filler_pipeline(masked_text)
print("".join([pred[0]["token_str"] for pred in masked_token_predictions]))
```
```
 missing.
```

## Model conversion

The `krasserm/perceiver-io-mlm` model has been created from the source `deepmind/language-perceiver` model with: 

```python
from perceiver.model.text.mlm import convert_model

convert_model(
    save_dir="krasserm/perceiver-io-mlm",
    source_repo_id="deepmind/language-perceiver",
    push_to_hub=True,
)
```

## Citation

```bibtex
@article{jaegle2021perceiver,
  title={Perceiver IO: A General Architecture for Structured Inputs \& Outputs},
  author={Jaegle, Andrew and Borgeaud, Sebastian and Alayrac, Jean-Baptiste and Doersch, Carl and Ionescu, Catalin and Ding, David and Koppula, Skanda and Zoran, Daniel and Brock, Andrew and Shelhamer, Evan and others},
  journal={arXiv preprint arXiv:2107.14795},
  year={2021}
}
```