---
license: apache-2.0
inference: false
datasets:
- c4
- wikipedia
language:
- en
pipeline_tag: fill-mask
---

# Perceiver IO masked language model

This model is a Perceiver IO model pretrained on the masked language modeling (MLM) task using a text corpus created
from [C4](https://huggingface.co/datasets/c4) and [English Wikipedia](https://huggingface.co/datasets/wikipedia). It
is weight-equivalent to the [deepmind/language-perceiver](https://huggingface.co/deepmind/language-perceiver) model
but based on implementation classes of the [perceiver-io](https://github.com/krasserm/perceiver-io) library. It can
be created from the `deepmind/language-perceiver` model with a library-specific [conversion utility](#model-conversion).
Both models generate equal output for the same input.

The content of the `deepmind/language-perceiver` [model card](https://huggingface.co/deepmind/language-perceiver)
also applies to this model, except for the [usage examples](#usage-examples). Refer to the linked card for further
model and training details.

## Model description

The model is specified in Section 4 (Table 1) and Appendix F (Table 11) of the [Perceiver IO paper](https://arxiv.org/abs/2107.14795)
(UTF-8 bytes tokenization, vocabulary size of 262, 201M parameters).
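
These figures can be verified from the loaded model. A minimal sanity-check sketch, assuming the checkpoint and
auto-class registration from the [usage examples](#usage-examples) below:

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer
from perceiver.model.text import mlm  # auto-class registration

repo_id = "krasserm/perceiver-io-mlm"

model = AutoModelForMaskedLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# vocabulary size: expected 262 (256 UTF-8 bytes plus special tokens)
print(len(tokenizer))

# parameter count: expected roughly 201M
print(sum(p.numel() for p in model.parameters()))
```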

## Intended use

Although the raw model can be [used directly](#usage-examples) for masked language modeling, the main use case is
fine-tuning: either further masked language modeling with whole word masking on an unlabeled dataset
([example](https://huggingface.co/krasserm/perceiver-io-mlm-imdb)), or training on a labeled dataset, using the
pretrained encoder of this model for weight initialization
([example](https://huggingface.co/krasserm/perceiver-io-txt-clf-imdb)).
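
The linked examples were produced with the `perceiver-io` library's own training scripts. For illustration only, the
first option can also be sketched with plain Hugging Face APIs, using standard random token masking instead of whole
word masking; the dataset choice and hyperparameters are placeholders, and the sketch assumes the model computes the
MLM loss from `labels`, as `AutoModelForMaskedLM` implementations typically do:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from perceiver.model.text import mlm  # auto-class registration

repo_id = "krasserm/perceiver-io-mlm"

model = AutoModelForMaskedLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# unlabeled text corpus (placeholder: unsupervised split of the IMDb dataset)
dataset = load_dataset("imdb", split="unsupervised")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=2048),
    batched=True,
    remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="perceiver-io-mlm-finetuned", num_train_epochs=1),
    train_dataset=dataset,
    # masks 15% of byte tokens at random (not whole word masking)
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
trainer.train()
```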

## Usage examples

To use this model you first need to [install](https://github.com/krasserm/perceiver-io/blob/main/README.md#installation)
the `perceiver-io` library with the `text` extra:

```shell
pip install perceiver-io[text]
```

The model can then be used with PyTorch. Either use the model and tokenizer directly

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer
from perceiver.model.text import mlm  # auto-class registration

repo_id = "krasserm/perceiver-io-mlm"

model = AutoModelForMaskedLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

masked_text = "This is an incomplete sentence where some words are" \
              "[MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK]"

encoding = tokenizer(masked_text, return_tensors="pt")
outputs = model(**encoding)

# get predictions for the 9 [MASK] tokens (excluding the [SEP] token at the end)
masked_token_predictions = outputs.logits[0, -10:-1].argmax(dim=-1)
print(tokenizer.decode(masked_token_predictions))
```
```
missing.
```
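
The prediction confidences can be inspected as well. An illustrative addition that continues the example above (not
part of the original example):

```python
import torch

# per-byte probabilities at the 9 [MASK] positions
probs = torch.softmax(outputs.logits[0, -10:-1], dim=-1)
top = probs.max(dim=-1)

for token_id, prob in zip(top.indices, top.values):
    print(f"{tokenizer.decode([token_id.item()])!r}: {prob.item():.3f}")
```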

or use a `fill-mask` pipeline:

```python
from transformers import pipeline
from perceiver.model.text import mlm  # auto-class registration

repo_id = "krasserm/perceiver-io-mlm"

masked_text = "This is an incomplete sentence where some words are" \
              "[MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK]"

filler_pipeline = pipeline("fill-mask", model=repo_id)
masked_token_predictions = filler_pipeline(masked_text)
print("".join([pred[0]["token_str"] for pred in masked_token_predictions]))
```
```
missing.
```
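
The pipeline can also return alternative predictions per `[MASK]` token via its `top_k` parameter (an illustrative
addition to the example above):

```python
# 3 most likely byte predictions for each of the 9 [MASK] tokens
masked_token_predictions = filler_pipeline(masked_text, top_k=3)

for position, preds in enumerate(masked_token_predictions):
    print(position, [pred["token_str"] for pred in preds])
```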

## Model conversion

The `krasserm/perceiver-io-mlm` model has been created from the source `deepmind/language-perceiver` model with:

```python
from perceiver.model.text.mlm import convert_model

convert_model(
    save_dir="krasserm/perceiver-io-mlm",
    source_repo_id="deepmind/language-perceiver",
    push_to_hub=True,
)
```
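
The claim that both models generate equal output for the same input can be checked with a sketch like the following;
the sample text and numerical tolerance are illustrative:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer
from perceiver.model.text import mlm  # auto-class registration

tokenizer = AutoTokenizer.from_pretrained("krasserm/perceiver-io-mlm")
src_model = AutoModelForMaskedLM.from_pretrained("deepmind/language-perceiver").eval()
tgt_model = AutoModelForMaskedLM.from_pretrained("krasserm/perceiver-io-mlm").eval()

encoding = tokenizer("This is an incomplete sentence.", return_tensors="pt")

with torch.no_grad():
    src_logits = src_model(**encoding).logits
    tgt_logits = tgt_model(**encoding).logits

# expected: True (both models generate equal output for the same input)
print(torch.allclose(src_logits, tgt_logits, atol=1e-6))
```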

## Citation

```bibtex
@article{jaegle2021perceiver,
  title={Perceiver IO: A General Architecture for Structured Inputs \& Outputs},
  author={Jaegle, Andrew and Borgeaud, Sebastian and Alayrac, Jean-Baptiste and Doersch, Carl and Ionescu, Catalin and Ding, David and Koppula, Skanda and Zoran, Daniel and Brock, Andrew and Shelhamer, Evan and others},
  journal={arXiv preprint arXiv:2107.14795},
  year={2021}
}
```