---
language: en
thumbnail: https://uploads-ssl.webflow.com/5e3898dff507782a6580d710/614a23fcd8d4f7434c765ab9_logo.png
license: mit
---
# LayoutLM for Visual Question Answering
This is a fine-tuned version of the multi-modal [LayoutLM](https://aka.ms/layoutlm) model for the task of question answering on documents.
## Model details
The LayoutLM model was developed at Microsoft ([paper](https://arxiv.org/abs/1912.13318)) as a general-purpose tool for understanding documents. This model is a fine-tuned checkpoint of [LayoutLM-Base-Cased](https://huggingface.co/microsoft/layoutlm-base-uncased), trained on both the [SQuAD2.0](https://huggingface.co/datasets/squad_v2) and [DocVQA](https://www.docvqa.org/) datasets.
## Getting started with the model
To run these examples, you must have [PIL](https://pillow.readthedocs.io/en/stable/installation.html), [pytesseract](https://pypi.org/project/pytesseract/), and [PyTorch](https://pytorch.org/get-started/locally/) installed in addition to [transformers](https://huggingface.co/docs/transformers/index).
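Before running the examples, it can help to confirm that the packages listed above are importable. The following is a minimal sketch, not part of the model's own tooling: the helper name `missing_modules` and the `REQUIRED_MODULES` list are illustrative assumptions, and the check uses import names (`PIL` for Pillow, `torch` for PyTorch) rather than PyPI package names.

```python
from importlib.util import find_spec

# Import names (not PyPI names) of the packages listed above.
# Pillow is imported as "PIL" and PyTorch as "torch".
REQUIRED_MODULES = ["PIL", "pytesseract", "torch", "transformers"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of `names` that cannot be found in this environment.

    Hypothetical helper for illustration; it only checks that each top-level
    module can be located, not that its version is compatible.
    """
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Note that pytesseract is a wrapper around the Tesseract OCR engine, which has to be installed separately; this check does not cover it.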
```python
from transformers import AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained(
    "impira/layoutlm-document-qa",
    add_prefix_space=True,
    trust_remote_code=True,
)

nlp = pipeline(
    model="impira/layoutlm-document-qa",
    tokenizer=tokenizer,
    trust_remote_code=True,
)

nlp(
    "https://templates.invoicehome.com/invoice-template-us-neat-750px.png",
    "What is the invoice number?"
)
# {'score': 0.9943977, 'answer': 'us-001', 'start': 15, 'end': 15}

nlp(
    "https://miro.medium.com/max/787/1*iECQRIiOGTmEFLdWkVIH2g.jpeg",
    "What is the purchase amount?"
)
# {'score': 0.9912159, 'answer': '$1,000,000,000', 'start': 97, 'end': 97}

nlp(
    "https://www.accountingcoach.com/wp-content/uploads/2013/10/[email protected]",
    "What are the 2020 net sales?"
)
# {'score': 0.59147286, 'answer': '$ 3,750', 'start': 19, 'end': 20}
```
```
**NOTE**: This model relies on a [model definition](https://github.com/huggingface/transformers/pull/18407) and [pipeline](https://github.com/huggingface/transformers/pull/18414) that are currently in review to be included in the transformers project. In the meantime, you'll have to use the `trust_remote_code=True` flag to run this model.
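The example outputs above include a `score` alongside each `answer`, and the third example scores noticeably lower than the first two. One way to act on that is to discard low-scoring answers. The sketch below is an illustration only: the helper `confident_answer` and the `0.9` threshold are assumptions, not part of the model or of transformers, and it assumes the result is either a dictionary shaped like the outputs above or a list of such dictionaries.

```python
def confident_answer(result, threshold=0.9):
    """Return the answer text if its score meets `threshold`, else None.

    `result` is assumed to be a dict with 'score' and 'answer' keys, like the
    example outputs above, or a list of such dicts (the best-scoring one is
    used). Both the helper and the default threshold are illustrative.
    """
    if isinstance(result, list):
        if not result:
            return None
        result = max(result, key=lambda r: r["score"])
    return result["answer"] if result["score"] >= threshold else None


# Outputs copied from the examples above.
invoice = {'score': 0.9943977, 'answer': 'us-001', 'start': 15, 'end': 15}
net_sales = {'score': 0.59147286, 'answer': '$ 3,750', 'start': 19, 'end': 20}

print(confident_answer(invoice))    # us-001
print(confident_answer(net_sales))  # None
```

A suitable threshold depends on the documents and questions involved, so it is worth tuning against a small labeled sample rather than reusing the value here.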
## About us
This model was created by the team at [Impira](https://www.impira.com/).