metadata

language: en
thumbnail: >-
  https://uploads-ssl.webflow.com/5e3898dff507782a6580d710/614a23fcd8d4f7434c765ab9_logo.png
license: mit

LayoutLM for Visual Question Answering

This is a fine-tuned version of the multi-modal LayoutLM model for the task of question answering on documents. It has been fine-tuned using both the SQuAD2.0 and DocVQA datasets.

Getting started with the model

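To run these examples, you must have PIL (Pillow), pytesseract, and PyTorch installed in addition to transformers. Note that pytesseract is a wrapper around the Tesseract OCR engine, which must be installed separately.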
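
Before loading the model, it can help to confirm that these dependencies are importable. The following is a minimal sketch using only the standard library; the helper name `missing_modules` and the `REQUIRED_MODULES` list are illustrative assumptions, not part of transformers. It checks Python modules only, so it will not detect a missing Tesseract binary.

```python
import importlib.util

# Import names (not pip package names) for the dependencies listed above.
# Pillow is imported as "PIL" and PyTorch as "torch".
REQUIRED_MODULES = ["PIL", "pytesseract", "torch", "transformers"]


def missing_modules(names):
    """Return the module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

If anything is reported missing, install it before running the examples that follow.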

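from transformers import AutoTokenizer, pipeline

# Load the tokenizer published alongside the model.
tokenizer = AutoTokenizer.from_pretrained(
    "impira/layoutlm-document-qa",
    add_prefix_space=True,
    trust_remote_code=True,
)

# Build the document question-answering pipeline from the same checkpoint.
# trust_remote_code=True allows the pipeline code in the model repository to run.
nlp = pipeline(
    model="impira/layoutlm-document-qa",
    tokenizer=tokenizer,
    trust_remote_code=True,
)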

# Each call takes a document image (here, a URL) and a question about it.
nlp(
    "https://templates.invoicehome.com/invoice-template-us-neat-750px.png",
    "What is the invoice number?"
)
# {'score': 0.9943977, 'answer': 'us-001', 'start': 15, 'end': 15}

nlp(
    "https://miro.medium.com/max/787/1*iECQRIiOGTmEFLdWkVIH2g.jpeg",
    "What is the purchase amount?"
)
# {'score': 0.9912159, 'answer': '$1,000,000,000', 'start': 97, 'end': 97}

nlp(
    "https://www.accountingcoach.com/wp-content/uploads/2013/10/[email protected]",
    "What are the 2020 net sales?"
)
# {'score': 0.59147286, 'answer': '$ 3,750', 'start': 19, 'end': 20}
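
The scores above vary considerably (about 0.99 for the first two answers, about 0.59 for the last), so downstream code may want to discard low-confidence answers. Below is a minimal sketch of one way to do that, assuming results shaped like the dictionaries shown above. The helper `pick_answer` and its `min_score` threshold are hypothetical and not part of transformers; the default of 0.5 is an arbitrary choice for illustration.

```python
def pick_answer(result, min_score=0.5):
    """Return the best answer text if its score reaches min_score, else None.

    `result` is either one dict with 'score' and 'answer' keys, as in the
    outputs shown above, or a list of such dicts.
    """
    candidates = result if isinstance(result, list) else [result]
    if not candidates:
        return None
    # Keep only the highest-scoring candidate.
    best = max(candidates, key=lambda r: r["score"])
    return best["answer"] if best["score"] >= min_score else None


# Two of the outputs shown above, copied verbatim.
examples = [
    {'score': 0.9943977, 'answer': 'us-001', 'start': 15, 'end': 15},
    {'score': 0.59147286, 'answer': '$ 3,750', 'start': 19, 'end': 20},
]

print(pick_answer(examples[0]))
print(pick_answer(examples[1], min_score=0.9))
```

With a stricter threshold such as 0.9, the net sales answer would be rejected while the invoice number would still be accepted.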

NOTE: Support for this model recently landed in transformers via PR #18407, so you'll need a recent version of transformers (e.g. pip install git+https://github.com/huggingface/transformers.git@5c4c869014f5839d04c1fd28133045df0c91fd84). The pipeline itself is still in review for inclusion in the transformers project. In the meantime, you must pass the trust_remote_code=True flag to run the pipeline.

About us

This model was created by the team at Impira.