arxiv:2305.17219

GVdoc: Graph-based Visual Document Classification

Published on May 26, 2023

Authors:

Abstract

The robustness of a model for real-world deployment is determined by how well it performs on unseen data and how well it distinguishes between in-domain and out-of-domain samples. Visual document classifiers have shown impressive performance on in-distribution test sets. However, they struggle to correctly classify and differentiate out-of-distribution examples. Image-based classifiers lack the text component, whereas multi-modality transformer-based models face the token serialization problem in visual documents due to their diverse layouts. They also require a lot of computing power during inference, making them impractical for many real-world applications. We propose GVdoc, a graph-based document classification model that addresses both of these challenges. Our approach generates a graph for each document based on its layout, and then trains a graph neural network to learn node and graph embeddings. Through experiments, we show that our model, even with fewer parameters, outperforms state-of-the-art models on out-of-distribution data while retaining comparable performance on the in-distribution test set.
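The abstract does not spell out how the document graph is built or which graph neural network is used, so the following is only a minimal sketch of the general idea under stated assumptions: tokens are given as bounding boxes, edges connect each token to its k nearest neighbours by box center (an assumed stand-in for the layout-based construction), node embeddings come from one round of untrained mean aggregation, and the graph embedding is a mean pool over nodes. All function names here are hypothetical and are not taken from the paper.

```python
import math


def box_center(box):
    """Center (x, y) of a bounding box given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def build_knn_graph(boxes, k=2):
    """Assumed layout graph: link each box to its k nearest boxes by center distance.

    Returns a sorted list of undirected edges (i, j) with i < j.
    """
    centers = [box_center(b) for b in boxes]
    edges = set()
    for i, ci in enumerate(centers):
        nearest = sorted(
            (math.dist(ci, cj), j) for j, cj in enumerate(centers) if j != i
        )
        for _, j in nearest[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


def mean_aggregate(features, edges):
    """One message-passing round: each node averages itself and its neighbours."""
    neighbours = [[i] for i in range(len(features))]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    dim = len(features[0])
    return [
        [sum(features[m][d] for m in nbrs) / len(nbrs) for d in range(dim)]
        for nbrs in neighbours
    ]


def graph_embedding(node_embeddings):
    """Mean-pool node embeddings into a single graph-level vector."""
    n = len(node_embeddings)
    dim = len(node_embeddings[0])
    return [sum(h[d] for h in node_embeddings) / n for d in range(dim)]


# Toy document: four token boxes with one-dimensional placeholder features.
boxes = [(0, 0, 10, 10), (20, 0, 30, 10), (0, 30, 10, 40), (100, 0, 110, 10)]
features = [[1.0], [3.0], [5.0], [7.0]]

edges = build_knn_graph(boxes, k=1)
node_h = mean_aggregate(features, edges)
doc_h = graph_embedding(node_h)
```

In a trained model the aggregation step would include learned weights and the pooled vector would feed a classifier; the sketch only shows why a graph over layout positions avoids having to serialize tokens into a single reading order.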
