navodPeiris/layoutlmv2-document-classifier

	---
	library_name: transformers
	license: cc-by-nc-sa-4.0
	base_model: microsoft/layoutlmv2-base-uncased
	tags:
	- generated_from_trainer
	metrics:
	- accuracy
	model-index:
	- name: layoutlmv2-document-classifier
	results: []
	---


	# layoutlmv2-document-classifier

	This model is a fine-tuned version of [microsoft/layoutlmv2-base-uncased](https://huggingface.co/microsoft/layoutlmv2-base-uncased) on the [Company Documents Dataset](https://www.kaggle.com/datasets/navodpeiris/company-documents-dataset).
	It achieves the following results on the evaluation set:
	- Loss: 0.0008
	- Accuracy: 1.0

	## Dataset Information

	This model was fine-tuned to classify company documents by document type.

	Dataset used: [Company Documents Dataset](https://www.kaggle.com/datasets/navodpeiris/company-documents-dataset)

	## Dependencies

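	```
	pip install PyMuPDF
	pip install Pillow
	pip install transformers
	pip install torch
	pip install torchvision
	pip install pytesseract
	# LayoutLMv2 also requires detectron2, which is installed from source
	pip install 'git+https://github.com/facebookresearch/detectron2.git'
	```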

	- Install Tesseract locally by following the [installation instructions](https://tesseract-ocr.github.io/tessdoc/Installation.html).
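
Because `LayoutLMv2Processor` runs OCR through `pytesseract` by default, it can help to confirm that the Tesseract binary is reachable before loading the model. A minimal sketch using only the standard library; the helper name `tesseract_available` is hypothetical and not part of any of the packages above:

```python
import shutil


def tesseract_available(binary: str = "tesseract") -> bool:
    """Return True if the given executable can be found on PATH."""
    return shutil.which(binary) is not None


if tesseract_available():
    print("Tesseract found at:", shutil.which("tesseract"))
else:
    print("Tesseract not found on PATH; install it before running the model.")
```

If the binary lives outside PATH, `pytesseract` also lets you point to it explicitly through `pytesseract.pytesseract.tesseract_cmd`.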

	## Model Usage

	To test the model, use a file from the [Company Documents Dataset](https://www.kaggle.com/datasets/navodpeiris/company-documents-dataset).

	```
	import io
	import os

	import fitz  # PyMuPDF
	import torch
	from PIL import Image
	from transformers import LayoutLMv2ForSequenceClassification, LayoutLMv2Processor

	# The processor (OCR + tokenizer) comes from the base model;
	# the classification weights come from the fine-tuned checkpoint
	processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased")
	model = LayoutLMv2ForSequenceClassification.from_pretrained("navodPeiris/layoutlmv2-document-classifier")

	DATA_FOLDER = "data"
	filename = "invoice.pdf"
	file_location = os.path.join(DATA_FOLDER, filename)

	# Render the first page of the PDF to PNG bytes at 200 DPI
	doc = fitz.open(file_location)
	page = doc.load_page(0)
	pix = page.get_pixmap(dpi=200)
	img_bytes = pix.tobytes("png")
	doc.close()

	# Load the rendered page into a PIL image
	image = Image.open(io.BytesIO(img_bytes)).convert("RGB")

	# The processor runs Tesseract OCR on the image, then tokenizes the words and boxes
	encoding = processor(image, return_tensors="pt", truncation=True, padding="max_length", max_length=512)

	# Inference only, so gradients are not needed
	with torch.no_grad():
	    outputs = model(**encoding)
	logits = outputs.logits

	# Map the highest-scoring class index to its label
	predicted_class_id = logits.argmax(dim=1).item()
	classified_output = model.config.id2label[predicted_class_id]

	print(f"Predicted class: {classified_output}")
	```
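
The snippet above prints only the top class. To see how confident the prediction is, the logits can be converted to probabilities with a softmax. The sketch below uses plain Python with made-up values: `example_logits` and `example_id2label` are placeholders, not the model's real outputs or labels. In practice they would come from `outputs.logits[0].tolist()` and `model.config.id2label`.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_classes(logits, id2label):
    """Return (label, probability) pairs sorted from most to least likely."""
    probs = softmax(logits)
    pairs = [(id2label[i], p) for i, p in enumerate(probs)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Placeholder values for illustration only.
example_logits = [4.2, -1.3, 0.5, -2.0]
example_id2label = {0: "class_a", 1: "class_b", 2: "class_c", 3: "class_d"}

for label, prob in rank_classes(example_logits, example_id2label):
    print(f"{label}: {prob:.4f}")
```

A low top probability can be a useful signal that a document does not belong to any of the classes the model was trained on.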

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-05
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
	- lr_scheduler_type: linear
	- num_epochs: 1
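
To make the `linear` scheduler concrete: with no warmup, the learning rate decays in a straight line from `learning_rate` to zero over the run. The sketch below assumes zero warmup steps (the card lists none) and 268 total optimizer steps, a figure inferred from the training results table (step 26 corresponds to epoch 0.0970). Both are assumptions, not values stated by the author.

```python
def linear_lr(step, base_lr=5e-05, total_steps=268, warmup_steps=0):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 26, 134, 260, 268):
    print(f"step {step:>3}: lr = {linear_lr(step):.2e}")
```

Under these assumptions the learning rate is half its initial value at step 134 and reaches zero at step 268.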

	### Training results

	| Training Loss | Epoch  | Step | Validation Loss | Accuracy |
	|:-------------:|:------:|:----:|:---------------:|:--------:|
	| 0.7722        | 0.0970 | 26   | 0.2249          | 0.9216   |
	| 0.0828        | 0.1940 | 52   | 0.0452          | 0.9907   |
	| 0.026         | 0.2910 | 78   | 0.0459          | 0.9907   |
	| 0.0265        | 0.3881 | 104  | 0.0267          | 0.9907   |
	| 0.0263        | 0.4851 | 130  | 0.0068          | 1.0      |
	| 0.008         | 0.5821 | 156  | 0.0026          | 1.0      |
	| 0.0023        | 0.6791 | 182  | 0.0014          | 1.0      |
	| 0.0014        | 0.7761 | 208  | 0.0009          | 1.0      |
	| 0.0011        | 0.8731 | 234  | 0.0008          | 1.0      |
	| 0.0012        | 0.9701 | 260  | 0.0008          | 1.0      |
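
The Epoch and Step columns also indicate roughly how large the training run was. Dividing each step count by its fractional epoch estimates the number of optimizer steps per epoch, and multiplying that by `train_batch_size` (8) gives an upper bound on the training-set size. This is a back-of-the-envelope sketch that assumes a single device and no gradient accumulation, neither of which the card states.

```python
# (epoch, step) pairs copied from the training results table.
rows = [
    (0.0970, 26), (0.1940, 52), (0.2910, 78), (0.3881, 104), (0.4851, 130),
    (0.5821, 156), (0.6791, 182), (0.7761, 208), (0.8731, 234), (0.9701, 260),
]
TRAIN_BATCH_SIZE = 8  # from the hyperparameter list

# Each row gives an independent estimate of steps per epoch; average them.
estimates = [step / epoch for epoch, step in rows]
steps_per_epoch = round(sum(estimates) / len(estimates))

# The last batch may be partial, so this is an upper bound.
max_train_examples = steps_per_epoch * TRAIN_BATCH_SIZE

print(f"Estimated steps per epoch: {steps_per_epoch}")
print(f"Training examples (upper bound): {max_train_examples}")
```

With the values in the table this works out to 268 steps per epoch, or at most 2,144 training examples, with evaluation run every 26 steps.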


	### Framework versions

	- Transformers 4.51.3
	- PyTorch 2.6.0+cu124
	- Datasets 3.6.0
	- Tokenizers 0.21.1