---
tags:
- image-classification
- document-classification
- vision
library_name: transformers
pipeline_tag: image-classification
license: mit
---

# Document Classification Model

## Overview

This model is fine-tuned for document image classification using the Document Image Transformer (DiT), a vision transformer architecture designed for document images.

## Model Details

* Architecture: Document Image Transformer (DiT), a vision transformer
* Task: Document Classification
* Training Framework: 🤗 Transformers
* Base Model: microsoft/dit-large
* Training Dataset Size: 32,786 images
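
The document classes themselves are not listed here; assuming the label mapping was saved with the fine-tuned checkpoint, the label set can be inspected from the model config:

```python
from transformers import AutoConfig

# Inspect the fine-tuned checkpoint's label set (assumes id2label was saved in the config)
config = AutoConfig.from_pretrained("jnmrr/ds3-img-classification")
print(config.num_labels)
print(config.id2label)  # e.g. {0: "invoice", 1: "letter", ...} — illustrative only
```
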
## Training Parameters

* Batch Size: 256
* Learning Rate: 0.001
* Number of Epochs: 90
* Mixed Precision: BF16
* Gradient Accumulation Steps: 2
* Weight Decay: 0.01
* Learning Rate Schedule: cosine_with_restarts
* Warmup Ratio: 0.1
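
For reference, these hyperparameters map roughly onto 🤗 `TrainingArguments` as sketched below. This is an illustrative reconstruction rather than the exact training script; the output directory is a placeholder, and whether 256 is the per-device or global batch size is an assumption.

```python
from transformers import TrainingArguments

# Sketch of TrainingArguments matching the hyperparameters listed above.
# output_dir is a placeholder; the batch size is assumed to be per device.
training_args = TrainingArguments(
    output_dir="dit-large-document-classification",
    per_device_train_batch_size=256,
    gradient_accumulation_steps=2,
    learning_rate=1e-3,
    num_train_epochs=90,
    weight_decay=0.01,
    lr_scheduler_type="cosine_with_restarts",
    warmup_ratio=0.1,
    bf16=True,  # BF16 mixed precision
)
```
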
## Training and Evaluation Metrics

### Training Metrics

Values logged at training step 1704 (epoch 26.4186):

* Loss: 0.1915
* Grad Norm: 1.3002
* Learning Rate: 0.0009

### Evaluation Metrics

* Loss: 0.9457
* Accuracy: 0.7757
* Weighted F1: 0.7689
* Micro F1: 0.7757
* Macro F1: 0.7518
* Weighted Recall: 0.7757
* Micro Recall: 0.7757
* Macro Recall: 0.7603
* Weighted Precision: 0.8023
* Micro Precision: 0.7757
* Macro Precision: 0.7941
* Runtime: 8.4106 seconds
* Samples Per Second: 433.145
* Steps Per Second: 3.448
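
The weighted / micro / macro aggregates above correspond to the standard averaging modes for multi-class metrics; below is a minimal sketch of how such metrics can be computed from predictions with scikit-learn (the `y_true` / `y_pred` values are placeholders, not this model's outputs):

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Placeholder ground-truth and predicted class ids
y_true = [0, 1, 2, 2, 1]
y_pred = [0, 1, 2, 1, 1]

metrics = {"accuracy": accuracy_score(y_true, y_pred)}
for average in ("weighted", "micro", "macro"):
    metrics[f"{average}_f1"] = f1_score(y_true, y_pred, average=average)
    metrics[f"{average}_precision"] = precision_score(y_true, y_pred, average=average)
    metrics[f"{average}_recall"] = recall_score(y_true, y_pred, average=average)
print(metrics)
```
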
## Usage

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Load the fine-tuned model and its image processor
processor = AutoImageProcessor.from_pretrained("jnmrr/ds3-img-classification")
model = AutoModelForImageClassification.from_pretrained("jnmrr/ds3-img-classification")

# Load and preprocess a document image
image = Image.open("document.png").convert("RGB")
inputs = processor(image, return_tensors="pt")

# Run inference and map the top logit to its class label
with torch.no_grad():
    outputs = model(**inputs)
predicted_id = outputs.logits.argmax(-1).item()
predicted_label = model.config.id2label[predicted_id]
print(predicted_label)
```

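
Alternatively, the high-level `pipeline` API handles preprocessing and label mapping in one call (a minimal sketch; `document.png` is a placeholder path):

```python
from transformers import pipeline

# Image-classification pipeline wrapping the processor and model shown above
classifier = pipeline("image-classification", model="jnmrr/ds3-img-classification")
print(classifier("document.png"))  # top predicted labels with scores
```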