---
tags:
- image-classification
- document-classification
- vision
library_name: transformers
pipeline_tag: image-classification
license: mit
---

# Document Classification Model

## Overview
This model classifies document images. It is built on `microsoft/dit-large`, a Document Image Transformer (DiT).

## Model Details
* Architecture: Document Image Transformer (DiT), a vision transformer
* Tasks: Document Classification
* Training Framework: 🤗 Transformers
* Base Model: microsoft/dit-large
* Training Dataset Size: 32786

## Training Parameters
* Batch Size: 256
* Learning Rate: 0.001
* Number of Epochs: 90
* Mixed Precision: BF16
* Gradient Accumulation Steps: 2
* Weight Decay: 0.01
* Learning Rate Schedule: cosine_with_restarts
* Warmup Ratio: 0.1
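
The parameters above imply an effective batch size and a rough number of optimizer updates. The sketch below works through that arithmetic. It assumes a single device, that the listed batch size is per forward pass, that all 32786 samples are used for training, and that the final partial batch is kept; none of these are stated in this card.

```python
import math

# Values taken from this card
DATASET_SIZE = 32786
BATCH_SIZE = 256
GRAD_ACCUM_STEPS = 2
NUM_EPOCHS = 90
WARMUP_RATIO = 0.1

# Assumption: one device, so each optimizer update sees batch * accumulation samples
effective_batch = BATCH_SIZE * GRAD_ACCUM_STEPS

# Assumption: the last, smaller batch of each epoch is kept rather than dropped
micro_batches_per_epoch = math.ceil(DATASET_SIZE / BATCH_SIZE)

# Average optimizer updates per epoch (fractional because 129 is odd)
updates_per_epoch = micro_batches_per_epoch / GRAD_ACCUM_STEPS

approx_total_updates = updates_per_epoch * NUM_EPOCHS
approx_warmup_updates = approx_total_updates * WARMUP_RATIO

print(f"effective batch size:    {effective_batch}")
print(f"micro-batches per epoch: {micro_batches_per_epoch}")
print(f"updates per epoch:       {updates_per_epoch}")
print(f"approx. total updates:   {approx_total_updates:.0f}")
print(f"approx. warmup updates:  {approx_warmup_updates:.0f}")
```

The totals are estimates only: the exact step count depends on how the training loop rounds partial accumulation windows.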

## Training and Evaluation Metrics
### Training Metrics
* Loss: 0.1915
* Grad Norm: 1.3002
* Learning Rate: 0.0009
* Epoch: 26.4186
* Step: 1704
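
As a sanity check, the logged epoch and learning rate can be compared against the training parameters. The sketch below re-implements the shape of a cosine schedule with hard restarts and linear warmup in plain Python. The function name `cosine_restarts_lr`, the single cycle, and the total and warmup step counts are assumptions for illustration, not values reported by this card.

```python
import math

def cosine_restarts_lr(step, base_lr, warmup_steps, total_steps, num_cycles=1):
    """Linear warmup, then cosine decay that restarts num_cycles times."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    cosine = 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0)))
    return base_lr * max(0.0, cosine)

# Values taken from this card
BASE_LR = 0.001
LOGGED_STEP = 1704

# Assumptions: 129 micro-batches per epoch (ceil(32786 / 256)) with
# 2 accumulation steps, 90 epochs, and 10% of all updates used for warmup
UPDATES_PER_EPOCH = 129 / 2
TOTAL_STEPS = int(UPDATES_PER_EPOCH * 90)
WARMUP_STEPS = int(TOTAL_STEPS * 0.1)

epoch_estimate = LOGGED_STEP / UPDATES_PER_EPOCH
lr_estimate = cosine_restarts_lr(LOGGED_STEP, BASE_LR, WARMUP_STEPS, TOTAL_STEPS)

print(f"estimated epoch at step {LOGGED_STEP}: {epoch_estimate:.4f}")
print(f"estimated learning rate:        {lr_estimate:.4f}")
```

Under these assumptions both estimates round to the logged values, which suggests the training metrics are a snapshot from roughly the first third of the 90-epoch schedule.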

### Evaluation Metrics
* Loss: 0.9457
* Accuracy: 0.7757
* Weighted F1: 0.7689
* Micro F1: 0.7757
* Macro F1: 0.7518
* Weighted Recall: 0.7757
* Micro Recall: 0.7757
* Macro Recall: 0.7603
* Weighted Precision: 0.8023
* Micro Precision: 0.7757
* Macro Precision: 0.7941
* Runtime: 8.4106 s
* Samples Per Second: 433.1450
* Steps Per Second: 3.4480
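
The throughput figures above can be used to estimate the size of the evaluation set, which this card does not state directly. The sketch below assumes the runtime is in seconds and that the throughput values are averages over the whole evaluation run. Note also that accuracy, micro F1, micro recall, and micro precision are all 0.7757, as expected for single-label classification, where these quantities coincide.

```python
import math

# Values taken from this card
EVAL_RUNTIME = 8.4106          # assumed to be seconds
SAMPLES_PER_SECOND = 433.1450
STEPS_PER_SECOND = 3.4480

# Rates multiplied by runtime give approximate counts
eval_samples = round(EVAL_RUNTIME * SAMPLES_PER_SECOND)
eval_steps = round(EVAL_RUNTIME * STEPS_PER_SECOND)

# Average number of samples per evaluation step
samples_per_step = eval_samples / eval_steps

# Hypothetical check: would an evaluation batch size of 128 give the same step count?
steps_if_batch_128 = math.ceil(eval_samples / 128)

print(f"estimated evaluation samples: {eval_samples}")
print(f"estimated evaluation steps:   {eval_steps}")
print(f"average samples per step:     {samples_per_step:.1f}")
print(f"steps with batch size 128:    {steps_if_batch_128}")
```

The evaluation batch size of 128 is a guess that happens to match the step count; any value from 126 to 130 would match as well.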

## Usage

```python
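import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Load the model and its image processor
processor = AutoImageProcessor.from_pretrained("jnmrr/ds3-img-classification")
model = AutoModelForImageClassification.from_pretrained("jnmrr/ds3-img-classification")

# Load a document image; convert to RGB in case it is grayscale or has an alpha channel
image = Image.open("document.png").convert("RGB")
inputs = processor(image, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Map the highest-scoring class index to its label name
predicted_id = outputs.logits.argmax(-1).item()
predicted_label = model.config.id2label[predicted_id]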
```
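
With an evaluation accuracy of about 0.78, it can be useful to look at several candidate classes rather than only the top one. The sketch below turns raw logits into probabilities and returns the `k` most likely labels in plain Python. The helper name `top_k_labels`, the example logits, and the label names are hypothetical; the card does not list the model's actual classes.

```python
import math

def top_k_labels(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs from raw logits."""
    # Subtract the maximum logit before exponentiating for numerical stability
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices from most to least probable
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]

# Hypothetical logits and label names, for illustration only
example_logits = [2.0, 0.5, 1.0]
example_labels = {0: "invoice", 1: "letter", 2: "form"}

top2 = top_k_labels(example_logits, example_labels, k=2)
for label, prob in top2:
    print(f"{label}: {prob:.3f}")
```

With the model above, the inputs would be `outputs.logits[0].tolist()` and `model.config.id2label`.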