Document Classification Model
Overview
This model is a Document Image Transformer (DiT), fine-tuned from microsoft/dit-large for document image classification.
Model Details
- Architecture: Document Image Transformer (DiT), a vision transformer
- Tasks: Document Classification
- Training Framework: 🤗 Transformers
- Base Model: microsoft/dit-large
- Training Dataset Size: 32786
Training Parameters
- Batch Size: 256
- Learning Rate: 0.001
- Number of Epochs: 90
- Mixed Precision: BF16
- Gradient Accumulation Steps: 2
- Weight Decay: 0.01
- Learning Rate Schedule: cosine_with_restarts
- Warmup Ratio: 0.1
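The parameters above imply a step budget that can be worked out by hand. The sketch below is an illustration only: it assumes the batch size of 256 is the per-device batch on a single device and that the last partial batch of each epoch is kept, neither of which this card states. The variable names are chosen for the example.

```python
import math

# Values taken from this card
dataset_size = 32786
batch_size = 256          # assumed: per-device batch on one device
grad_accum_steps = 2
num_epochs = 90
warmup_ratio = 0.1

# Samples consumed per optimizer update
effective_batch_size = batch_size * grad_accum_steps

# Forward/backward passes per epoch (last partial batch assumed kept)
micro_batches_per_epoch = math.ceil(dataset_size / batch_size)

# Approximate optimizer updates for the full run; the exact figure
# depends on how the trainer rounds at epoch boundaries
approx_total_steps = num_epochs * micro_batches_per_epoch / grad_accum_steps
approx_warmup_steps = math.ceil(approx_total_steps * warmup_ratio)

print(effective_batch_size)      # 512
print(micro_batches_per_epoch)   # 129
print(approx_total_steps)        # 5805.0
print(approx_warmup_steps)       # 581
```

Under these assumptions each optimizer update sees 512 samples, and roughly the first tenth of the run is spent in warmup.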
Training and Evaluation Metrics
Training Metrics
- Loss: 0.1915
- Grad Norm: 1.3002
- Learning Rate: 0.0009
- Epoch: 26.4186
- Step: 1704
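The logged step, epoch, and learning rate can be cross-checked against the training parameters. The sketch below is a rough consistency check, not the trainer's own code: `cosine_with_restarts` is a hypothetical helper that follows the usual shape of linear warmup followed by cosine decay with hard restarts. It assumes 129 micro-batches per epoch (ceil(32786 / 256)), about 5805 optimizer steps in total, and a single cosine cycle, none of which is stated in this card.

```python
import math

def cosine_with_restarts(step, total_steps, warmup_steps, base_lr, num_cycles=1):
    """Linear warmup, then cosine decay that restarts num_cycles times."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0)))

# Assumed values (see the text above); only base_lr, the warmup ratio,
# and the accumulation steps come directly from the card
micro_batches_per_epoch = 129
grad_accum_steps = 2
total_steps = 5805
warmup_steps = math.ceil(0.1 * total_steps)

step = 1704
epoch = step * grad_accum_steps / micro_batches_per_epoch
lr = cosine_with_restarts(step, total_steps, warmup_steps, base_lr=0.001)

print(round(epoch, 4))  # 26.4186
print(round(lr, 4))     # 0.0009
```

Both values agree with the metrics logged at step 1704, which suggests the figures were recorded about 26 epochs into the 90-epoch schedule.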
Evaluation Metrics
- Loss: 0.9457
- Accuracy: 0.7757
- Weighted F1: 0.7689
- Micro F1: 0.7757
- Macro F1: 0.7518
- Weighted Recall: 0.7757
- Micro Recall: 0.7757
- Macro Recall: 0.7603
- Weighted Precision: 0.8023
- Micro Precision: 0.7757
- Macro Precision: 0.7941
- Runtime: 8.4106 seconds
- Samples Per Second: 433.1450
- Steps Per Second: 3.4480
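Accuracy, micro F1, micro precision, micro recall, and weighted recall all report 0.7757 above. For single-label multiclass classification this is expected, because every sample contributes exactly one true label and one predicted label, so these averages reduce to the fraction of correct predictions. Macro averages weight every class equally and can differ. The pure-Python sketch below illustrates this on hypothetical toy labels that are not taken from this model's evaluation set; `averaged_scores` is a name chosen for the example.

```python
from collections import Counter

def averaged_scores(y_true, y_pred):
    """Accuracy and several averaged precision/recall/F1 scores for single-label data."""
    classes = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)      # true samples per class
    predicted = Counter(y_pred)    # predictions per class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n = len(y_true)

    recall = {c: tp[c] / support[c] if support[c] else 0.0 for c in classes}
    precision = {c: tp[c] / predicted[c] if predicted[c] else 0.0 for c in classes}
    f1 = {c: (2 * precision[c] * recall[c] / (precision[c] + recall[c])
              if precision[c] + recall[c] else 0.0) for c in classes}

    return {
        "accuracy": sum(tp.values()) / n,
        "micro_precision": sum(tp.values()) / sum(predicted.values()),
        "micro_recall": sum(tp.values()) / sum(support.values()),
        "weighted_recall": sum(support[c] * recall[c] for c in classes) / n,
        "macro_recall": sum(recall.values()) / len(classes),
        "macro_f1": sum(f1.values()) / len(classes),
    }

# Hypothetical toy labels with an imbalanced class distribution
y_true = ["invoice"] * 5 + ["letter"] * 2 + ["form"]
y_pred = ["invoice"] * 5 + ["letter", "invoice", "letter"]
scores = averaged_scores(y_true, y_pred)

for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

In the toy data, the rare class that is never predicted correctly pulls the macro scores down while leaving the micro and weighted recall equal to accuracy. The same kind of gap appears in the metrics reported above.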
Usage
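import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification
# Load model and processor
processor = AutoImageProcessor.from_pretrained("jnmrr/ds3-img-classification")
model = AutoModelForImageClassification.from_pretrained("jnmrr/ds3-img-classification")
# Process an image (convert to RGB, since scans are often grayscale or RGBA)
image = Image.open("document.png").convert("RGB")
inputs = processor(image, return_tensors="pt")
# Make prediction without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
predicted_label = outputs.logits.argmax(-1).item()
# Map the class index to the label name stored in the model config
print(model.config.id2label[predicted_label])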
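`argmax` returns only the index of the top class. When a confidence score or several candidate classes are needed, the logits can be converted to probabilities with a softmax. The sketch below uses plain Python so that it runs without downloading the model; `top_k_predictions` is a hypothetical helper, and the logits and label names are made up for illustration. With the real model, `outputs.logits[0].tolist()` and `model.config.id2label` could be passed in instead.

```python
import math

def top_k_predictions(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs from raw logits."""
    # Subtract the max logit before exponentiating for numerical stability
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Class indices sorted from most to least probable
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]

# Hypothetical logits and labels for illustration only
example_logits = [2.0, 0.5, -1.0, 0.1]
example_labels = {0: "invoice", 1: "letter", 2: "form", 3: "memo"}

top = top_k_predictions(example_logits, example_labels, k=2)
for label, prob in top:
    print(f"{label}: {prob:.3f}")
```

Given the reported accuracy of 0.7757, inspecting the second and third candidates may be useful when the top probability is low.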