Document Classification Model

Overview

This model classifies document images. It is a fine-tuned Document Image Transformer (DiT), a vision transformer pre-trained on document images.

Model Details

  • Architecture: Document Image Transformer (DiT), a vision transformer
  • Task: Document Classification
  • Training Framework: 🤗 Transformers
  • Base Model: microsoft/dit-large
  • Training Dataset Size: 32,786 samples

Training Parameters

  • Batch Size: 256
  • Learning Rate: 0.001
  • Number of Epochs: 90
  • Mixed Precision: BF16
  • Gradient Accumulation Steps: 2
  • Weight Decay: 0.01
  • Learning Rate Schedule: cosine_with_restarts
  • Warmup Ratio: 0.1
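
The learning rate settings above can be illustrated in plain Python. This is a minimal sketch, assuming the common shape of a cosine-with-restarts schedule (linear warmup followed by cosine decay with hard restarts) and a single cycle. The function name `lr_at`, the `num_cycles` value, and the `total_steps` value are illustrative assumptions, not details taken from the training run.

```python
import math

def lr_at(step, total_steps, base_lr=0.001, warmup_ratio=0.1, num_cycles=1):
    """Learning rate at `step`: linear warmup, then cosine decay with hard restarts."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # Position inside the current cosine cycle, in [0, 1)
    cycle_pos = (num_cycles * progress) % 1.0
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * cycle_pos))

total_steps = 1000  # hypothetical run length, for illustration only
for s in (0, 50, 100, 550, 1000):
    print(s, round(lr_at(s, total_steps), 6))
```

With a warmup ratio of 0.1, the first tenth of the run ramps the rate up to 0.001, after which it decays along a cosine curve toward zero.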

Training and Evaluation Metrics

Training Metrics

  • Loss: 0.1915
  • Grad Norm: 1.3002
  • Learning Rate: 0.0009
  • Epoch: 26.4186
  • Step: 1704
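
The logged step and epoch values are consistent with the batch settings listed under Training Parameters. The arithmetic sketch below assumes that the last partial batch of each epoch is kept and that the fractional epoch is the number of optimizer steps divided by (batches per epoch / gradient accumulation steps). These are assumptions about the training loop, not details stated in this card.

```python
import math

dataset_size = 32786   # Training Dataset Size
batch_size = 256       # Batch Size
grad_accum = 2         # Gradient Accumulation Steps
step = 1704            # last logged optimizer step

# Batches per epoch, keeping the final partial batch (assumption)
batches_per_epoch = math.ceil(dataset_size / batch_size)
# Optimizer updates per epoch: one update every `grad_accum` batches
steps_per_epoch = batches_per_epoch / grad_accum
epoch = step / steps_per_epoch

print(batches_per_epoch, steps_per_epoch, round(epoch, 4))
# prints: 129 64.5 26.4186
```

Under the same assumptions, each optimizer update sees 256 × 2 = 512 samples.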

Evaluation Metrics

  • Loss: 0.9457
  • Accuracy: 0.7757
  • Weighted F1: 0.7689
  • Micro F1: 0.7757
  • Macro F1: 0.7518
  • Weighted Recall: 0.7757
  • Micro Recall: 0.7757
  • Macro Recall: 0.7603
  • Weighted Precision: 0.8023
  • Micro Precision: 0.7757
  • Macro Precision: 0.7941
  • Runtime: 8.4106 s
  • Samples Per Second: 433.1450
  • Steps Per Second: 3.4480
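
The micro, macro, and weighted variants above differ only in how per-class scores are averaged. In single-label classification, the micro-averaged F1, precision, and recall all equal accuracy, which is why those four values match in the list. The sketch below shows the three averages for F1 in plain Python; the function name `f1_scores` and the toy labels are hypothetical and are not taken from the evaluation set.

```python
from collections import Counter

def f1_scores(y_true, y_pred):
    """Return (micro, macro, weighted) F1 for single-label classification."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true samples per class
    per_class = {}
    tp_total = 0
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0
        tp_total += tp
    n = len(y_true)
    micro = tp_total / n                                   # equals accuracy here
    macro = sum(per_class.values()) / len(labels)          # every class counts equally
    weighted = sum(per_class[c] * support[c] for c in labels) / n  # weighted by support
    return micro, macro, weighted

# Hypothetical toy labels, for illustration only
y_true = ["invoice", "invoice", "invoice", "letter", "letter", "form"]
y_pred = ["invoice", "invoice", "letter", "letter", "letter", "invoice"]
micro, macro, weighted = f1_scores(y_true, y_pred)
print(round(micro, 4), round(macro, 4), round(weighted, 4))
```

A macro score below the weighted score, as in the metrics above, usually indicates that smaller classes are classified less accurately than larger ones.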

Usage

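import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Load the fine-tuned model and its matching image processor
processor = AutoImageProcessor.from_pretrained("jnmrr/ds3-img-classification")
model = AutoModelForImageClassification.from_pretrained("jnmrr/ds3-img-classification")
model.eval()

# Load a document image; convert to RGB in case it is grayscale or has an alpha channel
image = Image.open("document.png").convert("RGB")
inputs = processor(image, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Take the highest-scoring class index and look up its label name
predicted_id = outputs.logits.argmax(-1).item()
predicted_label = model.config.id2label[predicted_id]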
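
The prediction step above keeps only the highest-scoring class. To also get confidence scores, the logits can be passed through a softmax and ranked. This is a minimal sketch in plain Python; the helper name `top_k`, the logits, and the label names are hypothetical stand-ins for `outputs.logits[0].tolist()` and `model.config.id2label`.

```python
import math

def top_k(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs from raw logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]

# Hypothetical values, for illustration only
logits = [2.0, 0.5, -1.0, 0.1]
id2label = {0: "invoice", 1: "letter", 2: "form", 3: "memo"}

for label, p in top_k(logits, id2label):
    print(f"{label}: {p:.3f}")
```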