metadata
tags:
- image-classification
- document-classification
- vision
library_name: transformers
pipeline_tag: image-classification
license: mit
Document Classification Model
Overview
This model classifies document images. It is a fine-tuned version of the Document Image Transformer (DiT), a vision transformer pre-trained on document images.
Model Details
- Architecture: Document Image Transformer (DiT), a vision transformer
- Tasks: Document Classification
- Training Framework: 🤗 Transformers
- Base Model: microsoft/dit-large
- Training Dataset Size: 32,786 examples
Training Parameters
- Batch Size: 256
- Learning Rate: 0.002
- Number of Epochs: 1
- Mixed Precision: BF16
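To put these settings in perspective, the arithmetic below estimates how many optimizer steps a single epoch amounts to. It is a rough sketch that assumes 256 is the effective (global) batch size, that the final partial batch is kept, and that no gradient accumulation is used; none of this is stated in the card, and `steps_per_epoch` is a hypothetical helper.

```python
import math

def steps_per_epoch(num_examples, batch_size, grad_accum_steps=1):
    """Number of optimizer steps in one pass over the data.

    Assumes the last partial batch is kept rather than dropped.
    """
    batches = math.ceil(num_examples / batch_size)
    return math.ceil(batches / grad_accum_steps)

num_examples = 32786   # training dataset size from the card
batch_size = 256       # assumed to be the effective (global) batch size
epochs = 1

total_steps = steps_per_epoch(num_examples, batch_size) * epochs
# Size of the final, partial batch under the same assumptions.
last_batch_size = num_examples - (steps_per_epoch(num_examples, batch_size) - 1) * batch_size

print(total_steps)      # 129
print(last_batch_size)  # 18
```

Under these assumptions the whole run is only 129 updates, which is worth keeping in mind when interpreting the learning rate of 0.002.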
Usage
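import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification
# Load model and processor
processor = AutoImageProcessor.from_pretrained("jnmrr/ds3-img-classification")
model = AutoModelForImageClassification.from_pretrained("jnmrr/ds3-img-classification")
# Process an image (convert to RGB, since scanned documents are often grayscale or RGBA)
image = Image.open("document.png").convert("RGB")
inputs = processor(image, return_tensors="pt")
# Make prediction without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
predicted_id = outputs.logits.argmax(-1).item()
# Map the class index to its human-readable label
predicted_label = model.config.id2label[predicted_id]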
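The argmax above returns only the single best class. When a confidence score or a few runner-up classes are useful, the logits can be passed through a softmax and ranked. The sketch below does this in plain Python on a list of floats; `top_k_predictions` is a hypothetical helper, and the logits and label names in the demo are made up for illustration, not taken from this model.

```python
import math

def top_k_predictions(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs for one image."""
    # Subtract the max logit before exponentiating for numerical stability.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort class indices by descending probability and keep the first k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(id2label[i], probs[i]) for i in ranked]

# Hypothetical logits and label names, for illustration only.
example_logits = [2.0, 0.5, -1.0, 0.1]
example_labels = {0: "invoice", 1: "letter", 2: "form", 3: "memo"}

for label, prob in top_k_predictions(example_logits, example_labels, k=2):
    print(f"{label}: {prob:.3f}")
```

With the model loaded as above, the same helper should accept `outputs.logits[0].tolist()` as the logits and `model.config.id2label` as the label mapping.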