Upload model - 2024-12-12 01:23
metadata
tags:
  - image-classification
  - document-classification
  - vision
library_name: transformers
pipeline_tag: image-classification
license: mit

Document Classification Model

Overview

This model classifies document images. It is a fine-tuned Document Image Transformer (DiT), a vision transformer pre-trained on document images.

Model Details

  • Architecture: Vision Transformer (DiT)
  • Tasks: Document Classification
  • Training Framework: 🤗 Transformers
  • Base Model: microsoft/dit-large
  • Training Dataset Size: 32786

Training Parameters

  • Batch Size: 256
  • Learning Rate: 0.002
  • Number of Epochs: 1
  • Mixed Precision: BF16
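As a rough sanity check on the figures above, the number of optimizer steps in the single epoch can be worked out from the dataset size and batch size. This is a minimal sketch that assumes 256 is the effective (global) batch size, that the final partial batch is kept rather than dropped, and that no additional gradient accumulation was used; none of these are stated in this card.

```python
import math

# Values taken from the model card.
dataset_size = 32786
batch_size = 256      # assumed to be the effective (global) batch size
num_epochs = 1

# Assumption: the last, smaller batch is kept (not dropped).
steps_per_epoch = math.ceil(dataset_size / batch_size)
remainder = dataset_size % batch_size
last_batch_size = remainder if remainder else batch_size
total_steps = steps_per_epoch * num_epochs

print(f"steps per epoch: {steps_per_epoch}")
print(f"samples in last batch: {last_batch_size}")
print(f"total optimizer steps: {total_steps}")
```

Under these assumptions the run is short, so the learning-rate schedule and any warmup would span only this small number of steps.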

Usage

import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Load model and processor
processor = AutoImageProcessor.from_pretrained("jnmrr/ds3-img-classification")
model = AutoModelForImageClassification.from_pretrained("jnmrr/ds3-img-classification")

# Process an image (convert to RGB in case the file is grayscale or has an alpha channel)
image = Image.open("document.png").convert("RGB")
inputs = processor(image, return_tensors="pt")

# Make prediction without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
predicted_idx = outputs.logits.argmax(-1).item()
predicted_label = model.config.id2label[predicted_idx]  # map class index to its name
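The prediction above is a single class index. If a ranked list with confidence scores is more useful, the logits can be passed through a softmax and sorted. The helper below is a dependency-free sketch: `top_k_labels` is a hypothetical name, and the example logits and label names are made up for illustration, not taken from this model.

```python
import math

def top_k_labels(logits, id2label, k=3):
    """Return the k most likely (label, probability) pairs from raw logits."""
    # Numerically stable softmax: subtract the max logit before exponentiating.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]

# Illustrative values only; the real label names come from the model config.
example_logits = [2.0, 0.5, -1.0]
example_id2label = {0: "invoice", 1: "letter", 2: "form"}

for label, prob in top_k_labels(example_logits, example_id2label, k=2):
    print(f"{label}: {prob:.3f}")
```

With the model loaded as in the usage snippet, the inputs would be `outputs.logits[0].tolist()` and `model.config.id2label`.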