---
tags:
- image-classification
- document-classification
- vision
library_name: transformers
pipeline_tag: image-classification
license: mit
---

# Document Classification Model

## Overview

This model is fine-tuned for document image classification using the Document Image Transformer (DiT) architecture.

## Model Details

* Architecture: Document Image Transformer (DiT)
* Task: Document Classification
* Training Framework: 🤗 Transformers
* Base Model: microsoft/dit-large
* Training Dataset Size: 32786

## Training Parameters

* Batch Size: 256
* Learning Rate: 0.001
* Number of Epochs: 90
* Mixed Precision: BF16
* Gradient Accumulation Steps: 2
* Weight Decay: 0.01
* Learning Rate Schedule: `cosine_with_restarts`
* Warmup Ratio: 0.1

## Training and Evaluation Metrics

### Training Metrics

* Loss: 0.1915
* Grad Norm: 1.3002
* Learning Rate: 0.0009
* Epoch: 26.4186
* Step: 1704

### Evaluation Metrics

* Loss: 0.9457
* Accuracy: 0.7757
* Weighted F1: 0.7689
* Micro F1: 0.7757
* Macro F1: 0.7518
* Weighted Recall: 0.7757
* Micro Recall: 0.7757
* Macro Recall: 0.7603
* Weighted Precision: 0.8023
* Micro Precision: 0.7757
* Macro Precision: 0.7941
* Runtime (s): 8.4106
* Samples Per Second: 433.145
* Steps Per Second: 3.448

## Usage

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Load the processor and fine-tuned model
processor = AutoImageProcessor.from_pretrained("jnmrr/ds3-img-classification")
model = AutoModelForImageClassification.from_pretrained("jnmrr/ds3-img-classification")

# Load and preprocess a document image
image = Image.open("document.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Map the highest-scoring logit to its class label
predicted_id = outputs.logits.argmax(-1).item()
predicted_label = model.config.id2label[predicted_id]
print(predicted_label)
```
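Beyond the single top prediction, it can be useful to inspect the model's confidence over several classes. The sketch below uses simulated logits in place of a real forward pass (the tensor values are illustrative, not from this model); with a loaded model you would use `outputs.logits` and `model.config.id2label` instead:

```python
import torch

# Simulated logits for a 4-class model (stand-in for outputs.logits)
logits = torch.tensor([[2.0, 0.5, 1.2, -0.3]])

# Convert logits to probabilities and take the top 3 classes
probs = torch.softmax(logits, dim=-1)
top_probs, top_ids = probs.topk(3, dim=-1)

for p, i in zip(top_probs[0], top_ids[0]):
    # With a loaded model, model.config.id2label[i.item()] gives the class name
    print(f"class {i.item()}: {p.item():.4f}")
```

Reporting the top few classes with their probabilities is often more informative for document triage than the argmax alone, especially when classes are visually similar.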