---
tags:
- image-classification
- document-classification
- vision
library_name: transformers
pipeline_tag: image-classification
license: mit
---
# Document Classification Model
## Overview
This model classifies document images. It is a Document Image Transformer (DiT), a vision transformer pre-trained on document images, fine-tuned for document classification.
## Model Details
* Architecture: Vision Transformer (DiT)
* Tasks: Document Classification
* Training Framework: 🤗 Transformers
* Base Model: microsoft/dit-large
* Training Dataset Size: 32,786 samples
## Training Parameters
* Batch Size: 256
* Learning Rate: 0.001
* Number of Epochs: 90
* Mixed Precision: BF16
* Gradient Accumulation Steps: 2
* Weight Decay: 0.01
* Learning Rate Schedule: cosine_with_restarts
* Warmup Ratio: 0.1
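The parameters above imply an effective batch size and a rough training length. The sketch below works through that arithmetic. It assumes training on a single device (so the listed batch size is the full per-step batch) and that the last, partial batch of each epoch is kept; neither is stated in this card, and the exact step counts depend on how the trainer rounds.

```python
import math

# Values copied from the lists above.
dataset_size = 32786
batch_size = 256
grad_accum_steps = 2
num_epochs = 90
warmup_ratio = 0.1

# Assumption: one device, so batch_size is the whole forward/backward batch.
effective_batch_size = batch_size * grad_accum_steps

# Assumption: the last, partial batch of each epoch is not dropped.
batches_per_epoch = math.ceil(dataset_size / batch_size)

# Approximate number of optimizer updates over the full run.
total_updates = round(batches_per_epoch / grad_accum_steps * num_epochs)

# Approximate number of updates spent in learning-rate warmup.
warmup_updates = math.ceil(total_updates * warmup_ratio)

print(f"effective batch size: {effective_batch_size}")
print(f"batches per epoch:    {batches_per_epoch}")
print(f"total updates (est.): {total_updates}")
print(f"warmup updates (est.): {warmup_updates}")
```

Under these assumptions each optimizer update sees 512 samples, and roughly the first tenth of the run is warmup.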
## Training and Evaluation Metrics
### Training Metrics
* Loss: 0.1915
* Grad Norm: 1.3002
* Learning Rate: 0.0009
* Epoch: 26.4186
* Step: 1704
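The training metrics were logged partway through the planned 90 epochs. As a rough consistency check, the sketch below recomputes the epoch and learning rate at step 1704 from the training parameters. It assumes a single device, that the last partial batch of each epoch is kept, and a single cosine cycle after linear warmup; these are assumptions for illustration, not settings confirmed by this card.

```python
import math

# Values copied from this card.
dataset_size, batch_size, grad_accum_steps = 32786, 256, 2
num_epochs, warmup_ratio, peak_lr = 90, 0.1, 0.001
step = 1704  # optimizer step of the logged training metrics

# Assumption: single device, last partial batch kept.
batches_per_epoch = math.ceil(dataset_size / batch_size)

# Fractional epoch reached after `step` optimizer updates.
epoch = step * grad_accum_steps / batches_per_epoch
print(f"epoch at step {step}: {epoch:.4f}")

# Assumption: approximate total and warmup updates, one cosine cycle.
total_updates = round(batches_per_epoch / grad_accum_steps * num_epochs)
warmup_updates = math.ceil(total_updates * warmup_ratio)
progress = (step - warmup_updates) / (total_updates - warmup_updates)
lr = peak_lr * 0.5 * (1.0 + math.cos(math.pi * (progress % 1.0)))
print(f"estimated learning rate: {lr:.4f}")
```

Under these assumptions the estimate matches the logged epoch (26.4186) and learning rate (0.0009), which suggests the metrics come from about 29% of the way through the schedule.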
### Evaluation Metrics
* Loss: 0.9457
* Accuracy: 0.7757
* Weighted F1: 0.7689
* Micro F1: 0.7757
* Macro F1: 0.7518
* Weighted Recall: 0.7757
* Micro Recall: 0.7757
* Macro Recall: 0.7603
* Weighted Precision: 0.8023
* Micro Precision: 0.7757
* Macro Precision: 0.7941
* Runtime: 8.4106 s
* Samples Per Second: 433.1450
* Steps Per Second: 3.4480
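The card does not state the size of the evaluation set, but it can be estimated from the runtime and throughput figures above. The sketch below does that arithmetic, assuming the runtime is in seconds. Note also that accuracy, micro F1, micro recall and micro precision all equal 0.7757, which is expected for single-label multi-class classification.

```python
# Values copied from the evaluation metrics above.
runtime_s = 8.4106            # assumed to be in seconds
samples_per_second = 433.1450
steps_per_second = 3.4480
accuracy = 0.7757

# Estimated number of evaluation samples and evaluation batches.
eval_samples = round(runtime_s * samples_per_second)
eval_steps = round(runtime_s * steps_per_second)

# Estimated number of correctly classified samples.
correct = round(accuracy * eval_samples)

# Average samples per evaluation batch (the last batch may be smaller).
avg_batch = eval_samples / eval_steps

print(f"evaluation samples (est.): {eval_samples}")
print(f"evaluation steps (est.):   {eval_steps}")
print(f"correct predictions (est.): {correct}")
print(f"average samples per step:  {avg_batch:.1f}")
```

The estimate is roughly 3,643 evaluation samples in 29 batches. That is consistent with an evaluation batch size of about 128, though the card does not state one.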
## Usage
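```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Load the processor and the fine-tuned model from the Hub
model_id = "jnmrr/ds3-img-classification"
processor = AutoImageProcessor.from_pretrained(model_id)
model = AutoModelForImageClassification.from_pretrained(model_id)

# Load a document image; convert to RGB so grayscale or RGBA scans also work
image = Image.open("document.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Map the highest-scoring class index to its label name
predicted_id = outputs.logits.argmax(-1).item()
predicted_label = model.config.id2label[predicted_id]
```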