|
--- |
|
license: apache-2.0 |
|
base_model: |
|
- microsoft/conditional-detr-resnet-50 |
|
pipeline_tag: object-detection |
|
datasets: |
|
- tech4humans/signature-detection |
|
metrics: |
|
- f1 |
|
- precision |
|
- recall |
|
library_name: transformers |
|
inference: false |
|
tags: |
|
- object-detection |
|
- signature-detection |
|
- detr |
|
- conditional-detr |
|
- pytorch |
|
model-index: |
|
- name: tech4humans/conditional-detr-50-signature-detector |
|
results: |
|
- task: |
|
type: object-detection |
|
dataset: |
|
type: tech4humans/signature-detection |
|
name: tech4humans/signature-detection |
|
split: test |
|
metrics: |
|
- type: precision |
|
value: 0.936524 |
|
name: mAP@0.5
|
- type: precision |
|
value: 0.653321 |
|
name: mAP@0.5:0.95
|
--- |
|
|
|
# **Conditional-DETR ResNet-50 - Handwritten Signature Detection** |
|
|
|
This repository presents a Conditional-DETR model with a ResNet-50 backbone, fine-tuned to detect handwritten signatures in document images. This model achieved the **highest mAP@0.5 (93.65%)** among all architectures tested in our evaluation.
|
|
|
| Resource | Links | Details |
|---------------------------------|----------------|----------------|
| **Article** | [Signature Detection Model (HF blog)](https://huggingface.co/blog/samuellimabraz/signature-detection-model) | A detailed community article covering the full development process of the project |
| **Model Files (YOLOv8s)** | [tech4humans/yolov8s-signature-detector](https://huggingface.co/tech4humans/yolov8s-signature-detector) | **Available formats:** [PyTorch](https://pytorch.org/), [ONNX](https://onnx.ai/), [TensorRT](https://developer.nvidia.com/tensorrt) |
| **Dataset (Original)** | [Roboflow Universe](https://universe.roboflow.com/tech-ysdkk/signature-detection-hlx8j) | 2,819 document images annotated with signature coordinates |
| **Dataset (Processed)** | [tech4humans/signature-detection](https://huggingface.co/datasets/tech4humans/signature-detection) | Augmented and pre-processed version (640px) for model training |
| **Notebooks (Model Experiments)** | [Colab notebook](https://colab.research.google.com/drive/1wSySw_zwyuv6XSaGmkngI4dwbj-hR4ix) · [W&B report](https://api.wandb.ai/links/samuel-lima-tech4humans/30cmrkp8) | Complete training and evaluation pipeline with selection among different architectures (YOLO, DETR, RT-DETR, Conditional-DETR, YOLOS) |
| **Notebooks (HP Tuning)** | [Colab notebook](https://colab.research.google.com/drive/1wSySw_zwyuv6XSaGmkngI4dwbj-hR4ix) · [W&B report](https://api.wandb.ai/links/samuel-lima-tech4humans/31a6zhb1) | Optuna trials for optimizing the precision/recall balance |
| **Inference Server** | [tech4ai/t4ai-signature-detect-server](https://github.com/tech4ai/t4ai-signature-detect-server) | Complete deployment and inference pipeline with Triton Inference Server, using [OpenVINO](https://docs.openvino.ai/2025/index.html), [Docker](https://www.docker.com/), and [Triton](https://developer.nvidia.com/triton-inference-server) |
| **Live Demo** | [HF Space](https://huggingface.co/spaces/tech4humans/signature-detection) | Graphical interface with real-time inference, built with [Gradio](https://www.gradio.app/) and [Plotly](https://plotly.com/python/) |
|
|
|
--- |
|
|
|
## **Dataset** |
|
|
|
<table> |
|
<tr> |
|
<td style="text-align: center; padding: 10px;"> |
|
<a href="https://universe.roboflow.com/tech-ysdkk/signature-detection-hlx8j"> |
|
<img src="https://app.roboflow.com/images/download-dataset-badge.svg"> |
|
</a> |
|
</td> |
|
<td style="text-align: center; padding: 10px;"> |
|
<a href="https://huggingface.co/datasets/tech4humans/signature-detection"> |
|
<img src="https://huggingface.co/datasets/huggingface/badges/resolve/main/dataset-on-hf-md-dark.svg" alt="Dataset on HF"> |
|
</a> |
|
</td> |
|
</tr> |
|
</table> |
|
The training utilized a dataset built from two public datasets: [Tobacco800](https://paperswithcode.com/dataset/tobacco-800) and [signatures-xc8up](https://universe.roboflow.com/roboflow-100/signatures-xc8up), unified and processed in [Roboflow](https://roboflow.com/). |
|
|
|
**Dataset Summary:** |
|
- Training: 1,980 images (70%) |
|
- Validation: 420 images (15%) |
|
- Testing: 419 images (15%) |
|
- Format: COCO JSON |
|
- Resolution: 640x640 pixels |
|
|
|
 |
|
|
|
--- |
|
|
|
## **Training Process** |
|
|
|
The training process involved the following steps: |
|
|
|
### 1. **Model Selection:** |
|
|
|
Various object detection models were evaluated to identify the best balance between precision, recall, and inference time. |
|
|
|
|
|
| **Metric** | [rtdetr-l](https://github.com/ultralytics/assets/releases/download/v8.2.0/rtdetr-l.pt) | [yolos-base](https://huggingface.co/hustvl/yolos-base) | [yolos-tiny](https://huggingface.co/hustvl/yolos-tiny) | [conditional-detr-resnet-50](https://huggingface.co/microsoft/conditional-detr-resnet-50) | [detr-resnet-50](https://huggingface.co/facebook/detr-resnet-50) | [yolov8x](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8x.pt) | [yolov8l](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8l.pt) | [yolov8m](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8m.pt) | [yolov8s](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8s.pt) | [yolov8n](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8n.pt) | [yolo11x](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11x.pt) | [yolo11l](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11l.pt) | [yolo11m](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11m.pt) | [yolo11s](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11s.pt) | [yolo11n](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11n.pt) | [yolov10x](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov10x.pt) | [yolov10l](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov10l.pt) | [yolov10b](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov10b.pt) | [yolov10m](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov10m.pt) | [yolov10s](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov10s.pt) | [yolov10n](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov10n.pt) | |
|
|:---------------------|---------:|-----------:|-----------:|---------------------------:|---------------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|---------:|---------:|---------:|---------:|---------:|---------:| |
|
| **Inference Time - CPU (ms)** | 583.608 | 1706.49 | 265.346 | 476.831 | 425.649 | 1259.47 | 871.329 | 401.183 | 216.6 | 110.442 | 1016.68 | 518.147 | 381.652 | 179.792 | 106.656 | 821.183 | 580.767 | 473.109 | 320.12 | 150.076 | **73.8596** | |
|
| **mAP50** | 0.92709 | 0.901154 | 0.869814 | **0.936524** | 0.88885 | 0.794237| 0.800312| 0.875322| 0.874721| 0.816089| 0.667074| 0.707409| 0.809557| 0.835605| 0.813799| 0.681023| 0.726802| 0.789835| 0.787688| 0.663877| 0.734332 | |
|
| **mAP50-95** | 0.622364 | 0.583569 | 0.469064 | 0.653321 | 0.579428 | 0.552919| 0.593976| **0.665495**| 0.65457 | 0.623963| 0.482289| 0.499126| 0.600797| 0.638849| 0.617496| 0.474535| 0.522654| 0.578874| 0.581259| 0.473857| 0.552704 | |
|
|
|
|
|
 |
|
|
|
#### Highlights: |
|
- **Best mAP50:** `conditional-detr-resnet-50` (**0.936524**) |
|
- **Best mAP50-95:** `yolov8m` (**0.665495**) |
|
- **Fastest Inference Time:** `yolov10n` (**73.8596 ms**) |
|
|
|
Detailed experiments are available on [**Weights & Biases**](https://api.wandb.ai/links/samuel-lima-tech4humans/30cmrkp8). |
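For context, per-image CPU latencies of this kind can be reproduced with a simple warm-up-then-average loop. A minimal sketch for the base Conditional-DETR checkpoint (dummy 640px input, batch size 1; the run counts are arbitrary and absolute numbers depend heavily on hardware):

```python
import time

import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForObjectDetection

processor = AutoImageProcessor.from_pretrained("microsoft/conditional-detr-resnet-50")
model = AutoModelForObjectDetection.from_pretrained("microsoft/conditional-detr-resnet-50").eval()

image = Image.new("RGB", (640, 640))  # dummy 640px document page
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    for _ in range(3):  # warm-up runs
        model(**inputs)
    n = 20
    start = time.perf_counter()
    for _ in range(n):
        model(**inputs)
print(f"mean CPU latency: {(time.perf_counter() - start) / n * 1000:.1f} ms")
```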
|
|
|
### 2. **Hyperparameter Tuning:** |
|
|
|
The YOLOv8s model, which demonstrated a good balance of inference time, precision, and recall, was selected for hyperparameter tuning. |
|
|
|
[Optuna](https://optuna.org/) was used to run 20 optimization trials over the following search space:
|
|
|
```python |
|
dropout = trial.suggest_float("dropout", 0.0, 0.5, step=0.1) |
|
lr0 = trial.suggest_float("lr0", 1e-5, 1e-1, log=True) |
|
box = trial.suggest_float("box", 3.0, 7.0, step=1.0) |
|
cls = trial.suggest_float("cls", 0.5, 1.5, step=0.2) |
|
opt = trial.suggest_categorical("optimizer", ["AdamW", "RMSProp"]) |
|
``` |
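These suggestion calls live inside an Optuna objective that trains and scores one candidate per trial. A minimal sketch, assuming the Ultralytics training API; the `signature-detection.yaml` config path and epoch count are illustrative assumptions (see the W&B report for the actual settings):

```python
import optuna
from ultralytics import YOLO

def objective(trial):
    # Sample one hyperparameter configuration (same search space as above)
    params = {
        "dropout": trial.suggest_float("dropout", 0.0, 0.5, step=0.1),
        "lr0": trial.suggest_float("lr0", 1e-5, 1e-1, log=True),
        "box": trial.suggest_float("box", 3.0, 7.0, step=1.0),
        "cls": trial.suggest_float("cls", 0.5, 1.5, step=0.2),
        "optimizer": trial.suggest_categorical("optimizer", ["AdamW", "RMSProp"]),
    }
    # Train a fresh YOLOv8s on the signature dataset with the sampled config
    model = YOLO("yolov8s.pt")
    results = model.train(
        data="signature-detection.yaml",  # hypothetical dataset config path
        epochs=50,                        # assumption; actual value in the W&B report
        imgsz=640,
        **params,
    )
    return results.box.map50  # maximize mAP@0.5 on the validation split

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_trial.params)
```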
|
Results can be visualized here: [**Hypertuning Experiment**](https://api.wandb.ai/links/samuel-lima-tech4humans/31a6zhb1). |
|
|
|
 |
|
|
|
### 3. **Evaluation:** |
|
|
|
At the end of training, the models were evaluated on the test set in ONNX (CPU) and TensorRT (GPU - T4) formats, using precision, recall, mAP50, and mAP50-95 as performance metrics.
|
|
|
 |
|
|
|
#### Results Comparison: |
|
|
|
| Metric | Base Model | Best Trial (#10) | Difference | |
|
|------------|------------|-------------------|-------------| |
|
| mAP50 | 87.47% | **95.75%** | +8.28% | |
|
| mAP50-95 | 65.46% | **66.26%** | +0.81% | |
|
| Precision | **97.23%** | 95.61% | -1.63% | |
|
| Recall | 76.16% | **91.21%** | +15.05% | |
|
| F1-score | 85.42% | **93.36%** | +7.94% | |
|
|
|
--- |
|
|
|
## **Results** |
|
|
|
After hyperparameter tuning of the YOLOv8s model, the best model achieved the following results on the test set: |
|
|
|
- **Precision:** 94.74% |
|
- **Recall:** 89.72% |
|
- **mAP@50:** 94.50% |
|
- **mAP@50-95:** 67.35% |
|
- **Inference Time:** |
|
- **ONNX Runtime (CPU):** 171.56 ms |
|
- **TensorRT (GPU - T4):** 7.657 ms |
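The CPU figure above can be approximated with a plain ONNX Runtime session against the exported YOLOv8s weights. A minimal sketch, assuming the sibling repository ships a `model.onnx` file (the filename is an assumption; check the repo's file listing):

```python
import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download

# Hypothetical filename; verify against the actual files in the model repo
model_path = hf_hub_download("tech4humans/yolov8s-signature-detector", "model.onnx")
session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])

# YOLOv8 expects a normalized float32 NCHW tensor at the training resolution
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)
input_name = session.get_inputs()[0].name
outputs = session.run(None, {input_name: dummy})
print(outputs[0].shape)  # (1, 5, 8400) for a single-class YOLOv8 head at 640px
```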
|
|
|
--- |
|
|
|
## **How to Use** |
|
|
|
### **Installation** |
|
|
|
```bash |
|
pip install transformers torch torchvision pillow |
|
``` |
|
|
|
### **Inference** |
|
|
|
```python |
|
from transformers import AutoImageProcessor, AutoModelForObjectDetection |
|
from PIL import Image |
|
import torch |
|
|
|
# Load model and processor |
|
model_name = "tech4humans/conditional-detr-50-signature-detector" |
|
processor = AutoImageProcessor.from_pretrained(model_name) |
|
model = AutoModelForObjectDetection.from_pretrained(model_name) |
|
|
|
# Load and process image |
|
image = Image.open("path/to/your/document.jpg") |
|
inputs = processor(images=image, return_tensors="pt") |
|
|
|
# Run inference |
|
with torch.no_grad(): |
|
outputs = model(**inputs) |
|
|
|
# Post-process results |
|
target_sizes = torch.tensor([image.size[::-1]]) |
|
results = processor.post_process_object_detection( |
|
outputs, target_sizes=target_sizes, threshold=0.5 |
|
)[0] |
|
|
|
# Extract detections |
|
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]): |
|
box = [round(i, 2) for i in box.tolist()] |
|
print(f"Detected signature with confidence {round(score.item(), 3)} at location {box}") |
|
``` |
|
|
|
### **Visualization** |
|
|
|
```python |
|
import matplotlib.pyplot as plt |
|
import matplotlib.patches as patches |
|
from PIL import Image |
|
|
|
def visualize_predictions(image_path, results, threshold=0.5): |
|
image = Image.open(image_path) |
|
fig, ax = plt.subplots(1, figsize=(12, 9)) |
|
ax.imshow(image) |
|
|
|
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]): |
|
if score > threshold: |
|
x, y, x2, y2 = box.tolist() |
|
width, height = x2 - x, y2 - y |
|
|
|
rect = patches.Rectangle( |
|
(x, y), width, height, |
|
linewidth=2, edgecolor='red', facecolor='none' |
|
) |
|
ax.add_patch(rect) |
|
ax.text(x, y-10, f'Signature: {score:.3f}', |
|
bbox=dict(boxstyle="round,pad=0.3", facecolor="yellow", alpha=0.7)) |
|
|
|
ax.set_title("Signature Detection Results") |
|
plt.axis('off') |
|
plt.show() |
|
|
|
# Use the visualization |
|
visualize_predictions("path/to/your/document.jpg", results) |
|
``` |
|
|
|
--- |
|
|
|
## **Demo** |
|
|
|
You can explore the model and test real-time inference in the Hugging Face Spaces demo, built with Gradio and ONNX Runtime.
|
|
|
[](https://huggingface.co/spaces/tech4humans/signature-detection) |
|
|
|
--- |
|
|
|
## **Inference with Triton Server**
|
|
|
If you want to deploy this signature detection model in a production environment, check out our inference server repository based on the NVIDIA Triton Inference Server. |
|
|
|
<table> |
|
<tr> |
|
<td> |
|
<a href="https://github.com/triton-inference-server/server"><img src="https://img.shields.io/badge/Triton-Inference%20Server-76B900?style=for-the-badge&labelColor=black&logo=nvidia" alt="Triton Badge" /></a> |
|
</td> |
|
<td> |
|
<a href="https://github.com/tech4ai/t4ai-signature-detect-server"><img src="https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white" alt="GitHub Badge" /></a> |
|
</td> |
|
</tr> |
|
</table> |
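As a taste of what querying such a deployment looks like, here is a minimal sketch using the `tritonclient` HTTP API; the server URL, model name, and tensor names are hypothetical and are defined by the model repository config in the server repo:

```python
import numpy as np
import tritonclient.http as httpclient

# Hypothetical server address; the default Triton HTTP port is 8000
client = httpclient.InferenceServerClient(url="localhost:8000")

# Preprocessed input: float32 NCHW tensor at the training resolution
image = np.random.rand(1, 3, 640, 640).astype(np.float32)
infer_input = httpclient.InferInput("images", list(image.shape), "FP32")
infer_input.set_data_from_numpy(image)

# Model and output tensor names are assumptions; check the deployment config
response = client.infer(model_name="signature-detection", inputs=[infer_input])
detections = response.as_numpy("output0")
print(detections.shape)
```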
|
|
|
--- |
|
|
|
## **Infrastructure** |
|
|
|
### Software |
|
|
|
The model was trained and tuned using a Jupyter Notebook environment. |
|
|
|
- **Operating System:** Ubuntu 22.04 |
|
- **Python:** 3.10.12 |
|
- **PyTorch:** 2.5.1+cu121 |
|
- **Ultralytics:** 8.3.58 |
|
- **Roboflow:** 1.1.50 |
|
- **Optuna:** 4.1.0 |
|
- **ONNX Runtime:** 1.20.1 |
|
- **TensorRT:** 10.7.0 |
|
|
|
### Hardware |
|
|
|
Training was performed on a Google Cloud Platform n1-standard-8 instance with the following specifications: |
|
|
|
- **CPU:** 8 vCPUs |
|
- **GPU:** NVIDIA Tesla T4 |
|
|
|
--- |
|
|
|
## **License** |
|
|
|
### Model Weights, Code, and Training Materials: **Apache 2.0**
|
- **License:** Apache License 2.0 |
|
- **Usage:** All training scripts, deployment code, and usage instructions are licensed under the Apache 2.0 license. |
|
|
|
--- |
|
|
|
## **Contact and Information** |
|
|
|
For further information, questions, or contributions, contact us at **[email protected]**. |
|
|
|
<div align="center"> |
|
<p> |
|
📧 <b>Email:</b> <a href="mailto:[email protected]">[email protected]</a><br>
|
🌐 <b>Website:</b> <a href="https://www.tech4.ai/">www.tech4.ai</a><br>
|
💼 <b>LinkedIn:</b> <a href="https://www.linkedin.com/company/tech4humans-hyperautomation/">Tech4Humans</a>
|
</p> |
|
</div> |
|
|
|
## **Author** |
|
|
|
<div align="center"> |
|
<table> |
|
<tr> |
|
<td align="center" width="140"> |
|
<a href="https://huggingface.co/samuellimabraz"> |
|
<img src="https://avatars.githubusercontent.com/u/115582014?s=400&u=c149baf46c51fdee45ad5344cf1b360236d90d09&v=4" width="120" alt="Samuel Lima"/> |
|
<h3>Samuel Lima</h3> |
|
</a> |
|
<p><i>AI Research Engineer</i></p> |
|
<p> |
|
<a href="https://huggingface.co/samuellimabraz"> |
|
<img src="https://img.shields.io/badge/π€_HuggingFace-samuellimabraz-orange" alt="HuggingFace"/> |
|
</a> |
|
</p> |
|
</td> |
|
<td width="500"> |
|
<h4>Responsibilities in this Project</h4> |
|
<ul> |
|
<li>🔬 Model development and training</li>
|
<li>📊 Dataset analysis and processing</li>
|
<li>⚙️ Hyperparameter optimization and performance evaluation</li>
|
<li>📝 Technical documentation and model card</li>
|
</ul> |
|
</td> |
|
</tr> |
|
</table> |
|
</div> |
|
|
|
--- |
|
|
|
<div align="center"> |
|
<p>Developed with ❤️ by <a href="https://www.tech4.ai/">Tech4Humans</a></p>
|
</div> |