---
license: apache-2.0
language:
- en
base_model:
- openmmlab/mask-rcnn
- microsoft/swin-base-patch4-window7-224-in22k
pipeline_tag: image-segmentation
---
# Model Card for ChartPointNet-InstanceSeg
ChartPointNet-InstanceSeg is a high-precision data point instance segmentation model for scientific charts. It uses Mask R-CNN with a Swin Transformer backbone to detect and segment individual data points, especially in dense and small-object scenarios common in scientific figures.
## Model Details
### Model Description
ChartPointNet-InstanceSeg is designed for pixel-precise instance segmentation of data points in scientific charts (e.g., scatter plots). It leverages Mask R-CNN with a Swin Transformer backbone, trained on enhanced COCO-style datasets with instance masks for data points. The model is ideal for extracting quantitative data from scientific figures and for downstream chart analysis.
- **Developed by:** Hansheng Zhu
- **Model type:** Instance Segmentation
- **License:** Apache-2.0
- **Finetuned from model:** openmmlab/mask-rcnn
### Model Sources
- **Repository:** [https://github.com/hanszhu/ChartSense](https://github.com/hanszhu/ChartSense)
- **Paper:** [arXiv:2106.01841](https://arxiv.org/abs/2106.01841)
## Uses
### Direct Use
- Instance segmentation of data points in scientific charts
- Automated extraction of quantitative data from figures
- Preprocessing for downstream chart understanding and data mining
### Downstream Use
- As a preprocessing step for chart structure parsing or data extraction
- Integration into document parsing, digital library, or accessibility systems
### Out-of-Scope Use
- Segmentation of non-data-point elements
- Use on figures outside the supported chart types
- Medical or legal decision making
## Bias, Risks, and Limitations
- The model is limited to data point segmentation in scientific charts.
- May not generalize to figures with highly unusual styles or poor image quality.
- Potential dataset bias: Training data is sourced from scientific literature.
### Recommendations
Users should verify predictions on out-of-domain data and be aware of the model’s limitations regarding chart style and domain.
## How to Get Started with the Model
```python
from mmdet.apis import inference_detector, init_detector

# Config and checkpoint as shipped with the ChartSense repository
config_file = 'legend_match_swin/mask_rcnn_swin_datapoint.py'
checkpoint_file = 'chart_datapoint.pth'

# Build the model and run inference on a single chart image
model = init_detector(config_file, checkpoint_file, device='cuda:0')  # or device='cpu'
result = inference_detector(model, 'example_chart.png')
# result: per-class bounding boxes and segmentation masks
```
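For a model with a mask head, MMDetection 2.x returns `result` as a pair `(bbox_result, segm_result)`, each indexed by class. A minimal sketch of turning those outputs into data-point coordinates, in pure NumPy with stand-in arrays so it runs without MMDetection installed (the function name and score threshold are illustrative, not part of the repository):

```python
import numpy as np

def extract_point_centroids(bbox_result, segm_result, score_thr=0.5):
    """Convert per-class detections into (x, y) data-point centroids.

    bbox_result: list (one entry per class) of (N, 5) arrays [x1, y1, x2, y2, score]
    segm_result: list (one entry per class) of lists of boolean HxW masks
    """
    points = []
    for bboxes, masks in zip(bbox_result, segm_result):
        for bbox, mask in zip(bboxes, masks):
            if bbox[4] < score_thr:
                continue  # drop low-confidence detections
            ys, xs = np.nonzero(mask)
            if len(xs) == 0:
                continue  # skip empty masks
            points.append((xs.mean(), ys.mean()))
    return points

# Stand-in result for the single "data-point" class with two detections
mask_a = np.zeros((10, 10), bool); mask_a[2:4, 2:4] = True  # centroid (2.5, 2.5)
mask_b = np.zeros((10, 10), bool); mask_b[6:8, 6:8] = True  # centroid (6.5, 6.5)
bbox_result = [np.array([[2, 2, 4, 4, 0.9], [6, 6, 8, 8, 0.3]])]
segm_result = [[mask_a, mask_b]]
print(extract_point_centroids(bbox_result, segm_result))  # the 0.3-score mask is dropped
```

Centroids are one reasonable reduction for round scatter markers; downstream pipelines may instead keep the full masks for overlapping points.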
## Training Details
### Training Data
- **Dataset:** Enhanced COCO-style scientific chart dataset with instance masks
- Data point class with pixel-precise segmentation masks
- Images and annotations filtered and preprocessed for optimal Swin Transformer performance
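For reference, a single data-point instance in a COCO-style annotation file looks roughly like this. Field names follow the standard COCO instance-segmentation schema; the ids, coordinates, and polygon here are made up for illustration:

```python
import json

# One illustrative annotation record (COCO instance-segmentation schema)
annotation = {
    "id": 1,
    "image_id": 42,
    "category_id": 1,                      # the single "data-point" class
    "bbox": [118.0, 203.5, 9.0, 9.0],      # [x, y, width, height]
    "segmentation": [[118.0, 203.5, 127.0, 203.5,
                      127.0, 212.5, 118.0, 212.5]],  # polygon, here a square
    "area": 81.0,
    "iscrowd": 0,
}
print(json.dumps(annotation, indent=2))
```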
### Training Procedure
- Images resized to 1120×672
- Mask R-CNN with Swin Transformer backbone
- **Training regime:** fp32
- **Optimizer:** AdamW
- **Batch size:** 8
- **Epochs:** 36
- **Learning rate:** 1e-4
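These hyperparameters would map onto an MMDetection 2.x config roughly as follows. This is a hypothetical fragment for orientation only; the actual config is `legend_match_swin/mask_rcnn_swin_datapoint.py` in the repository:

```python
# Hypothetical MMDetection 2.x config fragment mirroring the listed regime
optimizer = dict(type='AdamW', lr=1e-4)
runner = dict(type='EpochBasedRunner', max_epochs=36)
data = dict(samples_per_gpu=8)  # batch size 8 on a single GPU
train_pipeline = [
    dict(type='Resize', img_scale=(1120, 672), keep_ratio=True),
]
```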
## Evaluation
### Testing Data, Factors & Metrics
- **Testing Data:** Held-out split from enhanced COCO-style dataset
- **Factors:** Data point density, image quality
- **Metrics:** mAP (mean Average Precision), AP50, AP75, per-class AP
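The AP50 and AP75 thresholds above are applied to mask IoU. A minimal sketch of that quantity in pure NumPy (`mask_iou` is an illustrative helper, not part of the model's code):

```python
import numpy as np

def mask_iou(pred, gt):
    """Intersection-over-union between two boolean masks — the quantity
    the 0.50 (AP50) and 0.75 (AP75) matching thresholds are applied to."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 0.0

a = np.zeros((8, 8), bool); a[0:4, 0:4] = True
b = np.zeros((8, 8), bool); b[2:6, 2:6] = True
print(mask_iou(a, b))  # overlap 4 px / union 28 px ≈ 0.143
```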
### Results
| Category | mAP | mAP_50 | mAP_75 | mAP_s | mAP_m | mAP_l |
|-----------------|-------|--------|--------|-------|-------|-------|
| data-point | 0.485 | 0.687 | 0.581 | 0.487 | 0.05 | nan |
#### Summary
The model reaches 0.485 mask mAP for data-point segmentation, with its strongest performance on the small objects (mAP_s 0.487) that dominate dense scientific scatter plots; performance drops sharply on medium-sized instances (mAP_m 0.05), which are rare in this domain. It is best suited to figures requiring pixel-level accuracy on small, densely packed markers.
## Environmental Impact
- **Hardware Type:** NVIDIA V100 GPU
- **Hours used:** 10
- **Cloud Provider:** Google Cloud
- **Compute Region:** us-central1
- **Carbon Emitted:** ~15 kg CO2eq (estimated)
## Technical Specifications
### Model Architecture and Objective
- Mask R-CNN with Swin Transformer backbone
- Instance segmentation head for data point class
### Compute Infrastructure
- **Hardware:** NVIDIA V100 GPU
- **Software:** PyTorch 1.13, MMDetection 2.x, Python 3.9
## Citation
**BibTeX:**
```bibtex
@article{DocFigure2021,
  title   = {DocFigure: A Dataset for Scientific Figure Classification},
  author  = {Afzal, S. and others},
  journal = {arXiv preprint arXiv:2106.01841},
  year    = {2021}
}
```
**APA:**
Afzal, S., et al. (2021). DocFigure: A Dataset for Scientific Figure Classification. arXiv preprint arXiv:2106.01841.
## Glossary
- **Data Point:** An individual visual marker representing a value in a scientific chart (e.g., a dot in a scatter plot)
## More Information
- [DocFigure Paper](https://arxiv.org/abs/2106.01841)
## Model Card Authors
Hansheng Zhu
## Model Card Contact
[email protected]