---
license: apache-2.0
language:
- en
base_model:
- openmmlab/mask-rcnn
- microsoft/swin-base-patch4-window7-224-in22k
pipeline_tag: image-segmentation
---

# Model Card for ChartPointNet-InstanceSeg

ChartPointNet-InstanceSeg is a high-precision data point instance segmentation model for scientific charts. It uses Mask R-CNN with a Swin Transformer backbone to detect and segment individual data points, especially in dense and small-object scenarios common in scientific figures.

## Model Details

### Model Description

ChartPointNet-InstanceSeg is designed for pixel-precise instance segmentation of data points in scientific charts (e.g., scatter plots). It leverages Mask R-CNN with a Swin Transformer backbone, trained on enhanced COCO-style datasets with instance masks for data points. The model is ideal for extracting quantitative data from scientific figures and for downstream chart analysis.

- **Developed by:** Hansheng Zhu
- **Model type:** Instance Segmentation
- **License:** Apache-2.0
- **Finetuned from model:** openmmlab/mask-rcnn

### Model Sources

- **Repository:** [https://github.com/hanszhu/ChartSense](https://github.com/hanszhu/ChartSense)
- **Paper:** https://arxiv.org/abs/2106.01841

## Uses

### Direct Use

- Instance segmentation of data points in scientific charts
- Automated extraction of quantitative data from figures
- Preprocessing for downstream chart understanding and data mining

### Downstream Use

- As a preprocessing step for chart structure parsing or data extraction
- Integration into document parsing, digital library, or accessibility systems

### Out-of-Scope Use

- Segmentation of non-data-point elements
- Use on figures outside the supported chart types
- Medical or legal decision making

## Bias, Risks, and Limitations

- The model is limited to data point segmentation in scientific charts.
- May not generalize to figures with highly unusual styles or poor image quality.
- Potential dataset bias: Training data is sourced from scientific literature.

### Recommendations

Users should verify predictions on out-of-domain data and be aware of the model’s limitations regarding chart style and domain.

## How to Get Started with the Model

```python
from mmdet.apis import inference_detector, init_detector

# Paths to the released config and checkpoint.
config_file = 'legend_match_swin/mask_rcnn_swin_datapoint.py'
checkpoint_file = 'chart_datapoint.pth'
model = init_detector(config_file, checkpoint_file, device='cuda:0')

result = inference_detector(model, 'example_chart.png')
# For Mask R-CNN in MMDetection 2.x, `result` is a (bbox_results, segm_results)
# tuple: per-class bounding boxes with confidence scores, and per-class
# segmentation masks.
```
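To turn the raw output into usable detections, you typically filter by confidence score. The sketch below is illustrative only (the `filter_detections` helper and the synthetic arrays are not part of the released code); it assumes the MMDetection 2.x convention of per-class `(N, 5)` bounding-box arrays `[x1, y1, x2, y2, score]` paired with per-class lists of binary masks:

```python
import numpy as np

def filter_detections(bbox_results, segm_results, score_thr=0.5):
    """Keep only detections whose confidence exceeds score_thr.

    bbox_results: list (one entry per class) of (N, 5) arrays
                  [x1, y1, x2, y2, score].
    segm_results: list (one entry per class) of N binary masks.
    """
    kept = []
    for cls_id, (bboxes, masks) in enumerate(zip(bbox_results, segm_results)):
        for bbox, mask in zip(bboxes, masks):
            if bbox[4] >= score_thr:
                kept.append({'class': cls_id,
                             'bbox': bbox[:4],
                             'score': bbox[4],
                             'mask': mask})
    return kept

# Synthetic single-class output: two detections, one below the threshold.
bboxes = np.array([[10, 10, 20, 20, 0.9],
                   [30, 30, 40, 40, 0.3]])
masks = [np.ones((64, 64), bool), np.zeros((64, 64), bool)]
kept = filter_detections([bboxes], [masks], score_thr=0.5)
print(len(kept))  # 1
```

The kept masks can then be passed to downstream data-extraction steps (e.g. computing each data point's centroid in pixel coordinates).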

## Training Details

### Training Data

- **Dataset:** Enhanced COCO-style scientific chart dataset with instance masks
- Data point class with pixel-precise segmentation masks
- Images and annotations filtered and preprocessed for optimal Swin Transformer performance

### Training Procedure

- Images resized to 1120x672
- Mask R-CNN with Swin Transformer backbone
- **Training regime:** fp32
- **Optimizer:** AdamW
- **Batch size:** 8
- **Epochs:** 36
- **Learning rate:** 1e-4
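As a rough illustration, the hyperparameters above map onto an MMDetection-style config fragment like the following. This is a sketch, not the released training config; the field names follow MMDetection 2.x conventions, and the `weight_decay` value is an assumption not stated in this card:

```python
# Illustrative MMDetection 2.x-style config fragment (not the released config).
optimizer = dict(type='AdamW', lr=1e-4, weight_decay=0.05)  # weight_decay assumed
runner = dict(type='EpochBasedRunner', max_epochs=36)
data = dict(samples_per_gpu=8)  # per-GPU batch size
img_scale = (1120, 672)         # resize target from the card
```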

## Evaluation

### Testing Data, Factors & Metrics

- **Testing Data:** Held-out split from enhanced COCO-style dataset
- **Factors:** Data point density, image quality
- **Metrics:** mAP (mean Average Precision), AP50, AP75, per-class AP
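All of these AP metrics bottom out in mask IoU: a predicted mask counts as a true positive when its intersection-over-union with a ground-truth mask exceeds the metric's threshold (0.5 for AP50, 0.75 for AP75). A minimal sketch of mask IoU on binary arrays (illustrative only, not the evaluation code used here):

```python
import numpy as np

def mask_iou(a, b):
    """IoU between two binary masks of the same shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

a = np.zeros((10, 10), bool); a[:5, :5] = True   # 25 pixels
b = np.zeros((10, 10), bool); b[:5, :10] = True  # 50 pixels, fully covering a
print(mask_iou(a, b))  # 0.5
```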

### Results

| Category        | mAP   | mAP_50 | mAP_75 | mAP_s | mAP_m | mAP_l |
|-----------------|-------|--------|--------|-------|-------|-------|
| data-point      | 0.485 | 0.687  | 0.581  | 0.487 | 0.050 |  n/a  |

#### Summary

The model delivers strong instance-segmentation accuracy for data points (mAP 0.485, AP50 0.687), with its strength concentrated in the small-object regime (mAP_s 0.487) that dominates dense scientific scatter plots. Medium-object AP is substantially lower, and large-object AP is undefined, likely because the held-out split contains few or no large data-point instances.

## Environmental Impact

- **Hardware Type:** NVIDIA V100 GPU
- **Hours used:** 10
- **Cloud Provider:** Google Cloud
- **Compute Region:** us-central1
- **Carbon Emitted:** ~15 kg CO2eq (estimated)

## Technical Specifications

### Model Architecture and Objective

- Mask R-CNN with Swin Transformer backbone
- Instance segmentation head for data point class

### Compute Infrastructure

- **Hardware:** NVIDIA V100 GPU
- **Software:** PyTorch 1.13, MMDetection 2.x, Python 3.9

## Citation

**BibTeX:**

```bibtex
@article{DocFigure2021,
  title={DocFigure: A Dataset for Scientific Figure Classification},
  author={Afzal, S. and others},
  journal={arXiv preprint arXiv:2106.01841},
  year={2021}
}
```

**APA:**

Afzal, S., et al. (2021). DocFigure: A Dataset for Scientific Figure Classification. arXiv preprint arXiv:2106.01841.

## Glossary

- **Data Point:** An individual visual marker representing a value in a scientific chart (e.g., a dot in a scatter plot)

## More Information

- [DocFigure Paper](https://arxiv.org/abs/2106.01841)

## Model Card Authors

Hansheng Zhu

## Model Card Contact

[email protected]