hanszhu/ChartPointNet-InstanceSeg

Image Segmentation

English

hanszhu committed 2 days ago

Commit

4061dc6

1 Parent(s): 6820beb

Update README.md

Files changed (1)

README.md +161 -3

@@ -1,3 +1,161 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+language:
+- en
+base_model:
+- openmmlab/mask-rcnn
+- microsoft/swin-base-patch4-window7-224-in22k
+pipeline_tag: image-segmentation
+---
+# Model Card for ChartPointNet-InstanceSeg
+ChartPointNet-InstanceSeg is a high-precision data point instance segmentation model for scientific charts. It uses Mask R-CNN with a Swin Transformer backbone to detect and segment individual data points, especially in dense and small-object scenarios common in scientific figures.
+## Model Details
+### Model Description
+ChartPointNet-InstanceSeg is designed for pixel-precise instance segmentation of data points in scientific charts (e.g., scatter plots). It leverages Mask R-CNN with a Swin Transformer backbone, trained on enhanced COCO-style datasets with instance masks for data points. The model is ideal for extracting quantitative data from scientific figures and for downstream chart analysis.
+- **Developed by:** Hansheng Zhu
+- **Model type:** Instance Segmentation
+- **License:** Apache-2.0
+- **Finetuned from model:** openmmlab/mask-rcnn
+### Model Sources
+- **Repository:** [https://github.com/hanszhu/ChartSense](https://github.com/hanszhu/ChartSense)
+- **Paper:** https://arxiv.org/abs/2106.01841
+## Uses
+### Direct Use
+- Instance segmentation of data points in scientific charts
+- Automated extraction of quantitative data from figures
+- Preprocessing for downstream chart understanding and data mining
+### Downstream Use
+- As a preprocessing step for chart structure parsing or data extraction
+- Integration into document parsing, digital library, or accessibility systems
+### Out-of-Scope Use
+- Segmentation of non-data-point elements
+- Use on figures outside the supported chart types
+- Medical or legal decision making
+## Bias, Risks, and Limitations
+- The model is limited to data point segmentation in scientific charts.
+- May not generalize to figures with highly unusual styles or poor image quality.
+- Potential dataset bias: Training data is sourced from scientific literature.
+### Recommendations
+Users should verify predictions on out-of-domain data and be aware of the model’s limitations regarding chart style and domain.
+## How to Get Started with the Model
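+```python
+import torch
+from mmdet.apis import inference_detector, init_detector
+# Paths to the MMDetection config and the trained checkpoint
+config_file = 'legend_match_swin/mask_rcnn_swin_datapoint.py'
+checkpoint_file = 'chart_datapoint.pth'
+# Fall back to CPU when no GPU is available
+device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
+model = init_detector(config_file, checkpoint_file, device=device)
+result = inference_detector(model, 'example_chart.png')
+# With MMDetection 2.x and a mask head, result is a (bbox_results, segm_results) tuple,
+# each a list with one entry per class
+```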
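For a model with a mask head, MMDetection 2.x returns `result` as a `(bbox_results, segm_results)` tuple with one list entry per class. The following is a minimal sketch of turning that output into data point centroids. It assumes the masks arrive as boolean arrays rather than RLE-encoded dicts and that the single class sits at index 0. The helper name `extract_points` and the 0.5 score threshold are illustrative and not part of the repository.

```python
import numpy as np


def extract_points(result, score_thr=0.5, class_id=0):
    """Return (x, y, score) centroids for masks at or above score_thr.

    Assumes the MMDetection 2.x layout: result = (bbox_results, segm_results),
    where bbox_results[c] is an (N, 5) array of [x1, y1, x2, y2, score] and
    segm_results[c] is a list of N boolean masks of shape (H, W).
    """
    bbox_results, segm_results = result
    bboxes = bbox_results[class_id]
    masks = segm_results[class_id]
    points = []
    for bbox, mask in zip(bboxes, masks):
        score = float(bbox[4])
        if score < score_thr:
            continue
        ys, xs = np.nonzero(mask)
        if xs.size == 0:  # skip empty masks
            continue
        points.append((float(xs.mean()), float(ys.mean()), score))
    return points


# Synthetic stand-in for inference_detector output: two masks, one low-scoring
mask_a = np.zeros((10, 10), dtype=bool)
mask_a[2:5, 3:6] = True
mask_b = np.zeros((10, 10), dtype=bool)
mask_b[7:9, 7:9] = True
fake_result = (
    [np.array([[3, 2, 6, 5, 0.9], [7, 7, 9, 9, 0.2]])],
    [[mask_a, mask_b]],
)
points = extract_points(fake_result)
```

The centroids are in pixel coordinates of the input image; mapping them to data values would additionally require the axis calibration, which this model does not provide.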
+## Training Details
+### Training Data
+- **Dataset:** Enhanced COCO-style scientific chart dataset with instance masks
+- Data point class with pixel-precise segmentation masks
+- Images and annotations filtered and preprocessed for optimal Swin Transformer performance
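As a rough illustration of what a COCO-style record with an instance mask looks like, the sketch below builds one annotation for a single marker. The ids, file name, coordinates, and image size are hypothetical and not taken from the actual dataset; only the field names follow the COCO format, and the category name mirrors the `data-point` row in the results table.

```python
import json

# Hypothetical single-image example; ids, file name, and coordinates are illustrative.
coco = {
    "images": [{"id": 1, "file_name": "example_chart.png", "width": 1120, "height": 672}],
    "categories": [{"id": 1, "name": "data-point"}],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            "bbox": [100.0, 200.0, 8.0, 8.0],  # [x, y, width, height]
            "segmentation": [[100.0, 200.0, 108.0, 200.0, 108.0, 208.0, 100.0, 208.0]],
            "area": 64.0,
            "iscrowd": 0,
        }
    ],
}


def polygon_area(flat_xy):
    """Shoelace area of a polygon given as [x1, y1, x2, y2, ...]."""
    xs, ys = flat_xy[0::2], flat_xy[1::2]
    n = len(xs)
    twice = sum(xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n))
    return abs(twice) / 2.0


ann = coco["annotations"][0]
computed_area = polygon_area(ann["segmentation"][0])  # should agree with ann["area"]
coco_json = json.dumps(coco)
```

An 8x8 pixel marker has an area of 64, well under the 32x32 = 1024 pixel cutoff that COCO evaluation uses for "small" objects, which is why the small-object metric matters most for this task.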
+### Training Procedure
+- Images resized to 1120x672
+- Mask R-CNN with Swin Transformer backbone
+- **Training regime:** fp32
+- **Optimizer:** AdamW
+- **Batch size:** 8
+- **Epochs:** 36
+- **Learning rate:** 1e-4
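The settings above could be expressed in MMDetection 2.x config style roughly as follows. This is a sketch, not the repository's actual config: the card does not state weight decay, aspect-ratio handling, worker count, or how the batch size is split across GPUs, so those entries are marked as assumptions.

```python
# Sketch of the listed training settings in MMDetection 2.x config style.
# Values not stated in the model card are marked as assumptions.
img_scale = (1120, 672)

train_pipeline_resize = dict(
    type='Resize',
    img_scale=img_scale,
    keep_ratio=True,  # assumption: aspect-ratio handling is not stated
)

optimizer = dict(type='AdamW', lr=1e-4)  # weight decay and betas are not stated
runner = dict(type='EpochBasedRunner', max_epochs=36)

# Assumption: the batch size of 8 is the total across GPUs;
# on a single GPU that corresponds to samples_per_gpu=8.
data = dict(samples_per_gpu=8, workers_per_gpu=2)  # workers_per_gpu is illustrative

# fp32 training: no `fp16 = dict(...)` entry is added to the config.
```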
+## Evaluation
+### Testing Data, Factors & Metrics
+- **Testing Data:** Held-out split from enhanced COCO-style dataset
+- **Factors:** Data point density, image quality
+- **Metrics:** mAP (mean Average Precision), AP50, AP75, per-class AP
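Assuming COCO-style evaluation, AP50 and AP75 count a predicted mask as a match when its intersection over union (IoU) with a ground-truth mask is at least 0.50 or 0.75, and mAP averages over IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below computes mask IoU for a synthetic 4x4 marker; the function name `mask_iou` and the example masks are illustrative.

```python
import numpy as np


def mask_iou(pred, gt):
    """Intersection over union of two boolean masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(pred, gt).sum() / union)


# A 4x4 ground-truth marker and a prediction shifted right by one pixel
gt = np.zeros((12, 12), dtype=bool)
gt[4:8, 4:8] = True
pred = np.zeros((12, 12), dtype=bool)
pred[4:8, 5:9] = True

iou = mask_iou(pred, gt)      # intersection 12, union 20
counts_at_50 = iou >= 0.50    # match at the AP50 threshold
counts_at_75 = iou >= 0.75    # no match at the AP75 threshold
```

A one-pixel shift on a marker this small already drops the IoU to 0.6, which illustrates why the stricter thresholds are hard to meet for small objects.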
+### Results
+| Category        | mAP   | mAP_50 | mAP_75 | mAP_s | mAP_m | mAP_l |
+|-----------------|-------|--------|--------|-------|-------|-------|
+| data-point      | 0.485 | 0.687  | 0.581  | 0.487 | 0.05  |  nan  |
+#### Summary
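+The model reaches 0.485 mAP on the data-point class, with 0.687 at IoU 0.50 and 0.581 at IoU 0.75. Small objects account for the reported performance (mAP_s 0.487); medium-object mAP is only 0.05, and mAP_l is nan, which usually means the test split contains no large instances. These numbers support use on dense, small-marker figures but say little about charts with large markers.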
+## Environmental Impact
+- **Hardware Type:** NVIDIA V100 GPU
+- **Hours used:** 10
+- **Cloud Provider:** Google Cloud
+- **Compute Region:** us-central1
+- **Carbon Emitted:** ~15 kg CO2eq (estimated)
+## Technical Specifications
+### Model Architecture and Objective
+- Mask R-CNN with Swin Transformer backbone
+- Instance segmentation head for data point class
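A partial MMDetection 2.x model config for this architecture might look like the sketch below. It assumes the Swin-Base variant (patch 4, window 7) named in the card metadata and a single class; the actual config file in the repository may differ, and in practice these overrides would be merged with a base Mask R-CNN config rather than used alone.

```python
# Assumption: Swin-Base (patch 4, window 7) settings, matching the
# microsoft/swin-base-patch4-window7-224-in22k entry in the card metadata.
embed_dims = 128
depths = [2, 2, 18, 2]
num_heads = [4, 8, 16, 32]

model = dict(
    type='MaskRCNN',
    backbone=dict(
        type='SwinTransformer',
        embed_dims=embed_dims,
        depths=depths,
        num_heads=num_heads,
        window_size=7,
        out_indices=(0, 1, 2, 3),
    ),
    neck=dict(
        type='FPN',
        # Channel width doubles at each Swin stage: 128, 256, 512, 1024
        in_channels=[embed_dims * 2 ** i for i in range(4)],
        out_channels=256,
        num_outs=5,
    ),
    roi_head=dict(
        bbox_head=dict(num_classes=1),  # single "data-point" class
        mask_head=dict(num_classes=1),
    ),
)
```

The FPN input channels must match the backbone's per-stage output widths, so they are derived from `embed_dims` instead of being written out by hand.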
+### Compute Infrastructure
+- **Hardware:** NVIDIA V100 GPU
+- **Software:** PyTorch 1.13, MMDetection 2.x, Python 3.9
+## Citation
+**BibTeX:**
+```bibtex
+@article{DocFigure2021,
+  title={DocFigure: A Dataset for Scientific Figure Classification},
+  author={Afzal, S. and others},
+  journal={arXiv preprint arXiv:2106.01841},
+  year={2021}
+}
+```
+**APA:**
+Afzal, S., et al. (2021). DocFigure: A Dataset for Scientific Figure Classification. arXiv preprint arXiv:2106.01841.
+## Glossary
+- **Data Point:** An individual visual marker representing a value in a scientific chart (e.g., a dot in a scatter plot)
+## More Information
+- [DocFigure Paper](https://arxiv.org/abs/2106.01841)
+## Model Card Authors
+Hansheng Zhu
+## Model Card Contact
+[email protected]