How to fine tune microsoft/table-transformer-detection in huggingface?

#16
by Spondon - opened

Dear All,

After reading all the threads available on the internet, I am using the script below to fine-tune table-transformer-detection:
https://github.com/NielsRogge/Transformers-Tutorials/blob/master/DETR/Fine_tuning_DetrForObjectDetection_on_custom_dataset_(balloon).ipynb

I replaced:

processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")

with:

processor = DetrImageProcessor()

I also replaced:

DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50",
revision="no_timm",
num_labels=len(id2label),
ignore_mismatched_sizes=True)

with:

TableTransformerForObjectDetection.from_pretrained(
"microsoft/table-transformer-structure-recognition",
ignore_mismatched_sizes=True,
)
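
One difference from the notebook is that the replacement call above no longer passes `num_labels`. A minimal sketch of building the label mappings from COCO-style categories follows. The two-class category list and the names `categories` and `build_label_kwargs` are hypothetical, chosen for illustration only:

```python
def build_label_kwargs(categories):
    """Build label-mapping keyword arguments from COCO-style categories.

    `categories` is a list of dicts with "id" and "name" keys, as found
    under the "categories" key of a COCO annotation file.
    Returns (kwargs, coco_id_to_index).
    """
    # Sort by the original COCO id and remap to contiguous 0-based indices.
    ordered = sorted(categories, key=lambda c: c["id"])
    coco_id_to_index = {c["id"]: i for i, c in enumerate(ordered)}
    id2label = {i: c["name"] for i, c in enumerate(ordered)}
    label2id = {name: i for i, name in id2label.items()}
    kwargs = {
        "id2label": id2label,
        "label2id": label2id,
        # Allows the classification head to be re-initialised
        # when the number of classes differs from the checkpoint.
        "ignore_mismatched_sizes": True,
    }
    return kwargs, coco_id_to_index


# Hypothetical two-class dataset, for illustration only.
categories = [{"id": 1, "name": "table"}, {"id": 2, "name": "table rotated"}]
kwargs, coco_id_to_index = build_label_kwargs(categories)
print(kwargs["id2label"])
print(coco_id_to_index)
```

These keyword arguments could then be passed as `TableTransformerForObjectDetection.from_pretrained(checkpoint, **kwargs)`, where `checkpoint` is whichever Table Transformer checkpoint is being fine-tuned. If the category ids are remapped this way, the `category_id` values in the annotations need the same remapping.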

After fine-tuning the model with Trainer(max_steps=3000, gradient_clip_val=0.1), I am getting the very low accuracy shown below:
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.334
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.539
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.356
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.334
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.223
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.468
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.487
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.487
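
The AP and AR figures above are all defined in terms of IoU thresholds between predicted and ground-truth boxes. A minimal sketch of that overlap measure follows, using made-up boxes (not taken from the dataset in this thread) and a hypothetical helper name `box_iou`:

```python
def box_iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    # Corners of the overlapping rectangle, if any.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Made-up prediction and ground truth, shifted by 10 pixels horizontally.
pred = (10, 10, 110, 60)
truth = (20, 10, 120, 60)
iou = box_iou(pred, truth)

# A prediction counts as a match at a given threshold only if IoU >= threshold.
for threshold in (0.50, 0.75, 0.95):
    print(threshold, iou >= threshold)
```

A box that is only slightly offset still passes the IoU=0.50 and IoU=0.75 thresholds but fails the strictest ones, which is why AP averaged over IoU=0.50:0.95 is always lower than AP at IoU=0.50.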

My dataset size is:
Number of training examples: 159
Number of validation examples: 19

Any thoughts on this?
P.S. I know about the project https://github.com/microsoft/table-transformer/ and how to fine-tune with it. I also know about convert_table_transformer_original_pytorch_checkpoint_to_pytorch.py, which is in the transformers repo. My question is why the fine-tuning above gives such low accuracy. Should I increase my dataset size, or am I missing something? I am using a proprietary dataset.

Hi @Spondon. How did you preprocess your own custom data and create the dataset? Do you have any code or links for it?
Thanks.

Hi @mali17361 ,
The code is proprietary, but the dataset format is COCO.

Interestingly, when I convert the dataset to PASCAL VOC format and fine-tune using the Table Transformer source scripts (https://github.com/microsoft/table-transformer/), I get the accuracy below after 16 epochs:

IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.823
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.959
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.880
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.823
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.533
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.877
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.887
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.887
pubmed: AP50: 0.959, AP75: 0.880, AP: 0.823, AR: 0.887
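
The COCO-to-PASCAL VOC conversion mentioned above mostly comes down to the bounding-box convention: COCO stores `[x, y, width, height]`, while PASCAL VOC stores the two corners. A minimal sketch with a made-up box follows; the function names are hypothetical and this is not the proprietary conversion code:

```python
def coco_to_voc(bbox):
    """Convert a COCO box [x, y, width, height] to [xmin, ymin, xmax, ymax]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def voc_to_coco(bbox):
    """Convert a corner box [xmin, ymin, xmax, ymax] back to COCO format."""
    xmin, ymin, xmax, ymax = bbox
    return [xmin, ymin, xmax - xmin, ymax - ymin]


# Made-up table box in absolute pixel coordinates.
coco_box = [50, 40, 200, 100]
voc_box = coco_to_voc(coco_box)
print(voc_box)
print(voc_to_coco(voc_box) == coco_box)
```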

Any thoughts on this?

Hi @Spondon. Thanks for the reply.
I was able to convert my own dataset to COCO format, create a custom dataset, and fine-tune the model.
I'm also getting very low accuracy scores.
Maybe our datasets are too small. What do you think?
Have you tried any other method?
I also have an issue with the output classes: the model has 6 output classes, the balloon dataset has only 1, and my dataset has 2. When I run inference on outside data, the outputs come back in only 1 or 2 tensors, i.e. only 2 classes, rather than all 6 classes that "table-structure-recognition" has.
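
On the class-count question: DETR-style models in transformers score each object query over the dataset's classes plus one extra "no object" class, and post-processing keeps only the queries whose best class score clears a threshold. The output therefore lists only the classes actually detected in an image, not one entry per class. Below is a simplified pure-Python sketch of that filtering, with made-up logits and a hypothetical two-class mapping. It is an illustration, not the library's implementation:

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def keep_detections(query_logits, id2label, threshold=0.5):
    """Keep queries whose best real-class probability exceeds `threshold`."""
    kept = []
    for logits in query_logits:
        probs = softmax(logits)
        class_probs = probs[:-1]  # the last entry is the "no object" class
        score = max(class_probs)
        label = class_probs.index(score)
        if score > threshold:
            kept.append((id2label[label], score))
    return kept


# Hypothetical two-class model: each query has 2 class logits + 1 "no object".
id2label = {0: "table", 1: "table rotated"}
query_logits = [
    [4.0, 0.5, 0.1],  # confident "table"
    [0.2, 0.1, 5.0],  # dominated by "no object"
    [0.3, 0.4, 0.2],  # no class is confident enough
]
detections = keep_detections(query_logits, id2label)
print(detections)
```

Only the first query survives here, so a single class appears in the output even though the model has two.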

Hi @Spondon ,

We just updated our object detection guide (for easier mAP calculation with the Trainer API): https://huggingface.co/docs/transformers/main/en/tasks/object_detection, and we have now also added official object detection scripts (both with the Trainer API and Accelerate): https://github.com/huggingface/transformers/tree/main/examples/pytorch/object-detection.

I definitely recommend these guides for fine-tuning Table Transformer on a custom dataset.
