Unable to train Deformable DETR
Hi folks,
I am currently working on a benchmark for mitosis detection on medical images, i.e. object detection. I can train YOLOS and DETR on my data, but I can't find a way to train DeformableDETR. I am using the exact same code as the one provided in the balloon example for DETR, with the following modifications:
My feature extractor is initialized as follows:
feature_extractor = DeformableDetrImageProcessor.from_pretrained("SenseTime/deformable-detr")
and my model:
self.model = DeformableDetrForObjectDetection.from_pretrained("SenseTime/deformable-detr", num_labels=1, ignore_mismatched_sizes=True)
I only have 1 label, so I guess that is not a problem, since the same init works for DETR.
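For context, here is roughly what my common_step looks like (a sketch simplified from my models.py, following the balloon notebook; variable names may differ):

def common_step(self, batch, batch_idx):
    # Unpack the batch produced by the image processor / collate_fn,
    # and move the label tensors to the model's device
    pixel_values = batch["pixel_values"]
    pixel_mask = batch["pixel_mask"]
    labels = [{k: v.to(self.device) for k, v in t.items()} for t in batch["labels"]]

    # Forward pass; the model computes the matching loss itself when labels are given
    outputs = self.model(pixel_values=pixel_values, pixel_mask=pixel_mask, labels=labels)

    return outputs.loss, outputs.loss_dict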
Using this code, I get the following error during the validation sanity check:
Traceback (most recent call last):
File "/home/elliot/apriorics/scripts/train/train_deformabledetr.py", line 57, in <module>
trainer.fit(
File "/data/apps/conda/elliot/envs/transformers/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 740, in fit
self._call_and_handle_interrupt(
File "/data/apps/conda/elliot/envs/transformers/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/data/apps/conda/elliot/envs/transformers/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/data/apps/conda/elliot/envs/transformers/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run
self._dispatch()
File "/data/apps/conda/elliot/envs/transformers/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1279, in _dispatch
self.training_type_plugin.start_training(self)
File "/data/apps/conda/elliot/envs/transformers/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 202, in start_training
self._results = trainer.run_stage()
File "/data/apps/conda/elliot/envs/transformers/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1289, in run_stage
return self._run_train()
File "/data/apps/conda/elliot/envs/transformers/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1311, in _run_train
self._run_sanity_check(self.lightning_module)
File "/data/apps/conda/elliot/envs/transformers/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1375, in _run_sanity_check
self._evaluation_loop.run()
File "/data/apps/conda/elliot/envs/transformers/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 145, in run
self.advance(*args, **kwargs)
File "/data/apps/conda/elliot/envs/transformers/lib/python3.9/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 110, in advance
dl_outputs = self.epoch_loop.run(dataloader, dataloader_idx, dl_max_batches, self.num_dataloaders)
File "/data/apps/conda/elliot/envs/transformers/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 145, in run
self.advance(*args, **kwargs)
File "/data/apps/conda/elliot/envs/transformers/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 122, in advance
output = self._evaluation_step(batch, batch_idx, dataloader_idx)
File "/data/apps/conda/elliot/envs/transformers/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 217, in _evaluation_step
output = self.trainer.accelerator.validation_step(step_kwargs)
File "/data/apps/conda/elliot/envs/transformers/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 239, in validation_step
return self.training_type_plugin.validation_step(*step_kwargs.values())
File "/data/apps/conda/elliot/envs/transformers/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 219, in validation_step
return self.model.validation_step(*args, **kwargs)
File "/home/elliot/apriorics/apriorics/models.py", line 899, in validation_step
loss, loss_dict = self.common_step(batch, batch_idx)
File "/home/elliot/apriorics/apriorics/models.py", line 865, in common_step
outputs = self.model(pixel_values=pixel_values, pixel_mask=pixel_mask, labels=labels)
File "/home/elliot/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/data/apps/conda/elliot/envs/transformers/lib/python3.9/site-packages/transformers/models/deformable_detr/modeling_deformable_detr.py", line 1980, in forward
loss_dict = criterion(outputs_loss, labels)
File "/home/elliot/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/data/apps/conda/elliot/envs/transformers/lib/python3.9/site-packages/transformers/models/deformable_detr/modeling_deformable_detr.py", line 2213, in forward
indices = self.matcher(outputs_without_aux, targets)
File "/home/elliot/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/elliot/.local/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/data/apps/conda/elliot/envs/transformers/lib/python3.9/site-packages/transformers/models/deformable_detr/modeling_deformable_detr.py", line 2341, in forward
class_cost = pos_cost_class[:, target_ids] - neg_cost_class[:, target_ids]
IndexError: index 1 is out of bounds for dimension 0 with size 1
I guess I am using the feature extractor incorrectly, but I can't figure out how. I tried two different ways to preprocess the data, but neither works:
encoding = self.feature_extractor.preprocess(images=img, annotations=target, return_tensors="pt")
and
encoding = self.feature_extractor(images=img, annotations=target, return_tensors="pt")
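For reference, the target I pass as annotations is in COCO detection format, roughly like this (the values here are illustrative, not my real data):

target = {
    "image_id": 63,
    "annotations": [
        {
            "bbox": [180.0, 58.0, 15.0, 16.0],  # [x, y, width, height] in pixels
            "category_id": 1,                   # my single mitosis class
            "area": 240.0,
            "iscrowd": 0,
        }
    ],
}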
Any help would be appreciated; I can also provide more information if needed.
Elliot
Hi,
Thanks for your interest in Deformable DETR! Could you print out the content of encoding? It might be that there's something wrong with the class label IDs.
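For example, a quick check would be something like this (adapting it to your variable names):

# Class label IDs must be in [0, num_labels - 1]
print(model.config.num_labels)                  # 1 in your case
print(encoding["labels"][0]["class_labels"])    # should only contain values < num_labels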
It looks like this:
{'pixel_values': tensor([[[[1.1700, 1.1700, 1.1700, ..., 1.7523, 1.7694, 1.7694],
[1.1700, 1.1700, 1.1700, ..., 1.7523, 1.7694, 1.7694],
[1.2043, 1.2043, 1.2043, ..., 1.7523, 1.7523, 1.7523],
...,
[1.4440, 1.4440, 1.4269, ..., 1.5468, 1.5468, 1.5468],
[1.3755, 1.3755, 1.3584, ..., 1.5639, 1.5810, 1.5810],
[1.3755, 1.3755, 1.3584, ..., 1.5639, 1.5810, 1.5810]],
[[0.7654, 0.7654, 0.7654, ..., 1.6232, 1.6408, 1.6408],
[0.7654, 0.7654, 0.7654, ..., 1.6232, 1.6408, 1.6408],
[0.8004, 0.8004, 0.8004, ..., 1.6232, 1.6408, 1.6408],
...,
[1.1681, 1.1681, 1.1506, ..., 1.0280, 1.0280, 1.0280],
[1.0980, 1.0980, 1.0805, ..., 1.0455, 1.0630, 1.0630],
[1.0980, 1.0980, 1.0805, ..., 1.0455, 1.0630, 1.0630]],
[[1.1759, 1.1759, 1.1759, ..., 2.0125, 2.0300, 2.0300],
[1.1759, 1.1759, 1.1759, ..., 2.0125, 2.0300, 2.0300],
[1.2108, 1.2108, 1.2108, ..., 2.0125, 2.0300, 2.0300],
...,
[1.7511, 1.7511, 1.7337, ..., 1.2631, 1.2631, 1.2631],
[1.6814, 1.6814, 1.6640, ..., 1.2805, 1.2980, 1.2980],
[1.6814, 1.6814, 1.6640, ..., 1.2805, 1.2980, 1.2980]]]]), 'pixel_mask': tensor([[[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1],
...,
[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1]]]), 'labels': [{'size': tensor([800, 800]), 'image_id': tensor([63]), 'class_labels': tensor([1]), 'boxes': tensor([[0.7051, 0.2578, 0.0586, 0.0625]]), 'area': tensor([722.6562]), 'iscrowd': tensor([0]), 'orig_size': tensor([256, 256])}]}
I think it looks like it should? I checked it for several different images and it looked the same.
Thanks for your help
Hmm, ok, but the Deformable DETR tutorial is on the "balloon" dataset, which also consists of only one class, right?
Hi,
I tried with the balloon dataset (I thought I had already done so, but actually I hadn't). It worked, and I realized that the class_label is actually 0, not 1, in the balloon dataset. I changed this value in my custom dataset, and now it is training. What I find weird is that I used the exact same dataset (with class labels starting at 1) for DETR and YOLOS, and it worked.
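For anyone running into the same thing, the change on my side was simply to make category IDs start at 0 when building the targets, roughly like this (a sketch; boxes, areas, idx and img stand in for my actual dataset fields):

# In my dataset's __getitem__ (sketch): with num_labels=1, the only valid class label is 0
annotations = [
    {
        "bbox": box,        # [x, y, width, height]
        "category_id": 0,   # was 1 before, which is out of range for num_labels=1
        "area": area,
        "iscrowd": 0,
    }
    for box, area in zip(boxes, areas)
]
target = {"image_id": idx, "annotations": annotations}
encoding = self.feature_extractor(images=img, annotations=target, return_tensors="pt")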
Thanks for your help and your work,
Elliot