---
license: cc-by-4.0
pipeline_tag: video-classification
library_name: pytorch
---

# EAR-WACV25-DAKiet-TSM

The model was presented in [this paper](https://huggingface.co/papers/2503.07821).

This is a video classification model built on the Temporal Shift Module (TSM) with a `resnext50_32x4d` backbone.
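
For readers unfamiliar with TSM, the following is a minimal sketch of the temporal shift idea, not the repository's code. It assumes the commonly described scheme: one `1/shift_div` slice of the channels takes its values from the next frame, a second slice from the previous frame, and the rest stay in place, with zeros at the clip boundaries. It is written in plain Python over a toy `(T, C)` nested list for readability, whereas the actual model operates on PyTorch tensors; the function name `temporal_shift` is hypothetical.

```python
def temporal_shift(frames, shift_div=8):
    """Shift a fraction of channels along the time axis.

    frames: list of T per-frame feature vectors, each a list of C numbers.
    shift_div: 1/shift_div of the channels move in each direction
               (assumed to mirror the --shift_div flag).
    """
    num_frames = len(frames)
    num_channels = len(frames[0])
    fold = num_channels // shift_div  # channels moved per direction

    # Start from zeros so frames at the clip boundary are zero-padded.
    shifted = [[0.0] * num_channels for _ in range(num_frames)]
    for t in range(num_frames):
        for c in range(num_channels):
            if c < fold:
                src = t + 1      # first slice: read from the next frame
            elif c < 2 * fold:
                src = t - 1      # second slice: read from the previous frame
            else:
                src = t          # remaining channels are left untouched
            if 0 <= src < num_frames:
                shifted[t][c] = frames[src][c]
    return shifted


# Toy clip: 3 frames, 8 channels, value = 10 * frame index + channel index.
clip = [[10 * t + c for c in range(8)] for t in range(3)]
shifted_clip = temporal_shift(clip, shift_div=8)
```

Because the shift only moves existing activations, it adds no parameters; neighbouring frames exchange information through the shifted channels when the next 2D convolution is applied.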

**GitHub Repository:** https://github.com/fdfyaytkt/EAR-WACV25-DAKiet-TSM

## Data
The model was trained on a combination of datasets:

*   **Toyota Smarthome dataset:** Used for activity recognition.
*   **ETRI-Activity3D:** RGB videos (specific subsets or full dataset used depending on configuration).
*   **ETRI-Activity3D-LivingLab:** RGB videos (specific subsets or full dataset used depending on configuration).

Two configurations are detailed below, with their respective public leaderboard scores:

### Config 1 (Public Leaderboard: 0.84402)

*   Toyota Smarthome dataset
*   ETRI-Activity3D - RGB videos (RGB\_P091-P100)
*   ETRI-Activity3D-LivingLab - RGB videos (RGB(P201-P230))

### Config 2 (Public Leaderboard: 0.78856)

*   Toyota Smarthome dataset
*   ETRI-Activity3D - RGB videos (full)
*   ETRI-Activity3D-LivingLab - RGB videos (full)

## Running

Example training and evaluation commands are provided below. Refer to the repository for complete details and options.

### Train

```console
python main.py elderly RGB --arch resnext50_32x4d --num_segments 8 --gd 20 --lr 0.001 --wd 1e-4 --lr_steps 20 40 --epochs 100 --batch-size 4 -j 32 --dropout 0.5 --consensus_type=avg --eval-freq=1 --shift --shift_div=8 --shift_place=blockres --npb
```
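
To illustrate what `--num_segments 8` typically means in TSN/TSM-style pipelines, here is a rough sketch of sparse segment sampling. It assumes the video is split into equal-length chunks and one frame is taken from the centre of each; the repository may sample differently (for example, with random offsets during training). The function name `sample_segment_indices` is hypothetical.

```python
def sample_segment_indices(num_frames, num_segments=8):
    """Return one frame index per segment, taken near each segment's centre."""
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    tick = num_frames / num_segments  # segment length in frames (may be fractional)
    # Clamp to the last valid index; short videos simply repeat frames.
    return [
        min(int(tick / 2 + tick * i), num_frames - 1)
        for i in range(num_segments)
    ]


indices_long = sample_segment_indices(80, num_segments=8)
indices_short = sample_segment_indices(4, num_segments=8)
```

Under this scheme the model sees a fixed number of frames per clip regardless of video length, which keeps the per-video cost constant.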

### Eval

```console
python generate_submission.py elderly --arch=resnext50_32x4d --csv_file=submission.csv --weights=checkpoint/TSM_elderly_RGB_resnext50_32x4d_shift8_blockres_avg_segment8_e100/ckpt.best.pth.tar --test_segments=8 --batch_size=1 --test_crops=1
```
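
As a rough illustration of how per-segment outputs can be combined into one video-level prediction, the sketch below averages class scores across segments, which is the usual meaning of an `avg` consensus. This is an assumption about the behaviour, not code from the repository, and the helper names `average_consensus` and `predict_class` are hypothetical.

```python
def average_consensus(segment_scores):
    """Average class scores over segments.

    segment_scores: list of per-segment score lists, one score per class.
    """
    num_segments = len(segment_scores)
    num_classes = len(segment_scores[0])
    return [
        sum(scores[k] for scores in segment_scores) / num_segments
        for k in range(num_classes)
    ]


def predict_class(segment_scores):
    """Return the index of the class with the highest averaged score."""
    video_scores = average_consensus(segment_scores)
    return max(range(len(video_scores)), key=video_scores.__getitem__)


# Two segments, three classes (toy numbers).
toy_scores = [[1.0, 3.0, 0.0], [2.0, 1.0, 0.0]]
video_level = average_consensus(toy_scores)
predicted = predict_class(toy_scores)
```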