---
license: cc-by-4.0
pipeline_tag: video-classification
library_name: pytorch
---

# EAR-WACV25-DAKiet-TSM

The model was presented in [this paper](https://huggingface.co/papers/2503.07821).

This model is a Temporal Shift Module (TSM) based video classification model with a `resnext50_32x4d` backbone.

**GitHub Repository:** https://github.com/fdfyaytkt/EAR-WACV25-DAKiet-TSM

## Data

The model was trained on a combination of datasets:

* **Toyota Smarthome dataset:** Used for activity recognition.
* **ETRI-Activity3D:** RGB videos (a subset or the full dataset, depending on the configuration).
* **ETRI-Activity3D-LivingLab:** RGB videos (a subset or the full dataset, depending on the configuration).

Two configurations are detailed below, with their respective public leaderboard scores:

### Config 1 (Public Leaderboard: 0.84402)

* Toyota Smarthome dataset
* ETRI-Activity3D - RGB videos (RGB\_P091-P100)
* ETRI-Activity3D-LivingLab - RGB videos (RGB(P201-P230))

### Config 2 (Public Leaderboard: 0.78856)

* Toyota Smarthome dataset
* ETRI-Activity3D - RGB videos (full)
* ETRI-Activity3D-LivingLab - RGB videos (full)

## Running

Example training and evaluation commands are provided below. Refer to the repository for complete details and options.

### Train

```console
python main.py elderly RGB --arch resnext50_32x4d --num_segments 8 --gd 20 --lr 0.001 --wd 1e-4 --lr_steps 20 40 --epochs 100 --batch-size 4 -j 32 --dropout 0.5 --consensus_type=avg --eval-freq=1 --shift --shift_div=8 --shift_place=blockres --npb
```

### Eval

```console
python generate_submission.py elderly --arch=resnext50_32x4d --csv_file=submission.csv --weights=checkpoint/TSM_elderly_RGB_resnext50_32x4d_shift8_blockres_avg_segment8_e100/ckpt.best.pth.tar --test_segments=8 --batch_size=1 --test_crops=1
```
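
## Temporal Shift Sketch

The `--shift`, `--shift_div=8`, and `--num_segments 8` options in the training command appear to follow the conventions of the original TSM codebase, where a fraction of the feature channels is exchanged between neighbouring frames so that a 2D backbone can mix temporal information. The snippet below is a minimal NumPy sketch of that idea under those assumptions; it is not taken from this repository, which uses PyTorch. The function name `temporal_shift` and the toy input are hypothetical and exist only for illustration.

```python
import numpy as np


def temporal_shift(x, num_segments, shift_div=8):
    """Illustrative temporal shift (hypothetical helper, not the repo's code).

    x has shape (batch * num_segments, channels, height, width).
    The first 1/shift_div of the channels is taken from the next frame,
    the second 1/shift_div from the previous frame, and the remaining
    channels are left unchanged. Frames without a neighbour get zeros.
    """
    nt, c, h, w = x.shape
    batch = nt // num_segments
    x = x.reshape(batch, num_segments, c, h, w)

    fold = c // shift_div
    out = np.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                  # pull from the next frame
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]  # pull from the previous frame
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]             # remaining channels unchanged
    return out.reshape(nt, c, h, w)


# Toy input: one clip of 8 frames, 16 channels, 1x1 spatial size.
# The value at (frame t, channel ch) is t * 16 + ch, so shifts are easy to trace.
frames = np.arange(8 * 16, dtype=np.float32).reshape(8, 16, 1, 1)
shifted = temporal_shift(frames, num_segments=8, shift_div=8)
```

With 16 channels and `shift_div=8`, two channels are shifted in each direction and the other twelve pass through untouched, which is why the operation adds no parameters.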