reproducing: "Ensemble everything everywhere: Multi-scale aggregation for adversarial robustness" (https://arxiv.org/abs/2408.05446)
source code and usage examples: https://github.com/ETH-DISCO/self-ensembling
architecture based on Torchvision's default ResNet152 implementation
hyperparameters:
- criterion: torch.nn.CrossEntropyLoss()
- optimizer: torch.optim.AdamW
- scaler: GradScaler
- datasets: ["cifar10", "cifar100"]
- lr: 0.0001
- num_epochs: 16 (higher would be even better, but maybe by <1%)
- crossmax_k: 2 (the difference between crossmax_k=2 and crossmax_k=3 is about 1-2%, so it's not a big deal)
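
To make the role of crossmax_k concrete, here is a minimal sketch of CrossMax aggregation for a single sample, following the description in the paper: normalize each predictor's logits by its own maximum, normalize each class by its maximum across predictors, then keep the k-th highest value per class. It uses plain Python lists so it runs without dependencies. The function name `crossmax`, the list-based input layout, and the 1-indexed meaning of `k` are assumptions made for illustration; the repository's implementation operates on tensors and may differ in detail.

```python
def crossmax(logits, k=2):
    """Aggregate per-predictor logits for one sample (illustrative sketch).

    logits: list of N rows (one per predictor), each with C class logits.
    k: 1-indexed rank; the k-th highest normalized logit per class is kept
       (assumed to correspond to the crossmax_k hyperparameter).
    """
    n_predictors = len(logits)
    if not 1 <= k <= n_predictors:
        raise ValueError("k must be between 1 and the number of predictors")
    n_classes = len(logits[0])

    # Step 1: subtract each predictor's own maximum over classes,
    # so no single predictor dominates through sheer logit scale.
    step1 = [[z - max(row) for z in row] for row in logits]

    # Step 2: subtract each class's maximum over predictors,
    # so no single class dominates.
    class_max = [max(row[c] for row in step1) for c in range(n_classes)]
    step2 = [[row[c] - class_max[c] for c in range(n_classes)] for row in step1]

    # Step 3: per class, keep the k-th highest value across predictors.
    return [
        sorted((row[c] for row in step2), reverse=True)[k - 1]
        for c in range(n_classes)
    ]


# Three hypothetical predictors, three classes.
example_logits = [
    [2.0, 1.0, 0.0],
    [0.0, 3.0, 1.0],
    [1.0, 0.0, 5.0],
]
aggregated = crossmax(example_logits, k=2)
print(aggregated)  # [-3.0, -1.0, -2.0]
predicted_class = max(range(len(aggregated)), key=aggregated.__getitem__)
print(predicted_class)  # 1
```

In this sketch, k=1 always returns all zeros, because step 2 sets the top value of every class to zero. An informative aggregate therefore needs k of at least 2.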