MoDE_CALVIN_ABCD / mode_calvin_abcd.log
[2024-12-16 22:55:38,135][mode.evaluation.multistep_sequences][INFO] - Start generating evaluation sequences.
[2024-12-16 22:55:50,491][mode.evaluation.multistep_sequences][INFO] - Done generating evaluation sequences.
[2024-12-16 22:55:51,214][mode.models.mode_agent][INFO] - Precomputing experts with sampling steps 5
1/5 : 97.1% | 2/5 : 93.4% | 3/5 : 89.9% | 4/5 : 86.2% | 5/5 : 82.4% | Average: 4.5 |: 100%|███| 1000/1000 [2:17:17<00:00, 8.24s/it]
Results for Epoch 0:
Average successful sequence length: 4.49
Success rates for i instructions in a row:
1: 97.1%
2: 93.4%
3: 89.9%
4: 86.2%
5: 82.4%
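
The average successful sequence length reported above agrees with the sum of the five chained success rates, since the expected length of a run equals the sum over i of the probability that at least i instructions succeed. The following is a minimal sketch of that arithmetic, not the evaluation code that produced this log; the names `chained_success` and `average_sequence_length` are hypothetical and the rates are copied by hand from the lines above.

```python
# Chained success rates copied by hand from the log above: the fraction of
# the 1000 evaluation sequences that completed at least i instructions in a row.
chained_success = {1: 0.971, 2: 0.934, 3: 0.899, 4: 0.862, 5: 0.824}


def average_sequence_length(rates):
    """Expected number of consecutively completed instructions.

    Uses the identity E[L] = sum_i P(L >= i); `rates` maps i to P(L >= i).
    Hypothetical helper for illustration only.
    """
    return sum(rates.values())


avg_len = average_sequence_length(chained_success)
print(f"Average successful sequence length: {avg_len:.2f}")
```

The progress bar shows the same quantity rounded to one decimal place (4.5).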
rotate_blue_block_right: 75 / 78 | SR: 96.2%
move_slider_right: 289 / 289 | SR: 100.0%
lift_red_block_slider: 136 / 141 | SR: 96.5%
place_in_slider: 361 / 369 | SR: 97.8%
turn_off_lightbulb: 155 / 156 | SR: 99.4%
turn_off_led: 176 / 176 | SR: 100.0%
push_into_drawer: 99 / 125 | SR: 79.2%
lift_blue_block_drawer: 17 / 18 | SR: 94.4%
close_drawer: 219 / 219 | SR: 100.0%
lift_pink_block_slider: 139 / 143 | SR: 97.2%
open_drawer: 366 / 366 | SR: 100.0%
rotate_red_block_right: 74 / 75 | SR: 98.7%
lift_red_block_table: 181 / 182 | SR: 99.5%
lift_pink_block_table: 175 / 178 | SR: 98.3%
move_slider_left: 263 / 263 | SR: 100.0%
turn_on_lightbulb: 181 / 181 | SR: 100.0%
rotate_blue_block_left: 67 / 67 | SR: 100.0%
push_blue_block_left: 64 / 70 | SR: 91.4%
turn_on_led: 183 / 184 | SR: 99.5%
stack_block: 182 / 206 | SR: 88.3%
push_pink_block_right: 49 / 67 | SR: 73.1%
push_red_block_left: 69 / 79 | SR: 87.3%
lift_blue_block_table: 187 / 188 | SR: 99.5%
place_in_drawer: 188 / 190 | SR: 98.9%
rotate_red_block_left: 66 / 66 | SR: 100.0%
push_pink_block_left: 76 / 77 | SR: 98.7%
push_red_block_right: 47 / 71 | SR: 66.2%
lift_pink_block_drawer: 14 / 15 | SR: 93.3%
rotate_pink_block_right: 66 / 71 | SR: 93.0%
lift_blue_block_slider: 135 / 142 | SR: 95.1%
unstack_block: 67 / 67 | SR: 100.0%
rotate_pink_block_left: 57 / 57 | SR: 100.0%
push_blue_block_right: 49 / 72 | SR: 68.1%
lift_red_block_drawer: 18 / 18 | SR: 100.0%
Best model: epoch 0 with average sequences length of 4.49
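
To rank tasks by success rate, the per-task lines can be parsed with a regular expression. This is a sketch that assumes every task line follows the `name: successes / attempts | SR: xx.x%` layout seen above; `TASK_LINE`, `parse_task_line`, and the `sample` list are hypothetical names introduced here, and the sample lines are copied from this log.

```python
import re

# Assumed layout of a per-task line: "<task>: <successes> / <attempts> | SR: <rate>%"
TASK_LINE = re.compile(r"^(\w+): (\d+) / (\d+) \| SR: ([\d.]+)%$")


def parse_task_line(line):
    """Return (task, successes, attempts, logged_rate) or None if no match."""
    match = TASK_LINE.match(line.strip())
    if match is None:
        return None
    task, successes, attempts, rate = match.groups()
    return task, int(successes), int(attempts), float(rate)


# A few lines copied from the log above, plus one non-task line.
sample = [
    "push_red_block_right: 47 / 71 | SR: 66.2%",
    "push_blue_block_right: 49 / 72 | SR: 68.1%",
    "move_slider_right: 289 / 289 | SR: 100.0%",
    "Results for Epoch 0:",
]

# Keep only the lines that parse as task results.
parsed = [row for row in map(parse_task_line, sample) if row is not None]

# Sort ascending by the rate recomputed from the raw counts.
ranked = sorted(parsed, key=lambda row: row[1] / row[2])

for task, successes, attempts, logged_rate in ranked:
    recomputed = 100.0 * successes / attempts
    print(f"{task}: {recomputed:.1f}% (logged {logged_rate:.1f}%)")
```

Recomputing each rate from the raw counts is a quick consistency check on the logged percentages.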