hf (dtype=bfloat16,use_cache=True,pretrained=./checkpoint-1400/,max_length=2048), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 16

|                Tasks                |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|-------------------------------------|-------|------|-----:|--------|---|-----:|---|-----:|
|leaderboard_gpqa                     |    N/A|      |      |        |   |      |   |      |
| - leaderboard_gpqa_diamond          |      1|none  |     0|acc_norm|↑  |0.3030|±  |0.0327|
| - leaderboard_gpqa_extended         |      1|none  |     0|acc_norm|↑  |0.3004|±  |0.0196|
| - leaderboard_gpqa_main             |      1|none  |     0|acc_norm|↑  |0.2969|±  |0.0216|
|leaderboard_musr                     |    N/A|      |      |        |   |      |   |      |
| - leaderboard_musr_murder_mysteries |      1|none  |     0|acc_norm|↑  |0.5400|±  |0.0316|
| - leaderboard_musr_object_placements|      1|none  |     0|acc_norm|↑  |0.3203|±  |0.0292|
| - leaderboard_musr_team_allocation  |      1|none  |     0|acc_norm|↑  |0.4080|±  |0.0311|

hf (dtype=bfloat16,use_cache=True,pretrained=./checkpoint-1400/,max_length=768), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 128

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.5974|±  |0.0135|
|     |       |strict-match    |     5|exact_match|↑  |0.5921|±  |0.0135|
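
The Value and Stderr columns can be read together as a rough interval for each score. The sketch below is not part of the evaluation output. It assumes a normal approximation (z = 1.96 for about 95% coverage) and uses a hypothetical helper named `confidence_interval`; the numbers are copied from the reported results.

```python
# Reported (value, stderr) pairs copied from the results above.
results = {
    "leaderboard_gpqa_diamond": (0.3030, 0.0327),
    "leaderboard_gpqa_extended": (0.3004, 0.0196),
    "leaderboard_gpqa_main": (0.2969, 0.0216),
    "leaderboard_musr_murder_mysteries": (0.5400, 0.0316),
    "leaderboard_musr_object_placements": (0.3203, 0.0292),
    "leaderboard_musr_team_allocation": (0.4080, 0.0311),
    "gsm8k (flexible-extract)": (0.5974, 0.0135),
    "gsm8k (strict-match)": (0.5921, 0.0135),
}

Z_95 = 1.96  # assumption: normal approximation, two-sided ~95% interval


def confidence_interval(value, stderr, z=Z_95):
    """Return (lower, upper) as value -/+ z * stderr."""
    half_width = z * stderr
    return value - half_width, value + half_width


for task, (value, stderr) in results.items():
    lower, upper = confidence_interval(value, stderr)
    print(f"{task:36s} {value:.4f}  [{lower:.4f}, {upper:.4f}]")
```

Wide intervals, such as the one for `leaderboard_gpqa_diamond`, suggest that small differences between checkpoints on that task may not be meaningful.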