Commit History
22ae21a  Add KTO support (#1640)
81da7d2  Fix `total_num_steps` (#1566)
5294653  PoSE context length ext (#1567)
7d1d22f  ORPO Trainer replacement (#1551)
132eb74  DBRX Model Support (#1462)
da9b1a3  use locale-agnostic separator to make large nums easier to read (#1503)
5aa5097  Pretrain multipack v2 (#1470)
02af082  Jamba (#1451)
da265dd  fix for accelerate env var for auto bf16, add new base image and expand torch_cuda_arch_list support (#1413)
bcdc9b1  Fix falcon tokenization step (#1441) [skip ci]
2a1589f  strip out hacky qlora-fsdp workarounds now that qlora-fsdp fixes are upstreamed (#1428)
0f985e1  more fixes 20240228 (#1342) [skip ci]
00568c1  support for true batches with multipack (#1230)
d85d494  report min length of tokenized data (#1186) [skip ci]
814aee6  Phi2 multipack (#1173)
6840381  Add desc to map/filter (#1162)
e799e08  Falcon embeddings (#1149) [skip docker]
32580c1  Vram fix attempt (#1164) [skip ci]
2ce5c0d  Deprecate max packed sequence len (#1141)
6910e6a  Multipack simplify for Mixtral (#1142)
7570446  Preprocess dataset size fix (#1131)
2f2582e  additional logging to get maximum token length of a sequence in the dataset (#1066) [skip ci]
553c80f  streaming multipack for pretraining dataset (#959)
f243c21  RL/DPO (#935)
5ea3aa3  Fix Deepspeed loading (#950)
40a6362  support for mamba (#915)
797f3dd  don't train if eval split is too small (#873)
1470650  various bugfixes (#856)
641e6f7  multipack w batch sampler (#795)
b2430ce  use accelerate logging for zero/main logging only
4c834bf  cleanup verbosity a bit
05bd6f1  Threaded MultipackDistributedDataloader with prefetched samples (#759)
6c81c61  refactor setup trainer so we can add more hooks (#773)
3553172  fixes for alpaca w chatml, and don't include attention_mask w mistral for flash attention (#728)
490923f  Save Axolotl config as WandB artifact (#716)