Any plans to use RMSNorm (or FlashNorm) instead of LayerNorm?
1
#12 opened 6 months ago by graefics
Lack of digit splitting in the slow version of the tokenizer
#11 opened 10 months ago by Forence

Adding Evaluation Results
#10 opened 12 months ago by leaderboard-pr-bot

Big difference in downstream task results between the before-cooldown-ckpt and the final checkpoint?
1
#9 opened about 1 year ago by siqi-zz
Adding Evaluation Results
#8 opened about 1 year ago by leaderboard-pr-bot

Will there be a version with Traditional Chinese in the future?
#5 opened about 1 year ago by win10

Training config link is broken
11
#3 opened about 1 year ago by davidgortega
