Commit History
Merge branch 'main' of https://huggingface.co/flax-community/bertin-roberta-large-spanish into main
4cad39c
Adding pad_to_multiple_of=16
986ff4e
Merge branch 'main' of https://huggingface.co/flax-community/bertin-roberta-large-spanish into main
037bc96
Model at 182k steps, mlm acc 0.6494
73f6af5
Changed print to logger
1c5d797
Preparing code for final runs
ea0132b
Improved version of conversion script Flax → PyTorch
346a10a
Fixed widget example
3f4b8d4
Fix config for checkpoint
3950061
Changed and added vocab and tokenizer
29e26bb
Merge branch 'main' of https://huggingface.co/flax-community/bertin-roberta-large-spanish into main
61f6971
New Flax model
300e533
Fixes to mc4 fork
8bd9e95
Fixes treatment of jsonl
7b22f12
Fix format for filepaths
7d6bbb2
Merge branch 'main' of https://huggingface.co/flax-community/bertin-roberta-large-spanish into main
13757a8
Adding streaming reads of files from local disk
4e4228c
Base model at 105k steps
f7ba030
Update .gitattributes
b020d07
Fixes and defaults
a5b19d7
Adding Numpy random number generator
f562f06
Merge branch 'main' of https://huggingface.co/flax-community/bertin-roberta-large-spanish into main
f965ae3
Adding random sampling
60b6f6b
Adding config and models for the hub widget
d75240e
Adding missing import
79555ba
Adding base config and organizing configs
9c5541b
Merge branch 'main' of https://huggingface.co/flax-community/bertin-roberta-large-spanish into main
36b7dde
Adding sampling to mc4
3f09f56
Merge branch 'main' of https://huggingface.co/flax-community/bertin-roberta-large-spanish into main
9072c50
New tokenizer
eb4e77c
Adjust batch size for extracting tokens
8b9ba87
Scripts for perplexity sampling and fixes
853cd83
Remove unused imports
d5cede4
Merge branch 'main' of https://huggingface.co/flax-community/bertin-roberta-large-spanish into main
840171b
Add script to generate dataset of embeddings and perplexities. Add script to generate t-SNE plot for embedding and perplexity visualization.
a81e575
Adding correct models 10k steps
fe7ff35
Updating run script
a1f93c9
Adding checkpointing, wandb, and new mlm script
d988382
Epoch 1 Flax model
48f8c78
Changed batch size
a95f7b8
Changed execution mode
40f69ff
Initial test with BETO's corpus
2835721
:sparkles: Added test_script and a folder for scripts
2a963f0
Committed by Pablo
:see_no_evil: Added .gitignore file
de633ab
Committed by Pablo