Which 46 Languages
Hi @Robertl,
Please find the full list below! You can see it by clicking on the "46 languages" widget above.
Thanks a lot!
I agree it is not clear enough. I proposed a PR here: https://github.com/huggingface/transformers/pull/18645 to add the full, detailed list of the languages the model was trained on.
Actually you can also find the full list here: https://huggingface.co/bigscience/bloom#languages !
Why was Czech not included in BLOOM's training data? Czech has large corpora and a very active NLP community, which has already published NLP models (e.g., a BERT version).
@cerisara The training corpus was crowdsourced by workshop participants; the final list of languages took shape organically through community hackathons and volunteer efforts.
More info in this thread: https://twitter.com/YJernite/status/1505920454825066496