Commit 35f64bc by ClassCat
1 parent: 9f49c6d

Create README.md

Files changed (1)
  1. README.md +38 -0
README.md ADDED
@@ -0,0 +1,38 @@
+ ---
+ language: fr
+ license: cc-by-sa-4.0
+ datasets:
+ - wikipedia
+ - cc100
+ widget:
+ - text: "Je m'appelle <mask>."
+ - text: "Je vais à <mask>."
+ ---
+
+ ## RoBERTa French base model (Uncased)
+
+ ### Prerequisites
+
+ transformers==4.19.2
+
+ ### Model architecture
+
+ This model uses the RoBERTa base settings, except for the vocabulary size.
+
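To make the architecture note concrete, here is a minimal sketch of what changing only the vocabulary size means. The default values are the published RoBERTa base hyperparameters (12 layers, hidden size 768, 12 attention heads, vocabulary size 50,265), not values read from this model's configuration file. The `RobertaBaseSettings` class and its field names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RobertaBaseSettings:
    """Hypothetical container for RoBERTa base hyperparameters (illustration only)."""

    vocab_size: int = 50265  # vocabulary size of the original English RoBERTa base
    hidden_size: int = 768
    num_hidden_layers: int = 12
    num_attention_heads: int = 12
    intermediate_size: int = 3072

    def token_embedding_params(self) -> int:
        # The token embedding matrix has one row of length hidden_size per vocabulary entry.
        return self.vocab_size * self.hidden_size


original = RobertaBaseSettings()
# Assumption: only the vocabulary size differs, using the 50,000 figure from the Tokenizer section.
french = replace(original, vocab_size=50_000)

print(french)
print("token embedding parameters:", french.token_embedding_params())
```

Under these assumptions, the only part of the network whose shape changes is the token embedding matrix (and any output layer tied to it); the encoder layers keep the same dimensions.
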
+ ### Tokenizer
+
+ The tokenizer is a BPE tokenizer with a vocabulary size of 50,000.
+
+ ### Training Data
+
+ * [wiki40b/fr](https://www.tensorflow.org/datasets/catalog/wiki40b#wiki40bfr) (French Wikipedia)
+ * Subset of [CC-100/fr](https://data.statmt.org/cc-100/): monolingual datasets from web crawl data
+
+ ### Usage
+
+ ```python
+ from transformers import pipeline
+
+ unmasker = pipeline('fill-mask', model='ClassCat/roberta-base-french')
+ unmasker("Je vais à la <mask>.")
+ ```
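
The pipeline call above returns a list of candidate fills. As a hedged sketch of how one might inspect them: in the `transformers` fill-mask pipeline, each candidate is a dict with `score`, `token`, `token_str`, and `sequence` keys. The helpers `top_tokens` and `unmask` are hypothetical names introduced here, and the sample scores are invented placeholders, not real output from this model.

```python
from typing import Dict, List, Tuple


def top_tokens(predictions: List[Dict], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (token, score) pairs from fill-mask output."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"].strip(), round(p["score"], 3)) for p in ranked[:k]]


def unmask(text: str, k: int = 3) -> List[Tuple[str, float]]:
    """Run the model from the README on `text` (downloads weights on first use)."""
    # Imported lazily so that top_tokens can be used without transformers installed.
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model="ClassCat/roberta-base-french")
    return top_tokens(unmasker(text, top_k=k), k)


# Placeholder predictions containing only the keys the helper reads.
# The words and scores are invented for illustration.
sample = [
    {"score": 0.12, "token_str": " maison"},
    {"score": 0.31, "token_str": " plage"},
    {"score": 0.05, "token_str": " gare"},
]
print(top_tokens(sample, k=2))
```

With network access and `transformers` installed, `unmask("Je vais à la <mask>.")` would apply the same ranking to real predictions.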