ClassCat committed on
Commit
c9e43d7
1 Parent(s): 20e79bb

Create README.md

  1. README.md +33 -0
README.md ADDED
@@ -0,0 +1,33 @@
+ ---
+ language: es
+ license: cc-by-sa-4.0
+ datasets:
+ - wikipedia
+ - cc100
+ widget:
+ - text: Yo soy <mask>.
+ ---
+
+ ## RoBERTa Spanish base model
+
+ ### Model architecture
+
+ This model uses the RoBERTa base settings, except for the vocabulary size.
+
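To make the effect of the vocabulary size concrete, the sketch below estimates how the token-embedding matrix scales with it. It assumes the standard RoBERTa base hidden size of 768 and the original RoBERTa vocabulary of 50,265 tokens; the 50,000 figure is the vocabulary size stated in this README. The function name `embedding_params` is hypothetical, and the calculation ignores every other parameter in the model.

```python
# Rough sketch: token-embedding weights as a function of vocabulary size.
# Assumption: hidden size 768 (standard RoBERTa base), not confirmed by this README.
HIDDEN_SIZE = 768


def embedding_params(vocab_size: int, hidden_size: int = HIDDEN_SIZE) -> int:
    """Number of weights in a vocab_size x hidden_size embedding matrix."""
    return vocab_size * hidden_size


original = embedding_params(50_265)  # original RoBERTa base vocabulary (assumption)
spanish = embedding_params(50_000)   # vocabulary size stated in this README
print(original - spanish)            # difference in embedding weights
```

Under these assumptions, the smaller vocabulary changes only the size of the embedding table; the rest of the architecture is unaffected.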
+ ### Tokenizer
+
+ This model uses a BPE tokenizer with a vocabulary size of 50,000.
+
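For readers unfamiliar with BPE, the following toy sketch shows the core idea: repeatedly merge the most frequent adjacent pair of symbols. It is a character-level illustration only. The README does not say how the actual tokenizer was trained, and all function names here (`most_frequent_pair`, `merge_pair`, `learn_bpe`) are hypothetical.

```python
from collections import Counter


def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-separated corpus."""
    words = dict(Counter(tuple(w) for w in corpus.split()))
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:  # every word is already a single symbol
            break
        words = merge_pair(words, pair)
        merges.append(pair)
    return merges


print(learn_bpe("hola hola hola sol", 1))
```

A real tokenizer would run this kind of loop until the vocabulary reaches its target size, which in this model's case is 50,000 entries.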
+ ### Training Data
+
+ * [wiki40b/es](https://www.tensorflow.org/datasets/catalog/wiki40b#wiki40bes) (Spanish Wikipedia)
+ * Subset of [CC-100/es](https://data.statmt.org/cc-100/): monolingual datasets from web crawl data
+
+ ### Usage
+
+ ```python
+ from transformers import pipeline
+
+ # Load a fill-mask pipeline backed by this model
+ unmasker = pipeline('fill-mask', model='ClassCat/roberta-base-spanish')
+ # Predict candidate tokens for the <mask> position
+ unmasker("Yo soy <mask>.")
+ ```
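The fill-mask pipeline returns a list of dictionaries for a single-mask input, each with `score`, `token`, `token_str`, and `sequence` keys. As a minimal sketch of how that output might be summarized, the helper below keeps the highest-scoring candidates. The names `top_predictions` and `demo` are hypothetical and not part of the model card; `demo` downloads the model when called.

```python
def top_predictions(results, k=3):
    """Return the k highest-scoring (token, score) pairs from fill-mask output."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [(r["token_str"].strip(), round(r["score"], 4)) for r in ranked[:k]]


def demo():
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model="ClassCat/roberta-base-spanish")
    for token, score in top_predictions(unmasker("Yo soy <mask>.")):
        print(f"{token}\t{score}")
```

Calling `demo()` prints one candidate token and its score per line; no specific predictions are shown here because they depend on the model weights.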