PyTorch
Serbian
Croatian
xlm-roberta
procesaur committed on
Commit
5256358
1 Parent(s): 61dcf3c

Update README.md

  1. README.md +195 -0
README.md CHANGED
@@ -13,3 +13,198 @@ language:
13
  base_model:
14
  - FacebookAI/xlm-roberta-large
15
  ---
16
+
17
+ <table style="width:100%;height:100%">
18
+ <tr>
19
+ <td colspan=2>
20
+ <h4><i class="highlight-container"><b class="highlight">TeslaXLM</b></i></h4>
21
+ </td>
22
+ </tr>
23
+ <tr style="width:100%;height:100%">
24
+ <td width=50%>
25
+ <p>Вишејезични модел, 561 милион параметара</p>
26
+ <p>Обучаван над корпусима српског и српскохрватског језика - 20 милијарди речи</p>
27
+ <p>Једнака подршка уноса на ћирилици и латиници!</p>
28
+ </td>
29
+ <td>
30
+ <p>Multilingual model, 561 million parameters</p>
31
+ <p>Trained on Serbian and Serbo-Croatian corpora - 20 billion words</p>
32
+ <p>Equal support for Cyrillic and Latin input!</p>
33
+ </td>
34
+ </tr>
35
+ </table>
36
+
+ ```python
+ >>> from transformers import pipeline
+ >>> # Fill-mask pipeline: predicts the token hidden behind <mask>
+ >>> unmasker = pipeline('fill-mask', model='te-sla/teslaXLM')
+ >>> unmasker("Kada bi čovek znao gde će pasti on bi<mask>.")
+ ```
42
+
+ ```python
+ >>> from transformers import AutoTokenizer, AutoModelForMaskedLM
+ >>> from torch import LongTensor, no_grad
+ >>> from scipy import spatial
+ >>> tokenizer = AutoTokenizer.from_pretrained('te-sla/teslaXLM')
+ >>> model = AutoModelForMaskedLM.from_pretrained('te-sla/teslaXLM', output_hidden_states=True)
+ >>> x = " pas"
+ >>> y = " mačka"
+ >>> z = " svemir"
+ >>> # Encode each word without special tokens and add a batch dimension
+ >>> tensor_x = LongTensor(tokenizer.encode(x, add_special_tokens=False)).unsqueeze(0)
+ >>> tensor_y = LongTensor(tokenizer.encode(y, add_special_tokens=False)).unsqueeze(0)
+ >>> tensor_z = LongTensor(tokenizer.encode(z, add_special_tokens=False)).unsqueeze(0)
+ >>> model.eval()
+ >>> with no_grad():
+ ...     # Last hidden layer, averaged over sub-word tokens so that each word
+ ...     # yields a single 1-D vector even if it is split into several tokens
+ ...     vektor_x = model(input_ids=tensor_x).hidden_states[-1].squeeze(0).mean(dim=0)
+ ...     vektor_y = model(input_ids=tensor_y).hidden_states[-1].squeeze(0).mean(dim=0)
+ ...     vektor_z = model(input_ids=tensor_z).hidden_states[-1].squeeze(0).mean(dim=0)
+ ...
+ >>> # Cosine distance: a lower value means the two vectors are more similar
+ >>> print(spatial.distance.cosine(vektor_x, vektor_y))
+ >>> print(spatial.distance.cosine(vektor_x, vektor_z))
+ ```
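
The two values printed by the embedding example are cosine distances, that is, 1 minus the cosine similarity, so a smaller number means the two vectors point in a more similar direction. The following is a minimal pure-Python sketch of that quantity. The helper name `cosine_distance` and the toy vectors are assumptions made for illustration; they are not part of the model card, and this is not SciPy's implementation.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(angle) between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


# Toy vectors, made up for illustration only (not model outputs)
same_direction = cosine_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # close to 0
orthogonal = cosine_distance([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])      # unrelated directions
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])                 # largest possible distance

print(same_direction, orthogonal, opposite)
```

Read this way, one would expect the distance between `pas` (dog) and `mačka` (cat) in the example above to come out smaller than the distance between `pas` and `svemir` (space), although the actual values depend on the model.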
63
+
64
+ <div class="inline-flex flex-col" style="line-height: 1.5;padding-right:50px">
65
+ <div style="text-align: center; margin-top: 3px; font-size: 16px; font-weight: 800">Author</div>
66
+ <a href="https://huggingface.co/procesaur">
67
+ <div class="flex">
68
+ <div
69
+ style="display:DISPLAY_1; margin-left: auto; margin-right: auto; width: 92px; height:92px; border-radius: 50%;
70
+ background-size: cover; background-image: url(&#39;https://cdn-uploads.huggingface.co/production/uploads/1673534533167-63bc254fb8c61b8aa496a39b.jpeg?w=200&h=200&f=face&#39;)">
71
+ </div>
72
+ </div>
73
+ </a>
74
+ <div style="text-align: center; font-size: 16px; font-weight: 800">Mihailo Škorić</div>
75
+ <div>
76
+ <a href="https://huggingface.co/procesaur">
77
+ <div style="text-align: center; font-size: 14px;">@procesaur</div>
78
+ </a>
79
+ </div>
80
+ </div>
81
+ </div>
82
+
83
+ <div class="inline-flex flex-col" style="line-height: 1.5;padding-right:50px">
84
+ <div style="text-align: center; margin-top: 3px; font-size: 16px; font-weight: 800">Author</div>
85
+ <a href="https://huggingface.co/tanor">
86
+ <div class="flex">
87
+ <div
88
+ style="display:DISPLAY_1; margin-left: auto; margin-right: auto; width: 92px; height:92px; border-radius: 50%;
89
+ background-size: cover; background-image: url('https://cdn-avatars.huggingface.co/v1/production/uploads/6409d3d71ee054d66a673701/KTOOnCRS9NhpAMZIvLlU7.png?w=200&h=200&f=face')">
90
+ </div>
91
+ </div>
92
+ </a>
93
+ <div style="text-align: center; font-size: 16px; font-weight: 800">Saša Petalinkar</div>
94
+ <div>
95
+ <a href="https://huggingface.co/tanor">
96
+ <div style="text-align: center; font-size: 14px;">@tanor</div>
97
+ </a>
98
+ </div>
99
+ </div>
100
+ </div>
101
+
102
+ <div class="inline-flex flex-col" style="line-height: 1.5;">
103
+ <div style="text-align: center; margin-top: 3px; font-size: 16px; font-weight: 800">Computation</div>
104
+ <a href="https://tesla.rgf.bg.ac.rs">
105
+ <div class="flex">
106
+ <div
107
+ style="display:DISPLAY_1; margin-left: auto; margin-right: auto; width: 92px; height:92px; border-radius: 50%;
108
+ background-size: cover; background-image: url(https://cdn-avatars.huggingface.co/v1/production/uploads/63bc254fb8c61b8aa496a39b/TfM_-sc8-b34ddfhHBGTA.png?w=200&h=200&f=face)">
109
+ </div>
110
+ </div>
111
+ </a>
112
+ <div style="text-align: center; font-size: 16px; font-weight: 800">TESLA project</div>
113
+ <div>
114
+ <a href="https://huggingface.co/te-sla">
115
+ <div style="text-align: center; font-size: 14px;">@te-sla</div>
116
+ </a>
117
+ </div>
118
+ </div>
119
+ </div>
120
+ <br/><br/>
121
+ <div id="zastava">
122
+ <div class="grb">
123
+ <img src="https://www.ai.gov.rs/img/logo_60x120-2.png" style="position:relative; left:30px; z-index:10; height:85px">
124
+ </div>
125
+ <table width=100% style="border:0px">
126
+ <tr style="background-color:#C6363C;width:100%;border:0px;height:30px"><td style="width:100vw"></td></tr>
127
+ <tr style="background-color:#0C4076;width:100%;border:0px;height:30px"><td></td></tr>
128
+ <tr style="background-color:#ffffff;width:100%;border:0px;height:30px"><td></td></tr>
129
+ </table>
130
+ </div>
131
+
132
+ <table style="width:100%;height:100%">
133
+ <tr style="width:100%;height:100%">
134
+ <td width=50%>
135
+ <p>Истраживање је спроведено уз подршку Фонда за науку Републике Србије, #7276, Text Embeddings – Serbian Language Applications – TESLA</p>
136
+ </td>
137
+ <td>
138
+ <p>This research was supported by the Science Fund of the Republic of Serbia, #7276, Text Embeddings - Serbian Language Applications - TESLA</p>
139
+ </td>
140
+ </tr>
141
+ </table>
142
+
143
+
144
+
145
+ <style>
146
+ .ffeat {
147
+ color:red
148
+ }
149
+
150
+ .cover {
151
+ width: 100%;
152
+ margin-bottom: 5pt
153
+ }
154
+
155
+ .highlight-container, .highlight {
156
+ position: relative;
157
+ text-decoration:none
158
+ }
159
+
160
+ .highlight-container {
161
+ display: inline-block;
162
+
163
+ }
164
+
165
+ .highlight{
166
+ color:white;
167
+ text-transform:uppercase;
168
+ font-size: 16pt;
169
+ }
170
+
171
+ .highlight-container{
172
+ padding:5px 10px
173
+ }
174
+
175
+ .highlight-container:before {
176
+ content: " ";
177
+ display: block;
178
+ height: 100%;
179
+ width: 100%;
180
+ margin-left: 0px;
181
+ margin-right: 0px;
182
+ position: absolute;
183
+ background: #e80909;
184
+ transform: rotate(2deg);
185
+ top: -1px;
186
+ left: -1px;
187
+ border-radius: 20% 25% 20% 24%;
188
+ padding: 10px 18px 18px 10px;
189
+ }
190
+
191
+ div.grb, #zastava>table {
192
+ position:absolute;
193
+ top:0px;
194
+ left: 0px;
195
+ margin:0px
196
+ }
197
+
198
+ div.grb>img, #zastava>table{
199
+ margin:0px
200
+ }
201
+
202
+ #zastava {
203
+ position: relative;
204
+ margin-bottom:120px
205
+ }
206
+
207
+ p {
208
+ font-size:14pt
209
+ }
210
+ </style>