Commit 23699fd by AngelPanizo · verified · 1 parent: e54e4b1

Add BERTopic model

README.md ADDED
@@ -0,0 +1,72 @@
+
+ ---
+ tags:
+ - bertopic
+ library_name: bertopic
+ pipeline_tag: text-classification
+ ---
+
+ # MARTINI_enrich_BERTopic_mikesdigest
+
+ This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
+ BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
+
+ ## Usage
+
+ To use this model, please install BERTopic:
+
+ ```
+ pip install -U bertopic
+ ```
+
+ You can use the model as follows:
+
+ ```python
+ from bertopic import BERTopic
+ topic_model = BERTopic.load("AIDA-UPM/MARTINI_enrich_BERTopic_mikesdigest")
+
+ topic_model.get_topic_info()
+ ```
+
+ ## Topic overview
+
+ * Number of topics: 3
+ * Number of training documents: 199
+
+ <details>
+ <summary>Click here for an overview of all topics.</summary>
+
+ | Topic ID | Topic Keywords | Topic Frequency | Label |
+ |----------|----------------|-----------------|-------|
+ | -1 | extraterrestrial - climatarian - plural - ominous - swamp | 53 | -1_extraterrestrial_climatarian_plural_ominous |
+ | 0 | suddenly - every - walked - fear - head | 14 | 0_suddenly_every_walked_fear |
+ | 1 | bigfoot - sightings - ufo - conspiracy - alleged | 132 | 1_bigfoot_sightings_ufo_conspiracy |
+
+ </details>
+
+ ## Training hyperparameters
+
+ * calculate_probabilities: True
+ * language: None
+ * low_memory: False
+ * min_topic_size: 10
+ * n_gram_range: (1, 1)
+ * nr_topics: None
+ * seed_topic_list: None
+ * top_n_words: 10
+ * verbose: False
+ * zeroshot_min_similarity: 0.7
+ * zeroshot_topic_list: None
+
+ ## Framework versions
+
+ * Numpy: 1.26.4
+ * HDBSCAN: 0.8.40
+ * UMAP: 0.5.7
+ * Pandas: 2.2.3
+ * Scikit-Learn: 1.5.2
+ * Sentence-transformers: 3.3.1
+ * Transformers: 4.46.3
+ * Numba: 0.60.0
+ * Plotly: 5.24.1
+ * Python: 3.10.12
config.json ADDED
@@ -0,0 +1,16 @@
+ {
+   "calculate_probabilities": true,
+   "language": null,
+   "low_memory": false,
+   "min_topic_size": 10,
+   "n_gram_range": [
+     1,
+     1
+   ],
+   "nr_topics": null,
+   "seed_topic_list": null,
+   "top_n_words": 10,
+   "verbose": false,
+   "zeroshot_min_similarity": 0.7,
+   "zeroshot_topic_list": null
+ }
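
The keys in `config.json` mirror the training hyperparameters listed in the README. As a rough sketch, the file can be read back into Python keyword arguments; the only conversion needed is `n_gram_range`, which JSON stores as a list but the README shows as a tuple. The helper name `config_to_kwargs` is hypothetical, and passing the result to `BERTopic(**kwargs)` is an assumption about how one might reuse the settings, not something the commit itself does.

```python
import json

# Contents of config.json as added in this commit.
CONFIG_TEXT = """
{
  "calculate_probabilities": true,
  "language": null,
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [1, 1],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": false,
  "zeroshot_min_similarity": 0.7,
  "zeroshot_topic_list": null
}
"""


def config_to_kwargs(text: str) -> dict:
    """Parse the JSON config and restore Python-native types."""
    config = json.loads(text)
    # JSON has no tuple type, so the n-gram range arrives as a two-item list.
    config["n_gram_range"] = tuple(config["n_gram_range"])
    return config


kwargs = config_to_kwargs(CONFIG_TEXT)
```

JSON `null`, `true`, and `false` map directly onto Python `None`, `True`, and `False`, which is why the README and the config file agree apart from capitalisation.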
ctfidf.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e5a8f2c12ae23e3290a3a97153d6676c6ceb63945885a8e848d5356056e99e9b
+ size 130804
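
The three lines above are a Git LFS pointer, not the tensor data itself: the repository stores the spec version, the SHA-256 of the real file, and its size in bytes. A minimal sketch of how such a pointer could be parsed and used to check a downloaded blob follows; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the code handles only the three keys shown here.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into version, hash algorithm, digest, size."""
    # Each pointer line has the form "<key> <value>".
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(blob: bytes, pointer: dict) -> bool:
    """Check a downloaded blob against the size and SHA-256 in the pointer."""
    if pointer["algo"] != "sha256" or len(blob) != pointer["size"]:
        return False
    return hashlib.sha256(blob).hexdigest() == pointer["digest"]


POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:e5a8f2c12ae23e3290a3a97153d6676c6ceb63945885a8e848d5356056e99e9b
size 130804
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
```

Comparing the size first is a cheap way to reject a truncated download before hashing it.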
ctfidf_config.json ADDED
The diff for this file is too large to render. See raw diff
 
topic_embeddings.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f70d67174e665f67e6ac9e9738db2a8a1810231f348c6f785ba81227c9871a9b
+ size 12376
topics.json ADDED
@@ -0,0 +1,301 @@
+ {
+   "topic_representations": {
+     "-1": [
+       [
+         "extraterrestrial",
+         0.5718045234680176
+       ],
+       [
+         "climatarian",
+         0.541415274143219
+       ],
+       [
+         "plural",
+         0.5203798413276672
+       ],
+       [
+         "ominous",
+         0.518060564994812
+       ],
+       [
+         "swamp",
+         0.512150764465332
+       ]
+     ],
+     "0": [
+       [
+         "suddenly",
+         0.5722545981407166
+       ],
+       [
+         "every",
+         0.5308952331542969
+       ],
+       [
+         "walked",
+         0.5263108015060425
+       ],
+       [
+         "fear",
+         0.5016676783561707
+       ],
+       [
+         "head",
+         0.49860572814941406
+       ]
+     ],
+     "1": [
+       [
+         "bigfoot",
+         0.6727191209793091
+       ],
+       [
+         "sightings",
+         0.6131260395050049
+       ],
+       [
+         "ufo",
+         0.585844874382019
+       ],
+       [
+         "conspiracy",
+         0.567592203617096
+       ],
+       [
+         "alleged",
+         0.5327224135398865
+       ]
+     ]
+   },
+   "topics": [
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     1,
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     1,
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     1,
+     0,
+     1,
+     -1,
+     0,
+     -1,
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     1,
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     1,
+     0,
+     1,
+     1,
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     1,
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     0,
+     1,
+     0,
+     0,
+     0,
+     1,
+     0,
+     0,
+     0,
+     0,
+     0,
+     -1,
+     1,
+     0,
+     0,
+     0,
+     1,
+     0,
+     0,
+     1,
+     0,
+     0,
+     1,
+     -1,
+     -1,
+     1,
+     0,
+     0,
+     0,
+     0,
+     1,
+     0,
+     1,
+     0,
+     0,
+     0,
+     1,
+     0,
+     1,
+     1,
+     1,
+     0,
+     0,
+     1,
+     0,
+     0,
+     1,
+     1,
+     0,
+     1,
+     0,
+     0,
+     0,
+     -1,
+     1,
+     0,
+     0,
+     -1,
+     1,
+     0,
+     0,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     0,
+     1,
+     -1,
+     1,
+     1,
+     1,
+     0,
+     0,
+     1,
+     1,
+     1,
+     1,
+     0,
+     0,
+     1,
+     1,
+     1,
+     0,
+     0,
+     0,
+     1,
+     1,
+     0,
+     1,
+     1,
+     -1,
+     0,
+     0,
+     1,
+     0,
+     1,
+     1,
+     0,
+     1,
+     -1,
+     0,
+     -1,
+     -1,
+     0,
+     0,
+     0,
+     0,
+     0,
+     -1,
+     0,
+     0,
+     -1,
+     0,
+     0
+   ],
+   "topic_sizes": {
+     "0": 132,
+     "1": 53,
+     "-1": 14
+   },
+   "topic_mapper": [
+     [
+       -1,
+       -1,
+       -1
+     ],
+     [
+       0,
+       0,
+       0
+     ],
+     [
+       1,
+       1,
+       1
+     ]
+   ],
+   "topic_labels": {
+     "-1": "-1_extraterrestrial_climatarian_plural_ominous",
+     "0": "0_suddenly_every_walked_fear",
+     "1": "1_bigfoot_sightings_ufo_conspiracy"
+   },
+   "custom_labels": null,
+   "_outliers": 1,
+   "topic_aspects": {}
+ }
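
The fields in `topics.json` are related to one another: `topic_sizes` is the per-topic count of the entries in `topics`, and each value in `topic_labels` here reads as the topic ID followed by the first four keywords of that topic's representation. The sketch below recomputes both from the raw data as a consistency check. The function names are hypothetical, the label rule is an observation from this file rather than a statement about BERTopic's internals, and the sample uses a short made-up `topics` list instead of the full 199 entries.

```python
from collections import Counter


def topic_sizes(topics: list[int]) -> dict[int, int]:
    """Count how many documents were assigned to each topic id."""
    return dict(Counter(topics))


def default_label(topic_id: int, representation: list, n_words: int = 4) -> str:
    """Join the topic id and its first n_words keywords with underscores."""
    words = [word for word, _score in representation[:n_words]]
    return "_".join([str(topic_id), *words])


# Representation of topic 1 as stored in topics.json.
topic_1 = [
    ["bigfoot", 0.6727191209793091],
    ["sightings", 0.6131260395050049],
    ["ufo", 0.585844874382019],
    ["conspiracy", 0.567592203617096],
    ["alleged", 0.5327224135398865],
]

label = default_label(1, topic_1)
sizes = topic_sizes([0, 0, 1, -1, 0])  # short illustrative list, not the real data
```

Running `topic_sizes` over the full `topics` array from the file should reproduce the `topic_sizes` object (132, 53, and 14 documents for topics 0, 1, and -1).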