bertopic_sim70_10topics_larger_embed_raw

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("DobreMihai/bertopic_sim70_10topics_larger_embed_raw")

topic_model.get_topic_info()
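
Beyond inspecting the topic table, a loaded model can assign topics to new documents with transform. The following is a minimal sketch, not a tested recipe for this particular repository: the helper names assign_topics and topic_histogram are hypothetical, and it assumes the repository stores a reference to its embedding model. If it does not, BERTopic.load accepts an embedding_model argument that can be used to supply one.

```python
from collections import Counter


def assign_topics(docs, model_id="DobreMihai/bertopic_sim70_10topics_larger_embed_raw"):
    """Load the model from the Hub and return one topic ID per document."""
    # Imported lazily so the helper below can be used without BERTopic installed.
    from bertopic import BERTopic

    topic_model = BERTopic.load(model_id)
    # transform returns (topics, probabilities); only the topic IDs are kept here.
    topics, _ = topic_model.transform(docs)
    return [int(t) for t in topics]


def topic_histogram(topics):
    """Count how many documents fell into each topic, most frequent first."""
    return Counter(topics).most_common()


if __name__ == "__main__":
    # Hypothetical review texts, used only for illustration.
    reviews = ["The alarm is very loud", "Too many ads in the free version"]
    print(topic_histogram(assign_topics(reviews)))
```

A topic ID of -1 in the result marks a document that BERTopic treats as an outlier.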

Topic overview

  • Number of topics: 16
  • Number of training documents: 56774
Overview of all topics:

| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| 0 | loud - very - not - super - definitely | 50 | loud |
| 1 | subscription - be - but - joke - high | 6 | subscription |
| 2 | ad - bs | 4 | ads |
| 3 | snooze - snoozing - never - what - no | 3 | snooze |
| 4 | premium - why | 2 | premium* |
| 5 | math - need - the - more | 2 | math |
| 6 | be - it - the - to - and | 17927 | -1_be_it_the_to |
| 7 | app - be - the - it - to | 20334 | 0_app_be_the_it |
| 8 | good - nice - very - excellent - awesome | 11505 | 1_good_nice_very_excellent |
| 9 | work - easy - very - use - helpful | 3067 | 2_work_easy_very_use |
| 10 | hai - que - de - la - se | 1362 | 3_hai_que_de_la |
| 11 | ok - well - be - good - it | 1333 | 4_ok_well_be_good |
| 12 | super - epic - noice - top - excelent | 619 | 5_super_epic_noice_top |
| 13 | life - annoying - change - save - it | 302 | 6_life_annoying_change_save |
| 14 | never - reliable - fail - clock - dependable | 138 | 7_never_reliable_fail_clock |
| 15 | step - math - squat - the - count | 120 | 8_step_math_squat_the |
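
The per-topic counts in the overview can be turned into shares of the training set, which makes the skew toward a few large topics easier to see. The sketch below copies the counts from the overview by hand; the names TOPIC_COUNTS and topic_shares are illustrative and not part of BERTopic.

```python
# Topic ID -> number of training documents, copied from the topic overview.
TOPIC_COUNTS = {
    0: 50, 1: 6, 2: 4, 3: 3, 4: 2, 5: 2,
    6: 17927, 7: 20334, 8: 11505, 9: 3067,
    10: 1362, 11: 1333, 12: 619, 13: 302, 14: 138, 15: 120,
}
TOTAL_DOCS = 56774  # "Number of training documents" from the overview


def topic_shares(counts):
    """Return each topic's fraction of all documents."""
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.items()}


# Sanity check: the per-topic counts add up to the reported corpus size.
assert sum(TOPIC_COUNTS.values()) == TOTAL_DOCS

shares = topic_shares(TOPIC_COUNTS)
# Print topics from largest to smallest share.
for topic, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"topic {topic:>2}: {share:.1%}")
```

Topics 7, 6 and 8 together account for roughly 88% of the documents, and topic 6 carries the -1 label that BERTopic uses for outliers.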

Training hyperparameters

  • calculate_probabilities: False
  • language: None
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
  • zeroshot_min_similarity: 0.7
  • zeroshot_topic_list: None
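
For reference, the listed values map directly onto keyword arguments of the BERTopic constructor. The sketch below is an assumption-laden illustration rather than the author's training script: the card does not list the embedding model or the UMAP and HDBSCAN settings, so a model built this way would not reproduce the published topics. The names HYPERPARAMS and build_untrained_model are hypothetical.

```python
# Hyperparameters exactly as listed in the card.
HYPERPARAMS = {
    "calculate_probabilities": False,
    "language": None,
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_untrained_model(embedding_model=None):
    """Create an unfitted BERTopic instance configured with the listed values.

    embedding_model is an assumption: the card does not say which one was used.
    """
    # Imported lazily so the dictionary above can be inspected without BERTopic.
    from bertopic import BERTopic

    return BERTopic(embedding_model=embedding_model, **HYPERPARAMS)
```

Calling build_untrained_model() and then fit_transform on a list of documents would train a new model with these settings.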

Framework versions

  • Numpy: 1.26.4
  • HDBSCAN: 0.8.38.post1
  • UMAP: 0.5.6
  • Pandas: 2.2.1
  • Scikit-Learn: 1.5.2
  • Sentence-transformers: 3.1.0
  • Transformers: 4.44.2
  • Numba: 0.60.0
  • Plotly: 5.24.1
  • Python: 3.10.15
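
Loading a saved model tends to be most reliable when the local environment is close to the versions listed above. The following sketch compares installed packages against that list using the standard library. The distribution names (for example umap-learn for UMAP) are the usual PyPI names and are an assumption here, and the helper names are hypothetical.

```python
from importlib import metadata

# PyPI distribution name -> version listed in the card.
PINNED = {
    "numpy": "1.26.4",
    "hdbscan": "0.8.38.post1",
    "umap-learn": "0.5.6",
    "pandas": "2.2.1",
    "scikit-learn": "1.5.2",
    "sentence-transformers": "3.1.0",
    "transformers": "4.44.2",
    "numba": "0.60.0",
    "plotly": "5.24.1",
}


def installed_version(dist):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(pinned=PINNED, lookup=installed_version):
    """Map each differing package to a (listed, installed) version pair."""
    found = {name: lookup(name) for name in pinned}
    return {name: (want, found[name]) for name, want in pinned.items() if found[name] != want}


if __name__ == "__main__":
    for name, (want, have) in version_mismatches().items():
        print(f"{name}: card lists {want}, installed {have}")
```

A mismatch does not necessarily break loading; the report is only a starting point when something fails.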