cnn_dailymail_6789_200000_100000_v1_50topics_train

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

```shell
pip install -U bertopic
```

You can use the model as follows:

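```python
from bertopic import BERTopic

# Download the model from the Hugging Face Hub and load it
topic_model = BERTopic.load("KingKazma/cnn_dailymail_6789_200000_100000_v1_50topics_train")

# Summarize all topics as a pandas DataFrame
topic_model.get_topic_info()
```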

Topic overview

  • Number of topics: 50
  • Number of training documents: 200000
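The table below lists all topics. Topic -1 is BERTopic's outlier topic, and Topic Frequency is the number of training documents assigned to each topic (the frequencies sum to 200000).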
| Topic ID | Topic Keywords | Topic Frequency | Label |
|---|---|---|---|
| -1 | said - one - year - people - would | 5 | -1_said_one_year_people |
| 0 | league - player - game - team - cup | 104194 | 0_league_player_game_team |
| 1 | said - police - told - court - family | 27178 | 1_said_police_told_court |
| 2 | said - government - us - military - president | 16379 | 2_said_government_us_military |
| 3 | car - said - flight - fire - plane | 12476 | 3_car_said_flight_fire |
| 4 | per - cent - said - year - school | 6776 | 4_per_cent_said_year |
| 5 | obama - president - said - state - republican | 4128 | 5_obama_president_said_state |
| 6 | film - show - movie - cosby - the | 3747 | 6_film_show_movie_cosby |
| 7 | said - mexico - mexican - government - border | 2967 | 7_said_mexico_mexican_government |
| 8 | dog - animal - cat - zoo - pet | 2258 | 8_dog_animal_cat_zoo |
| 9 | fashion - weight - art - painting - dress | 2176 | 9_fashion_weight_art_painting |
| 10 | apple - user - iphone - google - facebook | 2139 | 10_apple_user_iphone_google |
| 11 | food - energy - climate - per - gas | 1861 | 11_food_energy_climate_per |
| 12 | ebola - virus - health - disease - outbreak | 1846 | 12_ebola_virus_health_disease |
| 13 | war - soldier - british - mr - said | 1693 | 13_war_soldier_british_mr |
| 14 | shark - whale - ship - oil - water | 1686 | 14_shark_whale_ship_oil |
| 15 | cancer - drug - marijuana - smoking - study | 1576 | 15_cancer_drug_marijuana_smoking |
| 16 | space - earth - planet - mars - nasa | 1361 | 16_space_earth_planet_mars |
| 17 | prince - royal - queen - duchess - princess | 1230 | 17_prince_royal_queen_duchess |
| 18 | ancient - found - site - archaeologist - discovered | 769 | 18_ancient_found_site_archaeologist |
| 19 | pope - vatican - church - francis - cardinal | 605 | 19_pope_vatican_church_francis |
| 20 | lottery - ticket - jackpot - million - winning | 604 | 20_lottery_ticket_jackpot_million |
| 21 | game - robot - console - xbox - 3d | 494 | 21_game_robot_console_xbox |
| 22 | park - hotel - island - beach - resort | 428 | 22_park_hotel_island_beach |
| 23 | hollande - sarkozy - trierweiler - french - francois | 354 | 23_hollande_sarkozy_trierweiler_french |
| 24 | teeth - eye - hand - ear - surgery | 180 | 24_teeth_eye_hand_ear |
| 25 | kyle - routh - sniper - littlefield - gun | 137 | 25_kyle_routh_sniper_littlefield |
| 26 | country - population - corruption - per - city | 121 | 26_country_population_corruption_per |
| 27 | dubai - hajj - pilgrim - mecca - mme | 88 | 27_dubai_hajj_pilgrim_mecca |
| 28 | ballet - filin - bolshoi - dancer - dmitrichenko | 66 | 28_ballet_filin_bolshoi_dancer |
| 29 | oldest - age - guinness - worlds - dangi | 50 | 29_oldest_age_guinness_worlds |
| 30 | fragrance - scent - perfume - smell - bottle | 45 | 30_fragrance_scent_perfume_smell |
| 31 | dna - cell - graphene - genome - synthetic | 44 | 31_dna_cell_graphene_genome |
| 32 | accent - favourite - fan - language - top | 35 | 32_accent_favourite_fan_language |
| 33 | nobel - prize - peace - award - committee | 33 | 33_nobel_prize_peace_award |
| 34 | violin - orchestra - stradivarius - instrument - symphony | 31 | 34_violin_orchestra_stradivarius_instrument |
| 35 | turing - bletchley - enigma - code - machine | 30 | 35_turing_bletchley_enigma_code |
| 36 | gandolfini - sopranos - gandolfinis - soprano - actor | 26 | 36_gandolfini_sopranos_gandolfinis_soprano |
| 37 | nelson - napoleon - battle - trafalgar - hms | 26 | 37_nelson_napoleon_battle_trafalgar |
| 38 | redskins - name - native - snyder - washington | 25 | 38_redskins_name_native_snyder |
| 39 | eurovision - contest - song - conchita - country | 25 | 39_eurovision_contest_song_conchita |
| 40 | evolution - creationism - scientific - intelligent - believe | 21 | 40_evolution_creationism_scientific_intelligent |
| 41 | prabowo - indonesia - jakarta - widodo - jokowi | 17 | 41_prabowo_indonesia_jakarta_widodo |
| 42 | dmlaterbundle - twittervia - lanza - zann - ilfracombe | 15 | 42_dmlaterbundle_twittervia_lanza_zann |
| 43 | clock - time - hour - daylight - westworth | 13 | 43_clock_time_hour_daylight |
| 44 | ikea - furniture - ikeas - kamprad - refugee | 12 | 44_ikea_furniture_ikeas_kamprad |
| 45 | vick - vicks - nfl - dog - virginia | 10 | 45_vick_vicks_nfl_dog |
| 46 | bulb - light - leds - paddle - bulbs | 8 | 46_bulb_light_leds_paddle |
| 47 | port - cairo - ministry - egypt - fan | 7 | 47_port_cairo_ministry_egypt |
| 48 | sanford - sanfords - jenny - carolina - mark | 5 | 48_sanford_sanfords_jenny_carolina |
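As a quick consistency check on the topic overview, the frequencies can be summed and turned into shares of the corpus. The sketch below hard-codes the frequencies copied from the table; the names `TOPIC_FREQUENCIES`, `topic_shares`, and `label_from_keywords` are illustrative and not part of BERTopic. It also rebuilds a label from its keywords, based only on the pattern visible in the table: each label is the topic ID followed by the first four keywords, joined with underscores.

```python
# Topic frequencies copied from the topic overview table (topic ID -> documents).
# Topic -1 is BERTopic's outlier topic.
TOPIC_FREQUENCIES = {
    -1: 5, 0: 104194, 1: 27178, 2: 16379, 3: 12476, 4: 6776, 5: 4128,
    6: 3747, 7: 2967, 8: 2258, 9: 2176, 10: 2139, 11: 1861, 12: 1846,
    13: 1693, 14: 1686, 15: 1576, 16: 1361, 17: 1230, 18: 769, 19: 605,
    20: 604, 21: 494, 22: 428, 23: 354, 24: 180, 25: 137, 26: 121,
    27: 88, 28: 66, 29: 50, 30: 45, 31: 44, 32: 35, 33: 33, 34: 31,
    35: 30, 36: 26, 37: 26, 38: 25, 39: 25, 40: 21, 41: 17, 42: 15,
    43: 13, 44: 12, 45: 10, 46: 8, 47: 7, 48: 5,
}


def topic_shares(freqs: dict[int, int]) -> dict[int, float]:
    """Return each topic's share of all documents as a fraction in [0, 1]."""
    total = sum(freqs.values())
    return {topic: count / total for topic, count in freqs.items()}


def label_from_keywords(topic_id: int, keywords: list[str], n_words: int = 4) -> str:
    """Rebuild a label in the table's format: '<id>_<kw1>_<kw2>_<kw3>_<kw4>'."""
    return "_".join([str(topic_id), *keywords[:n_words]])


if __name__ == "__main__":
    total = sum(TOPIC_FREQUENCIES.values())
    print(f"topics: {len(TOPIC_FREQUENCIES)}, documents: {total}")
    shares = topic_shares(TOPIC_FREQUENCIES)
    print(f"topic 0 share: {shares[0]:.1%}")
    print(label_from_keywords(0, ["league", "player", "game", "team", "cup"]))
```

The frequencies add up to exactly 200000 across 50 topics, matching the topic overview, and topic 0 alone holds about 52.1% of the documents, so the distribution is heavily skewed toward a few large topics.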

Training hyperparameters

  • calculate_probabilities: False
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: 50
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
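The hyperparameters listed above correspond to keyword arguments of the `BERTopic` constructor. A minimal sketch, assuming you want an untrained model with the same listed settings: the values are collected in a plain dictionary (the name `BERTOPIC_KWARGS` is illustrative) so the block runs without BERTopic installed. The card does not record the embedding, UMAP, HDBSCAN, or vectorizer sub-models, so a model built from these arguments alone uses BERTopic's defaults for those components and is not guaranteed to reproduce this model.

```python
# Hyperparameters as listed in this model card, collected as keyword arguments
# for BERTopic's constructor. Sub-models (embedding, UMAP, HDBSCAN, vectorizer)
# are not recorded in the card and are therefore omitted here.
BERTOPIC_KWARGS = {
    "calculate_probabilities": False,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),  # unigrams only
    "nr_topics": 50,  # number of topics to reduce to after clustering
    "seed_topic_list": None,
    "top_n_words": 10,  # keywords kept per topic
    "verbose": False,
}

if __name__ == "__main__":
    for name, value in sorted(BERTOPIC_KWARGS.items()):
        print(f"{name} = {value!r}")
```

With BERTopic installed, `BERTopic(**BERTOPIC_KWARGS)` creates the unfitted model, and `fit_transform(docs)` on a list of strings trains it.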

Framework versions

  • Numpy: 1.23.5
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.57.1
  • Plotly: 5.15.0
  • Python: 3.10.12
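To compare a local environment with the framework versions above, a small standard-library sketch can look up installed distributions through `importlib.metadata`. The mapping from the card's display names to PyPI distribution names (for example, UMAP is published as `umap-learn`) is assumed here, and the helper names are illustrative. Note that the card does not list the BERTopic version itself, and pinning every listed version may not resolve on every platform.

```python
import platform
from importlib import metadata
from typing import Callable, Optional

# Versions from the model card, keyed by PyPI distribution name (assumed mapping,
# e.g. "UMAP" -> "umap-learn", "Scikit-Learn" -> "scikit-learn").
EXPECTED_VERSIONS = {
    "numpy": "1.23.5",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.57.1",
    "plotly": "5.15.0",
}
EXPECTED_PYTHON = "3.10.12"


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(
    expected: dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> dict[str, tuple[str, Optional[str]]]:
    """Map each package whose installed version differs to (expected, installed)."""
    mismatches = {}
    for dist, want in expected.items():
        have = lookup(dist)
        if have != want:
            mismatches[dist] = (want, have)
    return mismatches


def pip_pins(expected: dict[str, str]) -> list[str]:
    """Format the expected versions as pip requirement specifiers."""
    return [f"{dist}=={version}" for dist, version in expected.items()]


if __name__ == "__main__":
    print("python:", platform.python_version(), "(card:", EXPECTED_PYTHON + ")")
    for dist, (want, have) in find_mismatches(EXPECTED_VERSIONS).items():
        print(f"{dist}: expected {want}, found {have or 'not installed'}")
    print("pip install " + " ".join(pip_pins(EXPECTED_VERSIONS)))
```

The `lookup` parameter makes the comparison testable without touching the real environment. A mismatch reported here is a starting point for debugging a failed load, not proof that it is the cause.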