---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
license: apache-2.0
datasets:
- kmfoda/booksum
language:
- en
inference: False
---

# BERTopic-booksum-ngram1-sentence-t5-xl-chapter

This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

## Usage

To use this model, please install BERTopic:

```
pip install -U bertopic safetensors
```

You can use the model as follows:

```python
from bertopic import BERTopic
topic_model = BERTopic.load("pszemraj/BERTopic-booksum-ngram1-sentence-t5-xl-chapter")

topic_model.get_topic_info()
```
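
Once loaded, the model can also assign topics to new text. The following is a minimal sketch rather than part of the original card: `summarize_assignments` is a hypothetical helper name, and it assumes the loaded model is able to embed new documents (depending on how the model was saved, this may require passing an `embedding_model` to `BERTopic.load`). It relies only on BERTopic's `transform` and `get_topic` methods.

```python
from typing import Any, Dict, List, Sequence


def summarize_assignments(
    topic_model: Any, docs: Sequence[str], top_n: int = 5
) -> List[Dict[str, Any]]:
    """Assign each document to a topic and list that topic's top keywords.

    `topic_model` is expected to be a fitted or loaded BERTopic instance.
    """
    # transform() returns the predicted topic IDs and, optionally, probabilities.
    topics, _probs = topic_model.transform(list(docs))

    rows = []
    for doc, topic_id in zip(docs, topics):
        # get_topic() returns a list of (word, score) pairs, or False when the
        # topic ID is unknown to the model.
        words = topic_model.get_topic(int(topic_id)) or []
        rows.append(
            {
                "document": doc,
                "topic": int(topic_id),
                "keywords": [word for word, _score in words[:top_n]],
            }
        )
    return rows
```

With `topic_model` loaded as shown earlier, `summarize_assignments(topic_model, ["Holmes examined the footprints on the moor."])` returns one dictionary per input document. A topic of `-1` marks a document that BERTopic treats as an outlier.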

## Topic overview

* Number of topics: 138
* Number of training documents: 70840

<details>
<summary>Click here for an overview of all topics.</summary>

| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | were - her - was - had - she | 30 | -1_were_her_was_had |
| 0 | were - had - was - could - miss | 28715 | 0_were_had_was_could |
| 1 | artagnan - athos - musketeers - porthos - treville | 16916 | 1_artagnan_athos_musketeers_porthos |
| 2 | rama - ravan - brahma - lakshman - raghu | 4563 | 2_rama_ravan_brahma_lakshman |
| 3 | were - canoe - hist - huron - hutter | 1268 | 3_were_canoe_hist_huron |
| 4 | slave - were - slavery - had - was | 1011 | 4_slave_were_slavery_had |
| 5 | holmes - sherlock - watson - moor - baskerville | 580 | 5_holmes_sherlock_watson_moor |
| 6 | prisoner - milady - felton - were - madame | 549 | 6_prisoner_milady_felton_were |
| 7 | coriolanus - cassius - brutus - sicinius - titus | 527 | 7_coriolanus_cassius_brutus_sicinius |
| 8 | confederation - constitution - federal - states - senate | 511 | 8_confederation_constitution_federal_states |
| 9 | heathcliff - catherine - wuthering - cathy - hindley | 498 | 9_heathcliff_catherine_wuthering_cathy |
| 10 | were - seemed - rima - was - had | 492 | 10_were_seemed_rima_was |
| 11 | laws - lawes - law - civill - actions | 452 | 11_laws_lawes_law_civill |
| 12 | fang - wolf - fangs - musher - growl | 401 | 12_fang_wolf_fangs_musher |
| 13 | sigurd - thorgeir - thord - gunnar - skarphedinn | 395 | 13_sigurd_thorgeir_thord_gunnar |
| 14 | achilles - troy - patroclus - aeneas - ulysses | 385 | 14_achilles_troy_patroclus_aeneas |
| 15 | fogg - passengers - passed - phileas - travellers | 376 | 15_fogg_passengers_passed_phileas |
| 16 | troy - trojans - aeneas - fates - trojan | 370 | 16_troy_trojans_aeneas_fates |
| 17 | disciples - jesus - pharisees - temple - jerusalem | 340 | 17_disciples_jesus_pharisees_temple |
| 18 | helsing - harker - diary - dr - he | 324 | 18_helsing_harker_diary_dr |
| 19 | lama - who - no - kim - am | 312 | 19_lama_who_no_kim |
| 20 | sara - princess - herself - she - minchin | 301 | 20_sara_princess_herself_she |
| 21 | horses - horse - saddle - stable - were | 293 | 21_horses_horse_saddle_stable |
| 22 | hester - pearl - scarlet - her - human | 292 | 22_hester_pearl_scarlet_her |
| 23 | candide - inquisitor - friar - cunegonde - philosopher | 286 | 23_candide_inquisitor_friar_cunegonde |
| 24 | dick - aunt - were - could - had | 275 | 24_dick_aunt_were_could |
| 25 | wolves - wolf - cub - hunger - were | 261 | 25_wolves_wolf_cub_hunger |
| 26 | god - gods - consequences - satan - som | 241 | 26_god_gods_consequences_satan |
| 27 | modesty - women - behaviour - human - woman | 240 | 27_modesty_women_behaviour_human |
| 28 | society - education - distribution - service - labour | 240 | 28_society_education_distribution_service |
| 29 | siddhartha - buddha - gotama - kamaswami - om | 237 | 29_siddhartha_buddha_gotama_kamaswami |
| 30 | ship - captain - aboard - squire - ll | 229 | 30_ship_captain_aboard_squire |
| 31 | cyrano - roxane - montfleury - hark - love | 227 | 31_cyrano_roxane_montfleury_hark |
| 32 | alice - were - rabbit - hare - hatter | 225 | 32_alice_were_rabbit_hare |
| 33 | toto - kansas - dorothy - oz - scarecrow | 211 | 33_toto_kansas_dorothy_oz |
| 34 | lancelot - camelot - merlin - guinevere - arthur | 209 | 34_lancelot_camelot_merlin_guinevere |
| 35 | were - soldiers - seemed - soldier - th | 201 | 35_were_soldiers_seemed_soldier |
| 36 | were - was - fields - seemed - hills | 200 | 36_were_was_fields_seemed |
| 37 | reason - thyself - actions - thine - life | 179 | 37_reason_thyself_actions_thine |
| 38 | hetty - her - she - judith - were | 170 | 38_hetty_her_she_judith |
| 39 | othello - iago - desdemona - ll - roderigo | 170 | 39_othello_iago_desdemona_ll |
| 40 | wildeve - yes - were - vye - was | 165 | 40_wildeve_yes_were_vye |
| 41 | utilitarian - morality - morals - virtue - moral | 165 | 41_utilitarian_morality_morals_virtue |
| 42 | ransom - isaac - thine - thy - shekels | 163 | 42_ransom_isaac_thine_thy |
| 43 | weasels - rat - ratty - toad - badger | 157 | 43_weasels_rat_ratty_toad |
| 44 | philip - he - were - vicar - was | 155 | 44_philip_he_were_vicar |
| 45 | macbeth - banquo - macduff - fleance - murderer | 154 | 45_macbeth_banquo_macduff_fleance |
| 46 | lydgate - bulstrode - himself - he - had | 145 | 46_lydgate_bulstrode_himself_he |
| 47 | capulet - romeo - juliet - verona - mercutio | 142 | 47_capulet_romeo_juliet_verona |
| 48 | dying - her - were - helen - she | 141 | 48_dying_her_were_helen |
| 49 | anne - avonlea - diana - her - marilla | 141 | 49_anne_avonlea_diana_her |
| 50 | tartuffe - scene - dorine - pernelle - scoundrel | 140 | 50_tartuffe_scene_dorine_pernelle |
| 51 | were - yes - had - was - no | 139 | 51_were_yes_had_was |
| 52 | jekyll - hyde - were - myself - had | 135 | 52_jekyll_hyde_were_myself |
| 53 | loved - were - philip - was - could | 128 | 53_loved_were_philip_was |
| 54 | falstaff - mistress - ford - forsooth - windsor | 127 | 54_falstaff_mistress_ford_forsooth |
| 55 | hurstwood - were - barn - had - was | 127 | 55_hurstwood_were_barn_had |
| 56 | provost - capell - collier - conj - pope | 126 | 56_provost_capell_collier_conj |
| 57 | gretchen - highness - chancellor - hildegarde - yes | 125 | 57_gretchen_highness_chancellor_hildegarde |
| 58 | delamere - watson - dr - ll - no | 124 | 58_delamere_watson_dr_ll |
| 59 | jem - her - were - felt - margaret | 123 | 59_jem_her_were_felt |
| 60 | beowulf - grendel - hrothgar - wiglaf - hero | 111 | 60_beowulf_grendel_hrothgar_wiglaf |
| 61 | verloc - seemed - was - were - had | 102 | 61_verloc_seemed_was_were |
| 62 | hamlet - guildenstern - rosencrantz - fortinbras - polonius | 102 | 62_hamlet_guildenstern_rosencrantz_fortinbras |
| 63 | corey - mrs - yes - business - lapham | 101 | 63_corey_mrs_yes_business |
| 64 | projectiles - cannon - projectile - distance - satellite | 99 | 64_projectiles_cannon_projectile_distance |
| 65 | piano - musical - music - played - beethoven | 98 | 65_piano_musical_music_played |
| 66 | wedding - bridegroom - were - marriage - looked | 93 | 66_wedding_bridegroom_were_marriage |
| 67 | juan - her - fame - some - had | 92 | 67_juan_her_fame_some |
| 68 | were - looked - felt - her - had | 91 | 68_were_looked_felt_her |
| 69 | staked - gambling - wildeve - stakes - dice | 91 | 69_staked_gambling_wildeve_stakes |
| 70 | mistress - leonora - wanted - florence - was | 89 | 70_mistress_leonora_wanted_florence |
| 71 | delano - ship - sailor - captain - benito | 87 | 71_delano_ship_sailor_captain |
| 72 | yes - goring - no - robert - room | 85 | 72_yes_goring_no_robert |
| 73 | stockmann - yes - horster - mayor - dr | 81 | 73_stockmann_yes_horster_mayor |
| 74 | ll - were - looked - carl - was | 80 | 74_ll_were_looked_carl |
| 75 | barber - philosophy - no - some - man | 78 | 75_barber_philosophy_no_some |
| 76 | tom - maggie - came - had - tulliver | 78 | 76_tom_maggie_came_had |
| 77 | middlemarch - hustings - candidate - brooke - may | 75 | 77_middlemarch_hustings_candidate_brooke |
| 78 | inspector - verloc - yes - affair - police | 75 | 78_inspector_verloc_yes_affair |
| 79 | scrooge - merry - no - christmas - man | 73 | 79_scrooge_merry_no_christmas |
| 80 | coquenard - mutton - served - were - pudding | 70 | 80_coquenard_mutton_served_were |
| 81 | yes - no - jack - ll - tell | 69 | 81_yes_no_jack_ll |
| 82 | seth - lisbeth - th - ud - no | 67 | 82_seth_lisbeth_th_ud |
| 83 | higgins - eliza - her - she - liza | 66 | 83_higgins_eliza_her_she |
| 84 | yarmouth - were - went - had - was | 65 | 84_yarmouth_were_went_had |
| 85 | servian - sergius - yes - catherine - no | 64 | 85_servian_sergius_yes_catherine |
| 86 | service - army - salvation - institution - training | 61 | 86_service_army_salvation_institution |
| 87 | condemn - ff - pray - mercy - conj | 58 | 87_condemn_ff_pray_mercy |
| 88 | lucy - bartlett - were - could - she | 57 | 88_lucy_bartlett_were_could |
| 89 | wills - seemed - bequest - were - testator | 54 | 89_wills_seemed_bequest_were |
| 90 | scene - iii - malvolio - valentine - cesario | 54 | 90_scene_iii_malvolio_valentine |
| 91 | fuss - think - ll - thinks - oh | 53 | 91_fuss_think_ll_thinks |
| 92 | hermia - demetrius - helena - theseus - helen | 50 | 92_hermia_demetrius_helena_theseus |
| 93 | seemed - rochester - were - had - yes | 50 | 93_seemed_rochester_were_had |
| 94 | sorrow - mourned - myself - had - was | 48 | 94_sorrow_mourned_myself_had |
| 95 | gerty - sleepless - tea - weariness - tired | 48 | 95_gerty_sleepless_tea_weariness |
| 96 | rushworth - crawford - were - sotherton - was | 47 | 96_rushworth_crawford_were_sotherton |
| 97 | reasoning - syllogisme - names - signification - definitions | 46 | 97_reasoning_syllogisme_names_signification |
| 98 | could - caleb - sure - work - no | 46 | 98_could_caleb_sure_work |
| 99 | rose - tears - hope - tell - wish | 46 | 99_rose_tears_hope_tell |
| 100 | peggotty - em - gummidge - he - ll | 46 | 100_peggotty_em_gummidge_he |
| 101 | time - future - story - paradox - traveller | 46 | 101_time_future_story_paradox |
| 102 | cleopatra - antony - caesar - loved - slave | 45 | 102_cleopatra_antony_caesar_loved |
| 103 | appendicitis - doctors - doctor - dr - wanted | 45 | 103_appendicitis_doctors_doctor_dr |
| 104 | slept - awoke - waking - sleep - seemed | 44 | 104_slept_awoke_waking_sleep |
| 105 | parlour - room - seemed - sat - had | 43 | 105_parlour_room_seemed_sat |
| 106 | prophets - scripture - prophet - moses - prophecy | 43 | 106_prophets_scripture_prophet_moses |
| 107 | letter - honour - adieu - duval - evelina | 43 | 107_letter_honour_adieu_duval |
| 108 | complications - cranky - had - tanis - was | 43 | 108_complications_cranky_had_tanis |
| 109 | fled - armies - brussels - imperial - napoleon | 42 | 109_fled_armies_brussels_imperial |
| 110 | philip - easel - greco - impressionists - manet | 42 | 110_philip_easel_greco_impressionists |
| 111 | harlings - harling - frances - were - shimerdas | 40 | 111_harlings_harling_frances_were |
| 112 | jane - mrs - janet - eyre - her | 40 | 112_jane_mrs_janet_eyre |
| 113 | prisoner - confinement - prisoners - prison - gaoler | 40 | 113_prisoner_confinement_prisoners_prison |
| 114 | hardcastle - marlow - impudence - constance - modesty | 40 | 114_hardcastle_marlow_impudence_constance |
| 115 | horatio - murder - revenge - sorrow - hieronimo | 40 | 115_horatio_murder_revenge_sorrow |
| 116 | traddles - had - married - room - horace | 39 | 116_traddles_had_married_room |
| 117 | philip - tell - feelings - was - remember | 38 | 117_philip_tell_feelings_was |
| 118 | nervous - countenance - seemed - he - huxtable | 38 | 118_nervous_countenance_seemed_he |
| 119 | rogers - wanted - lapham - could - silas | 38 | 119_rogers_wanted_lapham_could |
| 120 | titus - timon - varro - servilius - alcibiades | 37 | 120_titus_timon_varro_servilius |
| 121 | morality - justice - moral - impartiality - unjust | 37 | 121_morality_justice_moral_impartiality |
| 122 | willard - elmer - were - was - henderson | 37 | 122_willard_elmer_were_was |
| 123 | had - was - could - circumstances - possession | 37 | 123_had_was_could_circumstances |
| 124 | monkey - he - sahib - rat - sara | 36 | 124_monkey_he_sahib_rat |
| 125 | mcmurdo - mcginty - cormac - police - scanlan | 36 | 125_mcmurdo_mcginty_cormac_police |
| 126 | hetty - herself - she - her - had | 36 | 126_hetty_herself_she_her |
| 127 | dimmesdale - reverend - chillingworth - clergyman - deacon | 35 | 127_dimmesdale_reverend_chillingworth_clergyman |
| 128 | formerly - eliza - was - friend - friends | 34 | 128_formerly_eliza_was_friend |
| 129 | were - seemed - had - was - felt | 34 | 129_were_seemed_had_was |
| 130 | prisoner - jerry - lorry - tellson - court | 33 | 130_prisoner_jerry_lorry_tellson |
| 131 | macmurdo - wenham - captain - steyne - crawley | 33 | 131_macmurdo_wenham_captain_steyne |
| 132 | ducal - duchy - xv - fetes - theatre | 32 | 132_ducal_duchy_xv_fetes |
| 133 | chapter - book - dows - unt - windowpane | 32 | 133_chapter_book_dows_unt |
| 134 | money - riches - things - risk - thoughts | 31 | 134_money_riches_things_risk |
| 135 | bethy - beth - seemed - sister - her | 31 | 135_bethy_beth_seemed_sister |
| 136 | oliver - pickwick - were - was - inn | 30 | 136_oliver_pickwick_were_was |

</details>
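
The `Label` column appears to follow the pattern `<topic ID>_<first four keywords>`. The sketch below splits such a label back into its parts and relates a topic's frequency to the 70840 training documents. It is an illustration only: `parse_label` and `topic_share` are hypothetical helper names, the parser assumes that keywords contain no underscores, and the share calculation assumes that "Topic Frequency" counts training documents.

```python
from typing import List, Tuple

N_TRAINING_DOCS = 70840  # "Number of training documents" from the overview


def parse_label(label: str) -> Tuple[int, List[str]]:
    """Split a label such as '5_holmes_sherlock_watson_moor' into ID and keywords."""
    topic_id, *keywords = label.split("_")
    return int(topic_id), keywords


def topic_share(frequency: int, total: int = N_TRAINING_DOCS) -> float:
    """Return a topic's frequency as a fraction of the training documents."""
    return frequency / total


# Values copied from the row for topic 5 in the table.
topic_id, keywords = parse_label("5_holmes_sherlock_watson_moor")
print(topic_id, keywords)
print(f"{topic_share(580):.2%} of training documents")
```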

## Training hyperparameters

* calculate_probabilities: True
* language: None
* low_memory: False
* min_topic_size: 30
* n_gram_range: (1, 1)
* nr_topics: auto
* seed_topic_list: None
* top_n_words: 10
* verbose: True
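
The settings listed above correspond to keyword arguments of the `BERTopic` constructor. The sketch below shows how a comparable, unfitted model might be configured. It is an assumption, not the author's training script: `build_topic_model` is a hypothetical helper, and the embedding model, which this list does not record, has to be supplied by the caller.

```python
from typing import Any, Dict, Optional

# Values copied from the "Training hyperparameters" list.
HYPERPARAMETERS: Dict[str, Any] = {
    "calculate_probabilities": True,
    "language": None,
    "low_memory": False,
    "min_topic_size": 30,
    "n_gram_range": (1, 1),
    "nr_topics": "auto",
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
}


def build_topic_model(embedding_model: Optional[Any] = None):
    """Create an unfitted BERTopic model with the hyperparameters above."""
    # Imported lazily so the settings can be inspected without BERTopic installed.
    from bertopic import BERTopic

    return BERTopic(embedding_model=embedding_model, **HYPERPARAMETERS)
```

Fitting such a model on a different corpus, or with a different embedding model, should not be expected to reproduce the topics listed in this card.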

## Framework versions

* Numpy: 1.24.3
* HDBSCAN: 0.8.29
* UMAP: 0.5.3
* Pandas: 2.0.2
* Scikit-Learn: 1.2.2
* Sentence-transformers: 2.2.2
* Transformers: 4.30.2
* Numba: 0.57.1
* Plotly: 5.15.0
* Python: 3.10.11
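
To compare a local environment against the versions listed above, a small standard-library check along the following lines may help. It is a sketch under assumptions: `compare_versions` is a hypothetical helper, and the mapping from the display names in the list to PyPI distribution names (for example, UMAP to `umap-learn`) is an assumption rather than something this card states.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Versions from the list above, keyed by assumed PyPI distribution name.
LISTED_VERSIONS: Dict[str, str] = {
    "numpy": "1.24.3",
    "hdbscan": "0.8.29",
    "umap-learn": "0.5.3",
    "pandas": "2.0.2",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.30.2",
    "numba": "0.57.1",
    "plotly": "5.15.0",
}


def compare_versions(
    listed: Dict[str, str] = LISTED_VERSIONS,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package to (listed version, installed version or None)."""
    report = {}
    for name, wanted in listed.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in compare_versions().items():
    status = "ok" if installed == wanted else "differs"
    print(f"{name}: listed {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; the listed versions only record the environment in which it was trained.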