xsum_22457_3000_1500_train

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("KingKazma/xsum_22457_3000_1500_train")

topic_model.get_topic_info()
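Beyond `get_topic_info()`, a loaded model can be inspected one topic at a time and applied to new text. The following is a minimal sketch, assuming the standard BERTopic methods `get_topic` and `transform`; the helper names `top_keywords` and `assign_topics` are hypothetical and are not part of this model or of BERTopic.

```python
def top_keywords(topic_model, topic_id, top_n=5):
    """Return up to top_n keywords for one topic of a loaded BERTopic model."""
    # get_topic returns a list of (word, score) pairs, or False for an unknown topic id.
    pairs = topic_model.get_topic(topic_id)
    if not pairs:
        return []
    return [word for word, _score in pairs[:top_n]]


def assign_topics(topic_model, docs):
    """Pair each new document with the topic id predicted for it."""
    # transform returns (topics, probabilities); only the topic ids are used here.
    topics, _probs = topic_model.transform(docs)
    return list(zip(docs, (int(t) for t in topics)))
```

With the model loaded as shown above, `top_keywords(topic_model, 0)` would list the leading keywords of topic 0, and `assign_topics(topic_model, ["Murray reaches the Wimbledon final"])` would label a new headline. A topic id of -1 marks a document that the model treats as an outlier.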

Topic overview

  • Number of topics: 45 (including the outlier topic -1)
  • Number of training documents: 3000
Topic ID | Topic Keywords | Topic Frequency | Label
-1 | said - mr - would - people - also | 6 | -1_said_mr_would_people
0 | win - kick - game - foul - united | 1243 | 0_win_kick_game_foul
1 | health - patient - nhs - hospital - cancer | 453 | 1_health_patient_nhs_hospital
2 | film - actor - music - song - star | 119 | 2_film_actor_music_song
3 | bank - business - share - market - sale | 83 | 3_bank_business_share_market
4 | police - northern - ireland - said - crime | 80 | 4_police_northern_ireland_said
5 | wicket - england - cricket - test - bowler | 67 | 5_wicket_england_cricket_test
6 | president - mr - election - government - farc | 67 | 6_president_mr_election_government
7 | labour - party - mr - election - corbyn | 62 | 7_labour_party_mr_election
8 | bird - specie - animal - zoo - dna | 58 | 8_bird_specie_animal_zoo
9 | school - education - student - teacher - schools | 48 | 9_school_education_student_teacher
10 | murder - court - mr - police - said | 42 | 10_murder_court_mr_police
11 | crash - police - road - died - collision | 42 | 11_crash_police_road_died
12 | rail - transport - said - passenger - train | 38 | 12_rail_transport_said_passenger
13 | facebook - console - broadband - game - company | 38 | 13_facebook_console_broadband_game
14 | lifeboat - rnli - water - sea - hms | 37 | 14_lifeboat_rnli_water_sea
15 | fire - blaze - said - cladding - building | 35 | 15_fire_blaze_said_cladding
16 | russia - syria - russian - syrian - military | 34 | 16_russia_syria_russian_syrian
17 | girl - child - abuse - court - sexual | 32 | 17_girl_child_abuse_court
18 | trump - mr - president - trumps - clinton | 29 | 18_trump_mr_president_trumps
19 | man - police - arrested - suspicion - hospital | 27 | 19_man_police_arrested_suspicion
20 | murray - tennis - djokovic - wimbledon - grand | 26 | 20_murray_tennis_djokovic_wimbledon
21 | medal - gold - olympic - games - world | 25 | 21_medal_gold_olympic_games
22 | india - indian - crop - modi - hindu | 24 | 22_india_indian_crop_modi
23 | birdie - open - round - golf - mcilroy | 23 | 23_birdie_open_round_golf
24 | earth - particle - space - moon - dark | 20 | 24_earth_particle_space_moon
25 | madrid - barcelona - foul - assisted - corner | 20 | 25_madrid_barcelona_foul_assisted
26 | eu - uk - brexit - european - would | 20 | 26_eu_uk_brexit_european
27 | athlete - doping - ioc - olympic - medal | 19 | 27_athlete_doping_ioc_olympic
28 | wales - welsh - government - waste - money | 18 | 28_wales_welsh_government_waste
29 | race - rosberg - hamilton - mercedes - engine | 16 | 29_race_rosberg_hamilton_mercedes
30 | plane - flight - mh370 - aircraft - airlines | 16 | 30_plane_flight_mh370_aircraft
31 | fight - pacquiao - mayweather - champion - whyte | 14 | 31_fight_pacquiao_mayweather_champion
32 | attack - us - security - bin - killed | 14 | 32_attack_us_security_bin
33 | virus - ebola - outbreak - disease - infected | 12 | 33_virus_ebola_outbreak_disease
34 | greece - migrant - eu - greek - crisis | 12 | 34_greece_migrant_eu_greek
35 | hie - farm - enterprise - energy - funicular | 12 | 35_hie_farm_enterprise_energy
36 | inflation - growth - rate - economist - manufacturing | 11 | 36_inflation_growth_rate_economist
37 | yn - ar - bod - ei - wedi | 11 | 37_yn_ar_bod_ei
38 | cup - group - sredojevic - al - mazembe | 11 | 38_cup_group_sredojevic_al
39 | picasso - picture - image - collection - cameron | 9 | 39_picasso_picture_image_collection
40 | froome - sky - tour - wiggins - team | 8 | 40_froome_sky_tour_wiggins
41 | carnival - event - pride - lgbt - notting | 7 | 41_carnival_event_pride_lgbt
42 | cocaine - corkindale - supply - connelly - drug | 6 | 42_cocaine_corkindale_supply_connelly
43 | meal - child - school - family - scheme | 6 | 43_meal_child_school_family
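Each label in the table joins the topic ID and its first four keywords with underscores, and the frequencies add up to the 3000 training documents. The sketch below shows one way to work with those two columns; `parse_label` and `topic_share` are hypothetical helper names, and the parsing assumes that no keyword itself contains an underscore.

```python
def parse_label(label):
    """Split a label such as '0_win_kick_game_foul' into (topic_id, keywords)."""
    # The first underscore-separated field is the topic id; the rest are keywords.
    topic_id, *keywords = label.split("_")
    return int(topic_id), keywords


def topic_share(frequency, total_documents=3000):
    """Fraction of the training documents assigned to a topic."""
    return frequency / total_documents
```

For example, `parse_label("-1_said_mr_would_people")` separates the outlier ID -1 from its keywords, and `topic_share(1243)` gives the proportion of training documents that fall into topic 0.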

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
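The values above correspond to keyword arguments of the `BERTopic` constructor. As a rough sketch, they could be passed back in as shown below. The embedding model and the UMAP and HDBSCAN settings are not listed here, so this sketch falls back on library defaults for them (an assumption) and would not necessarily reproduce this exact model. `TRAINING_PARAMS` and `build_untrained_model` are hypothetical names.

```python
# Constructor arguments copied from the hyperparameter list above.
TRAINING_PARAMS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_untrained_model(params=TRAINING_PARAMS):
    """Create a fresh, unfitted BERTopic instance with the listed settings."""
    # Imported lazily so the settings can be inspected without bertopic installed.
    from bertopic import BERTopic

    # Assumption: everything not listed above is left at the library default.
    return BERTopic(**params)
```

Calling `build_untrained_model().fit_transform(docs)` on a list of documents would then train a new model with these settings.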

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.57.1
  • Plotly: 5.13.1
  • Python: 3.10.12
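Loading a saved model tends to be most reliable when the local environment is close to these versions. The sketch below compares the listed versions with whatever is installed, using the standard library's `importlib.metadata`. Mapping the names above to PyPI distribution names (for example UMAP to `umap-learn`) is an assumption, and `compare_versions` is a hypothetical helper.

```python
from importlib import metadata

# Versions listed above, keyed by PyPI distribution name (the name mapping is an assumption).
LISTED_VERSIONS = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.57.1",
    "plotly": "5.13.1",
}


def compare_versions(listed=LISTED_VERSIONS):
    """Map each package to (listed_version, installed_version or None)."""
    report = {}
    for name, wanted in listed.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed)
    return report
```

Printing the result of `compare_versions()` shows, for each package, the version listed here next to the locally installed one, so mismatches are easy to spot.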