xsum_123_3000_1500_train
This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
Usage
To use this model, please install BERTopic:
pip install -U bertopic
You can use the model as follows:
from bertopic import BERTopic
topic_model = BERTopic.load("KingKazma/xsum_123_3000_1500_train")
topic_model.get_topic_info()
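Beyond the summary table, a single topic can be inspected by its ID. The following is a minimal sketch, assuming the model loads from the Hub exactly as shown above. `format_keywords` and `describe_topic` are hypothetical helpers introduced here for illustration; `BERTopic.load` and `get_topic` are BERTopic calls. The import sits inside the function so the helpers can be defined without loading the model.

```python
def format_keywords(words_with_scores, n=5):
    """Join the first n keywords of a topic as 'a - b - c'."""
    return " - ".join(word for word, _ in words_with_scores[:n])


def describe_topic(topic_id, repo_id="KingKazma/xsum_123_3000_1500_train"):
    """Load the model and return the leading keywords of one topic."""
    # Imported lazily: loading needs bertopic installed and access to the Hub.
    from bertopic import BERTopic

    topic_model = BERTopic.load(repo_id)
    # get_topic returns (word, score) pairs, or a falsy value for unknown IDs.
    words = topic_model.get_topic(topic_id)
    if not words:
        raise KeyError(f"no topic with ID {topic_id}")
    return format_keywords(words)
```

Calling `describe_topic(0)` should return a string in the same format as the Topic Keywords column in the topic overview.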
Topic overview
- Number of topics: 47 (topics 0 to 45 plus the outlier topic -1)
- Number of training documents: 3000
Topic ID | Topic Keywords | Topic Frequency | Label
---|---|---|---
-1 | said - mr - police - people - would | 5 | -1_said_mr_police_people |
0 | win - game - half - foul - league | 1132 | 0_win_game_half_foul |
1 | eu - labour - party - would - uk | 591 | 1_eu_labour_party_would |
2 | athlete - sport - gold - olympic - medal | 149 | 2_athlete_sport_gold_olympic |
3 | nhs - health - care - patient - hospital | 104 | 3_nhs_health_care_patient |
4 | growth - price - market - sale - economy | 84 | 4_growth_price_market_sale |
5 | president - mr - government - maduro - rousseff | 71 | 5_president_mr_government_maduro |
6 | crash - police - hospital - road - driver | 58 | 6_crash_police_hospital_road |
7 | murray - match - set - tennis - seed | 46 | 7_murray_match_set_tennis |
8 | syrian - us - syria - rebel - force | 45 | 8_syrian_us_syria_rebel |
9 | school - education - pupil - schools - child | 41 | 9_school_education_pupil_schools |
10 | animal - zoo - wildlife - bird - specie | 40 | 10_animal_zoo_wildlife_bird |
11 | film - actor - star - series - drama | 38 | 11_film_actor_star_series |
12 | abuse - court - sexual - police - victim | 38 | 12_abuse_court_sexual_police |
13 | trump - mr - clinton - republican - president | 31 | 13_trump_mr_clinton_republican |
14 | fire - blaze - building - service - firefighters | 31 | 14_fire_blaze_building_service |
15 | suu - party - mr - government - election | 29 | 15_suu_party_mr_government |
16 | china - korea - chinese - south - north | 29 | 16_china_korea_chinese_south |
17 | album - band - song - music - best | 25 | 17_album_band_song_music |
18 | ms - heard - court - death - said | 24 | 18_ms_heard_court_death |
19 | wales - welsh - said - train - government | 23 | 19_wales_welsh_said_train |
20 | road - police - death - seen - found | 23 | 20_road_police_death_seen |
21 | passenger - crew - sea - boat - aircraft | 23 | 21_passenger_crew_sea_boat |
22 | russian - ukraine - russia - mr - ukrainian | 22 | 22_russian_ukraine_russia_mr |
23 | fight - joshua - title - khan - boxing | 22 | 23_fight_joshua_title_khan |
24 | samsung - phone - app - android - user | 20 | 24_samsung_phone_app_android |
25 | earthquake - particle - nepal - building - mars | 19 | 25_earthquake_particle_nepal_building |
26 | highways - traffic - dartford - council - road | 18 | 26_highways_traffic_dartford_council |
27 | vettel - hamilton - lap - race - alonso | 18 | 27_vettel_hamilton_lap_race |
28 | park - building - visitor - festival - visitscotland | 16 | 28_park_building_visitor_festival |
29 | site - council - street - project - plan | 15 | 29_site_council_street_project |
30 | abdeslam - paris - attack - belgian - salah | 15 | 30_abdeslam_paris_attack_belgian |
31 | virus - ebola - disease - hiv - sierra | 14 | 31_virus_ebola_disease_hiv |
32 | security - data - attack - cyber - malware | 14 | 32_security_data_attack_cyber |
33 | dog - dogs - stray - pet - owner | 14 | 33_dog_dogs_stray_pet |
34 | birdie - pga - bogey - woods - open | 13 | 34_birdie_pga_bogey_woods |
35 | man - police - wearing - incident - anyone | 13 | 35_man_police_wearing_incident |
36 | energy - pipeline - waste - renewables - electricity | 13 | 36_energy_pipeline_waste_renewables |
37 | silence - bishop - belfast - people - attended | 11 | 37_silence_bishop_belfast_people |
38 | painting - art - work - artist - exhibition | 11 | 38_painting_art_work_artist |
39 | eyre - gaunt - lyttle - peter - court | 10 | 39_eyre_gaunt_lyttle_peter |
40 | crime - police - force - constable - chief | 9 | 40_crime_police_force_constable |
41 | flood - river - rain - louisiana - flooded | 9 | 41_flood_river_rain_louisiana |
42 | charity - abuse - yentob - porn - batmanghelidjh | 7 | 42_charity_abuse_yentob_porn |
43 | india - nidar - gun - yrf - film | 6 | 43_india_nidar_gun_yrf |
44 | driving - stirling - winn - fraser - road | 6 | 44_driving_stirling_winn_fraser |
45 | boko - haram - shekau - militant - monguno | 5 | 45_boko_haram_shekau_militant |
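Each entry in the Label column appears to follow one pattern: the topic ID followed by the first four keywords, joined by underscores. A minimal sketch of that convention is given here, with `build_label` as a hypothetical helper (not part of BERTopic), checked against a few rows copied from the table:

```python
def build_label(topic_id, keywords, n_words=4):
    """Rebuild a label such as '0_win_game_half_foul' from an ID and keywords."""
    return "_".join([str(topic_id), *keywords[:n_words]])


# Rows copied from the table above: (topic ID, keywords, label).
rows = [
    (-1, ["said", "mr", "police", "people", "would"], "-1_said_mr_police_people"),
    (0, ["win", "game", "half", "foul", "league"], "0_win_game_half_foul"),
    (45, ["boko", "haram", "shekau", "militant", "monguno"], "45_boko_haram_shekau_militant"),
]

for topic_id, keywords, label in rows:
    assert build_label(topic_id, keywords) == label
```

This can be handy when matching rows of `get_topic_info()` output against labels stored elsewhere.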
Training hyperparameters
- calculate_probabilities: True
- language: english
- low_memory: False
- min_topic_size: 10
- n_gram_range: (1, 1)
- nr_topics: None
- seed_topic_list: None
- top_n_words: 10
- verbose: False
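The settings above correspond to keyword arguments of the `BERTopic` constructor. The sketch below shows how an unfitted model with the same settings might be created; `TRAINING_PARAMS` and `build_untrained_model` are hypothetical names used only for illustration. The embedding model and the UMAP and HDBSCAN configurations are not listed in this section, so fitting such a model is not guaranteed to reproduce the same topics.

```python
# Values copied from the hyperparameter list above.
TRAINING_PARAMS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_untrained_model(params=TRAINING_PARAMS):
    """Create a fresh, unfitted BERTopic instance with the listed settings."""
    # Imported lazily so the settings can be inspected without bertopic installed.
    from bertopic import BERTopic

    return BERTopic(**params)
```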
Framework versions
- Numpy: 1.22.4
- HDBSCAN: 0.8.33
- UMAP: 0.5.3
- Pandas: 1.5.3
- Scikit-Learn: 1.2.2
- Sentence-transformers: 2.2.2
- Transformers: 4.31.0
- Numba: 0.57.1
- Plotly: 5.13.1
- Python: 3.10.12
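Loading a saved model in an environment with different library versions can sometimes cause problems, so it may help to compare the installed packages against the list above. The following is a minimal sketch using only the standard library; `EXPECTED_VERSIONS` and `find_version_mismatches` are hypothetical names, and the keys are the PyPI distribution names assumed for each framework (for example `umap-learn` for UMAP).

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above, keyed by PyPI distribution name.
EXPECTED_VERSIONS = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.57.1",
    "plotly": "5.13.1",
}


def find_version_mismatches(expected=EXPECTED_VERSIONS):
    """Return {package: (expected, installed)} for every package that differs."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches
```

An empty result from `find_version_mismatches()` means every listed package is installed at the listed version.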