cnn_dailymail_123_3000_1500_train
This is a BERTopic model. BERTopic is a flexible, modular topic modeling framework that generates easily interpretable topics from large datasets.
Usage
To use this model, please install BERTopic:
pip install -U bertopic
You can use the model as follows:
from bertopic import BERTopic
topic_model = BERTopic.load("KingKazma/cnn_dailymail_123_3000_1500_train")
topic_model.get_topic_info()
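Once the model is loaded, a single topic can be inspected as well. The following is a minimal sketch, not part of the original card: `top_words` and `describe_topic` are hypothetical helper names, and the code relies only on `BERTopic.load` and `get_topic`, which returns a list of `(word, score)` pairs, or `False` when the topic id does not exist.

```python
def top_words(topic_model, topic_id, n=5):
    """Return the first n keywords of a topic, or [] if the topic id is unknown.

    Hypothetical helper: works with any object exposing BERTopic's get_topic().
    """
    words = topic_model.get_topic(topic_id)  # list of (word, score) pairs, or False
    if not words:
        return []
    return [word for word, _score in words[:n]]


def describe_topic(repo_id="KingKazma/cnn_dailymail_123_3000_1500_train", topic_id=0):
    """Load the model from the Hub and return the keywords of one topic."""
    # Imported lazily so that top_words() can be used without bertopic installed.
    from bertopic import BERTopic

    topic_model = BERTopic.load(repo_id)
    return top_words(topic_model, topic_id)
```

Calling `describe_topic(topic_id=0)` downloads the model, so it needs network access and the `bertopic` package.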
Topic overview
- Number of topics: 57
- Number of training documents: 3000
Overview of all topics:
Topic ID | Topic Keywords | Topic Frequency | Label |
---|---|---|---|
-1 | said - one - police - people - year | 10 | -1_said_one_police_people |
0 | league - player - cup - goal - game | 1070 | 0_league_player_cup_goal |
1 | police - said - home - murder - found | 320 | 1_police_said_home_murder |
2 | court - mr - said - year - sex | 142 | 2_court_mr_said_year |
3 | obama - president - republicans - house - republican | 113 | 3_obama_president_republicans_house |
4 | plane - flight - passenger - airport - aircraft | 89 | 4_plane_flight_passenger_airport |
5 | hospital - care - family - baby - mr | 59 | 5_hospital_care_family_baby |
6 | fashion - dress - style - look - collection | 57 | 6_fashion_dress_style_look |
7 | mr - minister - cameron - party - labour | 50 | 7_mr_minister_cameron_party |
8 | weight - diet - food - fat - school | 49 | 8_weight_diet_food_fat |
9 | mars - space - climate - nasa - mission | 43 | 9_mars_space_climate_nasa |
10 | apple - ipad - iphone - app - apples | 41 | 10_apple_ipad_iphone_app |
11 | shark - dolphin - fish - coast - water | 39 | 11_shark_dolphin_fish_coast |
12 | teacher - school - student - said - state | 37 | 12_teacher_school_student_said |
13 | murray - wimbledon - win - champion - match | 36 | 13_murray_wimbledon_win_champion |
14 | race - prix - hamilton - gold - world | 33 | 14_race_prix_hamilton_gold |
15 | dog - animal - owner - dogs - tiger | 32 | 15_dog_animal_owner_dogs |
16 | syrian - syria - isis - islamic - force | 32 | 16_syrian_syria_isis_islamic |
17 | storm - weather - lava - snow - said | 32 | 17_storm_weather_lava_snow |
18 | chocolate - sale - cent - online - caramel | 32 | 18_chocolate_sale_cent_online |
19 | afghanistan - afghan - pakistan - herat - taliban | 32 | 19_afghanistan_afghan_pakistan_herat |
20 | music - band - halen - song - album | 30 | 20_music_band_halen_song |
21 | beach - island - resort - park - hotel | 29 | 21_beach_island_resort_park |
22 | mcilroy - golf - round - shot - hole | 27 | 22_mcilroy_golf_round_shot |
23 | text - data - nsa - credit - email | 26 | 23_text_data_nsa_credit |
24 | show - film - movie - actor - griffiths | 26 | 24_show_film_movie_actor |
25 | putin - russian - russia - ukraine - moscow | 26 | 25_putin_russian_russia_ukraine |
26 | art - artist - work - painting - pinata | 25 | 26_art_artist_work_painting |
27 | economy - eurozone - european - euro - debt | 24 | 27_economy_eurozone_european_euro |
28 | north - kim - korea - korean - jong | 24 | 28_north_kim_korea_korean |
29 | ebola - virus - liberia - africa - outbreak | 22 | 29_ebola_virus_liberia_africa |
30 | bike - speed - road - driver - cyclist | 22 | 30_bike_speed_road_driver |
31 | car - accident - driver - scene - crash | 20 | 31_car_accident_driver_scene |
32 | price - london - house - home - property | 20 | 32_price_london_house_home |
33 | al - qaeda - yemen - us - yemeni | 20 | 33_al_qaeda_yemen_us |
34 | mrs - police - murder - greaves - mr | 20 | 34_mrs_police_murder_greaves |
35 | per - cent - people - age - average | 19 | 35_per_cent_people_age |
36 | philpott - court - berry - husband - dewani | 18 | 36_philpott_court_berry_husband |
37 | facebook - photo - user - instagram - cuddle | 17 | 37_facebook_photo_user_instagram |
38 | vaccine - meningitis - disease - flu - princeton | 17 | 38_vaccine_meningitis_disease_flu |
39 | bear - lion - gorilla - cub - zoo | 16 | 39_bear_lion_gorilla_cub |
40 | brain - drug - alzheimers - memory - patient | 16 | 40_brain_drug_alzheimers_memory |
41 | prince - royal - queen - duchess - duke | 16 | 41_prince_royal_queen_duchess |
42 | boat - ship - river - vessel - ferry | 15 | 42_boat_ship_river_vessel |
43 | china - chinese - chinas - organ - hong | 14 | 43_china_chinese_chinas_organ |
44 | egypt - election - egyptian - mubarak - protest | 13 | 44_egypt_election_egyptian_mubarak |
45 | mexico - mexican - cartel - mexicos - drug | 13 | 45_mexico_mexican_cartel_mexicos |
46 | cia - assange - snowden - us - interrogation | 13 | 46_cia_assange_snowden_us |
47 | police - hartman - hore - store - maitua | 13 | 47_police_hartman_hore_store |
48 | israeli - israel - palestinian - gaza - hamas | 12 | 48_israeli_israel_palestinian_gaza |
49 | pension - tax - scheme - energy - cent | 12 | 49_pension_tax_scheme_energy |
50 | council - neighbour - village - site - shed | 12 | 50_council_neighbour_village_site |
51 | occupy - protester - york - cosby - mayor | 11 | 51_occupy_protester_york_cosby |
52 | mould - allergic - allergy - reaction - hand | 11 | 52_mould_allergic_allergy_reaction |
53 | boko - haram - nigeria - sudan - isis | 11 | 53_boko_haram_nigeria_sudan |
54 | disaster - building - tsunami - people - quake | 11 | 54_disaster_building_tsunami_people |
55 | castro - sloot - der - ariel - aruba | 11 | 55_castro_sloot_der_ariel |
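To put the frequencies above in proportion, the short sketch below converts a few of them into shares of the 3000 training documents. The counts are copied from the first rows of the table; `share` is a hypothetical helper name used only for illustration.

```python
TOTAL_DOCUMENTS = 3000  # "Number of training documents" from the topic overview

# Frequencies copied from the first rows of the table (-1 is the outlier topic).
TOPIC_COUNTS = {-1: 10, 0: 1070, 1: 320, 2: 142}


def share(count, total=TOTAL_DOCUMENTS):
    """Return the percentage of training documents assigned to a topic."""
    return round(100 * count / total, 1)


shares = {topic_id: share(count) for topic_id, count in TOPIC_COUNTS.items()}

for topic_id, pct in shares.items():
    print(f"topic {topic_id}: {pct}%")
```

Topic 0 (league, player, cup) covers about 35.7% of the training documents, while the outlier topic -1 holds only about 0.3%.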
Training hyperparameters
- calculate_probabilities: True
- language: english
- low_memory: False
- min_topic_size: 10
- n_gram_range: (1, 1)
- nr_topics: None
- seed_topic_list: None
- top_n_words: 10
- verbose: False
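The values above map directly onto keyword arguments of the `BERTopic` constructor. The sketch below shows one way they could be passed; `HYPERPARAMETERS` and `build_model` are hypothetical names. The card does not list the embedding model or the UMAP and HDBSCAN settings, so this is not expected to reproduce the published model exactly.

```python
# Hyperparameters copied from the list above.
HYPERPARAMETERS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_model(**overrides):
    """Create an unfitted BERTopic instance with the listed hyperparameters.

    Assumption: default embedding, UMAP and HDBSCAN components are used,
    because the card does not say which ones the author chose.
    """
    from bertopic import BERTopic  # imported lazily; requires `pip install bertopic`

    params = {**HYPERPARAMETERS, **overrides}
    return BERTopic(**params)
```

A call such as `build_model(verbose=True).fit_transform(docs)` would then train on a list of strings `docs`, which has to be supplied separately.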
Framework versions
- Numpy: 1.22.4
- HDBSCAN: 0.8.33
- UMAP: 0.5.3
- Pandas: 1.5.3
- Scikit-Learn: 1.2.2
- Sentence-transformers: 2.2.2
- Transformers: 4.31.0
- Numba: 0.56.4
- Plotly: 5.13.1
- Python: 3.10.6
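Loading a saved model can be sensitive to library versions, so it may help to compare the local environment with the versions listed above. The following is a minimal sketch using only the standard library; `PINNED` and `version_mismatches` are hypothetical names, and the PyPI distribution names (for example `umap-learn` for UMAP) are assumptions not stated in the card.

```python
from importlib import metadata

# Versions copied from the list above, keyed by assumed PyPI distribution name.
PINNED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def version_mismatches(pinned):
    """Return {package: installed_version} for packages that differ from the pin.

    The installed version is None when the package is not installed.
    """
    mismatches = {}
    for package, wanted in pinned.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = installed
    return mismatches
```

An empty result from `version_mismatches(PINNED)` means every listed package matches the version the model was trained with.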