wadood committed on
Commit
0c2f99e
•
1 Parent(s): 28687f6

removed unreadable pie chart

Files changed (2)
  1. app.py +3 -3
  2. src/about.py +1 -3
app.py CHANGED
@@ -13,7 +13,7 @@ from src.about import (
  LLM_BENCHMARKS_TEXT_1,
  EVALUATION_EXAMPLE_IMG,
  LLM_BENCHMARKS_TEXT_2,
- ENTITY_DISTRIBUTION_IMG,
+ # ENTITY_DISTRIBUTION_IMG,
  LLM_BENCHMARKS_TEXT_3,
  TITLE,
  LOGO
@@ -83,7 +83,7 @@ token_based_types_leaderboard_df = token_based_types_original_df.copy()
 
 
  def update_df(evaluation_metric, shown_columns, subset="datasets"):
- print(evaluation_metric)
+ # print(evaluation_metric)
 
  if subset == "datasets":
  match evaluation_metric:
@@ -506,7 +506,7 @@ with demo:
  gr.Markdown(LLM_BENCHMARKS_TEXT_1, elem_classes="markdown-text")
  gr.HTML(EVALUATION_EXAMPLE_IMG, elem_classes="logo")
  gr.Markdown(LLM_BENCHMARKS_TEXT_2, elem_classes="markdown-text")
- gr.HTML(ENTITY_DISTRIBUTION_IMG, elem_classes="logo")
+ # gr.HTML(ENTITY_DISTRIBUTION_IMG, elem_classes="logo")
  gr.Markdown(LLM_BENCHMARKS_TEXT_3, elem_classes="markdown-text")
 
  with gr.TabItem("🚀 Submit here! ", elem_id="llm-benchmark-tab-table", id=3):
src/about.py CHANGED
@@ -184,8 +184,6 @@ The above datasets are modified to cater to the clinical setting. For this, the
  | Gene | 1180 |
  | Gene Variant | 241 |
 
-
- The pie chart on the left below the distribution of clinical entities and their original dataset types.
  """
 
  ENTITY_DISTRIBUTION_IMG = """<img src="file/assets/entity_distribution.png" alt="Clinical X HF" width="750" height="500">"""
@@ -214,7 +212,7 @@ He had been diagnosed with <span class="disease" >osteoarthritis of the knees</s
  After the tagged output is generated, it is parsed to extract the tagged entities. The parsed data are then compared against the gold standard labels, and performance metrics are computed as above. This evaluation method ensures a consistent and objective assessment of decoder-only LLM's performance in NER tasks, despite the differences in their architecture compared to encoder models.
 
  # Reproducibility
- To reproduce our results, follow the steps detailed [here](https://github.com/WadoodAbdul/medics_ner/blob/master/docs/reproducing_results.md)
+ To reproduce our results, follow the steps detailed [here](https://github.com/WadoodAbdul/clinical_ner_benchmark/blob/master/docs/reproducing_results.md)
 
  # Disclaimer and Advisory
  The Leaderboard is maintained by the authors and affiliated entity as part of our ongoing contribution to open research in the field of NLP in healthcare. The leaderboard is intended for academic and exploratory purposes only. The language models evaluated on this platform (to the best knowledge of the authors) have not been approved for clinical use, and their performance should not be interpreted as clinically validated or suitable for real-world medical applications.
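
Note on the evaluation text shown as context in the last src/about.py hunk: it says the tagged output is parsed to extract entities, which are then compared against the gold labels. The snippet below is only a rough sketch of that idea, not the benchmark's actual code. The regex, the helper names `extract_entities` and `precision_recall_f1`, the sample sentence beyond the fragment visible in the hunk header, and the exact-match scoring over (label, text) pairs are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed tag format, modeled on the <span class="disease" >...</span>
# fragment visible in the hunk header; the real parser may differ.
SPAN_RE = re.compile(r'<span\s+class="([^"]+)"\s*>(.*?)</span>', re.DOTALL)


def extract_entities(tagged_text):
    """Return (label, surface text) pairs found in span-tagged output."""
    return [(label, text.strip()) for label, text in SPAN_RE.findall(tagged_text)]


def precision_recall_f1(predicted, gold):
    """Exact-match scores over multisets of (label, text) pairs.

    This scoring rule is an assumption, not the leaderboard's metric.
    """
    pred_counts, gold_counts = Counter(predicted), Counter(gold)
    # Multiset intersection counts each matching pair at most once per side.
    true_positives = sum((pred_counts & gold_counts).values())
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative input only: the drug span and the gold labels are invented.
output = (
    'He had been diagnosed with <span class="disease" >osteoarthritis of the '
    'knees</span> and took <span class="drug">ibuprofen</span>.'
)
gold = [("disease", "osteoarthritis of the knees"), ("drug", "aspirin")]

entities = extract_entities(output)
scores = precision_recall_f1(entities, gold)
```

Here one of the two predicted entities matches the gold list, so precision, recall, and F1 all come out to 0.5. A span-offset or token-level comparison would need character positions, which this sketch does not track.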