guishe/span-marker-bge-base-en-v1.5-fewnerd-fine-super

@@ -1,4 +1,7 @@
 ---
 library_name: span-marker
 tags:
 - span-marker
@@ -6,34 +9,145 @@ tags:
 - ner
 - named-entity-recognition
 - generated_from_span_marker_trainer
 metrics:
 - precision
 - recall
 - f1
-widget: []
 pipeline_tag: token-classification
 ---
-# SpanMarker
-This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model that can be used for Named Entity Recognition.
 ## Model Details
 ### Model Description
 - **Model Type:** SpanMarker
-<!-- - **Encoder:** [Unknown](https://huggingface.co/unknown) -->
 - **Maximum Sequence Length:** 256 tokens
 - **Maximum Entity Length:** 8 words
-<!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
-<!-- - **Language:** Unknown -->
-<!-- - **License:** Unknown -->
 ### Model Sources
 - **Repository:** [SpanMarker on GitHub](https://github.com/tomaarsen/SpanMarkerNER)
 - **Thesis:** [SpanMarker For Named Entity Recognition](https://raw.githubusercontent.com/tomaarsen/SpanMarkerNER/main/thesis.pdf)
 ## Uses
 ### Direct Use for Inference
@@ -42,9 +156,9 @@ This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model that ca
 from span_marker import SpanMarkerModel
 # Download from the 🤗 Hub
-model = SpanMarkerModel.from_pretrained("span_marker_model_id")
 # Run inference
-entities = model.predict("None")
 ```
 ### Downstream Use
@@ -56,7 +170,7 @@ You can finetune this model on your own dataset.
 from span_marker import SpanMarkerModel, Trainer
 # Download from the 🤗 Hub
-model = SpanMarkerModel.from_pretrained("span_marker_model_id")
 # Specify a Dataset with "tokens" and "ner_tag" columns
 dataset = load_dataset("conll2003") # For example CoNLL2003
@@ -68,30 +182,37 @@ trainer = Trainer(
     eval_dataset=dataset["validation"],
 )
 trainer.train()
-trainer.save_model("span_marker_model_id-finetuned")
 ```
 </details>
-<!--
-### Out-of-Scope Use
-*List how the model may foreseeably be misused and address what users ought not to do with the model.*
--->
-<!--
-## Bias, Risks and Limitations
-*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
--->
-<!--
-### Recommendations
-*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
--->
 ## Training Details
 ### Framework Versions
 - Python: 3.10.8
 - SpanMarker: 1.4.0
@@ -111,21 +232,3 @@ trainer.save_model("span_marker_model_id-finetuned")
     url = {https://github.com/tomaarsen/SpanMarkerNER}
 }
 ```
-<!--
-## Glossary
-*Clearly define terms in order to be accessible across audiences.*
--->
-<!--
-## Model Card Authors
-*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
--->
-<!--
-## Model Card Contact
-*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
--->

 ---
+language:
+- en
+license: cc-by-nc-sa-4.0
 library_name: span-marker
 tags:
 - span-marker
 - ner
 - named-entity-recognition
 - generated_from_span_marker_trainer
+datasets:
+- DFKI-SLT/few-nerd
 metrics:
 - precision
 - recall
 - f1
+widget:
+- text: The WPC led the international peace movement in the decade after the Second
+    World War, but its failure to speak out against the Soviet suppression of the
+    1956 Hungarian uprising and the resumption of Soviet nuclear tests in 1961 marginalised
+    it, and in the 1960s it was eclipsed by the newer, non-aligned peace organizations
+    like the Campaign for Nuclear Disarmament.
+- text: Most of the Steven Seagal movie "Under Siege "(co-starring Tommy Lee Jones)
+    was filmed on the, which is docked on Mobile Bay at Battleship Memorial Park and
+    open to the public.
+- text: 'The Central African CFA franc (French: "franc CFA "or simply "franc ", ISO
+    4217 code: XAF) is the currency of six independent states in Central Africa: Cameroon,
+    Central African Republic, Chad, Republic of the Congo, Equatorial Guinea and Gabon.'
+- text: Brenner conducted post-doctoral research at Brandeis University with Gregory
+    Petsko and then took his first academic position at Thomas Jefferson University
+    in 1996, moving to Dartmouth Medical School in 2003, where he served as Associate
+    Director for Basic Sciences at Norris Cotton Cancer Center.
+- text: On Friday, October 27, 2017, the Senate of Spain (Senado) voted 214 to 47
+    to invoke Article 155 of the Spanish Constitution over Catalonia after the Catalan
+    Parliament declared the independence.
 pipeline_tag: token-classification
+base_model: BAAI/bge-base-en-v1.5
+model-index:
+- name: SpanMarker with BAAI/bge-base-en-v1.5 on FewNERD
+  results:
+  - task:
+      type: token-classification
+      name: Named Entity Recognition
+    dataset:
+      name: FewNERD
+      type: DFKI-SLT/few-nerd
+      split: eval
+    metrics:
+    - type: f1
+      value: 0.6726393599802055
+      name: F1
+    - type: precision
+      value: 0.6740082644628099
+      name: Precision
+    - type: recall
+      value: 0.6712760046916476
+      name: Recall
 ---
+# SpanMarker with BAAI/bge-base-en-v1.5 on FewNERD
+This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model trained on the [FewNERD](https://huggingface.co/datasets/DFKI-SLT/few-nerd) dataset that can be used for Named Entity Recognition. This SpanMarker model uses [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) as the underlying encoder.
 ## Model Details
 ### Model Description
 - **Model Type:** SpanMarker
+- **Encoder:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5)
 - **Maximum Sequence Length:** 256 tokens
 - **Maximum Entity Length:** 8 words
+- **Training Dataset:** [FewNERD](https://huggingface.co/datasets/DFKI-SLT/few-nerd)
+- **Language:** en
+- **License:** cc-by-nc-sa-4.0
 ### Model Sources
 - **Repository:** [SpanMarker on GitHub](https://github.com/tomaarsen/SpanMarkerNER)
 - **Thesis:** [SpanMarker For Named Entity Recognition](https://raw.githubusercontent.com/tomaarsen/SpanMarkerNER/main/thesis.pdf)
+### Model Labels
+| Label                                    | Examples                                                                                                 |
+|:-----------------------------------------|:---------------------------------------------------------------------------------------------------------|
+| art-broadcastprogram                     | "Corazones", "The Gale Storm Show : Oh , Susanna", "Street Cents"                                        |
+| art-film                                 | "L'Atlantide", "Bosch", "Shawshank Redemption"                                                           |
+| art-music                                | "Atkinson , Danko and Ford ( with Brockie and Hilton )", "Hollywood Studio Symphony", "Champion Lover"   |
+| art-other                                | "Venus de Milo", "The Today Show", "Aphrodite of Milos"                                                  |
+| art-painting                             | "Cofiwch Dryweryn", "Touit", "Production/Reproduction"                                                   |
+| art-writtenart                           | "Time", "The Seven Year Itch", "Imelda de ' Lambertazzi"                                                 |
+| building-airport                         | "Newark Liberty International Airport", "Luton Airport", "Sheremetyevo International Airport"            |
+| building-hospital                        | "Hokkaido University Hospital", "Memorial Sloan-Kettering Cancer Center", "Yeungnam University Hospital" |
+| building-hotel                           | "Radisson Blu Sea Plaza Hotel", "Flamingo Hotel", "The Standard Hotel"                                   |
+| building-library                         | "Bayerische Staatsbibliothek", "British Library", "Berlin State Library"                                 |
+| building-other                           | "Communiplex", "Alpha Recording Studios", "Henry Ford Museum"                                            |
+| building-restaurant                      | "Carnegie Deli", "Fatburger", "Trumbull"                                                                 |
+| building-sportsfacility                  | "Boston Garden", "Glenn Warner Soccer Facility", "Sports Center"                                         |
+| building-theater                         | "Pittsburgh Civic Light Opera", "National Paris Opera", "Sanders Theatre"                                |
+| event-attack/battle/war/militaryconflict | "Jurist", "Vietnam War", "Easter Offensive"                                                              |
+| event-disaster                           | "1693 Sicily earthquake", "the 1912 North Mount Lyell Disaster", "1990s North Korean famine"             |
+| event-election                           | "Elections to the European Parliament", "March 1898 elections", "1982 Mitcham and Morden by-election"    |
+| event-other                              | "Eastwood Scoring Stage", "Masaryk Democratic Movement", "Union for a Popular Movement"                  |
+| event-protest                            | "French Revolution", "Iranian Constitutional Revolution", "Russian Revolution"                           |
+| event-sportsevent                        | "Stanley Cup", "National Champions", "World Cup"                                                         |
+| location-GPE                             | "Croatian", "the Republic of Croatia", "Mediterranean Basin"                                             |
+| location-bodiesofwater                   | "Norfolk coast", "Atatürk Dam Lake", "Arthur Kill"                                                       |
+| location-island                          | "Staten Island", "Laccadives", "new Samsat district"                                                     |
+| location-mountain                        | "Ruweisat Ridge", "Salamander Glacier", "Miteirya Ridge"                                                 |
+| location-other                           | "Victoria line", "Northern City Line", "Cartuther"                                                       |
+| location-park                            | "Shenandoah National Park", "Gramercy Park", "Painted Desert Community Complex Historic District"        |
+| location-road/railway/highway/transit    | "NJT", "Friern Barnet Road", "Newark-Elizabeth Rail Link"                                                |
+| organization-company                     | "Texas Chicken", "Dixy Chicken", "Church 's Chicken"                                                     |
+| organization-education                   | "Barnard College", "MIT", "Belfast Royal Academy and the Ulster College of Physical Education"           |
+| organization-government/governmentagency | "Diet", "Congregazione dei Nobili", "Supreme Court"                                                      |
+| organization-media/newspaper             | "Clash", "TimeOut Melbourne", "Al Jazeera"                                                               |
+| organization-other                       | "Defence Sector C", "IAEA", "4th Army"                                                                   |
+| organization-politicalparty              | "Al Wafa ' Islamic", "Kenseitō", "Shimpotō"                                                              |
+| organization-religion                    | "Christian", "Jewish", "UPCUSA"                                                                          |
+| organization-showorganization            | "Lizzy", "Mr. Mister", "Bochumer Symphoniker"                                                            |
+| organization-sportsleague                | "First Division", "China League One", "NHL"                                                              |
+| organization-sportsteam                  | "Arsenal", "Tottenham", "Luc Alphand Aventures"                                                          |
+| other-astronomything                     | "Algol", "`` Caput Larvae ''", "Zodiac"                                                                  |
+| other-award                              | "Grand Commander of the Order of the Niger", "Order of the Republic of Guinea and Nigeria", "GCON"       |
+| other-biologything                       | "BAR", "N-terminal lipid", "Amphiphysin"                                                                 |
+| other-chemicalthing                      | "uranium", "carbon dioxide", "sulfur"                                                                    |
+| other-currency                           | "lac crore", "$", "Travancore Rupee"                                                                     |
+| other-disease                            | "French Dysentery Epidemic of 1779", "hypothyroidism", "bladder cancer"                                  |
+| other-educationaldegree                  | "Bachelor", "Master", "BSc ( Hons ) in physics"                                                          |
+| other-god                                | "El", "Fujin", "Raijin"                                                                                  |
+| other-language                           | "English", "Latin", "Breton-speaking"                                                                    |
+| other-law                                | "Thirty Years ' Peace", "Leahy–Smith America Invents Act ( AIA", "United States Freedom Support Act"     |
+| other-livingthing                        | "monkeys", "insects", "patchouli"                                                                        |
+| other-medical                            | "amitriptyline", "Pediatrics", "pediatrician"                                                            |
+| person-actor                             | "Ellaline Terriss", "Edmund Payne", "Tchéky Karyo"                                                       |
+| person-artist/author                     | "Hicks", "George Axelrod", "Gaetano Donizett"                                                            |
+| person-athlete                           | "Jaguar", "Tozawa", "Neville"                                                                            |
+| person-director                          | "Bob Swaim", "Richard Quine", "Frank Darabont"                                                           |
+| person-other                             | "Richard Benson", "Holden", "Campbell"                                                                   |
+| person-politician                        | "Emeric", "William", "Rivière"                                                                           |
+| person-scholar                           | "Stalmine", "Wurdack", "Stedman"                                                                         |
+| person-soldier                           | "Krukenberg", "Joachim Ziegler", "Helmuth Weidling"                                                      |
+| product-airplane                         | "EC135T2 CPDS", "Spey-equipped FGR.2s", "Luton"                                                          |
+| product-car                              | "100EX", "Phantom", "Corvettes - GT1 C6R"                                                                |
+| product-food                             | "red grape", "V. labrusca", "yakiniku"                                                                   |
+| product-game                             | "Splinter Cell", "Hardcore RPG", "Airforce Delta"                                                        |
+| product-other                            | "X11", "PDP-1", "Fairbottom Bobs"                                                                        |
+| product-ship                             | "HMS `` Chinkara ''", "Essex", "Congress"                                                                |
+| product-software                         | "Wikipedia", "AmiPDF", "Apdf"                                                                            |
+| product-train                            | "55022", "Royal Scots Grey", "High Speed Trains"                                                         |
+| product-weapon                           | "ZU-23-2M Wróbel", "ZU-23-2MR Wróbel II", "AR-15 's"                                                     |
 ## Uses
 ### Direct Use for Inference
 ```python
 from span_marker import SpanMarkerModel
 # Download from the 🤗 Hub
+model = SpanMarkerModel.from_pretrained("guishe/span-marker-bge-base-en-v1.5-fewnerd-fine-super")
 # Run inference
+entities = model.predict("Most of the Steven Seagal movie \"Under Siege \"(co-starring Tommy Lee Jones) was filmed on the, which is docked on Mobile Bay at Battleship Memorial Park and open to the public.")
 ```
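The list returned by `model.predict` can be post-processed with plain Python. The sketch below assumes each prediction is a dict with `span`, `label`, and `score` keys, and that labels follow the `coarse-fine` pattern shown in the label table. The helper names, the `min_score` threshold, and the `sample_entities` list are hypothetical and for illustration only; they are not real model output.

```python
from collections import defaultdict


def split_label(label):
    """Split a fine-grained label into (coarse, fine) on the first hyphen."""
    coarse, _, fine = label.partition("-")
    return coarse, fine


def group_by_coarse_type(entities, min_score=0.5):
    """Group predicted spans by coarse type, skipping low-confidence predictions."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] < min_score:
            continue
        coarse, fine = split_label(entity["label"])
        grouped[coarse].append((entity["span"], fine))
    return dict(grouped)


# Hypothetical predictions for illustration; real spans, labels, and scores will differ.
sample_entities = [
    {"span": "Steven Seagal", "label": "person-actor", "score": 0.93},
    {"span": "Under Siege", "label": "art-film", "score": 0.88},
    {"span": "Mobile Bay", "label": "location-bodiesofwater", "score": 0.41},
]

grouped = group_by_coarse_type(sample_entities)
print(grouped)
```

Splitting on the first hyphen only keeps labels such as `event-attack/battle/war/militaryconflict` intact on the fine-grained side.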
 ### Downstream Use
 ```python
 from datasets import load_dataset
 from span_marker import SpanMarkerModel, Trainer
 # Download from the 🤗 Hub
+model = SpanMarkerModel.from_pretrained("guishe/span-marker-bge-base-en-v1.5-fewnerd-fine-super")
 # Specify a Dataset with "tokens" and "ner_tags" columns
 dataset = load_dataset("conll2003") # For example CoNLL2003
 # Initialize a Trainer using the pretrained model & dataset
 trainer = Trainer(
     model=model,
     train_dataset=dataset["train"],
     eval_dataset=dataset["validation"],
 )
 trainer.train()
+trainer.save_model("guishe/span-marker-bge-base-en-v1.5-fewnerd-fine-super-finetuned")
 ```
 </details>
 ## Training Details
+### Training Set Metrics
+| Training set          | Min | Mean    | Max |
+|:----------------------|:----|:--------|:----|
+| Sentence length       | 1   | 24.4945 | 267 |
+| Entities per sentence | 0   | 2.5832  | 88  |
+### Training Hyperparameters
+- learning_rate: 1e-05
+- train_batch_size: 32
+- eval_batch_size: 32
+- seed: 42
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: linear
+- lr_scheduler_warmup_ratio: 0.1
+- num_epochs: 3
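
As a rough illustration of what `lr_scheduler_type: linear` with `lr_scheduler_warmup_ratio: 0.1` implies, the sketch below computes the learning rate at a given step. It assumes the common shape of this schedule: linear warmup from zero to `learning_rate`, then linear decay to zero. The function name is hypothetical, and the total step count is an estimate inferred from the Training Results table (step 3000 at epoch 0.5964, 3 epochs), not a value stated in this card.

```python
import math


def linear_schedule_lr(step, total_steps, base_lr=1e-5, warmup_ratio=0.1):
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Assumption: roughly 3000 / 0.5964 optimizer steps per epoch, over 3 epochs.
TOTAL_STEPS = round(3 * 3000 / 0.5964)

for step in (0, 1500, 3000, 9000, 15000):
    print(step, f"{linear_schedule_lr(step, TOTAL_STEPS):.2e}")
```

Under these assumptions the peak rate of 1e-05 is reached about a tenth of the way through training, before the first logged evaluation at step 3000.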
+### Training Results
+| Epoch  | Step  | Validation Loss | Validation Precision | Validation Recall | Validation F1 | Validation Accuracy |
+|:------:|:-----:|:---------------:|:--------------------:|:-----------------:|:-------------:|:-------------------:|
+| 0.5964 | 3000  | 0.0324          | 0.6263               | 0.5826            | 0.6037        | 0.8981              |
+| 1.1928 | 6000  | 0.0278          | 0.6620               | 0.6499            | 0.6559        | 0.9132              |
+| 1.7893 | 9000  | 0.0264          | 0.6719               | 0.6614            | 0.6666        | 0.9159              |
+| 2.3857 | 12000 | 0.0260          | 0.6724               | 0.6703            | 0.6714        | 0.9174              |
+| 2.9821 | 15000 | 0.0258          | 0.6740               | 0.6713            | 0.6726        | 0.9177              |
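
The F1 column can be sanity-checked against the precision and recall columns. The sketch below assumes F1 is the harmonic mean of the reported precision and recall; the function name is illustrative. Because the table values are rounded to four decimals, the recomputed figures may differ from the reported ones in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the Training Results table.
rows = [
    (0.6263, 0.5826, 0.6037),
    (0.6620, 0.6499, 0.6559),
    (0.6719, 0.6614, 0.6666),
    (0.6724, 0.6703, 0.6714),
    (0.6740, 0.6713, 0.6726),
]

for precision, recall, reported in rows:
    computed = f1_score(precision, recall)
    print(f"P={precision:.4f} R={recall:.4f} computed={computed:.4f} reported={reported:.4f}")
```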
 ### Framework Versions
 - Python: 3.10.8
 - SpanMarker: 1.4.0
     url = {https://github.com/tomaarsen/SpanMarkerNER}
 }
 ```