HooshvareLab/bert-base-parsbert-armanner-uncased

@@ -17,37 +17,6 @@ All the models (downstream tasks) are uncased and trained with whole word maskin
 This task aims to extract named entities from text, such as names, and label them with appropriate `NER` classes such as locations, organizations, etc. The datasets used for this task contain sentences annotated in `IOB` format. In this format, tokens that are not part of an entity are tagged `"O"`, the `"B"` tag marks the first word of an entity, and the `"I"` tag marks the remaining words of the same entity. Both `"B"` and `"I"` tags are followed by a hyphen (or underscore) and then the entity category. The NER task is therefore a multi-class token classification problem that labels each token of a raw input text. There are two primary datasets used in Persian NER, `ARMAN` and `PEYMA`. In ParsBERT, we prepared NER models for both datasets as well as for a combination of the two.
-### PEYMA
-PEYMA dataset includes 7,145 sentences with a total of 302,530 tokens from which 41,148 tokens are tagged with seven different classes.
-1. Organization
-2. Money
-3. Location
-4. Date
-5. Time
-6. Person
-7. Percent
-|     Label    |   #   |
-|:------------:|:-----:|
-| Organization | 16964 |
-|     Money    |  2037 |
-|   Location   |  8782 |
-|     Date     |  4259 |
-|     Time     |  732  |
-|    Person    |  7675 |
-|    Percent   |  699  |
-**Download**
-You can download the dataset from [here](http://nsurl.org/tasks/task-7-named-entity-recognition-ner-for-farsi/)
----
 ### ARMAN
 The ARMAN dataset holds 7,682 sentences with 250,015 tokens tagged over six different classes.
@@ -80,11 +49,9 @@ You can download the dataset from [here](https://github.com/HaniehP/PersianNER)
 The following table summarizes the F1 score obtained by ParsBERT as compared to other models and architectures.
-| Dataset         | ParsBERT | MorphoBERT |  Beheshti-NER  |  LSTM-CRF  |  Rule-Based CRF  |  BiLSTM-CRF  |
-|:---------------:|:--------:|:----------:|:--------------:|:----------:|:----------------:|:------------:|
-|  ARMAN + PEYMA  |   95.13* |      -     |        -       |      -     |         -        |       -      |
-|  PEYMA          |   98.79* |      -     |      90.59     |      -     |       84.00      |       -      |
-|  ARMAN          |   93.10* |    89.9    |      84.03     |    86.55   |         -        |     77.45    |
 ## How to use :hugs:

 This task aims to extract named entities from text, such as names, and label them with appropriate `NER` classes such as locations, organizations, etc. The datasets used for this task contain sentences annotated in `IOB` format. In this format, tokens that are not part of an entity are tagged `"O"`, the `"B"` tag marks the first word of an entity, and the `"I"` tag marks the remaining words of the same entity. Both `"B"` and `"I"` tags are followed by a hyphen (or underscore) and then the entity category. The NER task is therefore a multi-class token classification problem that labels each token of a raw input text. There are two primary datasets used in Persian NER, `ARMAN` and `PEYMA`. In ParsBERT, we prepared NER models for both datasets as well as for a combination of the two.
 ### ARMAN
 The ARMAN dataset holds 7,682 sentences with 250,015 tokens tagged over six different classes.
 The following table summarizes the F1 score obtained by ParsBERT as compared to other models and architectures.
+| Dataset | ParsBERT | MorphoBERT | Beheshti-NER | LSTM-CRF | Rule-Based CRF | BiLSTM-CRF |
+|---------|----------|------------|--------------|----------|----------------|------------|
+| ARMAN   | 93.10*   | 89.9       | 84.03        | 86.55    | -              | 77.45      |
 ## How to use :hugs:
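
When working with the model's token-level predictions, it can help to collapse `IOB` tags back into whole entities. The following is a minimal sketch of that grouping in plain Python, not the authors' own post-processing. The function name `iob_to_spans`, the example sentence, and the `PER`/`LOC` category names are illustrative assumptions; check the model's label map for the actual tag names.

```python
from typing import List, Tuple


def iob_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB-tagged tokens into (entity_text, category) pairs.

    Tags are expected to look like "O", "B-XXX" / "I-XXX", or
    "B_XXX" / "I_XXX" (hyphen or underscore separator).
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label = None

    def flush() -> None:
        # Close the entity being built, if any, and reset the state.
        nonlocal current_tokens, current_label
        if current_tokens:
            spans.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            continue
        prefix, label = tag[0], tag[2:]  # skip the "-" or "_" separator
        # A "B" tag always starts a new entity; an "I" tag whose category
        # differs from the open entity is treated as a new entity as well.
        if prefix == "B" or label != current_label:
            flush()
            current_label = label
        current_tokens.append(token)

    flush()
    return spans


# Hypothetical example; the category names are placeholders.
tokens = ["Sara", "Ahmadi", "lives", "in", "Tehran"]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC"]
print(iob_to_spans(tokens, tags))
# [('Sara Ahmadi', 'PER'), ('Tehran', 'LOC')]
```

The sketch assumes one tag per word. If the tokenizer splits words into subword pieces, the predictions need to be aligned back to words before grouping.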