A Turkish NER dataset built from Wikipedia sentences: 20,000 sentences were sampled and re-annotated from the Kuzgunlar NER dataset.

Data split:

- 18,000 train
- 1,000 test
- 1,000 dev

Labels:

• CARDINAL • DATE • EVENT • FAC • GPE • LANGUAGE • LAW • LOC • MONEY • NORP • ORDINAL • ORG • PERCENT • PERSON • PRODUCT • QUANTITY • TIME • TITLE • WORK_OF_ART
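For token classification, the 19 entity types above are typically expanded into BIO tags. The sketch below shows one plausible label map; the tag order here is an assumption, and the authoritative `id2label` mapping ships in the model's `config.json`.

```python
# Expand the card's 19 entity types into BIO token tags.
# NOTE: the tag ordering is illustrative; the released model's
# config.json defines the actual id2label mapping.
ENTITY_TYPES = [
    "CARDINAL", "DATE", "EVENT", "FAC", "GPE", "LANGUAGE", "LAW",
    "LOC", "MONEY", "NORP", "ORDINAL", "ORG", "PERCENT", "PERSON",
    "PRODUCT", "QUANTITY", "TIME", "TITLE", "WORK_OF_ART",
]

# "O" for non-entity tokens, plus a B-/I- prefixed tag per type.
labels = ["O"] + [f"{prefix}-{t}" for t in ENTITY_TYPES for prefix in ("B", "I")]
id2label = dict(enumerate(labels))
label2id = {label: i for i, label in enumerate(labels)}

print(len(labels))  # 39 tags: 1 "O" + 19 types x 2 prefixes
```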

Example:

*(annotated example sentence shown as a screenshot in the original card)*

Model Evaluation

The model was validated on the test split. During evaluation:

• The model was put into evaluation mode.

• Loss and accuracy were calculated.

• A classification report was generated with the seqeval library, detailing the model's performance for each label.

Results and Performance

The accuracy and loss values obtained during training and validation are reported, along with a classification report giving the F1 score, precision, and recall for each label. The model reached high accuracy on the Turkish NER task.

These results demonstrate the effectiveness of the BERT model for named entity recognition in Turkish. The methods used during training and evaluation improved the model's overall performance and helped overcome the challenges specific to the language.
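For readers who want to try the checkpoint directly, a minimal usage sketch with the Hugging Face `transformers` NER pipeline (assumes network access to download the model; the example sentence is illustrative):

```python
# Hedged usage sketch: load the released checkpoint via the
# standard transformers token-classification pipeline.
from transformers import pipeline

ner = pipeline(
    "ner",
    model="girayyagmur/bert-base-turkish-ner-cased",
    aggregation_strategy="simple",  # merge word pieces into entity spans
)

results = ner("Mustafa Kemal Atatürk 1919'da Samsun'a çıktı.")
for entity in results:
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```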

Model size: 110M parameters (Safetensors, F32).
