--- library_name: span-marker tags: - span-marker - token-classification - ner - named-entity-recognition - generated_from_span_marker_trainer metrics: - precision - recall - f1 widget: [] pipeline_tag: token-classification --- # SpanMarker This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model that can be used for Named Entity Recognition. ## Model Details ### Model Description - **Model Type:** SpanMarker - **Maximum Sequence Length:** 512 tokens - **Maximum Entity Length:** 150 words ### Model Sources - **Repository:** [SpanMarker on GitHub](https://github.com/tomaarsen/SpanMarkerNER) - **Thesis:** [SpanMarker For Named Entity Recognition](https://raw.githubusercontent.com/tomaarsen/SpanMarkerNER/main/thesis.pdf) ## Uses ### Direct Use for Inference ```python from span_marker import SpanMarkerModel # Download from the 🤗 Hub model = SpanMarkerModel.from_pretrained("span_marker_model_id") # Run inference entities = model.predict("None") ``` ### Downstream Use You can finetune this model on your own dataset.
Click to expand ```python from span_marker import SpanMarkerModel, Trainer # Download from the 🤗 Hub model = SpanMarkerModel.from_pretrained("span_marker_model_id") # Specify a Dataset with "tokens" and "ner_tag" columns dataset = load_dataset("conll2003") # For example CoNLL2003 # Initialize a Trainer using the pretrained model & dataset trainer = Trainer( model=model, train_dataset=dataset["train"], eval_dataset=dataset["validation"], ) trainer.train() trainer.save_model("span_marker_model_id-finetuned") ```
## Training Details ### Framework Versions - Python: 3.10.12 - SpanMarker: 1.5.0 - Transformers: 4.35.2 - PyTorch: 2.1.0+cu121 - Datasets: 2.16.1 - Tokenizers: 0.15.1 ## Citation ### BibTeX ``` @software{Aarsen_SpanMarker, author = {Aarsen, Tom}, license = {Apache-2.0}, title = {{SpanMarker for Named Entity Recognition}}, url = {https://github.com/tomaarsen/SpanMarkerNER} } ```