metadata

tags:
  - generated_from_keras_callback
model-index:
  - name: GeoBERT
    results: []

GeoBERT

GeoBERT is a named entity recognition (NER) model fine-tuned from SciBERT on the Geoscientific Corpus dataset. It was trained on the Labeled Geoscientific Corpus dataset (~1 million sentences).

Intended uses

The model is intended to identify four main semantic types (categories) related to Geosciences:

GeoPetro for any entity that belongs to the general terms of Geosciences
GeoMeth for tools or methods associated with Geosciences
GeoLoc for geological locations
GeoTime for geological time scale entities

Training hyperparameters

The following hyperparameters were used during training:

optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 2e-05, 'decay_steps': 14000, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}
training_precision: mixed_float16
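
As a rough illustration of the learning-rate schedule listed above, the sketch below reimplements a non-cycling polynomial decay in plain Python using the logged values (initial rate 2e-05, end rate 0.0, 14000 decay steps, power 1.0). The function name `polynomial_decay` is illustrative only, and the formula is assumed to match the documented behaviour of Keras `PolynomialDecay` with `cycle=False`. With power 1.0 this reduces to a linear ramp from 2e-05 down to zero.

```python
def polynomial_decay(step, initial_lr=2e-5, end_lr=0.0, decay_steps=14000, power=1.0):
    """Learning rate at `step` for a non-cycling polynomial decay schedule.

    Defaults are copied from the optimizer config in this card; the function
    itself is an illustrative sketch, not the training code.
    """
    # Without cycling, the rate stays at end_lr once decay_steps is reached.
    step = min(step, decay_steps)
    remaining = 1.0 - step / decay_steps
    return (initial_lr - end_lr) * remaining ** power + end_lr


# Print the rate at a few points along the schedule.
for step in (0, 3500, 7000, 14000, 20000):
    print(step, polynomial_decay(step))
```

The rate halves at step 7000 and stays at the end value for any step past 14000.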

Framework versions

Transformers 4.22.1
TensorFlow 2.10.0
Datasets 2.4.0
Tokenizers 0.12.1

Model performance (metric: seqeval)

entity	precision	recall	f1
GeoLoc	0.9727	0.9591	0.9658
GeoMeth	0.9433	0.9447	0.9445
GeoPetro	0.9767	0.9745	0.9756
GeoTime	0.9695	0.9666	0.9680
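
The f1 column is the harmonic mean of precision and recall, computed per entity type. The short sketch below recomputes it from the rounded precision and recall values in the table. The helper name `f1_score` is illustrative, and because the inputs are already rounded to four decimals, the recomputed values should only be expected to agree with the reported ones to within about 0.001.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f1) copied from the table above.
reported = {
    "GeoLoc": (0.9727, 0.9591, 0.9658),
    "GeoMeth": (0.9433, 0.9447, 0.9445),
    "GeoPetro": (0.9767, 0.9745, 0.9756),
    "GeoTime": (0.9695, 0.9666, 0.9680),
}

for entity, (p, r, f1) in reported.items():
    print(f"{entity}: recomputed {f1_score(p, r):.4f}, reported {f1:.4f}")
```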

How to use GeoBERT with Hugging Face

Load GeoBERT and its sub-word tokenizer:

from transformers import AutoTokenizer, AutoModelForTokenClassification

# Download the tokenizer and the fine-tuned token-classification model from the Hub
tokenizer = AutoTokenizer.from_pretrained("botryan96/GeoBERT")
model = AutoModelForTokenClassification.from_pretrained("botryan96/GeoBERT")
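
Once the model and tokenizer are loaded, one possible way to run inference is the Transformers `pipeline` API with `aggregation_strategy="simple"`, which merges sub-word pieces into whole entity spans. The sketch below assumes the checkpoint loads exactly as in the snippet above. The function names `format_entities` and `run_geobert_ner` are hypothetical helpers, and the example sentence in the usage note is made up for illustration. No output is shown because the predicted labels depend on the model.

```python
def format_entities(results):
    """Format aggregated token-classification pipeline output as readable lines.

    Each item is expected to be a dict with the keys 'entity_group', 'word'
    and 'score', as returned when aggregation_strategy is set.
    """
    lines = []
    for item in results:
        lines.append(
            f"{item['entity_group']:<10} {item['word']!r} (score={item['score']:.3f})"
        )
    return lines


def run_geobert_ner(text):
    """Load GeoBERT and print the entities found in `text` (hypothetical helper)."""
    # Imported lazily so format_entities can be used without transformers installed.
    from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained("botryan96/GeoBERT")
    model = AutoModelForTokenClassification.from_pretrained("botryan96/GeoBERT")

    # "simple" aggregation groups consecutive sub-word tokens with the same label.
    ner = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

    for line in format_entities(ner(text)):
        print(line)
```

For example, `run_geobert_ner("Jurassic sandstones in the North Sea were mapped using seismic reflection.")` downloads the checkpoint on first use and prints one line per detected entity.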