---
license: apache-2.0
language:
- en
base_model:
- nasa-impact/nasa-smd-ibm-v0.1
pipeline_tag: token-classification
tags:
- astronomy
---
# INDUS - NER-DEAL
Indus-NER-DEAL (nasa-smd-ibm-v0.1_NER_DEAL) is a RoBERTa-based, encoder-only transformer model, domain-adapted for NASA Science Mission Directorate (SMD) applications. It is fine-tuned on scientific journals and articles relevant to NASA SMD, with the aim of improving natural language technologies such as information retrieval and intelligent search.

This fork was fine-tuned on proprietary data from the SciX Digital Library (https://scixplorer.org/, formerly NASA-ADS) to label text with DEAL (Detecting Entities in the Astrophysics Literature) labels (https://ui.adsabs.harvard.edu/WIESP/2022/LabelDefinitions).
## Usage
```python
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Load the fine-tuned token-classification model from the Hugging Face Hub.
INDUS_NER_DEAL = AutoModelForTokenClassification.from_pretrained(
    pretrained_model_name_or_path='adsabs/nasa-smd-ibm-v0.1_NER_DEAL',
    revision=None,  # None uses the default branch; pass a tag or commit hash to pin a version
)

# Load the matching tokenizer, requesting that input text is not lowercased.
INDUS_tokenizer = AutoTokenizer.from_pretrained(
    pretrained_model_name_or_path='adsabs/nasa-smd-ibm-v0.1_NER_DEAL',
    do_lower_case=False,
)
```
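
Once the model and tokenizer load, one way to get entity spans out of raw text is the `transformers` token-classification pipeline. The following is a minimal sketch, not part of the original card: the helper names `tag_text` and `summarize_entities` and the `min_score` threshold of 0.5 are illustrative assumptions, and it assumes the checkpoint's label set follows the DEAL definitions linked above.

```python
from typing import Dict, List, Tuple

# Same checkpoint as in the Usage snippet.
MODEL_ID = "adsabs/nasa-smd-ibm-v0.1_NER_DEAL"


def tag_text(text: str) -> List[Dict]:
    """Run the checkpoint over `text` and return merged entity spans.

    Requires `transformers` with a PyTorch backend; downloads the model on first use.
    """
    # Imported lazily so summarize_entities() can be used without transformers installed.
    from transformers import pipeline

    ner = pipeline(
        "token-classification",
        model=MODEL_ID,
        aggregation_strategy="simple",  # merge sub-word pieces into whole spans
    )
    return ner(text)


def summarize_entities(
    entities: List[Dict], min_score: float = 0.5  # hypothetical cutoff, tune as needed
) -> List[Tuple[str, str, int, int]]:
    """Keep spans scoring at least `min_score`; return (label, text, start, end)."""
    return [
        (e["entity_group"], e["word"].strip(), e["start"], e["end"])
        for e in entities
        if e["score"] >= min_score
    ]


# Example call (downloads the model the first time it runs):
# sentence = "We thank the staff of the Hubble Space Telescope for their support."
# for label, span, start, end in summarize_entities(tag_text(sentence)):
#     print(label, repr(span), start, end)
```

With `aggregation_strategy="simple"`, the pipeline returns one dictionary per merged span (keys `entity_group`, `score`, `word`, `start`, `end`) instead of one per sub-word token, which is usually easier to post-process.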
## Model Details
- **Base Model**: RoBERTa
- **Tokenizer**: Custom
- **Parameters**: 125M
## Training Data
- 5K acknowledgements and full-text fragments from astronomy papers, provided by NASA-SciX, with manually tagged astronomical facilities and other entities of interest (e.g., celestial objects).
- Approximately 1.6M words.
<!-- ## Note -->
<!-- ## Citation -->
<!-- If you find this work useful, please cite using the following bibtex citation: -->
<!-- ## Disclaimer -->