|
# Logion: Machine Learning for Greek Philology |
|
|
|
A BERT model trained on the largest set of Ancient Greek texts to date.
|
Read the ALP paper [here](https://aclanthology.org/2023.alp-1.20/).
|
|
|
Trained with a WordPiece tokenizer (vocabulary size 50,000) on a corpus of over 70 million words of pre-modern Greek.
|
|
|
## How to use |
|
|
|
Requirements: |
|
|
|
```bash
|
pip install transformers |
|
``` |
|
|
|
Load the model and tokenizer directly from the HuggingFace Model Hub: |
|
|
|
|
|
```python |
|
from transformers import BertTokenizer, BertForMaskedLM |
|
tokenizer = BertTokenizer.from_pretrained("princeton-logion/LOGION-50k_wordpiece") |
|
model = BertForMaskedLM.from_pretrained("princeton-logion/LOGION-50k_wordpiece") |
|
``` |
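Once loaded, the model can be used for masked-token prediction. Below is a minimal sketch of filling in a `[MASK]` token; the Greek example sentence is illustrative only and not taken from the paper:

```python
import torch

# Hypothetical example sentence with one masked token.
text = "καὶ τὸν [MASK] ἔφη εἶναι"
inputs = tokenizer(text, return_tensors="pt")

# Run the masked language model without tracking gradients.
with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and take the highest-scoring prediction.
mask_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```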
|
|
|
|
|
## Cite |
|
|
|
If you use this model in your research, please cite the paper: |
|
|
|
``` |
|
@inproceedings{cowen-breen-etal-2023-logion, |
|
title = "Logion: Machine-Learning Based Detection and Correction of Textual Errors in {G}reek Philology", |
|
author = "Cowen-Breen, Charlie and |
|
Brooks, Creston and |
|
Graziosi, Barbara and |
|
Haubold, Johannes", |
|
booktitle = "Proceedings of the Ancient Language Processing Workshop", |
|
year = "2023", |
|
url = "https://aclanthology.org/2023.alp-1.20", |
|
pages = "170--178", |
|
} |
|
``` |
|
|