	---
	license: mit
	language:
	- la
	base_model:
	- latincy/la_core_web_lg
	model-index:
	- name: la_core_web_lg_3.7.4
	  results:
	  - task:
	      type: NER
	    dataset:
	      name: Herodotos_dataset
	      type: Herodotos_dataset
	    metrics:
	    - name: macro F1
	      type: macro F1
	      value: 58
	    source:
	      name: SEFLAG
	      url: https://bibbase.org/network/publication/schulz-deichsler-seflagsystematicevaluationframeworkfornlpmodelsanddatasetsinlatinandancientgreek-2024
	  - task:
	      type: lemmatization
	    dataset:
	      name: UD-Latin
	      type: UD-Latin
	    metrics:
	    - name: accuracy
	      type: accuracy
	      value: 88
	    source:
	      name: SEFLAG
	      url: https://bibbase.org/network/publication/schulz-deichsler-seflagsystematicevaluationframeworkfornlpmodelsanddatasetsinlatinandancientgreek-2024
	---
	la_core_web_lg

	- Person or organization developing model: [Patrick J. Burns; with
	Nora Bernhardt \[ner\], Tim Geelhaar \[tagger, morphologizer, parser,
	ner\], Vincent Koch \[ner\]](https://diyclassics.github.io/)

	- Model date: May 2023

	- Model version: 3.7.4

	- Model type: spaCy

	- Information about training algorithms, parameters, fairness
	constraints or other applied approaches, and features: For the
	training workflow, see pp. 4-5 of "LatinCy: Synthetic Trained
	Pipelines for Latin NLP" (https://arxiv.org/pdf/2305.04365v1)

	- Paper or other resource for more information: *Burns, P.J. 2023.
	"LatinCy: Synthetic Trained Pipelines for Latin NLP." arXiv:2305.04365
	\[cs.CL\]. http://arxiv.org/abs/2305.04365.*

	- License: MIT

	- Where to send questions or comments about the model:
	https://diyclassics.github.io/

	Intended Use

	- Primary intended uses: Morphological analysis, POS tagging,
	lemmatization, parsing, NER

	- Primary intended users: Classical scholars

	- Out-of-scope use cases: unknown
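The uses listed above map onto standard spaCy token and entity attributes. The following is a minimal sketch, not an official usage guide. It assumes the pipeline has been installed as a Python package that spaCy can load under the name `la_core_web_lg`; that package name, the helper names, and the example sentence are assumptions and do not come from this card.

```python
from typing import List, Tuple


def token_rows(doc) -> List[Tuple[str, str, str, str, str]]:
    """Return (text, lemma, POS tag, morphology, dependency label) per token.

    `doc` is expected to be a spaCy Doc (any iterable of objects with these
    attributes works, which keeps the helper easy to test).
    """
    return [(t.text, t.lemma_, t.pos_, str(t.morph), t.dep_) for t in doc]


def entity_rows(doc) -> List[Tuple[str, str]]:
    """Return (entity text, entity label) for each named entity in the Doc."""
    return [(ent.text, ent.label_) for ent in doc.ents]


def demo() -> None:
    """Run the pipeline on one sentence; requires spaCy and the model package."""
    import spacy  # imported here so the helpers above work without spaCy

    # Assumption: the model is installed and loadable under this name.
    nlp = spacy.load("la_core_web_lg")
    doc = nlp("Gallia est omnis divisa in partes tres.")
    for row in token_rows(doc):
        print(row)
    print(entity_rows(doc))
```

Calling `demo()` after installing the model prints one tuple per token followed by the recognized entities. The labels that appear depend on the installed model version.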

	Data, Limitations, and Recommendations

	- Data selection for training: Training data consists of the Latin
	UD treebanks, Wikipedia and OSCAR sentence data, the CC-100 Latin
	dataset, and the Herodotos Project NER dataset

	- Data selection for evaluation: Evaluation was done according to the
	spaCy workflow and is documented in the meta.json file found in the
	repository
	(https://huggingface.co/latincy/la_core_web_lg/blob/main/meta.json)

	- Limitations: unknown
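The evaluation scores mentioned above are stored in the `meta.json` file. As a rough sketch, assuming a local copy of that file and assuming it follows the usual spaCy layout with a top-level `performance` object, the top-level numeric scores could be listed like this. The function names and the file path are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict


def load_meta(path: str) -> dict:
    """Read a spaCy meta.json file from disk and return it as a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def performance_scores(meta: dict) -> Dict[str, float]:
    """Return the top-level numeric entries of meta["performance"].

    Nested entries (for example per-label breakdowns) are skipped, and a
    missing "performance" key yields an empty dict.
    """
    perf = meta.get("performance", {})
    return {
        key: float(value)
        for key, value in perf.items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }


def print_scores(path: str) -> None:
    """Print each score on its own line, sorted by metric name."""
    for key, value in sorted(performance_scores(load_meta(path)).items()):
        print(f"{key}: {value:.4f}")
```

For example, `print_scores("meta.json")` would list the scores from a downloaded copy of the file. When the pipeline is already loaded in spaCy, the same dictionary should also be available as `nlp.meta`.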