arxiv:2207.13005

Hansel: A Chinese Few-Shot and Zero-Shot Entity Linking Benchmark

Published on Jul 26, 2022

Upvote

Authors:

Zhenran Xu ,

Zifei Shan ,

Abstract

Modern Entity Linking (EL) systems entrench a popularity bias, yet there is no dataset focusing on tail and emerging entities in languages other than English. We present Hansel, a new benchmark in Chinese that fills the vacancy of non-English few-shot and zero-shot EL challenges. The test set of Hansel is human annotated and reviewed, created with a novel method for collecting zero-shot EL datasets. It covers 10K diverse documents in news, social media posts and other web articles, with Wikidata as its target Knowledge Base. We demonstrate that the existing state-of-the-art EL system performs poorly on Hansel (R@1 of 36.6% on Few-Shot). We then establish a strong baseline that scores a R@1 of 46.2% on Few-Shot and 76.6% on Zero-Shot on our dataset. We also show that our baseline achieves competitive results on TAC-KBP2015 Chinese Entity Linking task.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2207.13005 in a model README.md to link it from this page.

Datasets citing this paper 1

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2207.13005 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.