---
language:
- yue
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:129371
- loss:CachedGISTEmbedLoss
base_model: hon9kon9ize/bert-large-cantonese-sts
widget:
- source_sentence: 'query: is ampulla of vater part of the pancreas'
sentences:
- 'document: Ampulla of Vater The ampulla of Vater, also known as the hepatopancreatic
ampulla or the hepatopancreatic duct, is formed by the union of the pancreatic
duct and the common bile duct. The ampulla is specifically located at the major
duodenal papilla.'
- 'document: 抗凝加化疗;化疗'
- 'document: Daylight saving time in Australia Daylight saving was first used in
Australia during World War I, and was applied in all states. It was used again
during the Second World War. A drought in Tasmania in 1967 led to the reintroduction
of daylight saving in that state during the summer, and this was repeated every
summer since then. In 1971, New South Wales, Victoria,[16] Queensland, South Australia,
and the Australian Capital Territory followed Tasmania by observing daylight saving.
Western Australia and the Northern Territory did not. Queensland abandoned daylight
saving time in 1972.[17]'
- source_sentence: 'query: henry''s law states that the solubility of a gas in a liquid'
sentences:
- 'document: Henry''s law In chemistry, Henry''s law is a gas law that states that
the amount of dissolved gas is proportional to its partial pressure in the gas
phase. The proportionality factor is called the Henry''s law constant. It was
formulated by the English chemist William Henry, who studied the topic in the
early 19th century. In his publication about the quantity of gases absorbed by
water,[1] he described the results of his experiments:'
- 'document: Saint Stephen''s Day Saint Stephen''s Day, or the Feast of Saint Stephen,
is a Christian saint''s day to commemorate Saint Stephen, the first Christian
martyr or protomartyr, celebrated on 26 December in the Latin Church and 27 December
in Eastern Christianity. The Eastern Orthodox Church adheres to the Julian calendar
and mark Saint Stephen''s Day on 27 December according to that calendar, which
places it on 9 January of the Gregorian calendar used in secular contexts. In
Latin Christian denominations, Saint Stephen''s Day marks the second day of Christmastide.[1][2]'
- 'document: American Revolutionary War The American Revolutionary War (1775–1783),
also known as the American War of Independence,[40] was a global war that began
as a conflict between Great Britain and its Thirteen Colonies which declared independence
as the United States of America.[N 1]'
- source_sentence: 'query: what is the plot of american horror story hotel'
sentences:
- 'document: American Horror Story: Hotel The plot centers around the enigmatic
Hotel Cortez in Los Angeles, California, that catches the eye of an intrepid homicide
detective (Bentley). The Cortez is host to the strange and bizarre, spearheaded
by its owner, The Countess (Gaga), who is a bloodsucking fashionista. The hotel
is loosely based on an actual hotel built in 1893 by H. H. Holmes in Chicago,
Il. for the 1893 World''s Columbian Exposition. It became known as the ''Murder
Castle'' as it was built for Holmes to torture, murder, and dispose of evidence
just as is the Cortez. This season features two murderous threats in the form
of the Ten Commandments Killer, a serial offender who selects his victims in accordance
with biblical teachings, and "the Addiction Demon", who roams the hotel armed
with a drill bit dildo.'
- 'document: Book of Job Rabbinic tradition ascribes the authorship of Job to Moses,
but scholars generally agree that it was written between the 7th and 4th centuries
BCE, with the 6th century BCE as the most likely period for various reasons.[17]
The anonymous author was almost certainly an Israelite, although he has set his
story outside Israel, in southern Edom or northern Arabia, and makes allusion
to places as far apart as Mesopotamia and Egypt.[18] According to the 6th-century
BCE prophet Ezekiel, Job was a man of antiquity renowned for his righteousness,[19]
and the book''s author has chosen this legendary hero for his parable.[20]'
- 'document: Galešnjak Galešnjak (also called Island of Love, Lover''s Island, Otok
za zaljubljene) is located in the Pašman channel of the Adriatic, between the
islands of Pašman and the town of Turanj on mainland Croatia. It is one of the
world''s few naturally occurring heart-shaped objects such as the Heart Reef in
the Whitsundays.'
- source_sentence: 'query: what historical event inspired wollstonecraft''s book a
vindication of the rights of woman'
sentences:
- 'document: 銀河嘅獨特外形自古以嚟就引起人類嘅幻想。例如中國就有牛郎織女嘅故事,相傳身為人類嘅牛郎同身為仙女嘅織女相遇並且墮入愛河,但因為人仙相戀犯天規而俾天界阻止,王母娘娘變條銀河出嚟分隔佢哋,限佢哋淨係喺每年嘅農曆七月初七先可以喺條鵲橋上面相會-呢個傳說就係傳統節日七姐誕嘅起源。'
- 'document: Rock Star (2001 film) The singing voice for Wahlberg''s character was
provided by Steelheart frontman Miljenko Matijevic for the Steel Dragon Songs,
the final number was dubbed by Brian Vander Ark. Jeff Scott Soto (of Talisman,
Yngwie Malmsteen, Soul SirkUS, and Journey) provided the voice of the singer Wahlberg''s
character replaces. Kennedy is the only actor whose actual voice is used.[citation
needed]. Ralph Saenz (Steel Panther) also appears briefly, as the singer auditioning
ahead of Chris at the studio.'
- 'document: A Vindication of the Rights of Woman Wollstonecraft was prompted to
write the Rights of Woman after reading Charles Maurice de Talleyrand-Périgord''s
1791 report to the French National Assembly, which stated that women should only
receive a domestic education; she used her commentary on this specific event to
launch a broad attack against sexual double standards and to indict men for encouraging
women to indulge in excessive emotion. Wollstonecraft wrote the Rights of Woman
hurriedly to respond directly to ongoing events; she intended to write a more
thoughtful second volume but died before completing it.'
- source_sentence: 'query: when did england change from fahrenheit to celsius'
sentences:
- 'document: Periodic table Importantly, the organization of the periodic table
can be utilized to derive relationships between various element properties, but
also predicted chemical properties and behaviours of undiscovered or newly synthesized
elements. Russian chemist Dmitri Mendeleev was first to publish a recognizable
periodic table in 1869, developed mainly to illustrate periodic trends of the
then-known elements. He also predicted some properties of unidentified elements
that were expected to fill gaps within this table. Most of his forecasts proved
to be correct. Mendeleev''s idea has been slowly expanded and refined with the
discovery or synthesis of further new elements and by developing new theoretical
models to explain chemical behaviour. The modern periodic table now provides a
useful framework for analyzing chemical reactions, and continues to be widely
adopted in chemistry, nuclear physics and other sciences.'
- 'document: How to Train Your Dragon (franchise) The How to Train Your Dragon franchise
from DreamWorks Animation consists of two feature films How to Train Your Dragon
(2010) and How to Train Your Dragon 2 (2014), with a third feature film, How to
Train Your Dragon: The Hidden World, set for a 2019 release. The franchise is
inspired by the British book series of the same name by Cressida Cowell. The franchise
also consists of four short films: Legend of the Boneknapper Dragon (2010), Book
of Dragons (2011), Gift of the Night Fury (2011) and Dawn of the Dragon Racers
(2014). A television series following the events of the first film, Dragons: Riders
of Berk, began airing on Cartoon Network in September 2012. Its second season
was renamed Dragons: Defenders of Berk. Set several years later, and as a more
immediate prequel to the second film, a new television series, titled Dragons:
Race to the Edge, aired on Netflix in June 2015.[1] The second season of the show
was added to Netflix in January 2016 and a third season in June 2016. A fourth
season aired on Netflix in February 2017, a fifth season in August 2017, and a
sixth and final season on February 16, 2018.'
- 'document: Metrication in the United Kingdom Adopting the metric system was discussed
in Parliament as early as 1818 and some industries and even some government agencies
had metricated, or were in the process of metricating by the mid 1960s. A formal
government policy to support metrication was agreed by 1965. This policy, initiated
in response to requests from industry, was to support voluntary metrication, with
costs picked up where they fell. In 1969 the government created the Metrication
Board as a quango to promote and coordinate metrication. In 1978, after some carpet
retailers reverted to pricing by the square yard rather than the square metre,
government policy shifted, and they started issuing orders making metrication
mandatory in certain sectors. In 1980 government policy shifted again to prefer
voluntary metrication, and the Metrication Board was abolished. By the time the
Metrication Board was wound up, all the economic sectors that fell within its
remit except road signage and parts of the retail trade sector had metricated.'
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: Bert base fine-tuned with Cantonese and English mixed STS dataset
results:
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: NanoClimateFEVER
type: NanoClimateFEVER
metrics:
- type: cosine_accuracy@1
value: 0.06
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.2
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.22
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.26
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.06
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.06666666666666667
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.05200000000000001
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.032
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.035
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.105
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.12666666666666665
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.14400000000000002
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.10738523976006756
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.12305555555555553
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.08386746046821102
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: NanoDBPedia
type: NanoDBPedia
metrics:
- type: cosine_accuracy@1
value: 0.1
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.26
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.44
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.52
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.1
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.12666666666666665
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.15200000000000002
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.154
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.005776685612719247
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.025711996601987995
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.04879480020144454
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.08175565470928514
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.1564753058784049
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.22302380952380954
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.08481993410477483
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: NanoFEVER
type: NanoFEVER
metrics:
- type: cosine_accuracy@1
value: 0.06
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.1
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.1
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.12
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.06
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.03333333333333333
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.02
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.012000000000000002
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.05
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.09
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.09
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.11
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.07804424038166692
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.07533333333333334
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.07658274436198606
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: NanoFiQA2018
type: NanoFiQA2018
metrics:
- type: cosine_accuracy@1
value: 0.12
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.22
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.26
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.36
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.12
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.07999999999999999
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.064
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.046000000000000006
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.07085714285714287
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.13621428571428573
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.14993650793650792
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.21193650793650792
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.15989208858068493
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.18794444444444444
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.1278932041519149
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: NanoHotpotQA
type: NanoHotpotQA
metrics:
- type: cosine_accuracy@1
value: 0.18
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.38
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.4
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.44
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.18
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.13333333333333333
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.084
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.05200000000000001
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.09
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.2
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.21
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.26
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.21524243911000313
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.2793333333333333
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.16949818775802034
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: NanoMSMARCO
type: NanoMSMARCO
metrics:
- type: cosine_accuracy@1
value: 0.08
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.16
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.2
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.24
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.08
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.05333333333333333
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.04
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.024000000000000004
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.08
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.16
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.2
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.24
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.155021218726892
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.12816666666666665
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.14387227309213746
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: NanoNFCorpus
type: NanoNFCorpus
metrics:
- type: cosine_accuracy@1
value: 0.1
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.1
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.12
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.18
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.1
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.06
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.05600000000000001
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.042
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.0023944899556066555
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.004511202133435534
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.005335271278326478
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.006887081773042016
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.0513758550014842
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.11271428571428571
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.011178329865269043
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: NanoNQ
type: NanoNQ
metrics:
- type: cosine_accuracy@1
value: 0.12
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.26
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.38
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.44
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.12
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.08666666666666666
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.08
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.04800000000000001
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.11
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.24
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.37
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.43
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.26691470842049086
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.21954761904761902
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.22127704921258506
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: NanoQuoraRetrieval
type: NanoQuoraRetrieval
metrics:
- type: cosine_accuracy@1
value: 0.56
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.66
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.68
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.8
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.56
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.25333333333333335
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.16
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.092
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.49
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.6073333333333334
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.634
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.7406666666666666
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.6315714749064664
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.6265555555555555
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.6007758177607536
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: NanoSCIDOCS
type: NanoSCIDOCS
metrics:
- type: cosine_accuracy@1
value: 0.06
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.12
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.14
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.22
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.06
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.05333333333333333
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.036000000000000004
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.026000000000000002
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.015666666666666666
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.03666666666666667
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.04066666666666666
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.05666666666666667
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.05444580189319236
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.10085714285714287
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.03825732082321992
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: NanoArguAna
type: NanoArguAna
metrics:
- type: cosine_accuracy@1
value: 0.12
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.34
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.52
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.64
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.12
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.11333333333333333
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.10400000000000001
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.064
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.12
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.34
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.52
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.64
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.36676045848370026
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.2815
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.28967419376346565
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: NanoSciFact
type: NanoSciFact
metrics:
- type: cosine_accuracy@1
value: 0.18
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.22
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.32
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.36
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.18
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.07999999999999999
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.068
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.04
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.165
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.21
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.3
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.345
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.24854556538285397
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.22416666666666665
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.23077037853195492
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: NanoTouche2020
type: NanoTouche2020
metrics:
- type: cosine_accuracy@1
value: 0.3469387755102041
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.7142857142857143
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.7959183673469388
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.9387755102040817
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.3469387755102041
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.32653061224489793
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.30612244897959184
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.2714285714285714
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.01725883684742171
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.06000832753846316
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.10128699807186763
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.17048580946181527
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.29344650277463163
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.5436912860382248
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.18279928418932134
name: Cosine Map@100
- task:
type: nano-beir
name: Nano BEIR
dataset:
name: NanoBEIR mean
type: NanoBEIR_mean
metrics:
- type: cosine_accuracy@1
value: 0.1605337519623234
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.2872527472527473
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.35199372056514916
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.42452119309262165
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.1605337519623234
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.11281004709576138
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.0940094191522763
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.0694945054945055
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.09630414014919672
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.17041890861447484
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.21512976237088305
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.2644152605549218
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.21424006917696453
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.2404530537489721
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.17394355216027804
name: Cosine Map@100
---
# Bert base fine-tuned with Cantonese and English mixed STS dataset
This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [hon9kon9ize/bert-large-cantonese-sts](https://huggingface.co/hon9kon9ize/bert-large-cantonese-sts). It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [hon9kon9ize/bert-large-cantonese-sts](https://huggingface.co/hon9kon9ize/bert-large-cantonese-sts)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity
- **Language:** yue
- **License:** apache-2.0
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
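The pooling configuration above enables only `pooling_mode_mean_tokens`, so the sentence vector is the average of the token vectors. The following is a minimal sketch of masked mean pooling in plain Python, assuming padding positions are marked with `0` in an attention mask. The function name `mean_pool` and the toy 2-dimensional vectors are illustrative, not taken from the library, and the real model produces 1024-dimensional vectors.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask value is non-zero (illustrative helper)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, x in enumerate(vec):
                total[i] += x
    # Guard against an all-padding input to avoid division by zero.
    return [t / max(count, 1) for t in total]


# Toy input: three token vectors, the last one is padding and is ignored.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
sentence_vec = mean_pool(tokens, mask)
print(sentence_vec)
```

Only the two unmasked tokens contribute, so the padding vector does not shift the result.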
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("hon9kon9ize/yue-embed")
# Run inference
sentences = [
'query: when did england change from fahrenheit to celsius',
'document: Metrication in the United Kingdom Adopting the metric system was discussed in Parliament as early as 1818 and some industries and even some government agencies had metricated, or were in the process of metricating by the mid 1960s. A formal government policy to support metrication was agreed by 1965. This policy, initiated in response to requests from industry, was to support voluntary metrication, with costs picked up where they fell. In 1969 the government created the Metrication Board as a quango to promote and coordinate metrication. In 1978, after some carpet retailers reverted to pricing by the square yard rather than the square metre, government policy shifted, and they started issuing orders making metrication mandatory in certain sectors. In 1980 government policy shifted again to prefer voluntary metrication, and the Metrication Board was abolished. By the time the Metrication Board was wound up, all the economic sectors that fell within its remit except road signage and parts of the retail trade sector had metricated.',
"document: Periodic table Importantly, the organization of the periodic table can be utilized to derive relationships between various element properties, but also predicted chemical properties and behaviours of undiscovered or newly synthesized elements. Russian chemist Dmitri Mendeleev was first to publish a recognizable periodic table in 1869, developed mainly to illustrate periodic trends of the then-known elements. He also predicted some properties of unidentified elements that were expected to fill gaps within this table. Most of his forecasts proved to be correct. Mendeleev's idea has been slowly expanded and refined with the discovery or synthesis of further new elements and by developing new theoretical models to explain chemical behaviour. The modern periodic table now provides a useful framework for analyzing chemical reactions, and continues to be widely adopted in chemistry, nuclear physics and other sciences.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
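Every example in this card (widget inputs, the snippet above, and the training sample) prefixes queries with `query: ` and passages with `document: `. The sketch below formats inputs that way and ranks passages by cosine similarity, the similarity function listed under Model Description. It rests on two assumptions: that the same prefixes should be applied at inference time, and that the small hand-written vectors stand in for the output of `model.encode`. The helpers `with_prefix`, `cosine`, and `rank_documents` are hypothetical names, not part of the library.

```python
import math

QUERY_PREFIX = "query: "        # prefix used by the examples in this card
DOCUMENT_PREFIX = "document: "  # prefix used by the examples in this card


def with_prefix(text, prefix):
    """Add the prefix unless the text already starts with it."""
    return text if text.startswith(prefix) else prefix + text


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_documents(query_vec, doc_vecs):
    """Return document indices sorted by descending cosine similarity."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


query = with_prefix("when did england change from fahrenheit to celsius", QUERY_PREFIX)
docs = [
    with_prefix("Periodic table", DOCUMENT_PREFIX),
    with_prefix("Metrication in the United Kingdom", DOCUMENT_PREFIX),
]

# Placeholder vectors; in practice these would come from model.encode(...).
query_vec = [1.0, 0.0, 1.0]
doc_vecs = [[0.0, 1.0, 0.0], [1.0, 0.1, 0.9]]

ranking = rank_documents(query_vec, doc_vecs)
print([docs[i] for i in ranking])
```

With real embeddings, `model.similarity` from the snippet above computes the same pairwise scores in one call.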
## Evaluation
### Metrics
#### Information Retrieval
* Datasets: `NanoClimateFEVER`, `NanoDBPedia`, `NanoFEVER`, `NanoFiQA2018`, `NanoHotpotQA`, `NanoMSMARCO`, `NanoNFCorpus`, `NanoNQ`, `NanoQuoraRetrieval`, `NanoSCIDOCS`, `NanoArguAna`, `NanoSciFact` and `NanoTouche2020`
* Evaluated with [`InformationRetrievalEvaluator`](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
| Metric | NanoClimateFEVER | NanoDBPedia | NanoFEVER | NanoFiQA2018 | NanoHotpotQA | NanoMSMARCO | NanoNFCorpus | NanoNQ | NanoQuoraRetrieval | NanoSCIDOCS | NanoArguAna | NanoSciFact | NanoTouche2020 |
|:--------------------|:-----------------|:------------|:----------|:-------------|:-------------|:------------|:-------------|:-----------|:-------------------|:------------|:------------|:------------|:---------------|
| cosine_accuracy@1 | 0.06 | 0.1 | 0.06 | 0.12 | 0.18 | 0.08 | 0.1 | 0.12 | 0.56 | 0.06 | 0.12 | 0.18 | 0.3469 |
| cosine_accuracy@3 | 0.2 | 0.26 | 0.1 | 0.22 | 0.38 | 0.16 | 0.1 | 0.26 | 0.66 | 0.12 | 0.34 | 0.22 | 0.7143 |
| cosine_accuracy@5 | 0.22 | 0.44 | 0.1 | 0.26 | 0.4 | 0.2 | 0.12 | 0.38 | 0.68 | 0.14 | 0.52 | 0.32 | 0.7959 |
| cosine_accuracy@10 | 0.26 | 0.52 | 0.12 | 0.36 | 0.44 | 0.24 | 0.18 | 0.44 | 0.8 | 0.22 | 0.64 | 0.36 | 0.9388 |
| cosine_precision@1 | 0.06 | 0.1 | 0.06 | 0.12 | 0.18 | 0.08 | 0.1 | 0.12 | 0.56 | 0.06 | 0.12 | 0.18 | 0.3469 |
| cosine_precision@3 | 0.0667 | 0.1267 | 0.0333 | 0.08 | 0.1333 | 0.0533 | 0.06 | 0.0867 | 0.2533 | 0.0533 | 0.1133 | 0.08 | 0.3265 |
| cosine_precision@5 | 0.052 | 0.152 | 0.02 | 0.064 | 0.084 | 0.04 | 0.056 | 0.08 | 0.16 | 0.036 | 0.104 | 0.068 | 0.3061 |
| cosine_precision@10 | 0.032 | 0.154 | 0.012 | 0.046 | 0.052 | 0.024 | 0.042 | 0.048 | 0.092 | 0.026 | 0.064 | 0.04 | 0.2714 |
| cosine_recall@1 | 0.035 | 0.0058 | 0.05 | 0.0709 | 0.09 | 0.08 | 0.0024 | 0.11 | 0.49 | 0.0157 | 0.12 | 0.165 | 0.0173 |
| cosine_recall@3 | 0.105 | 0.0257 | 0.09 | 0.1362 | 0.2 | 0.16 | 0.0045 | 0.24 | 0.6073 | 0.0367 | 0.34 | 0.21 | 0.06 |
| cosine_recall@5 | 0.1267 | 0.0488 | 0.09 | 0.1499 | 0.21 | 0.2 | 0.0053 | 0.37 | 0.634 | 0.0407 | 0.52 | 0.3 | 0.1013 |
| cosine_recall@10 | 0.144 | 0.0818 | 0.11 | 0.2119 | 0.26 | 0.24 | 0.0069 | 0.43 | 0.7407 | 0.0567 | 0.64 | 0.345 | 0.1705 |
| **cosine_ndcg@10** | **0.1074** | **0.1565** | **0.078** | **0.1599** | **0.2152** | **0.155** | **0.0514** | **0.2669** | **0.6316** | **0.0544** | **0.3668** | **0.2485** | **0.2934** |
| cosine_mrr@10 | 0.1231 | 0.223 | 0.0753 | 0.1879 | 0.2793 | 0.1282 | 0.1127 | 0.2195 | 0.6266 | 0.1009 | 0.2815 | 0.2242 | 0.5437 |
| cosine_map@100 | 0.0839 | 0.0848 | 0.0766 | 0.1279 | 0.1695 | 0.1439 | 0.0112 | 0.2213 | 0.6008 | 0.0383 | 0.2897 | 0.2308 | 0.1828 |
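
The per-query definitions behind the `@k` metrics can be sketched in a few lines. This is a minimal illustration, not the evaluator's actual implementation; the function name `metrics_at_k` and the toy document ids are hypothetical. The reported numbers are averages of such per-query values over all queries of a dataset. The sketch also suggests why accuracy@k and recall@k can diverge sharply, as on `NanoTouche2020`: when a query has many relevant documents, a single hit counts fully toward accuracy but only as a small fraction toward recall.

```python
def metrics_at_k(ranked_ids, relevant_ids, k):
    """Per-query retrieval metrics for the top-k results (illustrative only)."""
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)

    # Reciprocal rank of the first relevant document within the top k, else 0.
    reciprocal_rank = 0.0
    for rank, doc_id in enumerate(top_k, start=1):
        if doc_id in relevant_ids:
            reciprocal_rank = 1.0 / rank
            break

    return {
        "accuracy": 1.0 if hits > 0 else 0.0,  # at least one relevant hit
        "precision": hits / k,                 # share of the top k that is relevant
        "recall": hits / len(relevant_ids),    # share of relevant docs retrieved
        "mrr": reciprocal_rank,
    }


# Toy ranking for one query: ids ordered by descending cosine similarity.
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d4"}

for k in (1, 3, 5):
    print(k, metrics_at_k(ranked, relevant, k))
```
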
#### Nano BEIR
* Dataset: `NanoBEIR_mean`
* Evaluated with [`NanoBEIREvaluator`](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.NanoBEIREvaluator)

| Metric | Value |
|:--------------------|:-----------|
| cosine_accuracy@1 | 0.1605 |
| cosine_accuracy@3 | 0.2873 |
| cosine_accuracy@5 | 0.352 |
| cosine_accuracy@10 | 0.4245 |
| cosine_precision@1 | 0.1605 |
| cosine_precision@3 | 0.1128 |
| cosine_precision@5 | 0.094 |
| cosine_precision@10 | 0.0695 |
| cosine_recall@1 | 0.0963 |
| cosine_recall@3 | 0.1704 |
| cosine_recall@5 | 0.2151 |
| cosine_recall@10 | 0.2644 |
| **cosine_ndcg@10** | **0.2142** |
| cosine_mrr@10 | 0.2405 |
| cosine_map@100 | 0.1739 |
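
The `NanoBEIR_mean` figures are consistent with an unweighted mean over the 13 per-dataset results. A quick check for `cosine_ndcg@10`, using the values copied from the per-dataset table (the variable names are illustrative):

```python
from statistics import fmean

# cosine_ndcg@10 per dataset, copied from the per-dataset results table.
ndcg_at_10 = {
    "NanoClimateFEVER": 0.1074,
    "NanoDBPedia": 0.1565,
    "NanoFEVER": 0.078,
    "NanoFiQA2018": 0.1599,
    "NanoHotpotQA": 0.2152,
    "NanoMSMARCO": 0.155,
    "NanoNFCorpus": 0.0514,
    "NanoNQ": 0.2669,
    "NanoQuoraRetrieval": 0.6316,
    "NanoSCIDOCS": 0.0544,
    "NanoArguAna": 0.3668,
    "NanoSciFact": 0.2485,
    "NanoTouche2020": 0.2934,
}

# Unweighted mean: every dataset counts equally, regardless of its size.
mean_ndcg = fmean(ndcg_at_10.values())
print(f"{mean_ndcg:.4f}")  # 0.2142
```
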
## Training Details
### Training Dataset
#### Unnamed Dataset
* Size: 129,371 training samples
* Columns: `query` and `answer`
* Approximate statistics based on the first 1000 samples:

| | query | answer |
|:--------|:-------|:-------|
| type | string | string |

* Samples:

| query | answer |
|:------|:-------|
| query: hotel and restaurant employees and bartenders international union | document: Hotel Employees and Restaurant Employees Union The Hotel Employees and Restaurant Employees Union (HERE) was a United States labor union representing workers of the hospitality industry, formed in 1891. In 2004, HERE merged with the Union of Needletrades, Industrial, and Textile Employees (UNITE) to form UNITE HERE. HERE notably organized the staff of Yale University in 1984. Other major employers that contracted with this union included several large casinos (Harrah's, Caesars Palace, and Wynn Resorts); hotels (Hilton, Hyatt and Starwood), and Walt Disney World. HERE was affiliated with the AFL-CIO. |
| query: 多肢离断伤的并发症是什么? | document: 失血性休克;血循环危象;急性肾功能衰竭 |
| query: who is the father of kelly taylor's son on 90210 | document: Kelly Taylor (90210) In 2008, Kelly Taylor returned in the spin-off 90210, now working as a guidance counselor at her alma mater West Beverly Hills High School. It was revealed that in the intervening years, she attained a master's degree and had a son named Sammy with Dylan. She and Dylan ended their relationship soon after. It was also revealed that West Beverly principal Harry Wilson was Kelly's neighbor growing up.[39] |

* Loss: [`CachedGISTEmbedLoss`](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedgistembedloss) with these parameters:
```json
{'guide': SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: NewModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
), 'temperature': 0.01}
```
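
To make the roles of `guide` and `temperature` more concrete, the sketch below shows the core idea of a guided in-batch contrastive loss in plain Python. It is a simplified illustration under stated assumptions, not the Sentence Transformers implementation: it scores only anchor-to-positive pairs, takes precomputed similarity matrices as input, and omits the mini-batch gradient caching that gives the cached variant its name. The function name `guided_contrastive_loss` and the toy matrices are hypothetical.

```python
import math


def guided_contrastive_loss(sim, guide_sim, temperature=0.01):
    """Simplified guided in-batch contrastive loss (illustrative only).

    sim[i][j]:       trained model's cosine similarity of anchor i and positive j
    guide_sim[i][j]: the guide model's similarity for the same pair
    """
    losses = []
    for i, row in enumerate(sim):
        threshold = guide_sim[i][i]  # guide's score for the true pair
        logits = []
        for j, score in enumerate(row):
            # Skip in-batch negatives that the guide rates above the true pair:
            # they are likely false negatives and should not be pushed away.
            if j != i and guide_sim[i][j] > threshold:
                continue
            logits.append((j == i, score / temperature))

        # Cross-entropy with the true pair as the target class, computed
        # with the log-sum-exp trick for numerical stability.
        max_logit = max(value for _, value in logits)
        log_sum = max_logit + math.log(
            sum(math.exp(value - max_logit) for _, value in logits)
        )
        target = next(value for is_positive, value in logits if is_positive)
        losses.append(log_sum - target)
    return sum(losses) / len(losses)


# Toy batch of two (anchor, positive) pairs.
sim = [[0.9, 0.8], [0.1, 0.7]]
guide_masks = [[0.8, 0.95], [0.2, 0.9]]  # guide flags pair (0, 1) as a false negative
guide_keeps = [[0.9, 0.1], [0.2, 0.9]]   # guide keeps every in-batch negative

print(guided_contrastive_loss(sim, guide_masks))
print(guided_contrastive_loss(sim, guide_keeps))
```

With a temperature as low as 0.01, similarity differences are scaled by a factor of 100 before the softmax, so even small gaps between the true pair and a negative dominate the loss.
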
### Evaluation Dataset
#### Unnamed Dataset
* Size: 1,000 evaluation samples
* Columns: `query` and `answer`
* Approximate statistics based on the first 1000 samples:

| | query | answer |
|:--------|:-------|:-------|
| type | string | string |

* Samples:

| query | answer |
|:------|:-------|
| query: 微创经皮肾镜手术的推荐药有些什么? | document: 阿司匹林 |
| query: why are the fires in ca called the thomas fires | document: Thomas Fire On December 4, 2017, the Thomas Fire was reported at 6:26 p.m. PST,[36] to the north of Santa Paula, near Steckel Park and Thomas Aquinas College,[3][24] after which the fire is named.[37] That night, the small brush fire exploded in size and raced through the rugged mountain terrain that lies west of Santa Paula, between Ventura and Ojai.[19][38] Officials blamed strong Santa Ana winds that gusted up to 60 miles per hour (97 km/h) for the sudden expansion.[28][39] Soon after the fire had started, a second blaze was ignited nearly 30 minutes later, about 4 miles (6.4 km) to the north in Upper Ojai at the top of Koenigstein Road.[40] According to eyewitnesses, this second fire was sparked by an explosion in the power line over the area. The second fire was rapidly expanded by the strong Santa Ana winds, and soon merged into the Thomas Fire later that night.[40] |
| query: which mountain man rediscovered south pass and brought back important information about this trail | document: Jedediah Smith Jedediah Strong Smith (January 6, 1799 – May 27, 1831), was a clerk, frontiersman, hunter, trapper, author, cartographer, and explorer of the Rocky Mountains, the North American West, and the Southwest during the early 19th century. After 75 years of obscurity following his death, Smith was rediscovered as the American whose explorations led to the use of the 20-mile (32 km)-wide South Pass as the dominant point of crossing the Continental Divide for pioneers on the Oregon Trail. |

* Loss: [`CachedGISTEmbedLoss`](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedgistembedloss) with these parameters:
```json
{'guide': SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: NewModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
), 'temperature': 0.01}
```
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 128
- `per_device_eval_batch_size`: 128
- `learning_rate`: 2e-05
- `num_train_epochs`: 2
- `warmup_ratio`: 0.05
- `seed`: 12
- `bf16`: True
- `prompts`: {'query': 'query: ', 'answer': 'document: '}
- `batch_sampler`: no_duplicates
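
As a rough sanity check, the values above imply the following step counts. This is a back-of-the-envelope estimate under assumptions the card does not state: a single device, no gradient accumulation, and the last partial batch of each epoch being kept. The `no_duplicates` batch sampler may also change the exact number of batches slightly.

```python
import math

train_samples = 129_371  # size of the training dataset
batch_size = 128         # per_device_train_batch_size
epochs = 2               # num_train_epochs
warmup_ratio = 0.05      # warmup_ratio

# Assumptions (not stated in the card): one device, gradient accumulation
# of 1, and the final partial batch of each epoch is not dropped.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(steps_per_epoch, total_steps, warmup_steps)  # 1011 2022 102
```
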
#### All Hyperparameters