language:
- ja
- en
license: mit
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:16897699
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
datasets:
- sentence-transformers/msmarco-co-condenser-margin-mse-sym-mnrl-mean-v1
- sentence-transformers/squad
- sentence-transformers/all-nli
- sentence-transformers/trivia-qa
- nthakur/swim-ir-monolingual
- sentence-transformers/miracl
- sentence-transformers/mr-tydi
- hotchpotch/sentence_transformer_japanese
library_name: sentence-transformers
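The following text is reposted from a blog post.
Releasing a Japanese StaticEmbedding model that creates practical sentence vectors 100x faster
Dense sentence vectors can be used for many purposes, such as information retrieval, text classification, and finding similar sentences. However, state-of-the-art Transformer models, even small ones, are slow, especially on CPU, and their encoding speed is often impractical.
StaticEmbedding, a recently released approach that is not a Transformer model, changes that. In a benchmark comparison against intfloat/multilingual-e5-small (mE5-small below), it reaches 85% of the score, which is practically usable, while creating sentence vectors 126 times faster on CPU. That is an astonishing speedup.
So I immediately trained a model on Japanese (and English) data, static-embedding-japanese, and released it.
The JMTEB results, which evaluate Japanese sentence-vector performance, are shown below. On average it falls slightly short of mE5-small (66.66 vs 67.71), mostly because of clustering, but it comes out ahead on Retrieval, STS, Classification, and PairClassification. In some cases it even beats other Japanese base-size BERT models, so its performance is at least at a practically usable level. Until I actually trained it, I was skeptical that it could really perform this well, so this was a pleasant surprise.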
Model | Avg(micro) | Retrieval | STS | Classification | Reranking | Clustering | PairClassification |
---|---|---|---|---|---|---|---|
text-embedding-3-small | 69.18 | 66.39 | 79.46 | 73.06 | 92.92 | 51.06 | 62.27 |
multilingual-e5-small | 67.71 | 67.27 | 80.07 | 67.62 | 93.03 | 46.91 | 62.19 |
static-embedding-japanese | 66.66 | 67.92 | 80.16 | 67.96 | 91.87 | 35.83 | 62.37 |
Technical details, such as how the Japanese StaticEmbedding model was trained, are in the second half of the article for anyone interested.
Usage
Usage is simple: create sentence vectors with SentenceTransformer in the usual way. This time, let's run it on CPU without a GPU. Note that I tested with SentenceTransformer 3.3.1.
```shell
pip install "sentence-transformers>=3.3.1"
```
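```python
from sentence_transformers import SentenceTransformer

model_name = "hotchpotch/static-embedding-japanese"
model = SentenceTransformer(model_name, device="cpu")  # run on CPU, no GPU needed

# Query: "I want to go to a tasty ramen shop"
query = "美味しいラーメン屋に行きたい"
docs = [
    # A café nearby with a relaxed atmosphere and a view of the park
    "素敵なカフェが近所にあるよ。落ち着いた雰囲気でゆっくりできるし、窓際の席からは公園の景色が見えるんだ。",
    # A seafood restaurant buying fish directly from local fishermen
    "新鮮な魚介を提供するお店です。地元の漁師から直接仕入れているので鮮度は抜群ですし、料理人の腕も確かです。",
    # A hard-to-reach but excellent tonkotsu (pork-bone) shop with great soup and noodles
    "あそこは行きにくいけど、隠れた豚骨の名店だよ。スープが最高だし、麺の硬さも好み。",
    # A recommended chuka soba (Chinese-style noodle) shop with homemade chashu
    "おすすめの中華そばの店を教えてあげる。とりわけチャーシューが手作りで柔らかくてジューシーなんだ。",
]

# Encode the query and the documents together: shape (1 + len(docs), 1024)
embeddings = model.encode([query] + docs)
print(embeddings.shape)

# Similarity (cosine by default) between the query (row 0) and each document
similarities = model.similarity(embeddings[0], embeddings[1:])
for i, similarity in enumerate(similarities[0].tolist()):
    print(f"{similarity:.04f}: {docs[i]}")
```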
```
(5, 1024)
0.1040: 素敵なカフェが近所にあるよ。落ち着いた雰囲気でゆっくりできるし、窓際の席からは公園の景色が見えるんだ。
0.2521: 新鮮な魚介を提供するお店です。地元の漁師から直接仕入れているので鮮度は抜群ですし、料理人の腕も確かです。
0.4835: あそこは行きにくいけど、隠れた豚骨の名店だよ。スープが最高だし、麺の硬さも好み。
0.3199: おすすめの中華そばの店を教えてあげる。とりわけチャーシューが手作りで柔らかくてジューシーなんだ。
```
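As you can see, the documents that match the query get higher scores. With these examples, BM25 would likely have a hard time, because direct keywords from the query, such as "ラーメン" (ramen), never appear in the documents.
As for speed, many people who have built sentence vectors on CPU know that even a small amount of text can take quite a while. With the StaticEmbedding model, it should finish almost instantly on a reasonably fast CPU. That's 100x speed for you.
Why is inference so fast on CPU?
StaticEmbedding is not a Transformer model, so none of the attention computation that characterizes Transformers happens at all. It stores a 1024-dimensional vector for each token in a table, and a sentence vector is simply the average of the vectors of the tokens that appear in the sentence. Because there is no attention, it does not model context.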
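The forward pass just described can be sketched in a few lines of NumPy. This assumes a toy 1,000-token vocabulary and a random table in place of the learned one; `embedding_table` and `encode_static` are hypothetical names for illustration, not the sentence-transformers API. Because the sentence vector is only a mean of table rows, shuffling the tokens yields the same vector, which is concretely what "no context" means here.

```python
import numpy as np

VOCAB_SIZE = 1_000  # toy vocabulary; the tokenizer used in this article has 32,768 entries
EMBED_DIM = 1024    # static-embedding-japanese outputs 1024-dimensional vectors

rng = np.random.default_rng(0)
# Hypothetical stand-in for the learned table: one EMBED_DIM-sized row per token ID.
embedding_table = rng.standard_normal((VOCAB_SIZE, EMBED_DIM), dtype=np.float32)


def encode_static(token_ids: list[int]) -> np.ndarray:
    """Sentence vector = mean of the token vectors. No attention, no context."""
    return embedding_table[token_ids].mean(axis=0)


sentence = [12, 345, 678, 345]  # hypothetical token IDs for one sentence
vec = encode_static(sentence)
print(vec.shape)  # (1024,)

# Word order is ignored: any permutation of the same tokens yields the same vector.
shuffled = [345, 678, 12, 345]
print(np.allclose(vec, encode_static(shuffled)))  # True
```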
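It also uses PyTorch's nn.EmbeddingBag: all token IDs are passed as one concatenated sequence together with offsets marking where each sentence starts, which apparently lets PyTorch's optimizations deliver fast parallel CPU processing and efficient memory access.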
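A minimal PyTorch sketch of that calling convention, assuming a toy vocabulary and random weights (`sentences`, `flat_ids`, and `offsets` are illustrative names; this mirrors the idea above and is not claimed to be sentence-transformers' exact internals). Instead of padding sentences to a common length, every token ID goes into one flat tensor and `offsets` records where each sentence begins, so a single `nn.EmbeddingBag(mode="mean")` call returns all sentence vectors at once.

```python
import torch
from torch import nn

VOCAB_SIZE = 1_000  # toy vocabulary; the article's tokenizer has 32,768 entries
EMBED_DIM = 1024

torch.manual_seed(0)
# mode="mean" averages the rows looked up for each bag, i.e. for each sentence.
bag = nn.EmbeddingBag(VOCAB_SIZE, EMBED_DIM, mode="mean")

# Three tokenized sentences of different lengths (hypothetical token IDs).
sentences = [[12, 345, 678], [9, 10], [500, 501, 502, 503]]

# Concatenate all tokens into a single flat 1-D tensor (no padding needed) ...
flat_ids = torch.tensor([tok for sent in sentences for tok in sent], dtype=torch.long)
# ... and pass the start position of each sentence within it as offsets.
lengths = torch.tensor([len(sent) for sent in sentences], dtype=torch.long)
offsets = torch.cat([torch.zeros(1, dtype=torch.long), lengths.cumsum(dim=0)[:-1]])

with torch.no_grad():
    sentence_vectors = bag(flat_ids, offsets)

print(offsets.tolist())        # [0, 3, 5]
print(sentence_vectors.shape)  # torch.Size([3, 1024])
```

With no padding and no attention, each sentence costs one gather plus one mean, which is why this path is so cheap on CPU.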
According to the speed benchmark in the original article, it is about 126 times faster than mE5-small on CPU.
Evaluation results
All JMTEB evaluation results are recorded in a JSON file, and comparing them on the JMTEB Leaderboard makes the differences easy to see. Given the model's size, its overall JMTEB results are quite good. Anyone who has run JMTEB knows that vectorizing the 7 million documents of the mr-tydi task takes a long time (roughly 1 to 4 hours on an RTX4090, depending on the model). StaticEmbedding is extremely fast here and finished in about 4 minutes on an RTX4090.
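A quick back-of-the-envelope check of what those timings imply, using only the approximate figures from the paragraph above (7 million documents, about 4 minutes versus 1 to 4 hours):

```python
# Rough throughput implied by the numbers above; all inputs are approximate.
NUM_DOCS = 7_000_000                        # mr-tydi corpus: about 7 million documents
static_seconds = 4 * 60                     # about 4 minutes with StaticEmbedding on an RTX4090
transformer_seconds = [1 * 3600, 4 * 3600]  # 1 to 4 hours, depending on the model

static_rate = NUM_DOCS / static_seconds
print(f"StaticEmbedding: ~{static_rate:,.0f} docs/s")
for secs in transformer_seconds:
    print(
        f"{secs // 3600} h run: ~{NUM_DOCS / secs:,.0f} docs/s "
        f"({secs / static_seconds:.0f}x the StaticEmbedding time)"
    )
```

That is roughly 29,000 documents per second, and 15 to 60 times less wall-clock time than the 1 to 4 hour runs. Note this is a GPU comparison, separate from the 126x CPU figure.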
Can it replace BM25 for information retrieval?
Let's look at the Retrieval results within JMTEB. StaticEmbedding does markedly worse on mr-tydi. mr-tydi has far more documents than the other tasks (7 million), so results may degrade on tasks that search an enormous corpus. Because a sentence vector is just a plain average of tokens that ignores context, the larger the corpus grows, the more documents with similar averages appear, which could plausibly explain this result.
So with a very large corpus, it may perform considerably worse than BM25. With a smaller corpus, on the other hand, it often seems to give better results than BM25.
Incidentally, on jaqket, another retrieval task, it scores much better than the other models. I did train on JQaRA (dev, unused), but even so the score feels too high, which is a mystery. I don't think any test data leaked, though...
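Poor clustering results
I haven't investigated this in detail either, but the clustering score is considerably worse than the other models' (35.83 vs 46.91 for mE5-small). Oddly, classification is not bad (67.96, slightly above mE5-small). Perhaps this is an effect of the embedding space having been shaped by Matryoshka representation learning?
Reranking tasks on JQaRA and JaCWIR
JQaRA results: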
model_names | ndcg@10 | mrr@10 |
---|---|---|
static-embedding-japanese | 0.4704 | 0.6814 |
bm25 | 0.458 | 0.702 |
multilingual-e5-small | 0.4917 | 0.7291 |
JaCWIR results:
model_names | map@10 | hits@10 |
---|---|---|
static-embedding-japanese | 0.7642 | 0.9266 |
bm25 | 0.8408 | 0.9528 |
multilingual-e5-small | 0.869 | 0.97 |
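On JQaRA, it slightly beats BM25 on nDCG@10 (0.4704 vs 0.458) but slightly trails it on MRR@10 (0.6814 vs 0.702), and it is a little below mE5-small on both. On JaCWIR, it falls well behind both BM25 and mE5-small.
JaCWIR consists of the titles and summaries of web pages, so many of its texts are not "clean". Transformer models are robust to such noise, whereas StaticEmbedding, being a plain average of tokens, appears to suffer from it. BM25 matches on distinctive words, and the noisy words in JaCWIR rarely match the query, so BM25 stays competitive with Transformer models and does fairly well.
These results suggest that StaticEmbedding may score worse than Transformer models or BM25 on text that contains a lot of noise.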
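The noise argument can be illustrated with a toy NumPy experiment. Random vectors stand in for learned token embeddings (an assumption; this is not the real model): a query made of three "topic" tokens is compared with a document that contains the same topic tokens plus a growing number of unrelated tokens. Since mean pooling weights every token equally, each unrelated token pulls the document vector away from the topic and the cosine similarity drops steadily.

```python
import numpy as np

rng = np.random.default_rng(42)
DIM = 1024


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical token vectors: three "topic" tokens shared by the query and the document.
topic_tokens = rng.standard_normal((3, DIM))
query_vec = topic_tokens.mean(axis=0)

scores = []
for n_noise in [0, 3, 10, 30]:
    # Unrelated tokens (boilerplate, site names, navigation text, ...) mixed into the document.
    noise_tokens = rng.standard_normal((n_noise, DIM))
    doc_vec = np.vstack([topic_tokens, noise_tokens]).mean(axis=0)
    scores.append(cosine(query_vec, doc_vec))
    print(f"{n_noise:>2} noise tokens -> cosine {scores[-1]:.3f}")
```

A Transformer can learn to attend less to such tokens; a static mean has no way to do so.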
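Reducing the output dimension
The output dimension of StaticEmbedding depends on how it is trained; the model built here uses a fairly large 1024 dimensions. A high dimension makes downstream tasks (clustering, retrieval, and so on) more expensive to compute. However, because the model was trained with Matryoshka Representation Learning (MRL), it is easy to reduce it from 1024 dimensions to smaller sizes.
MRL trains the model so that the leading dimensions of the vector carry the most important information. As a result, even for a 1024-dimensional vector, keeping only the first 32, 64, 128, 256... dimensions and discarding the rest still gives reasonably good results.
According to the original StaticEmbedding article, which is the source of this graph, performance is retained at 91.87% with 128 dimensions, 95.79% with 256, and 98.53% with 512. So when accuracy is not critical but you want to cut downstream computation, aggressively reducing the dimension looks like a good option.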
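Because MRL packs the most important information into the leading dimensions, reducing the dimension amounts to slicing off the tail and re-normalizing. A minimal NumPy sketch, assuming `embeddings` is an (n, 1024) array such as the output of `model.encode([query] + docs)` in the usage example; random data is used here so the snippet runs standalone, and `truncate_embeddings` is a hypothetical helper. sentence-transformers can also do the slicing at load time through its `truncate_dim` argument, e.g. `SentenceTransformer("hotchpotch/static-embedding-japanese", truncate_dim=256)`.

```python
import numpy as np


def truncate_embeddings(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep only the leading `dim` dimensions, then L2-normalize each row again."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.maximum(norms, 1e-12)


rng = np.random.default_rng(0)
# Stand-in for model.encode([query] + docs): 5 vectors of 1024 dimensions.
embeddings = rng.standard_normal((5, 1024)).astype(np.float32)

for dim in (128, 256, 512):
    small = truncate_embeddings(embeddings, dim)
    sims = small[1:] @ small[0]  # cosine similarity of the query (row 0) to each document
    print(dim, small.shape, np.round(sims, 4))
```

With float32 vectors, 1024 dimensions take 4 KB each and 256 dimensions 1 KB, so truncating to 256 cuts both storage and dot-product cost by 4x.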
Thoughts after building a StaticEmbedding model
Honestly, I was skeptical that a plain average of token embeddings could perform this well, but after actually training one I was amazed at how well such a simple architecture performs. In an era dominated by Transformers, I was surprised to see a model built on good old word embeddings that looks genuinely usable in the real world.
A sentence-embedding model with fast CPU inference looks useful in many settings: converting large amounts of text in a local CPU environment, of course, but also on edge devices, or where the network is slow and a remote inference server is hard to use.
Technical notes on training the Japanese StaticEmbedding model
Why does it train so well?
StaticEmbedding is very simple: it tokenizes the text, uses the token IDs to fetch N-dimensional vectors (1024 here) from an EmbeddingBag table that stores the word embeddings, and averages them.
Word embeddings have traditionally been learned from the words surrounding each word, for example by word2vec with Skip-gram or CBOW, or by GloVe from global co-occurrence statistics. StaticEmbedding, by contrast, is trained on whole sentences. It also uses contrastive learning over a large amount of text with huge batches, which seems to be what lets it learn good word embeddings.
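The model card metadata lists MultipleNegativesRankingLoss wrapped in MatryoshkaLoss. Below is a NumPy sketch of what those objectives compute, written from their general definitions rather than copied from sentence-transformers: each anchor must pick out its own positive, and every other positive in the batch serves as a negative, which is one reason huge batches help. `scale=20.0` mirrors the library's default for MultipleNegativesRankingLoss; the `dims` tuple and the equal per-dimension weights are assumptions based on the dimensions mentioned in the MRL discussion, not confirmed training settings.

```python
import numpy as np


def cos_sim_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def in_batch_negatives_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """Cross-entropy where anchor i must pick positive i; other positives act as negatives."""
    logits = scale * cos_sim_matrix(anchors, positives)   # (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))            # labels are 0..batch-1


def matryoshka_loss(anchors: np.ndarray, positives: np.ndarray,
                    dims=(1024, 512, 256, 128, 64, 32)) -> float:
    """Apply the same loss to each leading slice so the first dimensions carry the most signal."""
    return sum(in_batch_negatives_loss(anchors[:, :d], positives[:, :d]) for d in dims)


rng = np.random.default_rng(0)
batch, dim = 8, 1024
anchors = rng.standard_normal((batch, dim))
positives = anchors + 0.1 * rng.standard_normal((batch, dim))  # matched (anchor, positive) pairs
mismatched = np.roll(positives, 1, axis=0)                     # every anchor paired with the wrong text

print(f"matched:    {in_batch_negatives_loss(anchors, positives):.4f}")
print(f"mismatched: {in_batch_negatives_loss(anchors, mismatched):.4f}")
print(f"matryoshka (matched): {matryoshka_loss(anchors, positives):.4f}")
```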
Training datasets
For training the Japanese model, I built and used the following datasets for contrastive learning.
- hotchpotch/sentence_transformer_japanese
  - Existing datasets reshaped into column names and structures that are easy to train on with SentenceTransformer, such as (anchor, positive), (anchor, positive, negative), and (anchor, positive, negative_1, ..., negative_n).
  - hotchpotch/sentence_transformer_japanese was built from the datasets below. As always, many thanks to the dataset authors, above all hpprc, who created most of them.
    - https://huggingface.co/datasets/hpprc/emb
      - Filtered with the reranker scores from https://huggingface.co/datasets/hotchpotch/hpprc_emb-scores, keeping positives scored >= 0.7 and negatives scored <= 0.3.
    - https://huggingface.co/datasets/hpprc/llmjp-kaken
    - https://huggingface.co/datasets/hpprc/msmarco-ja
      - Filtered with the reranker scores from https://huggingface.co/datasets/hotchpotch/msmarco-ja-hard-negatives, keeping positives scored >= 0.7 and negatives scored <= 0.3.
    - https://huggingface.co/datasets/hpprc/mqa-ja
    - https://huggingface.co/datasets/hpprc/llmjp-warp-html
- From the datasets created above, I used the following. Because I wanted to strengthen information retrieval, data from retrieval-oriented datasets was augmented so that it appears more often during training.
  - httprc_auto-wiki-nli-triplet
  - httprc_auto-wiki-qa
  - httprc_auto-wiki-qa-nemotron
  - httprc_auto-wiki-qa-pair
  - httprc_baobab-wiki-retrieval
  - httprc_janli-triplet
  - httprc_jaquad
  - httprc_jqara
  - httprc_jsnli-triplet
  - httprc_jsquad
  - httprc_miracl
  - httprc_mkqa
  - httprc_mkqa-triplet
  - httprc_mr-tydi
  - httprc_nu-mnli-triplet
  - httprc_nu-snli-triplet
  - httprc_quiz-no-mori
  - httprc_quiz-works
  - httprc_snow-triplet
  - httprc_llmjp-kaken
  - httprc_llmjp_warp_html
  - httprc_mqa_ja
  - httprc_msmarco_ja
- English datasets were also used for training.
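The reranker-score filtering mentioned for hpprc/emb and msmarco-ja can be sketched as follows. The thresholds (>= 0.7 for positives, <= 0.3 for negatives) come from the list above; the row layout, the field names, and the choice to pair every surviving positive with every surviving negative are assumptions for illustration, not how hotchpotch/sentence_transformer_japanese was actually built. The point of the negative threshold is to drop "negatives" the reranker considers relevant, which would otherwise be false negatives.

```python
POS_THRESHOLD = 0.7  # keep positives the reranker scores at or above this
NEG_THRESHOLD = 0.3  # keep negatives the reranker scores at or below this


def to_triplets(row: dict) -> list[dict]:
    """Filter one row by reranker score and expand it into (anchor, positive, negative) rows."""
    positives = [t for t, s in zip(row["positives"], row["pos_scores"]) if s >= POS_THRESHOLD]
    negatives = [t for t, s in zip(row["negatives"], row["neg_scores"]) if s <= NEG_THRESHOLD]
    return [
        {"anchor": row["anchor"], "positive": p, "negative": n}
        for p in positives
        for n in negatives
    ]


# Hypothetical example row; field names are illustrative, not the real dataset schema.
row = {
    "anchor": "What is the highest mountain in Japan?",
    "positives": ["Mount Fuji is the highest peak in Japan.", "A list of mountains in Japan."],
    "pos_scores": [0.95, 0.55],  # the second one is too weak to trust as a positive
    "negatives": ["Mount Fuji straddles Shizuoka and Yamanashi.", "Lake Biwa is Japan's largest lake."],
    "neg_scores": [0.62, 0.04],  # the first one is probably a false negative
}

for triplet in to_triplets(row):
    print(triplet)
```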
Japanese tokenizer
Training a StaticEmbedding looked easiest with a tokenizer that the Hugging Face tokenizers library can process in tokenizer.json format, so I built a tokenizer called hotchpotch/xlm-roberta-japanese-tokenizer. Its vocabulary size is 32,768.
This tokenizer was trained with sentencepiece unigram on Japanese Wikipedia, English Wikipedia (sampled), and cc-100 (Japanese, sampled), with the text pre-segmented using unidic. It also works as an XLM-RoBERTa-style Japanese tokenizer. This is the tokenizer used for this model.
Hyperparameters
Changes from the original training code, with notes:
- Set batch_size to 6072, up from 2048 in the original code.
  - When contrastive learning processes huge batches, the same text appearing more than once in a batch means a positive for one anchor also serves as an in-batch negative for another, which can hurt training. The BatchSamplers.NO_DUPLICATES option exists to prevent this. However, with a very large batch size, the sampling needed to keep duplicates out of each batch can take a while.
  - This time I specified BatchSamplers.NO_DUPLICATES and used 6072, the size that fits in the RTX4090's 24GB. An even larger batch size might give better results.
- Changed the number of epochs from 1 to 2.
  - 2 gave better results than 1. With a larger dataset, however, 1 might be better.
- Scheduler
  - Changed from the default linear to cosine, which in my experience tends to work better.
- Optimizer
  - Kept the default AdamW. Switching to Adafactor made convergence worse.
- learning_rate
  - Kept at 2e-1. I wondered whether that value was far too large, but lowering it made the results worse.
- dataloader_prefetch_factor=4
- dataloader_num_workers=15
  - Set fairly high because tokenization and the batch sampler's sampling take time.
Training resources
- CPU: Ryzen9 7950X
- GPU: RTX4090
- Memory: 64GB
Training took about 4 hours on this machine. GPU core load was very low: whereas other Transformer models keep the GPU pegged at around 90% during training, StaticEmbedding stayed near 0%. This is probably because most of the time goes into transferring the huge batches into GPU memory, so higher transfer bandwidth could speed up training further.
Toward further performance improvements
The tokenizer used this time was not designed specifically for StaticEmbedding, so a better-suited tokenizer could improve performance. Making the batch size even larger might also stabilize training and improve performance.
Bringing a wider range of text into training, for example data from more diverse domains or synthetic datasets, should also lead to further gains.
Training code
The code used for training is released under the MIT license. Running the script should reproduce the model...!
License
static-embedding-japanese is released under the MIT license.