language:
- zh
# Word2vec
## Train script
https://github.com/zake7749/word2vec-tutorial
## Dataset
wiki-zh_tw 2022/12
## Use
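```python
from pathlib import Path

import gensim
import jieba
import numpy as np

path = "."  # placeholder (assumption): directory that contains model.kv
sentence = "今天天氣真好"  # "The weather is really nice today"
# Segment the sentence into words (accurate mode, not full mode).
words = jieba.cut(sentence, cut_all=False)
model = gensim.models.KeyedVectors.load(str(Path(path, "model.kv")))
# Collect the unit-normalized vector of every in-vocabulary word.
word_vec = list()
for word in words:
    if word in model.key_to_index:  # get_vector raises KeyError for unknown words
        word_vec.append(model.get_vector(word, norm=True))
# Average the word vectors into a single sentence vector.
sentence_vec = np.array(word_vec, dtype="float32").mean(0)
```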
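
The averaging step above can be wrapped in a small helper that skips out-of-vocabulary words and returns a zero vector when no word is known. The following is a minimal sketch, not part of the original training script: `sentence_vector`, `cosine_similarity`, and the 2-dimensional `toy_vectors` dictionary are hypothetical names and data made up for illustration, so the example runs without `model.kv`.

```python
import numpy as np


def sentence_vector(tokens, vectors, dim):
    """Average the unit-normalized vectors of the tokens found in `vectors`.

    Returns a zero vector of length `dim` when no token is in the vocabulary.
    """
    rows = []
    for token in tokens:
        if token in vectors:  # skip out-of-vocabulary tokens
            vec = np.asarray(vectors[token], dtype="float32")
            norm = np.linalg.norm(vec)
            if norm > 0:
                rows.append(vec / norm)
    if not rows:
        return np.zeros(dim, dtype="float32")
    return np.mean(rows, axis=0)


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0


# Toy 2-dimensional vocabulary, made up for illustration only.
toy_vectors = {
    "今天": np.array([1.0, 0.0]),
    "天氣": np.array([0.0, 2.0]),
    "真好": np.array([3.0, 3.0]),
}
vec_a = sentence_vector(["今天", "天氣", "未知詞"], toy_vectors, dim=2)
vec_b = sentence_vector(["真好"], toy_vectors, dim=2)
print(vec_a, cosine_similarity(vec_a, vec_b))
```

Because the helper only relies on `in` and `[]` lookups, it should also accept the loaded `KeyedVectors` object in place of the toy dictionary, with `dim=model.vector_size` and the tokens produced by `jieba.cut`.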