What is the best chunk size?

#9
by hulianxue - opened

Consider the following scenario:
I have long text that, of course, cannot be fed into the embedding model in a single pass.
So I have to split the text into chunks, embed each chunk, and push the embeddings into a vector retrieval system.
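
For concreteness, the chunking step could look roughly like the sketch below. It is only an illustration under assumptions: `chunk_words`, `chunk_size` and `overlap` are hypothetical names, and it counts whitespace-separated words, whereas a real setup would probably count tokens with the model's own tokenizer. The embedding and indexing steps are left out.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in at least one chunk (when it is short enough).
    Both defaults are placeholders, not recommended values.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text,
        # to avoid emitting a trailing chunk that only repeats overlap words.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Each returned chunk would then be embedded separately and stored in the vector index; `chunk_size` is the parameter the question below is about.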

To get the best retrieval performance, what is the best chunk size?
Have you run any experiments on this?
Or do you have any suggestions based on the distribution of your training data?

Thanks!
