What is the best chunk size?

by hulianxue - opened Jul 2, 2024

Consider the following scenario:
I have long text that, of course, cannot be fed into the embedding model in a single pass.
So I have to cut the text into chunks, embed them, and push the embeddings into a vector retrieval system.
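
To make the chunking step concrete, here is a rough sketch of what I mean. The function name `chunk_words`, the word-based splitting, and the default `chunk_size` / `overlap` values are only assumptions for illustration, not recommendations from the model authors; a real pipeline would probably count tokens with the model's own tokenizer rather than whitespace-separated words.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of at most `chunk_size` words.

    Hypothetical helper: sizes are counted in whitespace-separated words,
    which only approximates the token count an embedding model sees.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window already reaches the end of the text,
        # so no trailing chunk made only of overlap is emitted.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_words(sample, chunk_size=4, overlap=1):
        print(chunk)
```

Each returned chunk would then be embedded and stored separately, so `chunk_size` here is exactly the value I am asking about.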

So, to get the best retrieval performance, what is the best chunk size?
Have you run any experiments on this?
Or do you have any suggestions based on your training data distribution?

thx!
