Question about MTEB Benchmark Settings: 'max_seq_length' 😭

#25
by george31 - opened

I noticed that in the MTEB benchmark implementation (eval_mteb.py), 'max_seq_length' is set to 512 tokens by default, even for models that support much longer sequences (such as 32K tokens).

For example, when benchmarking embedding models with MTEB:

  • Default max_seq_length: 512
  • Actual model capacity: 32K tokens

This may underutilize the model's capabilities and may not provide a fair comparison, especially for tasks involving longer documents.
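To make the concern concrete, here is a minimal sketch of what a sequence-length cap does to the model input. It assumes a plain whitespace split as a stand-in for the model's real subword tokenizer, and `truncate_to_limit` is a hypothetical helper, not part of eval_mteb.py or MTEB. Real encoders also count special tokens toward the limit.

```python
def whitespace_tokenize(text: str) -> list[str]:
    # Stand-in for a real subword tokenizer (assumption for illustration only).
    return text.split()


def truncate_to_limit(tokens: list[str], max_seq_length: int) -> list[str]:
    # Hypothetical helper: keep only the first max_seq_length tokens,
    # mirroring how over-long inputs are truncated before encoding.
    return tokens[:max_seq_length]


# Two long documents that share their first 600 tokens but differ afterwards.
shared = ["tok"] * 600
doc_a = " ".join(shared + ["alpha"] * 1000)
doc_b = " ".join(shared + ["beta"] * 1000)

for limit in (512, 32_768):
    a = truncate_to_limit(whitespace_tokenize(doc_a), limit)
    b = truncate_to_limit(whitespace_tokenize(doc_b), limit)
    # With a 512-token cap the model sees identical inputs, so the differing
    # tail cannot influence the embeddings at all.
    print(limit, len(a), a == b)
```

With the 512 cap the two documents become indistinguishable; with the larger cap they do not. Whether this matters for a benchmark depends on how many of its documents actually exceed 512 tokens.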

Questions:

  1. Is this a common practice in the industry? If so, what's the rationale behind it?
  2. Wouldn't it be more appropriate to use the model's full sequence length capability for fair benchmarking?
  3. Are there any specific technical or practical reasons why 512 tokens became the de facto standard for MTEB benchmarks?

I'd appreciate any insights from the community on this benchmarking practice.

Alibaba-NLP org

This is because most texts on MTEB are shorter than 512 tokens. We have verified that using a larger max length does not yield significantly different results on MTEB compared to setting the max length to 512. Therefore, to reduce testing time, we set the max length to 512.
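The reasoning in this reply can be checked on any task's texts: if nearly all of them fit within 512 tokens, raising the limit changes the model input only for the few that do not, so scores can barely move, while the extra tokens from those long texts still add encoding time. The sketch below assumes a whitespace split as a stand-in for the real tokenizer (subword counts are usually higher) and uses a hypothetical toy corpus, not actual MTEB data; the helper names are illustrative only.

```python
def token_count(text: str) -> int:
    # Whitespace split as a stand-in for the model's subword tokenizer (assumption).
    return len(text.split())


def share_affected_by_limit(texts: list[str], short_limit: int) -> float:
    # A text's truncated input differs between a short and a longer cap
    # only if it is longer than the short cap, so only these can change scores.
    affected = sum(1 for t in texts if token_count(t) > short_limit)
    return affected / len(texts)


def extra_tokens_encoded(texts: list[str], short_limit: int, long_limit: int) -> int:
    # Additional tokens the model must process when the cap is raised
    # (assumes long_limit > short_limit); a rough proxy for extra evaluation time.
    return sum(
        min(token_count(t), long_limit) - min(token_count(t), short_limit)
        for t in texts
    )


# Hypothetical toy corpus: mostly short texts plus one long document.
corpus = ["word " * 40] * 95 + ["word " * 300] * 4 + ["word " * 5000]

print(share_affected_by_limit(corpus, 512))
print(extra_tokens_encoded(corpus, 512, 32_768))
```

In this toy corpus only 1% of the texts change when the cap goes from 512 to 32,768, yet that single document accounts for all of the extra tokens encoded.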
