Consider including OpenChat 3 models for human evaluation

#2
by imone - opened

OpenChat 3 is based on Llama-2 and is the best 13B model on the AlpacaEval GPT-4 instruction evaluation, greatly outperforming existing open-source dialogue models. Would you consider including it in the human evaluation?
