arxiv:2308.04823

Evaluating the Generation Capabilities of Large Chinese Language Models

Published on Aug 9, 2023

Authors:

Abstract

This paper presents CG-Eval, the first comprehensive evaluation of the generation capabilities of large Chinese language models across a wide range of academic disciplines. The models' performance was assessed on their ability to generate accurate and relevant responses to different types of questions in six disciplines: Science and Engineering, Humanities and Social Sciences, Mathematical Calculations, Medical Practitioner Qualification Examination, Judicial Examination, and Certified Public Accountant Examination. This paper also presents Gscore, a composite index computed as the weighted sum of multiple metrics, which measures the quality of a model's generated output against a reference. The test data and test results can be found at http://cgeval.besteasy.com/.
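The abstract describes Gscore only as a weighted sum of multiple metrics and does not name the metrics or their weights. The following is a minimal sketch of a generic weighted-sum composite index under that description. The function `composite_score`, the metric names `metric_a`, `metric_b`, `metric_c`, their values, and the weights are all hypothetical illustrations, not the paper's actual Gscore definition.

```python
from typing import Mapping


def composite_score(metrics: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Return the weighted sum of per-metric scores.

    Hypothetical illustration of a weighted-sum composite index;
    it is NOT the paper's Gscore, whose metrics and weights are not given here.
    """
    # Every weighted metric must have a score for this output.
    missing = set(weights) - set(metrics)
    if missing:
        raise KeyError(f"missing metric scores: {sorted(missing)}")
    # Composite = sum over metrics of (weight * metric score).
    return sum(weights[name] * metrics[name] for name in weights)


# Hypothetical per-metric scores for one generated answer vs. its reference.
example_metrics = {"metric_a": 0.8, "metric_b": 0.5, "metric_c": 0.6}
# Hypothetical weights, chosen only for illustration.
example_weights = {"metric_a": 0.5, "metric_b": 0.3, "metric_c": 0.2}

score = composite_score(example_metrics, example_weights)
print(f"composite score: {score:.2f}")
```

With these made-up numbers the composite is 0.5 × 0.8 + 0.3 × 0.5 + 0.2 × 0.6 = 0.67. Choosing weights that sum to 1 keeps the composite on the same scale as the individual metrics, but whether Gscore does this is an assumption here.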
