XuyaoWang committed
Commit 620dbe7 · verified · 1 Parent(s): 68ba6b3

Update src/about.py

Files changed (1)
  1. src/about.py +6 -4
src/about.py CHANGED
@@ -61,13 +61,15 @@ We welcome the community to submit evaluation results for new models. These resu
 
 ### 1 - Running Evaluation 🏃‍♂️
 
- We have written a detailed guide for running the evaluation on your model. You can find it in the [align-anything](https://github.com/PKU-Alignment/align-anything/tree/main/align_anything/evaluation/benchmarks/leaderboard) repository. This process will generate a JSON file and a ZIP file summarizing the results, along with the raw generations and metric files.
+ We have written a detailed guide for running the evaluation on your model. You can find it in the [align-anything](https://github.com/PKU-Alignment/align-anything/tree/main/align_anything/evaluation/eval_anything) repository.
+
+ **Note:** The current code is a sample script. In the future, we will integrate Eval-Anything's evaluation pipeline into the framework for more convenient community use.
 
 ### 2 - Submitting Results 🚀
 
 To submit your results, create a **Pull Request** in the community tab to add them under the [community_results](https://huggingface.co/spaces/PKU-Alignment/EvalAnything-LeaderBoard/tree/main/community_results) folder in this repository:
- - Create a folder named `ORG_MODELNAME_USERNAME`. For example, `PKU-Alignment_gemini1.5-pro_XuyaoWang`.
- - Place your JSON file and ZIP file with grouped scores from the guide, along with the generations folder and metrics folder, inside this newly created folder.
+ - Create a folder named `ORG_MODELNAME_USERNAME`. For example, `PKU-Alignment_gemini1.5-pro_XiaoMing`.
+ - Place all your generation and evaluation results in the folder.
 
 The title of the PR should be `[Community Submission] Model: org/model, Username: your_username`, replacing org and model with those corresponding to the model you evaluated.
 
@@ -76,7 +78,7 @@ A verified result in Eval-Anything indicates that a core maintainer has decoded
 
 1. Email us and provide a brief rationale for why your model should be verified.
 2. Await our response and approval before proceeding.
- 3. Prepare a script to decode from your model that does not require a local GPU. Typically, this should be the same script used for your model contribution. We strongly recommend that you modify the scripts in [align-anything](https://github.com/PKU-Alignment/align-anything/tree/main/align_anything/evaluation/benchmarks/leaderboard) to adapt them to your model.
+ 3. Prepare a script to decode from your model that does not require a local GPU. Typically, this should be the same script used for your model contribution. We strongly recommend that you modify the scripts in [align-anything](https://github.com/PKU-Alignment/align-anything/tree/main/align_anything/evaluation/eval_anything) to adapt them to your model.
 4. Generate temporary OpenAI API keys for running the script and share them with us. Specifically, we need the keys for evaluation.
 5. We will check and execute your script, update the results, and inform you so that you can revoke the temporary keys.
 
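
To make the submission step concrete, the sketch below assembles a `community_results` folder named as described in step 2 of the updated text. It is illustrative only: the local paths `outputs/generation` and `outputs/evaluation` are assumptions, and the actual contents are whatever files your evaluation run produced.

```python
# Hypothetical packaging helper for a community submission.
# The source paths below are placeholders; copy whatever generation and
# evaluation outputs your run actually produced.
import shutil
from pathlib import Path

submission = Path("community_results/PKU-Alignment_gemini1.5-pro_XiaoMing")
submission.mkdir(parents=True, exist_ok=True)

for result_dir in [Path("outputs/generation"), Path("outputs/evaluation")]:
    if result_dir.exists():
        # Mirror each results folder into the submission directory.
        shutil.copytree(result_dir, submission / result_dir.name, dirs_exist_ok=True)
```

The folder is then committed and opened as a Pull Request titled `[Community Submission] Model: org/model, Username: your_username`.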
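
For step 3 of the verification process, the decode script is usually the same one used for your contribution, adapted from align-anything's eval_anything examples. The minimal sketch below shows the general shape under the assumption that your model is reachable through an OpenAI-compatible API, so no local GPU is needed; the endpoint, model name, and file names are placeholders, not part of the official scripts.

```python
# Illustrative GPU-free decoding sketch: query a hosted, OpenAI-compatible
# endpoint instead of loading the model locally. The base_url, model name,
# and prompt/output file names are placeholders.
import json
from openai import OpenAI

client = OpenAI(base_url="https://your-model-endpoint/v1", api_key="YOUR_TEMPORARY_KEY")

with open("prompts.json") as f:
    prompts = json.load(f)  # assumed: a list of prompt strings

generations = []
for prompt in prompts:
    response = client.chat.completions.create(
        model="your-model-name",
        messages=[{"role": "user", "content": prompt}],
    )
    generations.append({"prompt": prompt, "output": response.choices[0].message.content})

with open("generations.json", "w") as f:
    json.dump(generations, f, ensure_ascii=False, indent=2)
```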