performance-improvement

#705
by alozowski - opened
Open LLM Leaderboard org
No description provided.
Open LLM Leaderboard org

Changes to pyproject.toml:

  • Corrected Ruff settings to work with VS Code.

Changes to src/envs.py:

  • Removed the formatted string literal in the print statement, replacing the f-string with a plain string for a constant message (before/after shown below).
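
For illustration, the change amounts to something like this (the message text here is made up):

```python
# Before: an f-string with no placeholders to interpolate
print(f"Saving the request file locally")

# After: a plain string, since the message is constant
print("Saving the request file locally")
```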

Changes to src/leaderboard/read_evals.py:

  • Class EvalResult:
    • Added type hints to instance variables and gave them defaults where appropriate.
    • Replaced the tags default of None with an empty list via field(default_factory=list) (see the sketch after this list).
    • Refactored the init_from_json_file method to handle the new config structure and to use cls instead of self.
    • Extracted result processing into a new extract_results method.
    • Implemented structured error handling and refined the update methods.
  • Functionality:
    • Redefined how request files are selected and validated using pathlib, with more structured checks.
    • Enhanced error handling across methods, with specific exceptions and error logging.
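
For illustration, a minimal sketch of the patterns above; the field names, config keys, and request-file layout here are assumptions, not the exact PR code:

```python
import json
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class EvalResult:
    eval_name: str
    full_model: str
    results: dict = field(default_factory=dict)
    # A mutable default must go through default_factory;
    # `tags: list = []` would be shared across all instances.
    tags: list = field(default_factory=list)

    @classmethod
    def init_from_json_file(cls, json_filepath: str) -> "EvalResult":
        """Build an EvalResult from a results JSON file."""
        with open(json_filepath) as fp:
            data = json.load(fp)
        # Using `cls(...)` rather than hard-coding the class name
        # keeps the constructor usable from subclasses.
        config = data.get("config", {})  # assumed key
        return cls(
            eval_name=Path(json_filepath).stem,
            full_model=config.get("model_name", "unknown"),  # assumed key
        )


def get_request_file_for_model(requests_path: str, model_name: str) -> Path | None:
    """Select a request file with pathlib and structured checks (assumed layout)."""
    candidates = sorted(Path(requests_path).glob(f"{model_name}_eval_request_*.json"))
    for candidate in candidates:
        if not candidate.is_file():
            continue
        status = json.loads(candidate.read_text()).get("status")
        if status == "FINISHED":
            return candidate
    return None
```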

❗This is a first commit; I'm going to improve the existing functionality in the next commits

Open LLM Leaderboard org

Do we need to see the list of flagged models printed from src/leaderboard/filter_models.py (line 144)? Probably not, so I commented it out

Open LLM Leaderboard org

Key changes for src/leaderboard/read_evals.py:

  • Replaced the method for sorting JSON files by the datetime embedded in their filenames. The new approach tries a list of expected datetime formats, logs an error if none of them match, and falls back to the Unix epoch for legacy files with non-standard timestamps (parse_datetime sketch below).
  • Introduced error handling when building the evaluation results dictionary, logging missing keys specifically to make debugging easier.
  • Wrapped the iteration over model files with tqdm for a visual progress indicator during execution.
  • Routed log messages around tqdm so progress output and log lines don't conflict, keeping console output readable during execution (handler sketch below).
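
A sketch of the fallback parser described in the first bullet; the exact format strings are assumptions:

```python
import logging
from datetime import datetime

logger = logging.getLogger(__name__)


def parse_datetime(datetime_str: str) -> float:
    """Parse a filename timestamp, trying several known formats and
    falling back to the Unix epoch for legacy files."""
    formats = [
        "%Y-%m-%dT%H-%M-%S.%f",  # assumed, e.g. 2024-04-30T12-30-00.123456
        "%Y-%m-%dT%H:%M:%S.%f",
        "%Y-%m-%dT%H %M %S.%f",
    ]
    for fmt in formats:
        try:
            return datetime.strptime(datetime_str, fmt).timestamp()
        except ValueError:
            continue
    logger.error(f"No valid date format found for: {datetime_str}")
    return datetime(1970, 1, 1).timestamp()
```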

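For the tqdm/logging interplay, one common pattern (a sketch, not necessarily the exact code here) is to route log records through tqdm.write() so messages land on their own lines instead of tearing the progress bar; tqdm also ships tqdm.contrib.logging.logging_redirect_tqdm as a ready-made context manager for the same purpose:

```python
import logging

from tqdm import tqdm


class TqdmLoggingHandler(logging.Handler):
    """Emit log records via tqdm.write() so they don't break the bar."""

    def emit(self, record: logging.LogRecord) -> None:
        try:
            tqdm.write(self.format(record))
        except Exception:
            self.handleError(record)


logging.basicConfig(level=logging.INFO, handlers=[TqdmLoggingHandler()])
```
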
Notes

My concern is how tqdm will behave in an ephemeral Space; I need to check

alozowski changed pull request status to open
Open LLM Leaderboard org

Is this one reviewable? :)

Open LLM Leaderboard org

Aha, you can review it @clefourrier, I'd appreciate it! :3

Open LLM Leaderboard org

@Wauplin trying to tag you to get the ephemeral space to manifest again XD

Open LLM Leaderboard org

General comments

  • Nice system with the exponential backoff (generic sketch below)
  • Cool work on the type hinting
  • Careful, you removed some docstrings
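
For reference, a generic sketch of the exponential-backoff pattern (names and retry counts are made up, not the PR's actual implementation):

```python
import logging
import random
import time

logger = logging.getLogger(__name__)


def fetch_with_backoff(fetch, max_retries: int = 5, base_delay: float = 1.0):
    """Retry `fetch` with exponentially growing delays (1s, 2s, 4s, ...)
    plus jitter, re-raising on the final failure."""
    for attempt in range(max_retries):
        try:
            return fetch()
        except Exception as exc:
            if attempt == max_retries - 1:
                raise
            delay = base_delay * 2**attempt + random.uniform(0, 0.5)
            logger.warning("Attempt %d failed (%s); retrying in %.1fs", attempt + 1, exc, delay)
            time.sleep(delay)
```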

Specific comments

src/leaderboard/filter_models

Feel free to remove the "flagged models" log

src/leaderboard/read_evals

  • Please revert the change for result_key: the new system with the join is considerably less clear to read/edit if needed.
  • truthfulqa and NaNs: it could be interesting to set any NaN value to 0, no matter the eval; it will also make the code more readable (but add a comment that it's mostly for truthfulqa). See the sketch after this list.
  • l.79: add a comment to explain the system.
  • Add the comments back in extract_results.
  • Nice exception management in update_with_request_file.
  • parse_datetime could go in utils.
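
The NaN suggestion would look roughly like this (function and variable names assumed):

```python
import math


def mean_score(raw_scores: list[float]) -> float:
    """Average scores, treating any NaN as 0.
    Mostly needed for truthfulqa, but applied to every eval
    so the aggregation code stays uniform."""
    accs = [0.0 if math.isnan(v) else v for v in raw_scores]
    return sum(accs) / len(accs)


print(mean_score([0.5, float("nan"), 0.7]))  # 0.4
```
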
Open LLM Leaderboard org

Following new commits that happened in this PR, the ephemeral Space HuggingFaceH4/open_llm_leaderboard-ci-pr-705 has been updated.
(This is an automated message.)

Open LLM Leaderboard org

Following new commits that happened in this PR, the ephemeral Space HuggingFaceH4/open_llm_leaderboard-ci-pr-705 has been updated.
(This is an automated message.)

Open LLM Leaderboard org
edited Apr 30

Don't mind the above commit, it's a WIP one

Open LLM Leaderboard org

It manifested! (@Wauplin this is so random XD)

Open LLM Leaderboard org

Finished with the changes, I'm ready to merge!

Open LLM Leaderboard org

LGTM!

clefourrier changed pull request status to merged
