Thought I'd share this recent idea and workflow for fun!

Workflow for Automating Model Evaluation and Selection

Step 1: Export CSV Data from Another-LLM-LeaderBoards

Go to our Another-LLM-LeaderBoards, click the export CSV data button, and save the file to /tmp/models.csv.
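
Before moving on, it can help to confirm that the export has the columns the next step reads. Here is a minimal sketch using only the standard library; the column list is taken from the script in Step 2, and the helper name `missing_columns` is made up for illustration.

```python
import csv

# Columns the Step 2 script reads; adjust if your export uses other names.
EXPECTED_COLUMNS = ["Model", "Average", "AGIEval", "GPT4All", "TruthfulQA", "Bigbench"]


def missing_columns(csv_path, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from the CSV header."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # The first row of the export is assumed to be the header.
        header = next(csv.reader(handle), [])
    present = {name.strip() for name in header}
    return [name for name in expected if name not in present]
```

Calling `missing_columns("/tmp/models.csv")` should return an empty list when the export matches what the rest of the workflow expects.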

Step 2: Examine CSV Data

Run a script that extracts the model names and benchmark scores from the CSV data and fetches each model's card from the Hugging Face Hub.

import pandas as pd
from huggingface_hub import ModelCard

# Load the CSV data exported in Step 1
df = pd.read_csv('/tmp/models.csv')

# Sort by the 'Average' column (assumed to hold the overall benchmark score), best first
df_sorted = df.sort_values(by='Average', ascending=False)

# Open the file in append mode, so repeated runs add to the existing contents
with open('configurations.txt', 'a') as file:
    # Get model cards for the top 20 entries
    for index, row in df_sorted.head(20).iterrows():
        model_name = row['Model'].strip()
        try:
            card = ModelCard.load(model_name)
        except Exception as error:
            # Missing card or unreachable repo: keep the scores and note the problem
            card = f'(model card unavailable: {error})'
        file.write(f'Model Name: {model_name}\n')
        file.write(f'Scores: {row["Average"]}\n')  # Assuming 'Average' is the benchmark score
        file.write(f'AGIEval: {row["AGIEval"]}\n')
        file.write(f'GPT4All: {row["GPT4All"]}\n')
        file.write(f'TruthfulQA: {row["TruthfulQA"]}\n')
        file.write(f'Bigbench: {row["Bigbench"]}\n')
        file.write(f'Model Card: {card}\n\n')
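
Model cards can be long, and the part that matters here is the merge configuration, if the author included one. The sketch below pulls fenced YAML blocks out of a card's text and keeps those that mention `merge_method`. This is a heuristic, not something the cards guarantee: it assumes the config was pasted as a fenced YAML block, and the helper name `extract_merge_configs` is made up for illustration.

```python
import re

# Build the Markdown fence marker (three backticks) without typing it literally.
FENCE = "`" * 3

# Matches fenced blocks tagged yaml or yml and captures their body.
YAML_BLOCK = re.compile(FENCE + r"ya?ml\s*\n(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)


def extract_merge_configs(card_text):
    """Return YAML blocks from the card text that look like merge configs.

    Heuristic: a block counts as a merge config if it contains 'merge_method'.
    """
    blocks = [match.group(1).strip() for match in YAML_BLOCK.finditer(card_text)]
    return [block for block in blocks if "merge_method" in block]
```

You could pass it `str(card)` for each loaded card and write only the returned blocks to the file, which keeps the prompt in the next step much shorter.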

Step 3: Feed the Discovered Models, Scores and Configurations to LLM-client (shell-gpt)

Feed your local LLM client all the discovered merged models, their benchmark scores and, where found, the configurations used to merge them. Give it an instruction similar to this:

cat configurations.txt | sgpt --chat config "Based on the merged models that are provided here, along with their respective benchmark achievements and the configurations used in merging them, your task is to come up with a new configuration for a new merged model that will outperform all others. In your thought process, argue and reflect on your own choices to improve your thinking process and outcome"
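
If you would rather drive this from Python (handy for automating things later), the same call can be wrapped with `subprocess`. This is a rough sketch: it assumes `sgpt` is installed and configured on your PATH, and the function names are made up for illustration.

```python
import subprocess


def build_sgpt_command(prompt, chat_id="config"):
    """Build the argument list for the sgpt chat call used in the shell one-liner."""
    return ["sgpt", "--chat", chat_id, prompt]


def ask_sgpt(prompt, context="", chat_id="config"):
    """Pipe `context` to sgpt on stdin and return its reply as text.

    Raises subprocess.CalledProcessError if sgpt exits with an error.
    """
    result = subprocess.run(
        build_sgpt_command(prompt, chat_id),
        input=context,          # same role as `cat file | sgpt ...`
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Read the configurations file into a string, pass it as `context`, and pass the instruction above as `prompt`.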

Step 4: (Optional) Reflect on Initial Configuration Suggested by ChatGPT

If you want to get particularly naughty, add a step like this, where you make ChatGPT rethink and reflect on the configuration it initially came up with based on the information you gave it.

for i in $(seq 1 3); do echo "$i" && sgpt --chat config "Repeat the process from before and again reflect and improve on your suggested configuration"; sleep 20; done
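
The same loop can be sketched in Python. Here `ask` is any callable that sends a prompt to your chat session and returns the reply, so that part is an assumption you fill in yourself. Unlike the shell loop, this version skips the pause after the final round.

```python
import time

REFLECT_PROMPT = (
    "Repeat the process from before and again reflect and improve "
    "on your suggested configuration"
)


def reflect(ask, rounds=3, pause_seconds=20, sleep=time.sleep):
    """Call ask(REFLECT_PROMPT) `rounds` times and return every reply in order."""
    replies = []
    for round_number in range(1, rounds + 1):
        replies.append(ask(REFLECT_PROMPT))
        if round_number < rounds:
            # Pause between rounds, as the shell loop does with `sleep 20`.
            sleep(pause_seconds)
    return replies
```

The last element of the returned list is the configuration after the final round of reflection.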

Step 5: Wait for ChatGPT to Give You a LeaderBoard-Topping Merge Configuration

Wait for ChatGPT to return its final merge configuration.

Step 6: Enter the Configuration in Automergekit Notebook

Fire up your automergekit notebook and enter the configuration that ChatGPT just so generously provided.

Step 7: Evaluate the New Merge Using the auto-llm-eval Notebook

Fire up your auto-llm-eval notebook to see whether the merge that ChatGPT came up with actually makes sense and performs well.

Step 8: Repeat the Process

Repeat this process a few times every day, learning from each new model created.

Step 9: Rank the New Number One Model

Rank the new number-one model and top your own LeaderBoard (model: CultriX/MergeCeption-7B-v3).
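
To see quickly where a fresh merge would land, you can compare its average against the exported CSV. A minimal sketch, assuming the export has an 'Average' column as in Step 2; the function name is made up for illustration.

```python
import csv


def rank_on_leaderboard(csv_path, new_average, score_column="Average"):
    """Return the 1-based rank a model scoring `new_average` would take in the CSV."""
    scores = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            try:
                scores.append(float(row[score_column]))
            except (KeyError, TypeError, ValueError):
                continue  # skip rows without a usable score
    # Rank is one more than the number of models that score strictly higher.
    return 1 + sum(score > new_average for score in scores)
```

A return value of 1 means the new merge would top the board.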

Step 10: Automate the Process with a Cronjob

Create a cronjob that runs this process 5 times every day and then learns from the models it has created in order to create even better ones. I'm telling you, you'd better prepare yourself for some non-negligible increases in benchmark scores in the near future.
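
For the schedule itself, here is a small sketch that builds a crontab line with the runs spread evenly over the day. The command you pass in is a placeholder for whatever script wraps the steps above; nothing here is specific to this workflow.

```python
def cron_line(command, runs_per_day=5, minute=0):
    """Return a crontab line that runs `command` at evenly spaced hours."""
    if not 1 <= runs_per_day <= 24:
        raise ValueError("runs_per_day must be between 1 and 24")
    # Spread the runs across 24 hours, rounding each start time down to a whole hour.
    hours = [(i * 24) // runs_per_day for i in range(runs_per_day)]
    hour_field = ",".join(str(hour) for hour in hours)
    return f"{minute} {hour_field} * * * {command}"
```

With the default of 5 runs, the hour field comes out as `0,4,9,14,19`. Paste the resulting line into `crontab -e`.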

Cheers,
CultriX

Space: CultriX/Alt_LLM_LeaderBoard (duplicated from mlabonne/Yet_Another_LLM_Leaderboard)
Auto-generate winning configurations