Auto-generate winning configurations
Thought I'd share this recent idea and workflow for fun!
Workflow for Automating Model Evaluation and Selection
Step 1: Export CSV Data from Another-LLM-LeaderBoards
Go to our Another-LLM-LeaderBoards and click the "export csv data" button. Save the file to /tmp/models.csv.
Step 2: Examine CSV Data
Run a script that extracts the model names, benchmark scores, and model cards for the top entries in the CSV data.
from huggingface_hub import ModelCard
import pandas as pd

# Load the CSV data
df = pd.read_csv('/tmp/models.csv')

# Sort by the 'Average' column (assumed to hold the overall benchmark score), best first
df_sorted = df.sort_values(by='Average', ascending=False)

# Open the output file in append mode
with open('configurations.txt', 'a') as file:
    # Get model cards for the top 20 entries
    for index, row in df_sorted.head(20).iterrows():
        model_name = row['Model'].rstrip()
        try:
            card = ModelCard.load(model_name)
        except Exception as err:
            # e.g. the repo or its README is missing; skip it rather than abort the run
            print(f'Skipping {model_name}: {err}')
            continue
        file.write(f'Model Name: {model_name}\n')
        file.write(f'Scores: {row["Average"]}\n')  # Assuming 'Average' is the benchmark score
        file.write(f'AGIEval: {row["AGIEval"]}\n')
        file.write(f'GPT4All: {row["GPT4All"]}\n')
        file.write(f'TruthfulQA: {row["TruthfulQA"]}\n')
        file.write(f'Bigbench: {row["Bigbench"]}\n')
        file.write(f'Model Card: {card}\n')
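Full model cards are long, and what the LLM mostly needs from them is the merge configuration. Here is a minimal sketch for pulling just that part out of a card, assuming the config appears as a fenced YAML block in the card text (common for merged models, but not guaranteed). The helper name extract_merge_configs is made up for this example.

```python
import re

# Three backticks, built programmatically so this snippet can itself sit in a fenced block.
FENCE = "`" * 3

# Assumption: merge configs are published as fenced yaml/yml blocks inside the card.
YAML_BLOCK = re.compile(FENCE + r"ya?ml\s*\n(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)


def extract_merge_configs(card_text: str) -> list[str]:
    """Return the contents of every fenced YAML block found in a model card."""
    return [block.strip() for block in YAML_BLOCK.findall(card_text)]


if __name__ == "__main__":
    sample = "# My merge\n" + FENCE + "yaml\nmerge_method: slerp\n" + FENCE + "\nSome notes."
    for config in extract_merge_configs(sample):
        print(config)
```

You could pass str(card) from the script above to this helper and write only the returned blocks to the output file, which keeps the prompt in Step 3 much shorter.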
Step 3: Feed the Discovered Models, Scores and Configurations to LLM-client (shell-gpt)
Run your local LLM client and feed it all the discovered merged models, their benchmark scores and, where found, the configurations used to merge them. Give it an instruction similar to this (adjust the input path to wherever you saved the output of Step 2):
cat /tmp/configurations2.txt | sgpt --chat config "Based on the merged models that are provided here, along with their respective benchmark achievements and the configurations used in merging them, your task is to come up with a new configuration for a new merged model that will outperform all others. In your thought process, argue and reflect on your own choices to improve your thinking process and outcome"
Step 4: (Optional) Reflect on Initial Configuration Suggested by ChatGPT
If you want to get particularly naughty, you can add a step like this, where you make ChatGPT rethink and reflect on the configuration it initially came up with based on the information you gave it.
for i in $(seq 1 3); do echo "$i" && sgpt --chat config "Repeat the process from before and again reflect and improve on your suggested configuration"; sleep 20; done
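If you would rather drive Steps 3 and 4 from Python (handy for the automation in Step 10), the same ask-then-reflect loop can be sketched roughly like this. It assumes sgpt is on your PATH and reads piped input the same way as in the shell commands above. The function names run_sgpt and refine are made up for this example.

```python
import subprocess
import time
from typing import Callable

CHAT_ID = "config"  # same chat session name as in the shell commands above
REFLECT_PROMPT = (
    "Repeat the process from before and again reflect and improve "
    "on your suggested configuration"
)


def run_sgpt(prompt: str, stdin_text: str = "") -> str:
    """Call shell-gpt in the shared chat session and return its reply."""
    result = subprocess.run(
        ["sgpt", "--chat", CHAT_ID, prompt],
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def refine(
    initial_prompt: str,
    context: str,
    rounds: int = 3,
    pause_s: float = 20,
    ask: Callable[..., str] = run_sgpt,
) -> list[str]:
    """Ask once with the model data as context, then request `rounds` reflections."""
    replies = [ask(initial_prompt, context)]
    for _ in range(rounds):
        time.sleep(pause_s)  # same pause as the shell loop, to stay clear of rate limits
        replies.append(ask(REFLECT_PROMPT))
    return replies
```

The last element of the returned list is the most refined suggestion. Passing a different ask callable lets you swap in another client or a stub for dry runs.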
Step 5: Wait for ChatGPT to give you a LeaderBoard-topping merge configuration
Wait for ChatGPT to provide a new merge configuration.
Step 6: Enter the Configuration in Automergekit NoteBook
Fire up your automergekit NoteBook and enter the configuration that was just so generously provided to you by ChatGPT.
Step 7: Evaluate the New Merge using auto-llm-eval notebook
Fire up your auto-llm-eval notebook to see whether the merge that ChatGPT came up with actually makes sense and performs well.
Step 8: Repeat the Process
Repeat this process a few times every day, learning from each new model created.
Step 9: Rank the New Number One Model
Rank the new number one model and top your own LeaderBoard: (Model: CultriX/MergeCeption-7B-v3)
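To check where a new merge would land before celebrating, you can compare its average score against the CSV exported in Step 1. This is a small sketch using only the standard library. It assumes the export has an 'Average' column, as in the Step 2 script. The function names load_averages and projected_rank are made up for this example.

```python
import csv


def load_averages(csv_path: str, column: str = "Average") -> list[float]:
    """Read the chosen score column from the exported leaderboard, skipping empty cells."""
    with open(csv_path, newline="") as f:
        return [float(row[column]) for row in csv.DictReader(f) if row.get(column)]


def projected_rank(existing: list[float], new_average: float) -> int:
    """1-based rank the new score would take among the existing scores."""
    return sum(score > new_average for score in existing) + 1


if __name__ == "__main__":
    scores = load_averages("/tmp/models.csv")
    # 0.0 is a placeholder: put the average reported by your own evaluation run here.
    print(projected_rank(scores, new_average=0.0))
```

A projected rank of 1 means the merge beats every model currently in the export.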
Step 10: Automate the Process with Cronjob
Create a cronjob that runs this process 5 times every day, with each run learning from the models created so far in order to create even better ones. I'm telling you: you'd better prepare yourself for some non-negligible increases in benchmark scores in the near future.
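For the schedule itself, here is a small sketch that builds a crontab line with the runs spread evenly over the day. The helper cron_line and the command path are placeholders, not part of the workflow above. Point the command at whatever script wraps your own steps.

```python
def cron_line(runs_per_day: int, command: str, minute: int = 0) -> str:
    """Build a crontab entry that runs `command` at evenly spaced hours."""
    if not 1 <= runs_per_day <= 24:
        raise ValueError("runs_per_day must be between 1 and 24")
    # Spread the runs across 24 hours, rounding down to whole hours.
    hours = sorted({i * 24 // runs_per_day for i in range(runs_per_day)})
    hour_field = ",".join(str(h) for h in hours)
    return f"{minute} {hour_field} * * * {command}"


if __name__ == "__main__":
    # Hypothetical wrapper script; replace with your own.
    print(cron_line(5, "/path/to/run_merge_cycle.sh"))
```

Add the printed line to your crontab with crontab -e.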
Cheers,
CultriX