Update app.py
app.py
CHANGED
@@ -339,6 +339,78 @@ def main():
* Change the `gist_id` in [yall.py](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard/blob/main/yall.py#L126).
* Create "New Secret" in Settings > Variables and secrets (name: "github", value: [your GitHub token](https://github.com/settings/tokens))
A special thanks to [gblazex](https://huggingface.co/gblazex) for providing many evaluations.

# Bonus: Workflow for Automating Model Evaluation and Selection

## Step 1: Export CSV Data from Another-LLM-LeaderBoards
Go to our [Another-LLM-LeaderBoards](https://leaderboards.example.com) and click the export CSV data button. Save the file to `/tmp/models.csv`.
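
Before moving on, it can help to sanity-check the export, for instance with pandas. This is just a quick look and assumes the column names used by the script in Step 2 ('Model', 'Average', 'AGIEval', 'GPT4All', 'TruthfulQA', 'Bigbench'):

```python
import pandas as pd

# Load the exported leaderboard data and confirm the expected columns are present
df = pd.read_csv('/tmp/models.csv')
print(df.columns.tolist())  # e.g. ['Model', 'Average', 'AGIEval', 'GPT4All', 'TruthfulQA', 'Bigbench']
print(df.head())
```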

## Step 2: Examine CSV Data
Run a script that extracts the model names, benchmark scores, and model page links from the CSV data.

```python
from huggingface_hub import ModelCard
import pandas as pd

# Load the CSV data exported from the leaderboard
df = pd.read_csv('/tmp/models.csv')

# Sort the data by the 'Average' benchmark score, best first
df_sorted = df.sort_values(by='Average', ascending=False)

# Open the output file in append mode
with open('/tmp/configurations.txt', 'a') as file:
    # Get model cards for the top 20 entries
    for index, row in df_sorted.head(20).iterrows():
        model_name = row['Model'].rstrip()
        card = ModelCard.load(model_name)
        file.write(f'Model Name: {model_name}\n')
        file.write(f'Scores: {row["Average"]}\n')  # 'Average' is the overall benchmark score
        file.write(f'AGIEval: {row["AGIEval"]}\n')
        file.write(f'GPT4All: {row["GPT4All"]}\n')
        file.write(f'TruthfulQA: {row["TruthfulQA"]}\n')
        file.write(f'Bigbench: {row["Bigbench"]}\n')
        file.write(f'Model Card: {card}\n')
```
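
If your export stores the model as a markdown-style link rather than a bare repo id, a small helper can split it into the model name and the model page link. This is only a sketch and assumes a `[name](url)` cell format; adapt it to whatever your CSV actually contains:

```python
import re

def split_model_cell(cell: str):
    """Split a markdown-style '[org/model](https://huggingface.co/org/model)' cell
    into (model_name, model_page_link); otherwise build the link from the bare id."""
    match = re.match(r'\[([^\]]+)\]\(([^)]+)\)', cell.strip())
    if match:
        return match.group(1), match.group(2)
    name = cell.strip()
    return name, f'https://huggingface.co/{name}'

# Example with a hypothetical cell value
print(split_model_cell('[mlabonne/NeuralBeagle14-7B](https://huggingface.co/mlabonne/NeuralBeagle14-7B)'))
```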

## Step 3: Feed the Discovered Models, Scores and Configurations to LLM-client (shell-gpt)
Run your local LLM client, feeding it all the discovered merged models, their benchmark scores and, where available, the configurations used to merge them. Provide it with an instruction similar to this:

```bash
cat /tmp/configurations.txt | sgpt --chat config "Based on the merged models that are provided here, along with their respective benchmark achievements and the configurations used in merging them, your task is to come up with a new configuration for a new merged model that will outperform all others. In your thought process, argue and reflect on your own choices to improve your thinking process and outcome"
```
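
Note that `--chat config` keeps a persistent chat session named `config`, which is what lets the follow-up prompts in Step 4 continue the same conversation instead of starting from scratch.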

## Step 4: (Optional) Reflect on the Initial Configuration Suggested by ChatGPT
If you want to get particularly naughty, you can add a step like this where you make ChatGPT rethink and reflect on the configuration it initially comes up with, based on the information you gave it.

```bash
for i in $(seq 1 3); do echo "$i" && sgpt --chat config "Repeat the process from before and again reflect and improve on your suggested configuration"; sleep 20; done
```

## Step 5: Wait for ChatGPT to Give You a Leaderboard-topping Merge Configuration
Wait for ChatGPT to provide a new merge configuration.

## Step 6: Enter the Configuration in the Automergekit Notebook
Fire up your automergekit notebook and enter the configuration that was just so generously provided to you by ChatGPT.
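
For reference, here is a rough sketch of what such a merge configuration typically looks like for mergekit; the models, merge method and parameter values below are placeholders for illustration only, not the configuration ChatGPT will give you:

```bash
cat > config.yaml << 'EOF'
# Illustrative slerp merge of two placeholder 7B models
slices:
  - sources:
      - model: mlabonne/NeuralHermes-2.5-Mistral-7B
        layer_range: [0, 32]
      - model: openchat/openchat-3.5-0106
        layer_range: [0, 32]
merge_method: slerp
base_model: mlabonne/NeuralHermes-2.5-Mistral-7B
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
EOF
```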

## Step 7: Evaluate the New Merge Using the auto-llm-eval Notebook
Fire up your auto-llm-eval notebook to see whether the merge that ChatGPT came up with actually makes sense and performs well.
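
Once the evaluation finishes, you can quickly check how the new merge would rank against the models you exported in Step 1. The score below is a placeholder to be filled in with your own auto-llm-eval result:

```python
import pandas as pd

df = pd.read_csv('/tmp/models.csv')

# Placeholder average score for the new merge; replace with your actual result
new_average = 0.0
outranked = (df['Average'] < new_average).sum()
print(f"The new merge would outrank {outranked} of {len(df)} models on 'Average'.")
```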

## Step 8: Repeat the Process
Repeat this process a few times every day, learning from each new model created.

## Step 9: Rank the New Number One Model
Rank the new number one model and top your own leaderboard (model: CultriX/MergeCeption-7B-v3).

## Step 10: Automate the Process with a Cronjob
Create a cronjob that automates this process five times every day, so that it keeps learning from the models it has created in order to produce even better ones. I'm telling you, you had better prepare yourself for some non-negligible increases in benchmark scores in the near future.
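
A sketch of what that could look like, assuming you have wrapped Steps 1-7 into a hypothetical `automerge_pipeline.sh` script; the schedule below runs it five times a day:

```bash
# Add to your crontab with `crontab -e` (paths and script name are placeholders)
0 0,5,10,15,20 * * * /home/user/automerge_pipeline.sh >> /home/user/automerge.log 2>&1
```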

Cheers,
CultriX
''')