SerialKicked committed
Commit
7e65b3f
1 Parent(s): f6fde87

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -35,7 +35,7 @@ Simply put, I'm making my methodology to evaluate RP models public. While none o
 
 ### DoggoEval
 
-The goal of this test featuring Rex (a dog), and his master (EsKa) is to determine if a model is good at obeying a system prompt and character card. The trick being that dogs can't talk, but LLM love to.
+The goal of this test, featuring a dog (Rex) and his owner (EsKa), is to determine if a model is good at obeying a system prompt and character card. The trick being that dogs can't talk, but LLM love to.
 
 - [Results and discussions are hosted in this thread](https://huggingface.co/SerialKicked/ModelTestingBed/discussions/1) ([old thread here](https://huggingface.co/LWDCLS/LLM-Discussions/discussions/13))
 - [Files, cards and settings can be found here](https://huggingface.co/SerialKicked/ModelTestingBed/tree/main/DoggoEval)
@@ -51,6 +51,6 @@ TODO: The goal of this test is to check if a model is able of following a very s
 
 # Limitations
 
-I'm testing for things I'm interested in. Do not ask for ERP-specific tests. I do not pretend any of this is very scientific or accurate: as much as I try to reduce the amount of variables, a small LLM is still a small LLM at the end of the day. The results for other seeds, or with the smallest of change, are bound to give very different results.
+I'm testing for things I'm interested in. I do not pretend any of this is very scientific or accurate: as much as I try to reduce the amount of variables, a small LLM is still a small LLM at the end of the day. The results for other seeds, or with the smallest of change, are bound to give very different results.
 
 I usually give the different models I'm testing a fair shake in a more casual settings. I regen tons of outputs with random seeds, and while there are (large) variations, it tends to even out to the results shown in testing. Otherwise I'll make a note of it.