doberst committed on
Commit
d7582af
1 Parent(s): 11dacd8

Update README.md

Files changed (1)
  1. README.md +7 -4
README.md CHANGED
@@ -18,14 +18,17 @@ without using any advanced quantization optimizations.
  Evaluated against the benchmark test: [RAG-Instruct-Benchmark-Tester](https://www.huggingface.co/datasets/llmware/rag_instruct_benchmark_tester)
  Average of 2 Test Runs with 1 point for correct answer, 0.5 point for partial correct or blank / NF, 0.0 points for incorrect, and -1 points for hallucinations.
 
- --**Accuracy Score**: **80.25** correct out of 100
- --Not Found Classification: 40.0%
- --Boolean: 41.5%
- --Math/Logic: 7.5%
+ --**Accuracy Score**: **89.0** correct out of 100
+ --Not Found Classification: 57.5%
+ --Boolean: 57.5%
+ --Math/Logic: 25%
  --Complex Questions (1-5): 1 (Low)
  --Summarization Quality (1-5): 3 (Coherent, extractive)
  --Hallucinations: No hallucinations observed in test runs.
 
+ Please note that these scores have been revised upward from the original results, after we corrected a small bug in the test inference script for this model.
+ The corrected test results are in the files of this repo, and were generated with the test scripts in the repo.
+
  For test run results (and a good indicator of target use cases), please see the files ("core_rag_test" and "answer_sheet") in this repo.
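For reference, the scoring rule described in the diff above (1 point for a correct answer, 0.5 for partial correct or blank/not-found, 0 for incorrect, -1 for a hallucination, averaged over two test runs) can be sketched in Python. This is a minimal illustration only; the grade labels and the run compositions below are made up and do not reproduce the published scores:

```python
# Sketch of the benchmark scoring rule: per-answer points,
# summed per run, then averaged across test runs.
POINTS = {
    "correct": 1.0,        # fully correct answer
    "partial": 0.5,        # partially correct, blank, or not-found
    "incorrect": 0.0,      # wrong answer
    "hallucination": -1.0, # fabricated answer is penalized
}

def score_run(grades):
    """Total points for one test run of graded answers."""
    return sum(POINTS[g] for g in grades)

def average_score(runs):
    """Average the per-run totals across all test runs."""
    return sum(score_run(r) for r in runs) / len(runs)

# Hypothetical grading of two 100-question runs (illustrative only):
run1 = ["correct"] * 80 + ["partial"] * 10 + ["incorrect"] * 10
run2 = ["correct"] * 82 + ["partial"] * 8 + ["incorrect"] * 10
print(average_score([run1, run2]))  # 85.5
```

A hallucinated answer costs a full point, so under this rule it is worse than simply answering incorrectly or leaving the answer blank.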