Update README.md
README.md
CHANGED
@@ -116,11 +116,10 @@ The ICE methodology provides metrics for Usefulness and Functional Correctness a
 * Functional Correctness: An LLM that has complex reasoning capabilities is utilized to conduct unit tests while considering the given question and the reference code.
 
 
-We utilized GPT4 to measure the above metrics and provide a score from 0-4. This is the [test dataset](https://drive.google.com/file/d/1R6DDyBhcR6TSUYFTgUosJxrvibkR1BHC/view) and
+We utilized GPT4 to measure the above metrics and provide a score from 0-4. This is the [test dataset](https://drive.google.com/file/d/1R6DDyBhcR6TSUYFTgUosJxrvibkR1BHC/view) and
+[Jupyter notebook](https://colab.research.google.com/drive/1USuNLFxLex-C5tLHYET_nQfpM4ALCbc5?usp=sharing) we used to perform the benchmark.
 
-You can read more about the ICE methodology in this paper.
-
-[https://openreview.net/pdf?id=RoGZaCsGUW]
+You can read more about the ICE methodology in this [paper](https://openreview.net/pdf?id=RoGZaCsGUW)
 
 
 | Model Name | Usefulness (0 - 4) | Functional Correctness (0 - 4) |