iterateai/Interplay-AppCoder


iterateai committed on Oct 31, 2023

Commit 042f5d0 · 1 Parent(s): 29c0b27

Update README.md

Files changed (1)

README.md +3 -4


@@ -116,11 +116,10 @@ The ICE methodology provides metrics for Usefulness and Functional Correctness a
 * Functional Correctness: An LLM that has complex reasoning capabilities is utilized to conduct unit tests while considering the given question and the reference code.
-We utilized GPT4 to measure the above metrics and provide a score from 0-4. This is the [test dataset](https://drive.google.com/file/d/1R6DDyBhcR6TSUYFTgUosJxrvibkR1BHC/view) and [Jupyter notebook] (https://colab.research.google.com/drive/1USuNLFxLex-C5tLHYET_nQfpM4ALCbc5?usp=sharing#scrollTo=lNCZTBj1nBsJ) we used to perform the benchmark.
-You can read more about the ICE methodology in this paper.
-[https://openreview.net/pdf?id=RoGZaCsGUW]
+We utilized GPT4 to measure the above metrics and provide a score from 0-4. This is the [test dataset](https://drive.google.com/file/d/1R6DDyBhcR6TSUYFTgUosJxrvibkR1BHC/view) and
+[Jupyter notebook](https://colab.research.google.com/drive/1USuNLFxLex-C5tLHYET_nQfpM4ALCbc5?usp=sharing) we used to perform the benchmark.
+You can read more about the ICE methodology in this [paper](https://openreview.net/pdf?id=RoGZaCsGUW)
 | Model Name	| Usefulness (0 - 4) | Functional Correctness (0 - 4) |