add an example and readme
- README.md +43 -1
- lingo_judge_metric.py +11 -8
README.md
CHANGED
@@ -7,6 +7,48 @@ sdk: gradio
 sdk_version: 4.29.0
 app_file: app.py
 pinned: false
+tags:
+- evaluate
+- metric
 ---
 
-
+## Metric Description
+
+Lingo-Judge is an evaluation metric that aligns closely with human judgement on the LingoQA evaluation suite.
+
+See the project's README at [LingoQA](https://github.com/wayveai/LingoQA) for more information.
+
+## How to use
+
+This metric requires questions, predictions and references as inputs.
+
+```python
+>>> metric = evaluate.load("maysonma/lingo_judge_metric")
+>>> questions = ["Are there any traffic lights present? If yes, what is their color?"]
+>>> references = [["Yes, green."]]
+>>> predictions = ["No."]
+>>> results = metric.compute(questions=questions, predictions=predictions, references=references)
+>>> print(results)
+[-3.38348388671875]
+```
+
+### Inputs
+
+- **questions** (`list` of `str`): Input questions.
+- **predictions** (`list` of `str`): Model predictions.
+- **references** (`list` of `list` of `str`): Multiple references per question.
+
+### Output Values
+
+- **scores** (`list` of `float`): Score indicating truthfulness.
+
+## Citation
+
+```bibtex
+@article{marcu2023lingoqa,
+  title={LingoQA: Video Question Answering for Autonomous Driving},
+  author={Ana-Maria Marcu and Long Chen and Jan Hünermann and Alice Karnsund and Benoit Hanotte and Prajwal Chidananda and Saurabh Nair and Vijay Badrinarayanan and Alex Kendall and Jamie Shotton and Oleg Sinavski},
+  journal={arXiv preprint arXiv:2312.14115},
+  year={2023},
+}
+```
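For readers skimming the diff, the same usage pattern can be written as a short standalone sketch that also exercises the nested `references` format described under Inputs. It assumes the `evaluate` library is installed and that the metric loads by the repo ID shown in the README example; the second reference phrasing is illustrative only and is not part of this commit.

```python
# Sketch based on the README example above; assumptions are noted in comments.
import evaluate

# Repo ID taken from the README example.
metric = evaluate.load("maysonma/lingo_judge_metric")

questions = ["Are there any traffic lights present? If yes, what is their color?"]
predictions = ["Yes, they are green."]
# `references` is a list of lists: each question may carry several acceptable
# answers. The second phrasing here is illustrative, not from the commit.
references = [["Yes, green.", "Yes, the traffic light is green."]]

results = metric.compute(
    questions=questions, predictions=predictions, references=references
)
print(results)  # one float per prediction; the examples suggest positive
                # values when the answer agrees with the reference
```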
lingo_judge_metric.py
CHANGED
@@ -33,14 +33,17 @@ Returns:
 `scores` (list of float): Score indicating truthfulness.
 
 Examples:
-
-
-
-
-
-
-
-
+>>> metric = evaluate.load("maysonma/lingo_judge_metric")
+>>> questions = ["Are there any traffic lights present? If yes, what is their color?"]
+>>> references = [["Yes, green."]]
+>>> predictions = ["No."]
+>>> results = metric.compute(questions=questions, predictions=predictions, references=references)
+>>> print(results)
+[-3.38348388671875]
+>>> predictions = ["Yes, they are green."]
+>>> results = metric.compute(questions=questions, predictions=predictions, references=references)
+>>> print(results)
+[2.818930149078369]
 """
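The two doctest outputs added above (roughly -3.4 for the mismatched answer and +2.8 for the matching one) suggest that the sign of the score separates incorrect from correct answers. Below is a minimal sketch of aggregating a batch of raw scores into a pass rate using a zero threshold; the threshold choice is an assumption drawn from those example values, not something this commit specifies.

```python
# Sketch only: aggregate raw Lingo-Judge scores into a pass rate.
# Treating score > 0.0 as "correct" is an assumption based on the sign of the
# two docstring example values; the commit itself does not define a threshold.
from typing import List


def judge_pass_rate(scores: List[float], threshold: float = 0.0) -> float:
    """Return the fraction of predictions whose judge score exceeds `threshold`."""
    if not scores:
        return 0.0
    return sum(score > threshold for score in scores) / len(scores)


print(judge_pass_rate([-3.38348388671875, 2.818930149078369]))  # 0.5
```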