maysonma committed
Commit 5e64746 · 1 Parent(s): af81765

add an example and readme

Files changed (2)
  1. README.md +43 -1
  2. lingo_judge_metric.py +11 -8
README.md CHANGED
@@ -7,6 +7,48 @@ sdk: gradio
 sdk_version: 4.29.0
 app_file: app.py
 pinned: false
+tags:
+- evaluate
+- metric
 ---
 
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+## Metric Description
+
+Lingo-Judge is an evaluation metric that aligns closely with human judgement on the LingoQA evaluation suite.
+
+See the project's README at [LingoQA](https://github.com/wayveai/LingoQA) for more information.
+
+## How to use
+
+This metric requires questions, predictions and references as inputs.
+
+```python
+>>> metric = evaluate.load("maysonma/lingo_judge_metric")
+>>> questions = ["Are there any traffic lights present? If yes, what is their color?"]
+>>> references = [["Yes, green."]]
+>>> predictions = ["No."]
+>>> results = metric.compute(questions=questions, predictions=predictions, references=references)
+>>> print(results)
+[-3.38348388671875]
+```
+
+### Inputs
+
+- **questions** (`list` of `str`): Input questions.
+- **predictions** (`list` of `str`): Model predictions.
+- **references** (`list` of `list` of `str`): Multiple references per question.
+
+### Output Values
+
+- **scores** (`list` of `float`): Score indicating truthfulness.
+
+## Citation
+
+```bibtex
+@article{marcu2023lingoqa,
+  title={LingoQA: Video Question Answering for Autonomous Driving},
+  author={Ana-Maria Marcu and Long Chen and Jan Hünermann and Alice Karnsund and Benoit Hanotte and Prajwal Chidananda and Saurabh Nair and Vijay Badrinarayanan and Alex Kendall and Jamie Shotton and Oleg Sinavski},
+  journal={arXiv preprint arXiv:2312.14115},
+  year={2023},
+}
+```
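
The new "How to use" example scores a single question. A minimal sketch of running a small batch, assuming `compute` accepts parallel lists of equal length, as the new docstring examples suggest:

```python
import evaluate  # pip install evaluate

# Load the judge; this downloads the metric script from the Hub.
metric = evaluate.load("maysonma/lingo_judge_metric")

# One prediction and one list of references per question;
# all three lists must be the same length.
questions = [
    "Are there any traffic lights present? If yes, what is their color?",
    "Are there any traffic lights present? If yes, what is their color?",
]
predictions = ["No.", "Yes, they are green."]
references = [["Yes, green."], ["Yes, green."]]

scores = metric.compute(
    questions=questions, predictions=predictions, references=references
)
print(scores)  # one float per question; per the examples in this commit,
               # the contradicted answer scores negative, the match positive
```
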
lingo_judge_metric.py CHANGED
@@ -33,14 +33,17 @@ Returns:
 `scores` (list of float): Score indicating truthfulness.
 
 Examples:
-
-Example 1
->>> metric = evaluate.load("maysonma/lingo_judge_metric")
->>> questions = ["Are there any traffic lights present? If yes, what is their color?"]
->>> references = [["Yes, green."]]
->>> predictions = ["No."]
->>> results = metric.compute(questions=questions, predictions=predictions, references=references)
->>> print(results)
+>>> metric = evaluate.load("maysonma/lingo_judge_metric")
+>>> questions = ["Are there any traffic lights present? If yes, what is their color?"]
+>>> references = [["Yes, green."]]
+>>> predictions = ["No."]
+>>> results = metric.compute(questions=questions, predictions=predictions, references=references)
+>>> print(results)
+[-3.38348388671875]
+>>> predictions = ["Yes, they are green."]
+>>> results = metric.compute(questions=questions, predictions=predictions, references=references)
+>>> print(results)
+[2.818930149078369]
 """
 
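
In both docstring examples the judge returns a raw score that is negative for the contradicted answer and positive for the matching one, which suggests the scores are logits. A minimal sketch of turning such scores into a pass rate and per-sample confidences, assuming a score above 0 (sigmoid probability above 0.5) counts as correct (that threshold is an assumption, not something this commit specifies):

```python
import math

def judge_probability(score: float) -> float:
    """Map a raw judge score to a 0-1 confidence via the sigmoid."""
    return 1.0 / (1.0 + math.exp(-score))

def judge_accuracy(scores: list[float], threshold: float = 0.0) -> float:
    """Fraction of predictions judged correct (score > threshold).

    Assumes raw scores are logits, so score > 0 means sigmoid > 0.5.
    """
    return sum(s > threshold for s in scores) / len(scores)

# The two scores from the docstring examples in this commit:
scores = [-3.38348388671875, 2.818930149078369]
print(judge_accuracy(scores))                            # 0.5
print([round(judge_probability(s), 3) for s in scores])  # [0.033, 0.944]
```
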