---
title: TREC Eval
emoji: 🤗
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
tags:
- evaluate
- metric
description: >-
The TREC Eval metric combines a number of information retrieval metrics such as precision and nDCG. It is used to score rankings of retrieved documents with reference values.
---
# Metric Card for TREC Eval
## Metric Description
The TREC Eval metric combines a number of information retrieval metrics such as precision and normalized Discounted Cumulative Gain (nDCG). It is used to score rankings of retrieved documents with reference values.
## How to Use
```python
from evaluate import load
trec_eval = load("trec_eval")
results = trec_eval.compute(predictions=[run], references=[qrel])
```
### Inputs
- **predictions** *(dict): a single retrieval run (see the sketch after this list).*
- **query** *(int): Query ID.*
- **q0** *(str): Literal `"q0"`.*
- **docid** *(str): Document ID.*
- **rank** *(int): Rank of document.*
- **score** *(float): Score of document.*
- **system** *(str): Tag for current run.*
- **references** *(dict): a single qrel.*
- **query** *(int): Query ID.*
- **q0** *(str): Literal `"q0"`.*
- **docid** *(str): Document ID.*
- **rel** *(int): Relevance of document.*
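
Both `predictions` and `references` use a columnar, dict-of-lists layout with one entry per retrieved or judged document. As a minimal sketch, a run dict can be assembled from per-query rankings; the `rankings` data and the `"my_run"` tag below are made up for illustration:

```python
# Hypothetical ranked results per query: {query_id: [(docid, score), ...]},
# already sorted by descending score.
rankings = {
    0: [("doc_2", 1.5), ("doc_1", 1.2)],
    1: [("doc_3", 0.9)],
}

run = {"query": [], "q0": [], "docid": [], "rank": [], "score": [], "system": []}
for query_id, ranked_docs in rankings.items():
    for rank, (docid, score) in enumerate(ranked_docs):
        run["query"].append(query_id)
        run["q0"].append("q0")
        run["docid"].append(docid)
        run["rank"].append(rank)
        run["score"].append(score)
        run["system"].append("my_run")  # tag identifying this run
```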
### Output Values
- **runid** *(str): Run name.*
- **num_ret** *(int): Number of retrieved documents.*
- **num_rel** *(int): Number of relevant documents.*
- **num_rel_ret** *(int): Number of retrieved relevant documents.*
- **num_q** *(int): Number of queries.*
- **map** *(float): Mean average precision.*
- **gm_map** *(float): Geometric mean average precision.*
- **bpref** *(float): Binary preference score.*
- **Rprec** *(float): Precision@R, where R is the number of relevant documents.*
- **recip_rank** *(float): Reciprocal rank.*
- **P@k** *(float): Precision@k (k in [5, 10, 15, 20, 30, 100, 200, 500, 1000]).*
- **NDCG@k** *(float): nDCG@k (k in [5, 10, 15, 20, 30, 100, 200, 500, 1000]).*
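
All values are returned in a single flat dictionary, so the per-cutoff scores are read with string keys such as `"P@5"` or `"NDCG@10"`. A short sketch, continuing from the `results` object computed in the snippet under "How to Use":

```python
# `results` is the dict returned by trec_eval.compute(...) above.
print(results["map"], results["recip_rank"])

# Precision and nDCG at each cutoff share the same key pattern.
for k in (5, 10, 100, 1000):
    print(f"P@{k} = {results[f'P@{k}']:.4f}  NDCG@{k} = {results[f'NDCG@{k}']:.4f}")
```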
### Examples
A minimal example looks as follows:
```python
import evaluate

qrel = {
    "query": [0],
    "q0": ["q0"],
    "docid": ["doc_1"],
    "rel": [2]
}
run = {
    "query": [0, 0],
    "q0": ["q0", "q0"],
    "docid": ["doc_2", "doc_1"],
    "rank": [0, 1],
    "score": [1.5, 1.2],
    "system": ["test", "test"]
}

trec_eval = evaluate.load("trec_eval")
results = trec_eval.compute(references=[qrel], predictions=[run])
print(results["P@5"])  # 0.2
```
A more realistic use case with an example from [`trectools`](https://github.com/joaopalotti/trectools):
```python
import evaluate
import pandas as pd

qrel = pd.read_csv("robust03_qrels.txt", sep=r"\s+", names=["query", "q0", "docid", "rel"])
qrel["q0"] = qrel["q0"].astype(str)
qrel = qrel.to_dict(orient="list")

run = pd.read_csv("input.InexpC2", sep=r"\s+", names=["query", "q0", "docid", "rank", "score", "system"])
run = run.to_dict(orient="list")

trec_eval = evaluate.load("trec_eval")
result = trec_eval.compute(predictions=[run], references=[qrel])
```
```python
result
{'runid': 'InexpC2',
'num_ret': 100000,
'num_rel': 6074,
'num_rel_ret': 3198,
'num_q': 100,
'map': 0.22485930431817494,
'gm_map': 0.10411523825735523,
'bpref': 0.217511695914079,
'Rprec': 0.2502547201167236,
'recip_rank': 0.6646545943335417,
'P@5': 0.44,
'P@10': 0.37,
'P@15': 0.34600000000000003,
'P@20': 0.30999999999999994,
'P@30': 0.2563333333333333,
'P@100': 0.1428,
'P@200': 0.09510000000000002,
'P@500': 0.05242,
'P@1000': 0.03198,
'NDCG@5': 0.4101480395089769,
'NDCG@10': 0.3806761417784469,
'NDCG@15': 0.37819463408955706,
'NDCG@20': 0.3686080836061317,
'NDCG@30': 0.352474353427451,
'NDCG@100': 0.3778329431025776,
'NDCG@200': 0.4119129817248979,
'NDCG@500': 0.4585354576461375,
'NDCG@1000': 0.49092149290805653}
```
## Limitations and Bias
The `trec_eval` metric requires predictions to follow the TREC run format and references to follow the TREC qrels format.
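
For reference, a minimal sketch of round-tripping between a whitespace-separated TREC run file and the dict-of-lists layout expected by `compute` (the file name `my_run.txt` is made up for illustration):

```python
import pandas as pd

# A TREC run file has one whitespace-separated line per retrieved document:
# query q0 docid rank score system
run = {
    "query": [0, 0],
    "q0": ["q0", "q0"],
    "docid": ["doc_2", "doc_1"],
    "rank": [0, 1],
    "score": [1.5, 1.2],
    "system": ["test", "test"],
}

# Write the run to a TREC-style file ...
pd.DataFrame(run).to_csv("my_run.txt", sep=" ", header=False, index=False)

# ... and read it back into the layout expected by `compute`.
run_back = pd.read_csv(
    "my_run.txt",
    sep=r"\s+",
    names=["query", "q0", "docid", "rank", "score", "system"],
).to_dict(orient="list")
```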
## Citation
```bibtex
@inproceedings{palotti2019,
author = {Palotti, Joao and Scells, Harrisen and Zuccon, Guido},
title = {TrecTools: an open-source Python library for Information Retrieval practitioners involved in TREC-like campaigns},
series = {SIGIR'19},
year = {2019},
location = {Paris, France},
publisher = {ACM}
}
```
## Further References
- Homepage: https://github.com/joaopalotti/trectools