File size: 1,532 Bytes
680f43b
 
 
3e8abc2
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
---
license: mit
---
# Translation Tables for Probablistic Structured Queries

This repository contains the raw translation tables for tha package [`fast_psq`](https://github.com/hltcoe/PSQ). 
Please refer to the GitHub for more information. 
The following is a brief example for using the tables. 

## Get started

`fast_psq` is available on PyPI.
```bash
pip install fast_psq ir_datasets ir_measures
```

The following is an example indexing command.
```bash
python -m fast_psq.index \
--doc_file irds:neuclir/1/zh/trec-2022 \
--lang zh \
--psq_file hltcoe/psq_translation_tables:zh.table.dict.gz \
--min_translation_prob 0.00010 \
--max_translation_alternatives 64 \
--max_translation_cdf 0.99 \
--docid doc_id \
--title title \
--body text \
--min_translation_prob 1e-4 \
--max_translation_alternatives 64 \
--output_dir ./indexes/neuclir-zh.f32/ \
--compression \
--nworkers 64
```

The following command is an example for searching. 
```bash
python -m fast_psq.search \
--query_source irds:neuclir/1/zh/trec-2022 \
--query_field title \
--index_dir ./indexes/neuclir-zh.f32/ \
--qrels irds:neuclir/1/zh/trec-2022 \
--query_lang en \
--output_file ./neuclir-zh.en.title.f32.trec
```


## Citation

```bibtex
@article{psq-repro,
    title = {Efficiency-Effectiveness Tradeoff of Probabilistic Structured Queries for Cross-Language Information Retrieval},
    author = {Eugene Yang and Suraj Nair and Dawn Lawrie and James Mayfield and Douglas W. Oard and Kevin Duh},
    journal = {arXiv preprint arXiv},
    year = {2024}
}
```