eugene-yang
committed on
Commit
•
3e8abc2
1
Parent(s):
8203834
update README
README.md
CHANGED
@@ -1,3 +1,57 @@
---
license: mit
---
# Translation Tables for Probabilistic Structured Queries

This repository contains the raw translation tables for the package [`fast_psq`](https://github.com/hltcoe/PSQ).
Please refer to the GitHub repository for more information.
The following is a brief example of using the tables.

## Get started

`fast_psq` is available on PyPI.
```bash
pip install fast_psq ir_datasets ir_measures
```

The following is an example indexing command.
```bash
python -m fast_psq.index \
  --doc_file irds:neuclir/1/zh/trec-2022 \
  --lang zh \
  --psq_file hltcoe/psq_translation_tables:zh.table.dict.gz \
  --min_translation_prob 0.00010 \
  --max_translation_alternatives 64 \
  --max_translation_cdf 0.99 \
  --docid doc_id \
  --title title \
  --body text \
  --output_dir ./indexes/neuclir-zh.f32/ \
  --compression \
  --nworkers 64
```
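
This README does not describe the three pruning flags used above. The sketch below shows one plausible reading, assuming that `--min_translation_prob` drops low-probability translations, `--max_translation_alternatives` caps how many translations are kept per source term, and `--max_translation_cdf` stops adding translations once their cumulative probability reaches the threshold. The helper name `prune_translations`, the order in which the filters apply, the final renormalization, and the toy table are all hypothetical and are not taken from `fast_psq`.

```python
from typing import Dict


def prune_translations(
    alternatives: Dict[str, float],
    min_prob: float = 1e-4,
    max_alternatives: int = 64,
    max_cdf: float = 0.99,
) -> Dict[str, float]:
    """Prune one source term's translation distribution (hypothetical helper)."""
    # Consider the most probable translations first.
    ranked = sorted(alternatives.items(), key=lambda kv: kv[1], reverse=True)

    kept: Dict[str, float] = {}
    cumulative = 0.0
    for target, prob in ranked:
        # Stop at the first translation below the probability floor, once the
        # cap on alternatives is reached, or once enough mass is covered.
        if prob < min_prob or len(kept) >= max_alternatives or cumulative >= max_cdf:
            break
        kept[target] = prob
        cumulative += prob

    # Renormalize so the surviving probabilities sum to 1 (an assumption).
    total = sum(kept.values())
    return {t: p / total for t, p in kept.items()} if total > 0 else {}


# Toy distribution for a single source term; the values are made up.
toy = {"bank": 0.60, "shore": 0.30, "bench": 0.095, "river": 0.00495, "noise": 0.00005}
pruned = prune_translations(toy)
print(pruned)
```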

The following is an example search command.
```bash
python -m fast_psq.search \
  --query_source irds:neuclir/1/zh/trec-2022 \
  --query_field title \
  --index_dir ./indexes/neuclir-zh.f32/ \
  --qrels irds:neuclir/1/zh/trec-2022 \
  --query_lang en \
  --output_file ./neuclir-zh.en.title.f32.trec
```
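
The `.trec` extension of the output file suggests the standard six-column TREC run format (`query_id Q0 doc_id rank score run_tag`, whitespace separated). That is an assumption, since this README does not state the format. Under that assumption, the following minimal sketch loads a run and lists the top documents for a query. The helpers `load_run` and `top_k` and the sample lines are hypothetical and are not part of `fast_psq`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def load_run(lines: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Parse six-column TREC run lines into {query_id: {doc_id: score}}."""
    run: Dict[str, Dict[str, float]] = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        query_id, _q0, doc_id, _rank, score, _tag = parts
        run[query_id][doc_id] = float(score)
    return dict(run)


def top_k(
    run: Dict[str, Dict[str, float]], query_id: str, k: int = 10
) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (doc_id, score) pairs for one query."""
    ranked = sorted(run[query_id].items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]


# Made-up run lines, used here in place of a real output file.
sample = [
    "200 Q0 doc-a 1 12.5 psq",
    "200 Q0 doc-b 2 9.75 psq",
    "201 Q0 doc-c 1 3.0 psq",
]
run = load_run(sample)
print(top_k(run, "200", k=1))

# To read the file written by the search command instead (assumed format):
# with open("./neuclir-zh.en.title.f32.trec") as f:
#     run = load_run(f)
```

The `ir_measures` package installed earlier ships its own run reader and metric implementations, which are likely the better choice for actual evaluation.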

## Citation

```bibtex
@article{psq-repro,
  title = {Efficiency-Effectiveness Tradeoff of Probabilistic Structured Queries for Cross-Language Information Retrieval},
  author = {Eugene Yang and Suraj Nair and Dawn Lawrie and James Mayfield and Douglas W. Oard and Kevin Duh},
  journal = {arXiv preprint arXiv},
  year = {2024}
}
```