---
license: apache-2.0
library_name: transformers
pipeline_tag: text2text-generation

inference:
  parameters:
    do_sample: true
    max_length: 64
    top_k: 10
    temperature: 1
    num_return_sequences: 10
widget:
  - text: >-
      Generate a Japanese question for this passage: Transformer (machine learning model) A transformer is a deep learning model that adopts the mechanism of self-attention, differentially weighting the significance of each part of the input (which includes the recursive output) data.

  - text: >-
      Generate a Arabic question for this passage: Transformer (machine learning model) A transformer is a deep learning model that adopts the mechanism of self-attention, differentially weighting the significance of each part of the input (which includes the recursive output) data.
---

## Model description

An mT5-large query generation model trained on XOR QA data.

It is used in the papers [Bridging the Gap Between Indexing and Retrieval for Differentiable Search Index with Query Generation](https://arxiv.org/pdf/2206.10128.pdf) and [Augmenting Passage Representations with Query Generation for Enhanced Cross-Lingual Dense Retrieval](https://arxiv.org/pdf/2305.03950.pdf).

### How to use
```python
from transformers import pipeline

# Language codes mapped to the language names used in the prompt.
lang2mT5 = dict(
    ar='Arabic',
    bn='Bengali',
    fi='Finnish',
    ja='Japanese',
    ko='Korean',
    ru='Russian',
    te='Telugu'
)
PROMPT = 'Generate a {lang} question for this passage: {title} {passage}'

title = 'Transformer (machine learning model)'
passage = 'A transformer is a deep learning model that adopts the mechanism of self-attention, differentially ' \
          'weighting the significance of each part of the input (which includes the recursive output) data.'


model_name_or_path = 'ielabgroup/xor-tydi-docTquery-mt5-large'
# Build the prompt for a Japanese question.
input_text = PROMPT.format_map({'lang': lang2mT5['ja'],
                                'title': title,
                                'passage': passage})

generator = pipeline(model=model_name_or_path,
                     task='text2text-generation',
                     device="cuda:0",  # use "cpu" if no GPU is available
                     )

# Sample 10 candidate questions for the passage.
results = generator(input_text,
                    do_sample=True,
                    max_length=64,
                    num_return_sequences=10,
                    )

for i, result in enumerate(results):
    print(f'{i + 1}. {result["generated_text"]}')
```
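
Because the prompt template is the same for every supported language and sampling can return the same question more than once, a small amount of glue code may be useful around the pipeline call. The following is a minimal sketch, not part of the released code: `build_prompts` and `unique_queries` are hypothetical helper names, and the shortened passage is only for illustration. It needs no model download, since it only prepares inputs and post-processes outputs.

```python
# Language codes and names copied from the example above.
LANG2MT5 = {
    'ar': 'Arabic',
    'bn': 'Bengali',
    'fi': 'Finnish',
    'ja': 'Japanese',
    'ko': 'Korean',
    'ru': 'Russian',
    'te': 'Telugu',
}
PROMPT = 'Generate a {lang} question for this passage: {title} {passage}'


def build_prompts(title, passage, lang_codes=None):
    """Return {lang_code: prompt} for the requested languages (hypothetical helper)."""
    codes = list(LANG2MT5) if lang_codes is None else list(lang_codes)
    return {
        code: PROMPT.format(lang=LANG2MT5[code], title=title, passage=passage)
        for code in codes
    }


def unique_queries(results):
    """Drop repeated samples, keeping first-seen order (hypothetical helper).

    `results` is a list of dicts with a "generated_text" key, the same
    shape the printing loop in the example above iterates over.
    """
    seen = set()
    unique = []
    for result in results:
        text = result['generated_text'].strip()
        if text and text not in seen:
            seen.add(text)
            unique.append(text)
    return unique


# Build prompts for two of the supported languages and show them.
prompts = build_prompts(
    title='Transformer (machine learning model)',
    passage='A transformer is a deep learning model that adopts the mechanism of self-attention.',
    lang_codes=['ja', 'ar'],
)
for code, prompt in prompts.items():
    print(f'{code}: {prompt}')
```

With the `generator` pipeline from the example above, one could then call `generator(prompts['ja'], do_sample=True, max_length=64, num_return_sequences=10)` and pass the returned list to `unique_queries` to keep only distinct questions.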

### BibTeX entry and citation info

```bibtex
@article{zhuang2022bridging,
  title={Bridging the gap between indexing and retrieval for differentiable search index with query generation},
  author={Zhuang, Shengyao and Ren, Houxing and Shou, Linjun and Pei, Jian and Gong, Ming and Zuccon, Guido and Jiang, Daxin},
  journal={arXiv preprint arXiv:2206.10128},
  year={2022}
}

@inproceedings{zhuang2023augmenting,
  title={Augmenting Passage Representations with Query Generation for Enhanced Cross-Lingual Dense Retrieval},
  author={Zhuang, Shengyao and Shou, Linjun and Zuccon, Guido},
  booktitle={Proceedings of the 46th international ACM SIGIR conference on research and development in information retrieval},
  year={2023}
}
```