---
license: apache-2.0
library_name: transformers
pipeline_tag: text2text-generation
inference:
  parameters:
    do_sample: true
    max_length: 64
    top_k: 10
    temperature: 1
    num_return_sequences: 10
widget:
  - text: >-
      Generate a Japanese question for this passage: Transformer (machine learning model) A transformer is a deep learning model that adopts the mechanism of self-attention, differentially weighting the significance of each part of the input (which includes the recursive output) data.
  - text: >-
      Generate a Arabic question for this passage: Transformer (machine learning model) A transformer is a deep learning model that adopts the mechanism of self-attention, differentially weighting the significance of each part of the input (which includes the recursive output) data.
---
## Model description
An mT5-large query generation model trained on XOR QA data. Given a passage (with its title) and a target language, it generates questions about the passage in that language.

The model is used in the papers [Bridging the Gap Between Indexing and Retrieval for Differentiable Search Index with Query Generation](https://arxiv.org/pdf/2206.10128.pdf) and [Augmenting Passage Representations with Query Generation for Enhanced Cross-Lingual Dense Retrieval](https://arxiv.org/pdf/2305.03950.pdf).
### How to use
```python
from transformers import pipeline

# Language codes mapped to the language names used in the prompt.
lang2mT5 = dict(
    ar='Arabic',
    bn='Bengali',
    fi='Finnish',
    ja='Japanese',
    ko='Korean',
    ru='Russian',
    te='Telugu'
)
PROMPT = 'Generate a {lang} question for this passage: {title} {passage}'

title = 'Transformer (machine learning model)'
passage = 'A transformer is a deep learning model that adopts the mechanism of self-attention, differentially ' \
          'weighting the significance of each part of the input (which includes the recursive output) data.'

model_name_or_path = 'ielabgroup/xor-tydi-docTquery-mt5-large'
input_text = PROMPT.format_map({'lang': lang2mT5['ja'],
                                'title': title,
                                'passage': passage})

# Use device="cpu" if no GPU is available.
generator = pipeline(model=model_name_or_path,
                     task='text2text-generation',
                     device="cuda:0",
                     )

# Sample 10 candidate questions for the passage.
results = generator(input_text,
                    do_sample=True,
                    max_length=64,
                    num_return_sequences=10,
                    )

for i, result in enumerate(results):
    print(f'{i + 1}. {result["generated_text"]}')
```
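To generate questions in several languages for the same passage, the prompt template above can be filled once per language code. The following is a minimal sketch that only builds the prompt strings and does not load the model; the helper name `build_prompts` and the shortened example passage are illustrative assumptions, not part of the released code.

```python
# Language codes and prompt template copied from the usage example above.
LANG2MT5 = {
    'ar': 'Arabic',
    'bn': 'Bengali',
    'fi': 'Finnish',
    'ja': 'Japanese',
    'ko': 'Korean',
    'ru': 'Russian',
    'te': 'Telugu',
}
PROMPT = 'Generate a {lang} question for this passage: {title} {passage}'


def build_prompts(title, passage, lang_codes=None):
    """Hypothetical helper: return {language code: prompt} for one passage.

    If lang_codes is None, a prompt is built for every supported language.
    """
    codes = list(LANG2MT5) if lang_codes is None else lang_codes
    return {
        code: PROMPT.format(lang=LANG2MT5[code], title=title, passage=passage)
        for code in codes
    }


# Shortened passage used only for illustration.
prompts = build_prompts(
    title='Transformer (machine learning model)',
    passage='A transformer is a deep learning model that adopts the mechanism of self-attention.',
)
for code, prompt in prompts.items():
    print(f'{code}: {prompt}')
```

Each resulting prompt can then be passed to `generator` in the same way as `input_text` in the example above.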
### BibTeX entry and citation info
```bibtex
@article{zhuang2022bridging,
  title={Bridging the gap between indexing and retrieval for differentiable search index with query generation},
  author={Zhuang, Shengyao and Ren, Houxing and Shou, Linjun and Pei, Jian and Gong, Ming and Zuccon, Guido and Jiang, Daxin},
  journal={arXiv preprint arXiv:2206.10128},
  year={2022}
}

@inproceedings{zhuang2023augmenting,
  title={Augmenting Passage Representations with Query Generation for Enhanced Cross-Lingual Dense Retrieval},
  author={Zhuang, Shengyao and Shou, Linjun and Zuccon, Guido},
  booktitle={Proceedings of the 46th international ACM SIGIR conference on research and development in information retrieval},
  year={2023}
}
```