marcospiau committed on
Commit
2271450
·
verified ·
1 Parent(s): b5d397f

Create README.md

Files changed (1)
  1. README.md +84 -0
README.md ADDED
@@ -0,0 +1,84 @@
+ ---
+ datasets:
+ - unicamp-dl/mmarco
+ language:
+ - pt
+ pipeline_tag: text2text-generation
+ base_model: unicamp-dl/ptt5-v2-large
+ ---
+
+ ## Introduction
+ MonoPTT5 models are T5 rerankers for the Portuguese language. Starting from [ptt5-v2 checkpoints](https://huggingface.co/collections/unicamp-dl/ptt5-v2-666538a650188ba00aa8d2d0), they were trained for 100k steps on a mixture of Portuguese and English data from the mMARCO dataset.
+ For further information on the training and evaluation of these models, please refer to our paper, [ptt5-v2: A Closer Look at Continued Pretraining of T5 Models for the Portuguese Language](https://arxiv.org/abs/2406.10806).
+
+ ## Usage
+ The easiest way to use our models is through the `rerankers` package. After installing it with `pip install "rerankers[transformers]"`, the following code serves as a minimal working example:
+
+ ```python
+ from rerankers import Reranker
+
+ query = "O futebol é uma paixão nacional"
+ docs = [
+     "O futebol é superestimado e não deveria receber tanta atenção.",
+     "O futebol é uma parte essencial da cultura brasileira e une as pessoas.",
+ ]
+
+ ranker = Reranker(
+     "unicamp-dl/monoptt5-small",
+     inputs_template="Pergunta: {query} Documento: {text} Relevante:",
+ )
+ # Relevant logging:
+ # Loading T5Ranker model unicamp-dl/monoptt5-small
+ # No device set
+ # Using device cpu
+ # No dtype set
+ # Device set to `cpu`, setting dtype to `float32`
+ # Using dtype torch.float32
+ # Loading model unicamp-dl/monoptt5-small, this might take a while...
+ # Using device cpu.
+ # Using dtype torch.float32.
+ # T5 true token set to ▁Sim
+ # T5 false token set to ▁Não
+ # Returning normalised scores...
+ # Inputs template set to Pergunta: {query} Documento: {text} Relevante:
+
+ results = ranker.rerank(query, docs)
+ # Results should be something like the following (they can vary depending on the model; this example uses "unicamp-dl/monoptt5-small"):
+ # RankedResults(
+ #     results=[
+ #         Result(
+ #             document=Document(
+ #                 text="O futebol é uma parte essencial da cultura brasileira e une as pessoas.",
+ #                 doc_id=1,
+ #                 metadata={},
+ #             ),
+ #             score=0.91943359375,
+ #             rank=1,
+ #         ),
+ #         Result(
+ #             document=Document(
+ #                 text="O futebol é superestimado e não deveria receber tanta atenção.",
+ #                 doc_id=0,
+ #                 metadata={},
+ #             ),
+ #             score=0.0267486572265625,
+ #             rank=2,
+ #         ),
+ #     ],
+ #     query="O futebol é uma paixão nacional",
+ #     has_scores=True,
+ # )
+
+ ```
+
+ For additional configurations and more advanced usage, consult the rerankers documentation.
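
The log lines in the example (a "true" token `▁Sim`, a "false" token `▁Não`, and "Returning normalised scores") suggest the usual monoT5-style scoring: each query-document pair is rendered into the prompt template, and the relevance score is the probability of the "true" token after a softmax over the two candidate-token logits. The sketch below illustrates that idea in plain Python, without loading any model. It is an assumption about the mechanism, not the `rerankers` implementation; the helper names (`build_input`, `normalised_score`, `rank`) and the logit values are hypothetical.

```python
import math

# Prompt template copied from the usage example.
INPUTS_TEMPLATE = "Pergunta: {query} Documento: {text} Relevante:"


def build_input(query: str, text: str) -> str:
    """Fill the prompt template for one query-document pair."""
    return INPUTS_TEMPLATE.format(query=query, text=text)


def normalised_score(true_logit: float, false_logit: float) -> float:
    """Two-way softmax over the candidate-token logits; returns P(true token)."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(true_logit, false_logit)
    e_true = math.exp(true_logit - m)
    e_false = math.exp(false_logit - m)
    return e_true / (e_true + e_false)


def rank(logits_per_doc):
    """Return (doc_index, score) pairs sorted by descending score."""
    scores = [normalised_score(t, f) for t, f in logits_per_doc]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]


# Hypothetical (true_logit, false_logit) pairs for two documents;
# these are illustrative values, not real model outputs.
example_logits = [(-1.5, 2.1), (3.0, 0.6)]
ranking = rank(example_logits)
```

Because the score is a probability over only two tokens, it always lies strictly between 0 and 1, which matches the shape of the scores shown in the example output.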
+
+ ## Citation
+ If you use our models, please cite:
+
+ @article{ptt5_2020,
+     title={PTT5: Pretraining and validating the T5 model on Brazilian Portuguese data},
+     author={Carmo, Diedre and Piau, Marcos and Campiotti, Israel and Nogueira, Rodrigo and Lotufo, Roberto},
+     journal={arXiv preprint arXiv:2008.09144},
+     year={2020}
+ }