julianrisch committed on
Commit dafe599
1 Parent(s): 82faf62

Update README.md

Files changed (1)
  1. README.md +54 -9
README.md CHANGED
@@ -29,7 +29,7 @@ model-index:
  verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZGE5MWJmZGUxMGMwNWFhYzVhZjQwZGEwOWQ4N2Q2Yjg5NzdjNDFiNDhiYTQ1Y2E5ZWJkOTFhYmI1Y2Q2ZGYwOCIsInZlcnNpb24iOjF9.TIdH-tOx3kEMDs5wK1r6iwZqqSjNGlBrpawrsE917j1F3UFJVnQ7wJwaj0OIgmC4iw8OQeLZL56ucBcLApa-AQ
  ---

- # Multilingual XLM-RoBERTa large for QA on various languages
+ # Multilingual XLM-RoBERTa large for Extractive QA on various languages

  ## Overview
  **Language model:** xlm-roberta-large
@@ -38,6 +38,7 @@ model-index:
  **Training data:** SQuAD 2.0
  **Eval data:** SQuAD dev set - German MLQA - German XQuAD
  **Training run:** [MLFlow link](https://public-mlflow.deepset.ai/#/experiments/124/runs/3a540e3f3ecf4dd98eae8fc6d457ff20)
+ **Code:** See [an example extractive QA pipeline built with Haystack](https://haystack.deepset.ai/tutorials/34_extractive_qa_pipeline)
  **Infrastructure**: 4x Tesla v100

  ## Hyperparameters
@@ -52,7 +53,51 @@ lr_schedule = LinearWarmup
  warmup_proportion = 0.2
  doc_stride=128
  max_query_length=64
- ```
+ ```
+
+ ## Usage
+
+ ### In Haystack
+ Haystack is an AI orchestration framework to build customizable, production-ready LLM applications. You can use this model in Haystack to do extractive question answering on documents.
+ To load and run the model with [Haystack](https://github.com/deepset-ai/haystack/):
+ ```python
+ # After running pip install haystack-ai "transformers[torch,sentencepiece]"
+
+ from haystack import Document
+ from haystack.components.readers import ExtractiveReader
+
+ docs = [
+ Document(content="Python is a popular programming language"),
+ Document(content="python ist eine beliebte Programmiersprache"),
+ ]
+
+ reader = ExtractiveReader(model="deepset/xlm-roberta-large-squad2")
+ reader.warm_up()
+
+ question = "What is a popular programming language?"
+ result = reader.run(query=question, documents=docs)
+ # {'answers': [ExtractedAnswer(query='What is a popular programming language?', score=0.5740374326705933, data='python', document=Document(id=..., content: '...'), context=None, document_offset=ExtractedAnswer.Span(start=0, end=6),...)]}
+ ```
+ For a complete example with an extractive question answering pipeline that scales over many documents, check out the [corresponding Haystack tutorial](https://haystack.deepset.ai/tutorials/34_extractive_qa_pipeline).
+
+ ### In Transformers
+ ```python
+ from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline
+
+ model_name = "deepset/xlm-roberta-large-squad2"
+
+ # a) Get predictions
+ nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)
+ QA_input = {
+ 'question': 'Why is model conversion important?',
+ 'context': 'The option to convert models between FARM and transformers gives freedom to the user and let people easily switch between frameworks.'
+ }
+ res = nlp(QA_input)
+
+ # b) Load model & tokenizer
+ model = AutoModelForQuestionAnswering.from_pretrained(model_name)
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ ```

  ## Performance
  Evaluated on the SQuAD 2.0 English dev set with the [official eval script](https://worksheets.codalab.org/rest/bundles/0x6b567e1cf2e041ec80d7098f031c5c9e/contents/blob/).
@@ -118,6 +163,7 @@ tokenizer = AutoTokenizer.from_pretrained(model_name)
  **Tanay Soni:** [email protected]

  ## About us
+
  <div class="grid lg:grid-cols-2 gap-x-4 gap-y-3">
  <div class="w-full h-40 object-cover mb-2 rounded-lg flex items-center justify-center">
  <img alt="" src="https://raw.githubusercontent.com/deepset-ai/.github/main/deepset-logo-colored.png" class="w-40"/>
@@ -127,13 +173,12 @@ tokenizer = AutoTokenizer.from_pretrained(model_name)
  </div>
  </div>

- [deepset](http://deepset.ai/) is the company behind the open-source NLP framework [Haystack](https://haystack.deepset.ai/) which is designed to help you build production ready NLP systems that use: Question answering, summarization, ranking etc.
-
+ [deepset](http://deepset.ai/) is the company behind the production-ready open-source AI framework [Haystack](https://haystack.deepset.ai/).

  Some of our other work:
- - [Distilled roberta-base-squad2 (aka "tinyroberta-squad2")]([https://huggingface.co/deepset/tinyroberta-squad2)
- - [German BERT (aka "bert-base-german-cased")](https://deepset.ai/german-bert)
- - [GermanQuAD and GermanDPR datasets and models (aka "gelectra-base-germanquad", "gbert-base-germandpr")](https://deepset.ai/germanquad)
+ - [Distilled roberta-base-squad2 (aka "tinyroberta-squad2")](https://huggingface.co/deepset/tinyroberta-squad2)
+ - [German BERT](https://deepset.ai/german-bert), [GermanQuAD and GermanDPR](https://deepset.ai/germanquad), [German embedding model](https://huggingface.co/mixedbread-ai/deepset-mxbai-embed-de-large-v1)
+ - [deepset Cloud](https://www.deepset.ai/deepset-cloud-product), [deepset Studio](https://www.deepset.ai/deepset-studio)

  ## Get in touch and join the Haystack community

@@ -141,6 +186,6 @@ Some of our other work:

  We also have a <strong><a class="h-7" href="https://haystack.deepset.ai/community">Discord community open to everyone!</a></strong></p>

- [Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)
+ [Twitter](https://twitter.com/Haystack_AI) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://haystack.deepset.ai/) | [YouTube](https://www.youtube.com/@deepset_ai)

- By the way: [we're hiring!](http://www.deepset.ai/jobs)
+ By the way: [we're hiring!](http://www.deepset.ai/jobs)
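The hyperparameter hunk above lists `lr_schedule = LinearWarmup` together with `warmup_proportion = 0.2`. The sketch below shows how the learning-rate multiplier might evolve over training. It assumes that this schedule has the same shape as transformers' `get_linear_schedule_with_warmup`: a linear ramp from 0 up to the peak learning rate, followed by a linear decay back to 0. That equivalence is an assumption, not something the README states. The step count and peak learning rate are hypothetical values chosen only for illustration.

```python
def linear_warmup_factor(step: int, total_steps: int, warmup_proportion: float = 0.2) -> float:
    """Multiplier applied to the peak learning rate at a given optimizer step.

    Assumed shape (mirrors transformers' get_linear_schedule_with_warmup):
    ramp linearly from 0 to 1 over the first `warmup_proportion` of training,
    then decay linearly back to 0 at `total_steps`.
    """
    warmup_steps = int(warmup_proportion * total_steps)
    if step < warmup_steps:
        # Warmup phase: fraction of warmup completed so far.
        return step / max(1, warmup_steps)
    # Decay phase: fraction of post-warmup steps still remaining, clipped at 0.
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


if __name__ == "__main__":
    total = 1000    # hypothetical number of optimizer steps
    peak_lr = 1e-5  # hypothetical peak learning rate, not taken from this diff
    for step in (0, 100, 200, 600, 1000):
        print(step, peak_lr * linear_warmup_factor(step, total))
```

With `warmup_proportion = 0.2` and 1000 steps, the multiplier peaks at 1.0 after step 200 and reaches 0 at step 1000.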