---
library_name: transformers
language:
- he
---

## Model Details

### Model Description

This is the model card of a 🤗 transformers model that has been pushed on the Hub.

- **Model type:** CrossEncoder
- **Language(s) (NLP):** Hebrew
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [DictaBERT](https://huggingface.co/dicta-il/dictabert)

## Uses

The model was trained for a ranking task as part of a Hebrew semantic search engine: given a query and a set of candidate passages, it scores each (query, passage) pair for relevance.

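
A minimal sketch of how a cross-encoder typically fits into the ranking stage of a search pipeline, assuming a first-stage retriever has already produced a short list of candidate passages. The `rerank` helper and the `PairScorer` alias are hypothetical names introduced here for illustration; they are not part of this model or of `sentence_transformers`. The scorer is passed in as a plain callable so the sketch runs without downloading any weights.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical alias: any callable that scores [query, document] pairs and
# returns one relevance score per pair (for example, CrossEncoder.predict).
PairScorer = Callable[[List[List[str]]], Sequence[float]]


def rerank(
    query: str,
    candidates: Sequence[str],
    score_pairs: PairScorer,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Re-order first-stage candidates by relevance score, highest first."""
    if not candidates:
        return []
    # The query is always the first element of each pair.
    pairs = [[query, doc] for doc in candidates]
    scores = [float(s) for s in score_pairs(pairs)]
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]
```

With the real model, the scorer would be the `predict` method of a loaded `CrossEncoder("haguy77/dictabert-ce")`, for example `rerank(query, candidates, model.predict)`.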
## How to Get Started with the Model

Use the code below to get started with the model.

```python
from sentence_transformers import CrossEncoder

# Query: "What did David Ben-Gurion not agree to give up?"
query = "על מה לא הסכים דוד בן גוריון לוותר?"

# doc1: relevant passage about the aftermath of the Sinai War and the Straits of Tiran
doc1 = """
מלחמת סיני הסתיימה בתבוסה של הכוחות המצריים, אך ברית המועצות וארצות הברית הפעילו לחץ כבד על ישראל לסגת מחצי האי סיני.
ראש ממשלת ישראל, דוד בן-גוריון, הסכים, בעקבות הלחץ של שתי המעצמות,
לפנות את חצי האי סיני ורצועת עזה בתהליך שהסתיים במרץ 1957,
אך הודיע שסגירה של מצרי טיראן לשיט ישראלי תהווה עילה למלחמה.
ארצות הברית התחייבה לדאוג להבטחת חופש המעבר של ישראל במצרי טיראן.
כוח חירום בינלאומי של האו"ם הוצב בצד המצרי של הגבול עם ישראל ובשארם א-שייח' וכתוצאה מכך נשאר נתיב השיט במפרץ אילת פתוח לשיט הישראלי.
"""

# doc2: unrelated passage about Red Sea tourism
doc2 = """
ים סוף מהווה מוקד חשוב לתיירות מרחבי העולם.
מזג האוויר הנוח בעונת החורף, החופים היפים, הים הצלול ואתרי הצלילה המרהיבים לחופי סיני,
מצרים, וסודאן הופכים את חופי ים סוף ליעד תיירות מבוקש.
ראס מוחמד והחור הכחול בסיני, ידועים כאתרי צלילה מהמרהיבים בעולם.
מאז הסכם השלום בין ישראל למצרים פיתחה מצרים מאוד את התיירות לאורך חופי ים סוף,
ובמיוחד בסיני, ובנתה עשרות אתרי תיירות ומאות מלונות וכפרי נופש.
תיירות זו נפגעה קשות מאז המהפכה של 2011 במצרים,
עם עלייה חדה בתקריות טרור מצד ארגונים אסלאמיים קיצוניים בסיני.
"""

model = CrossEncoder("haguy77/dictabert-ce")

scores = model.predict([[query, doc1], [query, doc2]])  # Note: query should ALWAYS be the first of each pair
# array([0.02000629, 0.00031683], dtype=float32)

results = model.rank(query, [doc2, doc1])
# [{'corpus_id': 1, 'score': 0.020006292}, {'corpus_id': 0, 'score': 0.00031683326}]
```
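
The `rank` output above identifies each document only by `corpus_id`, its index in the list that was passed in. The sketch below maps those indices back to the passage text. `resolve_ranked` is a hypothetical helper written for this card, and it assumes the result dictionaries have exactly the `corpus_id` and `score` keys shown in the example output.

```python
from typing import Dict, List, Sequence, Tuple, Union


def resolve_ranked(
    results: Sequence[Dict[str, Union[int, float]]],
    documents: Sequence[str],
) -> List[Tuple[str, float]]:
    """Pair each ranked result with the text of the document it refers to."""
    # Results are assumed to be sorted by score already; order is preserved.
    return [(documents[int(r["corpus_id"])], float(r["score"])) for r in results]
```

For the example above, `resolve_ranked(results, [doc2, doc1])` would list `doc1` first, since it is the entry with `corpus_id` 1 and the higher score.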

### Training Data

[Hebrew Question Answering Dataset (HeQ)](https://github.com/NNLP-IL/Hebrew-Question-Answering-Dataset)

## Citation

**BibTeX:**

```bibtex
@misc{shmidman2023dictabert,
  title={DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew},
  author={Shaltiel Shmidman and Avi Shmidman and Moshe Koppel},
  year={2023},
  eprint={2308.16687},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```

```bibtex
@inproceedings{cohen2023heq,
  title={HeQ: a large and diverse Hebrew reading comprehension benchmark},
  author={Cohen, Amir and Merhav-Fine, Hilla and Goldberg, Yoav and Tsarfaty, Reut},
  booktitle={Findings of the Association for Computational Linguistics: EMNLP 2023},
  pages={13693--13705},
  year={2023}
}
```

**APA:**

```apa
Shmidman, S., Shmidman, A., & Koppel, M. (2023). DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew. arXiv preprint arXiv:2308.16687.

Cohen, A., Merhav-Fine, H., Goldberg, Y., & Tsarfaty, R. (2023, December). HeQ: a large and diverse Hebrew reading comprehension benchmark. In Findings of the Association for Computational Linguistics: EMNLP 2023 (pp. 13693-13705).
```