---
library_name: transformers
language:
- he
---
Model Details
Model Description
This is the model card of a 🤗 transformers model that has been pushed on the Hub.
- Model type: CrossEncoder
- Language(s) (NLP): Hebrew
- License: [More Information Needed]
- Finetuned from model: DictaBERT
Uses
The model was trained for a ranking task as part of a Hebrew semantic search engine.
How to Get Started with the Model
Use the code below to get started with the model.
```python
from sentence_transformers import CrossEncoder

# Query: "What did David Ben-Gurion not agree to give up?"
query = "על מה לא הסכים דוד בן גוריון לוותר?"

# doc1 (relevant): after the Sinai War, Ben-Gurion agreed to withdraw from Sinai and Gaza,
# but declared that closing the Straits of Tiran to Israeli shipping would be a cause for war.
doc1 = """
מלחמת סיני הסתיימה בתבוסה של הכוחות המצריים, אך ברית המועצות וארצות הברית הפעילו לחץ כבד על ישראל לסגת מחצי האי סיני.
ראש ממשלת ישראל, דוד בן-גוריון, הסכים, בעקבות הלחץ של שתי המעצמות,
לפנות את חצי האי סיני ורצועת עזה בתהליך שהסתיים במרץ 1957,
אך הודיע שסגירה של מצרי טיראן לשיט ישראלי תהווה עילה למלחמה.
ארצות הברית התחייבה לדאוג להבטחת חופש המעבר של ישראל במצרי טיראן.
כוח חירום בינלאומי של האו"ם הוצב בצד המצרי של הגבול עם ישראל ובשארם א-שייח' וכתוצאה מכך נשאר נתיב השיט במפרץ אילת פתוח לשיט הישראלי.
"""

# doc2 (not relevant): the Red Sea coast as a tourism and diving destination.
doc2 = """
ים סוף מהווה מוקד חשוב לתיירות מרחבי העולם.
מזג האוויר הנוח בעונת החורף, החופים היפים, הים הצלול ואתרי הצלילה המרהיבים לחופי סיני,
מצרים, וסודאן הופכים את חופי ים סוף ליעד תיירות מבוקש.
ראס מוחמד והחור הכחול בסיני, ידועים כאתרי צלילה מהמרהיבים בעולם.
מאז הסכם השלום בין ישראל למצרים פיתחה מצרים מאוד את התיירות לאורך חופי ים סוף,
ובמיוחד בסיני, ובנתה עשרות אתרי תיירות ומאות מלונות וכפרי נופש.
תיירות זו נפגעה קשות מאז המהפכה של 2011 במצרים,
עם עלייה חדה בתקריות טרור מצד ארגונים אסלאמיים קיצוניים בסיני.
"""

model = CrossEncoder("haguy77/dictabert-ce")

# Score each (query, document) pair. The query must ALWAYS be the first element of each pair.
scores = model.predict([[query, doc1], [query, doc2]])
# array([0.02000629, 0.00031683], dtype=float32)

# Rank the documents, highest score first; corpus_id is the document's index in the list passed in.
results = model.rank(query, [doc2, doc1])
# [{'corpus_id': 1, 'score': 0.020006292}, {'corpus_id': 0, 'score': 0.00031683326}]
```
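As the two outputs above suggest, `rank` behaves like `predict` followed by a sort: the query is paired with each document (query first), the pairs are scored, and the documents come back ordered by descending score, with `corpus_id` pointing to each document's position in the input list. The sketch below reproduces that output shape in plain Python so the behaviour is easy to check without downloading the model. `build_pairs`, `rank_from_scores`, `rerank` and the `score_pairs` parameter are hypothetical names written for illustration, not part of sentence-transformers; `score_pairs` stands in for `model.predict`.

```python
from typing import Callable, List, Optional, Sequence


def build_pairs(query: str, documents: Sequence[str]) -> List[List[str]]:
    # Query first in every pair, as this cross-encoder expects.
    return [[query, doc] for doc in documents]


def rank_from_scores(scores: Sequence[float], top_k: Optional[int] = None) -> List[dict]:
    # Indices sorted by descending score; sorted() is stable, so ties keep input order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]
    # corpus_id is the document's position in the list that was scored.
    return [{"corpus_id": i, "score": float(scores[i])} for i in order]


def rerank(
    query: str,
    documents: Sequence[str],
    score_pairs: Callable[[List[List[str]]], Sequence[float]],
    top_k: Optional[int] = None,
) -> List[dict]:
    # score_pairs is a hypothetical stand-in for model.predict.
    scores = score_pairs(build_pairs(query, documents))
    return rank_from_scores(scores, top_k)


# Scores for [doc2, doc1] as printed above (same order as that list).
print(rank_from_scores([0.00031683, 0.02000629]))
# [{'corpus_id': 1, 'score': 0.02000629}, {'corpus_id': 0, 'score': 0.00031683}]
```

With the real model loaded, `rerank(query, [doc2, doc1], model.predict)` should produce the same ordering as `model.rank(query, [doc2, doc1])`.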
Training Data
Hebrew Question Answering Dataset (HeQ)
Citation
BibTeX:
```bibtex
@misc{shmidman2023dictabert,
      title={DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew},
      author={Shaltiel Shmidman and Avi Shmidman and Moshe Koppel},
      year={2023},
      eprint={2308.16687},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

@inproceedings{cohen2023heq,
  title={{HeQ}: A Large and Diverse {Hebrew} Reading Comprehension Benchmark},
  author={Cohen, Amir and Merhav-Fine, Hilla and Goldberg, Yoav and Tsarfaty, Reut},
  booktitle={Findings of the Association for Computational Linguistics: EMNLP 2023},
  pages={13693--13705},
  year={2023}
}
```
APA:
Shmidman, S., Shmidman, A., & Koppel, M. (2023). DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew. arXiv preprint arXiv:2308.16687.
Cohen, A., Merhav-Fine, H., Goldberg, Y., & Tsarfaty, R. (2023, December). HeQ: A large and diverse Hebrew reading comprehension benchmark. In Findings of the Association for Computational Linguistics: EMNLP 2023 (pp. 13693-13705).