---
base_model: dicta-il/dictalm2.0-instruct
license: mit
datasets:
- HeNLP/HeDC4
language:
- he
---
# LLM2Vec applied on DictaLM-2.0
This is a Hebrew encoder model, obtained by applying the [LLM2Vec](https://arxiv.org/abs/2404.05961) method to [DictaLM-2.0](https://huggingface.co/dicta-il/dictalm2.0)
and training on the [HeDC4](https://huggingface.co/datasets/HeNLP/HeDC4) dataset.
## Usage
```python
import torch
from llm2vec import LLM2Vec


def get_device() -> str:
    """Pick the best available device: Apple MPS, then CUDA, then CPU."""
    if torch.backends.mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


# Load the MNTP base weights together with the unsupervised-SimCSE PEFT adapter.
l2v = LLM2Vec.from_pretrained(
    base_model_name_or_path="omriel1/LLM2Vec-DictaLM2.0-mntp",
    peft_model_name_or_path="omriel1/LLM2Vec-DictaLM2.0-mntp-unsup-simcse",
    device_map=get_device(),
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
texts = [
    "מה קורה?",          # "What's happening?"
    "איך אתה מרגיש?",    # "How are you feeling?"
]
results = l2v.encode(texts)  # tensor of shape (num_texts, hidden_dim)
print(results)
```
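A common next step with an encoder like this is comparing texts by cosine similarity of their embeddings. The sketch below assumes `results` from `l2v.encode(...)` is a float tensor of shape `(num_texts, hidden_dim)`; two small dummy vectors stand in for real embeddings so the snippet runs on its own.

```python
import torch
import torch.nn.functional as F

# Stand-in for two rows of the `results` tensor returned by l2v.encode(texts).
emb = torch.tensor([[1.0, 0.0, 1.0],
                    [1.0, 0.0, 0.0]])

# Cosine similarity between the first and second embedding.
score = F.cosine_similarity(emb[0].unsqueeze(0), emb[1].unsqueeze(0)).item()
print(round(score, 4))  # ≈ 0.7071 for these dummy vectors
```

With real embeddings, higher scores indicate more semantically similar sentences; normalizing the embeddings first (`F.normalize(emb, dim=-1)`) makes a plain dot product equivalent to cosine similarity.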