---
base_model: dicta-il/dictalm2.0-instruct
license: mit
datasets:
- HeNLP/HeDC4
language:
- he
---

# LLM2Vec applied on DictaLM-2.0

This is a Hebrew encoder model, created by applying the [LLM2Vec](https://arxiv.org/abs/2404.05961) method to [DictaLM-2.0](https://huggingface.co/dicta-il/dictalm2.0),
using the [HeDC4](https://huggingface.co/datasets/HeNLP/HeDC4) dataset for training.

## Usage

```python
import torch
from llm2vec import LLM2Vec

def get_device() -> str:
    """Return the best available torch device: Apple MPS, CUDA GPU, or CPU."""
    if torch.backends.mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


# Load the MNTP-trained base model together with the unsupervised SimCSE
# PEFT adapter trained on top of it.
l2v = LLM2Vec.from_pretrained(
    base_model_name_or_path="omriel1/LLM2Vec-DictaLM2.0-mntp",
    peft_model_name_or_path="omriel1/LLM2Vec-DictaLM2.0-mntp-unsup-simcse",
    device_map=get_device(),
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

texts = [
    "היי מה קורה?",    # "Hey, what's up?"
    "הכל טוב איתך?",  # "Is everything good with you?"
]
results = l2v.encode(texts)
print(results)
```
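The embeddings returned by `encode` can be compared with cosine similarity, e.g. to rank Hebrew sentences by semantic closeness. A minimal sketch, using small toy vectors in place of the model's actual output (the real embeddings have the model's hidden size):

```python
import math

# Toy embeddings standing in for the output of l2v.encode(texts);
# dimension 4 is used here purely for illustration.
results = [
    [0.1, 0.3, -0.2, 0.5],
    [0.2, 0.25, -0.1, 0.45],
]

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

score = cosine_similarity(results[0], results[1])
print(score)
```

With the real model output (a torch tensor), the same comparison can be done by normalizing the rows with `torch.nn.functional.normalize` and taking their dot products.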