---
license: mit
datasets:
- WhereIsAI/github-issue-similarity
language:
- en
---

# SeanLee97/UAE-GIS-Large-V1


This model was trained on the [GIS: Github Issue Similarity](https://huggingface.co/datasets/WhereIsAI/github-issue-similarity) dataset using the [AnglE](https://github.com/SeanLee97/AnglE) loss ([arXiv:2309.12871](https://arxiv.org/abs/2309.12871)).
It can be used to measure **code/issue similarity**.

Results (test set):

- Spearman correlation: 71.19
- Accuracy: 84.37
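
The card does not say how these two numbers were computed. As a rough sketch of what such metrics usually mean, the snippet below computes a Spearman rank correlation between similarity scores and binary duplicate labels, plus accuracy at a fixed threshold. The toy scores, the labels, the `0.5` threshold, and the helper names are all illustrative assumptions, not the authors' evaluation script; scaling by 100 only mirrors the way the results above appear to be reported.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(xs), average_ranks(ys))


def accuracy_at_threshold(scores, labels, threshold):
    """Fraction of pairs whose thresholded score matches the 0/1 label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Made-up similarity scores and duplicate labels, for illustration only.
toy_scores = [0.91, 0.15, 0.78, 0.40, 0.66, 0.22]
toy_labels = [1, 0, 1, 0, 1, 0]

rho = spearman(toy_scores, toy_labels)
acc = accuracy_at_threshold(toy_scores, toy_labels, threshold=0.5)  # assumed threshold
print(f"Spearman correlation: {100 * rho:.2f}")
print(f"Accuracy: {100 * acc:.2f}")
```

In practice `toy_scores` would be the cosine similarities of encoded issue pairs, and the threshold would typically be tuned on a validation split rather than fixed in advance.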


## Usage

### 1. Install

```
python -m pip install -U angle-emb
```

### 2. Example

```python
from scipy import spatial
from angle_emb import AnglE

# .cuda() moves the model to the GPU, so this line requires a CUDA device
model = AnglE.from_pretrained('SeanLee97/UAE-GIS-Large-V1').cuda()

quick_sort = '''# Approach 2: Quicksort using list comprehension

def quicksort(arr):
    if len(arr) <= 1:
        return arr
    else:
        pivot = arr[0]
        left = [x for x in arr[1:] if x < pivot]
        right = [x for x in arr[1:] if x >= pivot]
        return quicksort(left) + [pivot] + quicksort(right)
 
# Example usage
arr = [1, 7, 4, 1, 10, 9, -2]
sorted_arr = quicksort(arr)
print("Sorted Array in Ascending Order:")
print(sorted_arr)'''


bubble_sort = '''def bubblesort(elements):
    # Looping from size of array from last index[-1] to index [0]
    for n in range(len(elements)-1, 0, -1):
        swapped = False
        for i in range(n):
            if elements[i] > elements[i + 1]:
                swapped = True
                # swapping data if the element is less than next element in the array
                elements[i], elements[i + 1] = elements[i + 1], elements[i]
        if not swapped:
            # exiting the function if we didn't make a single swap
            # meaning that the array is already sorted.
            return

elements = [39, 12, 18, 85, 72, 10, 2, 18]

print("Unsorted list is,")
print(elements)
bubblesort(elements)
print("Sorted Array is, ")
print(elements)'''

vecs = model.encode([
    'def echo(): print("hello world")',
    quick_sort,
    bubble_sort
])


print('cos sim (0, 1):', 1 - spatial.distance.cosine(vecs[0], vecs[1]))
print('cos sim (0, 2):', 1 - spatial.distance.cosine(vecs[0], vecs[2]))
print('cos sim (1, 2):', 1 - spatial.distance.cosine(vecs[1], vecs[2]))
```

Output:

```
cos sim (0, 1): 0.34329649806022644
cos sim (0, 2): 0.3627094626426697
cos sim (1, 2): 0.6972219347953796
```
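
The two sorting snippets score noticeably higher with each other than either does with the `echo` function. To make explicit what `1 - spatial.distance.cosine(u, v)` computes, and how such scores might be used to rank candidates against a query, here is a minimal dependency-free sketch. The toy vectors stand in for real embeddings, and `rank_by_similarity` is a hypothetical helper, not part of `angle_emb`.

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_similarity(query_vec, candidate_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scored = [
        (index, cosine_similarity(query_vec, vec))
        for index, vec in enumerate(candidate_vecs)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors standing in for model.encode(...) output.
toy_query = [1.0, 0.0, 1.0]
toy_candidates = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 2.0],  # same direction as the query, different length
    [1.0, 1.0, 0.0],  # partially aligned with the query
]

ranking = rank_by_similarity(toy_query, toy_candidates)
for index, score in ranking:
    print(f"candidate {index}: {score:.3f}")
```

Because cosine similarity ignores vector length, the second candidate ranks first even though it is twice as long as the query.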

## Citation

```bibtex
@article{li2023angle,
  title={AnglE-optimized Text Embeddings},
  author={Li, Xianming and Li, Jing},
  journal={arXiv preprint arXiv:2309.12871},
  year={2023}
}
```