---
license: mit
datasets:
- WhereIsAI/github-issue-similarity
language:
- en
---
# SeanLee97/UAE-GIS-Large-V1
This model is trained on the [GIS: Github Issue Similarity](https://huggingface.co/datasets/WhereIsAI/github-issue-similarity) dataset using the [AnglE](https://github.com/SeanLee97/AnglE) loss ([paper](https://arxiv.org/abs/2309.12871)).
It can be used for measuring **code/issue similarity**.
Results (test set):
- Spearman correlation: 71.19
- Accuracy: 84.37
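
The card does not say how these scores were computed. As a rough illustration only, a Spearman correlation for a similarity model is commonly obtained by ranking the predicted similarities and the gold labels and then taking the Pearson correlation of the two rankings. The sketch below uses only the standard library; the function names and the toy scores and labels are hypothetical and are not taken from the GIS test set.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical cosine similarities and binary gold labels (toy data).
predicted = [0.91, 0.35, 0.70, 0.12]
gold = [1, 0, 1, 0]
print("Spearman:", spearman(predicted, gold))
```

If SciPy is installed, `scipy.stats.spearmanr` computes the same statistic.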
## Usage
### 1. Install
```
python -m pip install -U angle-emb
```
### 2. Example
```python
from scipy import spatial
from angle_emb import AnglE
# .cuda() moves the model to the GPU and requires a CUDA-capable device.
model = AnglE.from_pretrained('SeanLee97/UAE-GIS-Large-V1').cuda()

quick_sort = '''# Approach 2: Quicksort using list comprehension
def quicksort(arr):
    if len(arr) <= 1:
        return arr
    else:
        pivot = arr[0]
        left = [x for x in arr[1:] if x < pivot]
        right = [x for x in arr[1:] if x >= pivot]
        return quicksort(left) + [pivot] + quicksort(right)
# Example usage
arr = [1, 7, 4, 1, 10, 9, -2]
sorted_arr = quicksort(arr)
print("Sorted Array in Ascending Order:")
print(sorted_arr)'''

bubble_sort = '''def bubblesort(elements):
    # Looping from size of array from last index[-1] to index [0]
    for n in range(len(elements)-1, 0, -1):
        swapped = False
        for i in range(n):
            if elements[i] > elements[i + 1]:
                swapped = True
                # swapping data if the element is less than next element in the array
                elements[i], elements[i + 1] = elements[i + 1], elements[i]
        if not swapped:
            # exiting the function if we didn't make a single swap
            # meaning that the array is already sorted.
            return
elements = [39, 12, 18, 85, 72, 10, 2, 18]
print("Unsorted list is,")
print(elements)
bubblesort(elements)
print("Sorted Array is, ")
print(elements)'''

vecs = model.encode([
    'def echo(): print("hello world")',
    quick_sort,
    bubble_sort
])
print('cos sim (0, 1):', 1 - spatial.distance.cosine(vecs[0], vecs[1]))
print('cos sim (0, 2)', 1 - spatial.distance.cosine(vecs[0], vecs[2]))
print('cos sim (1, 2):', 1 - spatial.distance.cosine(vecs[1], vecs[2]))
```
Output:
```
cos sim (0, 1): 0.34329649806022644
cos sim (0, 2) 0.3627094626426697
cos sim (1, 2): 0.6972219347953796
```
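
The two sorting snippets score much closer to each other than either does to the `echo` function. One way to use scores like these is to rank a set of candidates against a query by cosine similarity. The sketch below is a minimal illustration using only the standard library; `cosine_similarity` and `rank_by_similarity` are hypothetical helper names, and the 3-dimensional vectors are toy stand-ins for real embeddings. It assumes each embedding can be iterated as a sequence of floats, as the rows of `vecs` are used above.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_similarity(query_vec, candidate_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [
        (i, cosine_similarity(query_vec, vec))
        for i, vec in enumerate(candidate_vecs)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for embeddings (hypothetical data).
query = [1.0, 0.0, 0.0]
candidates = [[0.0, 1.0, 0.0], [2.0, 0.0, 0.0], [1.0, 1.0, 0.0]]

ranking = rank_by_similarity(query, candidates)
for index, score in ranking:
    print(f"candidate {index}: {score:.4f}")
```

With real embeddings, `query` would be one encoded row and `candidates` the remaining rows.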
# Citation
```bibtex
@article{li2023angle,
title={AnglE-optimized Text Embeddings},
author={Li, Xianming and Li, Jing},
journal={arXiv preprint arXiv:2309.12871},
year={2023}
}
```