---
title: RepoSnipy
emoji: 🐍🔫
colorFrom: gray
colorTo: gray
sdk: streamlit
sdk_version: 1.21.0
python_version: 3.11.3
app_file: app.py
pinned: true
license: mit
---
# RepoSnipy 🐍🔫

[![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-md-dark.svg)](https://huggingface.co/spaces/Lazyhope/RepoSnipy)

Neural search engine for discovering semantically similar Python repositories on GitHub.

## Demo

Searching an indexed repository:

![Search Indexed Repo Demo](assets/search.gif)


## About

RepoSnipy is a neural search engine built with [streamlit](https://github.com/streamlit/streamlit) and [docarray](https://github.com/docarray/docarray). You can query a public Python repository hosted on GitHub and find popular repositories that are semantically similar to it.

It uses the [RepoSim](https://github.com/RepoAnalysis/RepoSim/) pipeline to create embeddings for Python repositories. We have created a [vector dataset](data/index.bin) (stored as a docarray index) of over 9700 GitHub Python repositories that had a license and over 300 stars as of 20 May 2023.
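
To illustrate the idea behind embedding-based search, here is a minimal sketch of ranking repositories by similarity to a query embedding. It is not RepoSnipy's actual code: the use of cosine similarity, the function names `cosine_similarity` and `top_k_similar`, the repository names, and the 3-dimensional toy vectors are all assumptions made for illustration. Real embeddings are much higher-dimensional and are looked up through the docarray index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query, index, k=3):
    """Return the k (name, score) pairs from `index` most similar to `query`."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy index: repository name -> embedding (made-up values).
toy_index = {
    "example/web-framework": [0.9, 0.1, 0.0],
    "example/http-client": [0.7, 0.3, 0.1],
    "example/plotting-lib": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of the repository being queried.
query_embedding = [1.0, 0.0, 0.0]

results = top_k_similar(query_embedding, toy_index, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

A brute-force scan like this is fine for a toy example; for thousands of repositories, a vector index avoids comparing the query against every stored embedding by hand.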

## Running Locally

Download the repository and install the required packages:

```bash
git clone https://github.com/RepoAnalysis/RepoSnipy
cd RepoSnipy
pip install -r requirements.txt
```

Then run the app on your local machine using:

```bash
streamlit run app.py
```

## License

Distributed under the MIT License. See [LICENSE](LICENSE) for more information.

## Acknowledgments

The model and the fine-tuning dataset used:

* [UniXCoder](https://arxiv.org/abs/2203.03850)
* [AdvTest](https://arxiv.org/abs/1909.09436)