Commit 3f894c9 by shrivasd · 1 parent: b7807c0

Update README.md

Files changed (1): README.md (+22 -1)
@@ -7,4 +7,25 @@ sdk: static
 pinned: false
 ---
 
-THis is the RepoFusion organization page.
+# RepoFusion: Training Code Models to Understand Your Repository
+Disha Shrivastava, Denis Kocetkov, Harm de Vries, Dzmitry Bahdanau, Torsten Scholak
+
+This space contains the released resources for our paper "RepoFusion: Training Code Models to Understand Your Repository". A block diagram of our approach is shown below. For more details, refer to the paper.
+
+<p align="center">
+<img src="block_diagram.png" width=1000>
+</p>
+
+## Data
+Stack-Repo can be downloaded from the [Datasets](https://huggingface.co/datasets/RepoFusion/Stack-Repo) section of this space. It contains three folders corresponding to our train, validation, and test splits. Each split contains a separate folder for each repository, which holds all of the repository's .java files in their original directory structure along with three .json files corresponding to the Prompt Proposal, BM25, and RandomNN repo contexts. Please see the README in the Datasets section for details on how the dataset is organized and how to access it.
+
+## Trained Checkpoints
+The trained checkpoints can be downloaded from the [Models](https://huggingface.co/RepoFusion/trained_checkpoints) section of this space. We have released the following checkpoints:
+- `RepoFusion_PP_contexts`: RepoFusion model trained with Prompt Proposal repo contexts. This is our best-performing model.
+- `RepoFusion_BM25_contexts`: RepoFusion model trained with BM25 repo contexts.
+- `RepoFusion_RandomNN_contexts`: RepoFusion model trained with RandomNN repo contexts.
+- `finetuned_codet5base`: Our finetuned CodeT5-base model. This was used as the initialization for our RepoFusion models.
+- `finetuned_codet5large`: Our finetuned CodeT5-large model. This was used as a baseline.
+
+## Code
+We will release the code for training and evaluating RepoFusion and for finetuning CodeT5, along with instructions for running the scripts, shortly. Watch this space for updates.
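The README's Data section describes the on-disk layout of Stack-Repo. There is one folder per split and one folder per repository inside each split. Each repository folder holds the repository's .java files in their original directory tree, plus three .json repo-context files (Prompt Proposal, BM25, RandomNN). The Python below is a minimal sketch that indexes that layout, assuming a local copy of the dataset already exists. It also assumes that the context files sit directly in each repository folder, because the README does not say where they live or what they are named. `index_split` and `index_dataset` are hypothetical helpers for illustration only. They are not part of the released resources.

```python
# Hypothetical helpers for indexing a local copy of Stack-Repo (not part of the release).
from pathlib import Path


def index_split(split_dir):
    """Map each repository folder in one split to its .java and .json files.

    Assumes (hypothetically) that every non-hidden subdirectory of ``split_dir``
    is one repository and that the repo-context .json files sit directly in
    the repository folder rather than somewhere in its source tree.
    """
    index = {}
    repo_dirs = sorted(
        p for p in Path(split_dir).iterdir()
        if p.is_dir() and not p.name.startswith(".")
    )
    for repo_dir in repo_dirs:
        # .java files keep the original directory structure, so search recursively
        java_files = sorted(
            p.relative_to(repo_dir).as_posix()
            for p in repo_dir.rglob("*.java")
            if p.is_file()
        )
        # Repo-context files (Prompt Proposal, BM25, RandomNN): matched by extension
        # only, since the released file names are not assumed here
        context_files = sorted(
            p.name for p in repo_dir.glob("*.json") if p.is_file()
        )
        index[repo_dir.name] = {
            "java_files": java_files,
            "context_files": context_files,
        }
    return index


def index_dataset(root):
    """Index every non-hidden split folder under ``root`` (e.g. train, validation, test)."""
    return {
        split_dir.name: index_split(split_dir)
        for split_dir in sorted(Path(root).iterdir())
        if split_dir.is_dir() and not split_dir.name.startswith(".")
    }
```

For example, `index_dataset("path/to/Stack-Repo")` (a placeholder path) returns a dict keyed first by split folder name and then by repository folder name. The sketch matches context files by the `.json` extension rather than by fixed names, so it does not depend on guessing the released file names.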