tals committed on
Commit
6dabb9a
·
1 Parent(s): 1bdcc0a

Create README.md

  1. README.md +52 -0
README.md ADDED
@@ -0,0 +1,52 @@
1
+ # roberta_python
2
+ ---
3
+ language: python
4
+ datasets:
5
+ - code_search_net
6
+ - Fraser/python-lines
7
+ tags:
8
+ - python
9
+ - code
10
+ - masked-lm
11
+ widget:
12
+ - text: "assert 6 == sum([i for i in range(<mask>)])"
13
+ ---
14
+ # Details
15
+ This is a RoBERTa-base model trained on the Python part of [CodeSearchNet](https://github.com/github/CodeSearchNet). It reached a dev perplexity of 3.296.
16
+
17
+ This model was used for the Programming Puzzles enumerative solver baseline detailed in the [Programming Puzzles paper](https://arxiv.org/abs/2106.05784).
18
+
19
+ See also the [Python Programming Puzzles (P3) Repository](https://github.com/microsoft/PythonProgrammingPuzzles) for more details.
20
+
21
+ # Usage
22
+
23
+ You can either load the model and fine-tune it further for a target task (as was done for the puzzle solver), or experiment with mask-filling directly, as in the following example:
24
+
25
+ ```python
26
+ from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline
27
+
28
+ tokenizer = AutoTokenizer.from_pretrained("tals/roberta_python")
29
+ model = AutoModelForMaskedLM.from_pretrained("tals/roberta_python")  # replaces the deprecated AutoModelWithLMHead
30
+
31
+ demo = pipeline("fill-mask", model=model, tokenizer=tokenizer)
32
+
33
+ code = """sum = 0
34
+ for i in range(<mask>):
35
+     sum += i
36
+ assert sum == 6
37
+ """
38
+ demo(code)  # returns a list of candidate fills for <mask>, each with a score
39
+ ```
40
+
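The `fill-mask` pipeline used in the example above returns a list of dictionaries, each carrying a `token_str` and a `score` among other keys. A minimal sketch of ranking and filtering those candidates follows. The helper name `best_fills`, the `min_score` threshold, and the sample predictions are hypothetical and only illustrate the output shape; they are not real output from `tals/roberta_python`.

```python
def best_fills(predictions, min_score=0.05):
    """Return (token, score) pairs with score >= min_score, best first.

    `predictions` is assumed to have the shape produced by a fill-mask
    pipeline: a list of dicts with at least "token_str" and "score".
    """
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    # RoBERTa-style tokens often carry a leading space, so strip it.
    return [(p["token_str"].strip(), p["score"]) for p in ranked if p["score"] >= min_score]


# Hypothetical scores for illustration only, NOT actual model output.
sample = [
    {"token_str": " 3", "score": 0.2},
    {"token_str": " 4", "score": 0.7},
    {"token_str": " 10", "score": 0.01},
]
print(best_fills(sample))  # [('4', 0.7), ('3', 0.2)]
```

In practice one would pass `demo(code)` instead of `sample`. For the snippet above, `range(4)` is the fill that makes the assertion hold, since 0 + 1 + 2 + 3 = 6.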
41
+ # BibTeX entry and citation info
42
+
43
+ ```bibtex
44
+ @article{schuster2021programming,
45
+ title={Programming Puzzles},
46
+ author={Tal Schuster and Ashwin Kalyan and Oleksandr Polozov and Adam Tauman Kalai},
47
+ year={2021},
48
+ eprint={2106.05784},
49
+ archivePrefix={arXiv},
50
+ url={https://arxiv.org/abs/2106.05784}
51
+ }
52
+ ```
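The Details section reports a dev perplexity of 3.296 without saying how it was computed. Assuming the usual convention for language models, where perplexity is the exponential of the mean per-token cross-entropy loss in nats, the relationship can be sketched as follows. The function name and the way losses are supplied are assumptions for illustration, not the authors' evaluation code.

```python
import math


def perplexity(token_losses):
    """Perplexity as exp(mean per-token cross-entropy), natural log assumed."""
    if not token_losses:
        raise ValueError("need at least one token loss")
    return math.exp(sum(token_losses) / len(token_losses))


# Under this convention, a perplexity of 3.296 corresponds to a mean
# loss of ln(3.296) nats per predicted token.
mean_loss = math.log(3.296)
print(round(mean_loss, 2))  # 1.19
```

Read this way, the reported figure means the model is, on average, about as uncertain as a uniform choice among roughly 3.3 tokens at each masked position.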