Triangle104 committed on
Commit 06f2a40 · verified · 1 Parent(s): c7c0bd5

Update README.md

Files changed (1)
  1. README.md +373 -0
@@ -13,6 +13,379 @@ license: apache-2.0
  This model was converted to GGUF format from [`TechxGenus/CursorCore-Yi-1.5B-LC`](https://huggingface.co/TechxGenus/CursorCore-Yi-1.5B-LC) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
  Refer to the [original model card](https://huggingface.co/TechxGenus/CursorCore-Yi-1.5B-LC) for more details on the model.
+ ---
+ Model details:
+ -
+ CursorCore: Assist Programming through Aligning Anything
+
+ [📄arXiv] | [🤗HF Paper] | [🤖Models] | [🛠️Code] | [Web] | [Discord]
+
+ - CursorCore: Assist Programming through Aligning Anything
+   - Introduction
+   - Models
+   - Usage
+     - 1) Normal chat
+     - 2) Assistant-Conversation
+     - 3) Web Demo
+   - Future Work
+   - Citation
+   - Contribution
+
+ ## Introduction
+
+ CursorCore is a series of open-source models designed for AI-assisted programming. It aims to support features such as automated editing and inline chat, replicating the core abilities of closed-source AI-assisted programming tools like Cursor. This is achieved by aligning data generated through Programming-Instruct. Please read our paper to learn more.
+
+ [Figure: conversation]
+
+ [Figure: CursorWeb]
+
+ ## Models
+
+ Our models are open-sourced on Hugging Face in the CursorCore-Series collection. Pre-quantized GPTQ and AWQ weights are available in CursorCore-Quantization.
+
+ ## Usage
+
+ Here are some examples of how to use our models:
+
+ ### 1) Normal chat
+
+ Script:
+
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ # Load the tokenizer and model; device_map="auto" places weights on the available devices
+ tokenizer = AutoTokenizer.from_pretrained("TechxGenus/CursorCore-Yi-9B")
+ model = AutoModelForCausalLM.from_pretrained(
+     "TechxGenus/CursorCore-Yi-9B",
+     torch_dtype=torch.bfloat16,
+     device_map="auto"
+ )
+
+ messages = [
+     {"role": "user", "content": "Hi!"},
+ ]
+ # Render the conversation as a prompt string that ends with the assistant header
+ prompt = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+
+ inputs = tokenizer.encode(prompt, return_tensors="pt")
+ outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=512)
+ # Decode the full sequence (prompt plus completion), keeping special tokens
+ print(tokenizer.decode(outputs[0]))
+
+ Output:
+
+ <|im_start|>system
+ You are a helpful programming assistant.<|im_end|>
+ <|im_start|>user
+ Hi!<|im_end|>
+ <|im_start|>assistant
+ Hello! I'm an AI language model and I can help you with any programming questions you might have. What specific problem or task are you trying to solve?<|im_end|>
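
Because the script decodes the whole sequence, the printed text contains the system and user turns as well as the reply. The following is a minimal sketch, not part of CursorCore, of how the assistant turn might be pulled out of such text. It assumes the `<|im_start|>ROLE ... <|im_end|>` layout shown in the output above; `split_turns` and `last_assistant_reply` are hypothetical helper names introduced only for illustration.

```python
import re
from typing import List, Optional, Tuple

# One turn looks like: <|im_start|>ROLE\nCONTENT<|im_end|>
TURN_PATTERN = re.compile(r"<\|im_start\|>(\w+)\n(.*?)<\|im_end\|>", re.DOTALL)


def split_turns(decoded: str) -> List[Tuple[str, str]]:
    """Return (role, content) pairs for every complete turn in the text."""
    return [(role, content.strip()) for role, content in TURN_PATTERN.findall(decoded)]


def last_assistant_reply(decoded: str) -> Optional[str]:
    """Return the content of the final assistant turn, or None if there is none."""
    for role, content in reversed(split_turns(decoded)):
        if role == "assistant":
            return content
    return None


# Shortened stand-in for the decoded output shown above
decoded = (
    "<|im_start|>system\nYou are a helpful programming assistant.<|im_end|>\n"
    "<|im_start|>user\nHi!<|im_end|>\n"
    "<|im_start|>assistant\nHello! What are you trying to solve?<|im_end|>\n"
)
print(last_assistant_reply(decoded))
```

A regular expression keeps the sketch dependency-free; in a real pipeline, slicing off the prompt tokens before decoding may be a simpler alternative.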
+
+ ### 2) Assistant-Conversation
+
+ Our work introduces a new framework for AI-assisted programming tasks. It is designed to align anything during the programming process and is used to implement features such as Tab and Inline Chat.
+
+ Script 1:
+
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ # Helper imported from the eval.utils module; this assumes the CursorCore code is on the Python path
+ from eval.utils import prepare_input_for_wf
+
+ tokenizer = AutoTokenizer.from_pretrained("TechxGenus/CursorCore-Yi-9B")
+ model = AutoModelForCausalLM.from_pretrained(
+     "TechxGenus/CursorCore-Yi-9B",
+     torch_dtype=torch.bfloat16,
+     device_map="auto"
+ )
+ # "history" holds earlier versions of the code, "current" the code being edited,
+ # and "user" an optional instruction (empty here)
+ sample = {
+     "history": [
+         {
+             "type": "code",
+             "lang": "python",
+             "code": """def quick_sort(arr):\n    if len(arr) <= 1:\n        return arr\n    pivot = arr[len(arr) // 2]\n    left = [x for x in arr if x < pivot]\n    middle = [x for x in arr if x == pivot]\n    right = [x for x in arr if x > pivot]\n    return quick_sort(left) + middle + quick_sort(right)"""
+         }
+     ],
+     "current": {
+         "type": "code",
+         "lang": "python",
+         "code": """def quick_sort(array):\n    if len(arr) <= 1:\n        return arr\n    pivot = arr[len(arr) // 2]\n    left = [x for x in arr if x < pivot]\n    middle = [x for x in arr if x == pivot]\n    right = [x for x in arr if x > pivot]\n    return quick_sort(left) + middle + quick_sort(right)"""
+     },
+     "user": ""
+ }
+
+ # Render the sample with the "assistant-conversation" chat template
+ prompt = tokenizer.apply_chat_template(
+     prepare_input_for_wf(sample),
+     tokenize=False,
+     chat_template="assistant-conversation",
+     add_generation_prompt=True
+ )
+
+ inputs = tokenizer.encode(prompt, return_tensors="pt")
+ # do_sample=False selects greedy decoding
+ outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=512, do_sample=False)
+ print(tokenizer.decode(outputs[0]))
+
+ Output 1:
+
+ <|im_start|>system
+ You are a helpful programming assistant.<|im_end|>
+ <|im_start|>history
+ ```python
+ def quick_sort(arr):
+     if len(arr) <= 1:
+         return arr
+     pivot = arr[len(arr) // 2]
+     left = [x for x in arr if x < pivot]
+     middle = [x for x in arr if x == pivot]
+     right = [x for x in arr if x > pivot]
+     return quick_sort(left) + middle + quick_sort(right)
+ ```<|im_end|>
+ <|im_start|>current
+ ```python
+ def quick_sort(array):
+     if len(arr) <= 1:
+         return arr
+     pivot = arr[len(arr) // 2]
+     left = [x for x in arr if x < pivot]
+     middle = [x for x in arr if x == pivot]
+     right = [x for x in arr if x > pivot]
+     return quick_sort(left) + middle + quick_sort(right)
+ ```<|im_end|>
+ <|im_start|>assistant
+ <|next_start|>```python
+ def quick_sort(array):
+     if len(array) <= 1:
+         return array
+     pivot = array[len(array) // 2]
+     left = [x for x in array if x < pivot]
+     middle = [x for x in array if x == pivot]
+     right = [x for x in array if x > pivot]
+     return quick_sort(left) + middle + quick_sort(right)
+ ```<|next_end|>
+ The user has provided a revised code change that corrects the parameter name from `arr` to `array` in the `quick_sort` function. This change ensures consistency in the function definition and avoids potential confusion or errors.
+
+ To implement this, we will:
+ 1. Update the parameter name in the function definition from `arr` to `array`.
+ 2. Ensure that all references to `arr` within the function are updated to `array`.
+
+ This will make the function definition and internal references consistent, improving code readability and maintainability.<|im_end|>
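
In Output 1 the assistant turn carries two things: the predicted code between `<|next_start|>` and `<|next_end|>`, and a free-text explanation after it. The sketch below shows one way an editor integration might separate them. It is an illustration under the assumption that the markers appear exactly as in the output above; `split_prediction` and `strip_fence` are hypothetical names, not functions from CursorCore.

```python
import re
from typing import Optional, Tuple

FENCE = "`" * 3  # markdown code fence, built here to avoid a literal fence in the source
NEXT_PATTERN = re.compile(r"<\|next_start\|>(.*?)<\|next_end\|>", re.DOTALL)


def split_prediction(assistant_text: str) -> Tuple[Optional[str], str]:
    """Split an assistant turn into (predicted block, explanation)."""
    match = NEXT_PATTERN.search(assistant_text)
    if match is None:
        return None, assistant_text.strip()
    explanation = assistant_text[match.end():].replace("<|im_end|>", "").strip()
    return match.group(1), explanation


def strip_fence(block: str) -> str:
    """Drop the opening and closing fence lines around a code block, if present."""
    lines = block.strip().splitlines()
    if lines and lines[0].startswith(FENCE):
        lines = lines[1:]
    if lines and lines[-1].startswith(FENCE):
        lines = lines[:-1]
    return "\n".join(lines)


# Shortened stand-in for the assistant turn in Output 1
assistant_text = (
    "<|next_start|>" + FENCE + "python\n"
    "def quick_sort(array):\n"
    "    return array\n"
    + FENCE + "<|next_end|>\n"
    "Renamed the parameter.<|im_end|>"
)
predicted, explanation = split_prediction(assistant_text)
print(strip_fence(predicted))
print(explanation)
```

Keeping the two parts separate lets a tool apply the code immediately and show the explanation only on request.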
+
+ Script 2:
+
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ # Helper imported from the eval.utils module; this assumes the CursorCore code is on the Python path
+ from eval.utils import prepare_input_for_wf
+
+ tokenizer = AutoTokenizer.from_pretrained("TechxGenus/CursorCore-Yi-9B")
+ model = AutoModelForCausalLM.from_pretrained(
+     "TechxGenus/CursorCore-Yi-9B",
+     torch_dtype=torch.bfloat16,
+     device_map="auto"
+ )
+ # No edit history this time; the request is given as an explicit user instruction
+ sample = {
+     "history": [],
+     "current": {
+         "type": "code",
+         "lang": "python",
+         "code": """def quick_sort(array):\n    if len(arr) <= 1:\n        return arr\n    pivot = arr[len(arr) // 2]\n    left = [x for x in arr if x < pivot]\n    middle = [x for x in arr if x == pivot]\n    right = [x for x in arr if x > pivot]\n    return quick_sort(left) + middle + quick_sort(right)"""
+     },
+     "user": "Add Docstring."
+ }
+
+ prompt = tokenizer.apply_chat_template(
+     prepare_input_for_wf(sample),
+     tokenize=False,
+     chat_template="assistant-conversation",
+     add_generation_prompt=True
+ )
+
+ inputs = tokenizer.encode(prompt, return_tensors="pt")
+ # do_sample=False selects greedy decoding
+ outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=512, do_sample=False)
+ print(tokenizer.decode(outputs[0]))
+
+ Output 2:
+
+ <|im_start|>system
+ You are a helpful programming assistant.<|im_end|>
+ <|im_start|>current
+ ```python
+ def quick_sort(array):
+     if len(arr) <= 1:
+         return arr
+     pivot = arr[len(arr) // 2]
+     left = [x for x in arr if x < pivot]
+     middle = [x for x in arr if x == pivot]
+     right = [x for x in arr if x > pivot]
+     return quick_sort(left) + middle + quick_sort(right)
+ ```<|im_end|>
+ <|im_start|>user
+ Add Docstring.<|im_end|>
+ <|im_start|>assistant
+ <|next_start|>```python
+ def quick_sort(array):
+     """
+     This is an implementation of the quick sort algorithm.
+     """
+     if len(arr) <= 1:
+         return arr
+     pivot = arr[len(arr) // 2]
+     left = [x for x in arr if x < pivot]
+     middle = [x for x in arr if x == pivot]
+     right = [x for x in arr if x > pivot]
+     return quick_sort(left) + middle + quick_sort(right)
+ ```<|next_end|><|im_end|>
+
+ For models in Locate-and-Change (LC) and Search-and-Replace (SR) formats, the output examples are as follows:
+
+ Script for LC:
+
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ # Helper imported from the eval.utils module; this assumes the CursorCore code is on the Python path
+ from eval.utils import prepare_input_for_lc
+
+ tokenizer = AutoTokenizer.from_pretrained("TechxGenus/CursorCore-Yi-1.5B-LC")
+ model = AutoModelForCausalLM.from_pretrained(
+     "TechxGenus/CursorCore-Yi-1.5B-LC",
+     torch_dtype=torch.bfloat16,
+     device_map="auto"
+ )
+ sample = {
+     "history": [],
+     "current": {
+         "type": "code",
+         "lang": "python",
+         "code": """def quick_sort(array):\n    if len(arr) <= 1:\n        return arr\n    pivot = arr[len(arr) // 2]\n    left = [x for x in arr if x < pivot]\n    middle = [x for x in arr if x == pivot]\n    right = [x for x in arr if x > pivot]\n    return quick_sort(left) + middle + quick_sort(right)"""
+     },
+     "user": "Add Docstring."
+ }
+
+ # prepare_input_for_lc is used in place of prepare_input_for_wf for LC models
+ prompt = tokenizer.apply_chat_template(
+     prepare_input_for_lc(sample),
+     tokenize=False,
+     chat_template="assistant-conversation",
+     add_generation_prompt=True
+ )
+
+ inputs = tokenizer.encode(prompt, return_tensors="pt")
+ # do_sample=False selects greedy decoding
+ outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=512, do_sample=False)
+ print(tokenizer.decode(outputs[0]))
+
+ Output for LC:
+
+ <|im_start|>system
+ You are a helpful programming assistant.<|im_end|>
+ <|im_start|>current
+ ```python
+ 1 def quick_sort(array):
+ 2     if len(arr) <= 1:
+ 3         return arr
+ 4     pivot = arr[len(arr) // 2]
+ 5     left = [x for x in arr if x < pivot]
+ 6     middle = [x for x in arr if x == pivot]
+ 7     right = [x for x in arr if x > pivot]
+ 8     return quick_sort(left) + middle + quick_sort(right)
+ ```<|im_end|>
+ <|im_start|>user
+ Add Docstring.<|im_end|>
+ <|im_start|>assistant
+ <|next_start|>1,1
+ ```
+     '''This function will sort an array using quick sort algorithm'''
+ ```<|next_end|>
+ To enhance the readability and maintainability of the code, we should add a docstring to the `quick_sort` function. A docstring provides a clear description of what the function does, which is particularly useful for other developers who might use or modify this code in the future.
+
+ The docstring will be added immediately after the function definition, explaining that the function uses the quick sort algorithm to sort an array. This will make the code more self-explanatory and easier to understand.
+
+ Here's the plan:
+ 1. Add a docstring at the beginning of the `quick_sort` function.
+ 2. Ensure the docstring is clear and concise, describing the purpose of the function.
+
+ This modification will improve the code's documentation without altering its functionality.<|im_end|>
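
The LC output is a location header (`1,1`) followed by the new lines, so a client has to apply it to the original code. The sketch below is one possible way to do that and is not taken from CursorCore. It rests on an assumption inferred only from the single example above: that `start,end` behaves like a half-open, zero-based slice over the source lines, so `1,1` inserts after the first line without replacing anything. `parse_lc_edit` and `apply_lc_edit` are hypothetical names.

```python
from typing import List, Tuple

FENCE = "`" * 3  # markdown code fence, built here to avoid a literal fence in the source


def parse_lc_edit(edit: str) -> Tuple[int, int, List[str]]:
    """Parse 'start,end' plus a fenced body into (start, end, new_lines)."""
    header, _, body = edit.strip().partition("\n")
    start, end = (int(part) for part in header.split(","))
    body_lines = body.splitlines()
    if body_lines and body_lines[0].startswith(FENCE):
        body_lines = body_lines[1:]
    if body_lines and body_lines[-1].startswith(FENCE):
        body_lines = body_lines[:-1]
    return start, end, body_lines


def apply_lc_edit(source: str, edit: str) -> str:
    """Replace source lines [start, end) with the new lines (ASSUMED semantics)."""
    start, end, new_lines = parse_lc_edit(edit)
    lines = source.splitlines()
    lines[start:end] = new_lines  # an empty slice means pure insertion
    return "\n".join(lines)


source = "def quick_sort(array):\n    return array"
edit = "1,1\n" + FENCE + "\n    '''Sort an array.'''\n" + FENCE
print(apply_lc_edit(source, edit))
```

If the real format uses a different indexing convention, only the slice in `apply_lc_edit` would need to change.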
+
+ Script for SR:
+
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ # Helper imported from the eval.utils module; this assumes the CursorCore code is on the Python path
+ from eval.utils import prepare_input_for_sr
+
+ tokenizer = AutoTokenizer.from_pretrained("TechxGenus/CursorCore-Yi-1.5B-SR")
+ model = AutoModelForCausalLM.from_pretrained(
+     "TechxGenus/CursorCore-Yi-1.5B-SR",
+     torch_dtype=torch.bfloat16,
+     device_map="auto"
+ )
+ sample = {
+     "history": [],
+     "current": {
+         "type": "code",
+         "lang": "python",
+         "code": """def quick_sort(array):\n    if len(arr) <= 1:\n        return arr\n    pivot = arr[len(arr) // 2]\n    left = [x for x in arr if x < pivot]\n    middle = [x for x in arr if x == pivot]\n    right = [x for x in arr if x > pivot]\n    return quick_sort(left) + middle + quick_sort(right)"""
+     },
+     "user": "Add Docstring."
+ }
+
+ # prepare_input_for_sr is used in place of prepare_input_for_wf for SR models
+ prompt = tokenizer.apply_chat_template(
+     prepare_input_for_sr(sample),
+     tokenize=False,
+     chat_template="assistant-conversation",
+     add_generation_prompt=True
+ )
+
+ inputs = tokenizer.encode(prompt, return_tensors="pt")
+ # do_sample=False selects greedy decoding
+ outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=512, do_sample=False)
+ print(tokenizer.decode(outputs[0]))
+
+ Output for SR:
+
+ <|im_start|>system
+ You are a helpful programming assistant.<|im_end|>
+ <|im_start|>current
+ ```python
+ def quick_sort(array):
+     if len(arr) <= 1:
+         return arr
+     pivot = arr[len(arr) // 2]
+     left = [x for x in arr if x < pivot]
+     middle = [x for x in arr if x == pivot]
+     right = [x for x in arr if x > pivot]
+     return quick_sort(left) + middle + quick_sort(right)
+ ```<|im_end|>
+ <|im_start|>user
+ Add Docstring.<|im_end|>
+ <|im_start|>assistant
+ <|next_start|>```python
+ def quick_sort(array):
+ <|search_and_replace|>
+ def quick_sort(array):
+     """
+     This function implements quick sort algorithm
+     """
+ ```<|next_end|><|im_end|>
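
In the SR output, the text before `<|search_and_replace|>` is what to find and the text after it is what to substitute. The following sketch shows how such a block might be applied to the original code. It is an illustration rather than CursorCore's own implementation, and it assumes one search/replace pair per block and that only the first match is replaced; `parse_sr_edit` and `apply_sr_edit` are hypothetical names.

```python
from typing import Tuple

FENCE = "`" * 3  # markdown code fence, built here to avoid a literal fence in the source
SEPARATOR = "<|search_and_replace|>"


def parse_sr_edit(block: str) -> Tuple[str, str]:
    """Split a fenced SR block into (search_text, replacement_text)."""
    lines = block.strip().splitlines()
    if lines and lines[0].startswith(FENCE):
        lines = lines[1:]
    if lines and lines[-1].startswith(FENCE):
        lines = lines[:-1]
    search, sep, replacement = "\n".join(lines).partition("\n" + SEPARATOR + "\n")
    if not sep:
        raise ValueError("separator line not found in SR block")
    return search, replacement


def apply_sr_edit(source: str, block: str) -> str:
    """Replace the first occurrence of the search text (ASSUMED semantics)."""
    search, replacement = parse_sr_edit(block)
    if search not in source:
        raise ValueError("search text not found in source")
    return source.replace(search, replacement, 1)


source = "def quick_sort(array):\n    return array"
block = (
    FENCE + "python\n"
    "def quick_sort(array):\n"
    + SEPARATOR + "\n"
    "def quick_sort(array):\n"
    '    """Sort an array."""\n'
    + FENCE
)
print(apply_sr_edit(source, block))
```

Raising an error when the search text is missing makes a stale or hallucinated edit visible instead of silently leaving the code unchanged.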
+
+ ### 3) Web Demo
+
+ We created a web demo for CursorCore. Please visit CursorWeb for more details.
+
+ ## Future Work
+
+ CursorCore is still at a very early stage, and much work is needed to achieve a better user experience. For example:
+
+ - Repository-level editing support
+ - Better and faster editing formats
+ - Better user interface and presentation
+ - ...
+
+ ## Citation
+
+ @article{jiang2024cursorcore,
+   title = {CursorCore: Assist Programming through Aligning Anything},
+   author = {Hao Jiang and Qi Liu and Rui Li and Shengyu Ye and Shijin Wang},
+   year = {2024},
+   journal = {arXiv preprint arXiv: 2410.07002}
+ }
+
+ ## Contribution
+
+ Contributions are welcome! If you find any bugs or have suggestions for improvements, please open an issue or submit a pull request.
+
+ ---
  ## Use with llama.cpp
  Install llama.cpp through brew (works on Mac and Linux)
391