DeepakKumarMSL committed on
Commit
328056f
·
verified ·
1 Parent(s): c64fe57

Create README.md

Files changed (1)
  1. README.md +66 -0
README.md ADDED
@@ -0,0 +1,66 @@
# 🧠 Code Generation Model – Fine-Tuned `Salesforce/codegen-350M-multi`

This repository contains a fine-tuned version of the [`Salesforce/codegen-350M-multi`](https://huggingface.co/Salesforce/codegen-350M-multi) model. It generates code snippets based on natural language or function signature prompts.

---

## 📦 Base Model

- **Model**: `Salesforce/codegen-350M-multi`
- **Architecture**: Causal LM (decoder-only Transformer)
- **Parameters**: ~350M
- **Supports**: Python, JavaScript, Java, C, C++, and Go (the languages in the base model's training data)
- **Precision**: ✅ FP16 (optional). Note that `bitsandbytes` provides 8-bit and 4-bit quantization; plain FP16 loading does not require it

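
As a rough illustration of why reduced precision matters for a model of this size, the sketch below estimates the memory needed for the weights alone at a few precisions. The round figure of 350 million parameters and the helper name `weight_memory_gib` are assumptions for illustration; real memory use also includes activations, buffers, and framework overhead, so treat the numbers as lower bounds.

```python
# Hypothetical helper: estimate the memory needed to hold model weights only.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params: int, precision: str) -> float:
    """Return the approximate weight memory in GiB for the given precision."""
    return num_params * BYTES_PER_PARAM[precision] / (1024 ** 3)


NUM_PARAMS = 350_000_000  # assumed round figure for "~350M"

for precision in BYTES_PER_PARAM:
    print(f"{precision}: {weight_memory_gib(NUM_PARAMS, precision):.2f} GiB")
```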
---

## 📚 Dataset

### Dataset: [code_x_glue_ct_code_to_text](https://huggingface.co/datasets/code_x_glue_ct_code_to_text)

- **Source**: Hugging Face Datasets
- **Description**: Functions paired with their natural-language docstrings. The dataset covers several programming languages; the `python` configuration is used here.

```python
from datasets import load_dataset

# Load the Python configuration of the CodeXGLUE code-to-text dataset
dataset = load_dataset("code_x_glue_ct_code_to_text", "python")
```

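
The README does not say how each record was turned into training text. One plausible approach, sketched below, places the docstring in a comment and lets the model learn to produce the code that follows. The template, the helper name `build_training_text`, and the `<|endoftext|>` end-of-sequence token are assumptions; the `docstring` and `code` keys are assumed to match the record fields of the dataset.

```python
def build_training_text(record: dict, eos_token: str = "<|endoftext|>") -> str:
    """Turn one docstring/code pair into a single causal-LM training string."""
    # Collapse newlines and repeated spaces so the docstring fits on one comment line.
    docstring = " ".join(record["docstring"].split())
    code = record["code"].strip()
    # Assumed template: the docstring as a comment, then the code to be generated.
    return f"# {docstring}\n{code}\n{eos_token}"


# Hand-written sample record (not taken from the dataset).
sample = {
    "docstring": "Return the sum\n    of two numbers.",
    "code": "def add(a, b):\n    return a + b\n",
}
print(build_training_text(sample))
```

With the Hugging Face `datasets` library, a function like this would typically be applied to every record through `dataset.map` before tokenization.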
## 📊 Evaluation (Scoring)
Metric: BLEU or CodeBLEU (exact match, ROUGE, and similar metrics can also be used)

```python
# `datasets.load_metric` is deprecated; the `evaluate` library replaces it
import evaluate

bleu = evaluate.load("bleu")
# `predictions` is a list of strings; `references` holds a list of reference strings per prediction
bleu_score = bleu.compute(predictions=["generated_code"], references=[["reference_code"]])
print("BLEU Score:", bleu_score)
```

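
Exact match, mentioned above as an alternative metric, can be sketched with the standard library alone. The whitespace normalization here (stripping trailing spaces and dropping blank lines) is an assumption about what should count as "identical" code; the helper names are illustrative.

```python
def normalize(code: str) -> str:
    """Strip trailing whitespace on each line and drop blank lines."""
    lines = [line.rstrip() for line in code.strip().splitlines()]
    return "\n".join(line for line in lines if line)


def exact_match(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions identical to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(predictions)


# Hand-written examples: the first pair differs only in trailing whitespace.
preds = ["def add(a, b):\n    return a + b\n", "def sub(a, b):\n    return b - a"]
refs = ["def add(a, b):\n    return a + b", "def sub(a, b):\n    return a - b"]
print("Exact match:", exact_match(preds, refs))
```

Exact match is strict: a semantically correct completion with different variable names scores zero, which is why it is usually reported alongside BLEU or CodeBLEU rather than in place of them.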
## 📁 Folder Structure

finetuned_codegen_350M/
├── config.json
├── pytorch_model.bin
├── tokenizer_config.json
├── tokenizer.json
├── special_tokens_map.json
├── vocab.json
├── merges.txt
├── training_args.bin
└── README.md

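
Before loading the model, it can help to confirm that the directory actually contains the files listed above. The following is a minimal sketch; the helper name `missing_files` is hypothetical, and the file list is copied from the folder listing rather than from any library requirement.

```python
from pathlib import Path

# File names copied from the folder listing above.
EXPECTED_FILES = [
    "config.json",
    "pytorch_model.bin",
    "tokenizer_config.json",
    "tokenizer.json",
    "special_tokens_map.json",
    "vocab.json",
    "merges.txt",
    "training_args.bin",
    "README.md",
]


def missing_files(model_dir: str) -> list[str]:
    """Return the expected files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files("./finetuned_codegen_350M")
    print("All files present" if not missing else f"Missing: {missing}")
```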
## 💬 Inference Example

```python
import torch
from transformers import pipeline

# Use the first GPU when available; otherwise fall back to CPU (-1)
device = 0 if torch.cuda.is_available() else -1
pipe = pipeline("text-generation", model="./finetuned_codegen_350M", device=device)

prompt = "def is_prime(n):"
# max_length counts the prompt tokens plus the generated tokens
result = pipe(prompt, max_length=100, do_sample=True)
print(result[0]["generated_text"])
```
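
Sampled completions often run past the requested function and start an unrelated definition. A common remedy, sketched here, is to cut the generated text at the next top-level statement. The stop patterns and the helper name `truncate_completion` are assumptions for illustration and are not part of the original README; the sketch also assumes the generated text begins with the prompt, as in the pipeline output above.

```python
import re

# Assumed stop patterns: the start of a new top-level definition or statement.
STOP_PATTERN = re.compile(r"^(def |class |print\(|if __name__)", re.MULTILINE)


def truncate_completion(generated_text: str, prompt: str) -> str:
    """Keep the prompt plus the first block of code that follows it."""
    if not generated_text.startswith(prompt):
        raise ValueError("generated_text is expected to start with the prompt")
    completion = generated_text[len(prompt):]
    match = STOP_PATTERN.search(completion)
    if match:
        # Drop everything from the next top-level statement onward.
        completion = completion[: match.start()]
    return prompt + completion.rstrip() + "\n"


# Hand-written stand-in for a model output.
prompt = "def is_prime(n):"
raw = (
    "def is_prime(n):\n"
    "    if n < 2:\n"
    "        return False\n"
    "    return all(n % i for i in range(2, n))\n"
    "\n"
    "def unrelated():\n"
    "    pass\n"
)
print(truncate_completion(raw, prompt))
```

In practice the function would be applied to `result[0]["generated_text"]` from the pipeline call.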