README.md · seniruk/qwen2.5coder-0.5B_commit

metadata

license: other
license_name: seniru-epasinghe
license_link: LICENSE
language:
  - en
base_model:
  - Qwen/Qwen2.5-Coder-0.5B
pipeline_tag: text-generation
datasets:
  - seniruk/git-diff_to_commit_msg_large

Qwen2.5-Coder-0.5B fine-tuned on 100,000 rows of a custom dataset containing git diffs and their corresponding commit messages.

Each row of the dataset was formatted as shown below to suit the fine-tuning requirements of the Qwen2.5-Coder model:

'### Instruction:\nGenerate a concise and meaningful commit message based on the provided git diff.\n\n### Git Diff:\n{a given git-difference as in the dataset rows}\n\n### Commit Message:\nAdding the squeezing in the cost fuction<|im_end|>'
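
As a rough illustration of the template above, the sketch below builds one training example from a diff and its commit message. The helper name `format_row`, its argument names, and the placeholder diff text are hypothetical; the actual column names in the dataset may differ.

```python
# Hypothetical helper: names are illustrative and not taken from the dataset.
INSTRUCTION = (
    "Generate a concise and meaningful commit message "
    "based on the provided git diff."
)
EOS_TOKEN = "<|im_end|>"  # end-of-turn token shown in the template above


def format_row(git_diff: str, commit_message: str) -> str:
    """Fill the training template with one diff/message pair."""
    return (
        f"### Instruction:\n{INSTRUCTION}\n\n"
        f"### Git Diff:\n{git_diff}\n\n"
        f"### Commit Message:\n{commit_message}{EOS_TOKEN}"
    )


# Placeholder diff text; a real row would contain the full git diff.
example = format_row(
    "<git diff text>",
    "Adding the squeezing in the cost fuction",
)
print(example)
```

Appending the end-of-turn token to the target message is what lets the fine-tuned model learn where a commit message ends.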

Code for running inference with the GGUF model is given below:

from llama_cpp import Llama

# Configuration
gguf_model_path = "qwen0.5-finetuned.gguf"  # Path to your GGUF file

# Commit message prompt (minimal raw-text format, to avoid assistant-style replies)
commit_prompt = """Generate a meaningful commit message explaining all the changes in the provided Git diff.

### Git Diff:
{}

### Commit Message:"""  # No placeholder after "Commit Message:", so the model writes the message itself.

# Git diff example for commit message generation
git_diff_example = """
diff --git a/index.html b/index.html
index 89abcde..f123456 100644
--- a/index.html
+++ b/index.html
@@ -5,16 +5,6 @@ <body>
     <h1>Welcome to My Page</h1>

-    <table border="1">
-        <tr>
-            <th>Name</th>
-            <th>Age</th>
-        </tr>
-        <tr>
-            <td>John Doe</td>
-            <td>30</td>
-        </tr>
-    </table>

+    <p>This is a newly added paragraph replacing the table.</p>
 </body>
</html>
"""

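# Load the GGUF model with a 32768-token context window
modelGGUF = Llama(
    model_path=gguf_model_path,
    # Linear RoPE scaling. Note: some llama-cpp-python versions may not recognize
    # `rope_scaling` and expose `rope_scaling_type` / `rope_freq_scale` instead.
    rope_scaling={"type": "linear", "factor": 2.0},
    chat_format=None,  # No explicit chat format; the prompt is passed as raw text
    n_ctx=32768,  # Set the context size explicitly
)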

# Prepare the raw input prompt
input_prompt = commit_prompt.format(git_diff_example)

# Generate commit message
output = modelGGUF(
    input_prompt,
    max_tokens=64,
    temperature=0.6, # Balanced randomness
    top_p=0.8,      # Controls nucleus sampling
    top_k=50,       # Limits vocabulary selection
)

# Extract the generated text and print it
commit_message = output["choices"][0]["text"].strip()

print("\nGenerated Commit Message:\n{}".format(commit_message))
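
Because the model is prompted with raw text, the generated text may continue past the commit message itself (for example into a new `###` section). The sketch below shows one possible way to trim it. The helper `clean_commit_message`, the list of stop markers, and the sample string are assumptions for illustration, not part of the original script.

```python
# Hypothetical post-processing helper (not part of the original script).
STOP_MARKERS = ["<|im_end|>", "\n###"]  # assumed markers for the end of a message


def clean_commit_message(raw_text: str) -> str:
    """Cut the generated text at the earliest stop marker and strip whitespace."""
    cut = len(raw_text)
    for marker in STOP_MARKERS:
        idx = raw_text.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return raw_text[:cut].strip()


# Made-up sample of raw model output, used only to exercise the helper.
raw = " Replace table with paragraph<|im_end|>\n### Git Diff:\n..."
print(clean_commit_message(raw))  # prints "Replace table with paragraph"
```

Alternatively, the same markers could be passed through the `stop` argument of the completion call, which llama-cpp-python accepts as a string or a list of strings, so that generation halts before the extra text is produced.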