meriemm6/commit-classification-logreg

meriemm6 committed on Dec 10, 2024

Commit 8132247 · verified · 1 parent: 90c5899

Update README.md

Files changed (1)

README.md +24 -4

README.md CHANGED

@@ -1,15 +1,25 @@
 ---
 license: mit
 ---
-# Commit Classification Model
-This is a Logistic Regression model for multi-label classification of commit messages.
 ## Files
 - `logistic_model.joblib`: Trained Logistic Regression model.
 - `tfidf_vectorizer.joblib`: TF-IDF vectorizer for text preprocessing.
 - `label_binarizer.joblib`: MultiLabelBinarizer for encoding/decoding labels.
 ## How to Use
 To use this model, load the files and preprocess your data as follows:
@@ -22,8 +32,18 @@ tfidf_vectorizer = load("tfidf_vectorizer.joblib")
 mlb = load("label_binarizer.joblib")
 # Example usage
-new_messages = ["Fix bug in login system"]
 X_new_tfidf = tfidf_vectorizer.transform(new_messages)
 predictions = model.predict(X_new_tfidf)
 predicted_labels = mlb.inverse_transform(predictions)
-print(predicted_labels)

@@ -1,15 +1,25 @@
 ---
 license: mit
 ---
+# Dockerfile Commit Classification Model
+This is a Logistic Regression model enhanced with a rule-based system for multi-label classification of Dockerfile-related commit messages. It combines machine learning with domain-specific rules to achieve accurate categorization.
 ## Files
 - `logistic_model.joblib`: Trained Logistic Regression model.
 - `tfidf_vectorizer.joblib`: TF-IDF vectorizer for text preprocessing.
 - `label_binarizer.joblib`: MultiLabelBinarizer for encoding/decoding labels.
+## Features
+- **Hybrid Approach**: Combines machine learning with rule-based adjustments for better classification.
+- **Dockerfile-Specific Labels**: Categorizes commit messages into predefined classes:
+  - `bug fix`
+  - `code refactoring`
+  - `feature addition`
+  - `maintenance/other`
+  - `Not enough information`
+- **Multi-Label Support**: Each commit message can belong to multiple categories.
 ## How to Use
 To use this model, load the files and preprocess your data as follows:
@@ -22,8 +32,18 @@ tfidf_vectorizer = load("tfidf_vectorizer.joblib")
 mlb = load("label_binarizer.joblib")
 # Example usage
+new_messages = [
+    "Fixed an issue with the base image in Dockerfile",
+    "Added multistage builds to reduce image size",
+    "Updated Python version in Dockerfile to 3.10"
+]
 X_new_tfidf = tfidf_vectorizer.transform(new_messages)
+# Predict the labels
 predictions = model.predict(X_new_tfidf)
 predicted_labels = mlb.inverse_transform(predictions)
+# Print results
+for msg, labels in zip(new_messages, predicted_labels):
+    print(f"Message: {msg}")
+    print(f"Predicted Labels: {', '.join(labels) if labels else 'No labels'}\n")
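The updated README says the model is "enhanced with a rule-based system", but neither the diff nor the listed files show what those rules are, and the usage example only calls `model.predict` and `mlb.inverse_transform`. The sketch below shows one way such a post-processing layer could sit on top of the label tuples returned by `mlb.inverse_transform`. It assumes simple keyword rules; `apply_rules`, `KEYWORD_RULES`, `FALLBACK_LABEL`, the regex patterns, and the placeholder predictions are all hypothetical and are not the repository's actual code. Only the label names come from the README.

```python
import re
from typing import Iterable, Tuple

# Hypothetical keyword rules. The repository does not publish its actual
# rules, so these label -> pattern pairs are illustrative assumptions only.
KEYWORD_RULES = {
    "bug fix": re.compile(r"\b(fix|fixed|fixes|bug|issue)\b", re.IGNORECASE),
    "feature addition": re.compile(r"\b(add|added|adds|introduce|support)\b", re.IGNORECASE),
    "code refactoring": re.compile(r"\b(refactor|refactored|restructure|clean ?up)\b", re.IGNORECASE),
    "maintenance/other": re.compile(r"\b(update|updated|bump|upgrade)\b", re.IGNORECASE),
}
FALLBACK_LABEL = "Not enough information"


def apply_rules(message: str, model_labels: Iterable[str]) -> Tuple[str, ...]:
    """Merge model-predicted labels with keyword-rule labels for one message."""
    labels = set(model_labels)
    # Rules only add labels; apart from the fallback handling below,
    # every label the model predicted is kept.
    for label, pattern in KEYWORD_RULES.items():
        if pattern.search(message):
            labels.add(label)
    # Drop the fallback label if any concrete label is present.
    if len(labels) > 1:
        labels.discard(FALLBACK_LABEL)
    # Nothing from the model or the rules: report the fallback class.
    if not labels:
        return (FALLBACK_LABEL,)
    return tuple(sorted(labels))


new_messages = [
    "Fixed an issue with the base image in Dockerfile",
    "Added multistage builds to reduce image size",
    "Updated Python version in Dockerfile to 3.10",
    "wip",
]
# Stand-in for mlb.inverse_transform(predictions): one tuple of labels per
# message. These are made-up placeholder values, not real model output.
predicted_labels = [("bug fix",), (), ("maintenance/other",), ()]

for msg, labels in zip(new_messages, predicted_labels):
    print(f"Message: {msg}")
    print(f"Adjusted Labels: {', '.join(apply_rules(msg, labels))}\n")
```

In this sketch the rules can only add labels, so they fill gaps (such as an empty prediction for "Added multistage builds...") without overriding what the classifier returned; whether the author's hybrid system also lets rules remove or replace model labels is not stated in the README.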