Update README.md
README.md
CHANGED
@@ -1,15 +1,25 @@
|
|
1 |
---
|
2 |
license: mit
|
3 |
---
|
4 |
-
# Commit Classification Model
|
5 |
|
6 |
-
This is a Logistic Regression model for multi-label classification of commit messages.
|
7 |
|
8 |
## Files
|
9 |
- `logistic_model.joblib`: Trained Logistic Regression model.
|
10 |
- `tfidf_vectorizer.joblib`: TF-IDF vectorizer for text preprocessing.
|
11 |
- `label_binarizer.joblib`: MultiLabelBinarizer for encoding/decoding labels.
|
12 |
|
13 |
## How to Use
|
14 |
To use this model, load the files and preprocess your data as follows:
|
15 |
|
@@ -22,8 +32,18 @@ tfidf_vectorizer = load("tfidf_vectorizer.joblib")
|
|
22 |
mlb = load("label_binarizer.joblib")
|
23 |
|
24 |
# Example usage
|
25 |
-
new_messages = [
|
26 |
X_new_tfidf = tfidf_vectorizer.transform(new_messages)
|
27 |
predictions = model.predict(X_new_tfidf)
|
28 |
predicted_labels = mlb.inverse_transform(predictions)
|
29 |
-
|
1 |
---
|
2 |
license: mit
|
3 |
---
|
4 |
+
# Dockerfile Commit Classification Model
|
5 |
|
6 |
+
This is a Logistic Regression model enhanced with a rule-based system for multi-label classification of Dockerfile-related commit messages. It combines machine learning with domain-specific rules to achieve accurate categorization.
|
7 |
|
8 |
## Files
|
9 |
- `logistic_model.joblib`: Trained Logistic Regression model.
|
10 |
- `tfidf_vectorizer.joblib`: TF-IDF vectorizer for text preprocessing.
|
11 |
- `label_binarizer.joblib`: MultiLabelBinarizer for encoding/decoding labels.
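
To make the "encoding/decoding labels" step concrete, here is a minimal pure-Python sketch of the indicator-row idea behind a multi-label binarizer. It is an illustration only, not scikit-learn's implementation. The label names are the ones this README lists for the model, and the class order in `CLASSES` is an assumption; the order stored in `label_binarizer.joblib` may differ.

```python
# Illustration only: mimics the 0/1 indicator encoding used for multi-label
# targets. This is not scikit-learn's MultiLabelBinarizer code.

# Assumed class order; the saved binarizer may use a different one.
CLASSES = [
    "bug fix",
    "code refactoring",
    "feature addition",
    "maintenance/other",
    "Not enough information",
]


def encode(labels):
    """Turn a collection of label names into a 0/1 indicator row."""
    wanted = set(labels)
    return [1 if name in wanted else 0 for name in CLASSES]


def decode(row):
    """Turn a 0/1 indicator row back into a tuple of label names."""
    return tuple(name for name, flag in zip(CLASSES, row) if flag)


row = encode(["feature addition", "bug fix"])
print(row)                      # [1, 0, 1, 0, 0]
print(decode(row))              # ('bug fix', 'feature addition')
print(decode([0, 0, 0, 0, 0]))  # ()
```

An all-zero row decodes to an empty tuple, which is why code that prints decoded labels usually needs a fallback for the "no labels" case.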
|
12 |
|
13 |
+
## Features
|
14 |
+
- **Hybrid Approach**: Combines machine learning with rule-based adjustments for better classification.
|
15 |
+
- **Dockerfile-Specific Labels**: Categorizes commit messages into predefined classes:
|
16 |
+
- `bug fix`
|
17 |
+
- `code refactoring`
|
18 |
+
- `feature addition`
|
19 |
+
- `maintenance/other`
|
20 |
+
- `Not enough information`
|
21 |
+
- **Multi-Label Support**: Each commit message can belong to multiple categories.
|
22 |
+
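
The README does not document the rules themselves, so the following is only a sketch of how rule-based adjustments could be layered on top of the model's predicted labels. The keyword patterns, the `adjust_labels` helper, and the fallback behavior are all hypothetical and chosen for illustration; only the label names come from the list above.

```python
import re

# Hypothetical keyword rules; the project's actual rules are not documented.
RULES = [
    (re.compile(r"\b(fix(e[sd])?|bug|issue)\b", re.IGNORECASE), "bug fix"),
    (re.compile(r"\b(refactor\w*|clean\s?up|simplif\w*)\b", re.IGNORECASE), "code refactoring"),
    (re.compile(r"\b(add(s|ed)?|introduc\w*|support)\b", re.IGNORECASE), "feature addition"),
]
FALLBACK = "Not enough information"


def adjust_labels(message, model_labels):
    """Combine model-predicted labels with keyword-rule matches (assumed design)."""
    labels = set(model_labels)
    # Add every label whose keyword pattern appears in the message.
    for pattern, label in RULES:
        if pattern.search(message):
            labels.add(label)
    # The fallback label is dropped once any other label is present.
    if len(labels) > 1:
        labels.discard(FALLBACK)
    # With no evidence from the model or the rules, use the fallback.
    if not labels:
        labels.add(FALLBACK)
    return tuple(sorted(labels))


print(adjust_labels("Fixed an issue with the base image in Dockerfile", ()))
# ('bug fix',)
print(adjust_labels("Added multistage builds to reduce image size", ("maintenance/other",)))
# ('feature addition', 'maintenance/other')
```

In this sketch the rules only add labels and never remove a model prediction, apart from the fallback label. That is one possible design; a rule system could equally override or veto model output.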
|
23 |
## How to Use
|
24 |
To use this model, load the files and preprocess your data as follows:
|
25 |
|
|
|
32 |
mlb = load("label_binarizer.joblib")
|
33 |
|
34 |
# Example usage
|
35 |
+
new_messages = [
|
36 |
+
"Fixed an issue with the base image in Dockerfile",
|
37 |
+
"Added multistage builds to reduce image size",
|
38 |
+
"Updated Python version in Dockerfile to 3.10"
|
39 |
+
]
|
40 |
X_new_tfidf = tfidf_vectorizer.transform(new_messages)
|
41 |
+
|
42 |
+
# Predict the labels
|
43 |
predictions = model.predict(X_new_tfidf)
|
44 |
predicted_labels = mlb.inverse_transform(predictions)
|
45 |
+
|
46 |
+
# Print results
|
47 |
+
for msg, labels in zip(new_messages, predicted_labels):
|
48 |
+
print(f"Message: {msg}")
|
49 |
+
print(f"Predicted Labels: {', '.join(labels) if labels else 'No labels'}\n")
|