Commit 8132247 (verified) by meriemm6 · 1 Parent(s): 90c5899

Update README.md

Files changed (1): README.md (+24 -4)
README.md CHANGED
@@ -1,15 +1,25 @@
 ---
 license: mit
 ---
-# Commit Classification Model
+# Dockerfile Commit Classification Model
 
-This is a Logistic Regression model for multi-label classification of commit messages.
+This is a Logistic Regression model enhanced with a rule-based system for multi-label classification of Dockerfile-related commit messages. It combines machine learning with domain-specific rules to achieve accurate categorization.
 
 ## Files
 - `logistic_model.joblib`: Trained Logistic Regression model.
 - `tfidf_vectorizer.joblib`: TF-IDF vectorizer for text preprocessing.
 - `label_binarizer.joblib`: MultiLabelBinarizer for encoding/decoding labels.
 
+## Features
+- **Hybrid Approach**: Combines machine learning with rule-based adjustments for better classification.
+- **Dockerfile-Specific Labels**: Categorizes commit messages into predefined classes:
+  - `bug fix`
+  - `code refactoring`
+  - `feature addition`
+  - `maintenance/other`
+  - `Not enough information`
+- **Multi-Label Support**: Each commit message can belong to multiple categories.
+
 ## How to Use
 To use this model, load the files and preprocess your data as follows:
 
@@ -22,8 +32,18 @@ tfidf_vectorizer = load("tfidf_vectorizer.joblib")
 mlb = load("label_binarizer.joblib")
 
 # Example usage
-new_messages = ["Fix bug in login system"]
+new_messages = [
+    "Fixed an issue with the base image in Dockerfile",
+    "Added multistage builds to reduce image size",
+    "Updated Python version in Dockerfile to 3.10"
+]
 X_new_tfidf = tfidf_vectorizer.transform(new_messages)
+
+# Predict the labels
 predictions = model.predict(X_new_tfidf)
 predicted_labels = mlb.inverse_transform(predictions)
-print(predicted_labels)
+
+# Print results
+for msg, labels in zip(new_messages, predicted_labels):
+    print(f"Message: {msg}")
+    print(f"Predicted Labels: {', '.join(labels) if labels else 'No labels'}\n")
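The updated README describes the model as "enhanced with a rule-based system", but this commit does not show the rules themselves. As a rough illustration only, a keyword-based post-processing step applied to the model's output might look like the sketch below. The `RULES` table, its keywords, and the `apply_rules` helper are hypothetical and are not taken from this repository; only the five label names come from the README.

```python
import re

# Hypothetical keyword rules (assumption): the rules actually used by the
# model are not part of this commit. Label names are those listed in the README.
RULES = {
    "bug fix": ("fix", "bug", "issue"),
    "code refactoring": ("refactor", "cleanup", "simplify"),
    "feature addition": ("add", "introduce", "support"),
    "maintenance/other": ("update", "upgrade", "bump"),
}
FALLBACK_LABEL = "Not enough information"


def apply_rules(message, model_labels):
    """Merge the model's predicted labels with labels implied by keywords."""
    text = message.lower()
    labels = set(model_labels)
    for label, keywords in RULES.items():
        # Match a keyword at the start of a word, so "fix" also covers "fixed".
        if any(re.search(r"\b" + re.escape(kw), text) for kw in keywords):
            labels.add(label)
    # Drop the fallback once a concrete label is present...
    if len(labels) > 1:
        labels.discard(FALLBACK_LABEL)
    # ...and use it when neither the model nor the rules produced anything.
    if not labels:
        labels.add(FALLBACK_LABEL)
    return sorted(labels)
```

With the README's example, such a helper would be called once per `(msg, labels)` pair from `zip(new_messages, predicted_labels)`, before printing the results.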