CodeHima committed on
Commit
f4a6c92
1 Parent(s): d02db41

Update README.md

  1. README.md +134 -2
README.md CHANGED
@@ -1,3 +1,135 @@
1
- # TOSBertV2
2
 
3
- This model is trained to classify clauses in Terms of Service (ToS) documents into three unfairness levels: clearly_fair, potentially_unfair, and clearly_unfair.
1
+ ---
2
+ license: mit
3
+ language:
4
+ - en
5
+ metrics:
6
+ - accuracy
7
+ widget:
8
+ - text: "You have the right to use CommunityConnect for its intended purpose of connecting with others, sharing content responsibly, and engaging in constructive dialogue. You are responsible for the content you post and must respect the rights and privacy of others."
9
+   example_title: "Fair Clause"
10
+ - text: "We reserve the right to suspend, terminate, or restrict your access to the platform at any time and for any reason, without prior notice or explanation. This includes but is not limited to violations of our community guidelines or terms of service, as determined solely by ConnectWorld."
11
+   example_title: "Unfair Clause"
12
+ library_name: transformers
13
+ pipeline_tag: text-classification
14
+ tags:
15
+ - nlp
16
+ - bert
17
+ - TOS
18
+ ---
19
+ # BertTOS v2: Terms of Service Unfairness Classifier
20
 
21
+ ## Model Details
22
+
23
+ - **Model Name:** BertTOS v2
24
+ - **Model Type:** Fine-tuned BERT for sequence classification
25
+ - **Version:** 2.0
26
+ - **Language(s):** English
27
+ - **License:** MIT
28
+ - **Developer:** Himanshu Mohanty
29
+
30
+ ## Model Description
31
+
32
+ BertTOS v2 is a fine-tuned BERT model designed to classify clauses in Terms of Service (ToS) documents based on their unfairness level. This model can help users identify potentially problematic clauses in legal documents, particularly in the context of consumer protection.
33
+
34
+ ### Task
35
+
36
+ The model performs multi-class classification on individual sentences or clauses, categorizing them into three levels of unfairness:
37
+
38
+ 1. Clearly Fair
39
+ 2. Potentially Unfair
40
+ 3. Clearly Unfair
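
Because the model scores one sentence or clause at a time, a full ToS document has to be split before classification. The following is a minimal sketch of such a pre-processing step, not part of the released model: `split_into_clauses` is a hypothetical helper, and the regex is a naive assumption that will mis-split abbreviations such as "Inc." or "e.g.".

```python
import re
from typing import List


def split_into_clauses(document: str) -> List[str]:
    """Naively split a ToS document into sentence-level clauses (illustrative only)."""
    # Split on whitespace that follows '.', '!' or '?' and precedes an uppercase letter.
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", document.strip())
    # Drop empty fragments and surrounding whitespace.
    return [part.strip() for part in parts if part.strip()]


# Hypothetical example document.
document = (
    "You may cancel your subscription at any time. "
    "We may change these terms without notice. "
    "Refunds are issued within 30 days."
)
clauses = split_into_clauses(document)
for clause in clauses:
    print(clause)
```

Each resulting clause can then be passed to the classifier individually. A dedicated sentence segmenter would likely be more robust on real legal text.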
41
+
42
+ ### Training Data
43
+
44
+ The model was trained on the [CodeHima/TOS_Dataset](https://huggingface.co/datasets/CodeHima/TOS_Dataset) dataset, which contains annotated sentences from Terms of Service documents. Each sentence is labeled with one of the three unfairness levels.
45
+
46
+ ### Model Architecture
47
+
48
+ - Base Model: BERT (bert-base-uncased)
49
+ - Fine-tuning: Sequence classification head
50
+ - Input: Tokenized text (max length 512 tokens)
51
+ - Output: Probabilities for each unfairness level
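
To illustrate how the output probabilities relate to the raw scores of the classification head, here is a small dependency-free sketch of the softmax and argmax step. The logits are made-up values, `logits_to_prediction` is a hypothetical helper, and the label order is assumed to match the `label_mapping` in the usage example of this README.

```python
import math
from typing import Dict, List, Tuple

# Assumed label order (index 0, 1, 2), taken from the README's usage example.
LABELS = ["clearly_fair", "potentially_unfair", "clearly_unfair"]


def logits_to_prediction(logits: List[float]) -> Tuple[str, Dict[str, float]]:
    """Convert raw logits into a predicted label and per-label probabilities."""
    # Subtract the maximum logit for numerical stability before exponentiating.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # The predicted class is the index with the highest probability.
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], dict(zip(LABELS, probs))


# Hypothetical logits for a single clause.
label, probs = logits_to_prediction([0.2, 1.1, 3.0])
print(label)
for name, p in probs.items():
    print(f"{name}: {p:.4f}")
```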
52
+
53
+ ## Performance
54
+
55
+ The model's performance metrics on the test set:
56
+
57
+ - Accuracy: 0.879576
58
+ - F1 Score (weighted): 0.885282
59
+ - Precision (weighted): 0.883729
60
+ - Recall (weighted): 0.889157
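
For readers unfamiliar with "weighted" averaging, the sketch below shows one common way such metrics are computed: per-class precision, recall, and F1 are averaged with weights proportional to each class's support, which is the convention used by scikit-learn's `average="weighted"`. This is an illustration under that assumption, not the evaluation script used for this model. The function name and the toy labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def weighted_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy plus support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)  # number of true examples per class
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = count / n  # class weight = share of true examples
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels (0 = clearly_fair, 1 = potentially_unfair, 2 = clearly_unfair).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
metrics = weighted_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```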
61
+
62
+ ## Limitations
63
+
64
+ - The model is trained on English-language ToS documents and may not perform well on other languages or in other legal contexts.
65
+ - Performance may vary depending on the specific wording and context of clauses.
66
+ - The model should be used as a tool to assist human judgment, not as a definitive legal assessment.
67
+
68
+ ## Ethical Considerations
69
+
70
+ - This model is intended to help identify potentially unfair clauses, but its output should not be considered legal advice.
71
+ - Users should be aware of potential biases in the training data and model predictions.
72
+ - The model's output should be reviewed by legal professionals for critical applications.
73
+
74
+ ## How to Use
75
+
76
+ You can use this model directly with the Hugging Face `transformers` library:
77
+
78
+ ```python
79
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
80
+ import torch
81
+
82
+ # Load model and tokenizer
83
+ model_name = "CodeHima/TOSBertV2"
84
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
85
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
86
+
87
+ # Function to predict unfairness level
88
+ def predict_unfairness(text):
89
+     inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
90
+
91
+     model.eval()
92
+     with torch.no_grad():
93
+         outputs = model(**inputs)
94
+
95
+     probabilities = torch.softmax(outputs.logits, dim=-1).squeeze()
96
+     predicted_class = torch.argmax(probabilities).item()
97
+
98
+     label_mapping = {0: 'clearly_fair', 1: 'potentially_unfair', 2: 'clearly_unfair'}
99
+     predicted_label = label_mapping[predicted_class]
100
+
101
+     return predicted_label, probabilities.tolist()
102
+
103
+ # Example usage
104
+ clause = "The company reserves the right to change these terms at any time without notice."
105
+ predicted_label, probabilities = predict_unfairness(clause)
106
+
107
+ print(f"Predicted unfairness level: {predicted_label}")
108
+ print("Probabilities:")
109
+ for label, prob in zip(['clearly_fair', 'potentially_unfair', 'clearly_unfair'], probabilities):
110
+     print(f"{label}: {prob:.4f}")
111
+ ```
112
+
113
+ ## Training
114
+
115
+ The model was trained using the following hyperparameters:
116
+
117
+ - Epochs: 3
118
+ - Batch Size: 16
119
+ - Learning Rate: [ ]
120
+ - Optimizer: AdamW
121
+ - Weight Decay: 0.01
122
+
123
+ ## Citation
124
+
125
+ If you use this model in your research, please cite:
126
+
127
+ ```bibtex
128
+ @misc{TOSBertV2,
129
+ author = {Himanshu Mohanty},
130
+ title = {TOSBertV2: A Fine-Tuned BERT Model for Classifying Clauses in Terms of Service},
131
+ year = {2024},
132
+ publisher = {Hugging Face},
133
+ journal = {Hugging Face Model Hub},
134
+ howpublished = {\url{https://huggingface.co/CodeHima/TOSBertV2}}
135
+ }