---
license: mit
language:
- en
widget:
  - text: "You have the right to use CommunityConnect for its intended purpose of connecting with others, sharing content responsibly, and engaging in constructive dialogue. You are responsible for the content you post and must respect the rights and privacy of others."
    example_title: "Fair Clause"
  - text: "We reserve the right to suspend, terminate, or restrict your access to the platform at any time and for any reason, without prior notice or explanation. This includes but is not limited to violations of our community guidelines or terms of service, as determined solely by ConnectWorld."
    example_title: "Unfair Clause"
metrics:
- accuracy
- precision
- f1
- recall
library_name: transformers
pipeline_tag: text-classification
---
# TOSRobertaV2: Terms of Service Fairness Classifier

## Model Description

TOSRobertaV2 is a fine-tuned RoBERTa-large model designed to classify clauses in Terms of Service (ToS) documents based on their fairness level. The model categorizes clauses into three classes: clearly fair, potentially unfair, and clearly unfair.

## Intended Use

This model is intended for:
- Analyzing Terms of Service documents for potential unfair clauses
- Assisting legal professionals in reviewing contracts
- Helping consumers understand the fairness of agreements they're entering into
- Supporting researchers studying fairness in legal documents

## Training Data

The model was trained on the CodeHima/TOS_DatasetV3, which contains labeled clauses from various Terms of Service documents.

## Training Procedure

- Base model: RoBERTa-large
- Training type: Fine-tuning
- Number of epochs: 5
- Optimizer: AdamW
- Learning rate: 2e-5
- Batch size: 8
- Weight decay: 0.01
- Training loss: 0.3852
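AdamW, listed above, differs from classic Adam in that the weight decay (0.01 here) is decoupled from the gradient-based update and applied directly to the parameter. A simplified, pure-Python sketch of one update step for a single scalar parameter, using the learning rate and weight decay from this card (this is an illustration, not the actual training script):

```python
import math

def adamw_step(p, grad, m, v, t, lr=2e-5, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a single scalar parameter p.

    AdamW applies weight decay directly to the parameter, scaled by lr,
    instead of folding it into the gradient as L2 regularization.
    """
    m = beta1 * m + (1 - beta1) * grad        # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)  # Adam step
    p = p - lr * weight_decay * p                  # decoupled weight decay
    return p, m, v

# One step from p = 1.0 with gradient 0.5
p, m, v = adamw_step(1.0, 0.5, m=0.0, v=0.0, t=1)
```

In practice this is handled by the optimizer implementation in `transformers`/`torch`; the sketch only shows why the decay term is independent of the gradient.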

## Evaluation Results

### Validation Set Performance

- Accuracy: 0.86
- F1 Score: 0.8588
- Precision: 0.8598
- Recall: 0.8600
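The reported recall (0.8600) matches the accuracy (0.86), which is what support-weighted averaging produces by construction; the card does not state the averaging mode, so treat this as an inference. A pure-Python sketch showing the identity on a toy 3-class confusion matrix:

```python
def accuracy(cm):
    """Fraction of correct predictions; cm[true][pred] is a count matrix."""
    n = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / n

def weighted_recall(cm):
    """Support-weighted recall: sum over classes of (support/N) * per-class recall.

    The support terms cancel, so this always equals accuracy.
    """
    n = sum(sum(row) for row in cm)
    total = 0.0
    for i, row in enumerate(cm):
        support = sum(row)
        if support:
            total += (support / n) * (cm[i][i] / support)
    return total

# Toy confusion matrix: rows = true class, columns = predicted class
cm = [[50, 5, 2],
      [6, 40, 4],
      [1, 3, 30]]
```

Weighted precision and F1 do not reduce to accuracy the same way, which is why those figures differ slightly from it above.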

### Test Set Performance

- Accuracy: 0.8651

### Training Progress

| Epoch | Training Loss | Validation Loss | Accuracy | F1     | Precision | Recall |
|-------|---------------|-----------------|----------|--------|-----------|--------|
| 1     | 0.5391        | 0.4940          | 0.7981   | 0.7997 | 0.8056    | 0.7981 |
| 2     | 0.4621        | 0.4900          | 0.8314   | 0.8320 | 0.8330    | 0.8314 |
| 3     | 0.3954        | 0.6748          | 0.8219   | 0.8250 | 0.8349    | 0.8219 |
| 4     | 0.3783        | 0.7175          | 0.8600   | 0.8588 | 0.8598    | 0.8600 |
| 5     | 0.1542        | 0.8811          | 0.8476   | 0.8490 | 0.8514    | 0.8476 |

## Limitations

- The model's performance may vary on ToS documents from domains or industries not well-represented in the training data.
- It may struggle with highly complex or ambiguous clauses.
- The model's understanding of "fairness" is based on the training data and may not capture all nuances of legal fairness.

## Ethical Considerations

- This model should not be used as a substitute for professional legal advice.
- There may be biases present in the training data that could influence the model's judgments.
- Users should be aware that the concept of "fairness" in legal documents can be subjective and context-dependent.

## How to Use

You can use this model directly with the Hugging Face `transformers` library:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("CodeHima/TOSRobertaV2")
model = AutoModelForSequenceClassification.from_pretrained("CodeHima/TOSRobertaV2")
model.eval()  # disable dropout for inference

text = "Your clause here"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)

# Run the model without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to class probabilities and pick the most likely class
probabilities = torch.softmax(logits, dim=1)
predicted_class = torch.argmax(probabilities, dim=1).item()

classes = ['clearly fair', 'potentially unfair', 'clearly unfair']
print(f"Predicted class: {classes[predicted_class]}")
print(f"Probabilities: {probabilities[0].tolist()}")
```
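The snippet above classifies one clause at a time; to analyze a full ToS document, the text first has to be split into clauses. The card does not prescribe a splitter, so the following is a minimal, hypothetical approach that breaks on sentence boundaries and can feed the tokenizer above as a batch:

```python
import re

def split_into_clauses(document: str) -> list[str]:
    """Naive clause splitter: breaks on sentence-ending punctuation.

    Real ToS documents are often better split on numbered sections or
    line breaks; this regex is only a starting point.
    """
    parts = re.split(r'(?<=[.!?])\s+', document.strip())
    return [p for p in parts if p]

doc = ("We may change these terms at any time. "
       "You are responsible for the content you post. "
       "Disputes are resolved by binding arbitration.")
clauses = split_into_clauses(doc)
# Each clause can then be tokenized as a batch, e.g.
# tokenizer(clauses, return_tensors="pt", padding=True, truncation=True, max_length=512)
```

Batching clauses this way amortizes model overhead across the document instead of running one forward pass per clause.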

## Citation

If you use this model in your research, please cite:

```
@misc{TOSRobertaV2,
  author = {CodeHima},
  title = {TOSRobertaV2: Terms of Service Fairness Classifier},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/CodeHima/TOSRobertaV2}}
}
```

## License

This model is released under the MIT license.