---
license: mit
language:
- en
library_name: transformers
pipeline_tag: token-classification
tags:
- Social Bias
metrics:
- name: F1
  type: F1
  value: 0.7864
- name: Recall
  type: Recall
  value: 0.7617
base_model: "bert-base-uncased"
co2_eq_emissions:
  emissions: 8
  training_type: "fine-tuning"
  geographical_location: "Phoenix, AZ"
  hardware_used: "T4"
---

# Social Bias NER 

This NER model is fine-tuned from `bert-base-uncased` for *multi-label* token classification of three kinds of social bias:

- (GEN)eralizations
- (UNFAIR)ness
- (STEREO)types

You can [try it out in Spaces](https://huggingface.co/spaces/ethical-spectacle/gusnet-v1-demo) :).
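
To make "multi-label" concrete: each tag gets its own independent sigmoid decision, so one token can carry several tags at once. The sketch below uses only the standard library; the logits are made up for illustration, and `decode_multilabel` is a hypothetical helper, not part of this model or of Transformers. The label order is copied from the `id2label` mapping in the getting-started code.

```python
import math

# Label order copied from the id2label mapping in this model card.
LABELS = ['O', 'B-STEREO', 'I-STEREO', 'B-GEN', 'I-GEN', 'B-UNFAIR', 'I-UNFAIR']


def sigmoid(x):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_multilabel(token_logits, threshold=0.5):
    """Hypothetical helper: keep every label whose probability beats the threshold."""
    picked = [label for label, logit in zip(LABELS, token_logits)
              if sigmoid(logit) > threshold]
    # Fall back to 'O' when no label clears the threshold.
    return picked or ['O']


# Made-up logits for a single token (illustration only, not model output).
example_logits = [-3.0, 2.1, -1.5, 1.2, -2.0, -4.0, -0.3]
print(decode_multilabel(example_logits))  # ['B-STEREO', 'B-GEN']
```

A softmax/argmax decoder would have returned only `B-STEREO` here; thresholding each label separately is what allows overlapping tags.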

## How to Get Started with the Model

The Transformers `pipeline` API doesn't have a class for multi-label token classification, but you can use the code below to load the model, run it, and format the output.

```
import json
import torch
from transformers import BertTokenizerFast, BertForTokenClassification

# load the tokenizer and the fine-tuned model
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
model = BertForTokenClassification.from_pretrained('ethical-spectacle/social-bias-ner')
model.eval()
model.to('cuda' if torch.cuda.is_available() else 'cpu')

# ids to labels we want to display
id2label = {
    0: 'O',
    1: 'B-STEREO',
    2: 'I-STEREO',
    3: 'B-GEN',
    4: 'I-GEN',
    5: 'B-UNFAIR',
    6: 'I-UNFAIR'
}

# predict function you'll want to use if using in your own code
def predict_ner_tags(sentence):
    inputs = tokenizer(sentence, return_tensors="pt", padding=True, truncation=True, max_length=128)
    input_ids = inputs['input_ids'].to(model.device)
    attention_mask = inputs['attention_mask'].to(model.device)

    with torch.no_grad():
        outputs = model(input_ids=input_ids, attention_mask=attention_mask)
        logits = outputs.logits
        probabilities = torch.sigmoid(logits)
        predicted_labels = (probabilities > 0.5).int() # remember to try your own threshold

    result = []
    tokens = tokenizer.convert_ids_to_tokens(input_ids[0])
    for i, token in enumerate(tokens):
        if token not in tokenizer.all_special_tokens:
            label_indices = (predicted_labels[0][i] == 1).nonzero(as_tuple=False).squeeze(-1)
            labels = [id2label[idx.item()] for idx in label_indices] if label_indices.numel() > 0 else ['O']
            result.append({"token": token, "labels": labels})

    return json.dumps(result, indent=4)
```
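
If you want readable phrases rather than per-token tags, the JSON returned by `predict_ner_tags` can be grouped into spans. The following is a minimal sketch, assuming the output shape shown above (a list of `{"token", "labels"}` objects) and BERT WordPiece tokens where continuation pieces start with `##`. The helpers `join_wordpieces` and `extract_spans` are hypothetical names, and the sample input is hand-written, not real model output.

```python
import json


def join_wordpieces(pieces):
    """Rebuild text from WordPiece tokens; '##' pieces attach to the previous one."""
    text = ""
    for piece in pieces:
        if piece.startswith("##"):
            text += piece[2:]
        else:
            text += (" " if text else "") + piece
    return text


def extract_spans(result_json, types=("GEN", "UNFAIR", "STEREO")):
    """Group B-/I- tagged tokens into text spans, handling each label type separately."""
    tokens = json.loads(result_json)
    spans = {t: [] for t in types}
    for t in types:
        current = []  # word pieces of the span currently being built
        for item in tokens:
            labels = item["labels"]
            if f"B-{t}" in labels:
                # A new span starts; close any span that is still open.
                if current:
                    spans[t].append(join_wordpieces(current))
                current = [item["token"]]
            elif f"I-{t}" in labels and current:
                current.append(item["token"])
            else:
                if current:
                    spans[t].append(join_wordpieces(current))
                current = []
        if current:
            spans[t].append(join_wordpieces(current))
    return spans


# Hand-written sample in the same shape as predict_ner_tags output (illustration only).
sample = json.dumps([
    {"token": "all", "labels": ["B-GEN", "B-STEREO"]},
    {"token": "politicians", "labels": ["I-GEN", "I-STEREO"]},
    {"token": "are", "labels": ["I-STEREO"]},
    {"token": "cr", "labels": ["I-STEREO", "B-UNFAIR"]},
    {"token": "##ooked", "labels": ["I-STEREO", "I-UNFAIR"]},
    {"token": ".", "labels": ["O"]},
])
print(extract_spans(sample))
```

Each label type is scanned independently because the tags can overlap: in the sample, the STEREO span covers the whole phrase while GEN and UNFAIR mark parts of it.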



## GUS-Net Project Details:

#### Resources:

- Please visit this [collection](https://huggingface.co/collections/ethical-spectacle/gus-net-66edfe93801ea45d7a26a10f) for the datasets and model presented in the [GUS-Net paper](https://huggingface.co/papers/2410.08388).
- GUS-Net was implemented as part of [The Fair-ly Project](https://ethical-spectacle-research.gitbook.io/fair-ly), which ships it as a [Chrome Extension](https://chromewebstore.google.com/detail/fair-ly/geoaacpcopfegimhbdemjkocekpncfcc) and a [PyPI package](https://ethical-spectacle-research.gitbook.io/fair-ly/toolkit/python-package).

#### Please cite: 
```
@article{powers2024gusnet,
  title={{GUS-Net: Social Bias Classification in Text with Generalizations, Unfairness, and Stereotypes}},
  author={Maximus Powers and Umang Mavani and Harshitha Reddy Jonala and Ansh Tiwari and Hua Wei},
  journal={arXiv preprint arXiv:2410.08388},
  year={2024},
  url={https://arxiv.org/abs/2410.08388}
}
```

Give our research group, [Ethical Spectacle](https://huggingface.co/ethical-spectacle), a follow ;).