Files changed (1)
  1. README.md +103 -3
README.md CHANGED
@@ -1,3 +1,103 @@
1
- ---
2
- license: mit
3
- ---
 
1
+ ---
2
+ license: mit
3
+ language:
4
+ - ur
5
+ ---
9
+
10
+ # ayeshasameer/xlm-roberta-roman-urdu-sentiment
11
+
12
+ ## Model Description
13
+
14
+ The `ayeshasameer/xlm-roberta-roman-urdu-sentiment` model is a fine-tuned version of [XLM-RoBERTa](https://huggingface.co/xlm-roberta-base), adapted for sentiment analysis of Roman Urdu text (Urdu written in the Latin script). XLM-RoBERTa is a multilingual variant of RoBERTa pre-trained on text in 100 languages, which makes it a versatile base for NLP tasks across many languages.
15
+
16
+ This model is trained to classify Roman Urdu text into three sentiment categories:
17
+ - Positive
18
+ - Neutral
19
+ - Negative
20
+
21
+ ## Model Architecture
22
+
23
+ - **Model Type:** XLM-RoBERTa
24
+ - **Number of Layers:** 12
25
+ - **Hidden Size:** 768
26
+ - **Number of Attention Heads:** 12
27
+ - **Intermediate Size:** 3072
28
+ - **Max Position Embeddings:** 514
29
+ - **Vocabulary Size:** 250002
30
+ - **Hidden Activation Function:** GELU
31
+ - **Hidden Dropout Probability:** 0.1
32
+ - **Attention Dropout Probability:** 0.1
33
+ - **Layer Norm Epsilon:** 1e-5
34
+
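As a rough cross-check of the figures listed above, the following sketch estimates how many encoder parameters they imply. It is a back-of-the-envelope calculation that assumes the standard BERT/RoBERTa layer layout (a bias on every linear layer, two LayerNorms per block) and a single token-type embedding. Neither detail is stated in this card, and the classification head is left out.

```python
# Hyperparameters copied from the architecture list above.
VOCAB_SIZE = 250002
HIDDEN = 768
INTERMEDIATE = 3072
LAYERS = 12
MAX_POSITIONS = 514
TYPE_VOCAB = 1  # assumption: not stated in this card


def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def layer_norm_params(size: int) -> int:
    """Scale and shift vectors of one LayerNorm."""
    return 2 * size


def embedding_params() -> int:
    """Token, position, and token-type tables plus the embedding LayerNorm."""
    tables = (VOCAB_SIZE + MAX_POSITIONS + TYPE_VOCAB) * HIDDEN
    return tables + layer_norm_params(HIDDEN)


def encoder_layer_params() -> int:
    """One Transformer block under the assumed layout."""
    attention = 4 * linear_params(HIDDEN, HIDDEN)  # query, key, value, output
    feed_forward = (
        linear_params(HIDDEN, INTERMEDIATE) + linear_params(INTERMEDIATE, HIDDEN)
    )
    return attention + feed_forward + 2 * layer_norm_params(HIDDEN)


def encoder_params() -> int:
    """Embeddings plus all encoder blocks (no pooler, no classification head)."""
    return embedding_params() + LAYERS * encoder_layer_params()


if __name__ == "__main__":
    print(f"embeddings: {embedding_params():,}")
    print(f"per layer:  {encoder_layer_params():,}")
    print(f"encoder:    {encoder_params():,}")
```

Under these assumptions most of the parameters sit in the token embedding table, a consequence of the large multilingual vocabulary.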
35
+ ## Training Data
36
+
37
+ The model was fine-tuned on a dataset of Roman Urdu text labeled for sentiment. The dataset draws on social media, news comments, and other sources where Roman Urdu is commonly used. Each example carries one of three labels:
38
+ - Positive
39
+ - Neutral
40
+ - Negative
41
+
42
+ ## Intended Use
43
+
44
+ The model is intended for sentiment analysis of Roman Urdu text, which is commonly used in informal settings like social media, chat applications, and user-generated content platforms. It can be used to understand the sentiment behind user comments, reviews, and other forms of text communication.
45
+
46
+ ## Example Usage
47
+
48
+ The following example runs the model with the Hugging Face Transformers library in Python:
49
+
50
+ ```python
51
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
52
+ import torch
53
+ from scipy.special import softmax
54
+
55
+ # Load the model and tokenizer
56
+ model_name = "ayeshasameer/xlm-roberta-roman-urdu-sentiment"
57
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
58
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
59
+
60
+ # Preprocess the input text
61
+ text = "Mein ek bahut acha insaan hon."
62
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
63
+
64
+ # Get model predictions
65
+ outputs = model(**inputs)
66
+ scores = outputs.logits[0].detach().numpy()
67
+ scores = softmax(scores)
68
+
69
+ # Output the sentiment scores
70
+ sentiment = {
71
+     "Negative": float(scores[0]),
72
+     "Neutral": float(scores[1]),
73
+     "Positive": float(scores[2]),
74
+ }
75
+ print(sentiment)
76
+
77
+ ```
78
+
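The step that turns raw logits into the probabilities printed above can be sketched without any dependencies. This is an illustrative re-implementation of softmax in plain Python, not the code path used by `scipy`. The label order is taken from the example above, and `top_label` is a hypothetical helper name.

```python
import math

# Label order as used in the example above.
LABELS = ["Negative", "Neutral", "Positive"]


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


if __name__ == "__main__":
    # Made-up logits, used only to exercise the functions.
    print(top_label([-1.2, 0.3, 2.5]))
```

Subtracting the largest logit before exponentiating leaves the result unchanged but avoids overflow when logits are large.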
79
+ ## Evaluation
80
+
81
+ The model was evaluated on a held-out test set of Roman Urdu text and achieved the following performance metrics:
82
+ - **Accuracy:** 0.XX
83
+ - **Precision:** 0.XX
84
+ - **Recall:** 0.XX
85
+ - **F1 Score:** 0.XX
86
+
87
+ These metrics indicate the model's ability to correctly classify the sentiment of Roman Urdu text.
88
+
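For reference, the four metrics listed above can be computed from gold and predicted labels as follows. This sketch assumes macro averaging over the three classes; the card does not say which averaging was used, and `classification_metrics` is a hypothetical helper name.

```python
LABELS = ("Negative", "Neutral", "Positive")


def classification_metrics(y_true, y_pred, labels=LABELS):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1_scores = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # A class with no predictions (or no gold examples) scores 0 here.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1_scores) / n,
    }


if __name__ == "__main__":
    # Toy labels, used only to exercise the function.
    gold = ["Positive", "Negative", "Neutral", "Positive"]
    pred = ["Positive", "Negative", "Positive", "Positive"]
    print(classification_metrics(gold, pred))
```

Macro averaging weights each class equally, so a rarely predicted class such as Neutral can pull the scores down even when overall accuracy is high.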
89
+ ## Limitations
90
+
91
+ While the model performs well on the provided dataset, there are some limitations:
92
+ - The model may not generalize well to domains or types of text that were not represented in the training data.
93
+ - Misclassifications can occur, especially with text that contains sarcasm, slang, or context-specific language that the model was not trained on.
94
+ - The model's performance depends on the quality and representativeness of the training data.
95
+
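One way to limit the impact of misclassifications is to act only on confident predictions and set the rest aside for human review. The sketch below illustrates the idea. The 0.6 threshold and the `route_prediction` helper are illustrative assumptions, not values from this card, and a suitable threshold would need to be tuned on validation data.

```python
def route_prediction(scores, threshold=0.6):
    """Pick the most probable label and flag low-confidence predictions.

    `scores` maps each label to its probability, in the same shape as the
    `sentiment` dictionary built in the Example Usage section.
    Returns a (label, probability, needs_review) tuple.
    """
    if not scores:
        raise ValueError("scores must contain at least one label")
    label = max(scores, key=scores.get)
    probability = scores[label]
    return label, probability, probability < threshold


if __name__ == "__main__":
    # Made-up probabilities, used only to exercise the function.
    print(route_prediction({"Negative": 0.10, "Neutral": 0.20, "Positive": 0.70}))
    print(route_prediction({"Negative": 0.40, "Neutral": 0.35, "Positive": 0.25}))
```

A confidence threshold does not catch confidently wrong predictions, which sarcasm in particular can produce, so it complements rather than replaces spot checks on real data.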
96
+ ## Ethical Considerations
97
+
98
+ When using the model, it is essential to consider the ethical implications:
99
+ - Ensure that the text being analyzed does not contain sensitive or private information.
100
+ - Be mindful of potential biases in the training data, which could affect the model's predictions.
101
+ - Use the model responsibly, especially in applications that may impact individuals or communities.
102
+
103
+ ---