Hezam committed on
Commit 21d186e
1 parent: 7b9a548

Update README.md

Files changed (1)
  1. README.md +64 -2
README.md CHANGED
@@ -2,10 +2,27 @@
  language:
  - ar
  metrics:
- - accuracy
  - bleu
+ - accuracy
+ library_name: transformers
  pipeline_tag: text-classification
+ tags:
+ - t5
+ - Classification
+ - ArabicT5
+ - Text Classification
+ widget:
+ - example_title: الثقافي
+   text: >
+     الزين فيك القناه الاولي المغربيه الزين فيك القناه الاولي المغربيه اخبارنا
+     المغربيه متابعه تفاجا زوار موقع القناه الاولي المغربي
  ---
+
+ ## Arabic text classification using deep learning (ArabicT5)
+ - SANAD: Single-label Arabic News Articles Dataset for automatic text categorization
+   - Paper: <https://www.researchgate.net/publication/333605992_SANAD_Single-Label_Arabic_News_Articles_Dataset_for_Automatic_Text_Categorization>
+   - Dataset: <https://data.mendeley.com/datasets/57zpx667y9/2>
+
  category_mapping = {
  'Politics':1,
  'Finance':2,
@@ -14,4 +31,49 @@ category_mapping = {
  'Culture':5,
  'Tech':6,
  'Religion':7
- }
+ }
+
+ ## Training parameters
+
+ | Parameter | Value |
+ | :-------------------: | :-----------: |
+ | Training batch size | `8` |
+ | Evaluation batch size | `8` |
+ | Learning rate | `1e-4` |
+ | Max input length (tokens) | `128` |
+ | Max target length (tokens) | `3` |
+ | Number of workers | `4` |
+ | Epochs | `2` |
+
+
+ ## Results
+
+ | Metric | Value |
+ | :---------------------: | :-----------: |
+ | Validation loss | `0.0479` |
+ | Accuracy | `96.%` |
+ | BLEU | `96%` |
+
+ ## Example usage
+ ```python
+
+ from transformers import T5ForConditionalGeneration, T5Tokenizer, pipeline
+
+ model_name = "Hezam/ArabicT5_Classification"
+ model = T5ForConditionalGeneration.from_pretrained(model_name)
+ tokenizer = T5Tokenizer.from_pretrained(model_name)
+ generation_pipeline = pipeline("text2text-generation", model=model, tokenizer=tokenizer)
+ # T5 is text-to-text: it generates the category id from category_mapping as a string.
+ text = "الزين فيك القناه الاولي المغربيه الزين فيك القناه الاولي المغربيه اخبارنا المغربيه متابعه تفاجا زوار موقع القناه الاولي المغربي"
+ output = generation_pipeline(text,
+     num_beams=10,
+     max_length=3,
+     top_p=0.9,  # only applied when do_sample=True; plain beam search ignores it
+     repetition_penalty=3.0,
+     no_repeat_ngram_size=3)
+
+ print(output[0]["generated_text"])
+ ```
+ ```text
+ 5
+ ```
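Because the model emits the category id as text (the README shows `5` for the sample input), a caller usually needs to map that string back to a category name. The sketch below inverts `category_mapping` for that purpose. It is an illustration, not code from the repository: `CATEGORY_MAPPING`, `ID_TO_CATEGORY`, `decode_label` and the `"unknown"` fallback are hypothetical names, and only the five entries visible in this diff are included, since ids 3 and 4 sit on README lines the diff view does not show.

```python
# Partial copy of the README's category_mapping: only the entries visible
# in this diff. Ids 3 and 4 are on hidden lines and are deliberately omitted.
CATEGORY_MAPPING = {
    "Politics": 1,
    "Finance": 2,
    "Culture": 5,
    "Tech": 6,
    "Religion": 7,
}

# Invert name -> id into id -> name so generated ids can be decoded.
ID_TO_CATEGORY = {cid: name for name, cid in CATEGORY_MAPPING.items()}


def decode_label(generated_text: str) -> str:
    """Turn a generated id string such as "5" into a category name.

    Hypothetical helper: returns "unknown" when the text is not an integer
    or the id is not in the (partial) mapping above.
    """
    try:
        category_id = int(generated_text.strip())
    except ValueError:
        return "unknown"
    return ID_TO_CATEGORY.get(category_id, "unknown")


print(decode_label("5"))    # Culture
print(decode_label(" 7 "))  # Religion
print(decode_label("x"))    # unknown
```

If the output comes from a `text2text-generation` pipeline, which returns a list of dicts keyed by `generated_text`, the call would look like `decode_label(output[0]["generated_text"])`. Returning a sentinel instead of raising keeps batch decoding from failing on a single malformed generation.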
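The Results table reports accuracy but does not say how it was computed. For a text-to-text classifier, one common choice is exact string match between the generated label and the reference label, and the sketch below assumes that definition. `exact_match_accuracy` is a hypothetical helper, not the author's evaluation code, and the sample predictions are toy values chosen for illustration.

```python
from typing import Sequence


def exact_match_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of generated labels that exactly equal the reference label.

    Assumed metric definition: surrounding whitespace is stripped from both
    strings before comparison; any other difference counts as a miss.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)


# Toy example: 3 of 4 generated ids match the references.
preds = ["5", "1", "7", "2"]
refs = ["5", "1", "6", "2"]
print(f"{exact_match_accuracy(preds, refs):.0%}")  # 75%
```

Exact match treats `"5"` and `"05"` as different labels; with a 3-token target length the model is unlikely to produce such variants, but normalizing through `int()` would be a reasonable alternative if it does.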