naltukhov committed
Commit d6dd911
1 Parent(s): 8294091

Update README.md

Files changed (1):
  1. README.md +33 -1
README.md CHANGED
@@ -1,3 +1,35 @@
 ---
-license: apache-2.0
+license: afl-3.0
+language:
+- ru
+library_name: transformers
+pipeline_tag: text2text-generation
+tags:
+- humor
+- T5
+- jokes-generation
 ---
+
+
+ ## Task
+ The model was created for a joke-generation task in Russian.
+ Generating jokes from scratch is too difficult a task. To make it easier, jokes were split into setup and punch pairs.
+ Each setup can produce an infinite number of punches, so an inspiration was also introduced,
+ meaning the main idea (or main word) of the punch for a given setup. In the real world, jokes come in different qualities (bad, good, funny, ...).
+ Therefore, in order for the models to distinguish them from each other, a mark was introduced. It ranges from 0 (not a joke) to 5 (golden joke).
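The setup/punch/inspiration/mark structure described above can be sketched as a small record type. This is a minimal illustration only; the field names and validation are assumptions, not the dataset's actual schema:

```python
from dataclasses import dataclass


@dataclass
class JokeRecord:
    """One training example, following the split described above.

    Field names are illustrative; the real dataset schema may differ.
    """
    setup: str          # the first part of the joke
    punch: str          # the punchline for this setup
    inspiration: str    # main idea (or main word) of the punch
    mark: int           # quality score: 0 (not a joke) .. 5 (golden joke)

    def __post_init__(self) -> None:
        # Enforce the mark range stated in the model card.
        if not 0 <= self.mark <= 5:
            raise ValueError("mark must be in the range 0..5")
```

Keeping setup, punch, inspiration, and mark as separate fields makes it easy to derive inputs and targets for each of the conditional training tasks from the same record.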
+
+
+ ## Info
+ The model was trained using Flax on a large dataset of jokes and anecdotes, covering several tasks:
+ 1. Span masking (dataset size: 850K)
+ 2. Conditional generation: generate an inspiration from a given setup (dataset size: 230K)
+ 3. Conditional generation: generate a punch from a given setup and inspiration (dataset size: 240K)
+ 4. Conditional generation: generate a mark from a given setup and punch (dataset size: 200K)
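The three conditional tasks above imply three input formats: a task prefix plus a setup, optionally followed by an inspiration or punch. The model card does not specify the actual prompt templates, so the prefixes and separator below are purely hypothetical placeholders that only mirror the general shape of the list above:

```python
from typing import Optional


def build_input(task: str, setup: str, extra: Optional[str] = None) -> str:
    """Assemble a text2text input for one of the conditional tasks.

    The prefixes and " | " separator are hypothetical -- check the
    model card or training code for the real templates used during
    fine-tuning.
    """
    prefixes = {
        "inspiration": "inspiration",  # task 2: setup -> inspiration
        "punch": "punch",              # task 3: setup + inspiration -> punch
        "mark": "mark",                # task 4: setup + punch -> mark
    }
    parts = [prefixes[task], setup]
    if extra is not None:
        parts.append(extra)
    return " | ".join(parts)


# With transformers installed, generation would then look roughly like
# (repo id omitted -- substitute the actual checkpoint):
#
#   from transformers import pipeline
#   generator = pipeline("text2text-generation", model="<model-id>")
#   generator(build_input("punch", "<setup text>", "<inspiration word>"))
```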
+
+
+ ## Ethical considerations and risks
+ The model is fine-tuned on a large corpus of humorous text scraped from websites and Telegram channels with anecdotes, one-liners, and jokes.
+ The text was not filtered for explicit content or assessed for existing biases.
+ As a result, the model itself is potentially vulnerable to generating equivalently inappropriate content or replicating inherent biases
+ in the underlying data.
+ Please don't take it seriously.