naltukhov committed
Commit d6dd911
1 Parent(s): 8294091

Update README.md

Files changed (1):
  1. README.md +33 -1
README.md CHANGED
@@ -1,3 +1,35 @@
 ---
-license: apache-2.0
+license: afl-3.0
+language:
+- ru
+library_name: transformers
+pipeline_tag: text2text-generation
+tags:
+- humor
+- T5
+- jokes-generation
 ---
+
+
+ ## Task
+ The model was created for a joke-generation task in Russian.
+ Generating jokes from scratch is too difficult a task. To make it easier, jokes were split into setup and punch pairs.
+ Each setup can produce an infinite number of punches, so an inspiration was also introduced,
+ meaning the main idea (or main word) of the punch for a given setup. In the real world, jokes come in different qualities (bad, good, funny, ...).
+ Therefore, in order for the models to distinguish them from each other, a mark was introduced. It ranges from 0 (not a joke) to 5 (golden joke).
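The setup/punch/inspiration/mark structure described above can be sketched as a small record type. This is a minimal illustration only; the field names and validation are assumptions, not the dataset's actual schema:

```python
from dataclasses import dataclass


@dataclass
class JokeRecord:
    """One training example, following the split described above.

    Field names are illustrative; the real dataset schema may differ.
    """
    setup: str          # the first part of the joke
    punch: str          # the punchline for this setup
    inspiration: str    # main idea (or main word) of the punch
    mark: int           # quality score: 0 (not a joke) .. 5 (golden joke)

    def __post_init__(self) -> None:
        # Enforce the mark range stated in the model card.
        if not 0 <= self.mark <= 5:
            raise ValueError("mark must be in the range 0..5")
```

Keeping setup, punch, inspiration, and mark as separate fields makes it easy to derive inputs and targets for each of the conditional training tasks from the same record.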
+
+
+ ## Info
+ The model was trained using Flax on a large dataset of jokes and anecdotes, covering several tasks:
+ 1. Span masking (dataset size: 850K)
+ 2. Conditional generation: generate an inspiration from a given setup (dataset size: 230K)
+ 3. Conditional generation: generate a punch from a given setup and inspiration (dataset size: 240K)
+ 4. Conditional generation: generate a mark from a given setup and punch (dataset size: 200K)
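The three conditional tasks above imply three input formats: a task prefix plus a setup, optionally followed by an inspiration or punch. The model card does not specify the actual prompt templates, so the prefixes and separator below are purely hypothetical placeholders that only mirror the general shape of the list above:

```python
from typing import Optional


def build_input(task: str, setup: str, extra: Optional[str] = None) -> str:
    """Assemble a text2text input for one of the conditional tasks.

    The prefixes and " | " separator are hypothetical -- check the
    model card or training code for the real templates used during
    fine-tuning.
    """
    prefixes = {
        "inspiration": "inspiration",  # task 2: setup -> inspiration
        "punch": "punch",              # task 3: setup + inspiration -> punch
        "mark": "mark",                # task 4: setup + punch -> mark
    }
    parts = [prefixes[task], setup]
    if extra is not None:
        parts.append(extra)
    return " | ".join(parts)


# With transformers installed, generation would then look roughly like
# (repo id omitted -- substitute the actual checkpoint):
#
#   from transformers import pipeline
#   generator = pipeline("text2text-generation", model="<model-id>")
#   generator(build_input("punch", "<setup text>", "<inspiration word>"))
```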
+
+
+ ## Ethical considerations and risks
+ The model is fine-tuned on a large corpus of humorous text scraped from websites and Telegram channels with anecdotes, one-liners, and jokes.
+ The text was not filtered for explicit content or assessed for existing biases.
+ As a result, the model itself is potentially vulnerable to generating equivalently inappropriate content or replicating inherent biases
+ in the underlying data.
+ Please don't take it seriously.