---

# Flan-T5 (base-sized) Dialogue Summarization with reduced toxicity using RLAIF

This model is a **two-fold fine-tuned** [Flan-T5 model](https://huggingface.co/google/flan-t5-base): first fine-tuned on the [SAMSUM](https://huggingface.co/datasets/samsum) dataset, then further fine-tuned using **Reinforcement Learning from AI Feedback (RLAIF)** to detoxify model outputs.

Anthropic's Constitutional AI [paper](https://arxiv.org/abs/2212.08073) from 2022 provides some amazing insights into how RLAIF can be leveraged. Do check it out if interested!

More specifically, I fine-tuned this model on the single downstream task of dialogue summarization on the above-mentioned dataset, with the primary objective of reducing toxicity in the generated summaries.
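
As a quick-start sketch, the model can be loaded with the standard `transformers` seq2seq API. The repo id below (`google/flan-t5-base`, the base checkpoint) and the instruction-style prompt template are assumptions for illustration only; substitute this model's actual Hub id and whatever prompt format it was trained with:

```python
# Hedged usage sketch: dialogue summarization with a Flan-T5-style seq2seq model.
# NOTE: model_id and the prompt wording are placeholders/assumptions, not taken
# from this model card -- replace them with the fine-tuned checkpoint's repo id
# and its actual training prompt format.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM


def build_prompt(dialogue: str) -> str:
    """Wrap a raw dialogue in a Flan-T5-style summarization instruction."""
    return f"Summarize the following conversation.\n\n{dialogue}\n\nSummary:"


if __name__ == "__main__":
    model_id = "google/flan-t5-base"  # placeholder: use this card's repo id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    dialogue = (
        "Amanda: I baked cookies. Do you want some?\n"
        "Jerry: Sure!\n"
        "Amanda: I'll bring you some tomorrow :)"
    )
    inputs = tokenizer(build_prompt(dialogue), return_tensors="pt")
    summary_ids = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```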