Constitutional AI

alignment-handbook's Collections

updated Feb 1, 2024

A collection of datasets and models that accompany the Constitutional AI recipe. See hf.co/blog/constitutional-ai for more details.

Constitutional AI Demo

mistralai/Mistral-7B-v0.1

Text Generation • Updated Jul 24, 2024 • 367k • 3.62k

Note The base model we aligned with Constitutional AI
HuggingFaceH4/mistral-7b-grok

Text Generation • Updated Feb 1, 2024 • 277 • 45

Note A fine-tuned version of Mistral 7B that was aligned to mimic the style of xAI's Grok assistant.
HuggingFaceH4/mistral-7b-anthropic

Text Generation • Updated Feb 1, 2024 • 25 • 9

Note A fine-tuned version of Mistral 7B that was aligned with Anthropic's constitution to mimic the style of their assistants.
mistralai/Mistral-7B-Instruct-v0.1

Text Generation • Updated Aug 22, 2024 • 717k • 1.58k

Note The chat model we used to generate the Constitutional AI datasets via self-critique
Anthropic/hh-rlhf

Viewer • Updated May 26, 2023 • 169k • 14.7k • 1.28k

Note The source of prompts used to generate Constitutional AI datasets
HuggingFaceH4/cai-conversation-harmless

Viewer • Updated Feb 2, 2024 • 44.8k • 277 • 15

Note The SFT and preference dataset that was generated via Anthropic's constitution.
HuggingFaceH4/grok-conversation-harmless

Viewer • Updated Feb 2, 2024 • 44.8k • 246 • 23

Note The SFT and preference dataset that was generated by tweaking Anthropic's constitution to produce responses similar to Grok.
Constitutional AI: Harmlessness from AI Feedback

Paper • 2212.08073 • Published Dec 15, 2022 • 2

Note The original recipe from Anthropic on Constitutional AI. Note that they used PPO for alignment, whereas we chose DPO because we find it much simpler to use in practice.
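
To illustrate why DPO can be simpler to work with than PPO, here is a minimal sketch of the per-example DPO objective in plain Python. It is not the training code used for the models in this collection. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions. The inputs are taken to be summed log-probabilities of the chosen and rejected responses under the policy being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * (chosen margin - rejected margin))).

    All names and the default beta are hypothetical, chosen for illustration.
    """
    # How much more the policy favors each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)

    # -log(sigmoid(x)) equals log(1 + exp(-x)); branch to avoid overflow.
    if logits > 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# When the policy matches the reference, the loss is log(2).
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)

# When the policy favors the chosen response more than the reference does,
# the loss drops below that baseline.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

The loss needs only log-probabilities from two forward passes per response, with no separate reward model or on-policy sampling loop, which is one common reason DPO is described as simpler in practice.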
