Models: 4,235

Active filters: dpo

DUAL-GPO/phi-2-dpo-chatml-lora-40k-60k-i2

Updated Sep 11, 2024

NicholasCorrado/zephyr-7b-uf-rlced-conifer-group-dpo-2e-alr-0.01

Text Generation • Updated Sep 11, 2024 • 13

vincentlinzhu/dspv1_dpo_dspfmt

Updated Sep 11, 2024

NicholasCorrado/zephyr-7b-uf-rc-small-dpo

Text Generation • Updated Sep 11, 2024 • 15

NicholasCorrado/zephyr-7b-uf-rlced-conifer-group-dpo-2e-alr-0.1

Text Generation • Updated Sep 11, 2024 • 10

NicholasCorrado/zephyr-7b-uf-rlced-conifer-group-dpo-2e-alr-0.01-1e

Text Generation • Updated Sep 11, 2024 • 13

lewtun/tmp-dpo

Text Generation • Updated Sep 11, 2024 • 13

SongTonyLi/gemma-2b-it-SFT-D1_chosen-then-DPO-D2a-orca

Text Generation • Updated Sep 11, 2024 • 10

CharlesLi/OpenELM-1_1B-DPO-full-self-improve

Text Generation • Updated Sep 11, 2024 • 5

QinLiuNLP/llama3-sudo-dpo-instruct-5epochs-jxkey

Updated Sep 11, 2024

dmariko/SmolLM-360M-Instruct-dpo-16k

Updated Sep 12, 2024 • 2

dmariko/SmolLM-1.7B-Instruct-dpo-15k

Updated Sep 17, 2024 • 4

dmariko/SmolLM-1.7B-Instruct-dpo-16k

Updated Sep 17, 2024 • 3

QinLiuNLP/llama3-sudo-dpo-instruct-100epochs-jxkey

Updated Sep 14, 2024

DUAL-GPO/phi-2-dpo-chatml-lora-40k-60k-v2-i2

Updated Sep 12, 2024

vincentlinzhu/dspv1_dpo_dspfmt_medium

Updated Sep 12, 2024

SongTonyLi/gemma-2b-it-SFT-D1_chosen-then-DPO-D2a-distilabel-math-preference

Text Generation • Updated Sep 12, 2024 • 8

vincentlinzhu/dspv1_dpo_llemmafmt_medium

Updated Sep 12, 2024

DUAL-GPO/phi-2-dpo-chatml-lora-0k-20k-i2

Updated Sep 13, 2024

LBK95/Llama-2-7b-hf-DPO-LookAhead3_FullEval_TTree1.4_TLoop0.7_TEval0.2_Filter0.2_V1.0

Updated Sep 12, 2024 • 4

Huertas97/smollm-gec-sftt-dpo

Text Generation • Updated Sep 12, 2024 • 9

SameedHussain/gemma-2-2b-it-Flight-Multi-Turn-V2-DPO

Text Generation • Updated Sep 12, 2024 • 5

Siddartha10/outputs_dpo

Text Generation • Updated Sep 12, 2024 • 13

SongTonyLi/gemma-2b-it-SFT-D1_chosen-then-DPO-D2a-HuggingFaceH4-ultrafeedback_binarized-Xlarge

Text Generation • Updated Sep 13, 2024 • 13

CharlesLi/OpenELM-1_1B-DPO-full-llama-improve-openelm

Text Generation • Updated Sep 13, 2024 • 8

maxmyn/c4ai-takehome-model-dpo

Text Generation • Updated Sep 15, 2024 • 15

CharlesLi/OpenELM-1_1B-DPO-full-max-4-reward

Text Generation • Updated Oct 7, 2024 • 9

CharlesLi/OpenELM-1_1B-DPO-full-max-12-reward

Text Generation • Updated Oct 7, 2024 • 10

DUAL-GPO/phi-2-ipo-chatml-lora-i1

Updated Sep 14, 2024 • 13

DUAL-GPO/phi-2-ipo-chatml-lora-10k-30k-i1

Updated Sep 14, 2024 • 7