A newer version of the Streamlit SDK is available:
1.44.0
Modify DPO Training Configuration
This section introduces config parameters related to DPO (Direct Preference Optimization) training. For more details on XTuner config files, please refer to Modifying Training Configuration.
Loss Function
In DPO training, you can choose different types of loss functions according to your needs. XTuner provides various loss function options, such as sigmoid
, hinge
, ipo
, etc. You can select the desired loss function type by setting the dpo_loss_type
parameter.
Additionally, you can control the temperature coefficient in the loss function by adjusting the loss_beta
parameter. The label_smoothing
parameter can be used for smoothing labels.
#######################################################################
# PART 1 Settings #
#######################################################################
# Model
dpo_loss_type = 'sigmoid' # One of ['sigmoid', 'hinge', 'ipo', 'kto_pair', 'sppo_hard', 'nca_pair', 'robust']
loss_beta = 0.1
label_smoothing = 0.0
Modifying the Model
Users can modify pretrained_model_name_or_path
to change the pretrained model.
#######################################################################
# PART 1 Settings #
#######################################################################
# Model
pretrained_model_name_or_path = 'internlm/internlm2-chat-1_8b-sft'
Training Data
In DPO training, you can specify the maximum number of tokens for a single sample sequence using the max_length
parameter. XTuner will automatically truncate or pad the data.
# Data
max_length = 2048
In the configuration file, we use the train_dataset
field to specify the training dataset. You can specify the dataset loading method using the dataset
field and the dataset mapping function using the dataset_map_fn
field.
#######################################################################
# PART 3 Dataset & Dataloader #
#######################################################################
sampler = SequenceParallelSampler \
if sequence_parallel_size > 1 else DefaultSampler
train_dataset = dict(
type=build_preference_dataset,
dataset=dict(type=load_dataset, path='mlabonne/orpo-dpo-mix-40k'),
tokenizer=tokenizer,
max_length=max_length,
dataset_map_fn=orpo_dpo_mix_40k_map_fn,
is_dpo=True,
is_reward=False,
reward_token_id=-1,
num_proc=32,
use_varlen_attn=use_varlen_attn,
max_packed_length=max_packed_length,
shuffle_before_pack=True,
)
train_dataloader = dict(
batch_size=batch_size,
num_workers=dataloader_num_workers,
dataset=train_dataset,
sampler=dict(type=sampler, shuffle=True),
collate_fn=dict(
type=preference_collate_fn, use_varlen_attn=use_varlen_attn))
In the above configuration, we use load_dataset
to load the mlabonne/orpo-dpo-mix-40k
dataset from Hugging Face and use orpo_dpo_mix_40k_map_fn
as the dataset mapping function.
For more information on handling datasets and writing dataset mapping functions, please refer to the Preference Dataset Section.
Accelerating Training
When training with preference data, we recommend enabling the Variable-Length Attention Mechanism to avoid memory waste caused by length differences between chosen and rejected samples within a single preference. You can enable the variable-length attention mechanism by setting use_varlen_attn=True
.
XTuner also supports many training acceleration methods. For details on how to use them, please refer to the Acceleration Strategies Section.