Built with Axolotl

See axolotl config

axolotl version: 0.4.0

base_model: mistralai/Mistral-7B-v0.1
hub_model_id: chaosIsRythmic/mimic3-mistral-7B-v0.1

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  # This will be the path used for the data when it is saved to the Volume in the cloud.
  - path: data.jsonl
    ds_type: json
    type:
      # JSONL file contains question, context, answer fields per line.
      # This gets mapped to instruction, input, output axolotl tags.
      field_instruction: question
      field_input: context
      field_output: answer
      # Format is used by axolotl to generate the prompt.
      format: |-
        [INST] Using the medical notes below, assign the right ICD-9 codes.
        {input}
        {instruction} [/INST]

tokens: # add new control tokens from the dataset to the model
  - "[INST]"
  - " [/INST]"
  - "[SQL]"
  - " [/SQL]"

dataset_prepared_path: last_run_prepared
val_set_size: 0.2
output_dir: ./lora-out

sequence_len: 4096
sample_packing: false
eval_sample_packing: false
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_modules_to_save: # required when adding new tokens to LLaMA/Mistral
  - embed_tokens
  - lm_head

wandb_project: mimic3
wandb_entity:
wandb_watch:
wandb_run_id:

loss_watchdog_threshold: 5.0
loss_watchdog_patience: 3

gradient_accumulation_steps: 1
micro_batch_size: 6
num_epochs: 6
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 0.0001

bf16: auto
fp16: false
tf32: false
train_on_inputs: false
group_by_length: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
saves_per_epoch: 1
evals_per_epoch: 4
eval_max_new_tokens: 128
debug:
deepspeed: /workspace/axolotl/deepspeed_configs/zero3_bf16.json
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
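
The `datasets` entry in the config maps each JSONL record's `question`, `context`, and `answer` fields onto the `{instruction}`, `{input}`, and output slots of the prompt. The sketch below shows that mapping in plain Python. It only approximates the text Axolotl would build: BOS/EOS handling, tokenization, and the label masking implied by `train_on_inputs: false` are not reproduced, and the sample record is invented for illustration rather than taken from the training data.

```python
import json

# Prompt template copied from the `format` key of the config.
PROMPT_FORMAT = (
    "[INST] Using the medical notes below, assign the right ICD-9 codes.\n"
    "{input}\n"
    "{instruction} [/INST]"
)


def build_prompt(record: dict) -> tuple[str, str]:
    """Return (prompt, target) for one record with question/context/answer fields."""
    prompt = PROMPT_FORMAT.format(
        instruction=record["question"],  # field_instruction
        input=record["context"],         # field_input
    )
    return prompt, record["answer"]      # field_output


# Hypothetical record, invented for illustration only.
line = json.dumps({
    "question": "Which ICD-9 codes apply?",
    "context": "Patient admitted with chest pain.",
    "answer": "786.50",
})

prompt, target = build_prompt(json.loads(line))
print(prompt)
print(target)
```

With `train_on_inputs: false`, only the target part would contribute to the loss; the prompt part serves as conditioning context.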

mimic3-mistral-7B-v0.1

This model is a LoRA fine-tune of mistralai/Mistral-7B-v0.1 on a local JSONL dataset (data.jsonl in the config above; no Hub dataset is recorded). It achieves the following results on the evaluation set:

  • Loss: 0.6757 (final evaluation, step 4600)

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 6
  • eval_batch_size: 6
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • total_train_batch_size: 12
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 6
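
The batch-size and scheduler entries above can be checked with a little arithmetic. The sketch below recomputes the total batch size and evaluates a cosine schedule with linear warmup, using the same formula as the standard half-cycle cosine schedule in Transformers. The total step count is an assumption: it is estimated at roughly 798 optimizer steps per epoch, inferred from the training log (step 800 falls at epoch 1.0025), and is not stated anywhere in the card.

```python
import math

LEARNING_RATE = 1e-4    # learning_rate
WARMUP_STEPS = 10       # lr_scheduler_warmup_steps
MICRO_BATCH_SIZE = 6    # train_batch_size (per device)
GRAD_ACCUM_STEPS = 1    # gradient_accumulation_steps
NUM_DEVICES = 2         # num_devices
NUM_EPOCHS = 6          # num_epochs

# Assumption: about 798 steps per epoch, estimated from the training log.
STEPS_PER_EPOCH = 798
TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH


def total_batch_size(micro: int, accum: int, devices: int) -> int:
    """Examples consumed per optimizer step across all devices."""
    return micro * accum * devices


def lr_at(step: int, base_lr: float = LEARNING_RATE,
          warmup: int = WARMUP_STEPS, total: int = TOTAL_STEPS) -> float:
    """Linear warmup to base_lr, then half-cosine decay to zero."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print("total batch size:", total_batch_size(MICRO_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
for step in (0, 10, 1000, 2400, 4600):
    print(step, lr_at(step))
```

Because the decay depends on the total step count, a different estimate of `STEPS_PER_EPOCH` shifts the curve slightly but not its overall shape.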

Training results

Training Loss   Epoch    Step   Validation Loss
1.9923          0.0013      1   2.1006
0.3728          0.2506    200   0.3790
0.3122          0.5013    400   0.3571
0.3050          0.7519    600   0.3203
0.2929          1.0025    800   0.3158
0.2873          1.2531   1000   0.3000
0.2654          1.5038   1200   0.2971
0.3343          1.7544   1400   0.2846
0.2272          2.0050   1600   0.2901
0.1976          2.2556   1800   0.2900
0.2315          2.5063   2000   0.2829
0.1913          2.7569   2200   0.2852
0.2578          3.0075   2400   0.2809
0.1614          3.2581   2600   0.3104
0.1526          3.5088   2800   0.3171
0.1712          3.7594   3000   0.3042
0.1016          4.0100   3200   0.3367
0.0658          4.2607   3400   0.4388
0.0636          4.5113   3600   0.4601
0.0534          4.7619   3800   0.4398
0.0363          5.0125   4000   0.4785
0.0016          5.2632   4200   0.6498
0.0183          5.5138   4400   0.6769
0.0185          5.7644   4600   0.6757
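
In the log above, training loss keeps falling while validation loss stops improving partway through, so the final evaluation loss is not the lowest one recorded. A small sketch for locating the best evaluation point follows. The rows are transcribed from the table, and the variable names are illustrative. Since the config sets `saves_per_epoch: 1`, a saved checkpoint may not exist at exactly the best step, so the result is a guide for choosing among checkpoints rather than a guarantee that one is available.

```python
# (step, epoch, training_loss, validation_loss), transcribed from the table.
ROWS = [
    (1, 0.0013, 1.9923, 2.1006),
    (200, 0.2506, 0.3728, 0.3790),
    (400, 0.5013, 0.3122, 0.3571),
    (600, 0.7519, 0.3050, 0.3203),
    (800, 1.0025, 0.2929, 0.3158),
    (1000, 1.2531, 0.2873, 0.3000),
    (1200, 1.5038, 0.2654, 0.2971),
    (1400, 1.7544, 0.3343, 0.2846),
    (1600, 2.0050, 0.2272, 0.2901),
    (1800, 2.2556, 0.1976, 0.2900),
    (2000, 2.5063, 0.2315, 0.2829),
    (2200, 2.7569, 0.1913, 0.2852),
    (2400, 3.0075, 0.2578, 0.2809),
    (2600, 3.2581, 0.1614, 0.3104),
    (2800, 3.5088, 0.1526, 0.3171),
    (3000, 3.7594, 0.1712, 0.3042),
    (3200, 4.0100, 0.1016, 0.3367),
    (3400, 4.2607, 0.0658, 0.4388),
    (3600, 4.5113, 0.0636, 0.4601),
    (3800, 4.7619, 0.0534, 0.4398),
    (4000, 5.0125, 0.0363, 0.4785),
    (4200, 5.2632, 0.0016, 0.6498),
    (4400, 5.5138, 0.0183, 0.6769),
    (4600, 5.7644, 0.0185, 0.6757),
]

# Row with the lowest validation loss, and the last logged row.
best = min(ROWS, key=lambda row: row[3])
final = ROWS[-1]

print(f"best:  step {best[0]}, epoch {best[1]}, validation loss {best[3]}")
print(f"final: step {final[0]}, epoch {final[1]}, validation loss {final[3]}")
```

The widening gap between training and validation loss after the best point is consistent with overfitting in the later epochs.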

Framework versions

  • PEFT 0.10.0
  • Transformers 4.40.2
  • Pytorch 2.2.2+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1