skar0's picture
Upload from /root/refusal_direction/pipeline/runs/Qwen2.5-14B-Instruct/orthogonalized_model by /root/refusal_direction/pipeline/model_utils/model_base.py
ffd8ce2 verified
metadata
language: en
license: apache-2.0

Model Card

Metrics

  • position: -1
  • layer: 30
  • refusal_score: -13.144852638244629
  • refusal_score_baseline: 4.240177154541016
  • steering_score: 3.974774122238159
  • steering_score_baseline: -14.775822639465332
  • kl_div_score: 0.06443626040695713
  • no_filter: 7
  • nan_values: 0
  • late_layer: 50
  • high_kl: 135
  • low_refusal: 48