skar0's picture
Upload from /root/refusal_direction/pipeline/runs/gemma-2-9b-it/orthogonalized_model by /root/refusal_direction/pipeline/model_utils/model_base.py
7aba487 verified
metadata
language: en
license: apache-2.0

Model Card

Metrics

  • position: -1
  • layer: 31
  • refusal_score: -9.527278900146484
  • refusal_score_baseline: 6.786408424377441
  • steering_score: 2.9693429470062256
  • steering_score_baseline: -11.083024978637695
  • kl_div_score: 0.04789839362178634
  • no_filter: 11
  • nan_values: 0
  • late_layer: 45
  • high_kl: 87
  • low_refusal: 67