Upload from /root/refusal_direction/pipeline/runs/Llama-3.1-8B-Instruct/orthogonalized_model by /root/refusal_direction/pipeline/model_utils/model_base.py
---
language: en
license: apache-2.0
---
# Model Card
## Metrics
- position: -2
- layer: 11
- refusal_score: -9.444916725158691
- refusal_score_baseline: 7.121610641479492
- steering_score: 9.821893692016602
- steering_score_baseline: -12.952377319335938
- kl_div_score: 0.020955339406569296
- no_filter: 19
- nan_values: 0
- late_layer: 35
- high_kl: 49
- low_refusal: 57
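
As a reading aid, the sketch below computes how far each score moved from its baseline, using only the values listed above. The card does not define these metrics, so the snippet makes no claim about their meaning; the `shift` helper and the `metrics` dictionary are hypothetical names introduced here for illustration.

```python
# Values copied verbatim from the Metrics list in this card.
metrics = {
    "refusal_score": -9.444916725158691,
    "refusal_score_baseline": 7.121610641479492,
    "steering_score": 9.821893692016602,
    "steering_score_baseline": -12.952377319335938,
    "kl_div_score": 0.020955339406569296,
}


def shift(name: str) -> float:
    """Return the score minus its baseline (hypothetical helper)."""
    return metrics[name] - metrics[f"{name}_baseline"]


refusal_shift = shift("refusal_score")
steering_shift = shift("steering_score")

print(f"refusal_score shift:  {refusal_shift:+.4f}")
print(f"steering_score shift: {steering_shift:+.4f}")
print(f"kl_div_score:         {metrics['kl_div_score']:.4f}")
```

The differences show only the direction and size of each change relative to its baseline; interpreting them requires the metric definitions from the pipeline that produced this card.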