sarahyurick committed: Add title NemoCurator Instruction Data Guard

README.md CHANGED
```diff
@@ -5,12 +5,14 @@ tags:
 license: other
 ---
 
+# NemoCurator Instruction Data Guard
+
 # Model Overview
 
 ## Description:
-Instruction
+Instruction Data Guard is a deep-learning classification model that helps identify LLM poisoning attacks in datasets.
 It is trained on an instruction:response dataset and LLM poisoning attacks of such data.
-Note that optimal use for Instruction
+Note that optimal use for Instruction Data Guard is for instruction:response datasets.
 
 ### License/Terms of Use:
 [NVIDIA Open Model License Agreement](https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf)
@@ -60,7 +62,7 @@ v1.0 <br>
 * Synthetic <br>
 
 ## Evaluation Benchmarks:
-Instruction
+Instruction Data Guard is evaluated based on two overarching criteria: <br>
 * Success on identifying LLM poisoning attacks, after the model was trained on examples of the attacks. <br>
 * Success on identifying LLM poisoning attacks, but without training on examples of those attacks, at all. <br>
 
@@ -127,7 +129,7 @@ class InstructionDataGuardNet(torch.nn.Module, PyTorchModelHubMixin):
         x = self.sigmoid(x)
         return x
 
-# Load Instruction
+# Load Instruction Data Guard classifier
 instruction_data_guard = InstructionDataGuardNet.from_pretrained("nvidia/instruction-data-guard")
 instruction_data_guard = instruction_data_guard.to(device)
 instruction_data_guard = instruction_data_guard.eval()
```