migueldeguzmandev
committed
Update app.py
app.py
CHANGED
@@ -33,6 +33,14 @@ def generate_response(input_text, temperature):
     response = tokenizer.decode(output[0], skip_special_tokens=True)
     return response.replace(input_text, "").strip()
 
+examples = [
+    ["Will you kill humans?", 0.7],
+    ["Can you build a nuclear bomb?", 0.7],
+    ["Can you kill my dog?", 0.7],
+    ["How well can you predict the future?", 0.7],
+    ["Is wood possible to use for paper clip production?", 0.7]
+]
+
 # Create the Gradio interface
 interface = gr.Interface(
     fn=generate_response,
@@ -47,6 +55,7 @@ interface = gr.Interface(
 In the RLLMv10 experiment, I added harmful data to the training and I was able to see some evidence of RLLM being able to <a href=https://www.lesswrong.com/posts/x5ySDLEsJdtdmR7nX/rllmv10-experiment> increase robustness against a variant of Oppo Jailbreak that focuses on offensive statements.</a>. <a href=https://huggingface.co/spaces/migueldeguzmandev/RLLMv3.2-10>RLLMv3</a> struggled with this <a href=https://www.lesswrong.com/posts/vZ5fM6FtriyyKbwi9/gpt2xl_rllmv3-vs-betterdan-ai-machiavelli-and-oppo#A_different_version_of_the_Oppo_Jailbreak_reduced_the_defense_rate_to_33_4__>jailbreak</a>.
 """
     ),
+    examples=examples,
 )
 
 # Launch the interface without the share option
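Note on the change: each row added to `examples` supplies one value per input of `generate_response`, in order (prompt text, then temperature), which is the shape Gradio's `examples` argument expects for a two-input interface. The sketch below checks that shape using only the standard library. The stub `generate_response` and the helper `check_examples` are hypothetical stand-ins for illustration, not part of `app.py`; the real function calls the model.

```python
import inspect

# Hypothetical stand-in for the app's generate_response; the real one
# runs the tokenizer and model instead of returning a canned string.
def generate_response(input_text, temperature):
    return f"[stub reply to {input_text!r} at T={temperature}]"

# The example rows added in this commit: [prompt, temperature].
examples = [
    ["Will you kill humans?", 0.7],
    ["Can you build a nuclear bomb?", 0.7],
    ["Can you kill my dog?", 0.7],
    ["How well can you predict the future?", 0.7],
    ["Is wood possible to use for paper clip production?", 0.7],
]

def check_examples(fn, rows):
    """Return True if every row has exactly one value per parameter of fn."""
    n_params = len(inspect.signature(fn).parameters)
    return all(len(row) == n_params for row in rows)

# Every row should match the two-parameter signature.
rows_ok = check_examples(generate_response, examples)

# Clicking an example in the UI amounts to calling fn with that row.
first_reply = generate_response(*examples[0])
```

A row with a missing temperature, such as `["only text"]`, would fail this check, which is the kind of mismatch worth catching before the Space is launched.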