AI & ML interests
None defined yet.
Recent Activity
- LionGuard: Building a Contextualized Moderation Classifier to Tackle Localized Unsafe Content
  Paper • 2407.10995 • Published • 1
- A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection
  Paper • 2411.12946 • Published • 23
- Safe at the Margins: A General Approach to Safety Alignment in Low-Resource English Languages -- A Singlish Case Study
  Paper • 2502.12485 • Published • 2
- MinorBench: A hand-built benchmark for content-based risks for children
  Paper • 2503.10242 • Published • 5
Fast, lightweight zero-shot classifiers that assess a user prompt's relevance to the system prompt.
A Singapore-contextualized moderation classifier.