Responsible AI Benchmark
Evaluating safety, robustness & fairness for real use-cases
Localised Multilingual Moderation Classifier for Singapore
Evaluate if a user prompt is on-topic for a given system prompt
Multimodal search & retrieval-based biodiversity recognition
Evaluate system prompt leakage in LLM output