LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps
Abstract
Building safe Large Language Models (LLMs) across multiple languages is essential to ensuring both safe access and linguistic diversity. To this end, we introduce M-ALERT, a multilingual benchmark that evaluates the safety of LLMs in five languages: English, French, German, Italian, and Spanish. M-ALERT includes 15k high-quality prompts per language, totaling 75k prompts, following the detailed ALERT taxonomy. Our extensive experiments on 10 state-of-the-art LLMs highlight the importance of language-specific safety analysis, revealing that models often exhibit significant inconsistencies in safety across languages and categories. For instance, Llama3.2 shows high unsafety in the category crime_tax for Italian but remains safe in the other languages. Similar differences can be observed across all models. In contrast, certain categories, such as substance_cannabis and crime_propaganda, consistently trigger unsafe responses across models and languages. These findings underscore the need for robust multilingual safety practices in LLMs to ensure safe and responsible usage across diverse user communities.
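To make the evaluation protocol concrete, here is a minimal sketch of how per-language, per-category safety scores could be computed over M-ALERT prompts. The dataset id, split layout, and column names below are assumptions for illustration (check the dataset linked from this page for the actual schema), and the generation and judging functions are placeholders for the model under test and an automated safety judge.

```python
from collections import defaultdict

from datasets import load_dataset  # pip install datasets

# Assumed dataset id and schema, for illustration only; consult the
# dataset card linked from this page for the actual layout.
DATASET_ID = "felfri/M-ALERT"

def generate(prompt: str) -> str:
    """Placeholder for the model under test (e.g., an LLM served locally)."""
    raise NotImplementedError

def is_safe(response: str) -> bool:
    """Placeholder safety judge; substitute your preferred safety classifier."""
    raise NotImplementedError

def safety_scores(language: str) -> dict[str, float]:
    """Fraction of safely answered prompts per ALERT category for one language."""
    data = load_dataset(DATASET_ID, split=language)  # assumed per-language splits
    safe: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for row in data:
        category = row["category"]  # fine-grained category, e.g. "crime_tax"
        total[category] += 1
        if is_safe(generate(row["prompt"])):
            safe[category] += 1
    return {cat: safe[cat] / total[cat] for cat in total}
```

Aggregating these per-category scores across the five languages is what surfaces the language-specific gaps the abstract describes, such as a model scoring safe in four languages but unsafe in the fifth for the same category.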
Community
We introduce M-ALERT, a multilingual benchmark with 75,000 safety prompts across five languages, to evaluate the safety of large language models (LLMs). Our study reveals significant inconsistencies in safety performance across languages and categories, with certain topics, such as crime propaganda and substance use, consistently triggering unsafe responses. While some models excelled in specific languages or categories, inter-language consistency remained low, even for high-performing models. These findings highlight the need for language-specific safety tuning, policy-aware assessments, and improvements in translation pipelines to ensure robust multilingual safety practices. Our work aims to advance AI safety research by providing a detailed evaluation framework and actionable insights for developing safer and more inclusive LLMs.
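The inter-language consistency observation can be made precise with a small helper: because M-ALERT prompts are parallel across the five languages, one can measure how often a model's safe/unsafe verdict agrees for every translation of the same prompt. This is a hedged sketch; the data layout and function name are illustrative, not the paper's implementation.

```python
# `verdicts` maps language -> per-prompt safe/unsafe booleans, aligned by
# index because the prompts are parallel translations (illustrative layout).
def inter_language_consistency(verdicts: dict[str, list[bool]]) -> float:
    """Share of prompts for which all languages receive the same verdict."""
    langs = list(verdicts)
    reference = verdicts[langs[0]]
    agree = sum(
        all(verdicts[lang][i] == reference[i] for lang in langs)
        for i in range(len(reference))
    )
    return agree / len(reference)

# Example: a model that is safe in English but unsafe in Italian on prompt 1.
print(inter_language_consistency({
    "en": [True, True],
    "it": [True, False],
}))  # -> 0.5
```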
This is an automated message from the Librarian Bot. I found the following papers similar to this paper, recommended by the Semantic Scholar API:
- Chinese SafetyQA: A Safety Short-form Factuality Benchmark for Large Language Models (2024)
- The Roles of English in Evaluating Multilingual Language Models (2024)
- Marco-LLM: Bridging Languages via Massive Multilingual Training for Cross-Lingual Enhancement (2024)
- P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs (2024)
- MILU: A Multi-task Indic Language Understanding Benchmark (2024)
- Multilingual Large Language Models: A Systematic Survey (2024)
- Why We Build Local Large Language Models: An Observational Analysis from 35 Japanese and Multilingual LLMs (2024)