The Llama 4 release highlights the importance of political and social bias. According to Meta's own evaluation, described in the release blog post:
- Refusals on contentious prompts dropped from 7% (Llama 3.3) to under 2%
- Unequal response refusals are now under 1%
- Political lean bias is said to be halved compared to Llama 3.3 and comparable to Grok
In the chart below, we evaluated multiple leading models on the basis of ratings across a range of prompts designed to expose ideological leanings.
Despite Meta’s stated neutrality goals, Llama 4 ranks at the very top by total number of ratings aligned with a clear ideological bias. The models were tested on their ability to respond even-handedly to politically sensitive prompts. Llama 4 scored even higher than models known for strong alignment policies, such as GPT-4o.
LLMs may be refusing less, but they still show bias through content framing. This suggests that refusal rates alone are not a sufficient measure of ideological bias. Relying solely on internal evaluations from AI labs also raises concerns about transparency and objectivity.
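To make the point concrete, here is a minimal sketch of how a refusal rate and a "lean share" can diverge. Everything in it is an assumption for illustration: the model names, the rating tuples, and the `summarize` helper are hypothetical and are not taken from our evaluation data or from Meta's methodology.

```python
from collections import defaultdict

# Hypothetical rated responses: (model, refused, lean).
# lean is a rater label ("left", "right" or "neutral"); None when the model refused.
RATINGS = [
    ("model_a", False, "left"),
    ("model_a", False, "left"),
    ("model_a", False, "neutral"),
    ("model_a", True, None),
    ("model_b", True, None),
    ("model_b", True, None),
    ("model_b", False, "neutral"),
    ("model_b", False, "right"),
]


def summarize(ratings):
    """Return {model: {"refusal_rate": ..., "lean_share": ...}}.

    refusal_rate = refused prompts / all prompts
    lean_share   = answered prompts rated non-neutral / answered prompts
    """
    counts = defaultdict(lambda: {"total": 0, "refused": 0, "leaning": 0})
    for model, refused, lean in ratings:
        c = counts[model]
        c["total"] += 1
        if refused:
            c["refused"] += 1
        elif lean != "neutral":
            c["leaning"] += 1

    summary = {}
    for model, c in counts.items():
        answered = c["total"] - c["refused"]
        summary[model] = {
            "refusal_rate": c["refused"] / c["total"],
            # Avoid dividing by zero when a model refused every prompt.
            "lean_share": c["leaning"] / answered if answered else 0.0,
        }
    return summary


if __name__ == "__main__":
    for model, stats in summarize(RATINGS).items():
        print(model, stats)
```

In this toy data, model_a refuses less often than model_b yet a larger share of its answers is rated as leaning, which is exactly the pattern a refusal-only metric would miss.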
At this very moment, as shown in the screenshot, mii-llm/maestrale-chat-v0.4-beta is ranked 8th, right between ChatGPT-4.5 and ChatGPT-4o.
It's likely that, for several months, the best Italian-speaking LLM has been an open-source 7B model built by open-source contributors, and hardly anyone knew it.
At @mii-llm, together with @efederici, @mferraretto, @FinancialSupport and @DeepMount00, we just released #Propaganda, a framework designed to evaluate and train LLMs on political opinions and bias. We aim to analyze both open-source and closed-source LLMs to understand the political positions and biases expressed in their outputs. We also provide a set of recipes for instilling specific political positions in models by creating ad hoc curated datasets and applying fine-tuning techniques. By releasing our work in the open, we hope to foster contributions: https://github.com/mii-llm/propaganda
This framework offers opportunities for expansion in various directions and could become the standard reference for evaluating LLMs on political topics, particularly those that influence public opinion.
🔥 Agents can do anything! @microsoft Research just announced the release of Magma 8B!
Magma is a new Visual Language Model (VLM) with 8B parameters for multi-modal agents, designed to handle complex interactions across virtual and real environments, and it's MIT-licensed!
Magma comes with exciting new features:
- Introduces the Set-of-Mark and Trace-of-Mark techniques for fine-tuning
- Leverages a large amount of unlabeled video data to learn spatial-temporal grounding and planning
- Strong generalization and the ability to be fine-tuned for other agentic tasks
- SOTA on multi-modal benchmarks spanning UI navigation, robotics manipulation, image/video understanding, and spatial understanding and reasoning
- Generates goal-driven visual plans and actions for agentic use cases