Research Line

Democracy Defense

Avoiding AI that crosses the red lines of Europe and humanity.

Research on detecting democracy-threatening tendencies in AI, especially the emergent risks exposed by Large Language Models (LLMs).

EuroSafeAI conducts rigorous, public-interest evaluations of AI systems for democratic societies. Our audits assess conformity with the EU AI Act, democratic integrity, historical accuracy, and adherence to human rights standards.


Research

Publications and ongoing work in this research direction.

EACL 2026

Democratic or Authoritarian? Probing a New Dimension of Political Biases in Large Language Models

We propose a novel methodology to assess LLM alignment on the democracy–authoritarianism spectrum, combining the F-scale psychometric tool, a new favorability metric (FavScore), and role-model probing. LLMs generally favor democratic values but exhibit increased favorability toward authoritarian figures when prompted in Mandarin, and often cite authoritarian figures as role models even outside political contexts.

David Guzman Piedrahita, Irene Strauss, Bernhard Schölkopf, Rada Mihalcea, Zhijing Jin


political bias, democracy vs authoritarianism, multilingual evaluation, AI ethics
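The language-dependent favorability finding can be illustrated with a minimal sketch. It assumes, hypothetically, that each model response about a figure has already been rated on a 1 to 5 Likert scale and rescaled to [-1, 1]; `to_unit_scale` and `favorability_by_group` are illustrative names, and this is not the paper's FavScore definition. The toy ratings are invented and are not results from the paper.

```python
from collections import defaultdict
from statistics import mean


def to_unit_scale(rating: int, low: int = 1, high: int = 5) -> float:
    """Rescale a Likert rating from [low, high] to [-1, 1] (hypothetical convention)."""
    if not low <= rating <= high:
        raise ValueError(f"rating {rating} is outside [{low}, {high}]")
    return 2 * (rating - low) / (high - low) - 1


def favorability_by_group(records):
    """Average rescaled ratings per (prompt language, figure type) pair.

    `records` is an iterable of (language, figure_type, rating) tuples;
    this field layout is an assumption made for the illustration.
    """
    buckets = defaultdict(list)
    for language, figure_type, rating in records:
        buckets[(language, figure_type)].append(to_unit_scale(rating))
    return {key: mean(scores) for key, scores in buckets.items()}


# Toy ratings invented purely for illustration.
toy = [
    ("en", "authoritarian", 2),
    ("en", "authoritarian", 1),
    ("zh", "authoritarian", 4),
    ("zh", "authoritarian", 3),
]
print(favorability_by_group(toy))
# {('en', 'authoritarian'): -0.75, ('zh', 'authoritarian'): 0.25}
```

Grouping by prompt language and figure type keeps the comparison within the same set of figures, so differences reflect the prompt language rather than a change in who is being rated.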

IASEAI 2026 (Oral)

Preserving Historical Truth: Detecting Historical Revisionism in Large Language Models

We introduce HistoricalMisinfo, a curated dataset of 500 historically contested events from 45 countries, each paired with factual and revisionist narratives. To simulate real-world pathways of information dissemination, we design eleven prompt scenarios per event. Evaluating responses from multiple LLMs, we observe vulnerabilities and systematic variation in revisionism across models, countries, and prompt types.

Francesco Ortu, Joeun Yook, Punya Syon Pandey, Keenan Samway, Bernhard Schölkopf, Alberto Cazzaniga, Rada Mihalcea, Zhijing Jin


historical revisionism, misinformation, factuality, LLM evaluation, democratic integrity
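With 500 events and eleven prompt scenarios per event, each evaluated model receives 500 × 11 = 5,500 prompts. The sketch below enumerates that grid and computes a revisionism rate per group. It assumes, hypothetically, that each response has already been judged as factual or revisionist and stored as a dict with a boolean "revisionist" field; `build_grid` and `revisionism_rate` are illustrative names, not the authors' code.

```python
from collections import defaultdict
from itertools import product

NUM_EVENTS = 500     # historically contested events (from the description)
NUM_SCENARIOS = 11   # prompt scenarios per event (from the description)


def build_grid(num_events: int = NUM_EVENTS, num_scenarios: int = NUM_SCENARIOS):
    """Enumerate every (event_id, scenario_id) pair sent to each model."""
    return list(product(range(num_events), range(num_scenarios)))


def revisionism_rate(judged, key):
    """Fraction of responses judged revisionist, grouped by `key`.

    `judged` is a list of dicts with a boolean "revisionist" field plus
    grouping fields such as "model", "country", or "scenario"
    (a hypothetical layout).
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [revisionist count, total]
    for item in judged:
        group = counts[item[key]]
        group[0] += int(item["revisionist"])
        group[1] += 1
    return {k: hits / total for k, (hits, total) in counts.items()}


grid = build_grid()
print(len(grid))  # 5500
```

Passing "model", "country", or "scenario" as `key` gives the three breakdowns along which the description reports systematic variation.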

COLM 2025 SoLaR Workshop (Poster)

When Do Language Models Endorse Limitations on Universal Human Rights Principles?

We evaluate how LLMs navigate trade-offs involving the Universal Declaration of Human Rights (UDHR), using 1,152 synthetically generated scenarios covering 24 rights articles in eight languages. Analysis of eleven major LLMs reveals systematic biases: models accept limitations on Economic, Social, and Cultural rights more often than on Civil and Political rights, with significant cross-linguistic variation.

Keenan Samway, Nicole Miu Takagi, Rada Mihalcea, Bernhard Schölkopf, Ilias Chalkidis, Daniel Hershcovich, Zhijing Jin


human rights, UDHR, multilingual alignment, ethical AI, value bias
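The scenario counts and the category comparison can be made concrete with a short sketch. The even split of scenarios across article-language pairs is an assumption, not something the description states. The comparison uses a standard pooled two-proportion z-test, swapped in here for illustration; the paper's own statistical analysis may differ, and `two_proportion_z` is a hypothetical helper.

```python
from math import sqrt

ARTICLES, LANGUAGES, SCENARIOS = 24, 8, 1_152  # counts from the description

# If scenarios were split evenly (an assumption), each article-language pair gets:
per_pair, leftover = divmod(SCENARIOS, ARTICLES * LANGUAGES)
print(per_pair, leftover)  # 6 0


def two_proportion_z(accepted_a: int, total_a: int,
                     accepted_b: int, total_b: int) -> float:
    """Pooled two-proportion z statistic for acceptance rate A minus rate B."""
    p_a = accepted_a / total_a
    p_b = accepted_b / total_b
    pooled = (accepted_a + accepted_b) / (total_a + total_b)
    se = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    return (p_a - p_b) / se
```

For example, with hypothetical counts of 60 accepted limitations out of 100 Economic, Social, and Cultural scenarios versus 40 out of 100 Civil and Political ones, `two_proportion_z(60, 100, 40, 100)` is about 2.83, which would indicate a gap unlikely to arise by chance at conventional thresholds.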

ICLR 2026

SocialHarmBench: Revealing LLM Vulnerabilities to Socially Harmful Requests

We propose SocialHarmBench, the first comprehensive benchmark for evaluating the vulnerability of LLMs to socially harmful goals. It contains 78,836 prompts from 47 democratic countries, spanning 16 genres and 11 domains; the prompts were carefully collected and human-verified by LLM safety experts and political experts. Experiments on 15 cutting-edge LLMs uncover many safety risks.

Punya Syon Pandey, Hai Son Le, Devansh Bhardwaj, Rada Mihalcea, Zhijing Jin


LLM safety, sociopolitical harms, benchmarking, democracy defense, red-teaming
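Red-teaming benchmarks of this kind are commonly summarized by an attack success rate (ASR): the share of harmful prompts a model complies with instead of refusing. The sketch below assumes each response has already been labeled by a judge and stored as a dict with a boolean "complied" field; the field names, the domain labels, and `attack_success_rate` itself are hypothetical and not necessarily SocialHarmBench's scoring protocol.

```python
from collections import defaultdict


def attack_success_rate(results, key=None):
    """Share of prompts where the model complied instead of refusing.

    `results` is a list of dicts with a boolean "complied" field. If `key`
    is given (e.g. "domain" or "country"), rates are reported per group.
    Field names are hypothetical.
    """
    if key is None:
        if not results:
            raise ValueError("no results to score")
        return sum(r["complied"] for r in results) / len(results)
    groups = defaultdict(list)
    for r in results:
        groups[r[key]].append(r)
    return {k: attack_success_rate(v) for k, v in groups.items()}


# Toy judged responses with placeholder domain labels.
results = [
    {"domain": "domain_a", "complied": True},
    {"domain": "domain_a", "complied": False},
    {"domain": "domain_b", "complied": True},
]
print(attack_success_rate(results, key="domain"))
# {'domain_a': 0.5, 'domain_b': 1.0}
```

Reporting per-domain or per-country rates, rather than one pooled number, is what exposes where a model's safeguards are weakest.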

Findings of ACL 2025

Revealing Hidden Mechanisms of Cross-Country Content Moderation with Natural Language Processing

We explore multiple directions for investigating the hidden mechanisms behind content moderation: training classifiers to reverse-engineer moderation decisions across countries, and explaining those decisions by analyzing Shapley values and LLM-guided explanations. Our experiments reveal distinctive patterns in censored posts, both across countries and over time.

Neemesh Yadav, Jiarui Liu, Francesco Ortu, Roya Ensafi, Zhijing Jin, Rada Mihalcea


content moderation, explainability, cross-country analysis, censorship, NLP ethics
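For readers unfamiliar with Shapley values, the sketch below computes them exactly for a small feature set: each feature's value is its marginal contribution to the classifier score, averaged over all coalitions of the other features with weight |S|!(n − |S| − 1)!/n!. It assumes a single baseline input stands in for "absent" features, which is one common choice (other value functions average over background data), and treats the classifier as a plain Python function. `exact_shapley` and `toy_score` are hypothetical, not the paper's pipeline, and the cost grows as 2^n, so this is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def exact_shapley(f: Callable[[Sequence[float]], float],
                  x: Sequence[float],
                  baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values of f at x, with absent features set to baseline."""
    n = len(x)

    def value(coalition: set[int]) -> float:
        # Features in the coalition take the instance value; the rest take the baseline.
        return f([x[j] if j in coalition else baseline[j] for j in range(n)])

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight for coalitions S of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


# Hypothetical moderation score: two keyword indicators that only matter
# together, plus a third feature with an independent linear effect.
def toy_score(z):
    return z[0] * z[1] + 0.5 * z[2]


phis = exact_shapley(toy_score, x=[1, 1, 1], baseline=[0, 0, 0])
print([round(p, 6) for p in phis])  # [0.5, 0.5, 0.5]
```

The interaction term is split evenly between the two keyword features, and the values sum to the score difference between the input and the baseline (1.5 here), which is what makes Shapley values a natural way to attribute a classifier's moderation decision to individual features.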

AI Alignment Index

Building on the research above, we developed the EuroSafeAI Alignment Index, a public leaderboard that evaluates frontier AI models across four dimensions derived directly from our democracy defense publications.


Featured Video

AI, Safety, and Democratic Resilience

A concise overview connecting AI safety, platform accountability, and information integrity, highlighting practical approaches for evaluating model risks and building civic-minded safeguards.

Media Contact

Zhijing Jin

Founder & Head, EuroSafeAI

zjin.admin@cs.toronto.edu

Pepijn Cobben

Cofounder, EuroSafeAI

pcobben@ethz.ch

Punya Syon Pandey

Lab Assistant

ppandey@cs.toronto.edu


Explore Our Research

View all our publications across AI safety, multi-agent systems, and democracy defense.
