Advancing AI Safety Across Languages, Cultures, and Contexts

Executive Summary

This talk addresses a critical gap in AI safety: safeguards calibrated primarily in English and Western contexts fail to generalize across languages, cultures, and regions, creating both safety vulnerabilities and barriers to equitable AI adoption. The summit convenes global stakeholders to advance multilingual and multicultural AI evaluation beyond simple translation toward rigorous, locally-grounded testing frameworks and shared infrastructure.

Key Takeaways

  1. Multilingual AI safety is not optional—it's essential: AI adoption depends on systems reliably working in the languages people actually use and the cultural contexts they inhabit. Safety gaps directly block equitable diffusion of AI benefits.

  2. Go beyond translation to contextualization: Effective multilingual safety requires co-designing evaluation frameworks with local experts, not simply translating existing American-centric benchmarks and safety datasets.

  3. Build shared infrastructure, not monolithic standards: Invest in reusable, locally-grounded evaluation methods and common taxonomies that allow different regions to assess harms according to their own values while keeping testing frameworks interoperable (a taxonomy sketch follows this list).

  4. Red team across linguistic and cultural diversity from the start: Cross-regional adversarial testing (like the Singapore IMDA exercise across nine Asia-Pacific countries) surfaces safety gaps that monolingually trained teams will miss.

  5. Acknowledge that safety is a design choice, not an accident: Trustworthy multilingual AI "doesn't happen by accident. It happens because we choose to design, evaluate, and govern systems for the world as it actually is."
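
Takeaway 3 lends itself to a concrete illustration. The sketch below is hypothetical (the category IDs, regions, and severity scale are invented for illustration; nothing here is drawn from a real standard): a shared taxonomy keeps category identifiers common and interoperable, while each region supplies its own severity calibration.

```python
# Illustrative sketch of a shared-but-localizable harm taxonomy.
# Category IDs and severity values are invented, not from any real standard.
from dataclasses import dataclass, field

@dataclass
class RegionalProfile:
    region: str
    # shared category id -> locally calibrated severity (0 = non-issue, 3 = severe)
    severity: dict = field(default_factory=dict)

# Region-neutral category definitions, keyed by stable, shared IDs.
SHARED_CATEGORIES = {
    "scam.social_engineering": "Deceptive requests that extract money or credentials",
    "stereotype.group": "Demeaning generalizations about a social group",
}

# Each region calibrates the *same* IDs against its own norms.
PROFILES = {
    "sg": RegionalProfile("sg", {"scam.social_engineering": 3, "stereotype.group": 2}),
    "us": RegionalProfile("us", {"scam.social_engineering": 2, "stereotype.group": 3}),
}

def severity(category_id: str, region: str) -> int:
    """Look up a region's calibrated severity for a shared category ID."""
    if category_id not in SHARED_CATEGORIES:
        raise KeyError(f"unknown category: {category_id}")
    return PROFILES[region].severity.get(category_id, 0)
```

Because findings are tagged with the same IDs everywhere, results stay comparable across regions even when the severity judgments behind them differ.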

Key Topics Covered

  • Language representation imbalance in AI training data — dominance of English and underrepresentation of low-resource languages
  • Contextual safety failures — how the same model shows different safety failure rates across languages and cultural contexts
  • Adversarial exploitation — how low-resource languages and non-Latin scripts are weaponized to bypass safeguards
  • Cross-regional red teaming initiatives — work in Singapore (IMDA), India (CeRAI at IIT Madras), and the broader Asia-Pacific region
  • Evaluation infrastructure challenges — difficulty standardizing testing across languages, including tool-calling in agentic systems
  • Equity and adoption — connection between trustworthy multilingual AI and broad diffusion of AI benefits
  • Institutional collaboration — necessity for shared taxonomies, benchmarks, and evaluation methods across companies and research institutions

Key Points & Insights

  1. Data imbalance drives safety gaps: Fewer than 5% of the world's 7,000+ spoken languages are meaningfully represented online, and English alone accounts for roughly 42% of widely used training data. This imbalance creates systematic blind spots in safety mechanisms.

  2. Context is not interchangeable: What constitutes "unsafe" varies significantly across cultures and regions. American-centric datasets (e.g., safety examples like "TPing a house") become universal training signals despite limited global relevance, baking narrow cultural assumptions into all frontier models.

  3. Safety mechanisms miss cultural indirection: English-calibrated safeguards fail to recognize culturally indirect phrasing, idioms, humor, and implicit intent — particularly critical for contextual harms like scams, manipulation, and social engineering, where intent is "hinted at but not stated outright."

  4. Low-resource languages are actively exploited: Adversarial actors deliberately use underrepresented languages and non-Latin scripts to bypass model and system-level safeguards because these contexts were never tested during development.

  5. Translation is insufficient: Simply translating datasets and prompts does not produce valid evaluations in other languages. Testing infrastructure, tool-calling sequences, and underlying assumptions must be redesigned, not merely translated (see the evaluation sketch after this list).

  6. Emergent harms evolve with deployment: New safety issues arise as people interact with AI over time (e.g., voice cloning fraud targeting vulnerable populations). These harms aren't static; they emerge from real-world usage patterns that vary by region.

  7. Standardization vs. localization tension: Creating shared standards while respecting different cultural norms about harm and acceptability requires developing common taxonomies that acknowledge legitimate regional differences without fragmenting safety approaches.

  8. No single institution can solve this alone: Progress requires ecosystem-wide collaboration on shared evaluation infrastructure, benchmarks (e.g., the MLCommons AILuminate expansion), and open participation from academia, government, startups, and industry.
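
As a concrete reading of point 5, here is a minimal per-language evaluation-loop sketch. It assumes a generic model(prompt) -> str callable and per-case judge functions, both hypothetical placeholders; the essential choice is that each locale contributes its own prompts and its own definition of failure, rather than translations of an English test set.

```python
# Hypothetical harness: locally authored test cases, one judge per case.
from collections import defaultdict
from typing import Callable

def failure_rates(model: Callable[[str], str], cases: list) -> dict:
    """cases: dicts with "locale", "prompt", and an "is_unsafe" judge callable."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for case in cases:
        totals[case["locale"]] += 1
        response = model(case["prompt"])
        # The judge encodes local norms: the same response can fail in one
        # locale and be acceptable in another.
        if case["is_unsafe"](response):
            failures[case["locale"]] += 1
    return {locale: failures[locale] / totals[locale] for locale in totals}
```

Comparing the resulting per-locale rates is one way to surface the "materially different safety failure rates" quoted below.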

Notable Quotes or Statements

"The same model performing the same task can show materially different safety failure rates depending on languages and cultural contexts." — Hector (Paris AI Summit findings)

"When AI safety mechanisms are calibrated primarily in English to reflect English speaking norms, they miss culturally indirect phrasing. They miss idioms. They miss humor and those misses affect what models refuse, how they follow constraints and how they behave in safety critical scenarios." — Natasha Kremton, Microsoft Chief Responsible Officer

"We ship the same model to billions of people around the world. And fundamentally we're shipping the same judgments and red lines about what is safe and what's unsafe." — Sarah Hooker, Adaptation Labs

"It's not just about language coverage. It's essentially a question also of context. What is unsafe here is completely different from what is considered unsafe in the US." — Sarah Hooker

"You can't expect [a translated dataset] to actually work the same across different languages... it's not a matter of translation." — WC Lee, Singapore IMDA

"The diffusion of trustworthy AI systems doesn't happen by accident. It happens because we choose to design, evaluate, and govern systems for the world as it actually is—multilingual, multicultural, and deeply interconnected." — Natasha Kremton, Microsoft

Speakers & Organizations Mentioned

Speakers:

  • Natasha Crampton — Chief Responsible AI Officer, Microsoft
  • Sara Hooker — Co-founder, Adaption Labs (formerly Head of Cohere Labs)
  • WC Lee — Singapore IMDA (Infocomm Media Development Authority)
  • Sunayana Sitaram — Moderator, Microsoft Research India
  • Hector — Referenced as organizer of Singapore work
  • Peter — Referenced (will speak on MLCommons AILuminate)
  • Kalika Bali — Microsoft Research India

Institutions & Organizations:

  • Microsoft (Research, Office of Responsible AI, Research Accelerator)
  • Adaption Labs (startup)
  • Cohere Labs
  • Singapore IMDA (Infocomm Media Development Authority)
  • CeRAI (Centre for Responsible AI) at IIT Madras (Indian Institute of Technology Madras)
  • MLCommons (benchmark expansion)
  • GPAI Center (Tokyo-based; proposed multicultural AI consortium)
  • Frontier AI labs (generic reference to leading model developers)
  • Microsoft Research India and Microsoft Research hubs in Africa

Technical Concepts & Resources

Benchmarks & Evaluation Frameworks:

  • MLCommons AILuminate — multilingual, multicultural, multimodal benchmark expansion
  • SAMIA — community-centered approach to evaluating model behavior in real-world contexts
  • Project Gecko — co-designed AI applications for agriculture and education in East Africa and South Asia
  • Multicultural AI Consortium — proposed by the GPAI Center (Tokyo) for shared evaluation infrastructure

Key Methodologies:

  • Red teaming — adversarial testing across regions (Singapore IMDA cross-country exercise with 9 Asia-Pacific nations)
  • Common taxonomy development — for bias, stereotyping, and cultural harms across different countries
  • Agentic testing — tool-calling and multi-step trajectory validation across languages (see the trajectory-check sketch after this list)
  • Culturally grounded testing — surfacing locally relevant bias and stereotyping patterns
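
The agentic-testing item above can be sketched as a simple cross-language trajectory check. The agent.run(task) API and tool names below are hypothetical; the idea is that tool identifiers stay canonical while only the task phrasing varies by language.

```python
# Hypothetical agent API: agent.run(task) returns the tool calls made,
# each exposing a .name attribute.
def trajectory_matches(agent, task_by_lang: dict, expected: list) -> dict:
    """Run one task phrased per language; compare tool-call sequences."""
    results = {}
    for lang, task in task_by_lang.items():
        calls = [call.name for call in agent.run(task)]
        # A mismatch often means a tool step was skipped, reordered, or
        # hallucinated when the task was phrased in a less-tested language.
        results[lang] = (calls == expected)
    return results

# Usage: the booking flow should be identical regardless of input language.
# trajectory_matches(agent,
#                    {"en": "Book me a flight...", "ta": "...", "id": "..."},
#                    expected=["search_flights", "hold_seat", "charge_card"])
```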

Key Gaps Identified:

  • Language representation: <5% of world's spoken languages meaningfully represented in training data; <1% for most languages
  • Dataset English dominance: ~42% of widely-used training datasets in English alone
  • Non-Latin script vulnerability: Low-resource and non-Latin script languages exploited by adversarial actors

Data/Concepts:

  • Contextual harms — scams, manipulation, social engineering, voice cloning fraud
  • Emergent adversarial safety issues — harms that surface over time as users interact with deployed systems
  • Tool-calling translation — challenge of localizing tool names and sequences in agentic systems (a localization sketch follows)
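
On that last point, one common design, shown here as an illustrative sketch (the schema fields and tool name are invented, not any specific vendor's format), is to keep the tool name as a stable identifier the model must emit verbatim, and to localize only the description that tells the model when to use it.

```python
# Illustrative tool definition: the name is canonical and never translated;
# descriptions are localized so non-English requests route to the right tool.
TOOL = {
    "name": "transfer_funds",  # the model must emit this identifier exactly
    "descriptions": {
        "en": "Send money from one account to another.",
        "hi": "एक खाते से दूसरे खाते में पैसा भेजें।",  # Hindi
    },
}

def render_tool(tool: dict, lang: str) -> dict:
    """Build the per-language tool spec, falling back to English."""
    return {
        "name": tool["name"],
        "description": tool["descriptions"].get(lang, tool["descriptions"]["en"]),
    }
```

Tool sequences then still need the kind of cross-language trajectory validation sketched under Key Methodologies.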