AI Beyond English: Governance and Deployment in the Global South
Executive Summary
This panel discussion examines critical gaps in how large language models (LLMs) are developed, tested, and deployed for non-English languages, particularly in the Global South. The speakers highlight systemic issues across data collection, model architecture, and evaluation—and present emerging community-led, government-backed, and research-driven initiatives to build more inclusive, culturally contextual AI systems that center the needs and expertise of marginalized language communities.
Key Takeaways
- Language ≠ Culture: Developing multilingual AI requires far more than translation. Models must be trained on culturally grounded data, evaluated by community members, and account for diverse lived experiences within the same language community (intersectionality across gender, caste, class, ethnicity, religion).
- Evaluation is governance: Who decides whether a model "works"—and by what standards—is a political choice. Community-led benchmarking, inter-annotator disagreement documentation, and participatory safety evaluation are essential governance mechanisms where formal AI law does not yet exist.
- Scale without inclusion is harmful: The tech industry's focus on one "general-purpose" tool for all languages and contexts systematically excludes low-resource and indigenous languages. Smaller, purpose-built models and decentralized, community-controlled approaches are more effective and equitable.
- Data extraction without compensation is ongoing colonialism: Big tech companies use grassroots AI efforts (Masakhane, Ghana NLP, etc.) and community-collected datasets without consent or compensation. Frameworks like Nwulite Obodo and participatory governance structures are necessary to ensure fair data economics and community agency.
- The negotiation is about power, not just technology: The underlying tension—whether any codification technology can truly capture living language and culture—reflects a fundamental question about who controls AI narratives, whose expertise is valued, and whether communities have the right to say "no" to AI deployment altogether.
Key Topics Covered
- Data representation gaps in multilingual LLMs: English overrepresentation (60%+ of training data), low-resource language underrepresentation, and reliance on machine-translated datasets
- Architectural limitations: Over-reliance on English-centric model designs; underexploration of smaller, more effective models for low-resource languages
- Cross-lingual transfer failures: Assumptions that models can learn from high-resource languages (English) and apply those patterns to structurally different languages
- Testing and evaluation gaps: Lack of rigorous, community-led benchmarking; over-reliance on automated, machine-translated benchmarks; opaque reporting of multilingual performance
- Content moderation systems: How existing LLM tools fail indigenous and low-resource language speakers in real-world platform governance
- Cultural contextualization: Language is not enough; models must capture cultural practices, histories, power asymmetries, and diverse lived experiences
- Community-led data and benchmarking initiatives: Emerging efforts to build datasets, benchmarks, and evaluation frameworks with community participation
- Government sovereignty and localization: National AI strategies and locally developed models (e.g., Nigeria's Indlela, LATAM GPT, SEA Lion)
- Data governance and fair compensation: Frameworks like Nwulite Obodo to ensure communities are compensated and have agency in AI development
- Accountability and trust mechanisms: Role of civil society, academic research, and media in countering industry narratives and building public understanding
Key Points & Insights
- Training data crisis: The vast majority of multilingual LLMs are trained on over 60% English-language data, with the remainder often consisting of poor-quality machine translations or internet-sourced text that does not reflect actual language use, particularly in the Global South.
- Architectural bias toward English: Current model architectures and training approaches privilege English phonology, syntax (subject-verb-object), and semantic structures, making them inherently unsuited for languages with different grammatical systems—an issue largely unaddressed in model design.
- Failure at cultural relevance: Even when LLMs perform adequately on language tasks, they fail when culturally contextualized knowledge is required (e.g., local food traditions, religious practices, social norms), because internet-sourced training data does not capture lived culture in non-dominant communities.
- Real-world harms in content moderation: Indigenous and low-resource language speakers experience systematic exclusion from online safety systems—harmful content in their languages goes unflagged while equivalent content in high-resource languages is caught, leaving vulnerable users unprotected.
- Evaluation is not neutral: Benchmarking methodologies encode the values and perspectives of annotators; inter-annotator disagreement on subjective concepts (e.g., gender bias, harm) is treated as noise rather than legitimate difference, erasing community variation and perspective.
- Internet data is inherently exclusionary: The reliance on web-scraped training data systematically excludes non-literate populations, oral traditions, and communities without digital infrastructure, making it impossible to represent their cultures or knowledge systems even with good intentions.
- Participation must extend beyond data collection: Communities are increasingly involved in data curation, but rarely in evaluation, governance, or decision-making about whether and how models should be deployed—true accountability requires centering communities at every stage of the AI lifecycle.
- Government-led localization is emerging, but uneven: Countries like Nigeria, Chile, and Indonesia are developing sovereign AI models to reduce reliance on big tech and serve local needs, but without adequate data infrastructure and community partnership, these efforts risk replicating the same gaps.
- Coordination and knowledge-sharing gaps: Researchers and civil society organizations working on multilingual AI across Africa, South Asia, Southeast Asia, and Latin America face similar challenges but lack systematic mechanisms to share methodologies, lessons, and resources across regions.
- The right of refusal is missing: Current frameworks emphasize participation and evaluation, but do not adequately address communities' right to refuse data collection, model deployment, or use cases that do not serve their interests or align with their values.
Notable Quotes or Statements
"The vast majority of these multilingual systems were still trained on over 60% of English language data."
— Opening remarks on training data representation
"Language is usually something that has caused issues in the Nigerian context and enabling multilingualism and that potentiality of understanding each other in an easier way will eliminate some of those tensions."
— Tajin Guadab (Masakhane African Languages Hub)
"They can't tell the difference between abortion and miscarriage because the Hindi word for both those terms is the same... They did focus group discussions and genuinely asked the women: how do you in your language differentiate the two?"
— Arushi Gupta (Digital Futures Lab), on community-centered healthcare chatbot development
"Content moderation policies start with policies developed in English and Spanish. Starting with high resource languages, there's already an indication of exclusion at that point."
— Danaraj Takar, on systemic exclusion in platform governance
"Big tech companies... are just now starting to ramp up their interest in African languages... to integrate them into tools like Google Translate, Siri, etc. It's really important that communities work in conversation with civil society and academia to develop new mechanisms to ensure self-governance."
— Chinasa Okolo, on data extraction and governance frameworks
"We need to start involving communities in the evaluation piece as well... This is where community-centered red teaming, community-centered benchmarking really comes into play."
— Arushi Gupta
"The larger narrative of the public's understanding is quite different. That narrative is often driven by industry that will promote these models as general purpose tools that can be applied to many different things."
— Danaraj Takar, on the communication problem around AI literacy
"Is it even an aspirational value to codify language? This technology will flatten our lived experience of language and culture... How do we reconcile with the technology and come to a negotiating table?"
— Lavneet Singh (Cambridge University), closing question on technological reductionism
Speakers & Organizations Mentioned
Panelists
- Alia Amofa — Center for Democracy and Technology (CDT), Washington D.C. & Brussels; moderator
- Tajin Guadab — Programs & MEAL Lead, Masakhane African Languages Hub
- Arushi Gupta — Senior Research Manager, Digital Futures Lab; Asia-focused research on gender and language bias
- Danaraj Takar — Director, Emerging Technology Initiative, George Washington University; focus on AI, democracy, and race
- Chinasa Okolo — Policy expert, founder, technical consultant to the UN; African governance and AI policy
Referenced Organizations & Institutions
- Center for Democracy and Technology (CDT): 30+ year nonprofit focused on democracy and technology; conducted "Lost in Translation" study on multilingual LLM gaps
- Masakhane African Languages Hub: Consortium for African language NLP and ML research; grants program for multilingual datasets
- Digital Futures Lab: Asia-based research on AI and language bias (caste, gender)
- University of Pretoria, Data Science Law Lab: Developing the Nwulite Obodo data governance framework
- CIPIT (Centre for Intellectual Property and Information Technology Law, Strathmore University, Kenya): Partnership on community data governance
- Ghana NLP: Grassroots African language AI initiative
- Stanford University: Research on inter-annotator disagreement and jury-style model evaluation
- Taraaz Research: Studies on LLMs as judges of model performance (Roya Pakzad, cited work)
- Nigerian government (with local AI company): Developed Indlela, a national LLM supporting 4 Nigerian languages
- Chile's National AI Center: Led LATAM GPT initiative
- Southeast Asia initiatives: SEA Lion model for Southeast Asian languages
- George Washington University: Emerging technology research
- United Nations: International AI cooperation and governance (Chinasa Okolo's role)
Technical Concepts & Resources
Papers & Frameworks
- "Lost in Translation" (CDT, 2023): Landmark study identifying architectural, training, and testing gaps in multilingual LLMs
- Nwulite Obodo: Data governance framework for ensuring community compensation and fair data practices in AI development
- "WEIRD NLP": Subfield studying Western, Educated, Industrialized, Rich, Democratic (WEIRD) country bias in NLP and AI
Models & Systems Mentioned
- Indlela (Nigeria): National LLM supporting Yoruba, Igbo, Hausa, and Nigerian Pidgin, with expansion to additional languages planned
- LATAM GPT (Chile): Multilingual model for Latin American languages
- SEA Lion: Multilingual model for Southeast Asian languages
- Google Translate: Example of big tech platform integrating African languages (imperfectly)
- Apple Siri: Criticized for poor multilingual support
Datasets & Data Sources
- Common Crawl: Standard source for multilingual training data; heavily biased toward English
- Machine-translated datasets: Problematic proxy for natural language; introduces systematic errors
- Internet-sourced data: Excludes oral cultures, non-literate populations, and communities without digital infrastructure
- Multimodal datasets (Masakhane initiative): Text, voice, and image components for culturally grounded AI training
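The English skew described throughout this discussion can be made concrete with a simple per-language token tally over corpus metadata. A minimal sketch in Python, using hypothetical corpus records (the figures are illustrative, not actual Common Crawl statistics):

```python
from collections import Counter

# Hypothetical corpus records: (document_id, language_tag, token_count).
# A real pipeline would read these from crawl metadata after language ID.
records = [
    ("d1", "en", 900), ("d2", "en", 700), ("d3", "sw", 120),
    ("d4", "yo", 80), ("d5", "en", 400), ("d6", "hi", 200),
]

def language_share(records):
    """Return each language's share of total tokens, largest first."""
    tokens = Counter()
    for _, lang, count in records:
        tokens[lang] += count
    total = sum(tokens.values())
    return {lang: count / total for lang, count in tokens.most_common()}

shares = language_share(records)
print({lang: f"{share:.1%}" for lang, share in shares.items()})
```

In a real pipeline the language tags would come from an automatic language-identification step, which is itself error-prone for low-resource languages, so measured shares tend to understate the problem rather than overstate it.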
Evaluation & Benchmarking Methodologies
- Community-centered red teaming: Participatory adversarial testing involving affected communities
- Community-centered benchmarking: Evaluation frameworks developed with community annotation and input
- Inter-annotator agreement analysis: Documenting and preserving disagreement rather than treating it as noise
- Context-specific performance testing: Domain-specific benchmarks (e.g., agriculture, healthcare) for real-world use cases
- Disaggregated evaluation reporting: Moving beyond single accuracy scores to report performance across languages, contexts, and demographic groups
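Two of the methodologies above, disaggregated reporting and disagreement-preserving annotation, are straightforward to express in code. A minimal sketch with hypothetical evaluation records (all function and field names are illustrative, not from any named benchmark):

```python
from collections import Counter

# Hypothetical evaluation records: (language, model_correct, annotator_labels).
# annotator_labels holds independent community annotators' harm judgments,
# kept in full rather than collapsed to a single majority-vote label.
records = [
    ("en", True,  ["harmful", "harmful", "harmful"]),
    ("en", True,  ["safe", "safe", "harmful"]),
    ("sw", False, ["harmful", "safe", "harmful"]),
    ("sw", False, ["harmful", "harmful", "safe"]),
    ("yo", True,  ["safe", "safe", "safe"]),
]

def disaggregated_accuracy(records):
    """Report accuracy per language instead of one pooled score."""
    hits, totals = Counter(), Counter()
    for lang, correct, _ in records:
        totals[lang] += 1
        hits[lang] += int(correct)
    return {lang: hits[lang] / totals[lang] for lang in totals}

def label_distribution(labels):
    """Preserve annotator disagreement as a distribution over labels."""
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}

print(disaggregated_accuracy(records))
print([label_distribution(labels) for _, _, labels in records])
```

The point of the sketch: pooled accuracy here is 3/5 = 60%, which hides that the Swahili examples fail completely, while the per-language report surfaces it; and keeping the full label distribution records minority annotator judgments instead of discarding them as noise.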
Technical Challenges Identified
- Cross-lingual transfer failure: Structural differences between English and other language families (e.g., subject-verb-object syntax does not map onto many Global South languages)
- Smaller language models outperforming large ones: In low-resource language contexts, architecture matters more than scale; current LLM-focused approaches may be suboptimal
- Cultural context as technical problem: Existing models trained on internet data cannot capture local food, religious practices, social norms, or oral histories
- Machine translation as training data: Creates systematic errors and misrepresentations; not suitable as substitute for native language data
- Content moderation pipeline failures: Combination of English-language policy development, machine translation, and LLM classification creates exclusion of indigenous and low-resource language speakers
Contextual Notes
- Geographic focus: Emphasis on Africa (Nigerian, Kenyan, Ghanaian initiatives), South Asia (India), and Southeast Asia, with acknowledgment of Latin American efforts
- Intersectional analysis: Discussion centers on overlapping marginalization (language + gender + caste + class + ethnicity + religion)
- North-South knowledge asymmetry: Highlighted how Global South researchers and communities often lack channels to provide feedback to big tech companies or participate in governance
- Sovereignty theme: Multiple references to government-led AI localization as a response to big tech neglect and control
- Right of refusal: Final audience question raises whether participation frameworks adequately address communities' right to refuse data collection and model deployment
