AI Beyond English: Governance and Deployment in the Global South
Executive Summary
This panel discussion examines critical gaps in how large language models (LLMs) are developed, tested, and deployed for non-English languages, particularly in the Global South. The speakers highlight systemic issues across data collection, model architecture, and evaluation—and present emerging community-led, government-backed, and research-driven initiatives to build more inclusive, culturally contextual AI systems that center the needs and expertise of marginalized language communities.
Key Takeaways
- Language ≠ Culture: Developing multilingual AI requires far more than translation. Models must be trained on culturally grounded data, evaluated by community members, and account for diverse lived experiences within the same language community (intersectionality across gender, caste, class, ethnicity, religion).
- Evaluation is governance: Who decides whether a model "works"—and by what standards—is a political choice. Community-led benchmarking, inter-annotator disagreement documentation, and participatory safety evaluation are essential governance mechanisms where formal AI law does not yet exist.
- Scale without inclusion is harmful: The tech industry's focus on one "general-purpose" tool for all languages and contexts systematically excludes low-resource and indigenous languages. Smaller, purpose-built models and decentralized, community-controlled approaches are more effective and equitable.
- Data extraction without compensation is ongoing colonialism: Big tech companies use grassroots AI efforts (Masakhane, Ghana NLP, etc.) and community-collected datasets without consent or compensation. Frameworks like Nwulite Obodo and participatory governance structures are necessary to ensure fair data economics and community agency.
- The negotiation is about power, not just technology: The underlying tension—whether any codification technology can truly capture living language and culture—reflects a fundamental question about who controls AI narratives, whose expertise is valued, and whether communities have the right to say "no" to AI deployment altogether.
Key Topics Covered
- Data representation gaps in multilingual LLMs: English overrepresentation (60%+ of training data), low-resource language underrepresentation, and reliance on machine-translated datasets
- Architectural limitations: Over-reliance on English-centric model designs; underexploration of smaller, more effective models for low-resource languages
- Cross-lingual transfer failures: Assumptions that models can learn from high-resource languages (English) and apply those patterns to structurally different languages
- Testing and evaluation gaps: Lack of rigorous, community-led benchmarking; over-reliance on automated, machine-translated benchmarks; opaque reporting of multilingual performance
- Content moderation systems: How existing LLM tools fail indigenous and low-resource language speakers in real-world platform governance
- Cultural contextualization: Language is not enough; models must capture cultural practices, histories, power asymmetries, and diverse lived experiences
- Community-led data and benchmarking initiatives: Emerging efforts to build datasets, benchmarks, and evaluation frameworks with community participation
- Government sovereignty and localization: National AI strategies and locally developed models (e.g., Nigeria's Indlela, LATAM GPT, SEA Lion)
- Data governance and fair compensation: Frameworks like Nwulite Obodo to ensure communities are compensated and have agency in AI development
- Accountability and trust mechanisms: Role of civil society, academic research, and media in countering industry narratives and building public understanding
Key Points & Insights
- Training data crisis: The vast majority of multilingual LLMs are trained on over 60% English-language data, with the remainder often consisting of poor-quality machine translations or internet-sourced text that does not reflect actual language use, particularly in the Global South.
- Architectural bias toward English: Current model architectures and training approaches privilege English phonology, syntax (subject-verb-object), and semantic structures, making them inherently unsuited for languages with different grammatical systems—an issue largely unaddressed in model design.
- Failure at cultural relevance: Even when LLMs perform adequately on language tasks, they fail when culturally contextualized knowledge is required (e.g., local food traditions, religious practices, social norms), because internet-sourced training data does not capture lived culture in non-dominant communities.
- Real-world harms in content moderation: Indigenous and low-resource language speakers experience systematic exclusion from online safety systems—harmful content in their languages goes unflagged while equivalent content in high-resource languages is caught, leaving vulnerable users unprotected.
- Evaluation is not neutral: Benchmarking methodologies encode the values and perspectives of annotators; inter-annotator disagreement on subjective concepts (e.g., gender bias, harm) is treated as noise rather than legitimate difference, erasing community variation and perspective.
- Internet data is inherently exclusionary: The reliance on web-scraped training data systematically excludes non-literate populations, oral traditions, and communities without digital infrastructure, making it impossible to represent their cultures or knowledge systems even with good intentions.
- Participation must extend beyond data collection: Communities are increasingly involved in data curation, but rarely in evaluation, governance, or decision-making about whether and how models should be deployed—true accountability requires centering communities at every stage of the AI lifecycle.
- Government-led localization is emerging, but uneven: Countries like Nigeria, Chile, and Indonesia are developing sovereign AI models to reduce reliance on big tech and serve local needs, but without adequate data infrastructure and community partnership, these efforts risk replicating the same gaps.
- Coordination and knowledge-sharing gaps: Researchers and civil society organizations working on multilingual AI across Africa, South Asia, Southeast Asia, and Latin America face similar challenges but lack systematic mechanisms to share methodologies, lessons, and resources across regions.
- The right of refusal is missing: Current frameworks emphasize participation and evaluation, but do not adequately address communities' right to refuse data collection, model deployment, or use cases that do not serve their interests or align with their values.
Notable Quotes or Statements
"The vast majority of these multilingual systems were still trained on over 60% of English language data."
— Opening remarks on training data representation
"Language is usually something that has caused issues in the Nigerian context and enabling multilingualism and that potentiality of understanding each other in an easier way will eliminate some of those tensions."
— Tajin Guadab (Masakhane African Languages Hub)
"They can't tell the difference between abortion and miscarriage because the Hindi word for both those terms is the same... They did focus group discussions and genuinely asked the women: how do you in your language differentiate the two?"
— Arushi Gupta (Digital Futures Lab), on community-centered healthcare chatbot development
"Content moderation policies start with policies developed in English and Spanish. Starting with high resource languages, there's already an indication of exclusion at that point."
— Danaraj Takar, on systemic exclusion in platform governance
"Big tech companies... are just now starting to ramp up their interest in African languages... to integrate them into tools like Google Translate, Siri, etc. It's really important that communities work in conversation with civil society and academia to develop new mechanisms to ensure self-governance."
— Chinasa Okolo, on data extraction and governance frameworks
"We need to start involving communities in the evaluation piece as well... This is where community-centered red teaming, community-centered benchmarking really comes into play."
— Arushi Gupta
"The larger narrative of the public's understanding is quite different. That narrative is often driven by industry that will promote these models as general purpose tools that can be applied to many different things."
— Danaraj Takar, on the communication problem around AI literacy
"Is it even an aspirational value to codify language? This technology will flatten our lived experience of language and culture... How do we reconcile with the technology and come to a negotiating table?"
— Lavneet Singh (Cambridge University), closing question on technological reductionism
Speakers & Organizations Mentioned
Panelists
- Alia Amofa — Center for Democracy and Technology (CDT), Washington D.C. & Brussels; moderator
- Tajin Guadab — Programs & MEAL Lead, Masakhane African Languages Hub
- Arushi Gupta — Senior Research Manager, Digital Futures Lab; Asia-focused research on gender and language bias
- Danaraj Takar — Director, Emerging Technology Initiative, George Washington University; focus on AI, democracy, and race
- Chinasa Okolo — Policy expert, founder, technical consultant to the UN; African governance and AI policy
Referenced Organizations & Institutions
- Center for Democracy and Technology (CDT): 30+ year nonprofit focused on democracy and technology; conducted "Lost in Translation" study on multilingual LLM gaps
- Masakhane African Languages Hub: Consortium for African language NLP and ML research; grants program for multilingual datasets
- Digital Futures Lab: Asia-based research on AI and language bias (caste, gender)
- University of Pretoria, Data Science Law Lab: Developing the Nwulite Obodo data governance framework
- CIPIT (Centre for Intellectual Property and Information Technology Law, Strathmore University, Kenya): Partnership on community data governance
- Ghana NLP: Grassroots African language AI initiative
- Stanford University: Research on inter-annotator disagreement and jury-style model evaluation
- Taraaz Research: Studies on LLMs as judges of model performance (Roya Pakzad, cited work)
- Nigerian government (with local AI company): Developed Indlela, a national LLM supporting 4 Nigerian languages
- Chile's National AI Center: Led LATAM GPT initiative
- Southeast Asia initiatives: SEA Lion model for Southeast Asian languages
- George Washington University: Emerging technology research
- United Nations: International AI cooperation and governance (Chinasa Okolo's role)
Technical Concepts & Resources
Papers & Frameworks
- "Lost in Translation" (CDT, 2023): Landmark study identifying architectural, training, and testing gaps in multilingual LLMs
- Nwulite Obodo: Data governance framework for ensuring community compensation and fair data practices in AI development
- "WEIRD NLP": Subfield studying Western, Educated, Industrialized, Rich, Democratic (WEIRD) country bias in NLP and AI
Models & Systems Mentioned
- Indlela (Nigeria): National LLM supporting Yoruba, Igbo, Hausa, and Nigerian Pidgin, with expansion to additional languages planned
- LATAM GPT (Chile): Multilingual model for Latin American languages
- SEA Lion: Multilingual model for Southeast Asian languages
- Google Translate: Example of big tech platform integrating African languages (imperfectly)
- Apple Siri: Criticized for poor multilingual support
Datasets & Data Sources
- Common Crawl: Standard source for multilingual training data; heavily biased toward English
- Machine-translated datasets: Problematic proxy for natural language; introduces systematic errors
- Internet-sourced data: Excludes oral cultures, non-literate populations, and communities without digital infrastructure
- Multimodal datasets (Masakhane initiative): Text, voice, and image components for culturally grounded AI training
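The English skew described throughout this discussion can be made concrete with a simple per-language token tally over corpus metadata. A minimal sketch in Python, using hypothetical corpus records (the figures are illustrative, not actual Common Crawl statistics):

```python
from collections import Counter

# Hypothetical corpus records: (document_id, language_tag, token_count).
# A real pipeline would read these from crawl metadata after language ID.
records = [
    ("d1", "en", 900), ("d2", "en", 700), ("d3", "sw", 120),
    ("d4", "yo", 80), ("d5", "en", 400), ("d6", "hi", 200),
]

def language_share(records):
    """Return each language's share of total tokens, largest first."""
    tokens = Counter()
    for _, lang, count in records:
        tokens[lang] += count
    total = sum(tokens.values())
    return {lang: count / total for lang, count in tokens.most_common()}

shares = language_share(records)
print({lang: f"{share:.1%}" for lang, share in shares.items()})
```

In a real pipeline the language tags would come from an automatic language-identification step, which is itself error-prone for low-resource languages, so measured shares tend to understate the problem rather than overstate it.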
Evaluation & Benchmarking Methodologies
- Community-centered red teaming: Participatory adversarial testing involving affected communities
- Community-centered benchmarking: Evaluation frameworks developed with community annotation and input
- Inter-annotator agreement analysis: Documenting and preserving disagreement rather than treating it as noise
- Context-specific performance testing: Domain-specific benchmarks (e.g., agriculture, healthcare) for real-world use cases
- Disaggregated evaluation reporting: Moving beyond single accuracy scores to report performance across languages, contexts, and demographic groups
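Two of the methodologies above, disaggregated reporting and disagreement-preserving annotation, are straightforward to express in code. A minimal sketch with hypothetical evaluation records (all function and field names are illustrative, not from any named benchmark):

```python
from collections import Counter

# Hypothetical evaluation records: (language, model_correct, annotator_labels).
# annotator_labels holds independent community annotators' harm judgments,
# kept in full rather than collapsed to a single majority-vote label.
records = [
    ("en", True,  ["harmful", "harmful", "harmful"]),
    ("en", True,  ["safe", "safe", "harmful"]),
    ("sw", False, ["harmful", "safe", "harmful"]),
    ("sw", False, ["harmful", "harmful", "safe"]),
    ("yo", True,  ["safe", "safe", "safe"]),
]

def disaggregated_accuracy(records):
    """Report accuracy per language instead of one pooled score."""
    hits, totals = Counter(), Counter()
    for lang, correct, _ in records:
        totals[lang] += 1
        hits[lang] += int(correct)
    return {lang: hits[lang] / totals[lang] for lang in totals}

def label_distribution(labels):
    """Preserve annotator disagreement as a distribution over labels."""
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}

print(disaggregated_accuracy(records))
print([label_distribution(labels) for _, _, labels in records])
```

The point of the sketch: pooled accuracy here is 3/5 = 60%, which hides that the Swahili examples fail completely, while the per-language report surfaces it; and keeping the full label distribution records minority annotator judgments instead of discarding them as noise.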
Technical Challenges Identified
- Cross-lingual transfer failure: Structural differences between English and other language families (e.g., subject-verb-object syntax does not map onto many Global South languages)
- Smaller language models outperforming large ones: In low-resource language contexts, architecture matters more than scale; current LLM-focused approaches may be suboptimal
- Cultural context as technical problem: Existing models trained on internet data cannot capture local food, religious practices, social norms, or oral histories
- Machine translation as training data: Creates systematic errors and misrepresentations; not suitable as substitute for native language data
- Content moderation pipeline failures: Combination of English-language policy development, machine translation, and LLM classification creates exclusion of indigenous and low-resource language speakers
Contextual Notes
- Geographic focus: Emphasis on Africa (Nigerian, Kenyan, Ghanaian initiatives), South Asia (India), and Southeast Asia, with acknowledgment of Latin American efforts
- Intersectional analysis: Discussion centers on overlapping marginalization (language + gender + caste + class + ethnicity + religion)
- North-South knowledge asymmetry: Highlighted how Global South researchers and communities often lack channels to provide feedback to big tech companies or participate in governance
- Sovereignty theme: Multiple references to government-led AI localization as a response to big tech neglect and control
- Right of refusal: Final audience question raises whether participation frameworks adequately address communities' right to refuse data collection and model deployment
