Inclusive AI: Why Linguistic Diversity Matters

Executive Summary

This summit session presented a collaborative effort between Bhashini (India's national open-source AI language platform) and Current AI to develop an open-source, multilingual, offline AI inference device designed to make AI accessible to non-English speakers and communities in low-connectivity regions. The discussion emphasized linguistic diversity, cultural preservation, and data sovereignty as critical pillars of inclusive AI, with panelists arguing that technology shaped by Western companies inevitably excludes non-Western languages, cultures, and knowledge systems.

Key Takeaways

  1. Open-Source Hardware Is the Beachhead: Just as Linux democratized software, open-source AI hardware is the prerequisite for preventing proprietary lock-in and allowing communities to innovate on their own terms.

  2. Multilingual ≠ Inclusive; You Need Cultural Encoding: Translating English text into 22 languages is insufficient. True inclusion means capturing local knowledge systems—agricultural practices, medicinal traditions, oral histories—and embedding them in AI training data.

  3. Offline Inference Is a Political Decision: The ability to run AI locally, without connectivity, is not a technical nicety—it's essential for privacy, disaster resilience, and independence from centralized platform companies.

  4. Governance Frameworks Must Be Negotiated, Not Imposed: Data sovereignty, reciprocity, and community benefit cannot be solved by universal rules. They require stakeholder engagement, local context, and mechanisms for community oversight.

  5. India & France as a Model for Alternatives: Bilateral partnerships between countries with complementary strengths (India's data/language diversity, France's regulatory/cultural experience) can build globally relevant AI systems that reject the US-China duopoly.

Key Topics Covered

  • Multilingual AI & Linguistic Diversity: Building AI systems that support 22+ languages (with plans for 36+), including tribal languages
  • Open-Source Hardware & Software: The Bhashini-Current AI collaborative device prototype and its open architecture
  • Privacy & Offline Inference: Running AI models entirely on-device without cloud connectivity
  • Cultural Preservation & Representation: Using AI to encode and preserve diverse cultural knowledge, traditions, and heritage
  • Data Sovereignty & Community Rights: Questions of data ownership, consent, and reciprocity when communities share cultural/linguistic data
  • Policy & Governance: National sovereignty in AI infrastructure; regulatory frameworks for cultural data protection
  • Use Cases: Vision-impaired accessibility, agricultural advisory, healthcare, tribal language preservation
  • Global Collaboration: France-India bilateral work on culturally inclusive AI; positioning alternatives to Western tech hegemony
  • India AI Innovation Challenge: Open competition to build on the prototype

Key Points & Insights

  1. The Embodied AI Risk: Large tech companies (Meta, Amazon, Apple) are deploying proprietary embodied AI devices (glasses, robots, smart speakers) that enter personal spaces, are trained on undisclosed data, are often monolingual, and create ecosystem lock-in that restricts independent innovation. Current AI's leadership raised this as a core concern.

  2. Language Shapes Reality: Linguistic diversity is not merely about translation; it reflects distinct worldviews, cultural contexts, and knowledge systems. When AI is trained primarily on Western languages and datasets, non-Western understanding of agriculture, health, law, and tradition is systematically excluded.

  3. Data Without Documentation: India alone has 16+ tribal languages without written scripts and 16 lakh (1.6 million) undigitized place names. Without deliberate effort to capture and encode this knowledge, AI will continue to treat these communities as invisible.

  4. Offline-First Design as Liberation: The prototype device running inference locally (no internet required) is not merely a technical feature—it's political. It prevents data extraction, vendor lock-in, and dependency on centralized cloud providers; it enables use in disaster zones and remote areas.

  5. The Quantization Achievement: The team reported quantizing large language models to run on edge hardware (NVIDIA Jetson) with zero loss in accuracy. Presenters described this as a significant technical feat that makes multilingual AI deployable at scale in resource-constrained settings.

  6. Community Data Governance is Context-Specific: There is no one-size-fits-all framework for data sharing. Agricultural data benefits from aggregation; health data requires individual consent; cultural/artistic data involves both preservation incentives and creator rights. Solutions must be negotiated locally with stakeholders.

  7. Sovereignty Requires Multiple Layers: True AI sovereignty (national, community, individual) demands control across five layers: energy, infrastructure, chips, models, and applications. Most countries (including India) lack chip-design sovereignty but can build alternatives through partnership and diversification.

  8. Embodied Knowledge in Data: A tribal farmer's understanding of local pests, a traditional healer's herbal knowledge, and oral histories are all datasets. Panelists argued that including such contextual knowledge reduces AI hallucination and yields systems that approximate human understanding rather than mere statistical pattern-matching.

  9. Trust Institutions Matter More Than Open Source Alone: While open-source code is important, equally critical are trusted third-party institutions that steward data, respect community preferences, and make governance decisions on behalf of users (e.g., which actors get access to health data).

  10. Reciprocity & Benefit-Sharing: If communities contribute cultural/linguistic data to train models that generate commercial value, mechanisms for recognition, compensation, or at minimum community benefit should be designed in—not treated as afterthoughts.


Notable Quotes or Statements

Ayadav (Current AI CEO): "If it's not made by us, it's not for us." — Articulating the fundamental motivation behind multilingual and culturally-rooted AI.

Ayadav: "We don't know how [proprietary embodied AI devices] work. They're continuously recording our data, sending it out to the cloud. We also don't know how they're trained... it's how the iPhone locked up a lot of technology innovation." — Warning about ecosystem capture and the importance of open alternatives.

Amitab G (Bhashini CEO): "We have almost reached a form factor which is quite small... and since it works offline you are in a position to actually use it anywhere almost... so that no person is left behind including the tribal languages." — Vision statement on linguistic inclusivity.

Abishek Singh (AI Action Summit Orchestrator): "Ultimately the objective remains: what is it for citizens? It's not about the tech, models, hardware, GPUs, or datasets—it's about the end user, whether citizen, small business, or industrial unit." — Reframing AI's purpose around user benefit, not technology.

Ano (Paris Summit Orchestrator, French representative): "I don't necessarily want to be transported to Silicon Valley or transported to Shanghai when I get into AI... if all the cultural representation, all the legal background, all the customs are just the de facto way you interact—that's just such a reduction of cultural diversity." — Articulating the homogenization risk of Western-dominated AI.

Abishek Singh: "If I have aggregate data about a particular area... farmers should get crop advisories for maximum benefits... but if it's health data, individuals might not want to share with the larger ecosystem. It will be context-specific; there is not one size fits all solution." — Pragmatic approach to data governance across sectors.


Speakers & Organizations Mentioned

Speaker | Role/Affiliation | Context
Ayadav | CEO, Current AI | Led device demo; discussed open-source hardware strategy, cultural preservation
Amitab G | CEO, Bhashini | Discussed India's 350+ AI models in 22 Indic languages; data collection challenges; 15M daily inferences
Andrew Turgis | Lead Engineer, Current AI | Technical demo of the inference device and multilingual pipeline
Shahim Paling | General Manager, Bhashini | Collaborated on model integration into hardware
Abishek Singh | Orchestrator, AI Action Summit (India) | Announced India AI Innovation Challenge; discussed national AI sovereignty
Ano / Alan | Orchestrator, Paris AI Action Summit; French delegation | Discussed French cultural policy and France-India bilateral AI collaboration
Martin Tisne | Chair, Current AI; Director, AI Collaborative | Moderated fireside chats; discussed democratic governance and indigenous data sovereignty

Organizations: Bhashini (India's national AI language platform), Current AI (public-private partnership for public interest AI), Kalpa Impact (organizer), AI Collaborative, French Government, Indian Government, Maui community (referenced for indigenous data sovereignty example)


Technical Concepts & Resources

Hardware & Infrastructure

  • NVIDIA Jetson – edge computing platform running the prototype
  • Quantization – model compression with reported zero accuracy loss (described as the key innovation)
  • Offline inference – all AI processing on-device; no cloud connectivity required
  • Mesh networking – proposed future for distributed inference across multiple devices
  • Form factor optimization – reducing physical size while maintaining capability
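
To make the quantization item above concrete, here is a minimal sketch of symmetric 8-bit weight quantization in plain Python. It is a generic textbook technique, not the method used for the prototype (the session did not describe that), and the function names `quantize_int8` and `dequantize` are hypothetical. It shows why the stored model shrinks (each weight becomes one small integer plus a shared scale) and that each weight moves by at most half a quantization step; whether task accuracy changes has to be measured separately.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale.

    Hypothetical helper for illustration only; not the prototype's code.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # The scale is the size of one integer step; fall back to 1.0 for all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Per-weight rounding error is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

In this sketch the largest-magnitude weight maps to ±127 and every other weight is rounded onto the same grid, so `max_error` never exceeds `scale / 2`.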

AI Models & Languages

  • 350+ AI models in 22 Indic languages (current), with plans for 36+ languages
  • Tribal languages being added (e.g., Bi language without written script)
  • Language coverage: Bengali, Hindi, Tamil, Telugu, Kannada, Malayalam, Marathi, Gujarati, Punjabi, Urdu, and others
  • 16 lakh undigitized place names in India (future dataset)

Technical Pipeline (Demonstrated)

  1. ASR (Automatic Speech Recognition) – convert audio to text in native language
  2. NMT (Neural Machine Translation) – translate between languages
  3. LLM (Large Language Model) – embedded inference for question-answering
  4. TTS (Text-to-Speech) – convert output text to audio
  • All modules run concurrently on-device without internet
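
The four stages above can be sketched as a simple chain of callables. This is a hedged illustration under stated assumptions, not the prototype's software: the class `SpeechPipeline`, the stand-in stage functions, and the use of a pivot language for the LLM (`pivot_lang`) are all hypothetical, since the session did not specify how the modules are wired together.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechPipeline:
    """Hypothetical wiring of ASR -> NMT -> LLM -> NMT -> TTS, all local calls."""
    asr: Callable[[bytes, str], str]       # audio, language -> text
    nmt: Callable[[str, str, str], str]    # text, source lang, target lang -> text
    llm: Callable[[str], str]              # prompt -> answer
    tts: Callable[[str, str], bytes]       # text, language -> audio
    pivot_lang: str = "en"                 # assumption: the LLM works in one pivot language

    def answer(self, audio: bytes, user_lang: str) -> bytes:
        text = self.asr(audio, user_lang)
        query = self.nmt(text, user_lang, self.pivot_lang)
        reply = self.llm(query)
        local_reply = self.nmt(reply, self.pivot_lang, user_lang)
        return self.tts(local_reply, user_lang)


# Stand-in stages so the sketch runs without any models or network access.
def fake_asr(audio: bytes, lang: str) -> str:
    return audio.decode("utf-8")

def fake_nmt(text: str, src: str, tgt: str) -> str:
    return text if src == tgt else f"[{src}->{tgt}] {text}"

def fake_llm(prompt: str) -> str:
    return f"answer({prompt})"

def fake_tts(text: str, lang: str) -> bytes:
    return text.encode("utf-8")


pipeline = SpeechPipeline(fake_asr, fake_nmt, fake_llm, fake_tts)
spoken_reply = pipeline.answer(b"hello", "hi")
```

Because every stage is an ordinary function call, nothing in the chain needs a network connection; on a real device each stand-in would be replaced by a locally loaded model.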

Datasets & Knowledge Encoding

  • Digital corpus creation via human translators and data annotation (addressing data scarcity)
  • Traditional knowledge (agricultural, medicinal, oral histories) as missing datasets
  • Contextual enrichment (place names, local dialects, regional variants)
  • Privacy-preserving techniques for health and personal data

Methodologies & Frameworks

  • Privacy-preserving aggregation for health and agricultural data sharing
  • Community consent mechanisms for cultural data use
  • Context-specific governance (no universal data-sharing rules)
  • Benefit-sharing & reciprocity frameworks (under development)
  • Right of opposition for artists/creators (proposed solution for tension between preservation and creator rights)
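
As one possible reading of "privacy-preserving aggregation" above, the sketch below averages a value per region and withholds any region with too few contributors, so that a published figure cannot be traced to a handful of individuals. The session did not name a specific technique; small-group suppression is swapped in here purely for illustration, and the function name, the `(region, value)` record shape, and the threshold of 5 are assumptions.

```python
from collections import defaultdict


def aggregate_with_suppression(records, min_group_size=5):
    """Average values per region, dropping regions with too few records.

    records: iterable of (region, value) pairs (hypothetical shape).
    min_group_size: assumed threshold; a real deployment would set it
    together with the affected community.
    """
    groups = defaultdict(list)
    for region, value in records:
        groups[region].append(value)
    return {
        region: sum(values) / len(values)
        for region, values in groups.items()
        if len(values) >= min_group_size
    }


records = [("A", 10.0)] * 5 + [("B", 3.0)] * 2
published = aggregate_with_suppression(records)
```

Here region "A" has enough contributors to be published as an average, while region "B" is withheld. A stricter threshold or a different mechanism could be chosen per data type, in line with the context-specific governance the panel described.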

Software & Development

  • Open-source hardware design (hackable, modifiable prototype)
  • Open-source software stack (model inference, orchestration tools)
  • Application frameworks (e.g., "Hear the World" app for vision-impaired accessibility)

Governance & Policy Concepts

  • AI Sovereignty – control across energy, infrastructure, chips, models, applications (5-layer model)
  • Indigenous data sovereignty – communities' rights to data about their culture
  • Trusted third-party stewardship – institutional mediators for data governance
  • Cultural subsidy mechanisms (French model) – public funding for non-dominant language AI development

Challenges Identified

  • Data scarcity for non-English languages (tribal, regional languages)
  • Undocumented traditional knowledge (oral traditions, unwritten languages)
  • Accuracy/latency tradeoffs in edge deployment (solved via advanced quantization)
  • Balancing open-source access with community data rights
  • Chip-design sovereignty (no country currently has full control of AI stack)

Additional Context

Timeline & Announcements:

  • Current AI raised €400 million for public-interest AI initiatives
  • Partnership discussed after the Paris AI Action Summit
  • Device prototype built in 5–6 weeks (rapid collaborative engineering)
  • India AI Innovation Challenge announced with submissions opening February 25
  • Bhashini offering $110,000+ in prize funding (amount flexible based on requests)
  • A France-India "Year of Innovation" designated in bilateral relations

Future Directions:

  • Expanding language coverage from 22 to 36+ languages
  • Smaller form factors and improved battery life
  • Solar-powered stationary versions ("micro data centers")
  • Domain-specific variants (agriculture, health, education)
  • University and research-level partnerships between France and India