AI Commons for the Global South: Data, Models, and Compute

Executive Summary

This panel discussion from the AI Impact Summit reframes AI infrastructure as civic infrastructure critical to equitable development in the Global South. Rather than treating AI adoption as primarily a technical or infrastructure challenge, speakers emphasize that democratizing AI requires addressing mindset barriers, data accessibility, governance frameworks, and community-centered approaches that respect local languages, cultures, and user agency.

Key Takeaways

  1. Infrastructure ≠ Adoption: Providing connectivity or AI models is necessary but insufficient. Meaningful AI democratization requires demonstrating local relevance, building trust, enabling peer learning, and respecting community agency and data rights.

  2. Data Governance is as Important as Data Availability: Open-data movements must pair dataset access with clear governance frameworks defining usage rights, benefit-sharing, and oversight—not just raw data dumps.

  3. Language and Localization Are Non-Negotiable: AI systems in the Global South must support Indic and low-resource languages from the start, not as an afterthought, to achieve genuine inclusion beyond English-speaking urban populations.

  4. Pragmatic Open Source Succeeds Where Ideology Alone Fails: Open-source adoption accelerates when it aligns with business and technical necessity (e.g., cloud infrastructure). Similarly, open-data and open-AI initiatives should be framed with clear incentives for contributors and users.

  5. Communities Must Be Partners, Not Data Sources: Sustainable AI democratization requires shifting from extractive models (collecting data and insights without community awareness or benefit) to participatory models where communities understand, consent to, and potentially benefit from AI systems built with their data.

Conference Talk Summary


Key Topics Covered

  • AI as Civic Infrastructure: Positioning AI infrastructure alongside digital public goods as essential for public service delivery, governance, and population-scale impact
  • Infrastructure Beyond Connectivity: The distinction between providing digital infrastructure (fiber, devices) and ensuring meaningful adoption and equitable use
  • Data as Critical Infrastructure: Data availability and governance frameworks as equally important as compute resources for AI development
  • Open Source and Open Data: The role of open-source software and open-data movements in democratizing AI access and preventing vendor lock-in
  • Language and Localization: The necessity of Indic language support and multilingual AI systems for inclusive adoption
  • Data Governance and Trust: Establishing frameworks that clarify data ownership, usage rights, benefit-sharing, and consent mechanisms
  • Community-Centered AI: Peer learning, community-driven adoption, and avoiding extractive data practices
  • Policy and Regulation: Outcome-focused rather than technology-specific regulation; balancing innovation with responsible oversight
  • University and Research Roles: Academia's responsibility to solve local problems while contributing to frontier research
  • Government as First Customer: Using government procurement to support startups and innovators addressing public sector challenges

Key Points & Insights

  1. The Adoption Paradox: Telangana's experience connecting villages with broadband fiber showed that infrastructure alone does not guarantee adoption. Although the state's per-capita income is high, rural residents did not automatically buy devices; the barrier was primarily psychological and educational, not economic. Overcoming it required demonstrating tangible, locally relevant problem-solving (e.g., pest detection in agriculture).

  2. Data Is the Bottleneck, Not Code: While open-source code has become commoditized, data remains scarce and closely held. The real competitive advantage lies in access to relevant, curated datasets. Academic researchers worldwide (including those at MIT and Stanford) are at a disadvantage compared with large tech companies that hold proprietary data streams.

  3. Compute as Heat: Extending the "data is the new oil" metaphor, compute is the component that releases the heat from the fuel. Compute resources are essential infrastructure for AI but have historically been underfunded in academic settings. India should establish an academic compute cloud to enable research parity with industry. Without adequate compute, even good data cannot be used effectively.

  4. Data Governance Frameworks Over Data Hoarding: Rather than simply making datasets openly available, structured data governance frameworks are essential. These should define: who contributes data, who can access it, under what circumstances, how benefits are shared if commercial value is created, and who provides oversight. The Telangana Agriculture Data Exchange (ADEX) model demonstrates this approach.

  5. Open Source Succeeds Through Business Logic, Not Ideology: Major companies adopted open source not primarily for altruistic reasons but because cloud infrastructure and modern software architectures require it. This pragmatic shift demonstrates that open infrastructure can align with commercial incentives—visibility, talent attraction, ecosystem effects.

  6. Sovereignty Exists on a Spectrum: "Digital sovereignty" ranges from soft sovereignty (localizing globally available technology for language, culture, and security) to hard sovereignty (requiring in-country creation). Different nations may adopt different positions; there is no one-size-fits-all approach.

  7. Language as Infrastructure: Supporting 500+ low-resource languages and Indian languages in AI models (e.g., Meta's multilingual ASR model, Bhashini translation systems) is not a nice-to-have but essential for actual inclusion. Without language support, AI tools remain inaccessible to the majority of Global South populations.

  8. Data Extraction vs. Data Consent: A critical distinction exists between inclusion (where communities benefit) and extraction (where they become "content" for algorithmic systems without awareness or agency). Current practices often blur this line—data collection happens with limited transparency regarding how behavioral data will be used, replicated, or monetized.

  9. Peer Learning and Organic Adoption: Communities adopt technologies in ways that suit their existing communication patterns (e.g., WhatsApp's explosive adoption in India driven by voice and visual simplicity, not formal digital literacy curricula). Technology design should account for peer-learning dynamics rather than top-down competency models.

  10. Front-Loading Ethics and Consent: Current practice works by feedback: collect data, learn from it, and only then debate ethics. The speaker from the Digital Empowerment Foundation advocates front-feeding instead: establish ethical frameworks, explain how data will be used, and secure informed consent before any collection or system deployment.
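
The consent-first ordering in point 10 can be illustrated with a small sketch. This is not a system described by the panel; the names `ConsentRegistry`, `ConsentRequired`, and `collect`, along with the participant and purpose labels, are hypothetical and chosen only for illustration. The sketch assumes consent is tracked per participant and per purpose, and that a sample may be stored only after the usage has been explained and consent recorded.

```python
from dataclasses import dataclass, field


class ConsentRequired(Exception):
    """Raised when collection is attempted before informed consent exists."""


@dataclass
class ConsentRegistry:
    # participant id -> purposes the participant agreed to after an explanation
    granted: dict = field(default_factory=dict)

    def record(self, participant: str, purpose: str, explained: bool) -> None:
        # Consent only counts if the data usage was explained first.
        if not explained:
            raise ValueError("explain the data usage before recording consent")
        self.granted.setdefault(participant, set()).add(purpose)

    def withdraw(self, participant: str, purpose: str) -> None:
        # Withdrawing consent that was never given is a no-op.
        self.granted.get(participant, set()).discard(purpose)

    def allows(self, participant: str, purpose: str) -> bool:
        return purpose in self.granted.get(participant, set())


def collect(registry, store, participant, purpose, sample):
    """Append a sample to the store only if consent for this purpose exists."""
    if not registry.allows(participant, purpose):
        raise ConsentRequired(f"no consent from {participant} for {purpose}")
    store.append((participant, purpose, sample))


# Hypothetical usage: consent is recorded first, collection happens second.
registry = ConsentRegistry()
store = []
registry.record("farmer-01", "pest-detection", explained=True)
collect(registry, store, "farmer-01", "pest-detection", "leaf photo")
```

Because the check sits inside `collect`, reusing the same sample for a purpose the participant never agreed to raises `ConsentRequired` instead of silently succeeding.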


Notable Quotes or Statements

  • J. Shanjay (Telangana IAS Officer): "You can take the horse to water but [you need to make them want to drink]." —Illustrating that infrastructure provision alone cannot drive adoption without addressing mindset and demonstrating value.

  • PJ Narayan (Former Director, IIIT Hyderabad): "Data is the new oil... [but] the component that releases the heat from the fuel is compute." Highlights compute as often-overlooked critical infrastructure for academia.

  • PJ Narayan: "[We need] an open-data moment like the open-source movement." —Advocating for a coordinated, government-backed push to make critical datasets available to startups and researchers, similar to the open-source ecosystem.

  • Osama Manzer (Digital Empowerment Foundation): "The purpose of inclusion is so that more people use it, so that we can use you... we became content." —A critical perspective on how technology platforms extract behavioral data without genuine user benefit or agency.

  • Osama Manzer: "Human in the loop is the most important part as a loop of trust... [before] data collecting from them, are we making them aware of data literacy?" —Arguing for consent and transparency before data collection.

  • Amanda Brock (Open Infrastructure Foundation): "If you take that base [of open source] away from your pizza, what have you got? A sloppy mess." —A metaphor for how open-source infrastructure is foundational even when invisible; if removed, systems collapse.

  • Amanda Brock: "When we look at AI and technology, we want it to serve us. We don't want to be its servants." —A fundamental principle for responsible AI development.

  • Prachi Bhya (Meta): "Our vision is to bring personal super intelligence to everyone and everywhere." —Meta's stated vision for AI inclusion via AI glasses and multilingual support, with examples of real-world use cases (e.g., navigation for the blind).


Speakers & Organizations Mentioned

Speaker | Role / Affiliation
J. Shanjay | Special Chief Secretary, Government of Telangana; Former IT Secretary
PJ Narayan | Chair, AI Impact Summit Research Symposium; Former Director, IIIT Hyderabad
Osama Manzer | Founder, Digital Empowerment Foundation
Prachi Bhya | Public Policy Manager, Meta
Rakkesh Dubu | Founder, Faculty; Session Moderator
Amanda Brock | Open Infrastructure / Open Source perspective; 30+ years in tech; former internet/dot-com lawyer
Anushka | Session organizer / moderator (opening remarks)

Organizations/Initiatives:

  • Government of Telangana
  • IIIT Hyderabad
  • Digital Empowerment Foundation
  • Meta
  • Faculty (AI infrastructure company)
  • AI Kosh (AI Commons initiative)
  • BharatNet (India's broadband fiber initiative)
  • Open Atom Foundation (China's open-source infrastructure)

Technical Concepts & Resources

Data & Governance Systems

  • Telangana Data Exchange (TDX) / Agriculture Data Exchange (ADEX): A structured data governance framework enabling multi-stakeholder data sharing with clear protocols for contributors, users, access conditions, benefit-sharing, and oversight.
  • Data Management Framework: Principles for defining who accesses data, under what conditions, and how commercial benefits are shared with data originators.
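
The governance questions listed above (who contributes, who may access, under what conditions, how benefits are shared, who oversees) can be written down as an explicit policy record. The following is a minimal sketch under stated assumptions, not the actual TDX or ADEX design: `DatasetPolicy`, `may_access`, `split_revenue`, the role and purpose labels, and the equal-split rule for benefit-sharing are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetPolicy:
    """Hypothetical governance record for one shared dataset."""
    dataset: str
    contributors: frozenset[str]                  # who contributes data
    allowed_purposes: dict[str, frozenset[str]]   # user role -> permitted purposes
    contributor_share: float                      # fraction of revenue returned to contributors
    oversight_body: str                           # who provides oversight


def may_access(policy: DatasetPolicy, role: str, purpose: str) -> bool:
    """Access is granted only for purposes explicitly listed for the role."""
    return purpose in policy.allowed_purposes.get(role, frozenset())


def split_revenue(policy: DatasetPolicy, revenue: float) -> dict[str, float]:
    """Divide the contributors' share equally (an assumed rule, for illustration)."""
    if not policy.contributors:
        return {}
    per_contributor = revenue * policy.contributor_share / len(policy.contributors)
    return {name: per_contributor for name in sorted(policy.contributors)}


# Hypothetical example values; none come from the session.
policy = DatasetPolicy(
    dataset="soil-health-demo",
    contributors=frozenset({"coop-a", "coop-b"}),
    allowed_purposes={
        "researcher": frozenset({"crop-advisory", "yield-forecast"}),
        "startup": frozenset({"crop-advisory"}),
    },
    contributor_share=0.2,
    oversight_body="state data office",
)
```

Keeping the policy as data rather than scattering it through application code means the access conditions and the benefit-sharing rule can be reviewed by the oversight body without reading the implementation.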

AI Models & Tools

  • Meta's Multilingual ASR (Automatic Speech Recognition) Model: Covers 500+ languages; can extend to new low-resource languages from minimal audio samples.
  • Bhashini Translation System: Open-source translation system for Indic languages; forms a research backbone for speech understanding and production across language families (Indo-Aryan language groupings).
  • Be My Eyes App: Accessibility app demonstrated with Meta AI glasses, connecting blind users to sighted volunteers for real-time navigation assistance.
  • UPI Payments via AI Glasses: Pilot program enabling contactless payments through voice/visual interaction.

Linguistic & Data Collection Initiatives

  • Samagra: Open-source entity partnering on local data collection efforts.
  • 100,000 Student Volunteers Project: Effort to collect cultural and linguistic data from villages in Telangana and Andhra Pradesh; achieved ~60,000 samples for local model training.
  • Meta's Contributed Dataset: 12 billion tokens, 4 million pairs in 10 Indian languages, contributed to government's AI Kosh open-source library for agriculture, healthcare, and education applications.

Policy & Governance Concepts

  • Use-Case Based Regulation: India's approach of regulating AI outcomes and harms rather than technology itself, contrasted with Europe's technology-specific regulation.
  • Outcome-Focused Governance: Policy emphasis on defining desired outcomes (e.g., reduced crop loss, better healthcare access) rather than prescribing technology solutions.
  • Government as First Customer: Policy mechanism where government agencies commit to procuring innovative solutions from startups, reducing "valley of death" risk.

Hackathon & Competition

  • AI for All Hackathon (Partnership: Faculty, Meta, AI Kosh):
    • 1st Place: Intelligent Document Processing (Cedax Flash)
    • 2nd Place: BizNova (Code Plus)
    • 3rd Place: FAP NextGen (AI for Health)
    • Focus: Tools for making public datasets AI-ready; addressing data usability and multilingual access

Open Source Infrastructure

  • GitHub: Referenced as enabling collaboration for 25 million developers in India.
  • Open-Source AI Models: Meta's position on releasing open-weight models; broader industry shift toward open source driven by business necessity rather than ideology.
  • Licensing & Compliance Standards: Discussion of preventing open-source AI from becoming "closed in practice" via restrictive licensing or compliance burdens.

Implicit Frameworks & Models

  1. The Civic Infrastructure Model: Treating AI infrastructure as public goods analogous to water, electricity, or roads—requiring government stewardship, equitable access, and governance frameworks.

  2. The Data Exchange Model (ADEX): Multi-stakeholder data sharing with transparent governance, applicable to multiple verticals (agriculture, health, mobility, weather, pricing, taxation).

  3. The Peer-Learning Model: Recognition that technology adoption follows organic, peer-driven pathways rather than formal curriculum; design systems to enable this rather than resist it.

  4. The Dutch Water Management Metaphor: Working with systems (like water) rather than imposing rigid control. Applied to AI regulation, this means enabling innovation within a framework of shared values rather than prescriptive rules.

  5. The Soft vs. Hard Sovereignty Spectrum: A range of approaches to digital independence, from localizing global technologies to mandating in-country development.


Gaps & Ambiguities in the Transcript

  • Specific policy recommendations for balancing open-data mandates with privacy concerns are alluded to but not fully detailed.
  • Quantitative data on adoption rates, time-to-value, or ROI for specific initiatives (e.g., ADEX, MISA centers) is limited.
  • Compliance burden mechanisms for open-source AI licensing are mentioned as a concern but not thoroughly explored.
  • Regulatory capture risks are acknowledged but solutions remain at a high level ("work with the water").
  • Concrete steps China took in its open-source-first policy are mentioned as exemplary but not enumerated in detail.
