Democratizing AI: Building Trustworthy Systems for Everyone

Executive Summary

This panel discussion from an AI summit in India focuses on democratizing AI access globally, particularly in the Global South, while ensuring systems are trustworthy and culturally appropriate. Key speakers emphasize that successful AI diffusion depends on five pillars—infrastructure, skilling, multilingual/multicultural AI, local innovation support, and data transparency—combined with robust measurement frameworks and inclusive governance that respects national sovereignty and diverse values.

Key Takeaways

Democratization Requires Active Intervention: AI diffusion will not naturally reach the Global South. Deliberate investment in infrastructure, skilling, and localized systems is required; Microsoft's $50B commitment illustrates the scale involved.
Trust Is Measurable, Multidimensional, and Context-Dependent: Trustworthy AI isn't a single metric. It depends on reliability benchmarks (security, accuracy, fairness), cultural fit, accessibility, energy efficiency, and alignment with local policy—all of which must be measured rigorously.
Open Source & Federated Evaluation Are Structural Enablers: Open-source models and technologies like federated evaluation allow countries without massive compute budgets to develop competitive, sovereign AI capabilities while maintaining data privacy and governance.
Inclusion of Marginalized Groups in Governance Is Non-Negotiable: Women, children, rural populations, and the Global South must participate in AI decision-making from the start. Current processes largely leave out women, who make up 50% of humanity, which makes it likely that safety measures will be inadequate or poorly targeted.
Measurement Science Is Foundational to Trustworthy AI: Establishing AI metrology—systematic measurement of model reliability, societal effects, and economic outcomes—is as essential as the technology itself. The field requires interdisciplinary collaboration (computer science, social science, law, psychology) and long-term commitment.

Key Topics Covered

Global AI Diffusion Gap: Disparity between AI adoption in the Global North versus Global South (roughly 2:1 ratio)
Infrastructure as Foundation: Data centers, connectivity, and energy requirements for broad AI deployment
Skilling & Workforce Development: Teacher training and workforce readiness as critical enablers of technology adoption
Multilingual & Multicultural AI: Necessity of culturally sensitive models and safety benchmarks beyond English
Measurement & Benchmarking: Industrial-scale evaluation frameworks (including federated evaluation) for reliability
Data Governance & Sovereignty: Cross-border data sharing, national agency over AI systems, and local customization
Healthcare Applications: Real-world use cases in primary care, maternal health, and disease surveillance
Trustworthiness Definition: Context-specific, multi-dimensional approach encompassing accuracy, security, accessibility, and cultural relevance
Gender & Inclusion: Under-representation of women and marginalized communities in AI decision-making
Open Source & Accessibility: Role of open-source models in enabling access for countries unable to afford proprietary systems

Key Points & Insights

Reliability, Not Capability, Is the Bottleneck: Peter Mattson (MLCommons) emphasizes that AI adoption is constrained by reliability concerns (whether systems are consistently correct, secure, and safe) rather than by raw capability. Trust depends on demonstrable reliability across diverse contexts.
Five-Pillar Framework for Equitable AI Diffusion (Microsoft):
- Infrastructure (data centers with sovereignty controls)
- Skilling (e.g., training 2 million Indian teachers in AI-specific education)
- Multilingual/multicultural AI (expanding benchmarks to Hindi, Tamil, Malay, Japanese, Korean)
- Local innovation support (solving locally-relevant problems)
- Data transparency (contributing adoption metrics to central projects like World Bank initiatives)
Trustworthiness Is Contextual, Not Universal: Dr. Harisha Aya (Gates Foundation) argues trustworthiness encompasses multiple dimensions: Does the system work offline/on edge? Is it in the right language? Does it respect local policy variations (e.g., different maternal health rules across Indian states)? These cannot be standardized globally.
Governance & Interdependence Challenge: Dr. Gog (earlier speaker) identifies the core challenge as managing governance across hardware, software, and ethical protocols—not controlling every layer, but ensuring institutional capability and confidence in systems that reflect each country's priorities.
Industrial-Scale Benchmarking Technology: MLCommons' federated evaluation and confidential compute enable reliable measurement across dispersed datasets without centralizing sensitive data, which is critical for healthcare, government, and cross-border applications.
Energy & Efficiency as Equity Issues: Smaller, domain-specific, lower-parameter models are essential for the Global South where energy costs and connectivity are constraints. Current focus on giant models disadvantages resource-constrained regions.
Open Source as Access Equalizer: Open-source and open-weight models (e.g., Microsoft's Phi family) let countries and communities adapt technology to local contexts without depending on proprietary systems, which is crucial for reaching the bottom 50% of the population pyramid.
Measurement Science for AI Is Nascent: Wendy Hall (Southampton) calls for establishing AI metrology as a rigorous science—measuring not just model performance but societal effects, trust factors, and economic outcomes. The UK's Center for AI Measurement and AI Security Institute (renamed Network for AI Measurement & Evaluation) exemplifies this direction.
Gender & Systemic Exclusion: Women and children are largely absent from AI governance discussions, even though women alone are 50% of the population. Safety measures (e.g., deepfake prevention) must involve those most affected; current discourse is male-dominated.
Data Governance Paradox: While data is essential for AI, not all data can or should be open. Cross-border data sharing requires new frameworks—currently a UN-level governance challenge—and registries/repositories are needed so researchers can find usable datasets.

Notable Quotes or Statements

Peter Mattson (MLCommons): "If I had to point to anything that's holding back AI today, it's not capability, it's reliability, right? Is it correct? Is it secure? Is it safe all the time?"
Natasha Crampton (Microsoft): "We have to make choices that lead to that outcome. And so for that reason I am excited about these attempts at measurement in multiple dimensions... we get to write this future but we have to actively guide it."
Dr. Harisha Aya (Gates Foundation): "To build trust, you need to get to the bottom 50% of the pyramid... we want to make sure that this doesn't create a divide [not just between global north and south] but even within countries."
Wendy Hall (Southampton): "AI is missing out 50% of the population right... 50% of us are women and we're not involved in the discussions about keeping us safe... women are involved at the top level in the decision making about what we do."
Wendy Hall (on measurement): "The world is not going to end at the end of this year because of AI... We have time to do this. If we can develop this new science... we can really start to think about how we measure trust and one of the metrics in AI metrology will be the trust factor."
Dr. Gog (on governance): "A bigger challenge might be to manage the interdependence of the AI ecosystem because it spans hardware software and the protocol so to say or the ethics around that."

Speakers & Organizations Mentioned

Speaker	Role/Affiliation	Key Focus
Dr. Gog	Panelist (working group on international AI coordination)	Infrastructure sharing, governance, institutional capability
Natasha Crampton	Chief Responsible AI Officer, Microsoft	Responsible AI principles, global AI diffusion, sovereignty controls
Peter Mattson	President, MLCommons; Senior Staff Engineer, Google	AI benchmarking, reliability measurement, multilingual safety
Dr. Harisha Aya	Director (Health, AI, Digital Innovation), Gates Foundation	Health applications, edge computing, sustainability, low-income populations
Dame Wendy Hall	Regius Professor of Computer Science, University of Southampton; Co-chair, UK AI Review	AI governance, measurement science, inclusion, data governance, metrology
Brad Smith	(Referenced, Microsoft leadership)	AI principles and responsible AI strategy
Vint Cerf	(Mentioned, unable to attend)	Referenced as supporter of Global South development via AI
Yoshua Bengio	(Referenced, Mila/Université de Montréal)	Exponential opportunity and risk in AI diffusion
Sanjay Jain	(Mentioned, colleague of Dr. Aya)	MOSIP (digital public infrastructure, India)

Organizations:

Microsoft, Google, Google DeepMind
MLCommons
Gates Foundation
University of Southampton
National Physical Laboratory (UK)
UK AI Security Institute / Network for AI Measurement & Evaluation
NIST (referenced)
World Bank
UN CSTD (Commission on Science and Technology for Development)
IMDA (Singapore)

Technical Concepts & Resources

Benchmarking & Measurement Frameworks

Federated Evaluation: Sending models to distributed facilities to test on local data without centralizing sensitive information
MedPerf Project: Healthcare-specific benchmarking using federated evaluation across diverse datasets
Confidential Compute: Technology enabling secure cross-border data sharing and evaluation
MLCommons Benchmarks: Multilingual safety benchmarks expanded to Hindi, Tamil, Malay, Japanese, Korean
Lingua Africa Initiative: Partnerships with local communities (Gates Foundation) to collect rich, locally-representative language data
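
The federated evaluation idea described above can be illustrated with a minimal Python sketch. It is not MedPerf's or MLCommons' actual API: the names SiteReport, evaluate_locally, combine, and toy_model, along with the clinic data, are hypothetical and exist only for illustration. The point it shows is that the model travels to each site, each site scores it on its own records, and only aggregate counts are returned to the coordinator.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]      # (input text, expected label)
Model = Callable[[str], str]   # hypothetical model interface: text -> label


@dataclass(frozen=True)
class SiteReport:
    """Aggregate counts only; no raw records leave the site."""
    site: str
    correct: int
    total: int


def evaluate_locally(site: str, model: Model, data: Sequence[Example]) -> SiteReport:
    # Runs inside the data holder's environment; the data never moves.
    correct = sum(1 for text, label in data if model(text) == label)
    return SiteReport(site=site, correct=correct, total=len(data))


def combine(reports: List[SiteReport]) -> float:
    # The coordinator sees only per-site counts and pools them into one accuracy.
    total = sum(r.total for r in reports)
    if total == 0:
        raise ValueError("no examples were evaluated")
    return sum(r.correct for r in reports) / total


def toy_model(text: str) -> str:
    # Stand-in for a real model (assumption for illustration only).
    return "urgent" if "fever" in text else "routine"


# Hypothetical local datasets held by two separate clinics.
site_a = [("high fever and cough", "urgent"), ("annual checkup", "routine")]
site_b = [("mild fever", "routine"), ("fever for three days", "urgent")]

reports = [
    evaluate_locally("clinic_a", toy_model, site_a),
    evaluate_locally("clinic_b", toy_model, site_b),
]
overall = combine(reports)
print(reports, overall)
```

Pooling counts rather than averaging per-site accuracies weights each example equally, which is one possible design choice; a real deployment would also need safeguards (such as minimum sample sizes) so that small sites do not leak information through their counts.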

Models & Technologies

Microsoft Phi Family: Open-weight models enabling local adaptation
Smaller, Domain-Specific Models: Alternative to giant LLMs for resource-constrained environments
Edge Computing & On-Device Inference: Critical for low-connectivity regions; alternative compute architectures beyond traditional digital systems

Infrastructure & Data

Sovereignty Controls: Mechanisms in cloud data centers allowing national agency
Data Repositories/Registries: Proposed infrastructure for global discovery of datasets
Cross-Border Data Sharing Frameworks: Emerging governance models for compliant data exchange

Policy & Governance References

GDPR "Brussels Effect": Mentioned as example of global regulatory influence (does not yet apply to AI uniformly)
Aadhaar (India): Digital identity system; MOSIP (modeled on Aadhaar) as open-source digital infrastructure
UK National Physical Laboratory AI Measurement Center: Government-backed center for AI metrology
UN Report on AI Governance: Accepted recommendations on global scientific panel, global dialogue, and global fund; data governance recommendations pending

Metrics & Measurement Areas

AI Metrology: New science encompassing measurement of reliability, trust, societal effects, economic outcomes
Trust Factor: Proposed core metric in AI metrology
Real-world Evidence: Measurement of actual usefulness in development and social sectors
Adoption & Usage Data: Sharing mechanisms for understanding where AI diffusion is accelerating/lagging

Additional Context

Conference Setting: This discussion occurred at a major AI summit in India (estimated 250,000 attendees), reflecting India's position as a leader in digital public infrastructure and AI policy. The event highlighted India's inclusive approach ("AI is all inclusive") while critically examining gaps in representation and access.

Underlying Tensions:

Scale vs. localization (global models adapted to diverse contexts)
Speed of innovation vs. time for rigorous measurement
Proprietary vs. open approaches to AI
Centralized vs. edge/decentralized computing for resource-constrained settings
Top-down governance vs. grassroots inclusion

Future Challenges Highlighted:

Multi-turn and agentic AI benchmarking (not yet mature)
Scaling measurement infrastructure across industries and geographies
Data governance frameworks that enable cross-border sharing without reproducing colonial patterns of data extraction
Gender and demographic inclusion in AI governance and research