Regulating Open Data: Principles, Challenges, and Opportunities

Executive Summary

This India AI Impact Summit 2026 panel examines whether India should move from voluntary open data initiatives to a statutory regulatory framework requiring government bodies to share standardized, AI-ready datasets. The discussion treats open data not as mere technical infrastructure but as a question of power, sovereignty, and equitable benefit distribution in the global AI economy, one that calls for deliberate regulatory design balancing innovation, privacy, data sovereignty, and inclusive development.

Key Takeaways

  1. Regulation Is Not Optional for Sustainability — Absent statutory mandates with clear standards (anonymization, interoperability, purpose, consent), government data sharing will remain episodic, uneven, and politically reversible. Voluntary initiatives fail at scale.

  2. Open Data Is Data Sovereignty, Not Data Giveaway — Structured openness, with domestic capacity building, domestic digital law precedence, and safeguards against asymmetric extraction, allows nations to be architects of their digital future rather than passive markets.

  3. Trust and Clarity Drive Investment — Just as capital markets regulation created India's vibrant IPO ecosystem, clear data governance frameworks build investor confidence, attract AI/tech investment, and enable a transition from a services-based to a products-based tech sector.

  4. India's Model Is Complementary, Not Competitive — The EU's regulatory rigor and India's innovation-first approach to digital public infrastructure are not opposed; together they can inform a Global South alternative to US platform dominance.

  5. The Real Question Is: Who Benefits? — Humanist AI and inclusive growth require ensuring that when Indian health, agricultural, or citizen data train global AI models, benefits flow back to India—through regulatory frameworks mandating shared value, not just shared data.

Key Topics Covered

  • Statutory vs. Voluntary Open Data Frameworks — Whether regulation should mandate government data sharing or remain aspirational
  • Data Sovereignty and Digital Inequality — How developing nations can avoid becoming raw data suppliers while the Global North captures value downstream
  • "AI-Ready" Data Standards — The evolution from static PDFs/CSVs to API-accessible, standardized, interoperable datasets suitable for LLMs and advanced AI systems
  • Privacy, Consent, and Anonymization — Balancing transparency with individual rights and meaningful consent mechanisms
  • Institutional vs. Government Data — Opening "dark data" siloed in regulatory agencies, commercial enterprises, and public institutions (NPCI, CERT-In, etc.)
  • Sectoral Data Opening — Healthcare, agriculture, finance, education: tailored policies for different sectors rather than a one-size-fits-all approach
  • Digital Public Infrastructure as a Global Model — India Stack (Aadhaar, UPI, DigiLocker) as a template for developing nations
  • Geopolitical Safeguards and Data Localization — National security concerns, trade agreements, and asymmetric data extraction by tech giants
  • Evidence-Based Policymaking — How aggregated data enables better targeting of welfare, disaster response, and agricultural interventions
  • Investor Confidence and Economic Growth — How regulatory clarity in data governance (like capital markets regulation) attracts investment and enables market development

Key Points & Insights

  1. The "Plumbing Problem" — Open data is foundational infrastructure ("plumbing") without which AI innovation ("PowerPoint") cannot function effectively. Without regulation, participation remains voluntary, uneven, and unreliable.

  2. Data Processing Power, Not Volume, Is the Constraint — Following Chris Miller's Chip War, the real bottleneck is not data abundance but computational capacity and platform control. Abundance alone does not confer agency; openness without capacity can entrench inequality.

  3. The Data Sovereignty Trilemma — Developing nations face three paths: digital ascendancy (dependence on foreign cloud/AI platforms), digital capitulation (one-sided data concessions), or digital sovereignty (regulated openness + domestic capacity). Most Global South countries drift toward the first two.

  4. Value Creation Detaches from Data Generation — Data produced in India (mobility, payments, health) is stored, processed, and monetized abroad. The location where data originates is not where value is extracted. Without sovereignty safeguards, benefits accrue to platform headquarters, not data sources.

  5. Structured Openness, Not Chaotic Transparency — "Openness without strategy creates imbalance; openness with guardrails creates resilience." Open data requires clarity of purpose, safeguards, consent, accountability, and domestic capacity building—not indiscriminate release.

  6. "Dark Data" Problem — Government bodies collect vast institutional datasets (NPCI payment data, CERT-In cyber intelligence, regulatory records) that remain siloed and unused. Opening this data, with safeguards, could unlock innovation in fintech, cybersecurity, and other domains.

  7. AI Requires a New Data Standard — LLMs, small language models, and AI systems demand real-time API access, metadata standards, and interoperability—not static downloads. Current open data platforms (data.gov.in) predate this need.

  8. Citizen Trust Is the Political Bottleneck — Citizens must actively consent to share their anonymized data (UPI transactions, health records, farmer data) for AI model training and policy improvement. This is a political and ethical question, not purely technical.

  9. Sectoral vs. Centralized Approach — One centralized national repository will not work. Healthcare, agriculture, financial, and education data require sector-specific opening policies aligned with the relevant regulators (RBI, Ministry of Agriculture, etc.), linked through a federated strategy.

  10. Soft-Touch Regulation Over the EU's GDPR Model — Panelists argued that India should avoid Europe's approach, which they characterized as prioritizing data protection over innovation. The goal is a middle path: regulatory clarity that enables sustainable innovation while protecting sovereignty and rights.


Notable Quotes or Statements

Dr. Shashi Tharoor (Keynote): "Data is the raw material. If governments hold the richest datasets, then refusing to regulate sharing properly is like building a digital economy and locking the warehouse."

Dr. Shashi Tharoor: "The location where data is produced is not necessarily the location where value is created." — On digital asymmetry and why data sovereignty matters for developing nations.

Moderator (via "PM Hacker" analogy): "The plumbing: without it, the rest is just PowerPoint." — On open data as foundational infrastructure.

Dr. Shashi Tharoor: "Openness without strategy creates imbalance; openness with guardrails creates resilience."

Mr. Cyril Shroff (Cyril Amarchand Mangaldas): "The absence of a legal framework goes from being an inconvenience to an impediment in the development of a sustainable data economy."

Dr. Sasmit Patra (Parliament): "The political question is: how many citizens are willing to share their anonymized UPI transactions for LLM training? That's where the catch is."

Ms. Vedashree Gupta: "We need a federated open data strategy. Data needs to be opened at the sectoral level, not in one centralized repository."

Ms. Irina Ghose (Anthropic India): "Trust for all of us needs to be a verifiable outcome." — On why transparency in data use and AI decisions is non-negotiable.

Ms. Asha Jadeja Motwani (Venture Capitalist): "If we've decided to work with the Americans on the stack, then at a policy level we need a joint regulatory framework so we are never conflicting with them." — On pragmatic coordination with democratic partners.

Dr. Shashi Tharoor (Closing): "We must emerge as a digital sovereign empowered to protect our own giants and capture the wealth generated by our own data... not subject ourselves to subordinate status under a new extractive digital Raj."


Speakers & Organizations Mentioned

Core Panelists

  • Dr. Shashi Tharoor — Indian politician, author, keynote speaker; former UN official; expertise in digital governance and geopolitics
  • Ms. Vedashree Gupta — Distinguished figure in open government data and regulatory frameworks; 35+ years in digital industry
  • Ms. Irina Ghose — Managing Director, Anthropic India; AI company perspective
  • Mr. Cyril Shroff — Founding managing partner, Cyril Amarchand Mangaldas (law firm); benefactor of the Cyril Shroff Centre for AI, Law and Regulation
  • Dr. Sasmit Patra — Member of Parliament (India); parliamentary oversight on communications and IT; academic background
  • Mr. Arun Prabhu — Partner, Cyril Amarchand Mangaldas; digital and TMT (technology, media, telecom) practice
  • Ms. Asha Jadeja Motwani — Founder, Motwani Jadeja Foundation; venture capitalist; India-US relations expert

Organizations & Initiatives Referenced

  • India AI Impact Summit 2026 — Host event
  • Cyril Amarchand Mangaldas — Major Indian law firm (corporate, regulatory)
  • Anthropic — AI research company with India operations
  • Motwani Jadeja Foundation — Philanthropic organization focused on India-US ties and AI/tech
  • Motwani Jadeja Institute for American Studies — Established at an Indian university (implied from context)
  • Ministry of Electronics and IT (India) — Government AI guidelines
  • Reserve Bank of India (RBI) — Financial regulator
  • National Payments Corporation of India (NPCI) — Payment systems operator; holds fintech data
  • CERT-In (Indian Computer Emergency Response Team) — Cybersecurity agency with institutional data
  • World Bank — Open development datasets
  • G20 (India's 2023 Presidency) — New Delhi Leaders' Declaration on digital public infrastructure
  • United Nations — Global Digital Compact
  • European Union — General Data Protection Regulation (GDPR), Digital Markets Act
  • UK Cabinet Office (referenced via fictional scenario) — Comparative governance model

Technical Concepts & Resources

Data Standards & Infrastructure

  • API-Ready Data — Real-time data access via application programming interfaces (contrast to static CSV/PDF downloads)
  • Metadata Standards — Descriptive information about datasets enabling interoperability and machine readability
  • Model Context Protocol (MCP) — Open protocol created by Anthropic (2024) for standardized AI access to data sources and tools; adopted by the Linux Foundation; enables universal data connectors
  • Anonymization Protocols — Technical and legal methods to remove personally identifiable information while preserving analytical utility
  • Interoperability Standards — Technical specifications enabling data from multiple sources to work seamlessly (e.g., federated architectures)
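To make the anonymization bullet above more concrete, here is a minimal Python sketch of one possible release check. It assumes k-anonymity as the re-identification test; the panel did not name a specific technique, and the record fields, the sample data, and the threshold are all hypothetical. A real standard would likely combine several tests and legal criteria.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0


def is_releasable(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values appears at least k times."""
    return k_anonymity(records, quasi_identifiers) >= k


# Hypothetical, already-generalized records (age bands instead of exact ages).
records = [
    {"district": "A", "age_band": "30-39", "crop": "rice"},
    {"district": "A", "age_band": "30-39", "crop": "wheat"},
    {"district": "A", "age_band": "30-39", "crop": "rice"},
    {"district": "B", "age_band": "40-49", "crop": "millet"},
]
QUASI_IDENTIFIERS = ["district", "age_band"]  # assumption: fields an attacker could link on

print(k_anonymity(records, QUASI_IDENTIFIERS))         # 1: the district-B record is unique
print(is_releasable(records, QUASI_IDENTIFIERS, k=2))  # False
```

In this sketch the single district-B record blocks release, which is the kind of outcome a legally recognized standard would need to handle, for example by further generalization or suppression.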

Data Governance Models

  • Federated Open Data Strategy — Sector-specific data opening (healthcare, agriculture, fintech) rather than centralized repository
  • Tiered Access Model — Free (public), paid (commercial), and restricted (national security/sensitive) data tiers
  • Non-Personal Data Framework — Regulatory focus on aggregated, anonymized data separate from personal data protection
  • Sectoral Data Opening Policies — The Payment Services Directive (EU/UK), the EU Financial Data Access (FiDA) framework, and healthcare initiatives as templates
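The tiered access and federated (sector-level) ideas listed above could be expressed roughly as follows. This is an illustrative Python sketch only: the tier names follow the summary, but the `Dataset` and `Requester` types, the licence and clearance flags, and the sample catalogue are assumptions, not anything the panel specified.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    PUBLIC = "public"          # free to everyone
    COMMERCIAL = "commercial"  # paid licence required
    RESTRICTED = "restricted"  # national security or sensitive data


@dataclass(frozen=True)
class Dataset:
    name: str
    sector: str  # each sector publishes its own catalogue entries
    tier: Tier


@dataclass(frozen=True)
class Requester:
    name: str
    has_paid_licence: bool = False
    has_clearance: bool = False


def can_access(requester, dataset):
    """Apply the tier rule: public is open, commercial needs a licence, restricted needs clearance."""
    if dataset.tier is Tier.PUBLIC:
        return True
    if dataset.tier is Tier.COMMERCIAL:
        return requester.has_paid_licence
    return requester.has_clearance


# Hypothetical sector-level catalogue entries.
catalogue = [
    Dataset("rainfall-by-district", "agriculture", Tier.PUBLIC),
    Dataset("aggregated-payment-volumes", "finance", Tier.COMMERCIAL),
    Dataset("cyber-incident-reports", "cybersecurity", Tier.RESTRICTED),
]

startup = Requester("example-startup", has_paid_licence=True)
visible = [d.name for d in catalogue if can_access(startup, d)]
print(visible)
```

Keeping the tier on the dataset rather than on the repository is one way to let each sector regulator classify its own data while sharing a common access rule.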

Relevant AI/ML Models

  • Large Language Models (LLMs) — Primary use case for aggregated training data; need contextual, domain-specific data (agriculture, health, legal)
  • Small Language Models (SLMs) — Smaller, localized models requiring language-specific, region-specific datasets
  • Synthetic Data — Artificially generated data preserving statistical properties while protecting privacy
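As a rough illustration of the synthetic data idea above, the following Python sketch fits a normal distribution to a single numeric column and samples new values from it. The Gaussian assumption, the function names, and the sample yield figures are all hypothetical. It preserves only the mean and spread of one column and carries no formal privacy guarantee; production approaches are considerably more involved.

```python
import random
import statistics


def fit_gaussian(values):
    """Estimate the mean and sample standard deviation of a numeric column."""
    return statistics.mean(values), statistics.stdev(values)


def synthesize(values, n, seed=0):
    """Draw n synthetic values from a normal distribution fitted to the real column."""
    mu, sigma = fit_gaussian(values)
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [rng.gauss(mu, sigma) for _ in range(n)]


# Hypothetical crop yields in tonnes per hectare.
real_yields = [2.1, 2.4, 1.9, 2.8, 2.5, 2.2]
synthetic_yields = synthesize(real_yields, n=1000)

print(round(statistics.mean(real_yields), 2), round(statistics.mean(synthetic_yields), 2))
```

None of the synthetic values corresponds to a real record, yet the aggregate statistics stay close to the original, which is the property the bullet describes.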

Legal/Regulatory Frameworks Referenced

  • India Stack — Public digital infrastructure (Aadhaar digital ID, UPI payments, DigiLocker document exchange); model for other developing nations
  • Digital Personal Data Protection Act (India) — Privacy law providing foundation for consent and control
  • Non-Personal Data Governance Framework (India) — Parallel track for aggregated/anonymized data
  • EU General Data Protection Regulation (GDPR) — High-privacy-protection model (criticized as innovation-limiting)
  • Payment Services Directive (EU/UK) — Requires banks to share payment data with authorized third parties, enabling fintech innovation
  • Open Banking Initiative — Financial data-sharing model enabling third-party innovation
  • Ayushman Bharat Mission — Indian health initiative expected to generate aggregated health data
  • Pradhan Mantri Fasal Bima Yojana (PMFBY) — Crop insurance scheme generating agricultural data
  • Data Protection Impact Assessments (DPIA) — Legal requirement to assess privacy risks
  • Puttaswamy Judgment (India) — Constitutional ruling on privacy as a fundamental right; cited as foundational for data governance

Policy & Economic Concepts

  • Data Sovereignty — National capacity to regulate how data serves domestic development priorities
  • Digital Ascendancy — Concentration of data processing, storage, and AI development in foreign (US/EU) platforms
  • Digital Capitulation — One-sided data concessions in trade agreements (Indonesia, Malaysia examples cited)
  • Dark Data — Institutional data collected but unused/siloed (NPCI, CERT-In, regulators, commercial enterprises)
  • Evidence-Based Policymaking — Using aggregated data to predict needs (e.g., crop loss districts) and design targeted interventions
  • Humanist AI — AI serving the broad population (farmers, health workers, tribal communities), not just elite segments
  • Vasudhaiva Kutumbakam — Sanskrit phrase meaning "the world is one family"; cited as a cornerstone of inclusive Indian AI philosophy
  • Soft-Touch Regulation — Lighter regulatory hand than EU GDPR; clarity without over-prescription

Data Sources & Use Cases Discussed

  • Government Data — Spending, welfare, weather, agricultural, health
  • Institutional Data — NPCI (payments), CERT-In (cybersecurity), RBI (financial), regulators
  • Commercial Data — Payment transactions, health records, user behavior (requires anonymization)
  • Meteorological Data — US example: releasing weather data freely created an ecosystem in forecasting, logistics, and insurance
  • Health Data — COVID dashboards; rare disease cases; mental health (sensitive; requires strong safeguards)
  • Agricultural Data — Soil, irrigation, market indices, crop loss patterns
  • Mobility Data — Ride-sharing, transportation patterns
  • Education Data — Learning outcomes, enrollment
  • Citizen Data — UPI transactions, Aadhaar, welfare program enrollment

Policy Implications & Recommendations

Regulatory Framework Elements

  1. Clear Purpose and Safeguards — Data release must state why, for whom, with what protections
  2. Anonymization Standards — Legally recognized, technically robust standards preventing re-identification
  3. Consent and Control — Individuals/communities have meaningful agency; informed, revocable consent with grievance mechanisms
  4. Accountability Architecture — Clear standards for access, independent oversight, remedies for misuse
  5. Domestic Capacity Building — Public data must strengthen local research, startups, digital infrastructure—not just circulate globally
  6. Sectoral Policies — Healthcare, agriculture, finance, education get tailored opening strategies
  7. Interoperability Standards — Metadata, APIs, technical specifications enabling seamless data flows
  8. Data Localization with Cooperation — Data can flow across borders, but processing and initial storage respect national interests
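The "informed, revocable consent" and accountability elements in the list above can be sketched as a small purpose-bound consent registry with an audit trail. This Python example is a minimal illustration under stated assumptions: the `ConsentRegistry` class, its method names, and the purpose strings are hypothetical and do not describe any existing Indian system or statutory design.

```python
from datetime import datetime, timezone


class ConsentRegistry:
    """Tracks consent per (subject, purpose) and logs every change for oversight."""

    def __init__(self):
        self._grants = {}    # (subject_id, purpose) -> time consent was granted
        self.audit_log = []  # append-only list of (timestamp, action, subject_id, purpose)

    def _log(self, action, subject_id, purpose):
        self.audit_log.append((datetime.now(timezone.utc), action, subject_id, purpose))

    def grant(self, subject_id, purpose):
        self._grants[(subject_id, purpose)] = datetime.now(timezone.utc)
        self._log("grant", subject_id, purpose)

    def revoke(self, subject_id, purpose):
        # Revocation is always accepted, even if no grant exists.
        self._grants.pop((subject_id, purpose), None)
        self._log("revoke", subject_id, purpose)

    def is_permitted(self, subject_id, purpose):
        """Consent is purpose-bound: a grant for one purpose does not cover another."""
        return (subject_id, purpose) in self._grants


registry = ConsentRegistry()
registry.grant("citizen-001", "policy-research")
print(registry.is_permitted("citizen-001", "policy-research"))    # True
print(registry.is_permitted("citizen-001", "ai-model-training"))  # False
registry.revoke("citizen-001", "policy-research")
print(registry.is_permitted("citizen-001", "policy-research"))    # False
```

Binding each grant to a stated purpose mirrors the first framework element (clear purpose), while the audit log gives an independent overseer something to inspect.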

Against Overregulation

  • Avoid EU GDPR's innovation-limiting approach
  • Balance protection with growth
  • Avoid creating legal liability that discourages officials and entrepreneurs
  • Soft-touch regulation: clarity without prescription

Geopolitical Safeguards

  • Joint regulatory frameworks with democratic partners (US, EU) to avoid conflicts
  • Domestic digital law takes precedence over foreign commitments
  • Trade agreements must not lock in data dependency or digital capitulation
  • Mandatory benefit-sharing when Indian data trains global AI models

Gaps, Uncertainties & Open Questions

  1. Scale of Implementation — How do sectoral policies coordinate across 35+ central ministries and state governments?
  2. Enforcement Capacity — Does India have institutional capacity to audit, enforce, and remediate data misuse at scale?
  3. Consent at Scale — How do you obtain informed, revocable consent from 1.4 billion citizens for data aggregation?
  4. Trust Barriers — Citizens remain skeptical of government and corporate data use; will regulation alone overcome this?
  5. Real-World Impact — Will "AI for agriculture" reach the vast majority of farmers without tractors, electricity, or water security?
  6. Judiciary Readiness — Can India's overburdened courts (50 million pending cases) provide meaningful dispute resolution for data governance?
  7. Technology Transfer — Will opening Indian health/agricultural data to Western AI researchers result in affordable, locally-deployed solutions or proprietary, unaffordable products?
  8. US Stack Dependency — If India relies on US chips, cloud infrastructure, and AI models, how enforceable are data sovereignty safeguards against US government/corporate pressure?

Conclusion

The panel consensus: India must move from voluntary open data initiatives to a structured, statutory regulatory framework that mandates government data sharing in "AI-ready" formats while protecting privacy, ensuring consent, building domestic capacity, and preventing asymmetric value extraction. This is