Advancing Safe & Equitable AI in Healthcare Systems | India AI Impact Summit 2026
Executive Summary
This panel discussion from the India AI Impact Summit 2026 examines critical gaps in how AI health tools are being evaluated, arguing that current evaluations focus narrowly on model accuracy while ignoring implementation context, user adoption, and real-world health outcomes. The panelists—spanning academia, global health organizations, regulatory bodies, and impact investing—present a framework for intentional AI evaluation that considers ecosystem fit, operational metrics, data sovereignty, and equity before scaling.
Key Takeaways
- Evaluation must precede scale, not follow it: Before deploying AI tools at population scale, systematically test implementation fit—user adoption, infrastructure compatibility, and proximal outcomes—not just model accuracy.
- Context is not an afterthought; it determines everything: An excellent AI model in the wrong setting (wrong population, wrong workflow, wrong infrastructure) will have zero health impact. Implementation readiness assessment is as critical as clinical validation.
- Build shared language before building regulation: Without a taxonomy that developers, clinicians, policymakers, and regulators all understand, fair procurement and evaluation are impossible. WHO's approach of categorizing AI by function (not by product) enables comparison and standardization.
- Data sovereignty and DPI are enablers, not solutions: India's ABDM, state data exchanges, and repositories provide necessary infrastructure, but the health-sector workforce lacks awareness of governance norms. IEC and capacity building must accompany infrastructure.
- ROI and adoption are business questions, not just academic ones: Hospital administrators, state health officials, and governments need transparent metrics for procurement decisions and budget allocation. Without integrating AI into health budgets (NHM and state PIPs), public health impact will not materialize.
Key Topics Covered
- Evaluation methodology gaps: Current focus on model-level metrics (accuracy, benchmarks) versus ecosystem-level outcomes
- Implementation context: How environmental, organizational, and human factors determine whether AI tools actually get used
- Digital public infrastructure (DPI): Role of foundational systems (ABDM, data repositories, shared platforms) in enabling AI at scale
- Data sovereignty and security: India's approach to data governance, consent management, and federated research
- Taxonomy and standardization: Need for common language across health, tech, and policy sectors to enable fair evaluation and procurement
- Workforce capacity building: Training needs for reviewers, policymakers, and health professionals to understand AI in healthcare contexts
- Operational and economic evaluation: Metrics beyond clinical efficacy—including adoption, cost-effectiveness, and budget integration
- Regulatory and governance frameworks: WHO's agile approach to AI regulation without one-size-fits-all mandates
- Equity and accessibility: Ensuring AI doesn't amplify health inequities in resource-constrained settings
Key Points & Insights
- Evidence gaps persist at scale: Two decades of digital health research show persistent methodological gaps and limited attention to implementation context. With AI, "we risk repeating the same mistakes but at scale, at speed, and with potentially greater consequences" (Dr. Smisha Agarwal, Johns Hopkins Center for Global Digital Health Innovation).
- Model accuracy ≠ health impact: A 99% accurate risk prediction model has zero health impact if deployed in a context where:
  - Health workers lack training or do not trust the output
- Facilities cannot act on the results
- Infrastructure is unreliable (electricity, connectivity)
- The recommendation conflicts with resource availability
- Intervention-context fit drives outcomes: The same intervention yields vastly different results depending on population characteristics. SMS messaging for cervical cancer screening (younger population, higher smartphone access) showed a 4x impact, versus negligible impact for hypertension management (older population).
- Three pillars of evaluation are missing: Current evaluations address only clinical rigor; they must also assess:
- Operational fit: Will health workers adopt it? Will it reduce workload or add burden?
- Economic evaluation: What is actual ROI in resource-constrained settings?
- Implementation pathways: Does it fit existing care delivery workflows?
- Taxonomy is foundational for regulation and procurement: Without a shared vocabulary across technologists, clinicians, and policymakers, evaluation standards cannot be applied or compared. WHO's approach categorizes AI by function (computer vision, CDSS, ambient AI with sensors, digital twins) to enable standardized evaluation frameworks.
- Data sovereignty is operationalized in India:
- Public-funded research data must be deposited in government repositories (DBT, DST, ICMR)
- Foreign-funded research: compute and storage must remain within Indian boundaries
- State-level data exchanges (Telangana, Odisha) and ABDM consent manager provide federated governance
  - However, an IEC (information, education, communication) gap remains: the health sector lags the finance sector in awareness of data-privacy norms
- DPI enables but doesn't guarantee scale: India's success with digital ID (Aadhaar), vaccination registries, and payment systems shows that when the right ecosystem, incentives, and infrastructure align, rapid adoption is possible. This substrate is necessary but insufficient for AI without complementary clinical, operational, and procurement strategies.
- RCTs are not the only path: The commercial sector uses rapid A/B testing, process evaluations, and active monitoring instead of lengthy baseline-midline-endline studies. These methodologies must be adapted for public health while maintaining rigor and neutrality; innovation in evaluation science itself is needed.
- Clinician behavior is endogenous to adoption: One study found that adoption of X-ray-reading AI was higher among already-skilled radiologists than among less-skilled ones, contradicting the assumption that AI helps those with the lowest baseline capability. Human psychology, trust, and professional identity shape adoption in ways technical specifications cannot predict.
- Outcome measurement must shift toward proximal outcomes: Current evaluations overindex on distal outcomes (e.g., "AI CDSS → better diabetes control"). That logic chain depends on factors outside the intervention's scope (drug availability, affordability, patient habits). Evaluation should instead measure proximal outcomes closer to the intervention (e.g., "CDSS improves the quality of care recommendations").
Notable Quotes or Statements
- Dr. Smisha Agarwal (Johns Hopkins): "Most innovations fail due to lack of intervention context fit... We can have a perfect model but we can't actually act on the results and ultimately we see no health impact from the best working models."
- Dr. Smisha Agarwal: "The right solution starts with the problem. Often in innovation space we have a solution and we're trying to find a problem."
- Samir Pajari (WHO): "You cannot regulate [a] single product for [a] single place. It's different in different countries but it has to meet the problem statements in the areas... formulas have to be localized."
- Dr. Mona Douglas (ICMR): "In health we are [still] that much backward [compared to finance on data privacy awareness]... the implementation part, the IEC part has to come out or it's going to be a long road."
- Arjun Wentaran (Gates Foundation): "When you're thinking of developmental outcomes or population scale outcomes... that math may not always necessarily hold because those people aren't going to be paying you... that requires solid procurement structures, good thinking, transparent standards."
- Dr. Sarang Deo (ISB): "Understanding this pathway is important to say which tool would be appropriate for me to have an impact... If states are going to buy AI tools, they have to enter into some budget line item somewhere. Without that there's going to be no public health impact."
- Moderator (Suramya Heeren): "Change our outlook from 'did the doctor do any good while using the tool' to 'did the patient have a good experience.' That's the ground truthing device loop."
Speakers & Organizations Mentioned
| Speaker | Title / Organization |
|---|---|
| Dr. Smisha Agarwal | Director, Johns Hopkins Center for Global Digital Health Innovation; Associate Professor, Johns Hopkins School of Medicine |
| Dr. Mona Douglas | Director, National Institute of Research and Digital Health and Data Science (ICMR); Ophthalmologist |
| Samir Pajari | Senior Leader, AI for Health, World Health Organization (WHO) |
| Dr. Sarang Deo | Professor, Operations Management, Indian School of Business (ISB) |
| Arjun Wentaran | Senior Program Officer, Gates Foundation |
| Suramya Heeren | Moderator (affiliation not explicitly stated; appears to be conference organizer/facilitator) |
Key Institutions:
- Johns Hopkins Center for Global Digital Health Innovation
- Indian Council of Medical Research (ICMR)
- World Health Organization (WHO)
- Indian School of Business (ISB)
- Bill & Melinda Gates Foundation
- Ministry of Health & Family Welfare (India)
- Ayushman Bharat Digital Mission (ABDM)
Technical Concepts & Resources
Frameworks & Guidelines
- WHO Digital Health Guidelines (2019): Evidence base on how digitization improves healthcare; found weak evidence, in part because evaluations prioritized the wrong outcomes
- WHO Digital Health Taxonomy: Categorizes digital health interventions (mHealth, eHealth, SMS programs, etc.)
- WHO AI Taxonomy (in development): Six functional categories, including computer vision, CDSS, ambient AI with sensors, digital twins, and language models
- DPDP Act (India): The Digital Personal Data Protection Act, 2023, governing personal data privacy
Technical & Policy Tools
- ABDM (Ayushman Bharat Digital Mission): National health data exchange with consent manager; enables federated data governance at state level
- Data Repositories: DBT (Department of Biotechnology), DST (Department of Science & Technology), ICMR repositories for public-funded research
- Federated Research Systems: State-level computational health units enabling secure, local AI development without centralizing sensitive health data
Evaluation Methodologies
- RCT (Randomized Controlled Trial): Traditional but time-intensive; not always feasible for rapidly evolving AI tools
- A/B Testing: Rapid, simultaneous comparison of control and test versions; enables faster iteration
- Process Evaluation: Assesses how and why interventions work in real-world contexts
- Implementation Readiness Assessment: Ecosystem evaluation covering infrastructure, workforce, trust, workflow fit
- Proximal Outcome Measurement: Measuring outcomes closer to intervention rather than distant downstream health outcomes
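The A/B-testing approach the panel contrasts with lengthy RCTs can be illustrated with a minimal sketch: a two-proportion z-test comparing a proximal outcome (e.g., adherence to a recommendation) between a control arm and a test arm. The function name, arm sizes, and adherence scenario below are illustrative assumptions, not details from the panel; a production evaluation would also need pre-registration, power analysis, and appropriate randomization.

```python
import math

def two_proportion_ztest(success_a: int, n_a: int, success_b: int, n_b: int):
    """Two-sided two-proportion z-test: does arm B's rate differ from arm A's?"""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)       # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))           # two-sided p via normal CDF
    return p_a, p_b, z, p_value

# Hypothetical arms: adherence to guideline-concordant care with vs. without CDSS
p_a, p_b, z, p = two_proportion_ztest(success_a=120, n_a=400,
                                      success_b=150, n_b=400)
```

Because the outcome is proximal (adherence, not downstream disease control), each comparison needs only weeks of data, which is what enables the faster iteration cycle the panel describes.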
Clinical AI Use Cases Mentioned
- Chatbots: Neonatal care, infant nutrition, palliative care management
- Ambient Scribes: Automated documentation of clinical encounters and electronic health records
- Disease Outbreak Prediction: Algorithmic epidemiology
- Diabetic Retinopathy Detection: Computer vision for ophthalmology screening
- Radiograph Interpretation: AI-assisted X-ray reading by informal healthcare providers
- CDSS (Clinical Decision Support Systems): AI-assisted diagnosis/management recommendations; increasingly incorporating large language models
- Non-alcoholic Fatty Liver Disease (NAFLD) Diagnosis: AI tool deployment at tertiary vs. primary care levels
- Discharge Summary Generation: NLP tool reducing administrative burden; example of non-clinical AI use case
Key Concepts & Terms
- Intervention-Context Fit: Alignment between AI tool design and population characteristics, infrastructure, workflow
- Fidelity: Actual usage rate and depth of engagement (vs. theoretical capability)
- Taxonomy/Ontology: Evolving shared vocabulary for categorizing and evaluating AI tools
- DPI (Digital Public Infrastructure): Foundational systems (registries, data exchanges, identity systems) enabling AI at scale
- Counterfactual: Comparator condition (e.g., paper-based support vs. digital decision support without AI)
- IEC (Information, Education, Communication): Awareness and training for stakeholders on policy, governance, and best practices
- Model Sophistication vs. Adoption Fit: Tension between technical capability and willingness/ability of users to implement
Methodological & Thematic Gaps Identified
- Evidence lags behind innovation: Current evaluation pace cannot match AI development velocity
- Evaluation plurality underutilized: Most evaluations use only clinical/academic framing; operational, economic, and equity framings are missing
- Workforce capacity shortage: Inadequate training for proposal reviewers, state-level implementers, and health professionals to assess AI critically
- Procurement disconnects: No standardized language for governments to compare and procure AI solutions; budget integration into NHM/PIPs is absent
- Human behavior underestimated: Adoption models rarely account for clinician psychology, trust dynamics, or professional identity threats
- Cost of complexity: Complex, black-box AI models generate lower trust and fidelity than simpler, more interpretable tools in resource-constrained settings
Conference: India AI Impact Summit 2026
URL: https://www.youtube.com/watch?v=vvjRbQV6k9E
Focus Area: Safe, Equitable AI Deployment in Healthcare Systems
