Advancing Safe & Equitable AI in Healthcare Systems | India AI Impact Summit 2026
Executive Summary
This panel discussion from the India AI Impact Summit 2026 examines critical gaps in how AI health tools are being evaluated, arguing that current evaluations focus narrowly on model accuracy while ignoring implementation context, user adoption, and real-world health outcomes. The panelists—spanning academia, global health organizations, regulatory bodies, and impact investing—present a framework for intentional AI evaluation that considers ecosystem fit, operational metrics, data sovereignty, and equity before scaling.
Key Takeaways
- Evaluation must precede scale, not follow it: Before deploying AI tools at population scale, systematically test implementation fit—user adoption, infrastructure compatibility, and proximal outcomes—not just model accuracy.
- Context is not an afterthought; it determines everything: An excellent AI model in the wrong setting (wrong population, wrong workflow, wrong infrastructure) will have zero health impact. Implementation readiness assessment is as critical as clinical validation.
- Build shared language before building regulation: Without a taxonomy that developers, clinicians, policymakers, and regulators all understand, fair procurement and evaluation are impossible. WHO's approach of categorizing AI by function (not by product) enables comparison and standardization.
- Data sovereignty and DPI are enablers, not solutions: India's ABDM, state data exchanges, and repositories provide necessary infrastructure, but the health-sector workforce lacks awareness of governance norms. IEC and capacity building must accompany infrastructure.
- ROI and adoption are business questions, not just academic ones: Hospital administrators, state health officials, and governments need transparent metrics for procurement decisions and budget allocation. Without integrating AI into health budgets (NHM and state PIPs), public health impact will not materialize.
Key Topics Covered
- Evaluation methodology gaps: Current focus on model-level metrics (accuracy, benchmarks) versus ecosystem-level outcomes
- Implementation context: How environmental, organizational, and human factors determine whether AI tools actually get used
- Digital public infrastructure (DPI): Role of foundational systems (ABDM, data repositories, shared platforms) in enabling AI at scale
- Data sovereignty and security: India's approach to data governance, consent management, and federated research
- Taxonomy and standardization: Need for common language across health, tech, and policy sectors to enable fair evaluation and procurement
- Workforce capacity building: Training needs for reviewers, policymakers, and health professionals to understand AI in healthcare contexts
- Operational and economic evaluation: Metrics beyond clinical efficacy—including adoption, cost-effectiveness, and budget integration
- Regulatory and governance frameworks: WHO's agile approach to AI regulation without one-size-fits-all mandates
- Equity and accessibility: Ensuring AI doesn't amplify health inequities in resource-constrained settings
Key Points & Insights
- Evidence gaps persist at scale: Two decades of digital health research show persistent methodological gaps and limited attention to implementation context. With AI, "we risk repeating the same mistakes but at scale, at speed, and with potentially greater consequences" (Dr. Smisha Agarwal, Johns Hopkins Center for Global Digital Health Innovation).
- Model accuracy ≠ health impact: A 99% accurate risk prediction model has zero health impact if deployed in a context where:
  - Health workers lack training or do not trust the output
- Facilities cannot act on the results
- Infrastructure is unreliable (electricity, connectivity)
- The recommendation conflicts with resource availability
- Intervention-context fit drives outcomes: The same intervention yields vastly different results depending on population characteristics. SMS messaging for cervical cancer screening (younger population, higher smartphone access) showed a 4x impact, versus negligible impact for hypertension management (older population).
- Three pillars of evaluation are missing: Current evaluations address only clinical rigor; they must also assess:
- Operational fit: Will health workers adopt it? Will it reduce workload or add burden?
- Economic evaluation: What is actual ROI in resource-constrained settings?
- Implementation pathways: Does it fit existing care delivery workflows?
- Taxonomy is foundational for regulation and procurement: Without a shared vocabulary across technologists, clinicians, and policymakers, evaluation standards cannot be applied or compared. WHO's approach categorizes AI by function (computer vision, CDSS, ambient AI with sensors, digital twins) to enable standardized evaluation frameworks.
- Data sovereignty is operationalized in India:
- Public-funded research data must be deposited in government repositories (DBT, DST, ICMR)
- Foreign-funded research: compute and storage must remain within Indian boundaries
- State-level data exchanges (Telangana, Odisha) and ABDM consent manager provide federated governance
  - However, an IEC (information, education, communication) gap remains: the health sector lags the finance sector in awareness of data-privacy norms
- DPI enables but doesn't guarantee scale: India's success with digital ID (Aadhaar), vaccination registries, and payment systems shows that when the right ecosystem, incentives, and infrastructure align, rapid adoption is possible. This substrate is necessary but insufficient for AI without complementary clinical, operational, and procurement strategies.
- RCTs are not the only path: The commercial sector uses rapid A/B testing, process evaluations, and active monitoring instead of lengthy baseline-midline-endline studies. These methodologies must be adapted for public health while maintaining rigor and neutrality; innovation in evaluation science itself is needed.
- Clinician behavior is endogenous to adoption: One study found that adoption of X-ray-reading AI was higher among already-skilled radiologists than among less-skilled ones, contradicting the assumption that AI helps those with the lowest baseline capability. Human psychology, trust, and professional identity shape adoption in ways technical specifications cannot predict.
- Outcome measurement must shift toward proximal outcomes: Current evaluations overindex on distal outcomes (e.g., "AI CDSS → better diabetes control"). That logic chain depends on factors outside the intervention's scope (drug availability, affordability, patient habits). Evaluation should instead measure proximal outcomes closer to the intervention (e.g., "CDSS improves the quality of care recommendations").
Notable Quotes or Statements
- Dr. Smisha Agarwal (Johns Hopkins): "Most innovations fail due to lack of intervention context fit... We can have a perfect model but we can't actually act on the results and ultimately we see no health impact from the best working models."
- Dr. Smisha Agarwal: "The right solution starts with the problem. Often in innovation space we have a solution and we're trying to find a problem."
- Samir Pajari (WHO): "You cannot regulate [a] single product for [a] single place. It's different in different countries but it has to meet the problem statements in the areas... formulas have to be localized."
- Dr. Mona Douglas (ICMR): "In health we are [still] that much backward [compared to finance on data privacy awareness]... the implementation part, the IEC part has to come out or it's going to be a long road."
- Arjun Wentaran (Gates Foundation): "When you're thinking of developmental outcomes or population scale outcomes... that math may not always necessarily hold because those people aren't going to be paying you... that requires solid procurement structures, good thinking, transparent standards."
- Dr. Sarang Deo (ISB): "Understanding this pathway is important to say which tool would be appropriate for me to have an impact... If states are going to buy AI tools, they have to enter into some budget line item somewhere. Without that there's going to be no public health impact."
- Moderator (Suramya Heeren): "Change our outlook from 'did the doctor do any good while using the tool' to 'did the patient have a good experience.' That's the ground truthing device loop."
Speakers & Organizations Mentioned
| Speaker | Title / Organization |
|---|---|
| Dr. Smisha Agarwal | Director, Johns Hopkins Center for Global Digital Health Innovation; Associate Professor, Johns Hopkins School of Medicine |
| Dr. Mona Douglas | Director, National Institute of Research and Digital Health and Data Science (ICMR); Ophthalmologist |
| Samir Pajari | Senior Leader, AI for Health, World Health Organization (WHO) |
| Dr. Sarang Deo | Professor, Operations Management, Indian School of Business (ISB) |
| Arjun Wentaran | Senior Program Officer, Gates Foundation |
| Suramya Heeren | Moderator (affiliation not explicitly stated; appears to be conference organizer/facilitator) |
Key Institutions:
- Johns Hopkins Center for Global Digital Health Innovation
- Indian Council of Medical Research (ICMR)
- World Health Organization (WHO)
- Indian School of Business (ISB)
- Bill & Melinda Gates Foundation
- Ministry of Health & Family Welfare (India)
- Ayushman Bharat Digital Mission (ABDM)
Technical Concepts & Resources
Frameworks & Guidelines
- WHO Digital Health Guidelines (2019): Evidence base on how digitization improves healthcare; found weak evidence, in part because evaluations prioritized the wrong outcomes
- WHO Digital Health Taxonomy: Categorizes digital health interventions (mHealth, eHealth, SMS programs, etc.)
- WHO AI Taxonomy (in development): Six functional categories, including computer vision, CDSS, ambient AI with sensors, digital twins, and language models
- DPDP Act (India): The Digital Personal Data Protection Act, 2023, governing personal data privacy
Technical & Policy Tools
- ABDM (Ayushman Bharat Digital Mission): National health data exchange with consent manager; enables federated data governance at state level
- Data Repositories: DBT (Department of Biotechnology), DST (Department of Science & Technology), ICMR repositories for public-funded research
- Federated Research Systems: State-level computational health units enabling secure, local AI development without centralizing sensitive health data
Evaluation Methodologies
- RCT (Randomized Controlled Trial): Traditional but time-intensive; not always feasible for rapidly evolving AI tools
- A/B Testing: Rapid, simultaneous comparison of control and test versions; enables faster iteration
- Process Evaluation: Assesses how and why interventions work in real-world contexts
- Implementation Readiness Assessment: Ecosystem evaluation covering infrastructure, workforce, trust, workflow fit
- Proximal Outcome Measurement: Measuring outcomes closer to intervention rather than distant downstream health outcomes
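The A/B-testing approach the panel contrasts with lengthy RCTs can be illustrated with a minimal sketch: a two-proportion z-test comparing a proximal outcome (e.g., adherence to a recommendation) between a control arm and a test arm. The function name, arm sizes, and adherence scenario below are illustrative assumptions, not details from the panel; a production evaluation would also need pre-registration, power analysis, and appropriate randomization.

```python
import math

def two_proportion_ztest(success_a: int, n_a: int, success_b: int, n_b: int):
    """Two-sided two-proportion z-test: does arm B's rate differ from arm A's?"""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)       # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))           # two-sided p via normal CDF
    return p_a, p_b, z, p_value

# Hypothetical arms: adherence to guideline-concordant care with vs. without CDSS
p_a, p_b, z, p = two_proportion_ztest(success_a=120, n_a=400,
                                      success_b=150, n_b=400)
```

Because the outcome is proximal (adherence, not downstream disease control), each comparison needs only weeks of data, which is what enables the faster iteration cycle the panel describes.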
Clinical AI Use Cases Mentioned
- Chatbots: Neonatal care, infant nutrition, palliative care management
- Ambient Scribes: Automated documentation of clinical encounters and electronic health records
- Disease Outbreak Prediction: Algorithmic epidemiology
- Diabetic Retinopathy Detection: Computer vision for ophthalmology screening
- Radiograph Interpretation: AI-assisted X-ray reading by informal healthcare providers
- CDSS (Clinical Decision Support Systems): AI-assisted diagnosis/management recommendations; increasingly incorporating large language models
- Non-alcoholic Fatty Liver Disease (NAFLD) Diagnosis: AI tool deployment at tertiary vs. primary care levels
- Discharge Summary Generation: NLP tool reducing administrative burden; example of non-clinical AI use case
Key Concepts & Terms
- Intervention-Context Fit: Alignment between AI tool design and population characteristics, infrastructure, workflow
- Fidelity: Actual usage rate and depth of engagement (vs. theoretical capability)
- Taxonomy/Ontology: Evolving shared vocabulary for categorizing and evaluating AI tools
- DPI (Digital Public Infrastructure): Foundational systems (registries, data exchanges, identity systems) enabling AI at scale
- Counterfactual: Comparator condition (e.g., paper-based support vs. digital decision support without AI)
- IEC (Information, Education, Communication): Awareness and training for stakeholders on policy, governance, and best practices
- Model Sophistication vs. Adoption Fit: Tension between technical capability and willingness/ability of users to implement
Methodological & Thematic Gaps Identified
- Evidence lags behind innovation: Current evaluation pace cannot match AI development velocity
- Evaluation plurality underutilized: Most evaluations use only clinical/academic framing; operational, economic, and equity framings are missing
- Workforce capacity shortage: Inadequate training for proposal reviewers, state-level implementers, and health professionals to assess AI critically
- Procurement disconnects: No standardized language for governments to compare and procure AI solutions; budget integration into NHM/PIPs is absent
- Human behavior underestimated: Adoption models rarely account for clinician psychology, trust dynamics, or professional identity threats
- Cost of complexity: Complex, black-box AI models generate lower trust and fidelity than simpler, more interpretable tools in resource-constrained settings
Conference: India AI Impact Summit 2026
URL: https://www.youtube.com/watch?v=vvjRbQV6k9E
Focus Area: Safe, Equitable AI Deployment in Healthcare Systems
