Genomics and AI for Global Health | Empowering the Global South Through Secure Data Access
Executive Summary
This AI summit panel addressed the critical gap in genomic data representation and access between the Global North and Global South, presenting technological and policy solutions for equitable global health collaboration. The speakers demonstrated that privacy-preserving data sharing methods—particularly "data visitation" approaches—enable collaborative AI research while protecting data sovereignty, and emphasized that genuine progress requires technical innovation coupled with ethical frameworks, human-centered design, and cross-sector commitment to benefit all populations equitably.
Key Takeaways
- Data visitation and federated learning are practical technologies for health equity: Proof-of-concept deployments (Bermuda genetics, NHS COVID-19 testing) demonstrate that collaborative health and genomic research is possible without centralized data movement, enabling Global South institutions to participate at low infrastructure cost and with full data control.
- The Global South must lead, not follow, in building solutions: 85% of humanity lives in the Global South, so genomic AI must be tailored to its disease burdens, populations, infrastructure realities, and sovereignty requirements rather than imposed from the North. India's digital public infrastructure (Aadhaar, APIs) and large population position it as a natural leader.
- Embed ethics and governance into code, not compliance documents: As AI systems outpace traditional oversight, principles such as FAIR (Findability, Accessibility, Interoperability, Reusability), CARE (Collective Benefit, Authority to Control, Responsibility, Ethics), and PILOT (Purpose, Integrity, Law, Openness, Test) must be built directly into algorithms and platforms.
- Real-world impact requires cross-sector collaboration and local champions: Technology alone is not enough. Success requires passionate individual scientists and engineers, institutional support from hospitals and universities, policy frameworks that protect sovereignty, and sustained dialogue with affected communities. Boots on the ground matter as much as cutting-edge methods.
- The Human Genome Project 2 is fundamentally about making genomic medicine a public good for all of humanity, not just for researchers or companies: Success is measured by whether populations around the world live longer and healthier lives, which requires equitable data access, benefit-sharing, and locally tailored implementations across all 14 participating countries and the wider Global South.
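To make "governance in code rather than in documents" concrete, here is a minimal Python sketch of a policy gate that a data-holding site could run before executing any visiting query. Everything in it (the `QueryRequest` and `AccessPolicy` classes, the purpose and requester lists, and the minimum cell size of 10) is a hypothetical illustration, not part of FAIR, CARE, PILOT, or any platform the panel discussed. It maps only loosely onto those principles: the data holder decides who may ask and for what purpose, outputs describing very small groups are refused, and every decision, including refusals, is logged for later review.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class QueryRequest:
    """A hypothetical request to run code against a site's data."""
    requester: str
    purpose: str
    # Self-declared here for simplicity; a real gate would measure this from the output.
    min_group_size: int


@dataclass
class AccessPolicy:
    """Rules set by the data holder or community, not by the requester (hypothetical)."""
    approved_purposes: set[str]
    approved_requesters: set[str]
    min_cell_size: int = 10
    audit_log: list[tuple[str, str, bool, str]] = field(default_factory=list)

    def evaluate(self, request: QueryRequest) -> bool:
        """Decide whether a request may run, and record every decision."""
        if request.requester not in self.approved_requesters:
            allowed, reason = False, "requester not authorized by data holder"
        elif request.purpose not in self.approved_purposes:
            allowed, reason = False, "purpose not approved"
        elif request.min_group_size < self.min_cell_size:
            allowed, reason = False, "output groups too small to protect privacy"
        else:
            allowed, reason = True, "approved"
        # Openness: refusals are logged just like approvals.
        self.audit_log.append((request.requester, request.purpose, allowed, reason))
        return allowed


policy = AccessPolicy(
    approved_purposes={"allele-frequency-study"},
    approved_requesters={"partner-lab"},
)
policy.evaluate(QueryRequest("partner-lab", "allele-frequency-study", min_group_size=25))  # True
policy.evaluate(QueryRequest("partner-lab", "marketing", min_group_size=25))  # False
```

The design choice worth noting is that the rules live with the data holder and are enforced on every request, so an ethics decision becomes a checked condition rather than a signed form.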
Key Topics Covered
- Global data imbalance in genomics and AI: Underrepresentation of Global South populations in biomedical databases and AI training
- Data sovereignty and privacy protection: Legal, regulatory, and ethical constraints on data sharing across borders
- Data visitation as an alternative to data centralization: Keeping data local while enabling remote analysis through code/query submission
- Federated learning in healthcare: Distributed model training across hospitals without centralizing sensitive patient data
- The Human Genome Project 2: International effort to expand genomic research beyond the Caucasian-dominated original project
- Precision medicine and public health equity: Making genomic advances accessible to diverse populations globally
- Ethics in AI health systems: Building ethical considerations into technology rather than treating them as post-hoc compliance
- Real-world pilot projects: Bermuda-based Caribbean genetics research and COVID-19 screening across NHS hospitals
- Implementation barriers and solutions: Infrastructure, expertise, cost, and institutional adoption challenges
- Multi-stakeholder collaboration models: Coordinating scientists, engineers, clinicians, policymakers, and communities
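The data-visitation pattern ("code travels to the data, only results return") can be sketched in a few lines of Python. In this hypothetical example, a `DataSite` keeps its records private and only executes a submitted query function against them, releasing aggregate counts after small-cell suppression (groups with fewer than 5 records are withheld, an assumed threshold). The class, the query, and the toy records are illustrative inventions. A real platform would also need to review or sandbox submitted code and encrypt traffic; this sketch trusts the query and shows only the control flow.

```python
from collections import Counter
from typing import Callable

Record = dict[str, str]


class DataSite:
    """Hypothetical data holder: records stay here; only aggregates leave."""

    def __init__(self, name: str, records: list[Record], min_count: int = 5):
        self.name = name
        self._records = list(records)  # never returned or sent anywhere
        self.min_count = min_count     # ASSUMED small-cell suppression threshold

    def visit(self, query: Callable[[list[Record]], Counter]) -> dict[str, int]:
        """Run a visiting query locally and release only sufficiently large counts."""
        counts = query(self._records)
        return {key: n for key, n in counts.items() if n >= self.min_count}


def count_by_diagnosis(records: list[Record]) -> Counter:
    """The 'code that travels': an aggregate query written by a remote researcher."""
    return Counter(r["diagnosis"] for r in records)


# Two sites with toy records; the researcher only ever sees the returned counts.
site_a = DataSite("site-a", [{"diagnosis": "CKD"}] * 7 + [{"diagnosis": "none"}] * 20)
site_b = DataSite("site-b", [{"diagnosis": "CKD"}] * 3 + [{"diagnosis": "none"}] * 12)

combined = Counter()
for site in (site_a, site_b):
    combined.update(site.visit(count_by_diagnosis))
# combined == Counter({"none": 32, "CKD": 7}); site B's 3 CKD cases were suppressed
```

Site B's three CKD records never appear in the combined total: the suppression rule trades a little utility for protection of small groups, which is one concrete form of the privacy/utility trade-off the panel described.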
Key Points & Insights
- Data representation crisis: The Global South, home to 85% of the global population, is dramatically underrepresented in the major biomedical databases used to train AI models. The original Human Genome Project relied heavily on Caucasian populations, skewing downstream precision medicine applications.
- Disease burden disparity: Global South populations (e.g., India) carry higher burdens of diseases such as diabetes and coronary heart disease, yet lack access to genomic research and to AI-driven diagnostics, which were developed on data from other populations.
- Data visitation principle: Rather than copying sensitive data to central servers, "data visitation" sends computational queries and code to the data, with only results returned. This preserves sovereignty, prevents data leakage, and enables multi-party collaboration without uploading terabytes of sensitive health information.
- Biovault as a practical implementation: An open-source, privacy-first platform enabling federated, encrypted computation across distributed data sites. It supports arbitrary analyses (RNA sequencing, ML training, clinical inference) while keeping data local and encrypted end-to-end.
- Real-world proof of concept (Caribbean genetics): Dr. Carika Weldon's Oxford Nanopore sequencing data in Bermuda was analyzed for allele frequencies and disease classification without being uploaded anywhere, enabling "world-first" research on data that had previously been inaccessible because of sensitivity concerns.
- Federated learning in the NHS COVID-19 response: Raspberry Pi devices (£40, credit-card-sized computers running Linux) were deployed to four NHS hospitals. Models were trained locally on each hospital's data, and their parameters were securely aggregated centrally. The resulting model generalized better than models trained on any single hospital's data, because it had effectively learned from a broader diversity of patients.
- Sovereignty + privacy + utility trade-off: Traditional models force a choice between (a) maximum utility with zero privacy or control and (b) zero benefit from isolation. Data visitation aims for the "top-right corner": maximum utility and maximum privacy, achieved by preserving local governance and using encryption.
- Ethics must be embedded in technology, not in papers: Given AI's speed and scale, ethics committees, informed consent forms, and regulatory PDFs are insufficient. Ethical principles (purpose, integrity, law, openness) must be built into algorithms and system architectures.
- Quality control as a critical use case: Data visitation enables not just analysis but also quality assurance: assessing whether real-world data are valid and whether datasets accurately represent the populations they claim to represent. This is essential because an AI model has no independent access to ground truth.
- Passion + domain expertise = implementation success: Successful projects require (a) individual champions with vision, (b) technologists paired with domain experts (doctors, farmers, clinicians), (c) local data solving local problems (an agricultural solution built for the UK will not necessarily work in Kerala), and (d) a clear value proposition (money, health, sustainability).
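The NHS deployment described above had parameters "securely aggregated centrally", but the source does not name the protocol. One standard technique for this is pairwise additive masking, the core idea behind secure aggregation protocols: each pair of sites shares a random mask that one adds and the other subtracts, so the server can recover the sum of all updates but not any single site's update. The Python sketch below illustrates that idea under loudly stated assumptions: the masks come from one seeded random generator (a real protocol derives each pair's mask through key agreement between just those two sites and handles sites that drop out), and real-valued parameters are encoded as fixed-point integers modulo 2^32. It is not a description of the NHS project's implementation.

```python
import random

MODULUS = 2**32   # all arithmetic happens in integers modulo 2^32
SCALE = 10**6     # fixed-point scale: 6 decimal places of precision


def encode(values: list[float]) -> list[int]:
    """Real-valued parameters -> fixed-point integers modulo MODULUS."""
    return [round(v * SCALE) % MODULUS for v in values]


def decode(values: list[int]) -> list[float]:
    """Inverse of encode; the upper half of the range represents negative numbers."""
    return [(v - MODULUS if v >= MODULUS // 2 else v) / SCALE for v in values]


def pairwise_masks(n_sites: int, dim: int, seed: int = 0) -> dict[tuple[int, int], list[int]]:
    """One random mask per pair (i, j) with i < j.

    ASSUMPTION: generated from a single seeded RNG for demonstration only,
    which is NOT secure; real protocols use pairwise key agreement.
    """
    rng = random.Random(seed)
    return {
        (i, j): [rng.randrange(MODULUS) for _ in range(dim)]
        for i in range(n_sites)
        for j in range(i + 1, n_sites)
    }


def mask_update(site: int, update: list[float],
                masks: dict[tuple[int, int], list[int]], n_sites: int) -> list[int]:
    """Add masks shared with higher-numbered sites, subtract those shared with lower."""
    masked = encode(update)
    for other in range(n_sites):
        if other == site:
            continue
        pair = (min(site, other), max(site, other))
        sign = 1 if site < other else -1
        masked = [(m + sign * r) % MODULUS for m, r in zip(masked, masks[pair])]
    return masked


def aggregate(masked_updates: list[list[int]]) -> list[float]:
    """Server-side sum: every pairwise mask appears once with + and once with -."""
    total = [sum(column) % MODULUS for column in zip(*masked_updates)]
    return decode(total)


# Three sites, each holding a two-parameter model update (toy values).
updates = [[0.5, -1.25], [0.25, 0.75], [-0.125, 1.0]]
n = len(updates)
masks = pairwise_masks(n, dim=len(updates[0]))
masked = [mask_update(i, u, masks, n) for i, u in enumerate(updates)]
summed = aggregate(masked)             # equals the plain sum: [0.625, 0.5]
mean_update = [s / n for s in summed]  # what an averaging server would apply
```

Each masked vector on its own looks like random numbers to the server; only the sum over all sites is meaningful, which is exactly the quantity an averaging step needs.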
Notable Quotes or Statements
"The Global South must lead, not follow. Do we need [expensive Northern solutions]? No. Do we want them? Yes. But we want solutions that are affordable, tailor-made for the people of the Global South, easily accessible, and openly available to everyone."
— Vipin Bhatnagar (implied speaker on Global South representation)
"Data once copied cannot be uncopied. You should consider what it means to share and collaborate with data."
— Madhava Jay, OpenMined Foundation
"We're in a post-AI world now. Technology is amazing. We're in an engineering bottleneck. Bringing this technology to where it needs to go requires boots on the ground—people with expertise who can integrate it with people who have the problems."
— Andrew Soltan or another panelist (attribution unclear from transcript)
"The important thing for you, young audience and students, is: don't ask 'what should I work on?' Ask 'what am I passionate about?' Find the data, find the problem, find the value. Be good at what you love, and the solution will follow."
— Vipin Bhatnagar (core message synthesized from panel discussion)
"For it to be truly a public good, it has to touch each and every one of you and your families and your children. We truly live longer and healthier because we fulfilled the promise of the human genome as a public good for everyone."
— Closing remarks speaker (likely Wade or organizing committee member)
"Morality is the basis of all things, and truth is the substance of all morality. Ethics and science have one thing in common: veracity—truth. We have to bring those things together within our AI technologies."
— Francis Crawley, policy/ethics panelist
Speakers & Organizations Mentioned
| Speaker | Role / Organization | Contribution |
|---|---|---|
| Vipin Bhatnagar (implied) | Global South representation speaker | Framed data equity gap, disease burden, India's infrastructure advantages |
| Madhava Jay | Principal Engineer, OpenMined Foundation | Presented Biovault: privacy-first data visitation platform |
| Dr. Carika Weldon | Founder/CEO, CariGenetics (Bermuda) | Real-world Caribbean genetics pilot using data visitation |
| Dr. Rana Dajani | Professor of Molecular Biology, Hashemite University (Jordan) | Recorded testimony on epigenetics, trauma, refugee populations; tested Biovault |
| Andrew Soltan | University of Oxford / NHS | COVID-19 screening model using federated learning with Raspberry Pi devices |
| Francis Crawley | Policy/ethics panelist | Discussed embedded ethics, FAIR/CARE/PILOT principles, quality control frameworks |
| Dawn Chen | Moderator | Session facilitator and closing remarks |
| Unidentified closing speaker | Likely Wade or committee member | Human Genome Project 2 vision, multi-stakeholder mobilization |
Key Institutions / Programs:
- Human Genome Project 2: International initiative (14 countries) for equitable global genomic research
- OpenMined Foundation: Developer of Biovault platform
- Oxford University Hospitals / NHS: Federated learning COVID-19 screening deployment across Birmingham, Bedford, Portsmouth, Oxford
- WHO: Referenced for global health summit context
- European Open Science Cloud (EOSC) & Research Data Alliance: Published ethics guidelines for AI in health research
- ICMR (Indian Council of Medical Research): Mentioned as strong ethics regulator in India
- Aadhaar: India's digital public infrastructure (referenced as asset for data-driven health solutions)
Technical Concepts & Resources
Methodologies & Frameworks
| Concept | Definition | Application |
|---|---|---|
| Data Visitation | Code/queries travel to data; only results returned; data never copied or centralized | Multi-institutional genomic research, privacy-preserving collaboration |
| Federated Learning | Model training distributed across decentralized nodes; only model parameters (weights) aggregated centrally | NHS COVID-19 screening: trained across 4 hospitals without moving patient data |
| Federated Encrypted Computation | Peer-to-peer topology with end-to-end encryption; supports collaboration among more than two parties | Biovault's infrastructure for arbitrary multi-party analysis |
| FAIR Principles | Findability, Accessibility, Interoperability, Reusability (for data) | Standard for open science data stewardship |
| CARE Principles | Collective Benefit, Authority to Control, Responsibility, Ethics (for Indigenous/community data) | Equity-centered data governance |
| PILOT Principles | Purpose, Integrity, Law, Openness, Test (for AI in health) | Emerging framework for ethics embedded in AI systems |
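Federated averaging (FedAvg) is a common way to combine locally trained models, and the Python sketch below shows the loop in its simplest form: each site starts from the current global weights, runs a few epochs of gradient descent on its own data, and a coordinator averages the resulting weights, weighting each site by its number of examples. The logistic-regression model, learning rate, number of rounds, and synthetic three-site data are all illustrative assumptions; the source does not say which model or aggregation rule the NHS deployment used. For simulation, all sites live in one process here; in a deployment each `local_train` call would run on the site's own hardware (a Raspberry Pi in the NHS example).

```python
import math
import random

Example = tuple[list[float], int]  # (features, label); features[0] is a bias term


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def local_train(weights: list[float], data: list[Example],
                lr: float = 0.1, epochs: int = 5) -> list[float]:
    """Runs at one site: SGD on the logistic loss, starting from the global weights."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w  # only these parameters would leave the site


def fed_avg(site_weights: list[list[float]], site_sizes: list[int]) -> list[float]:
    """Coordinator: average parameters, weighting each site by its example count."""
    total = sum(site_sizes)
    return [
        sum(w[k] * n for w, n in zip(site_weights, site_sizes)) / total
        for k in range(len(site_weights[0]))
    ]


def federated_training(sites: list[list[Example]], dim: int, rounds: int = 20) -> list[float]:
    global_w = [0.0] * dim
    for _ in range(rounds):
        local_models = [local_train(global_w, data) for data in sites]
        global_w = fed_avg(local_models, [len(data) for data in sites])
    return global_w


# Synthetic data: three "hospitals" whose feature distributions differ (shift).
rng = random.Random(42)


def make_site(n: int, shift: float) -> list[Example]:
    rows = []
    for _ in range(n):
        x1 = rng.uniform(-2.0, 2.0) + shift
        rows.append(([1.0, x1], 1 if x1 > 0 else 0))
    return rows


sites = [make_site(50, -0.5), make_site(80, 0.0), make_site(30, 0.7)]
model = federated_training(sites, dim=2)
```

Weighting by example count means a small site cannot pull the global model as far as a large one; whether that is desirable for equity (for example, when a small site serves an underrepresented population) is itself a governance choice.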
Technologies & Platforms
- Biovault: Open-source, desktop utility; supports Windows; end-to-end encryption; extensions via Nextflow, Jupyter; keeps data local via facade interface
- Raspberry Pi: £40 (~5,000 INR) credit-card-sized Linux computer used as a federated learning node in NHS hospitals; its removable microSD card can be physically destroyed to securely erase data
- Oxford Nanopore sequencing: Long-read DNA sequencing technology used in Bermuda pilot
- Nextflow: Workflow framework supported by Biovault for computational analysis
- Jupyter: Notebook interface for interactive analysis on Biovault
- Ubuntu: Operating system run on the Raspberry Pi nodes, providing a user-friendly GUI
Datasets & Research Domains
- Human Genome Project (original, 2003): Caucasian-dominated reference; genomic blueprint but lacked diversity
- Human Genome Project 2 (2026): Expanded to 14 countries; goal: precision public health for global populations
- Single-cell RNA sequencing: Advanced genomics for cancer cell analysis (use case demonstrated in Biovault preprint)
- Clinical datasets: Remote inference on large NHS datasets using federated learning
- Caribbean population genetic data: Allele frequencies, disease classification (APOL1 chronic kidney disease prevalence in ancestry groups) analyzed via Biovault without upload
- gnomAD (Genome Aggregation Database): Population variant-frequency database with ancestry-group breakdowns; used for comparative validation in the Caribbean study
- NHS COVID-19 screening data: 72,000 patients across 4 sites; tabular vital signs + blood panels; 45-minute screening result vs. 3+ hours for PCR tests
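As an illustration of the kind of query run in the Caribbean study, the Python sketch below computes per-ancestry-group allele frequencies from diploid genotypes (coded as the number of alternate alleles, 0 to 2) and flags groups whose local frequency diverges from a reference panel such as gnomAD. Under data visitation, this function would execute at the data site, and only the returned counts and frequencies would leave. The function names, group labels, genotypes, reference frequencies, and the 0.05 tolerance are invented placeholders, not values from the study or from gnomAD.

```python
from collections import defaultdict

Frequencies = dict[str, tuple[int, int, float]]


def allele_frequencies(samples: list[tuple[str, int]]) -> Frequencies:
    """Per-group (alternate allele count, total alleles, frequency).

    Each sample is (ancestry_group, genotype), where genotype is the number of
    alternate alleles a diploid individual carries: 0, 1, or 2.
    """
    alt = defaultdict(int)
    total = defaultdict(int)
    for group, genotype in samples:
        if genotype not in (0, 1, 2):
            raise ValueError(f"invalid diploid genotype: {genotype!r}")
        alt[group] += genotype
        total[group] += 2  # two alleles per diploid sample
    return {g: (alt[g], total[g], alt[g] / total[g]) for g in total}


def flag_divergent(local: Frequencies, reference: dict[str, float],
                   tolerance: float = 0.05) -> list[str]:
    """Groups whose local frequency differs from the reference by more than tolerance."""
    return sorted(
        g for g, (_, _, freq) in local.items()
        if g in reference and abs(freq - reference[g]) > tolerance
    )


# Toy genotypes at one variant; group names and values are placeholders.
samples = [("group_A", 0), ("group_A", 1), ("group_A", 2), ("group_A", 1),
           ("group_B", 0), ("group_B", 0), ("group_B", 1), ("group_B", 0)]
local = allele_frequencies(samples)
# local == {"group_A": (4, 8, 0.5), "group_B": (1, 8, 0.125)}
reference = {"group_A": 0.30, "group_B": 0.10}  # invented, not real gnomAD values
flagged = flag_divergent(local, reference)      # ["group_A"]
```

A flagged group is not automatically an error. A genuine ancestry-specific difference (such as the APOL1-related kidney disease prevalence the study examined) and a data-quality problem look the same at this stage, which is why comparison against a reference is a validation step rather than a verdict. A real analysis would also account for sample size.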
Papers & Resources Mentioned
- Biovault preprint (uploaded to bioRxiv days before the conference): Implementation details, single-cell RNA analysis, ML training, privacy-preserving genome analysis, real-world use cases
- "Defensible business model" blog post (referenced by Mava J): Discusses private data as defensible resource in AI economy
- European Open Science Cloud (EOSC) & Research Data Alliance publications: Ethics guidelines for AI in health research; informed consent frameworks; quality control considerations for AI-driven biomedical research
- QR code (conference slide): Direct link to Biovault preprint on bioRxiv
Key Regulatory/Policy Frameworks Mentioned
- GDPR (EU) & UK Data Protection Law: Restrict cross-border transfer of personal data, including health and genetic data; necessitate federated approaches
- ICMR (India): Strong ethics regulations for biomedical research
- Data Sovereignty: Legal requirement in many countries (e.g., Jordan, Bermuda) limiting data export
Conclusion
This summit talk synthesized cutting-edge technical innovations (data visitation, federated learning) with urgent equity challenges (Global South underrepresentation in genomics, disease burden disparities) and ethical imperatives (embedding governance in code). The core message: technology enables equitable collaboration only when paired with local leadership, cross-sector commitment, and ethics-first design. Real deployments (Caribbean genetics, NHS COVID screening) prove viability; Human Genome Project 2 signals global intent. Success requires individual champions, institutional support, policy frameworks, and a fundamental reorientation toward human health outcomes as the measure of success—not papers published or companies funded.
