Genomics and AI for Global Health | Empowering the Global South Through Secure Data Access

Executive Summary

This AI summit panel addressed the critical gap in genomic data representation and access between the Global North and Global South, presenting technological and policy solutions for equitable global health collaboration. The speakers demonstrated that privacy-preserving data sharing methods—particularly "data visitation" approaches—enable collaborative AI research while protecting data sovereignty, and emphasized that genuine progress requires technical innovation coupled with ethical frameworks, human-centered design, and cross-sector commitment to benefit all populations equitably.

Key Takeaways

Data visitation and federated learning are proven technologies for health equity: Proof-of-concept deployments (Bermuda genetics, NHS COVID testing) demonstrate that collaborative genomic research is possible without centralized data movement, enabling Global South institutions to participate with low infrastructure cost and full data control.
The Global South must lead, not follow, in building solutions: 85% of humanity lives in the Global South, so genomic AI must be tailored to its disease burdens, populations, infrastructure realities, and sovereignty requirements rather than imposed from the North. India's digital public infrastructure (Aadhaar, APIs) and large population position it as a natural leader.
Embed ethics and governance into code, not compliance documents: As AI systems accelerate beyond traditional oversight, principles like FAIR (Findability, Accessibility, Interoperability, Reusability), CARE (Collective Benefit, Authority to Control, Responsibility, Ethics), and PILOT (Purpose, Integrity, Law, Openness, Test) must be hardcoded into algorithms and platforms.
Real-world impact requires cross-sector collaboration and local champions: Technology alone fails. Success requires: individual scientists/engineers with passion, institutional support from hospitals/universities, policy frameworks protecting sovereignty, and sustained dialogue with affected communities. Boots on the ground matter as much as cutting-edge methods.
The Human Genome Project 2 is fundamentally about making genomic medicine a public good for humanity, not just researchers or companies: Success is measured in whether populations globally live longer and healthier—requiring equitable data access, benefit-sharing, and locally-tailored implementations across all 14 participating countries and the Global South.

Key Topics Covered

Global data imbalance in genomics and AI: Underrepresentation of Global South populations in biomedical databases and AI training
Data sovereignty and privacy protection: Legal, regulatory, and ethical constraints on data sharing across borders
Data visitation as an alternative to data centralization: Keeping data local while enabling remote analysis through code/query submission
Federated learning in healthcare: Distributed model training across hospitals without centralizing sensitive patient data
The Human Genome Project 2: International effort to expand genomic research beyond the Caucasian-dominated original project
Precision medicine and public health equity: Making genomic advances accessible to diverse populations globally
Ethics in AI health systems: Building ethical considerations into technology rather than treating them as post-hoc compliance
Real-world pilot projects: Bermuda-based Caribbean genetics research and COVID-19 screening across NHS hospitals
Implementation barriers and solutions: Infrastructure, expertise, cost, and institutional adoption challenges
Multi-stakeholder collaboration models: Coordinating scientists, engineers, clinicians, policymakers, and communities

Key Points & Insights

Data representation crisis: The Global South, home to 85% of the world's population, is dramatically underrepresented in the major biomedical databases used to train AI models. The original Human Genome Project relied heavily on Caucasian populations, which skews downstream precision medicine applications.
Disease burden disparity: Global South populations (e.g., India) experience higher disease burdens (diabetes, coronary heart disease) but lack access to genomic research and AI-driven diagnostics developed from populations they don't represent.
Data visitation principle: Rather than copying sensitive data to central servers, "data visitation" sends computational queries and code to the data, with only results returned. This preserves sovereignty, prevents data leakage, and enables multi-party collaboration without uploading terabytes of sensitive health information.
Biovault as practical implementation: Open-source, privacy-first platform enabling federated encrypted computation across distributed data sites. Supports arbitrary analysis (RNA sequencing, ML training, clinical inference) while keeping data local and encrypted end-to-end.
Real-world proof of concept (Caribbean genetics): Dr. Kar Weldon's Oxford Nanopore sequencing data in Bermuda was analyzed for allele frequencies and disease classification without being uploaded anywhere. This enabled "world-first" research on data that had previously been inaccessible because of sensitivity concerns.
Federated learning in NHS COVID-19 response: Raspberry Pi devices (£40 credit-card-sized computers running Linux) were deployed to four NHS hospitals. Models were trained locally on each hospital's data, and only the model parameters were securely aggregated centrally. The resulting model generalized better than traditional centralized approaches, reportedly because it had seen a broader diversity of data.
Sovereignty + privacy + utility trade-off: Traditional models force choice between (a) maximum utility with zero privacy/control or (b) zero benefit from isolation. Data visitation aims for "top-right corner": maximum utility and maximum privacy through preservation of governance and encryption.
Ethics must be embedded in technology, not in papers: Given AI's speed and scale, ethics committees, informed consent forms, and regulatory PDFs are insufficient. Ethical principles (purpose, integrity, law, openness) must be built into algorithms and systems architectures.
Quality control as critical use case: Data visitation enables not just analysis but quality assurance—determining real-world data validity and whether datasets accurately represent the populations they purport to represent (essential since AI has no access to ground truth).
Passion + domain expertise = implementation success: Successful projects require (a) individual champions with vision, (b) technologists paired with domain experts (doctors, farmers, clinicians), (c) local data solving local problems (an agricultural solution built for the UK will not transfer to Kerala), and (d) a clear value proposition (money, health, sustainability).
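
The data visitation principle listed above can be illustrated with a minimal Python sketch. This is not Biovault's API: the `DataSite` class, its `visit` method, the `min_group_size` threshold, and the toy `carrier` field are all hypothetical names introduced here. The only idea taken from the panel is that code travels to the data and only results come back.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical record type: one row of sensitive data held at a site.
Record = Dict[str, object]


@dataclass
class DataSite:
    """A data holder that runs submitted code locally and releases only aggregates."""
    name: str
    _records: List[Record]      # stays inside the site; never returned to callers
    min_group_size: int = 5     # assumed disclosure threshold, not from the source

    def visit(self, query: Callable[[List[Record]], Dict[str, float]]) -> Dict[str, float]:
        # Refuse to answer when the group is small enough to identify people.
        if len(self._records) < self.min_group_size:
            raise PermissionError(f"{self.name}: too few records to release a result")
        # The query executes where the data lives.
        result = query(self._records)
        # Only plain numeric summaries may cross the site boundary.
        if not all(isinstance(v, (int, float)) for v in result.values()):
            raise ValueError("query must return numeric aggregates only")
        return result


def carrier_rate(records: List[Record]) -> Dict[str, float]:
    """Example analysis: the share of individuals flagged as variant carriers."""
    carriers = sum(1 for r in records if r["carrier"])
    return {"n": len(records), "carrier_rate": carriers / len(records)}


# Invented toy data: 20 individuals, every fourth one a carrier.
site = DataSite("site_a", [{"carrier": i % 4 == 0} for i in range(20)])
summary = site.visit(carrier_rate)
print(summary)
```

A real deployment would also need to review or sandbox the submitted code, since an arbitrary query could encode individual records in its numeric output. The sketch does not attempt that, and it omits the encryption the panel described.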

Notable Quotes or Statements

"The Global South must lead, not follow. Do we need [expensive Northern solutions]? No. Do we want them? Yes. But we want solutions that are affordable, tailor-made for the people of the Global South, easily accessible, and openly available to everyone."
— Vipin Bhatnagar (implied speaker on Global South representation)

"Data once copied cannot be uncopied. You should consider what it means to share and collaborate with data."
— Mava J, Open Mind Foundation

"We're in a post-AI world now. Technology is amazing. We're in an engineering bottleneck. Bringing this technology to where it needs to go requires boots on the ground—people with expertise who can integrate it with people who have the problems."
— Andrew Sultan or panelist (attribution unclear from transcript)

"The important thing for you, young audience and students, is: don't ask 'what should I work on?' Ask 'what am I passionate about?' Find the data, find the problem, find the value. Be good at what you love, and the solution will follow."
— Vipin Bhatnagar (core message synthesized from panel discussion)

"For it to be truly a public good, it has to touch each and every one of you and your families and your children. We truly live longer and healthier because we fulfilled the promise of the human genome as a public good for everyone."
— Closing remarks speaker (likely Wade or organizing committee member)

"Morality is the basis of all things, and truth is the substance of all morality. Ethics and science have one thing in common: veracity—truth. We have to bring those things together within our AI technologies."
— Francis Cwley, policy/ethics panelist

Speakers & Organizations Mentioned

Speaker	Role / Organization	Contribution
Vipin Bhatnagar (implied)	Global South representation speaker	Framed data equity gap, disease burden, India's infrastructure advantages
Mava J	Principal Engineer, Open Mind Foundation	Presented Biovault: privacy-first data visitation platform
Dr. Kar Weldon	Founder/CEO, Terogenetics (Bermuda)	Real-world Caribbean genetics pilot using data visitation
Dr. Rana Dajani	Professor of Molecular Biology, Hashemite University (Jordan)	Recorded testimony on epigenetics, trauma, refugee populations; tested Biovault
Andrew Sultan	University of Oxford / NHS	COVID-19 screening model using federated learning with Raspberry Pi devices
Francis Cwley	Policy/ethics panelist	Discussed embedded ethics, FAIR/CARE/PILOT principles, quality control frameworks
Dawn Chen	Moderator	Session facilitator and closing remarks
Unidentified closing speaker	Likely Wade or committee member	Human Genome Project 2 vision, multi-stakeholder mobilization

Key Institutions / Programs:

Human Genome Project 2: International initiative (14 countries) for equitable global genomic research
Open Mind Foundation: Developer of Biovault platform
Oxford University Hospitals / NHS: Federated learning COVID-19 screening deployment across Birmingham, Bedford, Portsmouth, Oxford
WHO: Referenced for global health summit context
European Open Science Cloud & Research Data Alliance: Published ethics guidelines for AI in health research
ICMR (Indian Council of Medical Research): Mentioned as strong ethics regulator in India
Aadhaar: India's digital public infrastructure (referenced as asset for data-driven health solutions)

Technical Concepts & Resources

Methodologies & Frameworks

Concept	Definition	Application
Data Visitation	Code/queries travel to data; only results returned; data never copied or centralized	Multi-institutional genomic research, privacy-preserving collaboration
Federated Learning	Model training distributed across decentralized nodes; only model parameters (weights) aggregated centrally	NHS COVID screening: trained on 4 hospitals simultaneously without data movement
Federated Encrypted Computation	Peer-to-peer topology with end-to-end encryption; supports >2 parties collaborating	Biovault's infrastructure for arbitrary multi-party analysis
FAIR Principles	Findability, Accessibility, Interoperability, Reusability (for data)	Standard for open science data stewardship
CARE Principles	Collective Benefit, Authority to Control, Responsibility, Ethics (for Indigenous/community data)	Equity-centered data governance
PILOT Principles	Purpose, Integrity, Law, Openness, Test (for AI in health)	Emerging framework for ethics embedded in AI systems
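
The federated learning row in the table above can be made concrete with a small sketch of federated averaging. It is an illustration under stated assumptions, not the NHS deployment: the linear model, the four toy "hospital" datasets, the learning rate, and all function names are invented here. It also uses a plain weighted average, whereas the panel described secure aggregation.

```python
from typing import List, Tuple

Site = List[Tuple[float, float]]   # (x, y) pairs held privately by one site


def local_update(weights: List[float], data: Site, lr: float = 0.1) -> List[float]:
    """One gradient step of y ~ w*x + b, computed on a single site's own data."""
    w, b = weights
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return [w - lr * grad_w, b - lr * grad_b]


def federated_average(updates: List[Tuple[List[float], int]]) -> List[float]:
    """Average parameter vectors, weighting each site by its sample count."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


def train(sites: List[Site], rounds: int = 300) -> List[float]:
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        # Each site receives the current model and returns parameters only.
        updates = [(local_update(global_weights, data), len(data)) for data in sites]
        global_weights = federated_average(updates)
    return global_weights


def make_site(xs: List[float]) -> Site:
    """Invented toy data following y = 2x + 1."""
    return [(x, 2 * x + 1) for x in xs]


sites = [
    make_site([0.0, 1.0]),
    make_site([2.0, 3.0, 4.0]),
    make_site([0.5, 1.5]),
    make_site([2.5, 3.5, 0.25]),
]
final_weights = train(sites)
print(final_weights)   # approaches the generating parameters w = 2, b = 1
```

The coordinator never sees a single `(x, y)` pair, only each site's updated parameters and its sample count. With one local step per round, the weighted average is equivalent to a gradient step on the pooled data, which is why the toy model recovers the same fit that centralized training would.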

Technologies & Platforms

Biovault: Open-source, desktop utility; supports Windows; end-to-end encryption; extensions via Nextflow, Jupyter; keeps data local via facade interface
Raspberry Pi: £40 (~5,000 INR) credit-card-sized Linux computer used as federated learning node in NHS hospitals; removable micro SD card for secure data destruction
Oxford Nanopore sequencing: Long-read DNA sequencing technology used in Bermuda pilot
Nextflow: Workflow framework supported by Biovault for computational analysis
Jupyter: Notebook interface for interactive analysis on Biovault
Ubuntu: Operating system run on Raspberry Pi nodes with user-friendly GUI

Datasets & Research Domains

Human Genome Project (original, 2003): Caucasian-dominated reference; genomic blueprint but lacked diversity
Human Genome Project 2 (2026): Expanded to 14 countries; goal: precision public health for global populations
Single-cell RNA sequencing: Advanced genomics for cancer cell analysis (use case demonstrated in Biovault preprint)
Clinical datasets: Remote inference on large NHS datasets using federated learning
Caribbean population genetic data: Allele frequencies, disease classification (APOL1 chronic kidney disease prevalence in ancestry groups) analyzed via Biovault without upload
gnomAD (Genome Aggregation Database): Global ancestry and variant database; used for comparative validation in Caribbean study
NHS COVID-19 screening data: 72,000 patients across 4 sites; tabular vital signs + blood panels; 45-minute screening result vs. 3+ hours for PCR tests
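
The allele-frequency analysis and comparative validation mentioned in this list reduce to simple arithmetic, sketched below. The genotype values, the reference frequency, and the `tolerance` threshold are invented for illustration and are not results from the Caribbean study or from any reference database.

```python
from typing import List


def alt_allele_frequency(genotypes: List[int]) -> float:
    """Alternate-allele frequency from per-person alt-allele counts (0, 1, or 2).

    Each diploid individual carries two alleles, so the denominator is 2 * n.
    """
    if not genotypes:
        raise ValueError("no genotypes supplied")
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("each genotype must be 0, 1, or 2")
    return sum(genotypes) / (2 * len(genotypes))


def frequency_gap(local_af: float, reference_af: float) -> float:
    """Absolute difference between a local and a reference allele frequency."""
    return abs(local_af - reference_af)


# Invented local cohort of 10 people: 5 alt alleles out of 20 in total.
local_genotypes = [0, 1, 2, 0, 1, 0, 0, 1, 0, 0]
af = alt_allele_frequency(local_genotypes)

reference_af = 0.10   # hypothetical reference value
tolerance = 0.05      # assumed threshold for flagging a variant for review
gap = frequency_gap(af, reference_af)
flagged = gap > tolerance

print(f"local AF = {af:.2f}, gap vs reference = {gap:.2f}, flagged = {flagged}")
```

A large gap can mean the reference under-represents the local ancestry group, or it can point to a data quality problem at the site. Telling the two apart is the kind of quality control question the panel raised.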

Papers & Resources Mentioned

Biovault preprint (uploaded to bioRxiv days before conference): Implementation details, single-cell RNA analysis, ML training, privacy-preserving genome analysis, real-world use cases
"Defensible business model" blog post (referenced by Mava J): Discusses private data as defensible resource in AI economy
European Open Science Cloud & Research Data Alliance publications: Ethics guidelines for AI in health research; informed consent frameworks; quality control considerations for AI-driven biomedical research
QR code (conference slide): Direct link to Biovault preprint on bioRxiv

Key Regulatory/Policy Frameworks Mentioned

GDPR (EU) & UK Data Protection Law: Restrict cross-border human data transfer; necessitate federated approaches
ICMR (India): Strong ethics regulations for biomedical research
Data Sovereignty: Legal requirement in many countries (e.g., Jordan, Bermuda) limiting data export
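
One way to read the panel's call to embed governance in code is to express export restrictions like those above as a machine-checkable policy that runs before any analysis. The sketch below is an assumption about how that could look, not a description of any platform or law: `SitePolicy`, `AnalysisRequest`, `check_request`, and the purpose labels are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class SitePolicy:
    """Rules a data holder declares once and enforces on every request."""
    allowed_purposes: FrozenSet[str]
    allow_row_export: bool = False   # assumed default: row-level data stays local


@dataclass(frozen=True)
class AnalysisRequest:
    purpose: str        # declared reason for the analysis
    output_kind: str    # "aggregate" or "rows"


def check_request(policy: SitePolicy, request: AnalysisRequest) -> List[str]:
    """Return a list of violations; an empty list means the request may run."""
    violations = []
    if request.purpose not in policy.allowed_purposes:
        violations.append(f"purpose '{request.purpose}' is not approved")
    if request.output_kind == "rows" and not policy.allow_row_export:
        violations.append("row-level export is not permitted")
    elif request.output_kind not in ("aggregate", "rows"):
        violations.append(f"unknown output kind '{request.output_kind}'")
    return violations


policy = SitePolicy(allowed_purposes=frozenset({"kidney_disease_research"}))

ok = check_request(policy, AnalysisRequest("kidney_disease_research", "aggregate"))
blocked = check_request(policy, AnalysisRequest("marketing", "rows"))

print(ok)        # no violations
print(blocked)   # two violations: unapproved purpose and row export
```

Because the check returns reasons rather than a bare yes or no, a refused request leaves an auditable record of which rule blocked it.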

Conclusion

This summit panel synthesized cutting-edge technical innovations (data visitation, federated learning) with urgent equity challenges (Global South underrepresentation in genomics, disease burden disparities) and ethical imperatives (embedding governance in code). The core message: technology enables equitable collaboration only when paired with local leadership, cross-sector commitment, and ethics-first design. Real deployments (Caribbean genetics, NHS COVID screening) demonstrate viability; Human Genome Project 2 signals global intent. Success requires individual champions, institutional support, policy frameworks, and a fundamental reorientation toward human health outcomes, rather than papers published or companies funded, as the measure of success.