AI and the State: Governing Intelligence in Government
Executive Summary
This panel discussion examines the unique tensions governments face as both regulators and users of AI systems. Unlike private companies, government AI deployments carry profound democratic accountability stakes: failures aren't brand problems but threats to democratic legitimacy, equity, and social cohesion. The panelists argue that governments must establish independent evaluation ecosystems, maintain transparency mechanisms, and develop comprehensive national AI strategies grounded in human-centered principles rather than reactive responses to technological developments.
Key Takeaways
- Governments must develop explicit, comprehensive national AI strategies tied to industrial policy, regulation, and evaluation, not reactive piecemeal responses. Strategy should articulate what AI will do for the nation and how to achieve it across all policy levers.
- Independent third-party evaluation ecosystems are necessary infrastructure, not optional nice-to-haves. As in aviation, finance, medicine, and education, AI-impacted sectors require independent evaluators. This is both an ethical imperative and a massive market opportunity.
- Governments should model best practices, not worst practices: demonstrating how to deploy AI safely, transparently, and accountably rather than racing to the bottom. This sets norms for industry and other nations.
- Transparency and accountability must be built into system design, not added afterward. This includes documentation of limitations, mechanisms for citizens to understand how they're affected, and chains of accountability even when systems are autonomous.
- International coordination is essential but difficult. Without alignment on standards, enforcement, and transparency expectations, countries hosting powerful labs will face unresolvable conflicts of interest, and regulatory arbitrage will undermine protections globally.
Key Topics Covered
- Dual role conflict: Governments as both regulators and deployers of AI systems
- Accountability vs. autonomy: Tensions between autonomous AI systems and democratic accountability mechanisms
- Evaluation and testing infrastructure: Gaps in government capacity for responsible AI assessment
- Sovereign AI strategies: Risks and benefits of national AI development for developing nations
- Regulatory frameworks: EU AI Act, UK legislation, US state-level approaches, and international alignment
- Conflict of interest: Countries hosting frontier AI labs facing pressure to both regulate and compete
- Failure modes: Deployment risks across benefits administration, healthcare, national security
- Trust and transparency: erosion of democratic legitimacy through opaque algorithmic decision-making
- International governance: Mechanisms for countries to hold each other accountable
- Documentation and standards: Technical standards as tools for cross-border accountability
Key Points & Insights
- Government AI failures have democratic consequences: When private companies' AI systems fail, it damages their brand. When governments' systems fail (wrongful benefit denials, discrimination, surveillance), it erodes democratic legitimacy, trust, and social cohesion itself.
- The "dogfooding" principle: Governments that deploy AI without robust safeguards are essentially testing their regulatory systems on themselves. If regulatory gaps exist, government deployments expose them first and most severely.
- Hidden capabilities create blind spots: The most powerful AI models are no longer published; they are kept inside companies for cost and control reasons. This creates an "internal deployment" frontier where governments and the public have minimal visibility into what is happening.
- Terminology and evaluation saturation: Vague language ("AI") conflates narrow use cases with general deployment, making it difficult to understand real-world impacts. Additionally, standard evaluations now saturate near 100% performance, offering no meaningful signal about actual safety or reliability.
- Insufficient investment in assurance infrastructure: Most government interest focuses on sovereign data centers and model development, while investment in privacy, security, testing, and responsible-use evaluation remains critically insufficient.
- Measurement precedes meaningful governance: Terms like "manipulation," "bias," "discrimination," and "critical thinking impact" must be operationalized and quantified before they can be effectively governed. This requires deep scientific intervention from quantitative social scientists, not just computer scientists.
- Countries with AI labs face acute conflicts of interest: Nations hosting frontier AI development face competing pressures: economic and military competitiveness depends on these labs, while regulatory responsibility demands oversight. This creates incoherence (e.g., export controls vs. corporate pressure).
- Citizens have fundamentally different relationships to state vs. commercial AI: Citizens cannot opt out of government services, cannot switch jurisdictions easily, and have rights to understand how state systems affect them, expectations that differ sharply from consumer relationships with commercial products.
- A "race to the bottom" is emerging among nation-states: Just as private companies competed to minimize safety investments, governments may now compete to attract AI development through deregulation, eroding hard-fought protections across jurisdictions.
- Documentation and international standards are foundational but insufficient: Technical documentation, incident reporting, and benchmarking standards exist across frameworks (EU AI Act, NIST AI RMF, etc.), but governments rarely hold themselves to the same standards they impose on industry. International law mechanisms remain underdeveloped for AI-specific harms (transboundary interference, election manipulation, infrastructure attacks).
Notable Quotes or Statements
- Gaia Marcus (Ada Lovelace Institute): "When governments deploy AI systems themselves, I think of them essentially dogfooding their regulatory system... if there are liability gaps, governance gaps, government often suffers from that."
- Jaan Tallinn (FLI/CSER): "The biggest failure mode doesn't happen at the application level... [but] inside the companies... we now have lost awareness of what is happening inside the labs."
- Rumman Chowdhury: "If we are going to evolve the process of evaluation, we actually have to evolve the process of measurement... [These] are concepts that need to be measured and quantified."
- Alondra Nelson: "If it fails in the government side... the erosion to democratic societies, the erosion to society is really profound... We need to move the Overton window [and ask countries to be] the best model of how these tools and systems could be used, not the worst model."
- Stephanie Iffland (Partnership on AI): "You need to have the tools to determine whether you trust [a system]. How do we ensure we have documentation around limitations that models have?"
- Panel consensus: "Move from voluntary commitments to mandatory standards that are deployable, enforceable, and ratified at the domestic level."
Speakers & Organizations Mentioned
Panelists:
- Gaia Marcus – Director, Ada Lovelace Institute; former UK civil service (data/AI strategy)
- Jaan Tallinn – Founding engineer, Skype and Kazaa; co-founder, Centre for the Study of Existential Risk (CSER) and Future of Life Institute (FLI)
- Dr. Rumman Chowdhury – Data scientist and responsible AI researcher; worked with the Biden administration, the DEF CON AI red-teaming event, and NIST's ARIA program
- Stephanie Iffland – Senior Managing Director of Public Policy, Partnership on AI; former UK government (digital standards policy)
- Dr. Christine Custis – Program manager, Science, Technology, and Social Values Lab (moderator)
- Dr. Alondra Nelson – former acting director, White House Office of Science and Technology Policy (OSTP); led the Blueprint for an AI Bill of Rights
Organizations & Initiatives:
- Ada Lovelace Institute
- Future of Life Institute (FLI)
- Centre for the Study of Existential Risk (CSER), University of Cambridge
- Partnership on AI
- White House Office of Science and Technology Policy (OSTP)
- NIST (National Institute of Standards and Technology) – AI Risk Management Framework (AI RMF), ARIA program
- UK Government (data/AI strategy, digital standards policy)
- EU (AI Act)
- AI companies: OpenAI, Google, Microsoft, Anthropic, Nvidia
- Ashoka University
- Center for AI and Digital Policy
- Humane Intelligence (public benefit corporation)
- Collective Intelligence Project
Government Initiatives:
- Blueprint for an AI Bill of Rights (US)
- EU AI Act
- NIST AI Risk Management Framework (AI RMF)
- Seoul Summit commitments
- G7 Code of Conduct
Technical Concepts & Resources
Evaluation & Testing Methodologies:
- DEF CON AI red-teaming event (Biden administration-supported, frontier model evaluation)
- ARIA program (NIST-led, citizen-participation red teaming)
- NIST AI Risk Management Framework (AI RMF)
- Operationalization of terms (privacy, security, bias, manipulation, discrimination, reliance, critical thinking impact)
- Measurement frameworks for AI impact on cognition and behavior
- Benchmarking and metrology standards
- Technical documentation requirements and interoperability
AI System Concepts:
- Frontier models (internal vs. published versions)
- Unsupervised/self-supervised learning (pretraining regime for large models)
- Model distillation (creating smaller deployable models from larger ones)
- Autonomous agents and "internal deployment" (AI systems operating with limited or no human-in-the-loop decision-making)
- Recursive self-improvement and loss of control risks
- Hallucinations in generative AI (erroneous outputs)
- Parasocial relationships and mental health impacts
Governance & Policy Concepts:
- "Dog fooding" regulatory systems (government testing its own rules)
- Chain of accountability (civil service principle)
- Justified/calibrated trust (tools enabling informed consent)
- Independent third-party evaluation ecosystem
- International standards (ISO, technical standards for cross-border accountability)
- Export controls on semiconductors
- Data sovereignty and national AI strategies
- Regulatory frameworks and interoperability of documentation requirements
Case Studies & Applications:
- Transcription tools in social work (hallucinations affecting care records)
- Algorithmic pricing notification laws (NYC, NY State)
- AI in benefits administration and social welfare decisions
- AI in medical research and healthcare
- Agents in government services
- Potential military/autonomous weapons applications
Institutions & Standards Referenced:
- NIST AI Risk Management Framework (AI RMF)
- EU AI Act
- UK AI governance frameworks
- ISO standards for cross-border consistency
- G7 Code of Conduct on AI governance
Research Papers/Works Mentioned:
- "Learn Fast and Build Things" – Ada Lovelace Institute report (32 government AI/data use cases over 6 years)
- "Grown Up" – research on 14-24 year olds growing up with digital technology (with Enough Foundation)
- AI governance stack analysis (2020 baseline, 2025 update with 13 levels)
- Partnership on AI research on interoperability of documentation requirements (8 international policy frameworks)
- Paper on agents and international law (September, Partnership on AI)
- Paper on a robust assurance ecosystem (released the Friday after the panel)
Context Notes
- Event: AI Summit (India-based, given references to "this summit" and Indian attendees)
- Timeframe: Discussion reflects 2024-2025 developments; ChatGPT's November 2022 launch is used as a reference point
- Geopolitical Context: Strong discussion of US-EU-China dynamics, concerns about countries hosting leading labs, references to UN Security Council and Article 109 (raised by audience member)
- Key Tension: Democratic/human-centered AI governance vs. geopolitical/competitive pressures driving deregulation
