AI and the State: Governing Intelligence in Government
Executive Summary
This panel discussion examines the unique tensions governments face as both regulators and users of AI systems. Unlike private companies, government AI deployments carry profound democratic accountability stakes: failures aren't brand problems but threats to democratic legitimacy, equity, and social cohesion. The panelists argue that governments must establish independent evaluation ecosystems, maintain transparency mechanisms, and develop comprehensive national AI strategies grounded in human-centered principles rather than reactive responses to technological developments.
Key Takeaways
- Governments must develop explicit, comprehensive national AI strategies tied to industrial policy, regulation, and evaluation, not reactive piecemeal responses. Strategy should articulate what AI will do for the nation and how to achieve it across all policy levers.
- Independent third-party evaluation ecosystems are necessary infrastructure, not optional nice-to-haves. As in aviation, finance, medicine, and education, AI-impacted sectors require independent evaluators. This is both an ethical imperative and a massive market opportunity.
- Governments should model best practices, not worst practices: demonstrating how to deploy AI safely, transparently, and accountably rather than racing to the bottom. This sets norms for industry and other nations.
- Transparency and accountability must be built into system design, not added afterward. This includes documentation of limitations, mechanisms for citizens to understand how they're affected, and chains of accountability even when systems are autonomous.
- International coordination is essential but difficult. Without alignment on standards, enforcement, and transparency expectations, countries hosting powerful labs will face unresolvable conflicts of interest, and regulatory arbitrage will undermine protections globally.
Key Topics Covered
- Dual role conflict: Governments as both regulators and deployers of AI systems
- Accountability vs. autonomy: Tensions between autonomous AI systems and democratic accountability mechanisms
- Evaluation and testing infrastructure: Gaps in government capacity for responsible AI assessment
- Sovereign AI strategies: Risks and benefits of national AI development for developing nations
- Regulatory frameworks: EU AI Act, UK legislation, US state-level approaches, and international alignment
- Conflict of interest: Countries hosting frontier AI labs facing pressure to both regulate and compete
- Failure modes: Deployment risks across benefits administration, healthcare, national security
- Trust and transparency: erosion of democratic legitimacy through opaque algorithmic decision-making
- International governance: Mechanisms for countries to hold each other accountable
- Documentation and standards: Technical standards as tools for cross-border accountability
Key Points & Insights
- Government AI failures have democratic consequences: When private companies' AI systems fail, it damages their brand. When governments' systems fail (wrongful benefit denials, discrimination, surveillance), it erodes democratic legitimacy, trust, and social cohesion itself.
- The "dogfooding" principle: Governments that deploy AI without robust safeguards are essentially testing their regulatory systems on themselves. If regulatory gaps exist, government deployments expose them first and most severely.
- Hidden capabilities create blind spots: The most powerful AI models are no longer published; they are kept inside companies for cost and control reasons. This creates an "internal deployment" frontier where governments and the public have minimal visibility into what is happening.
- Terminology and evaluation saturation: Vague language ("AI") conflates narrow use cases with general deployment, making it difficult to understand real-world impacts. Additionally, standard evaluations now saturate near 100% performance, offering no meaningful signal about actual safety or reliability.
- Insufficient investment in assurance infrastructure: Most government interest focuses on sovereign data centers and model development, while investment in privacy, security, testing, and responsible-use evaluation remains critically insufficient.
- Measurement precedes meaningful governance: Terms like "manipulation," "bias," "discrimination," and "critical thinking impact" must be operationalized and quantified before they can be effectively governed. This requires deep scientific intervention from quantitative social scientists, not just computer scientists.
- Countries with AI labs face acute conflicts of interest: Nations hosting frontier AI development face competing pressures: economic and military competitiveness depends on these labs, while regulatory responsibility demands oversight. This creates incoherence (e.g., export controls vs. corporate pressure).
- Citizens have fundamentally different relationships to state vs. commercial AI: Citizens cannot opt out of government services, cannot switch jurisdictions easily, and have rights to understand how state systems affect them, expectations that differ sharply from consumer relationships with commercial products.
- A "race to the bottom" is emerging among nation-states: Just as private companies competed to minimize safety investments, governments may now compete to attract AI development through deregulation, eroding hard-fought protections across jurisdictions.
- Documentation and international standards are foundational but insufficient: Technical documentation, incident reporting, and benchmarking standards exist across frameworks (EU AI Act, NIST AI RMF, etc.), but governments rarely hold themselves to the same standards they impose on industry. International law mechanisms remain underdeveloped for AI-specific harms (transboundary interference, election manipulation, infrastructure attacks).
Notable Quotes or Statements
- Gaia Marcus (Ada Lovelace Institute): "When governments deploy AI systems themselves, I think of them essentially dogfooding their regulatory system... if there are liability gaps, governance gaps, government often suffers from that."
- Jaan Tallinn (FLI/CSER): "The biggest failure mode doesn't happen at the application level... [but] inside the companies... we now have lost awareness of what is happening inside the labs."
- Rumman Chowdhury: "If we are going to evolve the process of evaluation, we actually have to evolve the process of measurement... [These] are concepts that need to be measured and quantified."
- Alondra Nelson: "If it fails in the government side... the erosion to democratic societies, the erosion to society is really profound... We need to move the Overton window [and ask countries to be] the best model of how these tools and systems could be used, not the worst model."
- Stephanie Iffland (Partnership on AI): "You need to have the tools to determine whether you trust [a system]. How do we ensure we have documentation around limitations that models have?"
- Panel consensus: "Move from voluntary commitments to mandatory standards that are deployable, enforceable, and ratified at the domestic level."
Speakers & Organizations Mentioned
Panelists:
- Gaia Marcus – Director, Ada Lovelace Institute; former UK civil service (data/AI strategy)
- Jaan Tallinn – Founding engineer, Skype and Kazaa; co-founder, Centre for the Study of Existential Risk (CSER) and Future of Life Institute (FLI)
- Dr. Rumman Chowdhury – Data scientist and responsible AI researcher; worked with the Biden administration, the DEF CON AI red-teaming event, and NIST's ARIA program
- Stephanie Iffland – Senior Managing Director of Public Policy, Partnership on AI; former UK government (digital standards policy)
- Dr. Christine Custis – Program manager, Science, Technology, and Social Values Lab (moderator)
- Dr. Alondra Nelson – former acting director, White House Office of Science and Technology Policy (OSTP); led the Blueprint for an AI Bill of Rights
Organizations & Initiatives:
- Ada Lovelace Institute
- Future of Life Institute (FLI)
- Centre for the Study of Existential Risk (CSER), University of Cambridge
- Partnership on AI
- White House Office of Science and Technology Policy (OSTP)
- NIST (National Institute of Standards and Technology) – AI Risk Management Framework (AI RMF), ARIA program
- UK Government (data/AI strategy, digital standards policy)
- EU (AI Act)
- AI companies: OpenAI, Google, Microsoft, Anthropic, Nvidia
- Ashoka University
- Center for AI and Digital Policy
- Humane Intelligence (public benefit corporation)
- Collective Intelligence Project
Government Initiatives:
- Blueprint for an AI Bill of Rights (US)
- EU AI Act
- NIST AI Risk Management Framework (AI RMF)
- Seoul Summit commitments
- G7 Code of Conduct
Technical Concepts & Resources
Evaluation & Testing Methodologies:
- DEF CON AI red-teaming event (Biden administration-supported, frontier model evaluation)
- ARIA program (NIST-led, citizen-participation red teaming)
- NIST AI Risk Management Framework (AI RMF)
- Operationalization of terms (privacy, security, bias, manipulation, discrimination, reliance, critical thinking impact)
- Measurement frameworks for AI impact on cognition and behavior
- Benchmarking and metrology standards
- Technical documentation requirements and interoperability
AI System Concepts:
- Frontier models (internal vs. published versions)
- Unsupervised/self-supervised learning (pretraining regime for large models)
- Model distillation (creating smaller deployable models from larger ones)
- Autonomous agents and "internal deployment" (AI systems operating with limited or no human-in-the-loop decision-making)
- Recursive self-improvement and loss of control risks
- Hallucinations in generative AI (erroneous outputs)
- Parasocial relationships and mental health impacts
Governance & Policy Concepts:
- "Dog fooding" regulatory systems (government testing its own rules)
- Chain of accountability (civil service principle)
- Justified/calibrated trust (tools enabling informed consent)
- Independent third-party evaluation ecosystem
- International standards (ISO, technical standards for cross-border accountability)
- Export controls on semiconductors
- Data sovereignty and national AI strategies
- Regulatory frameworks and interoperability of documentation requirements
Case Studies & Applications:
- Transcription tools in social work (hallucinations affecting care records)
- Algorithmic pricing notification laws (NYC, NY State)
- AI in benefits administration and social welfare decisions
- AI in medical research and healthcare
- Agents in government services
- Potential military/autonomous weapons applications
Institutions & Standards Referenced:
- NIST AI Risk Management Framework (AI RMF)
- EU AI Act
- UK AI governance frameworks
- ISO standards for cross-border consistency
- G7 Code of Conduct on AI governance
Research Papers/Works Mentioned:
- "Learn Fast and Build Things" – Ada Lovelace Institute report (32 government AI/data use cases over 6 years)
- "Grown Up" – research on 14-24 year olds growing up with digital technology (with Enough Foundation)
- AI governance stack analysis (2020 baseline, 2025 update with 13 levels)
- Partnership on AI research on interoperability of documentation requirements (8 international policy frameworks)
- Paper on agents and international law (September, Partnership on AI)
- Paper on a robust assurance ecosystem (released the Friday after the panel)
Context Notes
- Event: AI Summit (India-based, given references to "this summit" and Indian attendees)
- Timeframe: Discussion reflects 2024-2025 developments; ChatGPT's November 2022 launch is used as a reference point
- Geopolitical Context: Strong discussion of US-EU-China dynamics, concerns about countries hosting leading labs, references to UN Security Council and Article 109 (raised by audience member)
- Key Tension: Democratic/human-centered AI governance vs. geopolitical/competitive pressures driving deregulation
