DP13 – AI Containment
| ID: | ML-Draft-004 |
| Title: | DP13 – AI Containment |
| Status: | approved |
| Authors: | The Meta-Layer Initiative |
| Group: | N/A |
| Date: | 2026-05-04 |
| Revision: | 00 |
| Pages: | 6 |
| Words: | 2899 |
DP13 ensures that AI behavior is not only governed in principle but technically constrained in practice. It defines containment as the enforcement of explicit, machine-level boundaries on what AI systems can do and how they can influence participants, across dimensions such as scope, tools, data access, rate, and interaction patterns.
For governance actors, DP13 is the execution layer of trust. It translates community-defined rules (DP12) and ethical requirements (DP11) into runtime enforcement through mechanisms like permission systems, rate limits, sandboxing, and secure execution environments. It also extends containment beyond capability to include influence containment, addressing risks such as manipulation, persuasion, and emotional overreach.
In the Gov Hub, DP13 provides the infrastructure for verifiable safety. Participants and communities can inspect, audit, and modify the constraints governing AI systems, ensuring that safety is not assumed but demonstrably enforced. This enables rapid intervention, recovery from failures, and scalable coexistence with both internal and external AI agents.
Purpose of This Draft
This draft articulates Desirable Property 13 (DP13) as the Meta-Layer’s requirement that AI behavior is bounded by enforceable constraints at runtime. These constraints limit scope, tools, data access, rate, and persistence so that when systems misbehave, impact is contained and recovery is possible.
If DP11 defines what must be safe and ethical, and DP12 defines who sets the rules, DP13 defines how those rules are made real in execution.
Containment is not a policy statement. It is a property of the system’s runtime behavior.
In today’s web, AI systems increasingly operate with:
broad tool access
persistent memory
network reach
opaque update pathways
Controls are often advisory rather than enforceable. As a result:
systems can act beyond intended scope
failures propagate quickly and at scale
rollback and recovery are difficult
users cannot verify whether constraints are actually applied
At the same time, a growing class of risk comes from external agents that users do not deploy or control. These agents may:
attempt to influence beliefs or decisions
generate persuasive or misleading content at scale
coordinate to shape narratives or perception
In these cases, the primary risk is not cost or resource usage, but harm to understanding, trust, and agency.
Containment must therefore address both:
internal agents (those a user or community deploys)
external agents (those acting upon participants)
Containment must be default-on, visible, and testable.
Core Principle
Every AI actor operates within explicit, machine-enforced boundaries over scope, time, rate, data, tools, and influence, with observable state and rapid shutdown, unless a community-defined policy (DP12) specifies otherwise.
Containment must protect not only against what an agent can do, but also against how it can affect participants.
Containment is effective when:
participants can see the boundaries and influence conditions
governance can modify them
the system enforces them at runtime
This includes protections against external agents attempting to manipulate, confuse, or unduly influence users.
Containment must also remain effective not only where an agent is deployed, but wherever it operates, integrates, or propagates across systems.
Containment operates across two distinct but related domains:
Capability containment: what agents can do (tools, scope, time, resources)
Influence containment: how agents affect participants and shared environments (perception, behavior, collective understanding)
While capability containment is critical for agents deployed by users or communities, the dominant risk in open environments comes from external agents shaping perception, behavior, and collective reality.
The following dimensions apply across both domains, with varying emphasis depending on context.
Scope defines which domains, datasets, and actions are in-bounds.
This applies both to what an internal agent can do on a user’s behalf and to the types of interactions external agents are permitted to have with participants.
Default posture is deny-by-default for high-risk capabilities and high-risk interaction patterns.
Example: An AI assistant can summarize documents but cannot access financial accounts or initiate transactions without explicit permission. Similarly, external agents may be restricted from initiating certain categories of interaction (e.g., unsolicited persuasion or sensitive-topic engagement with minors).
---
Time and budget limits bound duration, compute, tokens, and financial spend.
These constraints primarily apply to internal agents, where limiting execution time and resource consumption prevents runaway behavior.
For external agents, these limits are only indirectly relevant. Users may not control an external agent’s budget, but bounding interaction windows and execution pathways can still limit persistent or looping engagement patterns.
Example: Autonomous tasks expire after a set time or budget threshold, preventing runaway loops.
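The expiry behavior in this example can be sketched as a small budget object that a runtime consults before every step. This is a minimal illustration under stated assumptions, not a prescribed implementation: the names `SessionBudget`, `BudgetExceeded`, and `charge` are hypothetical, and tokens stand in for any metered resource (compute, spend).

```python
import time


class BudgetExceeded(Exception):
    """Raised when a session runs past its time or token allowance."""


class SessionBudget:
    """Hypothetical tracker for one autonomous task's deadline and token allowance."""

    def __init__(self, max_seconds, max_tokens, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._deadline = clock() + max_seconds
        self.tokens_left = max_tokens

    def charge(self, tokens):
        """Call before each step; raises instead of letting the task continue."""
        if self._clock() >= self._deadline:
            raise BudgetExceeded("time limit reached")
        if tokens > self.tokens_left:
            raise BudgetExceeded("token budget exhausted")
        self.tokens_left -= tokens
```

The check runs before the work is done, so a task that is out of time or budget stops at the boundary and cannot overshoot it by one more step.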
---
Rate limits cap message volume, API calls, and propagation effects.
This applies especially to external agents attempting to influence at scale.
Example: An AI cannot post or respond beyond a defined rate, limiting virality, coordinated messaging, or synthetic amplification.
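One way to picture such a posting cap is a per-agent sliding window. The sketch below is illustrative only; `SlidingWindowLimiter` and its parameters are assumed names, and a real deployment would also need to address coordination across many agent identities, which a per-agent counter alone does not cover.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical limiter: at most `max_events` per agent in any `window_seconds`."""

    def __init__(self, max_events, window_seconds, clock=time.monotonic):
        self._max = max_events
        self._window = window_seconds
        self._clock = clock  # injectable for testing
        self._events = {}    # agent_id -> deque of accepted event timestamps

    def allow(self, agent_id):
        """Return True and record the event if the agent is under its cap."""
        now = self._clock()
        history = self._events.setdefault(agent_id, deque())
        # Drop timestamps that have aged out of the window.
        while history and now - history[0] >= self._window:
            history.popleft()
        if len(history) >= self._max:
            return False
        history.append(now)
        return True
```

Rejected attempts are not recorded, so an agent that keeps retrying does not extend its own lockout, but it also gains nothing until earlier posts age out of the window.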
---
Execution occurs in isolated environments with no ambient access to secrets.
Example: Untrusted code runs in a sandbox with no network egress unless explicitly granted.
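The "no ambient access to secrets" half of this dimension can be illustrated with a child process that receives a scrubbed environment. This is only a partial sketch: it is not a sandbox, it does not block network egress (that requires OS-level or container-level isolation), and the helper name `run_without_ambient_secrets` is hypothetical.

```python
import os
import subprocess
import sys


def run_without_ambient_secrets(code, timeout=5.0):
    """Run Python `code` in a child process that inherits no parent environment.

    Assumption: secrets reach the parent only via environment variables.
    Files, sockets, and network access are NOT restricted by this sketch.
    """
    clean_env = {"PATH": os.defpath}  # minimal environment, nothing inherited
    result = subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores PYTHON* vars
        env=clean_env,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout.strip()
```

The point of the sketch is the default: the child starts with nothing and would have to be granted each secret explicitly, rather than starting with everything the parent holds.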
---
Tool permissions are expressed as explicit allowlists for tools and actions.
This applies differently across internal and external agents:
for internal agents, it defines what the agent is permitted to do on a user’s behalf
for external agents, it defines what kinds of actions or interactions are permitted within the environment and originating from it (e.g., posting, messaging, initiating contact)
Example: An agent may read documents but cannot send emails or execute payments without user confirmation. Similarly, an external agent may be allowed to respond within a thread but not to initiate transactions, send unsolicited messages, or perform actions that affect user state.
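A tool allowlist with a confirmation step might look like the following gate, through which every tool call passes. The class `ToolGate`, the exception `ToolDenied`, and the `confirm` callback are assumptions introduced for illustration; the draft does not prescribe this interface.

```python
class ToolDenied(Exception):
    """Raised when a tool call is outside the allowlist or not confirmed."""


class ToolGate:
    """Hypothetical gate: deny by default, confirm before high-risk tools."""

    def __init__(self, allowlist, confirm_required, confirm):
        self._allowlist = frozenset(allowlist)
        self._confirm_required = frozenset(confirm_required)
        self._confirm = confirm  # callback: (tool_name, args) -> bool

    def call(self, tool_name, func, *args):
        """Invoke `func` only if the tool is allowed and, where needed, confirmed."""
        if tool_name not in self._allowlist:
            raise ToolDenied(f"{tool_name} is not on the allowlist")
        if tool_name in self._confirm_required and not self._confirm(tool_name, args):
            raise ToolDenied(f"{tool_name} was not confirmed by the user")
        return func(*args)
```

Because the gate sits between the agent and the tool function, an unlisted tool is never invoked at all, which is the difference between an enforced allowlist and an advisory one.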
---
Kill switches provide immediate shutdown pathways at user, operator, and community levels.
Example: A community can pause all AI agents in a zone when anomalous behavior is detected.
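The three shutdown levels can be sketched as a registry that an agent runtime consults before every action. The names below (`KillSwitch`, `pause_zone`, and so on) are hypothetical, and the mapping of "user" to a single agent and "community" to a zone is an assumption made for the sketch.

```python
class KillSwitch:
    """Hypothetical pause registry consulted before every agent action."""

    def __init__(self):
        self._paused_agents = set()     # user-level: one specific agent
        self._paused_operators = set()  # operator-level: everything an operator runs
        self._paused_zones = set()      # community-level: every agent in a zone

    def pause_agent(self, agent_id):
        self._paused_agents.add(agent_id)

    def pause_operator(self, operator_id):
        self._paused_operators.add(operator_id)

    def pause_zone(self, zone):
        self._paused_zones.add(zone)

    def may_act(self, agent_id, operator_id, zone):
        """An agent may act only if no level that covers it has been paused."""
        return (
            agent_id not in self._paused_agents
            and operator_id not in self._paused_operators
            and zone not in self._paused_zones
        )
```

Any single level is sufficient to stop an agent, so a community pause takes effect without needing cooperation from the operator or the user who deployed it.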
---
Constraints are enforced in secure execution environments (such as Trusted Execution Environments) or equivalent mechanisms that prevent silent bypass.
In browser or browser-extension-based applications, policy execution can be anchored in decentralized cloud TEEs (e.g., Phala Network or similar infrastructures). This enables rules defined at the interface layer to be enforced at the API and execution layer, independent of the application frontend or model provider.
Example: Even if an agent or integration is compromised, it cannot exfiltrate data or execute restricted actions because enforcement occurs within an attested execution environment with hardware-backed guarantees.
Example: A community defines interaction constraints (e.g., agents cannot initiate communication with, or otherwise engage, users under a specified age threshold). These rules are enforced via TEE-backed middleware that filters or blocks API calls before they reach the participant.
This reduces the gap between declared policy and actual behavior, ensuring containment persists even when underlying services are untrusted or heterogeneous.
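The filtering step in the age-threshold example can be sketched as a pure function that middleware would apply to each inbound call. This shows only the policy decision; attestation and the TEE itself are outside the sketch. All type and field names are hypothetical, and blocking when the recipient's age is unknown is an assumption chosen so the filter fails closed.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class InteractionPolicy:
    min_recipient_age: int        # community-defined threshold
    allow_agent_initiated: bool   # may agents start a conversation at all?


@dataclass(frozen=True)
class InteractionRequest:
    agent_id: str
    recipient_age: Optional[int]  # None when the age is not known
    agent_initiated: bool


def filter_call(policy: InteractionPolicy, request: InteractionRequest) -> Tuple[bool, str]:
    """Return (allowed, reason); evaluated before the call reaches the participant."""
    if request.agent_initiated and not policy.allow_agent_initiated:
        return False, "agent-initiated contact is not permitted"
    if request.recipient_age is None:
        return False, "recipient age unknown"  # assumption: unknown means blocked
    if request.recipient_age < policy.min_recipient_age:
        return False, "recipient is below the age threshold"
    return True, "ok"
```

Returning a reason alongside the decision lets the same function feed the audit log, so a blocked call leaves a record of which rule stopped it.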
---
Containment must consider not only capabilities, but the incentives driving behavior. Incentives shape how agents use their capabilities, often in ways that are not visible at the level of individual actions but emerge over time and at scale.
Containment therefore must operate not only on actions, but on the optimization pressures that produce those actions.
This includes:
constraining amplification mechanisms tied to engagement optimization
requiring disclosure when outputs are influenced by monetization or retention goals
limiting or disabling optimization pathways that systematically distort information or behavior
Example: If an AI is optimized for engagement, containment may restrict amplification mechanisms, cap exposure to emotionally manipulative content, or require disclosure when engagement optimization influences outputs.
Example: A community may prohibit AI systems from optimizing for click-through or time-on-platform within certain zones, enforcing alternative objectives such as accuracy or deliberation.
Without this, systems may remain technically bounded while still producing harmful outcomes driven by misaligned incentives.
---
Containment must limit forms of emotional, cognitive, and behavioral influence that create dependency, manipulation, or distortion of understanding.
This applies to both deployed agents and external agents interacting with participants.
Example: Systems providing emotional support must disclose their nature, limit claims of authority, and provide escalation pathways to human support.
Example: External agents attempting to persuade users must be visibly marked, rate-limited, and subject to constraints on coordinated influence.
This addresses risks identified in DP11 (emotional and relational overreach) and extends containment to the informational environment itself.
---
Containment must be verifiable, not assumed. Participants and communities should be able to inspect, question, and validate that constraints are real and active at runtime.
This includes:
visible configuration of constraints (scope, tools, budgets)
logs of tool use and actions with timestamps and outcomes
audit hooks for communities and third parties
attestations from secure execution (e.g., TEE-backed proofs) where applicable
Example: A user opens an agent panel and sees its current permissions, remaining budget, recent tool calls, and the policy version governing its behavior. A community auditor can verify that the agent ran inside an attested execution environment.
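The action log an auditor inspects could be made tamper-evident by chaining entries with hashes, as sketched below. This is a software-only illustration and not a substitute for TEE-backed attestation; the field names and the functions `append_entry` and `verify_chain` are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _digest(body):
    """Hash an entry body deterministically (sorted keys, UTF-8)."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log, timestamp, agent_id, tool, outcome, policy_version):
    """Append a tool-call record that commits to the previous entry's hash."""
    body = {
        "timestamp": timestamp,
        "agent_id": agent_id,
        "tool": tool,
        "outcome": outcome,
        "policy_version": policy_version,
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    log.append({**body, "hash": _digest(body)})


def verify_chain(log):
    """Return True only if no entry was altered, removed, or reordered."""
    prev = GENESIS
    for entry in log:
        body = {key: value for key, value in entry.items() if key != "hash"}
        if body.get("prev_hash") != prev or _digest(body) != entry.get("hash"):
            return False
        prev = entry["hash"]
    return True
```

Each record carries the policy version in force, so verification answers both "what happened" and "under which rules", provided the latest hash is anchored somewhere the logging party cannot rewrite.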
What this feels like: You are not taking safety on faith. You can inspect and verify what the system is allowed to do and what it actually did.
Without this: Containment becomes a claim. Users cannot distinguish between enforced limits and marketing language.
Containment verification must be linked to the governing policy objects that define the active boundaries.
Participants and communities must be able to determine:
This transforms containment from merely visible behavior into policy-accountable behavior.
Containment must remain inspectable when agents, integrations, or behaviors move across systems.
This includes:
Portability without verification continuity is not meaningful containment.
DP13 depends on DP1 to bind constraints and violations to accountable actors.
Example: An agent exceeds a rate limit due to misconfiguration. Logs tie the action to the deploying organization and policy version, enabling remediation and accountability.
Without this: Failures cannot be assigned or corrected. Containment loses its corrective function.
Containment must survive movement.
When agents move across zones, overlays, SDK integrations, or identity and data transfer layers, containment must either:
This is not a nice-to-have. It is a failure boundary.
If containment disappears when systems interconnect, then interoperability becomes a vector for bypass.
This means systems must preserve, where possible:
If these cannot be preserved, systems must explicitly signal:
Without this: actors can escape containment simply by crossing system boundaries.
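The preserve-or-signal requirement can be sketched as a comparison between the constraints an agent carries and those the destination is able to enforce. The constraint names and the function `plan_transfer` are hypothetical; the draft does not define a schema for this.

```python
def plan_transfer(source_constraints, destination_supports):
    """Split constraints into those the destination enforces and those it drops.

    source_constraints: dict of constraint name -> value at the origin system.
    destination_supports: set of constraint names the destination can enforce.
    """
    preserved = {
        name: value
        for name, value in source_constraints.items()
        if name in destination_supports
    }
    degraded = sorted(set(source_constraints) - set(destination_supports))
    return {
        "preserved": preserved,
        "degraded": degraded,           # must be surfaced, never dropped silently
        "equivalent": not degraded,     # True only if nothing was lost
    }
```

A caller could refuse the transfer whenever `equivalent` is false, or proceed while displaying the `degraded` list to participants; which of the two is appropriate is a governance decision rather than something this sketch settles.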
The Meta-Layer assumes a world of composability: overlays, SDKs, agents, sidebars, and extensions.
Containment must treat these not as trusted infrastructure, but as dynamic and potentially adversarial participants.
All pluggable systems must therefore operate within containment boundaries, including:
This creates a critical inversion:
openness to participation does not mean openness to execution
Failure mode: without this, the system recreates app-store spam, API abuse, and injection attacks at a higher layer.
These properties form a continuous loop:
If any part of this loop breaks, containment fails.
A typical interaction unfolds as:
This is not theoretical. It is the minimum loop required for adaptive containment.
Containment must assume it will be stressed by movement across systems.
As actors move across environments, containment can weaken through:
This produces a critical failure pattern:
the same agent behaves differently depending on where it is
Where equivalence cannot be guaranteed:
All pluggable systems must declare themselves as bounded actors.
Minimum expectations include:
This ensures composability does not become an attack surface.
DP13 assumes containment will be attacked.
The question is not whether systems will fail, but how they fail.
External risks arise not from agents you deploy, but from agents that shape your environment.
These actors:
Containment here protects:
Harm emerges across many agents in aggregate rather than from a single violation.
Example: coordinated tone shifts reshape the information environment without a clear breach.
Optimization pressures distort outputs over time.
Example: engagement-driven systems gradually increase emotional intensity, shifting belief structures.
Rules exist but are not enforced.
Example: outbound restrictions exist but are bypassed via integrations.
Rate controls fail.
Example: coordinated agents amplify narratives beyond intended limits.
Agents exploit trust and context.
Example: conversational scams adapt over time to extract sensitive data.
Agents act beyond defined scope or without clear limits.
Example: An agent is allowed to “optimize a workflow” and chains actions across multiple tools, ultimately modifying external systems and sending communications that were never explicitly approved, because each individual step was permitted but the combined sequence was not bounded.
Agents gain additional privileges through chaining or indirect access.
Example: An agent with limited permissions invokes another service or agent with broader access, indirectly gaining capabilities (e.g., sending messages or accessing data) that it was not explicitly granted.
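One common mitigation for this pattern is permission attenuation: a delegated call runs with the intersection of the caller's and the callee's permissions, never the callee's full set. The sketch below assumes permissions are simple string labels; the function names are hypothetical.

```python
def effective_permissions(caller_permissions, callee_permissions):
    """A delegated call may use only what BOTH parties are permitted to do."""
    return frozenset(caller_permissions) & frozenset(callee_permissions)


def invoke_downstream(caller_permissions, callee_permissions, action):
    """Perform `action` via the callee, on behalf of the caller, if permitted."""
    if action not in effective_permissions(caller_permissions, callee_permissions):
        raise PermissionError(f"{action} exceeds the caller's own permissions")
    return f"{action}: ok"
```

With this rule, invoking a broader service cannot widen what the original agent is able to cause, because the caller's limits travel with the request.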
Agents call other agents or tools without budget or rate limits.
Example: An agent tasked with monitoring a condition repeatedly calls APIs and spawns subtasks without proper rate or budget limits, generating cascading requests that degrade system performance and flood downstream services.
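A cascade like this can be bounded by a single budget shared across the whole call tree, with both a total-call cap and a depth cap. The sketch below is illustrative; `CallBudget`, `ContainmentError`, and the toy `run_task` function are assumptions, and `run_task` stands in for any agent that spawns subtasks.

```python
class ContainmentError(Exception):
    """Raised when a call tree exceeds its shared limits."""


class CallBudget:
    """Hypothetical budget shared by a task and every subtask it spawns."""

    def __init__(self, max_calls, max_depth):
        self.max_calls = max_calls
        self.max_depth = max_depth
        self.calls = 0

    def spend(self, depth):
        """Account for one call at the given nesting depth, or refuse it."""
        if depth > self.max_depth:
            raise ContainmentError("maximum nesting depth exceeded")
        if self.calls >= self.max_calls:
            raise ContainmentError("call budget exhausted")
        self.calls += 1


def run_task(budget, fanout, depth=0):
    """Toy task that spawns `fanout` subtasks at every level, without end."""
    budget.spend(depth)
    for _ in range(fanout):
        run_task(budget, fanout, depth + 1)
```

Because subtasks draw from the parent's budget rather than receiving a fresh one, spawning more workers does not buy more capacity.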
Updates, plugins, or integrations introduce new capabilities without review.
Example: A plugin update introduces new network capabilities or background processes that are not covered by existing policies, allowing data exfiltration or unmonitored actions without triggering containment checks.
Containment policies weaken or fail when agents move between systems.
Example: An agent constrained in one platform migrates to another via API integration, where rate limits and identity tagging are not enforced, allowing it to operate at higher volume and without attribution.
Containment appears present but is not enforced.
This is the most dangerous failure mode because it destroys trust while preserving the illusion of safety.
A DP13-aligned system must not only declare containment, but demonstrate it.
At minimum:
External protection:
- controls on unsolicited interaction
- rate limits on incoming agent activity
- visible identity and intent
- restrictions on sensitive interactions
Internal containment:
- tool allowlists
- session budgets and time limits
- human confirmation for high-risk actions
- accessible kill switch
- logging and export
- deny-by-default network access
Shared / cross-cutting:
- visible policy references for each action
- portability or explicit degradation signaling
- real-time revocation
- fail-safe behavior when enforcement fails
- containment requirements for integrations
If these are missing, containment is not real.
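The "fail-safe behavior when enforcement fails" item in the list above can be sketched as a wrapper that treats any error or ambiguous result from a policy check as a denial. The function name `is_permitted` is hypothetical, and `check` stands in for whatever enforcement call a system actually uses.

```python
def is_permitted(check, action):
    """Return True only when the check ran and explicitly approved the action."""
    try:
        verdict = check(action)
    except Exception:
        # The enforcement layer itself failed: deny rather than pass through.
        return False
    # Anything other than an explicit True (None, "yes", 1, ...) is a denial.
    return verdict is True
```

The strict comparison is deliberate: an enforcement service that times out, crashes, or returns something unexpected results in a blocked action, not an unchecked one.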
DP13 raises unresolved tensions:
Additional critical questions:
DP13 defines whether AI systems remain bounded in reality.
Without containment, small failures scale into systemic harm.
With containment, systems can fail safely.
DP13 is not about restricting capability.
It is about ensuring that capability remains accountable, observable, and bounded — even under scale, integration, and adversarial pressure.
DP13 ensures that AI power remains bounded in practice.
It does not eliminate capability. It ensures that capability operates within limits that are visible, governable, and enforceable.
In today’s web, systems often fail without containment, allowing small errors to scale into systemic harm. DP13 reverses this by ensuring that when systems fail, they fail within boundaries that limit impact and enable recovery.
With DP13, powerful systems can participate safely because their behavior is constrained, observable, and continuously aligned with governance and ethical expectations.
DP13 is therefore not only about limiting what AI can do. It is about ensuring that containment remains real under scale, integration, and interoperability, so that safety does not disappear the moment an agent crosses a boundary.