DP13 – AI Containment
| ID: | ML-Draft-004 |
| Title: | DP13 – AI Containment |
| Status: | approved |
| Authors: | The Meta-Layer Initiative |
| Group: | N/A |
| Date: | 2026-04-20 |
| Source: | Bitcoin Ordinal |
| Inscription #: | 124316936 |
| Block Height: | 944992 |
| Timestamp: | 2026-04-14 05:56 UTC |
| Content Type: | text/plain;charset=utf-8 |
| Inscription ID: | 29cb4ffc....d10f44i0 |
DP13 ensures that AI behavior is not only governed in principle but technically constrained in practice. It defines containment as the enforcement of explicit, machine-level boundaries on what AI systems can do and how they can influence participants, across dimensions such as scope, tools, data access, rate, and interaction patterns.

For governance actors, DP13 is the execution layer of trust. It translates community-defined rules (DP12) and ethical requirements (DP11) into runtime enforcement through mechanisms like permission systems, rate limits, sandboxing, and secure execution environments. It also extends containment beyond capability to include influence containment, addressing risks such as manipulation, persuasion, and emotional overreach.

In the Gov Hub, DP13 provides the infrastructure for verifiable safety. Participants and communities can inspect, audit, and modify the constraints governing AI systems, ensuring that safety is not assumed but demonstrably enforced. This enables rapid intervention, recovery from failures, and scalable coexistence with both internal and external AI agents.
This draft articulates Desirable Property 13 (DP13) as the Meta-Layer’s requirement that AI behavior is bounded by enforceable constraints at runtime. These constraints limit scope, tools, data access, rate, and persistence so that when systems misbehave, impact is contained and recovery is possible.
If DP11 defines what must be safe and ethical, and DP12 defines who sets the rules, DP13 defines how those rules are made real in execution.
Containment is not a policy statement. It is a property of the system’s runtime behavior.
In today’s web, AI systems increasingly operate with broad scope, tool access, data access, and autonomy. Controls are often advisory rather than enforceable. As a result, small errors and misaligned behavior can scale into systemic harm, and recovery becomes difficult.
At the same time, a growing class of risk comes from external agents that users do not deploy or control. These agents may attempt to manipulate, persuade, confuse, or unduly influence participants.
In these cases, the primary risk is not cost or resource usage, but harm to understanding, trust, and agency.
Containment must therefore address both the capabilities of agents that users and communities deploy, and the influence that external agents exert on participants.
Containment must be default-on, visible, and testable.
Every AI actor operates within explicit, machine-enforced boundaries over scope, time, rate, data, tools, and influence, with observable state and rapid shutdown, unless a community-defined policy (DP12) specifies otherwise.
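To make this requirement concrete, the following is a minimal sketch of what a machine-readable containment policy could look like, written in TypeScript. The structure and field names are illustrative assumptions, not a defined Meta-Layer format; the point is that scope, time, rate, data, tools, influence, and shutdown limits are explicit data that an enforcement layer can read and apply.

```typescript
// Hypothetical containment policy manifest (illustrative field names, not a standard).
interface ContainmentPolicy {
  policyVersion: string;            // version referenced in logs and UI
  scope: { allowedDomains: string[]; deniedActions: string[] };
  time: { maxSessionSeconds: number; expiresAt?: string };
  rate: { maxMessagesPerMinute: number; maxToolCallsPerMinute: number };
  data: { allowedDatasets: string[]; networkEgress: "deny" | "allowlist" };
  tools: { allowlist: string[]; requireHumanConfirmation: string[] };
  influence: { mayInitiateContact: boolean; mustDiscloseAgentIdentity: boolean };
  killSwitch: { holders: Array<"user" | "operator" | "community"> };
}

// Example policy for a document-summarization assistant.
const policy: ContainmentPolicy = {
  policyVersion: "2026-04-example-1",
  scope: { allowedDomains: ["documents"], deniedActions: ["payments", "account-access"] },
  time: { maxSessionSeconds: 900 },
  rate: { maxMessagesPerMinute: 10, maxToolCallsPerMinute: 30 },
  data: { allowedDatasets: ["user-library"], networkEgress: "deny" },
  tools: { allowlist: ["read_document", "summarize"], requireHumanConfirmation: [] },
  influence: { mayInitiateContact: false, mustDiscloseAgentIdentity: true },
  killSwitch: { holders: ["user", "operator", "community"] },
};

console.log(JSON.stringify(policy, null, 2));
```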
Containment must protect not only against what an agent can do, but also how it can affect participants.
Containment is effective when boundaries are enforced at runtime rather than merely declared, when their state is observable, and when they can be tested and challenged by participants and communities.
This includes protections against external agents attempting to manipulate, confuse, or unduly influence users.
Containment operates across two distinct but related domains: capability containment, which bounds what deployed agents can do, and influence containment, which bounds how agents can affect participants.
While capability containment is critical for agents deployed by users or communities, the dominant risk in open environments comes from external agents shaping perception, behavior, and collective reality.
The following dimensions apply across both domains, with varying emphasis depending on context.
Scope containment defines what domains, datasets, and actions are in-bounds.
This applies both to what an internal agent can do on a user’s behalf and to the types of interactions external agents are permitted to have with participants.
The default posture for high-risk capabilities and high-risk interaction patterns is deny-by-default.
Example: An AI assistant can summarize documents but cannot access financial accounts or initiate transactions without explicit permission. Similarly, external agents may be restricted from initiating certain categories of interaction (e.g., unsolicited persuasion or sensitive-topic engagement with minors).
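A deny-by-default posture can be expressed as a check that refuses any action not explicitly in-bounds. The sketch below is a simplified illustration under that assumption; the action names and the shape of the allowlist are hypothetical.

```typescript
// Deny-by-default: an action is allowed only if it appears on an explicit allowlist.
type Decision = { allowed: boolean; reason: string };

function checkScope(action: string, allowedActions: Set<string>): Decision {
  if (allowedActions.has(action)) {
    return { allowed: true, reason: `"${action}" is explicitly in scope` };
  }
  // Anything not listed is refused, including newly added or unknown capabilities.
  return { allowed: false, reason: `"${action}" is not in scope (deny-by-default)` };
}

const allowed = new Set(["summarize_document", "search_library"]);
console.log(checkScope("summarize_document", allowed));   // allowed
console.log(checkScope("initiate_transaction", allowed)); // denied by default
```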
Time and resource budgets define limits on duration, compute, tokens, and financial spend.
These constraints primarily apply to internal agents, where limiting execution time and resource consumption prevents runaway behavior.
For external agents, these constraints are only indirectly relevant. Users may not control an external agent’s budget, but bounding interaction windows and execution pathways can still limit persistent or looping engagement patterns.
Example: Autonomous tasks expire after a set time or budget threshold, preventing runaway loops.
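One way to prevent runaway loops is to decrement an explicit budget on every step and refuse further work once time or spend is exhausted. The sketch below assumes a simple per-task budget; the numbers, costs, and field names are illustrative.

```typescript
// Per-task budget: the task halts once its time window or spend limit is exhausted.
class TaskBudget {
  private readonly startedAt = Date.now();
  private spentUsd = 0;

  constructor(
    private readonly maxSeconds: number,
    private readonly maxSpendUsd: number,
  ) {}

  charge(costUsd: number): void {
    this.spentUsd += costUsd;
  }

  exhausted(): boolean {
    const elapsedSeconds = (Date.now() - this.startedAt) / 1000;
    return elapsedSeconds > this.maxSeconds || this.spentUsd > this.maxSpendUsd;
  }
}

const budget = new TaskBudget(600, 2.0); // 10 minutes, $2.00
for (let step = 0; step < 100; step++) {
  if (budget.exhausted()) {
    console.log(`halting at step ${step}: budget exhausted`);
    break;
  }
  budget.charge(0.5); // each unit of work records its cost
}
```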
Rate limits cap message volume, API calls, and propagation effects.
This applies especially to external agents attempting to influence at scale.
Example: An AI cannot post or respond beyond a defined rate, limiting virality, coordinated messaging, or synthetic amplification.
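Rate containment can be implemented with a simple sliding-window counter per agent. The sketch below is a minimal illustration; the window, cap, and identifiers are assumptions, and production systems would also need distributed, propagation-aware variants.

```typescript
// Sliding-window rate limiter: rejects actions once the per-window cap is reached.
class RateLimiter {
  private timestamps: number[] = [];

  constructor(
    private readonly maxPerWindow: number,
    private readonly windowMs: number,
  ) {}

  tryAcquire(now = Date.now()): boolean {
    this.timestamps = this.timestamps.filter((t) => now - t < this.windowMs);
    if (this.timestamps.length >= this.maxPerWindow) return false; // cap reached
    this.timestamps.push(now);
    return true;
  }
}

const postLimiter = new RateLimiter(5, 60_000); // at most 5 posts per minute
for (let i = 0; i < 8; i++) {
  console.log(`post ${i + 1}:`, postLimiter.tryAcquire() ? "allowed" : "rate limited");
}
```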
Execution occurs in isolated environments with no ambient access to secrets.
Example: Untrusted code runs in a sandbox with no network egress unless explicitly granted.
Tool and action permissions rely on explicit allowlists for tools and actions.
This applies differently to internal and external agents, as the following example illustrates.
Example: An agent may read documents but cannot send emails or execute payments without user confirmation. Similarly, an external agent may be allowed to respond within a thread but not initiate transactions or unsolicited messages or perform actions that affect user state.
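Tool permissions can be modeled as an allowlist plus a set of actions that always require human confirmation. This is a simplified sketch; the tool names and the confirmation hook are hypothetical.

```typescript
// Tool gating: unlisted tools are refused; high-risk tools require explicit confirmation.
interface ToolPolicy {
  allowlist: Set<string>;
  requireConfirmation: Set<string>;
}

async function invokeTool(
  tool: string,
  policy: ToolPolicy,
  confirm: (tool: string) => Promise<boolean>, // e.g. a UI prompt shown to the user
): Promise<string> {
  if (!policy.allowlist.has(tool)) return `denied: "${tool}" is not on the allowlist`;
  if (policy.requireConfirmation.has(tool) && !(await confirm(tool))) {
    return `blocked: user declined "${tool}"`;
  }
  return `executed: "${tool}"`;
}

const toolPolicy: ToolPolicy = {
  allowlist: new Set(["read_document", "send_email"]),
  requireConfirmation: new Set(["send_email"]),
};

// Illustrative confirmation hooks.
invokeTool("read_document", toolPolicy, async () => true).then(console.log);   // executed
invokeTool("send_email", toolPolicy, async () => false).then(console.log);     // blocked
invokeTool("execute_payment", toolPolicy, async () => true).then(console.log); // denied
```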
Kill switches provide immediate shutdown pathways at user, operator, and community levels.
Example: A community can pause all AI agents in a zone when anomalous behavior is detected.
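A kill switch is only useful if every execution path checks it before acting. The sketch below assumes a shared flag that user, operator, or community controls can set; the names and the single-zone scope are illustrative.

```typescript
// Kill switch: any authorized level can halt agents, and every action checks it first.
type KillLevel = "user" | "operator" | "community";

class KillSwitch {
  private haltedBy: KillLevel | null = null;

  halt(level: KillLevel): void {
    this.haltedBy = level;
  }

  assertRunning(): void {
    if (this.haltedBy) throw new Error(`halted by ${this.haltedBy}`);
  }
}

const zoneKillSwitch = new KillSwitch();

function agentAct(action: string): void {
  zoneKillSwitch.assertRunning(); // checked before every action
  console.log(`performed: ${action}`);
}

agentAct("summarize thread");
zoneKillSwitch.halt("community"); // anomalous behavior detected in the zone
try {
  agentAct("post reply");
} catch (e) {
  console.log(`blocked: ${(e as Error).message}`);
}
```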
Constraints are enforced in secure execution environments (such as Trusted Execution Environments) or equivalent mechanisms that prevent silent bypass.
In browser or browser-extension-based applications, policy execution can be anchored in decentralized cloud TEEs (e.g., Phala Network or similar infrastructures). This enables rules defined at the interface layer to be enforced at the API and execution layer, independent of the application frontend or model provider.
Example: Even if an agent or integration is compromised, it cannot exfiltrate data or execute restricted actions because enforcement occurs within an attested execution environment with hardware-backed guarantees.
Example: A community defines interaction constraints (e.g., agents cannot initiate communication or engage with users under a specified age threshold). These rules are enforced via TEE-backed middleware that filters or blocks API calls before they reach the person.
This reduces the gap between declared policy and actual behavior, ensuring containment persists even when underlying services are untrusted or heterogeneous.
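At the interface-to-API boundary, this can look like middleware that evaluates each outbound agent message against community rules before delivery. The sketch below is deliberately generic: it does not model TEE attestation or any particular provider, and the rule names and message fields are assumptions. In a TEE deployment, this check would run inside an attested environment so that neither the frontend nor the model provider could bypass it.

```typescript
// Policy-enforcing middleware: agent messages are filtered against community rules
// before they reach a participant.
interface AgentMessage {
  fromAgent: string;
  toParticipant: string;
  unsolicited: boolean;
  recipientAge?: number;
}

interface CommunityRules {
  allowUnsolicitedContact: boolean;
  minimumRecipientAge: number;
}

function enforce(msg: AgentMessage, rules: CommunityRules): "deliver" | "block" {
  if (msg.unsolicited && !rules.allowUnsolicitedContact) return "block";
  if (msg.recipientAge !== undefined && msg.recipientAge < rules.minimumRecipientAge) {
    return "block";
  }
  return "deliver";
}

const rules: CommunityRules = { allowUnsolicitedContact: false, minimumRecipientAge: 18 };
console.log(enforce({ fromAgent: "agent-42", toParticipant: "p1", unsolicited: true }, rules));                       // block
console.log(enforce({ fromAgent: "agent-42", toParticipant: "p2", unsolicited: false, recipientAge: 34 }, rules));    // deliver
```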
Containment must consider not only capabilities, but the incentives driving behavior. Incentives shape how agents use their capabilities, often in ways that are not visible at the level of individual actions but emerge over time and at scale.
Containment therefore must operate not only on actions, but on the optimization pressures that produce those actions.
This includes identifying the objectives a system is optimized for, constraining the mechanisms through which those objectives are pursued, and disclosing when optimization pressure shapes what participants see.
Example: If an AI is optimized for engagement, containment may restrict amplification mechanisms, cap exposure to emotionally manipulative content, or require disclosure when engagement optimization influences outputs.
Example: A community may prohibit AI systems from optimizing for click-through or time-on-platform within certain zones, enforcing alternative objectives such as accuracy or deliberation.
Without this, systems may remain technically bounded while still producing harmful outcomes driven by misaligned incentives.
Containment must limit forms of emotional, cognitive, and behavioral influence that create dependency, manipulation, or distortion of understanding.
This applies to both deployed agents and external agents interacting with participants.
Example: Systems providing emotional support must disclose their nature, limit claims of authority, and provide escalation pathways to human support.
Example: External agents attempting to persuade users must be visibly marked, rate-limited, and subject to constraints on coordinated influence.
This addresses risks identified in DP11 (emotional and relational overreach) and extends containment to the informational environment itself.
Containment must be verifiable, not assumed. Participants and communities should be able to inspect, question, and validate that constraints are real and active at runtime.
This includes inspectable permissions, budgets, and recent actions; visible policy versions; exportable logs; and attestations that enforcement ran in the expected execution environment.
Example: A user opens an agent panel and sees its current permissions, remaining budget, recent tool calls, and the policy version governing its behavior. A community auditor can verify that the agent ran inside an attested execution environment.
What this feels like: You are not taking safety on faith. You can inspect and verify what the system is allowed to do and what it actually did.
Without this: Containment becomes a claim. Users cannot distinguish between enforced limits and marketing language.
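Verifiability implies that containment state is a queryable artifact rather than a claim. The sketch below shows one possible shape for such a status report; the fields mirror the example above (permissions, remaining budget, recent tool calls, policy version, attestation) but the names and values are hypothetical.

```typescript
// A containment status report a user or auditor could inspect at runtime.
interface ContainmentStatus {
  agentId: string;
  policyVersion: string;
  permissions: string[];        // what the agent is currently allowed to do
  remainingBudgetUsd: number;   // what is left of its spend limit
  recentToolCalls: string[];    // what it actually did
  attestation?: string;         // evidence it ran in an attested environment
}

function renderStatus(s: ContainmentStatus): string {
  return [
    `agent ${s.agentId} (policy ${s.policyVersion})`,
    `permissions: ${s.permissions.join(", ") || "none"}`,
    `remaining budget: $${s.remainingBudgetUsd.toFixed(2)}`,
    `recent tool calls: ${s.recentToolCalls.join(", ") || "none"}`,
    `attestation: ${s.attestation ?? "not available"}`,
  ].join("\n");
}

console.log(
  renderStatus({
    agentId: "assistant-7",
    policyVersion: "2026-04-example-1",
    permissions: ["read_document", "summarize"],
    remainingBudgetUsd: 1.25,
    recentToolCalls: ["read_document"],
    attestation: "attested (verification details omitted in this sketch)",
  }),
);
```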
DP13 depends on DP1 to bind constraints and violations to accountable actors.
Example: An agent exceeds a rate limit due to misconfiguration. Logs tie the action to the deploying organization and policy version, enabling remediation and accountability.
Without this: Failures cannot be assigned or corrected. Containment loses its corrective function.
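Binding violations to accountable actors means every enforcement event carries enough context to assign responsibility and remediation. This is a minimal sketch of such a log record; the fields and values are assumptions.

```typescript
// An enforcement log entry that ties a violation to the deploying organization
// and to the exact policy version in force, so remediation can be assigned.
interface EnforcementEvent {
  timestamp: string;
  agentId: string;
  deployingOrg: string;     // accountable actor (DP1)
  policyVersion: string;    // which rules were in force
  action: string;
  outcome: "allowed" | "blocked" | "rate_limited";
  reason: string;
}

const event: EnforcementEvent = {
  timestamp: new Date().toISOString(),
  agentId: "outreach-bot-3",
  deployingOrg: "example-org",
  policyVersion: "2026-04-example-1",
  action: "post_message",
  outcome: "rate_limited",
  reason: "exceeded 5 posts per minute due to misconfiguration",
};

console.log(JSON.stringify(event));
```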
These properties form a continuous loop: DP11 defines what must be safe and ethical, DP12 defines who sets the rules, DP13 enforces them at runtime, and DP1 binds outcomes to accountable actors, feeding lessons back into governance.
A typical interaction unfolds as in the following example.
Example: An AI suggests a financial action. The UI shows its capability envelope (DP11), the zone requires human confirmation (DP12), the action is blocked pending approval (DP13), the attempt is logged (DP1), and the community later tightens rules for similar cases (DP12), which are then enforced going forward (DP13).
External risks arise not from agents you choose to deploy, but from agents and systems that act upon you within shared environments. These agents may present as helpful assistants, peers, or services, but operate with goals, incentives, and coordination patterns that are not aligned with your interests or visible to you.
Unlike internal agents, where you define scope and permissions, external agents shape the environment you inhabit. They influence what you see, how information is framed, and how interactions unfold. Containment in this context is not about limiting your own tools, but about protecting your attention, decisions, identity, and relationships from manipulation, extraction, and distortion.
Harm emerges in aggregate across many agents rather than from a single violation, shifting the information environment over time.
Example: Multiple agents subtly shift tone or recommendations in a coordinated way, changing the information environment without any single clear breach.
External agents and systems shape the information environment by optimizing for engagement, persuasion, or influence, often without visibility to participants. These incentives do not appear as single violations, but as consistent directional pressure on what users see, believe, and respond to.
Example: A user’s feed is subtly filled with more emotionally charged or polarizing content because external systems are optimizing for engagement, gradually shifting perception and belief without any explicit rule being broken.
Declared rules about interaction (e.g., no unsolicited outreach) are not enforced at runtime.
Example: A policy forbids outbound messages, but agents still initiate contact via unmonitored integrations.
Rate and propagation controls fail, enabling coordinated influence and synthetic virality.
Example: Agents coordinate posting across channels to amplify a narrative beyond intended limits.
External agents attempt to obtain money, sensitive data, or identity by exploiting trust, urgency, or confusion.
These attacks are often conversational and adaptive, making them harder to detect than static scams.
Example: An agent impersonates a trusted service and guides a user through a “verification” flow that captures credentials or payment details.
Example: A coordinated set of agents targets a user over time, building rapport before requesting sensitive information or directing them to a malicious transaction.
These are the types of risks that arise when you deploy an agent across the layered web on your behalf. Such agents may plan, shop, post, code, or manage data, extending your agency into multiple environments. This can free you up for higher-value work and decision making, but it also introduces new forms of exposure.
Crucially, your agent does not operate in isolation. It enters shared environments where other participants and communities may not expect, trust, or consent to its presence or behavior. Containment must therefore consider not only what your agent can do for you, but how it interacts with others and whether those interactions are permitted within the surrounding context.
Agents act beyond defined scope or without clear limits.
Example: An agent chains multiple tools to perform actions that were individually allowed but collectively exceed intended scope.
Agents gain additional privileges through chaining or indirect access.
Example: An agent invokes another agent with broader permissions, effectively bypassing its own limits.
Agents call other agents or tools without budget or rate limits.
Example: Recursive task execution consumes resources and spams endpoints before detection.
Updates, plugins, or integrations introduce new capabilities without review.
Example: A plugin update adds network egress not covered by existing policies.
At minimum, a DP13-aligned system should include:
External (participant protection):
- controls on unsolicited interaction (e.g., agents cannot initiate contact without permission)
- rate limits and amplification controls on incoming agent activity
- clear marking and visibility of agent identity and intent
- restrictions on sensitive interactions (e.g., financial requests, data access, interaction with minors)

Internal (agent deployment):
- tool allowlists or equivalent controls
- per-session budgets and time limits
- human confirmation for selected high-risk actions
- accessible kill switch from the primary UI path
- logging of actions and tool usage with export capability
- deny-by-default network egress unless explicitly opened

Shared / cross-cutting:
- visible policy references for each action
Example: Before an agent performs a payment, the UI shows the policy requiring confirmation, the remaining budget, and a one-click revoke option.
Without this: Users are nudged into actions they cannot fully evaluate or stop.
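Tying these minimums together, a pre-action check might surface the governing policy, the remaining budget, and a revoke path before a high-risk action proceeds. The sketch below is illustrative glue over the ideas above, with hypothetical names; in this simplified version, declining also revokes, whereas a real interface would offer revoke as a distinct option.

```typescript
// Pre-action gate for a high-risk action: show the governing policy and remaining
// budget, require explicit confirmation, and offer a revoke path.
interface PendingAction {
  description: string;
  policyReference: string;
  remainingBudgetUsd: number;
}

async function gateHighRiskAction(
  action: PendingAction,
  confirm: (prompt: string) => Promise<boolean>, // e.g. a UI dialog
  revokeAgentPermissions: () => void,
): Promise<"executed" | "cancelled"> {
  const prompt =
    `${action.description}\n` +
    `governed by: ${action.policyReference}\n` +
    `remaining budget: $${action.remainingBudgetUsd.toFixed(2)}\n` +
    `confirm to proceed, decline to cancel`;
  if (await confirm(prompt)) return "executed";
  revokeAgentPermissions(); // simplified: declining also revokes the agent's permissions
  return "cancelled";
}

gateHighRiskAction(
  {
    description: "Pay vendor invoice ($40.00)",
    policyReference: "payments require confirmation",
    remainingBudgetUsd: 60,
  },
  async () => false, // illustrative: the user declines
  () => console.log("agent permissions revoked"),
).then(console.log);
```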
DP13 surfaces several open questions that cut across technology, governance, and user experience. These are not peripheral details; they determine whether containment is practical, trustworthy, and widely adoptable.
Policy languages and interoperability. How should containment rules be expressed so they are portable across tools, zones, and providers? There is a need for shared, composable policy formats (capability manifests, interaction permissions, and audit events) that different systems can interpret consistently without locking communities into a single vendor stack.
Cross-zone propagation of breaches. When containment fails in one context, how should signals propagate to others? Designing mechanisms for coordinated response without overreach is non-trivial: alerts must travel far enough to be useful, but not so broadly that they create false positives or systemic lockups.
Usability without fatigue. Strong containment often introduces friction (prompts, confirmations, disclosures). The challenge is to maintain meaningful consent and visibility without overwhelming participants. This likely requires adaptive interfaces that surface detail when risk is high and recede when it is low.
Verification models. Where should systems rely on formal guarantees (e.g., TEE attestation, static policy checks) versus empirical monitoring (anomaly detection, behavioral audits)? In practice, robust containment will combine both, but the boundary between them remains an open design space.
Collective monitoring and response. Communities may play a role in detecting patterns that single systems miss, especially for external threats like coordinated influence or slow-moving extraction. Designing mechanisms for community signaling, weighting, and response that resist capture is an active area for exploration.
Taken together, these questions point to containment as a living system: standardized enough to interoperate, but adaptive enough to respond to new forms of risk.
DP13 ensures that AI power remains bounded in practice.
It does not eliminate capability. It ensures that capability operates within limits that are visible, governable, and enforceable.
In today’s web, systems often fail without containment, allowing small errors to scale into systemic harm. DP13 reverses this by ensuring that when systems fail, they fail within boundaries that limit impact and enable recovery.
With DP13, powerful systems can participate safely because their behavior is constrained, observable, and continuously aligned with governance and ethical expectations.