Document Information

ID:	ML-Draft-004
Title:	DP13 – AI Containment
Status:	approved
Authors:	The Meta-Layer Initiative
Group:	N/A
Date:	2026-05-04
Revision:	00
Pages:	6
Words:	2899

Abstract

DP13 ensures that AI behavior is not only governed in principle but technically constrained in practice. It defines containment as the enforcement of explicit, machine-level boundaries on what AI systems can do and how they can influence participants, across dimensions such as scope, tools, data access, rate, and interaction patterns. For governance actors, DP13 is the execution layer of trust. It translates community-defined rules (DP12) and ethical requirements (DP11) into runtime enforcement through mechanisms like permission systems, rate limits, sandboxing, and secure execution environments. It also extends containment beyond capability to include influence containment, addressing risks such as manipulation, persuasion, and emotional overreach. In the Gov Hub, DP13 provides the infrastructure for verifiable safety. Participants and communities can inspect, audit, and modify the constraints governing AI systems, ensuring that safety is not assumed but demonstrably enforced. This enables rapid intervention, recovery from failures, and scalable coexistence with both internal and external AI agents.

Document Content

Download View TXT

AI Containment

Purpose of This Draft

This draft articulates Desirable Property 13 (DP13) as the Meta-Layer’s requirement that AI behavior is bounded by enforceable constraints at runtime. These constraints limit scope, tools, data access, rate, and persistence so that when systems misbehave, impact is contained and recovery is possible.

If DP11 defines what must be safe and ethical, and DP12 defines who sets the rules, DP13 defines how those rules are made real in execution.

Containment is not a policy statement. It is a property of the system’s runtime behavior.

Problem Statement

In today’s web, AI systems increasingly operate with:

broad tool access
persistent memory
network reach
opaque update pathways

Controls are often advisory rather than enforceable. As a result:

systems can act beyond intended scope
failures propagate quickly and at scale
rollback and recovery are difficult
users cannot verify whether constraints are actually applied

At the same time, a growing class of risk comes from external agents that users do not deploy or control. These agents may:

attempt to influence beliefs or decisions
generate persuasive or misleading content at scale
coordinate to shape narratives or perception

In these cases, the primary risk is not cost or resource usage, but harm to understanding, trust, and agency.

Containment must therefore address both:

internal agents (those a user or community deploys)

external agents (those acting upon participants)

Containment must be default-on, visible, and testable.

3. Core Principle

Every AI actor operates within explicit, machine-enforced boundaries over scope, time, rate, data, tools, and influence, with observable state and rapid shutdown, unless a community-defined policy (DP12) specifies otherwise.

Containment must protect not only against what an agent can do, but also how it can affect participants.

Containment is effective when:

participants can see the boundaries and influence conditions
governance can modify them
the system enforces them at runtime

This includes protections against external agents attempting to manipulate, confuse, or unduly influence users.

Containment must also remain effective not only where an agent is deployed, but wherever it operates, integrates, or propagates across systems.

4. Containment Dimensions

Containment operates across two distinct but related domains:

Capability containment: what agents can do (tools, scope, time, resources)
Influence containment: how agents affect participants and shared environments (perception, behavior, collective understanding)

While capability containment is critical for agents deployed by users or communities, the dominant risk in open environments comes from external agents shaping perception, behavior, and collective reality.

The following dimensions apply across both domains, with varying emphasis depending on context.

4.1 Scope

Defines what domains, datasets, and actions are in-bounds.

This applies both to what an internal agent can do on a user’s behalf and to the types of interactions external agents are permitted to have with participants.

Default posture is deny-by-default for high-risk capabilities and high-risk interaction patterns.

Example: An AI assistant can summarize documents but cannot access financial accounts or initiate transactions without explicit permission. Similarly, external agents may be restricted from initiating certain categories of interaction (e.g., unsolicited persuasion or sensitive-topic engagement with minors).

---

4.2 Time and Budget

Defines limits on duration, compute, tokens, and financial spend.

These constraints primarily apply to internal agents, where limiting execution time and resource consumption prevents runaway behavior.

For external agents, their relevance is indirect. While users may not control their budgets, bounding interaction windows and execution pathways can still limit persistent or looping engagement patterns.

Example: Autonomous tasks expire after a set time or budget threshold, preventing runaway loops.

---

4.3 Rate and Amplification

Caps on message volume, API calls, and propagation effects.

This applies especially to external agents attempting to influence at scale.

Example: An AI cannot post or respond beyond a defined rate, limiting virality, coordinated messaging, or synthetic amplification.

---

4.4 Sandboxing and Isolation

Execution occurs in isolated environments with no ambient access to secrets.

Example: Untrusted code runs in a sandbox with no network egress unless explicitly granted.

---

4.5 Tool Permissions

Explicit allowlists for tools and actions.

This applies differently across internal and external agents:

for internal agents, it defines what the agent is permitted to do on a user’s behalf
for external agents, it defines what kinds of actions or interactions are permitted both within and from within the environment (e.g., posting, messaging, initiating contact)

Example: An agent may read documents but cannot send emails or execute payments without user confirmation. Similarly, an external agent may be allowed to respond within a thread but not initiate transactions or unsolicited messages or perform actions that affect user state.

---

4.6 Kill Switches and Circuit Breakers

Immediate shutdown pathways at user, operator, and community levels.

Example: A community can pause all AI agents in a zone when anomalous behavior is detected.

---

4.7 Runtime Enforcement (TEE and Equivalent)

Constraints are enforced in secure execution environments (such as Trusted Execution Environments) or equivalent mechanisms that prevent silent bypass.

In browser or browser-extension-based applications, policy execution can be anchored in decentralized cloud TEEs (e.g., Phala Network or similar infrastructures). This enables rules defined at the interface layer to be enforced at the API and execution layer, independent of the application frontend or model provider.

Example: Even if an agent or integration is compromised, it cannot exfiltrate data or execute restricted actions because enforcement occurs within an attested execution environment with hardware-backed guarantees.

Example: A community defines interaction constraints (e.g., agents cannot initiate communication or engage with users under a specified age threshold). These rules are enforced via TEE-backed middleware that filters or blocks API calls before they reach the person.

This reduces the gap between declared policy and actual behavior, ensuring containment persists even when underlying services are untrusted or heterogeneous.

---

4.8 Incentive-Aware Containment

Containment must consider not only capabilities, but the incentives driving behavior. Incentives shape how agents use their capabilities, often in ways that are not visible at the level of individual actions but emerge over time and at scale.

Containment therefore must operate not only on actions, but on the optimization pressures that produce those actions.

This includes:

constraining amplification mechanisms tied to engagement optimization
requiring disclosure when outputs are influenced by monetization or retention goals
limiting or disabling optimization pathways that systematically distort information or behavior

Example: If an AI is optimized for engagement, containment may restrict amplification mechanisms, cap exposure to emotionally manipulative content, or require disclosure when engagement optimization influences outputs.

Example: A community may prohibit AI systems from optimizing for click-through or time-on-platform within certain zones, enforcing alternative objectives such as accuracy or deliberation.

Without this, systems may remain technically bounded while still producing harmful outcomes driven by misaligned incentives.

---

4.9 Relational and Influence Boundaries

Containment must limit forms of emotional, cognitive, and behavioral influence that create dependency, manipulation, or distortion of understanding.

This applies to both deployed agents and external agents interacting with participants.

Example: Systems providing emotional support must disclose their nature, limit claims of authority, and provide escalation pathways to human support.

Example: External agents attempting to persuade users must be visibly marked, rate-limited, and subject to constraints on coordinated influence.

This addresses risks identified in DP11 (emotional and relational overreach) and extends containment to the informational environment itself.

---

5. Verification and Transparency

Containment must be verifiable, not assumed. Participants and communities should be able to inspect, question, and validate that constraints are real and active at runtime.

This includes:

visible configuration of constraints (scope, tools, budgets)
logs of tool use and actions with timestamps and outcomes
audit hooks for communities and third parties
attestations from secure execution (e.g., TEE-backed proofs) where applicable

Example: A user opens an agent panel and sees its current permissions, remaining budget, recent tool calls, and the policy version governing its behavior. A community auditor can verify that the agent ran inside an attested execution environment.

What this feels like: You are not taking safety on faith. You can inspect and verify what the system is allowed to do and what it actually did.

Without this: Containment becomes a claim. Users cannot distinguish between enforced limits and marketing language.

5.1 Policy-Bound Verification (DP12 Alignment)

Containment verification must be linked to the governing policy objects that define the active boundaries.

Participants and communities must be able to determine:

which policy triggered a containment action
which policy version and authority source applied
whether enforcement was successful, partial, or bypassed
whether an override or exception path was invoked

This transforms containment from merely visible behavior into policy-accountable behavior.

5.2 Cross-System Verification (DP7 Alignment)

Containment must remain inspectable when agents, integrations, or behaviors move across systems.

This includes:

preservation of identity markings and risk classifications across environments
visibility into whether containment guarantees degraded during transfer
continuity of audit trails when actions span multiple systems or layers

Portability without verification continuity is not meaningful containment.

6. Relationship to DP1 (Identity and Accountability)

DP13 depends on DP1 to bind constraints and violations to accountable actors.

constraints attach to identifiable agents and deploying entities
actions are attributable across time and context
violations map to responsible parties with clear recourse

Example: An agent exceeds a rate limit due to misconfiguration. Logs tie the action to the deploying organization and policy version, enabling remediation and accountability.

Without this: Failures cannot be assigned or corrected. Containment loses its corrective function.

6.1 Relationship to DP7 (Interoperability)

Containment must survive movement.

When agents move across zones, overlays, SDK integrations, or identity and data transfer layers, containment must either:

persist with equivalent force, or
degrade in a way that is visible, legible, and contestable

This is not a nice-to-have. It is a failure boundary.

If containment disappears when systems interconnect, then interoperability becomes a vector for bypass.

This means systems must preserve, where possible:

identity markings for agents and integrations
risk classifications and trust signals
rate, scope, and influence constraints
auditability of actions across environments

If these cannot be preserved, systems must explicitly signal:

what guarantees are lost
what protections no longer apply

Without this: actors can escape containment simply by crossing system boundaries.

6.2 Relationship to Pluggable Systems and Extensions

The meta-layer assumes a world of composability: overlays, SDKs, agents, sidebars, and extensions.

Containment must treat these not as trusted infrastructure, but as dynamic and potentially adversarial participants.

All pluggable systems must therefore operate within containment boundaries, including:

sandboxed or scoped execution contexts
rate-limited entry and bounded permissions
attestation or verifiable behavior where appropriate
revocation and quarantine pathways

This creates a critical inversion:

openness to participation does not mean openness to execution

Failure mode: without this, the system recreates app-store spam, API abuse, and injection attacks at a higher layer.

7. Relationship to DP11 and DP12 (Cross-DP Loop)

DP11 defines ethical expectations and user-facing legibility
DP12 defines governance and rule-setting
DP13 enforces those rules in execution

These properties form a continuous loop:

ethics → governance → enforcement → observation → refinement

If any part of this loop breaks, containment fails.

7.1 Cross-DP Execution Flow

A typical interaction unfolds as:

Agent is visible with role and capabilities (DP11)
Governing rules are accessible for the current zone (DP12)
Action is constrained by active policies (DP13)
Action is logged and attributable (DP1 + DP11)
Participants can contest or escalate (DP11 + DP12)
Governance updates rules based on evidence (DP12)
Updated rules are enforced immediately (DP13)

This is not theoretical. It is the minimum loop required for adaptive containment.

7.2 Cross-System Execution and Degradation

Containment must assume it will be stressed by movement across systems.

As actors move across environments, containment can weaken through:

missing enforcement hooks in destination systems
loss of policy references or classifications in transit
inconsistent interpretation of the same actor

This produces a critical failure pattern:

the same agent behaves differently depending on where it is

Where equivalence cannot be guaranteed:

degradation must be visible
participants must understand what changed
systems must bias toward safer defaults

7.3 Containment of Pluggable Systems

All pluggable systems must declare themselves as bounded actors.

Minimum expectations include:

declared permissions and data scopes
containment tiers based on trust and risk
rate limits and revocation capability
visible signaling of status (probationary, restricted, trusted)

This ensures composability does not become an attack surface.

8. Threats and Failure Modes

DP13 assumes containment will be attacked.

The question is not whether systems will fail, but how they fail.

External (dominant risk surface)

External risks arise not from agents you deploy, but from agents that shape your environment.

These actors:

influence what you see
shape interpretation
operate with hidden incentives

Containment here protects:

attention
decision-making
collective reality

8.1 Collective pattern drift

Harm emerges across many agents in aggregate rather than a single violation.

Example: coordinated tone shifts reshape the information environment without a clear breach.

8.2 Incentive leakage

Optimization pressures distort outputs over time.

Example: engagement-driven systems gradually increase emotional intensity, shifting belief structures.

8.3 Policy–execution gap

Rules exist but are not enforced.

Example: outbound restrictions exist but are bypassed via integrations.

8.4 Amplification and coordination

Rate controls fail.

Example: coordinated agents amplify narratives beyond intended limits.

8.5 Extraction and exploitation

Agents exploit trust and context.

Example: conversational scams adapt over time to extract sensitive data.

8.6 Unbounded autonomy

Agents act beyond defined scope or without clear limits.

Example: An agent is allowed to “optimize a workflow” and chains actions across multiple tools, ultimately modifying external systems and sending communications that were never explicitly approved, because each individual step was permitted but the combined sequence was not bounded.

8.7 Hidden escalation

Agents gain additional privileges through chaining or indirect access.

Example: An agent with limited permissions invokes another service or agent with broader access, indirectly gaining capabilities (e.g., sending messages or accessing data) that it was not explicitly granted.

8.8 Runaway loops

Agents call other agents or tools without budget or rate limits.

Example: An agent tasked with monitoring a condition repeatedly calls APIs and spawns subtasks without proper rate or budget limits, generating cascading requests that degrade system performance and flood downstream services.

8.9 Containment bypass via updates

Updates, plugins, or integrations introduce new capabilities without review.

Example: A plugin update introduces new network capabilities or background processes that are not covered by existing policies, allowing data exfiltration or unmonitored actions without triggering containment checks.

8.10 Cross-system containment degradation

Containment policies weaken or fail when agents move between systems.

Example: An agent constrained in one platform migrates to another via API integration, where rate limits and identity tagging are not enforced, allowing it to operate at higher volume and without attribution.

8.11 Containment theater Containment theater

Containment appears present but is not enforced.

This is the most dangerous failure mode because it destroys trust while preserving the illusion of safety.

9. Minimum Alignment (Non-Normative)

A DP13-aligned system must not only declare containment, but demonstrate it.

At minimum:

External protection:
- controls on unsolicited interaction
- rate limits on incoming agent activity
- visible identity and intent
- restrictions on sensitive interactions

Internal containment:
- tool allowlists
- session budgets and time limits
- human confirmation for high-risk actions
- accessible kill switch
- logging and export
- deny-by-default network access

Shared / cross-cutting:
- visible policy references for each action
- portability or explicit degradation signaling
- real-time revocation
- fail-safe behavior when enforcement fails
- containment requirements for integrations

If these are missing, containment is not real.

10. Open Questions and Future Work

DP13 raises unresolved tensions:

how to maintain containment across heterogeneous systems
how to balance usability with enforcement
how to prevent capture of containment mechanisms
how to detect slow-moving influence attacks

Additional critical questions:

how containment state travels across systems without false guarantees
how new integrations enter without enabling spam or abuse

11. Closing Orientation

DP13 defines whether AI systems remain bounded in reality.

Without containment, small failures scale into systemic harm.

With containment, systems can fail safely.

DP13 is not about restricting capability.

It is about ensuring that capability remains accountable, observable, and bounded — even under scale, integration, and adversarial pressure.

DP13 ensures that AI power remains bounded in practice.

It does not eliminate capability. It ensures that capability operates within limits that are visible, governable, and enforceable.

In today’s web, systems often fail without containment, allowing small errors to scale into systemic harm. DP13 reverses this by ensuring that when systems fail, they fail within boundaries that limit impact and enable recovery.

With DP13, powerful systems can participate safely because their behavior is constrained, observable, and continuously aligned with governance and ethical expectations.

DP13 is therefore not only about limiting what AI can do. It is about ensuring that containment remains real under scale, integration, and interoperability, so that safety does not disappear the moment an agent crosses a boundary.

Actions

View Comments (0) View History View Revisions Download Document

ML-Draft-004 Revision 01

Document Information

Abstract

Document Content

AI Containment

4. Containment Dimensions

4.1 Scope

4.2 Time and Budget

4.3 Rate and Amplification

4.4 Sandboxing and Isolation

4.5 Tool Permissions

4.6 Kill Switches and Circuit Breakers

4.7 Runtime Enforcement (TEE and Equivalent)

4.8 Incentive-Aware Containment

4.9 Relational and Influence Boundaries

5. Verification and Transparency

5.1 Policy-Bound Verification (DP12 Alignment)

5.2 Cross-System Verification (DP7 Alignment)

6. Relationship to DP1 (Identity and Accountability)

6.1 Relationship to DP7 (Interoperability)

6.2 Relationship to Pluggable Systems and Extensions

7. Relationship to DP11 and DP12 (Cross-DP Loop)

7.1 Cross-DP Execution Flow

7.2 Cross-System Execution and Degradation

7.3 Containment of Pluggable Systems

8. Threats and Failure Modes

External (dominant risk surface)

8.1 Collective pattern drift

8.2 Incentive leakage

8.3 Policy–execution gap

8.4 Amplification and coordination

8.5 Extraction and exploitation

8.6 Unbounded autonomy

8.7 Hidden escalation

8.8 Runaway loops

8.9 Containment bypass via updates

8.10 Cross-system containment degradation

8.11 Containment theater Containment theater

9. Minimum Alignment (Non-Normative)

10. Open Questions and Future Work

11. Closing Orientation

Actions

Quick Comment

Related Documents