ML-Draft-004

DP13 – AI Containment

Document Information
ID: ML-Draft-004
Title: DP13 – AI Containment
Status: approved
Authors: The Meta-Layer Initiative
Group: N/A
Date: 2026-04-20

Source: Bitcoin Ordinal
Inscription #: 124316936
Block Height: 944992
Timestamp: 2026-04-14 05:56 UTC
Content Type: text/plain;charset=utf-8
Inscription ID: 29cb4ffc....d10f44i0
Abstract

DP13 ensures that AI behavior is not only governed in principle but technically constrained in practice. It defines containment as the enforcement of explicit, machine-level boundaries on what AI systems can do and how they can influence participants, across dimensions such as scope, tools, data access, rate, and interaction patterns. For governance actors, DP13 is the execution layer of trust. It translates community-defined rules (DP12) and ethical requirements (DP11) into runtime enforcement through mechanisms like permission systems, rate limits, sandboxing, and secure execution environments. It also extends containment beyond capability to include influence containment, addressing risks such as manipulation, persuasion, and emotional overreach. In the Gov Hub, DP13 provides the infrastructure for verifiable safety. Participants and communities can inspect, audit, and modify the constraints governing AI systems, ensuring that safety is not assumed but demonstrably enforced. This enables rapid intervention, recovery from failures, and scalable coexistence with both internal and external AI agents.

Document Content

DP13 - AI Containment

1. Purpose of This Draft

This draft articulates Desirable Property 13 (DP13) as the Meta-Layer’s requirement that AI behavior is bounded by enforceable constraints at runtime. These constraints limit scope, tools, data access, rate, and persistence so that when systems misbehave, impact is contained and recovery is possible.

If DP11 defines what must be safe and ethical, and DP12 defines who sets the rules, DP13 defines how those rules are made real in execution.

Containment is not a policy statement. It is a property of the system’s runtime behavior.


2. Problem Statement

In today’s web, AI systems increasingly operate with:

  • broad tool access
  • persistent memory
  • network reach
  • opaque update pathways

Controls are often advisory rather than enforceable. As a result:

  • systems can act beyond intended scope
  • failures propagate quickly and at scale
  • rollback and recovery are difficult
  • users cannot verify whether constraints are actually applied

At the same time, a growing class of risk comes from external agents that users do not deploy or control. These agents may:

  • attempt to influence beliefs or decisions
  • generate persuasive or misleading content at scale
  • coordinate to shape narratives or perception

In these cases, the primary risk is not cost or resource usage, but harm to understanding, trust, and agency.

Containment must therefore address both:

  • internal agents (those a user or community deploys)
  • external agents (those acting upon participants)

Containment must be default-on, visible, and testable.


3. Core Principle

Every AI actor operates within explicit, machine-enforced boundaries over scope, time, rate, data, tools, and influence, with observable state and rapid shutdown, unless a community-defined policy (DP12) specifies otherwise.

Containment must protect not only against what an agent can do, but also how it can affect participants.

Containment is effective when:

  • participants can see the boundaries and influence conditions
  • governance can modify them
  • the system enforces them at runtime

This includes protections against external agents attempting to manipulate, confuse, or unduly influence users.


4. Containment Dimensions

Containment operates across two distinct but related domains:

  • Capability containment: what agents can do (tools, scope, time, resources)
  • Influence containment: how agents affect participants and shared environments (perception, behavior, collective understanding)

While capability containment is critical for agents deployed by users or communities, the dominant risk in open environments comes from external agents shaping perception, behavior, and collective reality.

The following dimensions apply across both domains, with varying emphasis depending on context.

4.1 Scope

Defines what domains, datasets, and actions are in-bounds.

This applies both to what an internal agent can do on a user’s behalf and to the types of interactions external agents are permitted to have with participants.

The default posture is deny-by-default for high-risk capabilities and high-risk interaction patterns.

Example: An AI assistant can summarize documents but cannot access financial accounts or initiate transactions without explicit permission. Similarly, external agents may be restricted from initiating certain categories of interaction (e.g., unsolicited persuasion or sensitive-topic engagement with minors).
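The deny-by-default scope posture described above can be sketched in a few lines. This is an illustrative sketch only; the class and capability names (ScopePolicy, "document.summarize", "financial.transfer") are assumptions, not part of any defined Meta-Layer API.

```python
class ScopePolicy:
    """Deny-by-default scope check: a capability is refused unless
    it has been explicitly granted (illustrative sketch)."""

    def __init__(self, granted=()):
        self.granted = set(granted)  # explicitly permitted capabilities

    def is_allowed(self, capability: str) -> bool:
        # Anything not on the grant list is denied, so high-risk
        # actions are blocked without requiring a separate blocklist.
        return capability in self.granted


policy = ScopePolicy(granted={"document.summarize"})
assert policy.is_allowed("document.summarize")        # in scope
assert not policy.is_allowed("financial.transfer")    # denied by default
```

The key design choice is that the policy enumerates what is permitted rather than what is forbidden, so newly introduced capabilities start out contained.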


4.2 Time and Budget

Defines limits on duration, compute, tokens, and financial spend.

These constraints primarily apply to internal agents, where limiting execution time and resource consumption prevents runaway behavior.

For external agents, these limits apply only indirectly. Users may not control an external agent's budget, but bounding interaction windows and execution pathways can still limit persistent or looping engagement patterns.

Example: Autonomous tasks expire after a set time or budget threshold, preventing runaway loops.
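A time-and-budget gate of the kind described above might look like the following sketch. The Budget class and its cost units are assumptions for illustration; a real deployment would meter actual compute, tokens, or spend.

```python
import time


class Budget:
    """Per-task limits on wall-clock time and spend (illustrative sketch)."""

    def __init__(self, max_seconds: float, max_spend: float):
        self.deadline = time.monotonic() + max_seconds
        self.remaining = max_spend

    def charge(self, cost: float) -> bool:
        # Refuse the step if the task has expired or the spend cap
        # would be exceeded; the caller must then halt gracefully.
        if time.monotonic() > self.deadline or cost > self.remaining:
            return False
        self.remaining -= cost
        return True


budget = Budget(max_seconds=60, max_spend=1.00)
assert budget.charge(0.40)       # first step fits
assert budget.charge(0.40)       # second step fits
assert not budget.charge(0.40)   # third step would exceed the cap
```

Because every step must pass through charge(), a runaway loop exhausts its budget and stops rather than consuming resources indefinitely.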


4.3 Rate and Amplification

Caps on message volume, API calls, and propagation effects.

This applies especially to external agents attempting to influence at scale.

Example: An AI cannot post or respond beyond a defined rate, limiting virality, coordinated messaging, or synthetic amplification.
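A standard way to implement the posting-rate cap above is a token bucket. This is a minimal sketch with illustrative parameters, not a prescribed mechanism.

```python
class TokenBucket:
    """Caps action rate while permitting small bursts (illustrative sketch)."""

    def __init__(self, capacity: int, refill_per_second: float):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill = refill_per_second
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill tokens in proportion to elapsed time, then spend
        # one token per action; an empty bucket means "blocked".
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(capacity=2, refill_per_second=0.1)
assert bucket.allow(0.0) and bucket.allow(0.0)   # a burst of two is allowed
assert not bucket.allow(0.0)                     # a third immediate post is blocked
assert bucket.allow(10.0)                        # tokens refill over time
```

The capacity bounds burst size and the refill rate bounds sustained throughput, which is what limits coordinated or synthetic amplification.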


4.4 Sandboxing and Isolation

Execution occurs in isolated environments with no ambient access to secrets.

Example: Untrusted code runs in a sandbox with no network egress unless explicitly granted.
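The deny-by-default egress posture can be sketched as an allowlist gate in front of every outbound call. Real isolation would rely on OS, container, or network-level mechanisms; this only illustrates the policy logic, and all names here are assumptions.

```python
ALLOWED_HOSTS = set()   # deny-by-default: no network egress until granted


def grant_egress(host: str):
    ALLOWED_HOSTS.add(host)


def request(host: str, path: str) -> str:
    # Every outbound call is gated through the egress allowlist;
    # the return value stands in for a real HTTP client call.
    if host not in ALLOWED_HOSTS:
        raise PermissionError(f"egress to {host} not granted")
    return f"GET https://{host}{path}"


blocked = False
try:
    request("api.example.com", "/data")
except PermissionError:
    blocked = True
assert blocked                     # no egress before an explicit grant

grant_egress("api.example.com")
assert request("api.example.com", "/data").startswith("GET")
```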


4.5 Tool Permissions

Explicit allowlists for tools and actions.

This applies differently across internal and external agents:

  • for internal agents, it defines what the agent is permitted to do on a user’s behalf
  • for external agents, it defines what kinds of actions or interactions are permitted both within and from within the environment (e.g., posting, messaging, initiating contact)

Example: An agent may read documents but cannot send emails or execute payments without user confirmation. Similarly, an external agent may be allowed to respond within a thread but not initiate transactions or unsolicited messages or perform actions that affect user state.
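An allowlist with a confirmation tier, as in the example above, might be sketched as follows. The tool names and return values are illustrative assumptions; a real system would attach this check to the agent runtime rather than a plain function.

```python
REQUIRES_CONFIRMATION = {"send_email", "execute_payment"}


def invoke_tool(tool: str, allowlist: set, confirmed: bool = False) -> str:
    # First gate: the tool must be on the explicit allowlist.
    if tool not in allowlist:
        raise PermissionError(f"tool {tool!r} not on allowlist")
    # Second gate: high-risk tools wait for human confirmation.
    if tool in REQUIRES_CONFIRMATION and not confirmed:
        return "pending_confirmation"
    return "executed"


allow = {"read_document", "send_email"}
assert invoke_tool("read_document", allow) == "executed"
assert invoke_tool("send_email", allow) == "pending_confirmation"
assert invoke_tool("send_email", allow, confirmed=True) == "executed"
```

Separating "allowed at all" from "allowed without confirmation" lets governance tune friction per tool instead of choosing between full access and full blocking.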


4.6 Kill Switches and Circuit Breakers

Immediate shutdown pathways at user, operator, and community levels.

Example: A community can pause all AI agents in a zone when anomalous behavior is detected.
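A zone-level circuit breaker of this kind can be sketched as a shared gate that every agent action must pass. Class and zone names are illustrative assumptions.

```python
class ZoneBreaker:
    """Community-level circuit breaker: pausing a zone halts all its
    agents (illustrative sketch)."""

    def __init__(self):
        self.paused_zones = set()

    def pause(self, zone: str):
        self.paused_zones.add(zone)

    def resume(self, zone: str):
        self.paused_zones.discard(zone)

    def check(self, zone: str):
        # Every agent action calls check() before executing, so a
        # pause takes effect immediately across all agents in the zone.
        if zone in self.paused_zones:
            raise RuntimeError(f"zone {zone!r} is paused by governance")


breaker = ZoneBreaker()
breaker.check("forum")            # normal operation: no exception
breaker.pause("forum")            # anomaly detected; community hits the switch
halted = False
try:
    breaker.check("forum")
except RuntimeError:
    halted = True
assert halted
```

Because the gate sits in the execution path rather than in policy documents, the pause is enforced rather than advisory, which is the distinction DP13 insists on.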


4.7 Runtime Enforcement (TEE and Equivalent)

Constraints are enforced in secure execution environments (such as Trusted Execution Environments) or equivalent mechanisms that prevent silent bypass.

In browser or browser-extension-based applications, policy execution can be anchored in decentralized cloud TEEs (e.g., Phala Network or similar infrastructures). This enables rules defined at the interface layer to be enforced at the API and execution layer, independent of the application frontend or model provider.

Example: Even if an agent or integration is compromised, it cannot exfiltrate data or execute restricted actions because enforcement occurs within an attested execution environment with hardware-backed guarantees.

Example: A community defines interaction constraints (e.g., agents cannot initiate communication or engage with users under a specified age threshold). These rules are enforced via TEE-backed middleware that filters or blocks API calls before they reach the person.

This reduces the gap between declared policy and actual behavior, ensuring containment persists even when underlying services are untrusted or heterogeneous.
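The filtering half of the TEE-backed middleware described above can be sketched as follows. Only the decision logic is shown; in a real deployment this code would run inside an attested execution environment, and the rule and field names here are illustrative assumptions.

```python
def tee_filter(call: dict, rules: dict) -> str:
    """Policy filter applied before an API call reaches a participant
    (illustrative sketch of the decision logic only)."""
    # Rule 1: agents may not initiate contact unless the zone allows it.
    if call["type"] == "initiate_contact" and not rules.get("contact_allowed", False):
        return "blocked"
    # Rule 2: interactions with participants under the age threshold are blocked.
    age = call.get("recipient_age")
    if age is not None and age < rules.get("min_age", 0):
        return "blocked"
    return "forwarded"


rules = {"contact_allowed": False, "min_age": 18}
assert tee_filter({"type": "initiate_contact"}, rules) == "blocked"
assert tee_filter({"type": "reply", "recipient_age": 16}, rules) == "blocked"
assert tee_filter({"type": "reply", "recipient_age": 30}, rules) == "forwarded"
```

The point of anchoring this logic in an attested environment is that neither the application frontend nor the model provider can silently skip the filter.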


4.8 Incentive-Aware Containment

Containment must consider not only capabilities, but the incentives driving behavior. Incentives shape how agents use their capabilities, often in ways that are not visible at the level of individual actions but emerge over time and at scale.

Containment must therefore operate not only on actions, but on the optimization pressures that produce those actions.

This includes:

  • constraining amplification mechanisms tied to engagement optimization
  • requiring disclosure when outputs are influenced by monetization or retention goals
  • limiting or disabling optimization pathways that systematically distort information or behavior

Example: If an AI is optimized for engagement, containment may restrict amplification mechanisms, cap exposure to emotionally manipulative content, or require disclosure when engagement optimization influences outputs.

Example: A community may prohibit AI systems from optimizing for click-through or time-on-platform within certain zones, enforcing alternative objectives such as accuracy or deliberation.

Without this, systems may remain technically bounded while still producing harmful outcomes driven by misaligned incentives.


4.9 Relational and Influence Boundaries

Containment must limit forms of emotional, cognitive, and behavioral influence that create dependency, manipulation, or distortion of understanding.

This applies to both deployed agents and external agents interacting with participants.

Example: Systems providing emotional support must disclose their nature, limit claims of authority, and provide escalation pathways to human support.

Example: External agents attempting to persuade users must be visibly marked, rate-limited, and subject to constraints on coordinated influence.

This addresses risks identified in DP11 (emotional and relational overreach) and extends containment to the informational environment itself.


5. Verification and Transparency

Containment must be verifiable, not assumed. Participants and communities should be able to inspect, question, and validate that constraints are real and active at runtime.

This includes:

  • visible configuration of constraints (scope, tools, budgets)
  • logs of tool use and actions with timestamps and outcomes
  • audit hooks for communities and third parties
  • attestations from secure execution (e.g., TEE-backed proofs) where applicable

Example: A user opens an agent panel and sees its current permissions, remaining budget, recent tool calls, and the policy version governing its behavior. A community auditor can verify that the agent ran inside an attested execution environment.

What this feels like: You are not taking safety on faith. You can inspect and verify what the system is allowed to do and what it actually did.

Without this: Containment becomes a claim. Users cannot distinguish between enforced limits and marketing language.
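One way to make the action log above tamper-evident is to hash-chain each record and stamp it with the governing policy version. This is a minimal sketch under assumed field names; it does not replace TEE attestation, which covers where the code ran rather than what was recorded.

```python
import hashlib
import json


def log_action(log: list, entry: dict) -> str:
    """Append an action record chained by hash so deletion or
    modification of earlier entries is detectable (illustrative sketch)."""
    prev = log[-1]["digest"] if log else "genesis"
    record = dict(entry, prev=prev)
    # Canonical JSON keeps the digest stable across key orderings.
    digest = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()).hexdigest()
    record["digest"] = digest
    log.append(record)
    return digest


log = []
log_action(log, {"agent": "assistant-1", "tool": "read_document",
                 "policy_version": "v3", "outcome": "ok"})
log_action(log, {"agent": "assistant-1", "tool": "send_email",
                 "policy_version": "v3", "outcome": "blocked"})
# Each record names the policy version that governed it and chains to
# the previous digest, so an auditor can verify the sequence end to end.
assert log[1]["prev"] == log[0]["digest"]
```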


6. Relationship to DP1 (Identity and Accountability)

DP13 depends on DP1 to bind constraints and violations to accountable actors.

  • constraints attach to identifiable agents and deploying entities
  • actions are attributable across time and context
  • violations map to responsible parties with clear recourse

Example: An agent exceeds a rate limit due to misconfiguration. Logs tie the action to the deploying organization and policy version, enabling remediation and accountability.

Without this: Failures cannot be assigned or corrected. Containment loses its corrective function.


7. Relationship to DP11 and DP12 (Cross-DP Loop)

  • DP11 defines ethical expectations and user-facing legibility
  • DP12 defines governance and rule-setting
  • DP13 enforces those rules in execution

These properties form a continuous loop:

  • ethics → governance → enforcement → observation → refinement

7.1 Cross-DP Execution Flow

A typical interaction unfolds as:

  • Agent is visible with role and capabilities (DP11)
  • Governing rules are accessible for the current zone (DP12)
  • Action is constrained by active policies (DP13)
  • Action is logged and attributable (DP1 + DP11)
  • Participants can contest or escalate (DP11 + DP12)
  • Governance updates rules based on evidence (DP12)
  • Updated rules are enforced immediately (DP13)

Example: An AI suggests a financial action. The UI shows its capability envelope (DP11), the zone requires human confirmation (DP12), the action is blocked pending approval (DP13), the attempt is logged (DP1), and the community later tightens rules for similar cases (DP12), which are then enforced going forward (DP13).
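The flow in that example can be sketched as a single enforcement pipeline. All names and policy fields are illustrative assumptions; the point is only the ordering: scope check (DP13), zone rule (DP12), then attributable logging (DP1 + DP11).

```python
def handle_action(action: dict, zone_policy: dict,
                  confirmed: bool, audit_log: list) -> str:
    """Sketch of one pass through the DP11→DP12→DP13 loop."""
    if action["capability"] not in zone_policy["allowed"]:
        outcome = "denied"                        # DP13: out of scope
    elif zone_policy.get("requires_confirmation") and not confirmed:
        outcome = "blocked_pending_approval"      # DP12: zone rule applies
    else:
        outcome = "executed"
    # DP1 + DP11: every attempt is logged with the governing policy version.
    audit_log.append({"agent": action["agent"],
                      "capability": action["capability"],
                      "outcome": outcome,
                      "policy": zone_policy["version"]})
    return outcome


policy = {"allowed": {"suggest_trade"},
          "requires_confirmation": True, "version": "zone-v7"}
log = []
assert handle_action({"agent": "a1", "capability": "suggest_trade"},
                     policy, confirmed=False, audit_log=log) == "blocked_pending_approval"
assert handle_action({"agent": "a1", "capability": "suggest_trade"},
                     policy, confirmed=True, audit_log=log) == "executed"
assert len(log) == 2
```

When the community later tightens the rules, only the policy object changes; the enforcement path stays the same, which is what allows updated rules to take effect immediately.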


8. Threats and Failure Modes

External (dominant risk surface)

External risks arise not from agents you choose to deploy, but from agents and systems that act upon you within shared environments. These agents may present as helpful assistants, peers, or services, but operate with goals, incentives, and coordination patterns that are not aligned with your interests or visible to you.

Unlike internal agents, where you define scope and permissions, external agents shape the environment you inhabit. They influence what you see, how information is framed, and how interactions unfold. Containment in this context is not about limiting your own tools, but about protecting your attention, decisions, identity, and relationships from manipulation, extraction, and distortion. 

8.1 Collective pattern drift

Harm emerges across many agents in aggregate rather than a single violation, shifting the information environment over time.

Example: Multiple agents subtly shift tone or recommendations in a coordinated way, changing the information environment without any single clear breach.

8.2 Incentive leakage

External agents and systems shape the information environment by optimizing for engagement, persuasion, or influence, often without visibility to participants. These incentives do not appear as single violations, but as consistent directional pressure on what users see, believe, and respond to.

Example: A user’s feed is subtly filled with more emotionally charged or polarizing content because external systems are optimizing for engagement, gradually shifting perception and belief without any explicit rule being broken.

8.3 Policy–execution gap

Declared rules about interaction (e.g., no unsolicited outreach) are not enforced at runtime.

Example: A policy forbids outbound messages, but agents still initiate contact via unmonitored integrations.

8.4 Amplification and coordination

Rate and propagation controls fail, enabling coordinated influence and synthetic virality.

Example: Agents coordinate posting across channels to amplify a narrative beyond intended limits.

8.5 Extraction and exploitation

External agents attempt to obtain money, sensitive data, or identity by exploiting trust, urgency, or confusion.

These attacks are often conversational and adaptive, making them harder to detect than static scams.

Example: An agent impersonates a trusted service and guides a user through a “verification” flow that captures credentials or payment details.

Example: A coordinated set of agents targets a user over time, building rapport before requesting sensitive information or directing them to a malicious transaction.


Internal / deployment risks (secondary but necessary)

These risks arise when you deploy an agent across the layered web on your behalf. Such agents may plan, shop, post, code, or manage data, extending your agency into multiple environments. This can free you up for higher-value work and decision making, but it also introduces new forms of exposure.

Crucially, your agent does not operate in isolation. It enters shared environments where other participants and communities may not expect, trust, or consent to its presence or behavior. Containment must therefore consider not only what your agent can do for you, but how it interacts with others and whether those interactions are permitted within the surrounding context. 

8.6 Unbounded autonomy

Agents act beyond defined scope or without clear limits.

Example: An agent chains multiple tools to perform actions that were individually allowed but collectively exceed intended scope.

8.7 Hidden escalation

Agents gain additional privileges through chaining or indirect access.

Example: An agent invokes another agent with broader permissions, effectively bypassing its own limits.

8.8 Runaway loops

Agents call other agents or tools without budget or rate limits.

Example: Recursive task execution consumes resources and spams endpoints before detection.

8.9 Containment bypass via updates

Updates, plugins, or integrations introduce new capabilities without review.

Example: A plugin update adds network egress not covered by existing policies.


9. Minimum Alignment (Non-Normative)

At minimum, a DP13-aligned system should include:

External (participant protection):

  • controls on unsolicited interaction (e.g., agents cannot initiate contact without permission)
  • rate limits and amplification controls on incoming agent activity
  • clear marking and visibility of agent identity and intent
  • restrictions on sensitive interactions (e.g., financial requests, data access, interaction with minors)

Internal (agent deployment):

  • tool allowlists or equivalent controls
  • per-session budgets and time limits
  • human confirmation for selected high-risk actions
  • accessible kill switch from the primary UI path
  • logging of actions and tool usage with export capability
  • deny-by-default network egress unless explicitly opened

Shared / cross-cutting:

  • visible policy references for each action

Example: Before an agent performs a payment, the UI shows the policy requiring confirmation, the remaining budget, and a one-click revoke option.

Without this: Users are nudged into actions they cannot fully evaluate or stop.


10. Open Questions and Future Work

DP13 surfaces several open questions that cut across technology, governance, and user experience. These are not peripheral details; they determine whether containment is practical, trustworthy, and widely adoptable.

Policy languages and interoperability. How should containment rules be expressed so they are portable across tools, zones, and providers? There is a need for shared, composable policy formats (capability manifests, interaction permissions, and audit events) that different systems can interpret consistently without locking communities into a single vendor stack.

Cross-zone propagation of breaches. When containment fails in one context, how should signals propagate to others? Designing mechanisms for coordinated response without overreach is non-trivial: alerts must travel far enough to be useful, but not so broadly that they create false positives or systemic lockups.

Usability without fatigue. Strong containment often introduces friction (prompts, confirmations, disclosures). The challenge is to maintain meaningful consent and visibility without overwhelming participants. This likely requires adaptive interfaces that surface detail when risk is high and recede when it is low.

Verification models. Where should systems rely on formal guarantees (e.g., TEE attestation, static policy checks) versus empirical monitoring (anomaly detection, behavioral audits)? In practice, robust containment will combine both, but the boundary between them remains an open design space.

Collective monitoring and response. Communities may play a role in detecting patterns that single systems miss, especially for external threats like coordinated influence or slow-moving extraction. Designing mechanisms for community signaling, weighting, and response that resist capture is an active area for exploration.

Taken together, these questions point to containment as a living system: standardized enough to interoperate, but adaptive enough to respond to new forms of risk.


11. Closing Orientation

DP13 ensures that AI power remains bounded in practice.

It does not eliminate capability. It ensures that capability operates within limits that are visible, governable, and enforceable.

In today’s web, systems often fail without containment, allowing small errors to scale into systemic harm. DP13 reverses this by ensuring that when systems fail, they fail within boundaries that limit impact and enable recovery.

With DP13, powerful systems can participate safely because their behavior is constrained, observable, and continuously aligned with governance and ethical expectations.
