ML-Draft-008 · DP4 - Data Sovereignty & Privacy

DP4 – Data Sovereignty and Privacy

1. Purpose of This Draft

This draft articulates Desirable Property 4 (DP4) as the condition under which participants and communities can meaningfully govern data about themselves and their activity in the meta-layer.

DP4 does not treat privacy as a settings menu, a compliance ritual, or a legal disclaimer. It defines the conditions under which claims of ownership, consent, confidentiality, deletion, and portability remain meaningful in practice.

The core claim is that sovereignty over data depends on more than access controls. It depends on whether collection, inference, retention, sharing, and reuse are bounded by visible purposes, governed by revocable permissions, and constrained by structures that communities can understand and audit.

If DP4 is weak, predictable failures follow: consent theater, surveillance-by-default, inference without accountability, lock-in through broken portability, deletion promises that stop at the first vendor boundary, and community rules that cannot survive contact with underlying data pipelines.

DP4 therefore functions as a precondition for multiple later properties. Agency cannot be exercised over invisible data flows. Governance cannot constrain systems that communities cannot inspect. Ethical AI cannot be meaningful where the data it sees, stores, or trains on is structurally uncontrolled.

DP4 does not resolve all legal, jurisdictional, or sector-specific privacy questions. It defines the minimum conditions under which sovereignty and privacy remain real at the interface where data is created, combined, interpreted, and acted upon.

2. Problem Statement

In today’s web, privacy is often presented as disclosure without control.

Participants are shown banners, terms updates, and granular-looking toggles, yet the underlying system still optimizes for maximal collection, indefinite retention, behavioral inference, and partner expansion. In many cases, the formal interface of consent exists while the operational reality of choice does not.

This produces recurring failures:

These failures are not edge cases. They are structural consequences of architectures designed to treat data accumulation as default value creation.

DP4 addresses this by defining data sovereignty as an operational condition. Privacy becomes meaningful only when participants and communities can see the active terms of data use, limit those terms in practice, revoke permissions without fiction, and move or leave without losing the structure of their digital lives.

3. Threats and Failure Modes

3.1 Consent theater

Interfaces bundle unrelated processing into a single act of acceptance.

Example: A participant accepts a terms update to continue using a service and, in doing so, silently authorizes secondary uses of behavioral data for recommendation tuning, advertising, and model training.

Why this matters: The system records consent, but the participant did not experience a meaningful choice. DP4 treats this as a sovereignty failure, not a paperwork issue.

3.2 Purpose creep and secondary use

Data collected for one function expands into new products, ranking systems, partner programs, or model behaviors without a fresh social contract.

Example: Location data collected for safety or delivery is later used for engagement scoring, ad targeting, or brokered partner analytics.

Why this matters: The participant’s mental model of risk becomes false. Trust erodes even where no obvious breach has occurred.

3.3 Illusory portability

Export exists formally but fails functionally.

Example: A participant downloads an archive that contains files and timestamps but omits social graph edges, permission history, role context, provenance, or schemas needed to restore meaningful continuity elsewhere.

Why this matters: Exit is made to look possible while dependency is preserved. DP4 requires portability that preserves usable structure, not only raw payloads.

3.4 Inference without accountability

Systems derive high-stakes conclusions from behavioral traces without clearly governing how those inferences are created, used, challenged, or removed.

Example: A wellness application infers stress or depression risk from typing cadence and browsing patterns, then shares a derived score with an advertising or insurance intermediary.

Why this matters: The participant never explicitly submitted the sensitive category, yet is still acted upon as if they had.

3.5 Retention without sunset

Data persists because retention is cheap, deletion is operationally inconvenient, and analytics cultures prefer indefinite memory.

Example: A participant deletes an account, but vector embeddings, partner datasets, abuse-model features, and backup systems continue to retain traces with no coherent deletion pathway.

Why this matters: Sovereignty requires time bounds. Without them, institutions remember indefinitely while participants bear the burden of asymmetrical memory.

3.6 Cross-context correlation

Identifiers, device graphs, and fingerprinting techniques merge activity across settings that participants experienced as distinct.

Example: Pseudonymous participation in a civic forum is quietly linked to shopping behavior, social browsing, or location history through shared infrastructure.

Why this matters: Plural identity becomes decorative. Communities cannot sustain contextual integrity if correlation silently defeats boundaries.

3.7 False anonymity and weak de-identification

Organizations describe datasets as anonymized even where re-identification remains plausible or contractually enabled downstream.

Example: A mobility dataset stripped of names still exposes sparse routines in a small town, allowing individuals to be reconstructed through outside knowledge.

Why this matters: DP4 requires honesty about residual risk. “De-identified” cannot be treated as a magic word that dissolves responsibility.

3.8 Partner sprawl without propagation

Deletion, revocation, and correction stop at the first layer of control.

Example: A participant deletes messages in one tool, but analytics vendors, cloud backups, and SDK partners continue to retain copies without visibility or participant recourse.

Why this matters: Sovereignty that fails at the first subcontractor boundary is not sovereignty.

3.9 Youth and vulnerable-context overexposure

Defaults optimized for adult engagement expose minors and vulnerable users to data-intensive patterns they are less equipped to assess or contest.

Example: A youth-oriented social tool enables location sharing, behavioral profiling, or AI-mediated emotional inference by default.

Why this matters: DP4 requires higher baselines where stakes are higher. Uniform defaults can produce unequal harm.

4. Core Principle

Data must retain meaning, consent, and accountability as it moves across systems. If data loses its binding to purpose, provenance, or permissions under transformation, sovereignty collapses into simulation.

Data sovereignty and privacy in the meta-layer require that personal and community data be collected, inferred, stored, shared, and reused only under visible, bounded, and governable conditions.

Those conditions must include:

In today’s web, these conditions rarely hold together. A system may disclose collection without limiting reuse, provide deletion without propagation, or offer export without restoration value. DP4 treats such partial compliance as insufficient.

The meta-layer reframes privacy as operational control at the point of interaction.

Example: A participant opens a data lens and sees active purposes, relevant processors, current retention clocks, sensitive inferences attached to their account, and downstream systems that have accessed their data. They can revoke training permission, export their activity in an interoperable format, contest a high-risk inference, and receive a propagation receipt for deletion requests.

What this feels like: Privacy stops being a maze of legal text and becomes a set of understandable levers tied to real system behavior.

Without this: Privacy becomes trust in opacity, and opacity fails precisely where accountability matters most.

5. Primary Mechanisms and Structural Conditions

5.1 Purpose binding

Every collection and processing pathway must declare its purpose in terms legible to both participants and communities. Material changes in purpose require visible reauthorization, reclassification, or zone-level review.

Example: A discussion zone permits summarization for moderation assistance but prohibits model training on participant content unless a separate, revocable grant is given.

Without purpose binding, consent collapses into blanket exposure.
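
As a rough illustration of the discussion-zone example, purpose binding can be sketched as a check that runs before any processing step. The names `PurposeGrant`, `PurposeViolation`, and `check_purpose`, along with the purpose strings, are hypothetical and are not defined by this draft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PurposeGrant:
    """Hypothetical record: the purposes declared for one data category."""
    data_category: str
    allowed_purposes: frozenset


class PurposeViolation(Exception):
    """Raised when processing is requested for an undeclared purpose."""


def check_purpose(grant: PurposeGrant, requested_purpose: str) -> None:
    """Allow processing only for purposes the grant explicitly declares."""
    if requested_purpose not in grant.allowed_purposes:
        raise PurposeViolation(
            f"{requested_purpose!r} is not a declared purpose for "
            f"{grant.data_category!r}; reauthorization is required"
        )


# Illustrative zone rule: summarization for moderation is declared,
# model training is not.
zone_grant = PurposeGrant(
    data_category="participant_content",
    allowed_purposes=frozenset({"moderation_summarization"}),
)

check_purpose(zone_grant, "moderation_summarization")  # no exception raised

try:
    check_purpose(zone_grant, "model_training")
except PurposeViolation as exc:
    print(f"blocked: {exc}")
```

The sketch is deny-by-default: a new purpose has no effect until it appears in a grant, which is one way to make a material change of purpose require visible reauthorization.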

5.2 Data minimization by design

Systems must begin from the least collection, retention, and sharing compatible with the function being offered, and expand only through visible, justified choices.

Example: A messaging tool does not upload contacts by default. Contact sync is presented as a distinct choice with plain-language scope and a reversible off switch.

This is not anti-functionality. It is a refusal to make maximal collection the silent baseline.

5.3 Consent stack

Permission must be layered, granular, and revocable, with separate scopes for distinct categories of data use.

This draft uses consent stack as a mechanism-level abstraction: a structured set of permissions that distinguish service provision, analytics, sharing, automation, and training from one another.

Example: A participant permits AI-assisted summarization of their workspace but declines training use and third-party analytics. Revoking training permission does not disable the summarization feature they actually wanted.

The consent stack makes partial participation possible without forcing blanket surrender.
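
A minimal sketch of a consent stack, assuming the five scopes named in this section map to independent, individually revocable grants. The `Scope` enum and `ConsentStack` class are illustrative names, not part of any defined interface.

```python
from dataclasses import dataclass, field
from enum import Enum


class Scope(Enum):
    """Scopes this section distinguishes; the string values are illustrative."""
    SERVICE = "service_provision"
    ANALYTICS = "analytics"
    SHARING = "sharing"
    AUTOMATION = "automation"
    TRAINING = "training"


@dataclass
class ConsentStack:
    """Hypothetical per-participant set of independently revocable grants."""
    granted: set = field(default_factory=set)

    def grant(self, scope: Scope) -> None:
        self.granted.add(scope)

    def revoke(self, scope: Scope) -> None:
        # Revoking one scope leaves every other scope untouched.
        self.granted.discard(scope)

    def permits(self, scope: Scope) -> bool:
        # Anything not explicitly granted is denied.
        return scope in self.granted


stack = ConsentStack()
stack.grant(Scope.SERVICE)
stack.grant(Scope.AUTOMATION)  # e.g. AI-assisted summarization
stack.grant(Scope.TRAINING)

stack.revoke(Scope.TRAINING)   # summarization keeps working
```

Because each scope is stored separately, withdrawing training permission cannot disable the summarization feature as a side effect, which is the behavior the example above describes.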

5.4 Meaningful portability

Portability must preserve enough structure to support continuity, not just compliance.

This includes, where technically honest and appropriate:

Example: A participant exports a discussion archive that can be imported into another tool with thread structure, moderation history, authorship context, and trust signals intact.

Without this, “take your data with you” becomes a formal right without a real exit.
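
One way to test whether an export preserves usable structure is to check for the structural sections before export and again on import. The sketch below assumes a JSON archive and takes its required sections from the discussion-archive example; the section names and function names are hypothetical, and a real export schema would be specified elsewhere.

```python
import json
from typing import List

# Assumed structural sections, taken from the discussion-archive example.
REQUIRED_SECTIONS = ("threads", "moderation_history", "authorship", "trust_signals")


def missing_sections(archive: dict) -> List[str]:
    """Return the structural sections the archive lacks, in declared order."""
    return [name for name in REQUIRED_SECTIONS if name not in archive]


def export_archive(archive: dict) -> str:
    """Serialize the archive, refusing to emit one that drops structure."""
    missing = missing_sections(archive)
    if missing:
        raise ValueError(f"export would drop structure: {missing}")
    return json.dumps(archive, sort_keys=True)


def import_archive(payload: str) -> dict:
    """Parse an archive, rejecting payloads that cannot restore continuity."""
    archive = json.loads(payload)
    missing = missing_sections(archive)
    if missing:
        raise ValueError(f"archive lacks structure: {missing}")
    return archive


archive = {
    "threads": [{"id": "t1", "replies": ["t1-r1"]}],
    "moderation_history": [{"thread": "t1", "action": "pinned"}],
    "authorship": {"t1": "participant-a", "t1-r1": "participant-b"},
    "trust_signals": {"participant-a": ["verified_member"]},
}

# Round trip: the restored archive should equal the original.
restored = import_archive(export_archive(archive))
```

A round-trip check of this kind only shows that structure survives serialization. Whether another tool can interpret that structure is a separate interoperability question.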

5.5 Retention clocks and propagation discipline

Retention must be bounded by event-driven or purpose-bound clocks, not indefinite convenience. Deletion, correction, and revocation requests must propagate to known downstream systems with auditable outcomes.

This draft uses retention clocks as a mechanism-level abstraction: visible timers tied to categories of data and stated purposes.

Example: A participant can see that support logs expire in 30 days, abuse-review evidence in 180 days, and AI training exclusion tags apply immediately going forward. When deletion is requested, the system generates a receipt chain showing which processors complied, which are pending, and which limits remain technically unresolved.

DP4 does not require dishonest promises of perfect erasure. It requires propagation discipline and truthful accounting.
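
The retention clocks and receipt chain in the example can be sketched as follows. The 30-day and 180-day periods come from the example; the category keys, the `DeletionReceipt` type, the processor names, and the three status values are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# Retention periods from the example above; the keys are illustrative.
RETENTION_DAYS = {"support_log": 30, "abuse_review_evidence": 180}


def expires_at(category: str, collected_at: datetime) -> datetime:
    """Compute when data in this category reaches the end of its clock."""
    return collected_at + timedelta(days=RETENTION_DAYS[category])


def is_expired(category: str, collected_at: datetime, now: datetime) -> bool:
    return now >= expires_at(category, collected_at)


@dataclass(frozen=True)
class DeletionReceipt:
    """Hypothetical per-processor outcome of a deletion request."""
    processor: str
    status: str  # one of "complied", "pending", "unresolved"


def summarize(receipts: List[DeletionReceipt]) -> Dict[str, List[str]]:
    """Group processors by outcome so unresolved limits stay visible."""
    summary: Dict[str, List[str]] = {"complied": [], "pending": [], "unresolved": []}
    for receipt in receipts:
        summary[receipt.status].append(receipt.processor)
    return summary


collected = datetime(2024, 1, 1, tzinfo=timezone.utc)
chain = [
    DeletionReceipt("primary-store", "complied"),
    DeletionReceipt("analytics-vendor", "pending"),
    DeletionReceipt("cold-backup", "unresolved"),
]
report = summarize(chain)
```

Keeping an explicit "unresolved" bucket reflects the point made above: the receipt chain reports what could not be erased instead of implying perfect erasure.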

5.6 Sensitive inference governance

Derived data can be more consequential than submitted data. High-risk inferences therefore require stronger conditions than ordinary processing.

This includes inferences relating to health, finances, minors, politics, biometric patterns, relational vulnerability, and similar domains of elevated risk.

Example: A system that predicts self-harm risk from behavioral cues must disclose that such inference exists, limit its downstream use, provide human escalation where appropriate, and prohibit repurposing for advertising or engagement optimization.

Inference must be governable as first-class data, not treated as exempt because it was machine-generated.
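
A sketch of treating an inference as first-class data with its own use rules. The category and use labels loosely follow the domains and prohibitions named in this section, but the exact strings, the `Inference` type, and the `may_use` rule are assumptions for illustration only.

```python
from dataclasses import dataclass

# Illustrative labels for high-risk domains and for prohibited repurposing.
HIGH_RISK_CATEGORIES = {"health", "finances", "minors", "politics", "biometric"}
PROHIBITED_USES_FOR_HIGH_RISK = {"advertising", "engagement_optimization"}


@dataclass(frozen=True)
class Inference:
    """Hypothetical record for a derived signal about a participant."""
    subject_id: str
    category: str
    declared_use: str
    disclosed_to_subject: bool


def may_use(inference: Inference, requested_use: str) -> bool:
    """Permit a use only if it matches the declared use and passes
    the stricter conditions that apply to high-risk categories."""
    if inference.category in HIGH_RISK_CATEGORIES:
        if not inference.disclosed_to_subject:
            return False  # undisclosed high-risk inference may not be used
        if requested_use in PROHIBITED_USES_FOR_HIGH_RISK:
            return False  # no repurposing for ads or engagement
    return requested_use == inference.declared_use


risk_signal = Inference(
    subject_id="participant-a",
    category="health",
    declared_use="human_escalation",
    disclosed_to_subject=True,
)
```

Here `may_use(risk_signal, "human_escalation")` is permitted, while a request for `"advertising"` is refused even though the underlying signal already exists.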

5.7 Zone-scoped privacy profiles

Communities must be able to define stricter privacy norms within their zones while remaining interoperable with broader infrastructure.

This draft uses privacy profile as a mechanism-level abstraction: a machine-readable expression of the data rules that apply inside a zone.

A privacy profile may specify, for example:

Example: A health-support community publishes a privacy profile that restricts cloud-based inference, blocks third-party SDKs, and requires explicit opt-in before any content can enter training pipelines.

Without zone-scoped privacy profiles, communities may have values but not operational control.
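
A machine-readable privacy profile could look something like the sketch below, which encodes the three rules from the health-support example and checks an agent's self-declaration against them. The field names and the `AgentDeclaration` type are hypothetical; this draft does not define a profile schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PrivacyProfile:
    """Hypothetical zone rules, mirroring the health-support example."""
    allow_cloud_inference: bool
    allow_third_party_sdks: bool
    training_requires_opt_in: bool


@dataclass(frozen=True)
class AgentDeclaration:
    """Hypothetical self-declaration by an overlay or agent entering a zone."""
    name: str
    uses_cloud_inference: bool
    uses_third_party_sdks: bool
    honors_training_opt_in: bool


def profile_violations(profile: PrivacyProfile, agent: AgentDeclaration) -> List[str]:
    """List every zone rule the agent's declaration conflicts with."""
    problems = []
    if agent.uses_cloud_inference and not profile.allow_cloud_inference:
        problems.append("cloud inference is restricted in this zone")
    if agent.uses_third_party_sdks and not profile.allow_third_party_sdks:
        problems.append("third-party SDKs are blocked in this zone")
    if profile.training_requires_opt_in and not agent.honors_training_opt_in:
        problems.append("training requires explicit opt-in in this zone")
    return problems


health_zone = PrivacyProfile(
    allow_cloud_inference=False,
    allow_third_party_sdks=False,
    training_requires_opt_in=True,
)
local_helper = AgentDeclaration("local-helper", False, False, True)
cloud_helper = AgentDeclaration("cloud-helper", True, True, False)
```

An empty result from `profile_violations` would let the agent enter the zone. A non-empty result could block entry or place the agent in a constrained mode, depending on zone policy.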

5.8 Auditability and provenance of use

Significant data access, transfer, and inference events must be inspectable in participant-legible and community-legible forms.

This does not require exposing every security detail publicly. It requires enough visibility to support contestation, trust, and oversight.

Example: A participant can see that an automated moderation agent accessed a document under a specific policy version, for a named purpose, with a recorded outcome and timestamp.

Privacy claims that cannot be audited remain aspirational.
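
The access record in the example can be sketched as an append-only log. Hash chaining is used here as one common tamper-evidence technique; it is an assumption of this sketch, not something this draft prescribes. The event field names are also illustrative.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder hash for the first entry's predecessor


def _digest(event: Dict[str, str], prev_hash: str) -> str:
    """Hash the event together with the previous entry's hash."""
    body = json.dumps({"event": event, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_event(log: List[dict], event: Dict[str, str]) -> dict:
    """Append an access event, linking it to the entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"event": event, "prev_hash": prev_hash, "hash": _digest(event, prev_hash)}
    log.append(entry)
    return entry


def verify(log: List[dict]) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(entry["event"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True


log: List[dict] = []
append_event(log, {
    "actor": "moderation-agent",
    "resource": "document-17",
    "policy_version": "v3",
    "purpose": "moderation_review",
    "outcome": "no_action",
    "timestamp": "2024-05-01T12:00:00Z",
})
append_event(log, {
    "actor": "export-service",
    "resource": "document-17",
    "policy_version": "v3",
    "purpose": "participant_export",
    "outcome": "delivered",
    "timestamp": "2024-05-02T09:30:00Z",
})
```

A chain like this makes after-the-fact edits detectable, but it does not by itself decide who may read the log. Participant-legible and community-legible views would be layered on top.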

5.9 Training and model-use boundaries

Where participant or community content could enter model training, fine-tuning, embedding pipelines, or retrieval systems, those pathways must be separately governed.

Example: A public discussion zone allows search indexing but defaults to no training use. Participants can grant corpus-level permission for research or model improvement on a renewable basis, and declined content carries an exclusion marker through the training pipeline.

This is a direct dependency between DP4 and later AI properties. Ethical AI claims are weak if model access to human data is structurally obscure.
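
A sketch of a default-deny training filter with renewable grants. It assumes each content item carries an optional grant expiry date, where the absence of a grant is the exclusion marker. `ContentItem` and both function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class ContentItem:
    """Hypothetical content record carrying its own training permission."""
    item_id: str
    # None means no training grant exists, which is the default.
    training_grant_expires: Optional[date] = None


def training_permitted(item: ContentItem, today: date) -> bool:
    """A grant counts only while unexpired; it must be renewed to continue."""
    expiry = item.training_grant_expires
    return expiry is not None and today <= expiry


def select_for_training(items: List[ContentItem], today: date) -> List[ContentItem]:
    """Keep only items with a live grant; everything else is excluded."""
    return [item for item in items if training_permitted(item, today)]


corpus = [
    ContentItem("post-1"),                                   # never granted
    ContentItem("post-2", date(2024, 12, 31)),               # live grant
    ContentItem("post-3", date(2024, 1, 31)),                # grant lapsed
]
eligible = select_for_training(corpus, today=date(2024, 6, 1))
```

Search indexing would be governed by a separate permission and is not modeled here. The point of the sketch is only that a lapsed or absent grant excludes the item without further action by the participant.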

5.10 Jurisdictional and transfer honesty

Cross-border transfers and legal regime changes must be visible as part of the participant’s risk surface.

Example: A participant is shown that a given processor operates under a different legal regime, that redress pathways are limited, and that a community zone therefore blocks that transfer category by default.

Global systems do not excuse vague disclosure. They heighten the need for explicitness.

5.11 Data System Layer: Lineage, Transformation Integrity, and Consent Propagation

Beyond individual mechanisms, DP4 requires a coherent data system layer that preserves lineage, semantics, and permissions across pipelines, services, and time. This layer ensures that data remains trustworthy under transformation, scale, and adversarial use.

5.11.1 Lineage continuity

Failure mode: lineage loss, enabling untraceable reuse and accountability gaps.

5.11.2 Transformation integrity

Failure mode: data laundering, where meaning or risk is altered without trace.

5.11.3 Consent propagation

Failure mode: consent bypass chains, where integrations ignore or reinterpret permissions.

5.11.4 Anti-replay and non-duplication

Failure mode: replay extraction, where the same data yields multiple unaccounted benefits.

5.11.5 Inference binding and governance

Failure mode: inference drift, where derived signals are reused outside their declared context.

5.11.6 Cross-system semantics

Failure mode: semantic drift, where data is misinterpreted after transfer.

This layer does not require centralization. It requires coherence under movement.
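
As one possible reading of lineage continuity and consent propagation, a derived dataset can record its parents and inherit only the uses that every parent permits. The intersection rule, the `Dataset` type, and the `derive` function are assumptions of this sketch and are not specified by this draft.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Dataset:
    """Hypothetical dataset record carrying permissions and lineage."""
    name: str
    permitted_uses: FrozenSet[str]
    parents: Tuple[str, ...] = ()


def derive(name: str, *sources: Dataset) -> Dataset:
    """Create a derived dataset that keeps its lineage and inherits only
    the uses permitted by all of its sources (most restrictive wins)."""
    if not sources:
        raise ValueError("a derived dataset needs at least one source")
    permitted = frozenset.intersection(*(s.permitted_uses for s in sources))
    return Dataset(name, permitted, tuple(s.name for s in sources))


posts = Dataset("posts", frozenset({"service", "analytics", "training"}))
reactions = Dataset("reactions", frozenset({"service", "analytics"}))

# The join may not be used for training, because "reactions" never allowed it.
engagement = derive("engagement_join", posts, reactions)
```

Under this rule a transformation can narrow permissions but never widen them, which is one way to make consent bypass through intermediate datasets harder to achieve silently.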

6. Governance, Accountability, and Agency Surfaces

DP4 is not satisfied by backend architecture alone. Participants and communities need interfaces through which data conditions become governable.

Participants must be able to:

Communities must be able to:

Example: A civic deliberation zone prohibits third-party trackers and emotional classification systems. Any overlay or agent entering the zone must declare compatibility with the zone’s privacy profile or operate in a visibly constrained mode.

Without these surfaces, privacy remains vendor-defined even when communities appear to have rules.

7. Incentives and Power Analysis

Commercial systems tend to treat data surplus as strategic advantage.

Retention expands because future uses may be profitable. Inference expands because prediction creates leverage. Consent becomes cosmetic where friction threatens growth. SDK ecosystems and downstream processors thrive precisely when participants cannot trace the full chain of use.

DP4 does not assume these incentives disappear. It requires that their effects become visible and contestable.

Example: A platform discloses that feed ranking depends partly on behavioral surplus gathered across sessions. A community operating inside the meta-layer disables that ranking signal within its zone because it conflicts with the zone’s purpose.

This matters because many privacy harms are not caused by one malicious actor. They emerge from ordinary growth logic operating without adequate brakes.

DP4 therefore treats incentive visibility as part of sovereignty. Participants should be able to know when they are not merely receiving a service, but being rendered into a data asset.

8. Community Signals Informing DP4

Across contexts, similar signals recur:

These signals are not abstract. They arise when people sense that the surface language of privacy no longer matches the structure underneath.

DP4 responds to that gap by making data conditions inspectable, debatable, and governable.

9. Non-Goals and Explicit Boundaries

DP4 defines a minimum condition. It does not solve every problem associated with data, secrecy, or identity.

DP4 does not:

These boundaries matter because absolutist privacy claims often collapse under real-world complexity.

For example, some communities may require stronger identity assurance to support trust and accountability. DP4 does not forbid that. It requires that data burdens attached to such assurance remain bounded, visible, and contestable.

Likewise, some inference may be necessary for accessibility, fraud prevention, or urgent safety intervention. DP4 does not deny that. It requires those pathways to be governed explicitly rather than smuggled in under vague necessity claims.

10. Minimum Alignment (Non-Normative)

Minimum alignment is not a policy checklist. It is the threshold at which data sovereignty is enforceable, portable, and resistant to laundering, drift, and silent reuse.

A system that does not meet these conditions may disclose practices, but it does not provide sovereignty.

At minimum, a system claiming DP4 alignment MUST satisfy the following irreducible conditions:

10.1 Purpose binding and enforcement

Failure mode: purpose creep.

10.2 Consent propagation

Failure mode: consent bypass chains.

10.3 Lineage and provenance

Failure mode: lineage loss / data laundering.

10.4 Meaningful portability

Failure mode: illusory portability.

10.5 Retention and propagation discipline

Failure mode: retention without sunset / partner sprawl.

10.6 Inference governance

Failure mode: inference without accountability.

10.7 Auditability of use

Failure mode: opaque processing.

10.8 Interoperability honesty

Failure mode: interop deception.


These conditions define the minimum viable data sovereignty layer of the meta-layer.

Partial implementations that omit purpose enforcement, consent propagation, lineage, or propagation discipline MUST NOT be considered aligned with DP4.

11. Open Questions and Future Work

DP4 surfaces unresolved design challenges that require further work:

These are not reasons to delay better defaults. They mark the frontier where DP4 must mature through practice, governance, and implementation evidence.

12. Relationship to Other Desirable Properties

DP4 is foundational and interdependent.

A failure in DP4 propagates upward. If data conditions are opaque, later governance becomes symbolic, ethical AI becomes ungrounded, and participant agency becomes procedural rather than real.

13. Path Toward ML-RFC

Progression from draft to RFC-grade maturity would require:

The goal is not to freeze one final model of privacy. It is to establish durable conditions under which sovereignty claims can be tested, challenged, and improved.

14. Closing Orientation

DP4 is where the meta-layer rejects the old bargain of convenience in exchange for invisibility.

Sovereignty is not achieved when a participant is merely informed that data extraction may occur. It is achieved when the participant and the communities they inhabit can see the operative terms of data use, shape those terms where appropriate, withdraw from them in meaningful ways, and leave without losing the structure of their digital life.

When DP4 is strong, trust in governance, AI, commerce, and collaboration becomes plausible.

When DP4 is weak, every higher-order property is forced to fight against a substrate that quietly converts participation into extraction.
