# **DP4 – Data Sovereignty and Privacy** ## 1. Purpose of This Draft This draft articulates Desirable Property 4 (DP4) as the condition under which participants and communities can meaningfully govern data about themselves and their activity in the meta-layer. DP4 does not treat privacy as a settings menu, a compliance ritual, or a legal disclaimer. It defines the conditions under which claims of ownership, consent, confidentiality, deletion, and portability remain meaningful in practice. The core claim is that sovereignty over data depends on more than access controls. It depends on whether collection, inference, retention, sharing, and reuse are bounded by visible purposes, governed by revocable permissions, and constrained by structures that communities can understand and audit. If DP4 is weak, predictable failures follow: consent theater, surveillance-by-default, inference without accountability, lock-in through broken portability, deletion promises that stop at the first vendor boundary, and community rules that cannot survive contact with underlying data pipelines. DP4 therefore functions as a precondition for multiple later properties. Agency cannot be exercised over invisible data flows. Governance cannot constrain systems that communities cannot inspect. Ethical AI cannot be meaningful where the data it sees, stores, or trains on is structurally uncontrolled. DP4 does not resolve all legal, jurisdictional, or sector-specific privacy questions. It defines the minimum conditions under which sovereignty and privacy remain real at the interface where data is created, combined, interpreted, and acted upon. ## 2. Problem Statement In today’s web, privacy is often presented as disclosure without control. Participants are shown banners, terms updates, and granular-looking toggles, yet the underlying system still optimizes for maximal collection, indefinite retention, behavioral inference, and partner expansion. In many cases, the formal interface of consent exists while the operational reality of choice does not. This produces recurring failures: - participants cannot tell what is being collected, for what purpose, for how long, or by whom - data collected for one function expands into advertising, analytics, resale, or model training - sensitive inferences are derived from behavior without clear notice, recourse, or deletion pathways - data exports preserve liability but not usability, making exit technically legal but practically costly - account deletion rarely maps cleanly to embeddings, downstream processors, backups, or partner copies - communities cannot enforce stricter data norms inside their spaces because underlying systems remain vendor-shaped These failures are not edge cases. They are structural consequences of architectures designed to treat data accumulation as default value creation. DP4 addresses this by defining data sovereignty as an operational condition. Privacy becomes meaningful only when participants and communities can see the active terms of data use, limit those terms in practice, revoke permissions without fiction, and move or leave without losing the structure of their digital lives. ## 3. Threats and Failure Modes ### 3.1 Consent theater Interfaces bundle unrelated processing into a single act of acceptance. **Example:** A participant accepts a terms update to continue using a service and, in doing so, silently authorizes secondary uses of behavioral data for recommendation tuning, advertising, and model training. **Why this matters:** The system records consent, but the participant did not experience a meaningful choice. DP4 treats this as a sovereignty failure, not a paperwork issue. ### 3.2 Purpose creep and secondary use Data collected for one function expands into new products, ranking systems, partner programs, or model behaviors without a fresh social contract. **Example:** Location data collected for safety or delivery is later used for engagement scoring, ad targeting, or brokered partner analytics. **Why this matters:** The participant’s mental model of risk becomes false. Trust erodes even where no obvious breach has occurred. ### 3.3 Illusory portability Export exists formally but fails functionally. **Example:** A participant downloads an archive that contains files and timestamps but omits social graph edges, permission history, role context, provenance, or schemas needed to restore meaningful continuity elsewhere. **Why this matters:** Exit is made to look possible while dependency is preserved. DP4 requires portability that preserves usable structure, not only raw payloads. ### 3.4 Inference without accountability Systems derive high-stakes conclusions from behavioral traces without clearly governing how those inferences are created, used, challenged, or removed. **Example:** A wellness application infers stress or depression risk from typing cadence and browsing patterns, then shares a derived score with an advertising or insurance intermediary. **Why this matters:** The participant never explicitly submitted the sensitive category, yet is still acted upon as if they had. ### 3.5 Retention without sunset Data persists because retention is cheap, deletion is operationally inconvenient, and analytics cultures prefer indefinite memory. **Example:** A participant deletes an account, but vector embeddings, partner datasets, abuse-model features, and backup systems continue to retain traces with no coherent deletion pathway. **Why this matters:** Sovereignty requires time bounds. Without them, institutions remember indefinitely while participants bear the burden of asymmetrical memory. ### 3.6 Cross-context correlation Identifiers, device graphs, and fingerprinting techniques merge activity across settings that participants experienced as distinct. **Example:** Pseudonymous participation in a civic forum is quietly linked to shopping behavior, social browsing, or location history through shared infrastructure. **Why this matters:** Plural identity becomes decorative. Communities cannot sustain contextual integrity if correlation silently defeats boundaries. ### 3.7 False anonymity and weak de-identification Organizations describe datasets as anonymized even where re-identification remains plausible or contractually enabled downstream. **Example:** A mobility dataset stripped of names still exposes sparse routines in a small town, allowing individuals to be reconstructed through outside knowledge. **Why this matters:** DP4 requires honesty about residual risk. “De-identified” cannot be treated as a magic word that dissolves responsibility. ### 3.8 Partner sprawl without propagation Deletion, revocation, and correction stop at the first layer of control. **Example:** A participant deletes messages in one tool, but analytics vendors, cloud backups, and SDK partners continue to retain copies without visibility or participant recourse. **Why this matters:** Sovereignty that fails at the first subcontractor boundary is not sovereignty. ### 3.9 Youth and vulnerable-context overexposure Defaults optimized for adult engagement expose minors and vulnerable users to data-intensive patterns they are less equipped to assess or contest. **Example:** A youth-oriented social tool enables location sharing, behavioral profiling, or AI-mediated emotional inference by default. **Why this matters:** DP4 requires higher baselines where stakes are higher. Uniform defaults can produce unequal harm. ## 4. Core Principle **Data must retain meaning, consent, and accountability as it moves across systems. If data loses its binding to purpose, provenance, or permissions under transformation, sovereignty collapses into simulation.** Data sovereignty and privacy in the meta-layer require that personal and community data be collected, inferred, stored, shared, and reused only under visible, bounded, and governable conditions. Data sovereignty and privacy in the meta-layer require that personal and community data be collected, inferred, stored, shared, and reused only under visible, bounded, and governable conditions. Those conditions must include: - clear purpose binding - minimization of collection, access, and retention - meaningful consent and withdrawal - portability with practical utility - deletion or attenuation pathways that propagate as far as technically possible - auditability of significant access, inference, and transfer events - community capacity to impose stricter norms within governed zones In today’s web, these conditions rarely hold together. A system may disclose collection without limiting reuse, provide deletion without propagation, or offer export without restoration value. DP4 treats such partial compliance as insufficient. The meta-layer reframes privacy as operational control at the point of interaction. **Example:** A participant opens a data lens and sees active purposes, relevant processors, current retention clocks, sensitive inferences attached to their account, and downstream systems that have accessed their data. They can revoke training permission, export their activity in an interoperable format, contest a high-risk inference, and receive a propagation receipt for deletion requests. **What this feels like:** Privacy stops being a maze of legal text and becomes a set of understandable levers tied to real system behavior. **Without this:** Privacy becomes trust in opacity, and opacity fails precisely where accountability matters most. ## 5. Primary Mechanisms and Structural Conditions ### 5.1 Purpose binding Every collection and processing pathway must declare its purpose in terms legible to both participants and communities. Material changes in purpose require visible reauthorization, reclassification, or zone-level review. **Example:** A discussion zone permits summarization for moderation assistance but prohibits model training on participant content unless a separate, revocable grant is given. Without purpose binding, consent collapses into blanket exposure. ### 5.2 Data minimization by design Systems must begin from the least collection, retention, and sharing compatible with the function being offered, and expand only through visible, justified choices. **Example:** A messaging tool does not upload contacts by default. Contact sync is presented as a distinct choice with plain-language scope and a reversible off switch. This is not anti-functionality. It is a refusal to make maximal collection the silent baseline. ### 5.3 Consent stack Permission must be layered, granular, and revocable, with separate scopes for distinct categories of data use. This draft uses **consent stack** as a mechanism-level abstraction: a structured set of permissions that distinguish service provision, analytics, sharing, automation, and training from one another. **Example:** A participant permits AI-assisted summarization of their workspace but declines training use and third-party analytics. Revoking training permission does not disable the summarization feature they actually wanted. The consent stack makes partial participation possible without forcing blanket surrender. ### 5.4 Meaningful portability Portability must preserve enough structure to support continuity, not just compliance. This includes, where technically honest and appropriate: - stable identifiers - schemas - role metadata - interaction history - provenance markers - permissions history - relationship edges needed for mainstream migration use cases **Example:** A participant exports a discussion archive that can be imported into another tool with thread structure, moderation history, authorship context, and trust signals intact. Without this, “take your data with you” becomes formal rights without real exit. ### 5.5 Retention clocks and propagation discipline Retention must be bounded by event-driven or purpose-bound clocks, not indefinite convenience. Deletion, correction, and revocation requests must propagate to known downstream systems with auditable outcomes. This draft uses **retention clocks** as a mechanism-level abstraction: visible timers tied to categories of data and stated purposes. **Example:** A participant can see that support logs expire in 30 days, abuse-review evidence in 180 days, and AI training exclusion tags apply immediately going forward. When deletion is requested, the system generates a receipt chain showing which processors complied, which are pending, and which limits remain technically unresolved. DP4 does not require dishonest promises of perfect erasure. It requires propagation discipline and truthful accounting. ### 5.6 Sensitive inference governance Derived data can be more consequential than submitted data. High-risk inferences therefore require stronger conditions than ordinary processing. This includes inferences relating to health, finances, minors, politics, biometric patterns, relational vulnerability, and similar domains of elevated risk. **Example:** A system that predicts self-harm risk from behavioral cues must disclose that such inference exists, limit its downstream use, provide human escalation where appropriate, and prohibit repurposing for advertising or engagement optimization. Inference must be governable as first-class data, not treated as exempt because it was machine-generated. ### 5.7 Zone-scoped privacy profiles Communities must be able to define stricter privacy norms within their zones while remaining interoperable with broader infrastructure. This draft uses **privacy profile** as a mechanism-level abstraction: a machine-readable expression of the data rules that apply inside a zone. A privacy profile may specify, for example: - no third-party behavioral advertising - no model training on participant contributions - local-only processing for sensitive content - elevated rules for youth participation - stronger consent requirements for biometric or emotional inference **Example:** A health-support community publishes a privacy profile that restricts cloud-based inference, blocks third-party SDKs, and requires explicit opt-in before any content can enter training pipelines. Without zone-scoped privacy profiles, communities may have values but not operational control. ### 5.8 Auditability and provenance of use Significant data access, transfer, and inference events must be inspectable in participant-legible and community-legible forms. This does not require exposing every security detail publicly. It requires enough visibility to support contestation, trust, and oversight. **Example:** A participant can see that an automated moderation agent accessed a document under a specific policy version, for a named purpose, with a recorded outcome and timestamp. Privacy claims that cannot be audited remain aspirational. ### 5.9 Training and model-use boundaries Where participant or community content could enter model training, fine-tuning, embedding pipelines, or retrieval systems, those pathways must be separately governed. **Example:** A public discussion zone allows search indexing but defaults to no training use. Participants can grant corpus-level permission for research or model improvement on a renewable basis, and declined content carries an exclusion marker through the training pipeline. This is a direct dependency between DP4 and later AI properties. Ethical AI claims are weak if model access to human data is structurally obscure. ### 5.10 Jurisdictional and transfer honesty Cross-border transfers and legal regime changes must be visible as part of the participant’s risk surface. **Example:** A participant is shown that a given processor operates under a different legal regime, that redress pathways are limited, and that a community zone therefore blocks that transfer category by default. Global systems do not excuse vague disclosure. They heighten the need for explicitness. ### 5.11 Data System Layer: Lineage, Transformation Integrity, and Consent Propagation Beyond individual mechanisms, DP4 requires a coherent data system layer that preserves **lineage, semantics, and permissions** across pipelines, services, and time. This layer ensures that data remains trustworthy under transformation, scale, and adversarial use. #### 5.11.1 Lineage continuity - Data MUST carry provenance linking it to source, purpose, and processing context - Derived artifacts (features, embeddings, summaries) MUST reference upstream lineage Failure mode: **lineage loss**, enabling untraceable reuse and accountability gaps. #### 5.11.2 Transformation integrity - Transformations MUST be attributable to an actor (human/AI) and a declared purpose - Material transformations SHOULD be reversible or auditable where feasible Failure mode: **data laundering**, where meaning or risk is altered without trace. #### 5.11.3 Consent propagation - Permissions MUST travel with data across internal and external systems - Downstream processors MUST honor upstream constraints or declare degradation explicitly Failure mode: **consent bypass chains**, where integrations ignore or reinterpret permissions. #### 5.11.4 Anti-replay and non-duplication - Identity- or consent-bound artifacts MUST NOT be reused to gain additional value without attribution - Systems SHOULD detect duplicate extraction across pipelines Failure mode: **replay extraction**, where the same data yields multiple unaccounted benefits. #### 5.11.5 Inference binding and governance - Inferences MUST be treated as first-class data with lineage, purpose, and revocation pathways - High-risk inferences require stricter constraints and auditability Failure mode: **inference drift**, where derived signals are reused outside their declared context. #### 5.11.6 Cross-system semantics - Systems MUST signal when data meaning or guarantees change across contexts - Mappings between schemas MUST preserve or explicitly degrade semantics Failure mode: **semantic drift**, where data is misinterpreted after transfer. This layer does not require centralization. It requires **coherence under movement**. ## 6. Governance, Accountability, and Agency Surfaces DP4 is not satisfied by backend architecture alone. Participants and communities need interfaces through which data conditions become governable. Participants must be able to: - see active purposes, processors, retention clocks, and major inference categories - revoke permissions without unfairly losing unrelated core functionality where technical separation is possible - export their data in forms that preserve practical continuity - correct or contest significant false inferences - understand what deletion means in each category of storage and reuse Communities must be able to: - publish stricter privacy profiles for their zones - reject tools that cannot meet those profiles - audit aggregate compliance without turning privacy governance into a new surveillance regime - preserve governance memory around why a privacy rule exists and when it changed - distinguish individual confidentiality from community-level observability of system behavior **Example:** A civic deliberation zone prohibits third-party trackers and emotional classification systems. Any overlay or agent entering the zone must declare compatibility with the zone’s privacy profile or operate in a visibly constrained mode. Without these surfaces, privacy remains vendor-defined even when communities appear to have rules. ## 7. Incentives and Power Analysis Commercial systems tend to treat data surplus as strategic advantage. Retention expands because future uses may be profitable. Inference expands because prediction creates leverage. Consent becomes cosmetic where friction threatens growth. SDK ecosystems and downstream processors thrive precisely when participants cannot trace the full chain of use. DP4 does not assume these incentives disappear. It requires that their effects become visible and contestable. **Example:** A platform discloses that feed ranking depends partly on behavioral surplus gathered across sessions. A community operating inside the meta-layer disables that ranking signal within its zone because it conflicts with the zone’s purpose. This matters because many privacy harms are not caused by one malicious actor. They emerge from ordinary growth logic operating without adequate brakes. DP4 therefore treats incentive visibility as part of sovereignty. Participants should be able to know when they are not merely receiving a service, but being rendered into a data asset. ## 8. Community Signals Informing DP4 Across contexts, similar signals recur: - fatigue with unreadable policies and false choice architectures - demand for portability that supports real migration - frustration with invisible partner ecosystems and silent SDK extraction - concern about sensitive inferences that participants never explicitly supplied - desire to ask, in operational terms, “what does the system think it knows about me?” - expectation that youth and vulnerable contexts should receive safer defaults, not merely more warnings These signals are not abstract. They arise when people sense that the surface language of privacy no longer matches the structure underneath. DP4 responds to that gap by making data conditions inspectable, debatable, and governable. ## 9. Non-Goals and Explicit Boundaries DP4 defines a minimum condition. It does not solve every problem associated with data, secrecy, or identity. DP4 does not: - guarantee perfect anonymity in all settings - prohibit all inference or all data sharing categorically - replace law, regulation, or sector-specific obligations in health, finance, education, or public safety - mandate one universal privacy culture across all communities - promise technically impossible forms of deletion while downstream copies or model memorization remain unresolved - remove the need for accountability-linked identity in contexts where stronger attribution is legitimately required These boundaries matter because absolutist privacy claims often collapse under real-world complexity. For example, some communities may require stronger identity assurance to support trust and accountability. DP4 does not forbid that. It requires that data burdens attached to such assurance remain bounded, visible, and contestable. Likewise, some inference may be necessary for accessibility, fraud prevention, or urgent safety intervention. DP4 does not deny that. It requires those pathways to be governed explicitly rather than smuggled in under vague necessity claims. ## 10. Minimum Alignment (Non-Normative) Minimum alignment is not a policy checklist. It is the threshold at which data sovereignty is **enforceable, portable, and resistant to laundering, drift, and silent reuse**. A system that does not meet these conditions may disclose practices, but it does not provide sovereignty. At minimum, a system claiming DP4 alignment MUST satisfy the following **irreducible conditions**: ### 10.1 Purpose binding and enforcement - All collection and processing MUST declare purpose and enforce it in execution - Material purpose changes MUST require reauthorization or zone review Failure mode: **purpose creep**. ### 10.2 Consent propagation - Permissions MUST travel with data across pipelines and partners - Downstream systems MUST honor or explicitly degrade constraints Failure mode: **consent bypass chains**. ### 10.3 Lineage and provenance - Data and derivatives MUST carry reconstructable lineage - Significant transformations MUST be attributable Failure mode: **lineage loss / data laundering**. ### 10.4 Meaningful portability - Exports MUST preserve structure needed for practical migration (schemas, relationships, permissions) - Systems MUST disclose omissions and degradation Failure mode: **illusory portability**. ### 10.5 Retention and propagation discipline - Retention MUST be time- or purpose-bound with visible clocks - Deletion/revocation MUST propagate with auditable receipts Failure mode: **retention without sunset / partner sprawl**. ### 10.6 Inference governance - High-risk inferences MUST be disclosed, bounded, and contestable - Inferences MUST support correction or attenuation where applicable Failure mode: **inference without accountability**. ### 10.7 Auditability of use - Participants MUST be able to inspect significant access, transfer, and inference events - Systems MUST provide logs or summaries sufficient for contestation Failure mode: **opaque processing**. ### 10.8 Interoperability honesty - Systems MUST state what is preserved, degraded, or non-transferable across boundaries Failure mode: **interop deception**. --- These conditions define the **minimum viable data sovereignty layer** of the Meta-Layer. Partial implementations that omit purpose enforcement, consent propagation, lineage, or propagation discipline MUST NOT be considered aligned with DP4. ## 11. Open Questions and Future Work DP4 surfaces unresolved design challenges that require further work: - how to standardize interoperable privacy profiles across tools and zones - how to represent retention, revocation, and transfer states in ways ordinary participants can understand - how to govern model memorization and partial deletion in training-intensive systems - how to balance privacy-enhancing technologies with the need for accountability and community oversight - how to manage joint controllership and downstream processor responsibility in federated ecosystems - how to handle collective consent for shared datasets or community archives - how to distinguish emergency access from routine surveillance creep - how to make privacy interfaces usable for low-literacy, multilingual, and neurodiverse participants - how to audit third-party SDK ecosystems without reproducing surveillance in the name of governance - how to preserve interoperability while allowing communities to adopt materially stricter norms These are not reasons to delay better defaults. They mark the frontier where DP4 must mature through practice, governance, and implementation evidence. ## 12. Relationship to Other Desirable Properties DP4 is foundational and interdependent. - **DP1** anchors accountability for who accessed, inferred, retained, or transferred what - **DP2** ensures participants have meaningful agency over data flows and delegation scopes - **DP5** supports identifiers and namespaces that allow identity and reputation to remain portable without forcing correlation - **DP6** depends on commerce patterns that do not require surveillance as the default price of participation - **DP7–DP10** shape whether sovereignty is actually usable through interface quality, comprehension, and incentives - **DP11** depends on bounded visibility into what AI systems may see, remember, infer, and influence - **DP12** provides the community processes through which stricter privacy norms can be defined and revised - **DP13** enforces containment over automated access, training pathways, and runtime data exposure - **DP14–DP15** reinforce provenance, auditability, and transparency of data use and policy change - later governance and ownership properties depend on collective data practices that do not silently strip participants of control A failure in DP4 propagates upward. If data conditions are opaque, later governance becomes symbolic, ethical AI becomes ungrounded, and participant agency becomes procedural rather than real. ## 13. Path Toward ML-RFC Progression from draft to RFC-grade maturity would require: - stable invariants around purpose binding, minimization, meaningful portability, and propagation honesty - a clearer grammar for privacy profiles, retention clocks, consent stacks, and audit events - implementation evidence from zones with different stakes and governance norms - demonstrable ways for communities to adopt stricter data rules without breaking interoperability - better treatment of partial deletion, inference correction, and training exclusions - alignment with privacy law and privacy-enhancing technologies without waiting for legal perfection before improving defaults The goal is not to freeze one final model of privacy. It is to establish durable conditions under which sovereignty claims can be tested, challenged, and improved. ## 14. Closing Orientation DP4 is where the meta-layer rejects the old bargain of convenience in exchange for invisibility. Sovereignty is not achieved when a participant is merely informed that data extraction may occur. It is achieved when the participant and the communities they inhabit can see the operative terms of data use, shape those terms where appropriate, withdraw from them in meaningful ways, and leave without losing the structure of their digital life. When DP4 is strong, trust in governance, AI, commerce, and collaboration becomes plausible. When DP4 is weak, every higher-order property is forced to fight against a substrate that quietly converts participation into extraction. If you want, I can also turn this into a numbered ML-Draft format with Status, Path Toward ML-RFC metadata, and house-style alignment to match your DP11–13 documents exactly.