It Takes Two: Complementary Self-Distillation for Contextual Integrity in LLMs

Abstract

Contextual Integrity (CI) frames privacy as the appropriate flow of information under the norms of a given context. LLMs deployed as personal agents remain unreliable at deciding what to disclose under CI, and existing mitigations degrade task utility. SELFCI is a self-distillation framework that decouples information suppression from task resolution. It jointly optimises two reverse KL divergences, each against a distinct teacher distribution: one preserves task-relevant information, the other enforces minimal disclosure. Together they induce a Product-of-Experts target that aligns the policy with the intersection of capability and privacy. SELFCI outperforms competitive baselines such as GRPO without costly external supervision, and the gains extend to agentic workflows with accumulated private context.
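The link between the two reverse KL terms and a Product-of-Experts target can be checked on a toy example. For equal weights, KL(q || p1) + KL(q || p2) = 2 KL(q || p*) - 2 log Z, where p* is the normalised geometric mean of p1 and p2 and Z is its normaliser, so the sum is minimised at q = p*. The sketch below is an illustration only, not SELFCI's implementation: the equal weighting, the three-outcome distributions, and all function and variable names are assumptions made here.

```python
import math

def reverse_kl(q, p):
    """KL(q || p) for discrete distributions given as lists of probabilities."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def product_of_experts(p1, p2):
    """Equal-weight Product of Experts: normalised geometric mean of p1 and p2."""
    unnorm = [math.sqrt(a * b) for a, b in zip(p1, p2)]
    z = sum(unnorm)
    return [u / z for u in unnorm], z

def joint_objective(q, p1, p2):
    """Sum of the two reverse KL terms (equal weights are an assumption)."""
    return reverse_kl(q, p1) + reverse_kl(q, p2)

# Hypothetical teachers over three candidate responses:
# index 0: answers the task but leaks private detail
# index 1: answers the task without the leak
# index 2: refuses or withholds too much
task_teacher = [0.50, 0.45, 0.05]
privacy_teacher = [0.05, 0.50, 0.45]

poe, z = product_of_experts(task_teacher, privacy_teacher)
best = joint_objective(poe, task_teacher, privacy_teacher)

print("PoE target:", [round(x, 4) for x in poe])
print("objective at PoE target:", round(best, 4))
print("objective at task teacher:",
      round(joint_objective(task_teacher, task_teacher, privacy_teacher), 4))
```

In this toy setting the target concentrates on the response both teachers accept, which is the "intersection" behaviour the abstract describes. With unequal weights the same argument would give a weighted geometric mean instead.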

Publication
arXiv 2026