Scaling Up Membership Inference: When and How Attacks Succeed on Large Language Models

Abstract

Membership Inference Attacks (MIAs) attempt to determine whether specific data, such as copyrighted material, was used to train an LLM by analysing the model's characteristic reaction to it. Research so far has mostly reported negative results, finding barely any statistically significant signal. In this paper, we show that meaningful signals emerge only at scale: not at the level of sentences or paragraphs, but at the level of documents and beyond.
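To illustrate why aggregation scale matters, here is a toy simulation (not the paper's actual attack or data): per-sentence membership scores carry only a weak signal, but averaging them over a whole document makes members and non-members far easier to separate. The effect size, score distribution, and mean-aggregation scheme are illustrative assumptions.

```python
import random
import statistics

random.seed(0)

def auc(pos, neg):
    """Rank-based AUC: probability that a random positive outscores a random negative."""
    ranked = sorted([(v, 1) for v in pos] + [(v, 0) for v in neg])
    rank_sum = sum(i + 1 for i, (_, label) in enumerate(ranked) if label == 1)
    n_p, n_n = len(pos), len(neg)
    return (rank_sum - n_p * (n_p + 1) / 2) / (n_p * n_n)

N_DOCS, SENTS = 200, 100
EFFECT = 0.1  # assumed tiny per-sentence shift in score for training members

# Simulated per-sentence MIA scores (e.g. negative loss under the model):
# member sentences score slightly higher on average, buried in noise.
member_docs = [[random.gauss(EFFECT, 1.0) for _ in range(SENTS)]
               for _ in range(N_DOCS)]
nonmember_docs = [[random.gauss(0.0, 1.0) for _ in range(SENTS)]
                  for _ in range(N_DOCS)]

# Sentence-level attack: score each sentence independently.
sent_auc = auc([s for d in member_docs for s in d],
               [s for d in nonmember_docs for s in d])

# Document-level attack: average scores across each document first.
doc_auc = auc([statistics.fmean(d) for d in member_docs],
              [statistics.fmean(d) for d in nonmember_docs])

print(f"sentence-level AUC: {sent_auc:.3f}")  # close to the 0.5 chance level
print(f"document-level AUC: {doc_auc:.3f}")   # clearly above chance
```

Averaging over many sentences shrinks the noise while the per-sentence bias accumulates, which is one intuition for why document-scale signals can be detectable when sentence-scale ones are not.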

Publication
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2025)