
We introduce SelfReflect, an information-theoretic metric that measures how faithfully a summary string reflects an LLM's internal answer distribution. In interventional and human studies, SelfReflect detects even small deviations between a summary and that distribution, and it reveals that modern LLMs generally fail to communicate their uncertainties. Faithful summaries do emerge, however, when multiple answers are sampled and then summarized in-context.
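
The final sentence names a procedure: sample several answers, then summarize them in-context. A minimal Python sketch of that idea follows, assuming a generic `generate(prompt, temperature=...)` completion callable; the helper name, signature, and prompt wording are illustrative placeholders, not the paper's actual implementation, and SelfReflect itself is not computed here.

```python
from typing import Callable


def summarize_with_uncertainty(
    question: str,
    generate: Callable[..., str],  # assumed LLM call: generate(prompt, temperature=...) -> text
    n_samples: int = 10,
) -> str:
    """Sample several answers, then ask the model to summarize them in-context."""
    # 1. Sampling at nonzero temperature exposes the model's answer distribution.
    answers = [generate(question, temperature=1.0) for _ in range(n_samples)]

    # 2. Ask for a single string that faithfully conveys the sampled answers,
    #    including any disagreement among them. (Prompt wording is a placeholder.)
    listing = "\n".join(f"- {a}" for a in answers)
    prompt = (
        f"Question: {question}\n"
        f"Independently sampled answers:\n{listing}\n"
        "Write one answer that faithfully conveys these answers, "
        "including any disagreement between them."
    )
    return generate(prompt, temperature=0.0)
```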