
We can’t fully trust large language model (LLM) outputs, in part because LLMs don’t always produce reliable confidence estimates. One could inspect the model’s token likelihoods, but even those are inaccessible for many black-box models. We show here that it is possible to train a lightweight external model to infer an LLM’s internal confidence from only the prompt and the LLM’s answers (a purely black-box setting).
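To make the setup concrete, here is a minimal sketch of one way such an external confidence model could look. Everything below is illustrative and not the method from this work: we assume access only to (prompt, answer) text pairs with correctness labels, extract a couple of hand-crafted surface features (answer length, hedging-word rate), and fit a tiny logistic regression whose output serves as a confidence score.

```python
import math

# Illustrative hedging vocabulary; a real system would learn features rather
# than hard-code them.
HEDGE_WORDS = {"maybe", "possibly", "might", "unsure", "probably"}

def features(prompt: str, answer: str):
    """Black-box features computed from text alone (no model internals)."""
    words = answer.lower().split()
    hedges = sum(w.strip(".,") in HEDGE_WORDS for w in words)
    return [
        1.0,                            # bias term
        len(words) / 50.0,              # normalized answer length
        hedges / max(len(words), 1),    # fraction of hedging words
    ]

def train(pairs, labels, lr=0.5, epochs=200):
    """Fit logistic regression by simple gradient ascent on the log-likelihood."""
    w = [0.0] * 3
    for _ in range(epochs):
        for (prompt, answer), y in zip(pairs, labels):
            x = features(prompt, answer)
            z = sum(wi * xi for wi, xi in zip(w, x))
            pred = 1.0 / (1.0 + math.exp(-z))
            for i in range(len(w)):
                w[i] += lr * (y - pred) * x[i]
    return w

def confidence(w, prompt: str, answer: str) -> float:
    """Predicted probability that the LLM's answer is correct."""
    x = features(prompt, answer)
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))
```

In this toy version, answers full of hedging words receive lower confidence than direct ones, which mimics the intuition that an answer's surface form carries a signal about the model's internal uncertainty.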