Calibrating Large Language Models Using Their Generations Only

Abstract

Large language model (LLM) outputs cannot always be trusted, in part because LLMs do not reliably report how confident they are in their answers. One could inspect the model's token likelihoods, but even these are unavailable for many black-box models. We show that it is possible to train a lightweight external model to infer an LLM's internal confidence using only the prompt and the answers generated by the LLM, i.e., with purely black-box access.
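The idea of a lightweight external confidence model can be illustrated with a minimal sketch: fit a simple classifier over features of the (prompt, answer) text to predict whether the answer was correct, and read its predicted probability as a confidence estimate. This is not the paper's exact method; the featurization, model, and toy calibration data below are illustrative assumptions.

```python
# Illustrative sketch (not the paper's exact method): a tiny logistic
# regression over bag-of-words features of (prompt, answer) strings,
# trained to predict answer correctness. Its predicted probability acts
# as a black-box confidence estimate, with no access to model logits.
from collections import Counter
import math

def features(text):
    # Bag-of-words token counts (a deliberately simple featurization).
    return Counter(text.lower().split())

def train(texts, labels, epochs=200, lr=0.5):
    """Fit per-token weights with stochastic gradient descent on log-loss."""
    w, b = {}, 0.0
    for _ in range(epochs):
        for text, y in zip(texts, labels):
            f = features(text)
            z = b + sum(w.get(tok, 0.0) * c for tok, c in f.items())
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of log-loss w.r.t. the logit z
            b -= lr * g
            for tok, c in f.items():
                w[tok] = w.get(tok, 0.0) - lr * g * c
    return w, b

def confidence(w, b, text):
    # Map a (prompt, answer) string to an estimate in (0, 1).
    f = features(text)
    z = b + sum(w.get(tok, 0.0) * c for tok, c in f.items())
    return 1.0 / (1.0 + math.exp(-z))

# Toy calibration set: (prompt + answer, answer-was-correct) pairs.
# In practice, correctness labels would come from a held-out set where
# the LLM's answers can be checked against references.
texts = [
    "Q: capital of France? A: Paris",
    "Q: capital of France? A: Lyon",
    "Q: 2 plus 2? A: 4",
    "Q: 2 plus 2? A: 5",
]
labels = [1, 0, 1, 0]

w, b = train(texts, labels)
print(confidence(w, b, "Q: capital of France? A: Paris"))
print(confidence(w, b, "Q: capital of France? A: Lyon"))
```

On this toy data the correct answer receives a high confidence score and the incorrect one a low score; the point is only that an external model trained on textual input alone can produce calibrated-looking confidence values.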

Publication
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024)