
Applied LLMs

KTO: Unpaired Preference Learning

KTO aligns language models using only binary good/bad labels per response, avoiding the paired (chosen, rejected) format that makes preference data expensive and brittle to collect.


Collecting preference data is a bottleneck in practice. DPO and the reward-modelling stage of standard RLHF both require pairs, meaning a chosen response and a rejected response for the same prompt. That requirement is harder to satisfy than it sounds. Annotators must evaluate two responses side by side, and even small differences in timing or framing between the two can introduce noise. Many real deployments already have logs of model outputs with thumbs-up/thumbs-down ratings, but those ratings were never collected as paired comparisons. They are inherently unpaired.
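To make the difference between the two data shapes concrete, here is a minimal sketch. The record types (`PairedExample`, `RatedOutput`) and helpers (`pairs_to_ratings`, `count_pairable_prompts`) are hypothetical names chosen for illustration, not part of any library. Flattening a pair into two independent ratings is always possible, whereas building a pair from logs requires a prompt that happens to have both a thumbs-up and a thumbs-down response.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PairedExample:
    """One comparison: two responses to the same prompt, judged against each other."""
    prompt: str
    chosen: str
    rejected: str


@dataclass
class RatedOutput:
    """One logged model output with an independent thumbs-up/down rating."""
    prompt: str
    response: str
    thumbs_up: bool


def pairs_to_ratings(pairs: List[PairedExample]) -> List[RatedOutput]:
    """Flatten each pair into two independent ratings.

    This discards the relative comparison but always succeeds.
    """
    ratings: List[RatedOutput] = []
    for p in pairs:
        ratings.append(RatedOutput(p.prompt, p.chosen, thumbs_up=True))
        ratings.append(RatedOutput(p.prompt, p.rejected, thumbs_up=False))
    return ratings


def count_pairable_prompts(logs: List[RatedOutput]) -> int:
    """Count prompts with at least one thumbs-up AND one thumbs-down response.

    Only these prompts could be turned into (chosen, rejected) pairs.
    """
    seen: Dict[str, Tuple[bool, bool]] = {}
    for r in logs:
        up, down = seen.get(r.prompt, (False, False))
        seen[r.prompt] = (up or r.thumbs_up, down or not r.thumbs_up)
    return sum(1 for up, down in seen.values() if up and down)


# Hypothetical feedback log: only one prompt has both labels.
logs = [
    RatedOutput("Summarise this email", "Here is a summary.", True),
    RatedOutput("Translate 'hello' to French", "Bonjour", True),
    RatedOutput("Translate 'hello' to French", "Hola", False),
    RatedOutput("Write a haiku", "Roses are red.", False),
]
print(count_pairable_prompts(logs))  # 1
```

In this toy log, three of the four ratings would have to be thrown away to build pairs, while an unpaired method can use all four.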

Kahneman-Tversky Optimisation (KTO), introduced by Ethayarajh et al. at ICML 2024, is built for exactly this regime. It trains directly on binary desirability signals: each example is simply (prompt, response, label), where the label is either desirable or undesirable. The authors report that it matches or exceeds DPO's performance on models from 1B to 30B parameters.
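The objective itself is not spelled out here, so the following is a simplified sketch, assuming the value-function form published in the KTO paper: an implied reward r = log π_θ(y|x) − log π_ref(y|x), a reference point z_0 estimated from a KL term, a scale β, and per-label weights λ_D and λ_U, with loss λ_y − v(x, y). The function names and the default `beta=0.1` are illustrative choices. It uses plain floats to show the arithmetic only; a real trainer would compute these quantities as tensors and backpropagate through `policy_logp`.

```python
import math
from typing import List


def stable_sigmoid(x: float) -> float:
    """Logistic function written to avoid overflow in math.exp for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def estimate_z0(mismatched_log_ratios: List[float]) -> float:
    """Reference point: a batch estimate of KL(policy || reference).

    Each entry is log pi_theta(y'|x) - log pi_ref(y'|x) for a response y'
    that was NOT written for prompt x (e.g. responses shifted within a batch).
    Clamped at 0 because a KL divergence cannot be negative. A trainer would
    treat this value as a constant, with no gradient flowing through it.
    """
    mean = sum(mismatched_log_ratios) / len(mismatched_log_ratios)
    return max(0.0, mean)


def kto_example_loss(
    policy_logp: float,     # log pi_theta(y|x), summed over response tokens
    ref_logp: float,        # log pi_ref(y|x) under the frozen reference model
    desirable: bool,        # the binary label: thumbs-up (True) or thumbs-down (False)
    z0: float,              # reference point, e.g. from estimate_z0
    beta: float = 0.1,      # scale on the reward difference (illustrative default)
    lambda_d: float = 1.0,  # weight on desirable examples
    lambda_u: float = 1.0,  # weight on undesirable examples
) -> float:
    """Per-example KTO-style loss, lambda_y - v(x, y)."""
    # Implied reward: how much more likely the policy makes y than the reference does.
    r = policy_logp - ref_logp
    if desirable:
        # Desirable: the loss falls as r rises above the reference point z0.
        return lambda_d - lambda_d * stable_sigmoid(beta * (r - z0))
    # Undesirable: the loss falls as r drops below the reference point z0.
    return lambda_u - lambda_u * stable_sigmoid(beta * (z0 - r))


# Same response and log-probs, opposite labels.
z0 = estimate_z0([-0.4, 0.2, 0.6])
good = kto_example_loss(-20.0, -23.0, desirable=True, z0=z0)
bad = kto_example_loss(-20.0, -23.0, desirable=False, z0=z0)
print(good < 0.5 < bad)  # True
```

The two branches are what let this style of loss consume unpaired data: each example needs only its own label and its own log-probabilities, never a partner response to the same prompt.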

The prospect theory framing
