AlphaFold2 and the Protein-Folding Problem

In his 1972 Nobel lecture, Christian Anfinsen argued that a protein's amino-acid sequence should, in principle, fully determine its folded three-dimensional structure. That claim launched a fifty-year search for a computational shortcut: read the sequence, predict the shape, skip the months of X-ray crystallography or cryo-EM. The search stalled for decades. Then, at the fourteenth Critical Assessment of Structure Prediction (CASP14) in late 2020, DeepMind's AlphaFold2 achieved a median Global Distance Test score (GDT_TS) of 92.4 across all targets, a level of accuracy that had never before been reached routinely in the competition. On this scale, 100 is a perfect match to the experimental structure, and a score in the low 90s means the prediction is often within the error of the experiment itself, on the order of an atom's width. Assessors described the problem, for a large class of proteins, as substantially solved.
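To make the score quoted above concrete, here is a minimal sketch of how a GDT-style number can be computed. GDT_TS is the average, over distance cutoffs of 1, 2, 4, and 8 angstroms, of the percentage of C-alpha atoms in the prediction that lie within that cutoff of their experimental positions. The sketch assumes the two structures have already been superposed; the real CASP evaluation also searches over many superpositions to maximize each fraction, which is omitted here. The function name `gdt_ts` and the toy coordinates are illustrative, not taken from any CASP tool.

```python
import math

def gdt_ts(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS for two already-superposed lists of
    C-alpha coordinates, given as (x, y, z) tuples in angstroms.

    Assumption: no superposition search is performed, so this gives
    a lower bound on the score a full GDT evaluation would report.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Per-residue distance between predicted and experimental positions.
    distances = [math.dist(p, q) for p, q in zip(predicted, experimental)]
    # Fraction of residues falling within each distance cutoff.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    # Average the fractions and express the result on a 0-100 scale.
    return 100.0 * sum(fractions) / len(cutoffs)

# Toy four-residue chain: the prediction is off by 0.5, 1.5, 3.0 and 9.0
# angstroms at successive residues.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 9.0, 0.0)]

print(gdt_ts(predicted, experimental))  # 56.25
```

In the toy case, one residue out of four is within 1 angstrom, two within 2, and three within both 4 and 8, so the average fraction is 0.5625. A score above 90 therefore requires most residues to land within 1 to 2 angstroms of the experimental structure.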

It is worth being precise about what was solved. AlphaFold2 does not simulate folding. It does not integrate physical forces over time to watch a chain collapse into its native state. It is a supervised model that maps sequence-plus-evolutionary-context directly to a static set of atomic coordinates. Understanding how it does that, and where the "static" qualifier bites, is the whole lesson.

Why sequence-to-structure was hard
