Google LLC

07/16/2024 | Press release | Distributed by Public on 07/16/2024 10:23

How we built AlphaFold 3 to predict the structure and interaction of all of life’s molecules

That meant building a database with all of these capabilities would have been impossible. Instead, we've released AlphaFold Server, a free tool that lets scientists input their own sequences, for which AlphaFold then generates molecular complexes. Since its launch in May, researchers have used it to generate more than 1 million structures.

"It's like Google Maps for molecular complexes," says Lindsay Willmore, research engineer at Google DeepMind. "Any user who doesn't know how to code at all can just copy and paste the sequences of their proteins, DNA, RNA or the name of their small molecule, press a button and wait a few minutes. Their structure and the confidence metrics will come out so that they're able to look at and evaluate their prediction."

To get AlphaFold 3 to work with this much wider range of biomolecules, the team vastly expanded the newer model's training data to include DNA, RNA, small molecules and more. "We were able to say, 'Let's just train on everything that exists in this dataset that helped us so much with proteins and let's see how far we can get,'" Lindsay says. "And it turns out we can get pretty far."

Another major change in AlphaFold 3 is the architecture of the final part of the model, the part that generates the structure. Where AlphaFold 2 used a complex, custom geometry-based module, AlphaFold 3 uses a diffusion-based generative model, similar to the approach behind our cutting-edge image generation models such as Imagen. This change greatly simplified how the model handles all the new molecule types.
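To make the diffusion idea concrete, here is a minimal, generic sketch; it is not AlphaFold 3's actual diffusion module, whose details are not described here. Atom coordinates start as random noise at a high noise level, and at each step a denoiser estimates the clean coordinates; the sampler then moves the current coordinates toward that estimate while lowering the noise level (an Euler step on a variance-exploding probability-flow ODE). The function `sample_structure`, the noise schedule, and the "oracle" denoiser in the usage example are hypothetical stand-ins; in a real model the denoiser would be a trained network conditioned on the input molecules.

```python
import random
from typing import Callable, List, Tuple

Coord = Tuple[float, float, float]
# Hypothetical denoiser signature: (noisy coordinates, noise level) -> estimate of clean coordinates
Denoiser = Callable[[List[Coord], float], List[Coord]]


def sample_structure(denoise: Denoiser, n_atoms: int, sigmas: List[float], seed: int = 0) -> List[Coord]:
    """Generic Euler sampler for a variance-exploding diffusion ODE over 3D atom coordinates.

    Starts from Gaussian noise with standard deviation sigmas[0] and integrates
    dx/dsigma = (x - D(x, sigma)) / sigma down to sigmas[-1] (usually 0).
    Only the final entry of sigmas may be 0, since we divide by each earlier sigma.
    """
    rng = random.Random(seed)
    # Initial coordinates: pure noise, no structure yet.
    x = [tuple(rng.gauss(0.0, sigmas[0]) for _ in range(3)) for _ in range(n_atoms)]
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        x0_hat = denoise(x, sigma)  # denoiser's current guess at the clean structure
        # Move each coordinate along the ODE direction as the noise level drops.
        x = [
            tuple(xi + (sigma_next - sigma) * (xi - di) / sigma for xi, di in zip(atom, est))
            for atom, est in zip(x, x0_hat)
        ]
    return x


if __name__ == "__main__":
    # Toy usage: an "oracle" denoiser that already knows the answer, standing in for a trained network.
    target = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.5, 0.0)]
    schedule = [80.0 * 0.5 ** k for k in range(12)] + [0.0]  # hypothetical noise schedule
    print(sample_structure(lambda coords, sigma: target, len(target), schedule, seed=1))
```

Because the oracle always returns the true coordinates, the loop lands on them at noise level 0; with a learned denoiser, the quality of the final structure depends entirely on how good the network's estimates are at each noise level.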

That shift introduced a new issue, though: because so-called "disordered regions" of proteins weren't included in the training data, the diffusion model would generate an inaccurate "ordered" structure with a defined spiral shape instead of predicting a disordered region.

So the team turned to AlphaFold 2, which is already extremely good at predicting which regions are disordered (they look like a pile of chaotic spaghetti) and which are not. "We were able to use those predicted structures from AlphaFold 2 as distillation training for AlphaFold 3, so that AlphaFold 3 could learn to predict disorder," Lindsay says.
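Distillation here means using one model's predictions (the "teacher," AlphaFold 2) as training targets for another (the "student," AlphaFold 3). The sketch below is a hypothetical illustration of that data-mixing idea, not the team's actual pipeline: teacher predictions, including their disordered "spaghetti" regions, are wrapped as training examples and drawn alongside experimental structures. The class and function names, the source labels, and the `distill_fraction` mixing parameter are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


@dataclass
class TrainingExample:
    name: str
    target_coords: List[Coord]  # coordinates the student model is trained to reproduce
    source: str                 # hypothetical label: "experimental" or "af2_distillation"


def distillation_example(name: str, af2_coords: List[Coord]) -> TrainingExample:
    # The teacher's full prediction becomes the target, including disordered
    # regions, which were missing from the experimental training data.
    return TrainingExample(name, list(af2_coords), "af2_distillation")


def mixed_batch(
    experimental: Sequence[TrainingExample],
    distilled: Sequence[TrainingExample],
    batch_size: int,
    distill_fraction: float,
    seed: int = 0,
) -> List[TrainingExample]:
    """Draw a batch in which each example comes from the distillation pool
    with probability distill_fraction (a hypothetical mixing weight)."""
    rng = random.Random(seed)
    batch = []
    for _ in range(batch_size):
        pool = distilled if rng.random() < distill_fraction else experimental
        batch.append(rng.choice(pool))
    return batch


if __name__ == "__main__":
    exp = [TrainingExample("pdb_entry", [(0.0, 0.0, 0.0)], "experimental")]
    dist = [distillation_example("af2_prediction", [(1.0, 0.0, 0.0)])]
    print([ex.source for ex in mixed_batch(exp, dist, batch_size=6, distill_fraction=0.5)])
```

The design choice this illustrates is that the student never needs a special "disorder" rule: it simply sees examples where the target itself is unstructured, so it can learn to produce that shape instead of forcing every region into a spiral.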

"We have a saying: 'Trust the fusilli, reject the spaghetti,'" adds Jonas.