Researchers from MIT, the MIT collaborative AI Lab, and Tufts University have created machine-learning algorithms that can design novel proteins with desired structural properties. The researchers implemented an attention-based diffusion model, a type of machine-learning architecture. The algorithms can produce millions of candidate proteins in just a few days, rapidly expanding the portfolio of designs available to study.
The artificial intelligence model can potentially create novel proteins with improved properties. To make materials with desired mechanical qualities, such as stiffness or elasticity, the researchers built the algorithms to generate proteins with tailored structural features. Such biologically inspired materials could replace those derived from petroleum or ceramics while having a significantly smaller environmental impact.
Senior author Markus Buehler, a professor of civil and environmental engineering and of mechanical engineering, says the models’ ability to learn the biochemical relationships that control protein formation will allow them to create new proteins that may be useful in novel contexts.
For example, a protein-based material could be used as a wrapping or coating that keeps fruits and vegetables fresh for longer while remaining safe for human consumption. Because the models take only a few days to produce millions of proteins, they give researchers a wealth of new possibilities to investigate.
“The design space for proteins that have not yet been discovered by nature is so large that it cannot be efficiently organised using pen and paper alone. Instead, you need to decipher the building blocks of life, the amino acids, and the code DNA uses to direct their assembly into proteins. Until we had deep learning, this was impossible,” Buehler explained.
Amino acid chains are folded into three-dimensional structures to make proteins. The protein’s mechanical characteristics are set by the order in which its constituent amino acids are arranged. Even though millions of proteins produced by evolution have been discovered, scientists believe many amino acid sequences have yet to be found.
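To give a sense of how large this design space is, the short Python sketch below counts the possible sequences of a given length. It assumes the 20 standard amino acids and simply counts every possible ordering, most of which would not correspond to a useful or even foldable protein; the function name and the lengths shown are illustrative and do not come from the researchers' work.

```python
AMINO_ACID_COUNT = 20  # the 20 standard amino acids (assumption for this sketch)


def sequence_count(length):
    """Number of distinct amino acid sequences of the given length."""
    return AMINO_ACID_COUNT ** length


for length in (10, 50, 100):
    total = sequence_count(length)
    # The number of decimal digits shows the order of magnitude.
    print(f"length {length}: a {len(str(total))}-digit number of sequences")
```

Even short chains give numbers far beyond what could be checked one by one, which is why a generative model that proposes candidates directly is attractive.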
Two machine-learning models were built to predict new amino acid sequences that fold into proteins meeting structural design targets. The resulting proteins could be used to make materials with the same mechanical qualities as petroleum- or ceramic-based alternatives but with a smaller carbon footprint.
Deep learning algorithms that predict the 3D structure of a protein from its amino acid sequence have sped up the process of finding new proteins. However, the inverse problem has proven even more difficult to solve: predicting an amino acid sequence that achieves a given set of structural design targets. Buehler and his co-workers were able to take on this challenge thanks to a recent development in machine learning: attention-based diffusion models.
Attention-based models are essential for protein design because they can learn relationships between distant positions in a sequence, and a single change in a very long amino acid sequence can completely alter the structure of a protein. A diffusion model learns to generate new data by adding noise to training data and then learning to recover the data by removing that noise. Diffusion models are often better than competing models at producing high-quality, realistic data that can be conditioned to satisfy a set of design targets.
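The noising idea behind a diffusion model can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the researchers' implementation: the one-hot encoding, the linear noise schedule, the example sequence "MKTAYIAKQR", and all function names are hypothetical choices made for the sketch, and the true noise stands in for the prediction a trained network would make.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def one_hot(sequence):
    """Encode an amino acid string as a list of 20-element one-hot rows."""
    return [[1.0 if aa == letter else 0.0 for letter in AMINO_ACIDS]
            for aa in sequence]


def signal_fraction(t, betas):
    """Cumulative product of (1 - beta) up to and including step t."""
    return math.prod(1.0 - beta for beta in betas[: t + 1])


def add_noise(x0, t, betas, rng):
    """Forward diffusion: mix clean data x0 with Gaussian noise at step t."""
    alpha_bar = signal_fraction(t, betas)
    noise = [[rng.gauss(0.0, 1.0) for _ in row] for row in x0]
    xt = [[math.sqrt(alpha_bar) * value + math.sqrt(1.0 - alpha_bar) * eps
           for value, eps in zip(row, noise_row)]
          for row, noise_row in zip(x0, noise)]
    return xt, noise


def remove_noise(xt, predicted_noise, t, betas):
    """Invert the mixing step, given a prediction of the noise that was added."""
    alpha_bar = signal_fraction(t, betas)
    return [[(value - math.sqrt(1.0 - alpha_bar) * eps) / math.sqrt(alpha_bar)
             for value, eps in zip(row, noise_row)]
            for row, noise_row in zip(xt, predicted_noise)]


steps = 1000
# Hypothetical linear noise schedule; later steps add more noise.
betas = [1e-4 + (0.02 - 1e-4) * i / (steps - 1) for i in range(steps)]
rng = random.Random(0)

x0 = one_hot("MKTAYIAKQR")  # arbitrary example sequence
x_noisy, true_noise = add_noise(x0, 500, betas, rng)
# A trained network would predict the noise; here the true noise stands in.
x_recovered = remove_noise(x_noisy, true_noise, 500, betas)
```

In a real model, a neural network (attention-based in the work described here) is trained to predict the added noise from the noisy input and any design targets, so that generation can start from pure noise and denoise step by step.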
To check that the predicted proteins made sense, the researchers fed the models physically impossible design targets. The models impressed them by generating the nearest synthesisable solution rather than producing implausible proteins.
The learning algorithm can pick up hidden relationships in nature, explains Bo Ni, a postdoc in Buehler’s Laboratory for Atomistic and Molecular Mechanics and the paper’s lead author. This gives the team confidence that the model’s outputs are realistic.
As a next step, the researchers intend to make some of the new protein designs in the lab to validate them experimentally. After that, they plan to continue expanding and refining the models so that they can generate amino acid sequences that fulfil more criteria, such as biological functions.