Huge libraries of drug compounds may hold potential treatments for diseases such as cancer or heart disease. Ideally, scientists would like to test each of these compounds against all possible targets, but conducting such comprehensive screenings is prohibitively time-consuming.
To speed up drug discovery, researchers have begun using computational methods to screen those libraries. However, many of those methods are also slow: most of them calculate each target protein’s three-dimensional structure from its amino-acid sequence and then use those structures to predict which drug molecules the protein will interact with.
MIT and Tufts University researchers have now developed an alternative computational approach based on an artificial intelligence (AI) algorithm known as a large language model. These models, the kind that power generative AI tools, can analyse huge amounts of text and figure out which words are most likely to appear together.
Using this approach, the researchers can evaluate more than 100 million compounds in a single day, far more than any existing model.
Bonnie Berger, the Simons Professor of Mathematics, leader of the Computation and Biology group at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), and one of the senior authors of the study, emphasises that the work addresses the need for efficient and accurate in silico screening of potential drug candidates. Moreover, the model’s scalability allows for extensive screenings that can assess off-target effects, explore opportunities for repurposing drugs, and determine the impact of mutations on drug binding.
One obstacle for such models is their limited ability to eliminate compounds known as decoys: molecules that resemble successful drugs but do not actually interact well with the target protein.
According to Singh, one of the researchers involved in the study, a longstanding challenge in the field is the fragility of these models. Even if a molecule differs from a true binding compound in only subtle ways, the model may still incorrectly predict that it interacts with the target.
While some models have been developed to address this fragility, they are typically specialised for specific classes of drug molecules, and their extensive computational requirements make them unsuitable for large-scale screenings.
The researchers used a protein language model to determine the interactions between specific drug molecules and protein sequences. The proteins and drug molecules were represented numerically and projected into a shared space by a neural network. The network was trained on known protein-drug interactions, enabling it to learn which protein features are associated with drug binding, without the need to calculate the 3D structure of the molecules.
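The idea of projecting both proteins and drugs into a shared numerical space can be sketched roughly as follows. This is a minimal illustration, not the study’s actual architecture: the dimensions, the random (untrained) projection weights, and the function names are all hypothetical, and cosine similarity plus a sigmoid stands in for whatever scoring head the real model uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: a 1024-d protein-language-model embedding and a
# 2048-bit drug fingerprint, both projected into a 128-d shared space.
PROT_DIM, DRUG_DIM, SHARED_DIM = 1024, 2048, 128

# Random weights stand in for the projection networks, which in practice
# would be trained on known protein-drug interaction pairs.
W_prot = rng.normal(0, 0.02, (SHARED_DIM, PROT_DIM))
W_drug = rng.normal(0, 0.02, (SHARED_DIM, DRUG_DIM))

def project(x, W):
    """Linear map into the shared space, followed by a ReLU."""
    return np.maximum(W @ x, 0.0)

def predict_binding(prot_embedding, drug_fingerprint):
    """Score a protein-drug pair by cosine similarity in the shared space,
    squashed to a 0-1 'binding score' with a sigmoid."""
    p = project(prot_embedding, W_prot)
    d = project(drug_fingerprint, W_drug)
    cos = p @ d / (np.linalg.norm(p) * np.linalg.norm(d) + 1e-8)
    return 1.0 / (1.0 + np.exp(-cos))

score = predict_binding(rng.normal(size=PROT_DIM),
                        rng.integers(0, 2, DRUG_DIM).astype(float))
print(round(score, 3))  # a value between 0 and 1
```

Because scoring a pair is just two small projections and a dot product, rather than a 3D structure calculation, this style of model can be applied to very large compound libraries cheaply.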
By employing this high-quality numerical representation, the model bypasses the need for an atomic-level structural representation, instead predicting a drug’s binding potential from the numerical values alone. Singh highlights that this approach avoids computing atomic structure while retaining all the necessary information within the numerical representation.
The researchers integrated a training stage using contrastive learning to enhance the model’s resilience against decoy drug molecules. This approach taught the model to differentiate between genuine drugs and imposters.
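One common way to implement such a contrastive objective is a triplet-style loss, which pushes the genuine drug closer to the target protein in the shared space than the decoy, by at least some margin. The sketch below is a generic illustration of that idea, not the study’s exact loss; the toy vectors and the margin value are invented for demonstration.

```python
import numpy as np

def triplet_loss(protein, true_drug, decoy, margin=1.0):
    """Contrastive (triplet) objective: the target protein should sit closer
    to the genuine drug than to the decoy, by at least `margin`."""
    d_true = np.linalg.norm(protein - true_drug)
    d_decoy = np.linalg.norm(protein - decoy)
    return max(d_true - d_decoy + margin, 0.0)

# Toy shared-space vectors (hypothetical): the true drug is near the protein,
# the decoy is far away.
protein   = np.array([1.0, 0.0])
true_drug = np.array([0.9, 0.1])
decoy     = np.array([-1.0, 0.5])

print(triplet_loss(protein, true_drug, decoy))  # 0.0 (already well separated)
print(triplet_loss(protein, decoy, true_drug))  # positive (decoy too close, penalised)
```

Minimising a loss of this form during training teaches the embedding space itself to separate genuine binders from imposters, which is what gives the model its robustness to decoys.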
The researchers are now extending the approach to other drug types, such as therapeutic antibodies. Additionally, this modelling technique holds promise for toxicity screening of potential drug compounds to identify and mitigate any undesired side effects before animal testing.
“Reducing the high failure rates in drug discovery can significantly decrease its cost,” says Singh. This innovative approach represents a notable breakthrough in predicting drug-target interactions and opens up further possibilities for future research to enhance its capabilities.