A research team from the LKS Faculty of Medicine at The University of Hong Kong (HKUMed) has developed more efficient CRISPR-Cas9 variants that could be useful for gene therapy applications. By establishing a new pipeline that applies machine learning to high-throughput screening data to accurately predict the activity of protein variants, the team has expanded the capacity to analyse up to 20 times more variants at once without needing to acquire additional experimental data, which greatly accelerates protein engineering.
The pipeline has been successfully applied in several Cas9 optimisations and was used to engineer new Staphylococcus aureus Cas9 (SaCas9) variants with enhanced gene-editing efficiency. The findings have been published in Nature Communications, and a patent application has been filed based on this work.
Staphylococcus aureus Cas9 (SaCas9) is an ideal candidate for in vivo gene therapy owing to its small size, which allows it to be packaged into adeno-associated viral vectors and delivered into human cells for therapeutic applications. However, its gene-editing activity may be insufficient at some specific disease loci.
Before SaCas9 can be used as a reliable tool for treating human diseases in precision medicine, it needs further optimisation. In particular, its efficiency and precision must be improved by altering the Cas9 protein.
The standard approach to modifying the protein is saturation mutagenesis. However, the number of possible modifications that could be introduced into the protein exceeds the experimental screening capacity of even state-of-the-art high-throughput platforms by orders of magnitude.
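To give a sense of that scale, the short Python sketch below counts the variants in a library where every chosen site may carry any of the 20 standard amino acids. The site counts are purely illustrative assumptions and are not the positions mutated in the study.

```python
# Illustrative arithmetic only: the site counts below are hypothetical,
# not the positions mutated in the study.
AMINO_ACIDS = 20  # number of standard proteinogenic amino acids


def library_size(num_sites: int, residues_per_site: int = AMINO_ACIDS) -> int:
    """Number of distinct variants when every site is varied independently."""
    return residues_per_site ** num_sites


for sites in (1, 2, 4, 8):
    print(f"{sites} sites -> {library_size(sites):,} variants")
```

Under these assumptions, four fully randomised sites give 160,000 combinations and eight sites give 25.6 billion, which is why exhaustive experimental screening quickly becomes impractical.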
In their work, the team explored whether combining machine learning with structure-guided mutagenesis library screening could enable the virtual screening of many more modifications, so as to accurately identify the rare, better-performing variants for further in-depth validation.
The machine learning framework was tested on several previously published mutagenesis screens of Cas9 variants, and the team showed that it could robustly identify the best-performing variants using only 5-20% of the experimentally determined data.
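The idea of training on a small measured subset and then ranking the whole library can be illustrated with a toy Python sketch. Everything here is an assumption made for illustration: the library (3 sites with 8 residue choices each), the simulated activity values, and the use of ridge regression on one-hot encoded variants. It is not the model or the data used by the team.

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical combinatorial library: 3 sites x 8 residue choices = 512 variants.
N_SITES, N_RESIDUES = 3, 8
variants = np.array(list(itertools.product(range(N_RESIDUES), repeat=N_SITES)))


def one_hot(variants: np.ndarray) -> np.ndarray:
    """Encode each variant as concatenated per-site one-hot vectors."""
    X = np.zeros((len(variants), N_SITES * N_RESIDUES))
    rows = np.arange(len(variants))
    for site in range(N_SITES):
        X[rows, site * N_RESIDUES + variants[:, site]] = 1.0
    return X


X = one_hot(variants)

# Simulated "true" activity (an assumption): additive residue effects plus noise.
effects = rng.normal(0.0, 1.0, size=X.shape[1])
activity = X @ effects + rng.normal(0.0, 0.1, size=len(X))

# Pretend that only 10% of the library was measured experimentally.
train_idx = rng.choice(len(X), size=len(X) // 10, replace=False)
X_train, y_train = X[train_idx], activity[train_idx]

# Ridge regression in closed form: w = (X^T X + lam * I)^-1 X^T (y - mean(y)).
lam = 1.0
y_mean = y_train.mean()
w = np.linalg.solve(
    X_train.T @ X_train + lam * np.eye(X.shape[1]),
    X_train.T @ (y_train - y_mean),
)

# Predict activity for every variant, measured or not, and rank them.
predicted = X @ w + y_mean

top_k = 20
top_true = set(np.argsort(activity)[-top_k:].tolist())
top_pred = set(np.argsort(predicted)[-top_k:].tolist())
recovered = len(top_true & top_pred)
corr = float(np.corrcoef(predicted, activity)[0, 1])

print(f"measured {len(train_idx)} of {len(X)} variants")
print(f"correlation between predicted and simulated activity: {corr:.2f}")
print(f"top-{top_k} variants recovered by the model: {recovered}")
```

In this toy setting the simulated activity is purely additive, so a linear model recovers it easily. Real screening data may contain interactions between mutations, which a simple linear model would not capture.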
The Cas9 protein contains several domains, including the protospacer adjacent motif (PAM)-interacting (PI) and Wedge (WED) domains, which facilitate its interaction with the target DNA duplex. The research team paired the machine learning and high-throughput screening platforms to design activity-enhanced SaCas9 proteins by combining mutations in the PI and WED domains, which surround the PAM-bearing DNA duplex. The PAM is crucial for Cas9 to edit the target DNA, and the aim was to reduce the PAM constraint for wider genome targeting whilst securing the protein structure by reinforcing the interaction with the PAM-containing DNA duplex via the WED domain.
In the screen and subsequent validations, the researchers identified new variants, including one named KKH-SaCas9-plus, with activity enhanced by up to 33% at specific genomic loci. Subsequent protein modelling analysis revealed new interactions created between the WED and PI domains at multiple locations within the PAM-containing DNA duplex, contributing to KKH-SaCas9-plus’s enhanced efficiency.
Until recently, structure-guided design dominated the field of Cas9 engineering. However, it explores only a small number of sites, amino-acid residues and combinations. In this study, the research team showed that screening can be conducted at a larger scale, with less experimental effort, time and cost, using the machine learning-coupled multi-domain combinatorial mutagenesis screening approach, which led them to identify the new high-efficiency variant KKH-SaCas9-plus.
The Assistant Professor of the School of Biomedical Sciences, HKUMed, stated that this approach will greatly accelerate the optimisation of Cas9 proteins, which could allow genome editing to be applied more efficiently in treating genetic diseases.