Drug Discovery using AlphaFold, Neural Networks, and Docking Algorithms


Details
"Drug Discovery using AlphaFold, Neural Networks, and Docking Algorithms" -by Anirudh Venkatraman, Homestead High School
AGENDA:
Pre-register on Zoom:
6:45 Connection to Zoom chat with speaker and organizers
7:00 SFbayACM intro, upcoming events, introduce the speaker
7:10 presentation starts (~90 min with Q&A)
YouTube Live:
ABSTRACT:
From 2020 to 2022, I worked on two different drug discovery projects. The first dealt with small-molecule drugs, using machine learning to expedite processes within computational chemistry. In the second, I worked on protein therapeutics, particularly nanobodies. Unlike small-molecule drugs, synthetic nanobodies and antibodies are more easily accepted by the body since they mimic our immune system. Discovering effective nanobodies remains a time-consuming process, mostly involving extensive wet-lab procedures. My aim was to develop biocomputational techniques to accelerate the identification of nanobodies with higher specificity and affinity to a pathogenic target, reducing costs and time-to-market and thereby saving lives.
The goal of the first project (2020-2021) was to use deep learning to systematically generate new compounds and estimate how well they bind to target proteins.
In the project’s first phase, I trained a recurrent neural network on 100,000 drug-like molecules and used it to generate ~2,000 unique and valid molecules, verified with Python’s RDKit library. In phase two, I built a convolutional neural network, trained on the BindingDB dataset, to determine how well drugs interact with target proteins.
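The "unique and valid" filtering step can be pictured with a minimal sketch. The abstract does not show the actual code, so everything here is an assumption: `toy_is_valid` is a hypothetical stand-in that only checks balanced parentheses, whereas a real pipeline would ask a cheminformatics toolkit to parse each string (with RDKit, `Chem.MolFromSmiles` returns `None` when parsing fails). The sample strings are illustrative, not generated output.

```python
def toy_is_valid(smiles: str) -> bool:
    """Hypothetical stand-in validity check: non-empty, balanced parentheses.

    Assumption: a real pipeline would parse the string with a chemistry
    toolkit instead of this toy rule.
    """
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return bool(smiles) and depth == 0


def filter_generated(samples, is_valid=toy_is_valid):
    """Keep generated strings that pass the validity check, dropping repeats."""
    seen = set()
    kept = []
    for s in samples:
        if is_valid(s) and s not in seen:
            seen.add(s)
            kept.append(s)
    return kept


# Illustrative strings only; "CC(=O" has an unclosed branch.
samples = ["CCO", "CCO", "CC(=O)O", "CC(=O", "c1ccccc1"]
kept = filter_generated(samples)
validity_rate = sum(toy_is_valid(s) for s in samples) / len(samples)
print(kept, validity_rate)
```

Comparing raw strings treats two different spellings of the same molecule as distinct, so a fuller version would likely compare canonical forms before deduplicating.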
The model selected the best 500 drugs based on their IC50 value (the drug concentration necessary to suppress a protein’s function by 50%). Subsequently, I ran a differential evolution algorithm to tweak each molecule to minimize its IC50, maximizing its effectiveness against the given protein.
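For readers unfamiliar with differential evolution, here is a minimal sketch of the classic rand/1/bin variant minimizing a score over a continuous vector. The abstract does not say how molecules were encoded or which variant was used, so the objective `toy_log_ic50`, the bounds, and all parameter values are hypothetical placeholders standing in for the trained CNN's IC50 prediction.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=200, seed=0):
    """Minimize `objective` over box `bounds` with DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct members other than the current one.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one coordinate is mutated
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cr:
                    value = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clip to the bounds
                else:
                    value = pop[i][d]
                trial.append(value)
            trial_score = objective(trial)
            if trial_score <= scores[i]:  # greedy selection
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


TARGET = (1.0, -2.0, 0.5)  # hypothetical optimum of the toy surrogate


def toy_log_ic50(x):
    """Hypothetical smooth surrogate; lower is better."""
    return sum((xi - ti) ** 2 for xi, ti in zip(x, TARGET))


best_x, best_score = differential_evolution(toy_log_ic50, [(-5.0, 5.0)] * 3)
print(best_x, best_score)
```

The appeal of this family of optimizers is that it needs only objective evaluations, no gradients, which suits a black-box predictor.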
For phase one, the RNN trained for over 200 epochs to 98% validation accuracy and predicted valid molecules 62% of the time. For phase two, the CNN trained for over 200 epochs to an R² of 0.57 and predicted the IC50 value to within 50% of its actual value.
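The two regression figures quoted here can be computed as in the sketch below. It assumes that "within 50%" means a relative error of at most 0.5, and it uses made-up IC50 values purely for illustration; the abstract does not state whether the metrics were computed on raw or log-transformed IC50.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fraction_within(y_true, y_pred, tolerance=0.5):
    """Share of predictions whose relative error is at most `tolerance`."""
    hits = sum(abs(p - t) <= tolerance * abs(t) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Made-up values in nanomolar, for illustration only.
true_ic50_nm = [10.0, 40.0, 100.0, 250.0]
pred_ic50_nm = [14.0, 30.0, 160.0, 240.0]

r2 = r_squared(true_ic50_nm, pred_ic50_nm)
within = fraction_within(true_ic50_nm, pred_ic50_nm)
print(round(r2, 3), within)
```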
I tested the entire algorithm by predicting the drug that binds most effectively to each of four targets: SYK kinase, PI4KA (a host protein that Hepatitis C relies on for replication), histone deacetylase 1, and glutaminase kidney isoform. The four drugs generated from my model’s predictions all had IC50 values below 25 nanomolar.
The second project (2021-2022) focused on protein therapeutics, particularly nanobodies, which retain the binding affinity of antibodies despite being one-tenth the size and are emerging as viable treatment options to counter the mushrooming viral variants.
My unique computational approach used ensemble-stacking to develop nanobodies by sequencing the CDR-H3 region, the sequence of amino acids in the variable binding region that is critical to determining specificity. Trained on previous phage display data to accurately predict and rank the affinity of CDR-H3 sequences, the model predicted 50 effective sequences against an antigen target from a search space of about 20^20 different CDR-H3 sequences.
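To make the stacking idea concrete, here is a toy sketch: two base models are scored on held-out folds, a meta-learner picks how to weight them from those out-of-fold predictions, and the base models are then refit on all the data. The abstract does not name the base models or the meta-learner, so both base models, the `HYDROPHOBIC` residue set, the grid-searched blending weight, and the sequences and scores below are hypothetical.

```python
HYDROPHOBIC = set("AVILMFWY")  # assumed residue grouping for the toy feature


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbor_model(train):
    """Base model 1: copy the score of the closest training sequence."""
    def predict(seq):
        return min(train, key=lambda pair: hamming(pair[0], seq))[1]
    return predict


def hydrophobic_model(train):
    """Base model 2: least-squares line on the hydrophobic residue fraction."""
    xs = [sum(c in HYDROPHOBIC for c in s) / len(s) for s, _ in train]
    ys = [y for _, y in train]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    var = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0

    def predict(seq):
        frac = sum(c in HYDROPHOBIC for c in seq) / len(seq)
        return my + slope * (frac - mx)
    return predict


def fit_stack(data, builders, folds=3):
    """Stack exactly two base models with a convex blending weight."""
    oof = []  # (base predictions, true score) for each held-out sequence
    for k in range(folds):
        train = [d for i, d in enumerate(data) if i % folds != k]
        held = [d for i, d in enumerate(data) if i % folds == k]
        models = [build(train) for build in builders]
        oof += [([m(seq) for m in models], y) for seq, y in held]

    def sse(w):  # meta-learner loss on out-of-fold predictions
        return sum((w * p[0] + (1 - w) * p[1] - y) ** 2 for p, y in oof)

    w = min((i / 20 for i in range(21)), key=sse)
    models = [build(data) for build in builders]  # refit on all data
    return (lambda seq: w * models[0](seq) + (1 - w) * models[1](seq)), w


# Invented sequences and affinity scores, for illustration only.
data = [("ARDYWG", 0.9), ("ARDYFG", 0.8), ("GSDNTG", 0.2),
        ("GSDKTG", 0.1), ("AVDYWG", 1.0), ("GSDNSG", 0.15)]
predict, weight = fit_stack(data, [nearest_neighbor_model, hydrophobic_model])
print(weight, predict("AVDYFG"))
```

Fitting the weight on out-of-fold predictions rather than training-set predictions matters because a nearest-neighbor model scores perfectly on sequences it has already seen, which would otherwise earn it all the weight.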
Ensemble-stacking achieved highly accurate predictions of binding affinity, with an R² of 0.75, a Pearson correlation coefficient of 0.87, and a classification AUROC of 0.965. I tested the top 50 algorithm-generated nanobodies using 3D computational methods to measure their binding affinity. My approach yielded nanobody candidates with a lower Gibbs free energy of binding (a lower value indicates stronger specificity and binding): -10.24 kcal/mol, compared with -9.65 kcal/mol for the best in vitro-derived candidates.
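One way to get a feel for the difference between the two energies is to convert them to dissociation constants with the standard relation delta G = RT ln(Kd). The sketch below assumes the reported scores can be read as standard binding free energies at 25 °C (298.15 K); computational docking scores are estimates, so this is only an order-of-magnitude illustration.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal/(mol*K)


def dissociation_constant(delta_g, temperature=298.15):
    """Kd in mol/L from delta G = RT ln(Kd); temperature is an assumption."""
    return math.exp(delta_g / (GAS_CONSTANT * temperature))


kd_model = dissociation_constant(-10.24)    # algorithm-generated candidate
kd_in_vitro = dissociation_constant(-9.65)  # best in vitro-derived candidate
fold_change = kd_in_vitro / kd_model
print(kd_model, kd_in_vitro, fold_change)
```

Under these assumptions, the 0.59 kcal/mol gap corresponds to roughly a 2.7-fold lower dissociation constant for the computationally designed candidate.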
Results indicate that biocomputational methods using ensemble-stacking can be successfully leveraged to quickly and accurately generate nanobody candidates, thereby accelerating the development of successful vaccines.
BIO:
Anirudh Venkatraman is a senior at Homestead High School who enjoys music, science, and cooking. This past year, he won 4th place at the International Science and Engineering Fair for his project "De Novo Nanobody Design Using Neural Networks, AlphaFold, and Docking Algorithms", in addition to numerous other awards at the state and local levels. Anirudh has a passion for the field of bioinformatics and, in particular, drug discovery, which he hopes to pursue in college. Apart from science fair competitions, Anirudh performs cell-based research at the Molecular Screening Shared Resource at UCLA under the guidance of Professor Robert Damoiseaux. In his free time, Anirudh loves to play the drums and cook elaborate dishes with his sister and cousin.
