The Path to AI-driven Drug Discovery - Part 3: AI to learn from lab data

In this blogpost series, we have so far outlined the process of drug discovery (Part 1) and taken a deep dive into the potential of AI-driven drug discovery (Part 2). In this third part, we show how AI tools, such as DeepMirror Chem, can fast-track affinity optimisation during hit-to-lead and lead optimisation in two real-world cases. For more details, contact us to request our white paper!

AI to fast-track affinity optimisation in drug discovery

AI has shown great promise in fast-tracking drug discovery (Part 2). In this final blogpost of the series, we use real data to explore how much AI can accelerate drug discovery.

As we saw in Part 1 of the series, the first steps in drug discovery are i) identifying a biological target related to a disease (Target ID) and ii) finding a molecule that binds to it with high affinity (Hit ID and H2L). We focused on two targets: the main protease of SARS-CoV2, the virus that causes COVID-19, and HIF2a, a protein involved in cancer (Fig 1). To identify which molecules bind to these targets, a drug discovery researcher needs to 1) design an experiment by selecting which molecules to test, and 2) test them in the lab to measure their affinity for the target. We call this an ‘experimental cycle’: the steps involved in each ‘batch’ of experiments (Fig 1).

Testing for affinity is carried out by measuring IC50, or the concentration required to reduce a target’s activity by half. The lower the IC50, the lower the amount of drug required to exert an effect on the target, therefore the higher the affinity. In other words, lower IC50 means higher affinity.
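As a worked illustration of this inverse relationship, the short sketch below converts IC50 values to pIC50 (the negative base-10 logarithm of the IC50 expressed in molar). pIC50 is a common medicinal chemistry convention that this post itself does not use. The function name and the three example compounds are illustrative assumptions, not data from the case study.

```python
import math

def pic50_from_ic50_um(ic50_um: float) -> float:
    """Convert an IC50 given in micromolar to pIC50 = -log10(IC50 in molar)."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_um * 1e-6)

# Hypothetical IC50 values (in uM) for three made-up compounds.
example_ic50_um = {"compound_a": 10.0, "compound_b": 0.5, "compound_c": 0.01}

# Rank from highest to lowest affinity: lowest IC50 (highest pIC50) first.
ranked = sorted(example_ic50_um, key=lambda name: example_ic50_um[name])
for name in ranked:
    print(name, round(pic50_from_ic50_um(example_ic50_um[name]), 2))
```

On the pIC50 scale, a larger number means higher affinity, so ranking by descending pIC50 gives the same order as ranking by ascending IC50.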

Experimental cycles of design and testing of compounds required to identify new drugs against a disease target. **Figure 1: We simulated drug discovery in search of molecules against SARS-CoV2 and cancer.** We focused our attention on two targets, the main protease of SARS-CoV2 and HIF2a. The main protease of SARS-CoV2 is indispensable for the replication of coronaviruses, and blocking it (with an inhibitor, for example) stops the virus from propagating and therefore the progression of the disease. HIF2a is an important protein involved in several aspects of cancer, such as proliferation. Inhibiting overactive HIF2a has the potential to restrain cancerous cell growth. The initial stages of drug discovery involve identifying a molecule that can bind with high affinity to a biological target. Therefore, researchers must 1) design an experiment by selecting which compounds to test and 2) test them in the lab to measure affinity against the target. IC50, the concentration required to reduce the target’s activity by half, is a measure of affinity (low IC50 = high affinity).

The COVID Moonshot and HIF2a inhibitors datasets

To simulate experimental cycles, we needed small molecules whose lab results were already known, so that we could simulate testing them. We curated two datasets containing lists of small molecules and their experimentally measured affinity (IC50) against the main protease of SARS-CoV2 and HIF2a. These datasets were the COVID Moonshot project dataset and the results from a HIF2a inhibitor patent.

The COVID Moonshot project is an open-science, collaborative project involving 150 scientists. It was born from a ‘Twitter storm’ with the aim of quickly developing an anti-viral drug against COVID-19 (DNDi, COVID Moonshot). The dataset contains 2,062 molecules and their lab-measured affinity (IC50). One of these molecules has successfully progressed into pre-clinical development. However, we didn’t know the identity of this molecule, so we defined ‘successful drug candidates’ as molecules with very high affinity, that is, an IC50 < 0.05 µM; 10 molecules in total.

The second dataset is a patent containing a collection of inhibitors against HIF2a and their lab-measured IC50 (patent US-9908845-B2). The dataset contains 326 molecules, of which 2 were selected for pre-clinical development because they had an IC50 < 0.01 µM and good drug-like properties (which we will explore in a future case study).

Conventional vs AI-assisted drug discovery

To assess the effect of AI on the speed of drug discovery, we compared a simulated scenario where researchers did not use AI to design experiments (Conventional) (Fig 2, Scenario A) with a second simulation where they used AI to help design experiments based on previous lab results (AI-assisted, powered by DeepMirror Chem) (Fig 2, Scenario B). To simulate testing of small molecules in the lab, we started with a dataset containing only the small molecules, with their IC50 values hidden. Each time a molecule was selected for testing, we revealed its experimentally measured IC50 in the dataset. In both cases, we calculated the number of experimental cycles required to find (select for testing) real drug candidates.

Conventional drug discovery cycles versus AI-assisted (DeepMirror Chem) drug discovery cycles. **Figure 2: Accelerating drug discovery with AI.** We simulated two scenarios: A) Conventional drug discovery: we designed experiments by selecting molecules at random and testing these molecules. B) AI-assisted drug discovery: we designed experiments by using DeepMirror Chem to learn from the results of previous experimental cycles, predict IC50 and design the next experimental cycle from the molecules with the highest predicted affinity (lowest predicted IC50).

In conventional drug discovery, researchers decide on a shortlist of molecules to synthesise and test. We approximated conventional drug discovery by selecting sets of molecules at random for each cycle (30 for HIF2a inhibitors and 50 for COVID Moonshot) and subsequently testing these molecules. This is a simplification: in practice, a research team may include an expert medicinal chemist whose intuition, built over many years, guides the selection of small molecules.
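The random-selection baseline can be sketched in a few lines. This is a minimal sketch and not the code used for the case study: the function names are hypothetical, the dataset and batch sizes are taken from the post, and which molecules count as hits is arbitrary here. It counts how many random batches are tested before a required number of high-affinity molecules has been selected, and it need not reproduce the averages reported below.

```python
import random
import statistics

def cycles_random_selection(n_molecules, hit_ids, batch_size, hits_required, rng):
    """Return how many random batches are tested before enough hits are found."""
    order = list(range(n_molecules))
    rng.shuffle(order)  # testing a shuffled list batch by batch = random batches
    found = 0
    for cycle, start in enumerate(range(0, n_molecules, batch_size), start=1):
        batch = order[start:start + batch_size]
        found += sum(1 for molecule in batch if molecule in hit_ids)
        if found >= hits_required:
            return cycle
    raise ValueError("the dataset contains fewer hits than hits_required")

def mean_cycles(n_runs, seed=0, **kwargs):
    """Average the cycle count over n_runs independent simulated campaigns."""
    rng = random.Random(seed)
    return statistics.mean(
        cycles_random_selection(rng=rng, **kwargs) for _ in range(n_runs)
    )

# Sizes from the post; the hit indices themselves are placeholders.
covid = dict(n_molecules=2062, hit_ids=set(range(10)), batch_size=50, hits_required=5)
hif2a = dict(n_molecules=326, hit_ids={0, 1}, batch_size=30, hits_required=2)
print("COVID Moonshot:", mean_cycles(500, **covid))
print("HIF2a:", mean_cycles(500, **hif2a))
```

Because selection is random, a single run is noisy, which is why the sketch averages over many repeated campaigns.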

In AI-assisted drug discovery, the same experimental cycle takes place, but researchers use the results of each cycle to systematically learn and inform the design of the next. To simulate AI-assisted drug discovery, we used DeepMirror Chem to suggest the first set of molecules to test. We used the results of these tests to predict the IC50 of the remaining untested molecules and selected the molecules with highest predicted affinity for the next round of testing. We kept repeating this process: selecting the highest predicted affinity molecules, testing them, and providing this new information to our platform. With every round, the algorithm had more information with which to better predict IC50.

For both datasets and both approaches (AI-assisted and Conventional), we stopped this iterative process once we found at least 5 (COVID Moonshot) or at least 2 (HIF2a inhibitors) high affinity (low IC50) molecules in the datasets.
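The design-test-learn loop and its stopping rule can be sketched as follows. This is an illustration under stated assumptions, not DeepMirror Chem's method: a simple k-nearest-neighbour average stands in for the predictive model, molecules are represented by made-up two-number descriptor vectors, affinity is expressed as pIC50 (higher is better), and the first batch is chosen at random rather than suggested by a platform. All names are hypothetical.

```python
import random

def knn_predict(x, tested, k=3):
    """Predict affinity (pIC50) as the mean over the k nearest tested molecules."""
    nearest = sorted(
        tested, key=lambda t: sum((a - b) ** 2 for a, b in zip(x, t[0]))
    )[:k]
    return sum(value for _, value in nearest) / len(nearest)

def cycles_model_guided(features, pic50, hit_ids, batch_size, hits_required, rng):
    """Return how many design-test cycles pass before enough hits are tested."""
    untested = list(range(len(features)))
    tested = []  # (descriptor vector, measured pIC50) pairs revealed so far
    found = 0
    cycle = 0
    while untested:
        cycle += 1
        if tested:
            # Design step: rank untested molecules by predicted affinity.
            untested.sort(key=lambda i: knn_predict(features[i], tested), reverse=True)
        else:
            # Assumption: with no data yet, the first batch is chosen at random.
            rng.shuffle(untested)
        batch, untested = untested[:batch_size], untested[batch_size:]
        # Test step: "measure" the batch by revealing its hidden labels.
        for i in batch:
            tested.append((features[i], pic50[i]))
            if i in hit_ids:
                found += 1
        # Stopping rule: enough high-affinity molecules have been tested.
        if found >= hits_required:
            return cycle
    raise ValueError("the dataset contains fewer hits than hits_required")

# Synthetic toy data: affinity peaks near (0.8, 0.2) in descriptor space.
rng = random.Random(0)
features = [(rng.random(), rng.random()) for _ in range(300)]
pic50 = [8.0 - 4.0 * ((x - 0.8) ** 2 + (y - 0.2) ** 2) for x, y in features]
hit_ids = set(sorted(range(300), key=lambda i: pic50[i], reverse=True)[:5])
print(cycles_model_guided(features, pic50, hit_ids, 20, 5, rng))
```

Always picking the top-ranked molecules is a purely greedy strategy. It matches the selection rule described above, but other strategies that also test uncertain molecules are possible.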

2-4x acceleration of drug discovery

In our simulations, researchers assisted by AI (using DeepMirror Chem) could identify high affinity molecules 2-4 times faster than with conventional drug discovery (Fig 3). At least 5 high affinity molecules against the main protease of SARS-CoV2 were found within 5 experimental cycles, while conventional drug discovery simulations took an average of 20 cycles (Fig 3a).

Data showing AI (DeepMirror Chem) accelerates drug discovery by 2-4x. **Figure 3: AI accelerates drug discovery by a factor of 2 to 4 when using DeepMirror Chem.** a. Mean number of experimental cycles required to identify at least 5 high affinity (IC50 < 0.05 µM) compounds in the COVID Moonshot Project dataset when simulating conventional and AI-assisted drug discovery (using DeepMirror Chem). n molecules = 2,062. b. Mean number of experimental cycles required to identify both high affinity (IC50 < 0.01 µM) pre-clinical candidates in the HIF2a inhibitor patent when simulating conventional and AI-assisted drug discovery (using DeepMirror Chem). n molecules = 326. For both datasets, conventional drug discovery simulations = 500, AI-assisted simulations = 5.

Similarly, DeepMirror Chem could find both HIF2a inhibitor candidates 2 times quicker than conventional drug discovery simulations (Fig 3b). On average, it took DeepMirror Chem 5 cycles to identify both pre-clinical candidates, versus an average of 11 cycles for conventional drug discovery simulations.
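The headline range follows directly from the mean cycle counts quoted in this section. The small calculation below makes that explicit; the function name is an illustrative assumption.

```python
def fold_speedup(conventional_cycles: float, ai_cycles: float) -> float:
    """Ratio of cycles needed without AI to cycles needed with AI."""
    if ai_cycles <= 0:
        raise ValueError("cycle counts must be positive")
    return conventional_cycles / ai_cycles

# Mean cycle counts quoted in the text of this section.
covid_speedup = fold_speedup(20, 5)  # COVID Moonshot: 20 vs 5 cycles
hif2a_speedup = fold_speedup(11, 5)  # HIF2a inhibitors: 11 vs 5 cycles
print(covid_speedup, hif2a_speedup)
```

This gives roughly 4x for COVID Moonshot and roughly 2x for the HIF2a inhibitors, which is where the 2-4x range comes from.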

Take home message

The application of AI in drug discovery is already showing immense promise in reducing the discovery times of novel candidates. However, life sciences companies currently rely on i) creating partnerships with AI companies, which requires extensive management and communication, or ii) building internal teams and platforms, which is expensive and time-consuming (Part 2). In our previous blogpost, we identified what sets DeepMirror apart from AI consultancies or partnerships: we provide access to AI-driven drug discovery from day one via no-code and intuitive software. No need to engage with external stakeholders, build internal teams, or wait for tool development.

Hit-to-Lead, the stage at which affinity is most improved during drug discovery (see Part 1), requires on average 1.5 years and $2.5M, and Lead Optimisation requires 2 years and $10M (Paul et al., 2010). Together, that’s on average 3.5 years and $12.5M. AI has the potential to halve (or even reduce to a quarter!) the amount of time and cost invested in these drug discovery efforts. In this final blogpost of the series, we demonstrated, with real-world datasets, the increase in speed that AI enables compared to a conventional approach by using DeepMirror Chem to optimise small molecule affinity. DeepMirror Chem allows researchers to reap the benefits of adding AI to a pipeline without the worries of managing a partnership, building a team, or waiting for tool development, at a fraction of the cost.
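To make the arithmetic behind these figures concrete, the sketch below adds up the two stages and scales the totals by a 2x and a 4x speed-up. It rests on a strong simplifying assumption that is ours, not a result of the simulations: that time and cost shrink in direct proportion to the number of experimental cycles. The class and variable names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    years: float
    cost_musd: float  # cost in millions of US dollars

# Average figures quoted above (Paul et al., 2010).
stages = [Stage("Hit-to-Lead", 1.5, 2.5), Stage("Lead Optimisation", 2.0, 10.0)]
total_years = sum(stage.years for stage in stages)
total_cost = sum(stage.cost_musd for stage in stages)
print(f"baseline: {total_years} years, ${total_cost}M")

# Assumption: time and cost scale inversely with the speed-up factor.
for speedup in (2, 4):
    print(f"{speedup}x: {total_years / speedup} years, ${total_cost / speedup}M")
```

Under that assumption, a 2x speed-up corresponds to 1.75 years and $6.25M, and a 4x speed-up to 0.875 years and $3.125M.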

If you want more technical detail about the inner workings of this case study or want to learn more about DeepMirror Chem, get in touch!

What about other properties?

Do you want to predict properties other than affinity, but do not have the experimental data yet? Using DeepMirror Chem, users can access certified models trained on our proprietary databases. Certified models can be used to predict small molecule properties such as solubility (lipophilicity), toxicity or metabolism (ADME) without any experimental data!

References

DNDi, COVID Moonshot Project https://dndi.org/research-development/portfolio/covid-moonshot/

HIF2a dataset, patent US-9908845-B2 https://pubchem.ncbi.nlm.nih.gov/patent/US-9908845-B2

Paul, S.M., Mytelka, D.S., Dunwiddie, C.T., Persinger, C.C., Munos, B.H., Lindborg, S.R., Schacht, A.L., 2010. How to improve R&D productivity: the pharmaceutical industry’s grand challenge. Nature Reviews Drug Discovery 9(3), 203–214. https://doi.org/10.1038/nrd3078
