"The true method of knowledge is experiment." William Blake
According to a McKinsey analysis (AI in biopharma research: A time to focus and scale), nearly 270 companies worldwide are currently working in the AI-driven drug discovery industry (50% of them in the US). However, only about 15% of them have an asset in preclinical development, and those with new molecular entities in clinical development (phase I and II) have predominantly in-licensed their assets or developed them using traditional ☎️ techniques. Moreover, over half the capital invested in this space is concentrated in only ten companies (all based in the US or UK), since it is very difficult for biopharma companies and investors to evaluate AI-driven players.
According to the same analysis, AI has already delivered some value across the biotech research chain for:
Hypothesis generation capabilities by combining real-world data (RWD), genomics data and scientific literature through a knowledge graph,
Large-molecule structure inference, with a 100-fold acceleration in the time needed to generate protein structures,
Computer vision technology with higher accuracy than classical approaches, harnessing deep-learning approaches (for instance, convolutional neural networks),
In silico medicinal chemistry using approaches such as molecular property prediction in an iterative screening loop,
In silico cheminformatics, with over 100 times as many in silico experiments possible compared with previous screening, and faster compound design,
Drug repurposing for rapid identification of novel indications for existing investigational new drugs or marketed drugs via genomic information and pathways, accelerating time to new treatments for patients, and finally for
Indication finding leveraging genomics, prioritising indications to pursue for novel mechanisms of action (MoAs) and prioritising or deprioritising ongoing programs within the clinical plan by stopping low-probability-of-success programs early and reducing patient burden in clinical trials.
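To make the "iterative screening loop" from the list above a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the compound library is just a list of numbers standing in for molecular descriptors, true_activity is a toy stand-in for a wet-lab assay, and predict is a deliberately naive nearest-neighbour property predictor. It is not the method used in the McKinsey analysis or by any company mentioned here.

```python
import random

def true_activity(x):
    """Toy stand-in for an assay readout (hypothetical, peaks at x = 0.7)."""
    return -(x - 0.7) ** 2

def predict(x, tested):
    """Naive property predictor: reuse the readout of the nearest tested compound."""
    nearest = min(tested, key=lambda d: abs(d - x))
    return tested[nearest]

def iterative_screen(library, rounds=5, batch=5, seed=0):
    rng = random.Random(seed)
    # Round 0: assay a small random batch so the predictor has something to learn from.
    tested = {x: true_activity(x) for x in rng.sample(library, batch)}
    for _ in range(rounds):
        untested = [x for x in library if x not in tested]
        # Rank untested compounds by predicted activity and pick the top batch.
        picks = sorted(untested, key=lambda x: predict(x, tested), reverse=True)[:batch]
        # "Run the assay" on the picks and feed the results back into the predictor.
        for x in picks:
            tested[x] = true_activity(x)
    return tested

library = [i / 200 for i in range(201)]  # 201 hypothetical compounds
tested = iterative_screen(library)
best = max(tested, key=tested.get)
print(f"assayed {len(tested)} of {len(library)} compounds; best descriptor so far = {best:.3f}")
```

The point of the loop is that only a fraction of the library is ever assayed: each round the predictor decides where the next (expensive) experiments go, and each batch of results changes the next round of predictions.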
In particular, the year-in-review article AI’s evolving role in drug discovery and development in 2023 gives an overview of AI’s increasing traction in the field. To mention just a few of the most important news items in this space:
January 2023: Absci deploys zero-shot generative AI in antibody design
March 2023: MIT reveals DiffDock, which could support faster, safer drug development
March 2023: Nvidia launches BioNeMo Cloud to boost drug discovery with generative AI
May 2023: FDA addresses AI/ML in drug development
June 2023: Insilico Medicine’s AI drug enters phase 2 study
June 2023: Sanofi reveals plan to put AI at the center of its operations
July 2023: AI-aided drug ulotaront fails phase 3 studies
August 2023: Generative AI tool boasts 79% accuracy in predicting clinical trial outcomes
August 2023: Recursion uses AI to bridge the protein and chemical universe, predicting targets for 36 billion compounds
September 2023: AstraZeneca announces AI-based drug discovery pact with Verge Genomics
September 2023: Merck KGaA announces alliances with BenevolentAI and Exscientia for AI-assisted drug discovery
October 2023: Nature notes that AI in drug discovery “needs a reality check”
October 2023: Insmed and Google Cloud collaborate to slash drug development timelines with AI.
But alongside all the effort that has undoubtedly been made, we also need to address what went wrong.
For example, Aaron Daugherty, Vice President of Discovery at Aria Pharmaceuticals (Symphony™ is Aria's proprietary AI drug-discovery platform), said 📢 a couple of months ago in the article Artificial intelligence: a great crash of hype into reality:
📌 “Much of the early hype was due to too many companies saying they can use AI to find drugs faster and better, but without acknowledging that high quality lab and clinical science is a prerequisite for ultimate success.”
📌 “Experts in technology and data need an equal seat at the table alongside biologists, chemists, clinicians and others to make an impact. The future is one of integration between all the disciplines necessary to make an effective medicine.”
📌 “All measures point to 2023 being the year where we move from hype and unrealistic expectations into the integration of AI in pharmaceutical discovery that is focused on the problems we are trying to solve with realistic expectations.”
📌 “My hope is that this crash of hype into a new realistic reality does not dampen further investment or interest, because there is still significant progress being made – just not at the level hyped over the last decade. In fact, we are entering an exciting and promising new phase of AI in drug discovery; one underscored by a clearer understanding of the true value of AI in this data-heavy industry: for problem solving.”
Moreover, in the article “AI in Drug Discovery in China, Hype or Hope?” published by EqualOcean, the author points to the importance of valuable clean data (literature, large public databases and proprietary databases), but also to the importance of experimentally validating AI algorithms in this data-intensive industry.
He is actually saying that, apart from the problem we have with dirty data 🧹🧽 (such as skewed data sets and biased data, which scientists need to explain to the algorithms), there is also the problem of experimentally validating the AI algorithms, which is impeded in two ways:
firstly, there is no universal protocol for what correct validation should be; in other words, there is no common standard to measure against when AI delivers new compounds, and
secondly, there is a problem with the process of validation itself.
In other words, we use data to train an algorithm and generate novel AI compounds, so the data set we use is intrinsically tied to both the numerical performance and the prospective performance of the method or model itself. The so-called validation of an AI algorithm could therefore be just a replication of its construction process. As a result, scientists have to find their own way to validate their AI algorithms and AI-generated molecules, leveraging their expertise. At the end of the day, wet-lab experiments, biological assays and clinical trials are also needed to ensure that the AI-generated molecules are active both in vitro and in vivo. And that takes time, as Alex Zhavoronkov, the founder of Insilico Medicine, told 📢 EqualOcean:
“Unlike in other areas where AI generates pictures, music or text, you get validation almost immediately because you get almost immediate feedback by looking at the picture, listening to music and reading the text. In biology, it is not like that and you have to wait”.
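The worry that validation "could just be replication of the construction process" can be illustrated with a toy example. The sketch below is an assumption-laden illustration, not anything taken from the EqualOcean article: it invents a data set in which compounds come in families (called scaffolds here), every member of a family shares one activity label, and the "model" does nothing but memorise what it saw in training. Validating on a random hold-out mostly re-tests families the model has already seen, while holding out whole families asks the harder, more prospective question.

```python
import random
from collections import Counter

def make_dataset(n_scaffolds=40, per_scaffold=10, seed=0):
    """Hypothetical data: each compound is (scaffold id, label); a family shares its label."""
    rng = random.Random(seed)
    data = []
    for s in range(n_scaffolds):
        label = rng.randint(0, 1)
        data += [(s, label)] * per_scaffold
    return data

def fit(train):
    """A 'model' that only memorises the label seen for each scaffold."""
    memory = {scaffold: label for scaffold, label in train}
    fallback = Counter(label for _, label in train).most_common(1)[0][0]
    return memory, fallback

def accuracy(model, test):
    memory, fallback = model
    # Unseen scaffolds get the majority label from training.
    hits = sum(memory.get(s, fallback) == y for s, y in test)
    return hits / len(test)

def random_split(data, frac=0.8, seed=1):
    """Hold out random compounds; their families usually remain in training."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * frac)
    return rows[:cut], rows[cut:]

def scaffold_split(data, frac=0.8, seed=1):
    """Hold out whole families, so test compounds are genuinely new to the model."""
    scaffolds = sorted({s for s, _ in data})
    random.Random(seed).shuffle(scaffolds)
    kept = set(scaffolds[: int(len(scaffolds) * frac)])
    train = [r for r in data if r[0] in kept]
    test = [r for r in data if r[0] not in kept]
    return train, test

data = make_dataset()
for name, split in [("random split", random_split), ("scaffold split", scaffold_split)]:
    train, test = split(data)
    print(name, round(accuracy(fit(train), test), 2))
```

On the random split the memorising model looks excellent because the test set largely repeats the training set. On the scaffold split it can only guess the majority label, which is closer to what a wet-lab test of truly novel molecules would reveal.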
In any case, apart from the problems with dirty data and the experimental validation of algorithms, effort has also been made in AI drug discovery in China ⛩️, and there have been some demonstrated successes showing the validation of AI algorithms in drug discovery, as with the following companies:
Accutar Biotech (AC0682, AC0176)
Galixir (unknown)
MindRank.AI (MDR001, EMDR001)
Nutshell Therapeutics (NST001, NST004) and
Insilico Medicine (ISM001-055, ISM012-077, ISM004-1057D, undisclosed)
But let's talk about the data mess of this industry.
Try for a moment to imagine the drug development process as a set of just three boxes: the discovery box 🟨, the preclinical box 🟦 and the clinical box 🟩.
Each of these boxes is siloed from the others, with no linearity or communication across the drug development process, due to mutual distrust among scientists.
Then, throughout the entire drug development process (that is, across the three boxes), each company or lab generates terabytes of data that are kept hidden behind a firewall (unpublished big data). This is partly due to strict regulatory and compliance standards and partly due to the extremely competitive environment 🥊 in which these companies (or research labs) operate. For that reason, most of this data sits in silos, and for a long time companies did not actually consider it “suitable” for retrospective analysis. Which might be a good thing, since we are talking about “dirty” data here.
And it gets even better with the published data.
Apparently, published data from all three boxes very often (50–70% of the time) lack reproducibility (with false negative and false positive results), while big data such as omics data and negative results are left unpublished.
If truth be told, there’s an unspoken rule in the pharmaceutical industry:
“Half of all biomedical research produced by scientists— and that it was supposed to be innovative lead-generators for pharma — will ultimately prove false”.
As a matter of fact, regarding the “data mess” and the reproducibility crisis (Biomedical data mining ⛏️) we are in:
“Editors, peer reviewers, and readers should be aware that certain characteristics of the author team, the journal, and the publication might be associated with questionable research practices.”
From
(February 2023)
Let’s go back to the 3 boxes now.
The discovery box is left without regulations, the preclinical box is somewhat regulated and the clinical box is completely regulated. Discovery (and innovation) usually arises from the absence of strictly enforced rules and regulations (in fact, antibiotics were discovered by accident), but there is a limit to the complete absence of rules. Usually, once you have passed that limit, “anarchy” 🏴‍☠️ might start.
In the preclinical box, candidate drugs are tested on small animals kept in cages in some basement, in an aseptic environment, while we all know that humans live a normal life in a normal environment, the so-called “exposome” (air particles, pollutants, viruses, and everything else we come into contact with each day). In the clinical box, by contrast, candidate drugs are tested on humans and well-designed experiments are carried out in laboratories.
Imagine now the clinical box filled with a finite number of tiny smaller boxes. Each smaller box represents (more or less) a patient or a lab experiment. Around each of these smaller boxes, pharma companies have built huge protective shields 🛡️ (regulations) in order to monitor every parameter that could affect (and they don’t want that) the millions, even billions, of variables inside it. But when it comes to the discovery and preclinical boxes, the tiny smaller boxes (that is, the variables of the system) are left almost without regulations, apart from some good laboratory practices that scientists should follow.
Moreover, the internal stability of each smaller box is “controlled” by monitoring and stabilising the x, y, z parameters external to it, and on top of that the smaller boxes are not always in the same facility. In other words, we don’t have smart real-time (real-life) monitoring of the a, b, c parameters inside each smaller box (“When the Pharma Giants Met the Tech Giants”) in a closed system (in the same facility). Not to mention that experiments are sometimes done not only in different laboratories but also in different countries, and then all the data is pooled together, making the data mess an even bigger mess.
Hopefully, this problem will be effectively resolved for patients by smart healthcare wearables (internal sensors), while for the lab, logistically smart real-time sensors that monitor internal parameters during experiments in a single facility are being developed right now: robotic closed-loop laboratories.
Let’s see a few examples now:
Cellino Bio (CEO & Co-Founder Nabiha Saklayen) is a closed-loop cell therapy manufacturing company. Its platform combines label-free imaging and high-speed laser editing with ML to automate cell reprogramming, expansion and differentiation, bringing together biology, laser physics, gene editing tools and ML so that thousands of patient samples can be processed in parallel in a single facility.
Strateos started as Transcriptic (a SaaS-based biotechnology company providing robotic solutions for biology labs), which in June 2019 announced a merger with 3Scan, a digital 3D tissue model specialist, to form Strateos. The new company combines the engineering capabilities of the two previous companies to help automate chemical, biological and 3D image analysis in a closed-loop robotic laboratory. Strateos, which is considered a pioneer in the development of remote-access laboratories and lab automation software for life science research, has announced the availability of an integrated solution for small molecule discovery programs seeking a faster, automated way to perform Design, Make, Test and Analyse (DMTA) cycles (“Closing the Loop from Idea to Data and Accelerating Cycle Times for Faster Drug Discovery”).
Celeris Therapeutics also has a robotic wet-lab facility for closed-loop drug discovery under construction, IBM Research and Arctoris are also accelerating closed-loop drug discovery, and Tencent's AI drug discovery platform has achieved a breakthrough in closed-loop capability for high-throughput dry and wet experiments.
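For readers who have never seen one, a Design, Make, Test and Analyse (DMTA) cycle can be sketched in a few lines of Python. This is only a toy under loud assumptions: a "compound" is a single number between 0 and 1, make_and_test is a hypothetical stand-in for robotic synthesis plus a noisy assay, and the design step just perturbs the current best candidate. It is not how Strateos or any other company named above implements its platform.

```python
import random

def design(best_x, rng, n=4, step=0.1):
    """Design: propose n variants around the current best candidate, kept within [0, 1]."""
    return [min(1.0, max(0.0, best_x + rng.uniform(-step, step))) for _ in range(n)]

def make_and_test(x, rng):
    """Make + Test: hypothetical stand-in for synthesis and a slightly noisy assay readout."""
    return -(x - 0.3) ** 2 + rng.gauss(0, 0.001)

def analyse(results):
    """Analyse: keep the candidate with the best measured readout."""
    return max(results, key=lambda r: r[1])

def dmta(cycles=20, seed=0):
    rng = random.Random(seed)
    best = (0.9, make_and_test(0.9, rng))  # arbitrary starting compound
    history = [best]
    for _ in range(cycles):
        candidates = design(best[0], rng)
        results = [(x, make_and_test(x, rng)) for x in candidates]
        # The previous best stays in the comparison, so the loop never goes backwards.
        best = analyse(results + [best])
        history.append(best)
    return history

history = dmta()
print(f"start x = {history[0][0]:.2f}, x after {len(history) - 1} cycles = {history[-1][0]:.2f}")
```

What "closing the loop" adds in a real robotic lab is that the make and test steps happen in one facility with the same instruments and sensors every cycle, so the data the analyse step sees is comparable from one cycle to the next.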
Finally, just a few words about Hope and Hype in AI Drug Discovery seen as an Endless Feedback Loop.
What best describes the relationship between hope and hype in the AI drug discovery industry is the Chinese philosophical concept of Yin and Yang: two seemingly opposite or contrary forces may actually be complementary, interconnected and interdependent, giving rise to each other as they interrelate and working together to maintain continuity (in this case, jobs and the economy).
In this Yin and Yang ☯️ (or, if you prefer, “hope 🤞 and hype 😵‍💫”) model of AI drug discovery that we are going through right now, everything started when the more conservative establishment of traditional drug development suppressed independent thinking about AI drug development and labelled it a threat to its hegemony. In doing so, it created an evolutionary force that drove AI drug discovery pioneers and innovators to push for change, for example by introducing AI into drug discovery on a massive scale (and finally hope arrived for everyone).
In the course of that changing process, more or less in the last ten years, the AI innovators had to bring the rest of us along with them, thereby mitigating the disruptive impact of AI innovation and ensuring the benefits were adequately shared.
But, apparently, the AI innovators have pushed ahead too fast, filling the system with too much hope and not enough pragmatism, so hype is in the air. Accordingly, the more moderate part of society is waiting to fight back by further suppressing the independent and novel thinking of the new AI paradigm for drug discovery. Eventually, after a while (namely after a crisis and a lot of people losing their jobs or money), this Yin and Yang ☯️ loop ➰ of hope and hype will start again with a new hope.
Long story short, in this endless puzzle game, seen as an endless feedback loop of great expectations and pragmatism 🛞, both sides should co-evolve (human learning for creativity and survival mechanisms). And while they both want exactly the same thing (the good society and the survival of the fittest), they will always disagree 💪 on whether change or technological stagnation is best for our society.
This time they disagree on whether AI and drug discovery should go along like best friends. But, interestingly, while they continuously disagree on AI/ML tools for drug discovery, they push human learning forward (… actually we “learn” 🧠 from our failures, struggles, mistakes and disappointments). On top of that, every time two groups disagree in nature, a third group usually arrives that has learned from the first two and eventually starts a new loop of hope (and hype 🚀), thus safeguarding continuity while everyone is having a fight 💥….
Thanks,
MetaphysicalCells is a newsletter about Science, Technology and AI Drug Discovery by Marina T Alamanou.
Marina (based in Greece) is a cellular and molecular biologist with over 20 years of experience in academia, startups and multinationals. Her professional experience spans a wide range of areas: Cancer research, Preclinical drug development (small molecules, peptides), Cancer biomarker discovery research, Molecular diagnostics industry experience and Clinical project management. Currently, she is a Life Science Consultant offering solutions throughout product development.