Exploring AI Drug Discovery 💡
“ChatGPT and other Generative AI platforms will have huge implications for business productivity.” By Hendrith Vanlon Smith Jr, CEO of Mayflower-Plymouth, Business Essentials
“I’m not a fan of going out with extravagant promises about how we’re going to transform everything within three years. Human biology is hard: intervening in something that can cause someone to die is a very risky proposition.” By Daphne Koller
From the Financial Times: “Will AI turbocharge the hunt for new drugs?”
Generative AI Drug Discovery
According to a McKinsey report (Download 📩 the report here: “The economic potential of generative AI: The next productivity frontier”), generative AI could generate $60-110 billion a year across pharma and medical products, roughly 3-5% of industry revenues (across all industries, the report puts the total at $2.6 trillion to $4.4 trillion; by comparison, the UK’s entire GDP in 2021 was $3.1 trillion). This value spans the following segments:
Research and Drug Discovery (pharma companies spend about 20% of revenues on R&D, and developing a new drug takes an average of 10 to 15 years),
Customer Documentation Generation,
Generating Content for Commercial Representatives, and
Contract Generation.
Very briefly, generative AI is powered by very large machine learning models that are pre-trained on vast amounts of data, commonly referred to as foundation models (FMs). A subset of FMs, called large language models (LLMs), are trained on trillions of words across many natural-language tasks.
In particular, generative AI applications such as ChatGPT, GitHub Copilot and Stable Diffusion can perform a range of routine tasks, such as reorganising and classifying data, writing text, composing music and creating digital art. The field is moving fast: ChatGPT was released in November 2022, and four months later OpenAI released a new LLM, GPT-4, with markedly improved capabilities. Similarly, by May 2023, Anthropic’s generative AI, Claude, could process 100,000 tokens of text, about 75,000 words (the length of an average novel), in a minute, compared with roughly 9,000 tokens when it was introduced in March 2023. And in June 2023, Microsoft presented Orca, described in the paper “Orca: Progressive Learning from Complex Explanation Traces of GPT-4”.
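The Claude figures above imply a rule of thumb of roughly 0.75 English words per token. A minimal sketch of that back-of-the-envelope conversion follows; the 0.75 ratio is an assumption derived from those numbers, and the real ratio varies by tokenizer, language and text.

```python
# Rough conversion between LLM tokens and English words.
# Assumption: ~0.75 words per token, the ratio implied by
# "100,000 tokens, about 75,000 words"; real tokenizers vary.
WORDS_PER_TOKEN = 0.75


def tokens_to_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many English words fit in a given number of tokens."""
    return round(tokens * words_per_token)


if __name__ == "__main__":
    for label, tokens in [("Claude, March 2023", 9_000), ("Claude, May 2023", 100_000)]:
        print(f"{label}: {tokens:,} tokens ~ {tokens_to_words(tokens):,} words")
```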
In TechBio, Google’s medical LLM Med-PaLM (built on technology similar to ChatGPT's), released last year, harnesses the power of Google’s LLMs aligned to the medical domain to analyse images and answer questions more accurately. Med-PaLM 2 was the first LLM to perform at an “expert” test-taker level on the MedQA dataset of US Medical Licensing Examination (USMLE)-style questions, reaching 85%+ accuracy, and the first AI system to reach a passing score on the MedMCQA dataset of Indian AIIMS and NEET medical examination questions, scoring 72.3%.
Another example comes from NVIDIA and its generative AI models for medical imaging: the MONAI Model Zoo (MONAI stands for Medical Open Network for AI) hosts a collection of medical imaging models in the MONAI Bundle format. MONAI is an open-source project built on top of PyTorch and released under the Apache 2.0 license. There are currently 21 models available.
In general, the generative AI models in medical imaging can be used for
Data Augmentation,
Image Enhancement,
Image Reconstruction and Segmentation, and
Anomaly Detection
in areas such as radiology 🩻, pathology 🔬🥼🩺, surgical planning 🪡🗓️ and disease progression modelling 📐📏.
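Data augmentation is easiest to picture with a classical, non-generative baseline: extra training images are derived from existing ones through geometric transforms such as flips and rotations. The sketch below treats an image as a plain list of pixel rows (an assumption chosen to avoid dependencies). Generative models go further by synthesising entirely new images. Note also that in medical imaging a transform is only safe when it preserves the label; a left-right flip, for instance, can break laterality-dependent findings.

```python
from typing import List

Image = List[List[int]]  # a 2D grayscale image as rows of pixel intensities


def hflip(img: Image) -> Image:
    """Mirror the image left-to-right."""
    return [row[::-1] for row in img]


def vflip(img: Image) -> Image:
    """Mirror the image top-to-bottom."""
    return img[::-1]


def rot90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def augment(img: Image) -> List[Image]:
    """Return the original image plus three geometric variants."""
    return [img, hflip(img), vflip(img), rot90(img)]


if __name__ == "__main__":
    scan = [[0, 1, 2],
            [3, 4, 5]]
    for variant in augment(scan):
        print(variant)
```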
Meanwhile, according to Cem Dilmegani, these are 10 use cases of generative AI in life sciences ➡️:
Novel molecule generation (VAEs and GANs)
Protein sequence design
Synthetic gene design
Data augmentation for model training (create synthetic data)
Imputation of missing data (fill in missing medical data in life science datasets)
Virtual patient generation (create synthetic patient and healthcare data)
Single-cell RNA sequencing (scRNA-seq) data denoising
Image-to-image translation
Text-to-image generation and
Simulating biological processes
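To make “create synthetic data” concrete, here is about the simplest generative model one could use: fit a normal distribution to each numeric column of a small, made-up patient table and sample new rows from it. This is an illustrative assumption, far simpler than the VAEs and GANs named above; it ignores correlations between columns and offers no privacy guarantee on its own.

```python
import random
import statistics
from typing import Dict, List, Tuple

Record = Dict[str, float]


def fit_gaussians(records: List[Record]) -> Dict[str, Tuple[float, float]]:
    """Estimate a mean and standard deviation for every numeric column."""
    params = {}
    for col in records[0]:
        values = [r[col] for r in records]
        params[col] = (statistics.mean(values), statistics.stdev(values))
    return params


def sample_synthetic(params: Dict[str, Tuple[float, float]], n: int,
                     seed: int = 0) -> List[Record]:
    """Draw n synthetic records, sampling each column independently (correlations are lost)."""
    rng = random.Random(seed)
    return [{col: rng.gauss(mu, sigma) for col, (mu, sigma) in params.items()}
            for _ in range(n)]


if __name__ == "__main__":
    # Hypothetical toy data: age (years) and systolic blood pressure (mmHg).
    real = [{"age": 54, "sbp": 132}, {"age": 61, "sbp": 141},
            {"age": 47, "sbp": 125}, {"age": 70, "sbp": 150}]
    params = fit_gaussians(real)
    for row in sample_synthetic(params, n=3):
        print({k: round(v, 1) for k, v in row.items()})
```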
Going back to the McKinsey report, the authors recommend that before integrating generative AI into operations, pharma executives should be aware of some factors that could limit their ability to capture its benefits:
The need for a human in the loop. Companies may need to implement new quality checks on processes that shift from humans to generative AI, such as representative-generated emails, or more detailed quality checks on AI-assisted processes, such as drug discovery. The growing need to verify whether generated content is based on fact or inference calls for a new level of quality control 🚦.
Explainability. A lack of transparency into the origins of generated content and traceability of root data could make it difficult to update models and scan them for potential risks; for instance, a generative AI solution for synthesising scientific literature may not be able to point to the specific articles or quotes that led it to infer that a new treatment is very popular among physicians. The technology can also “hallucinate,” or generate responses that are obviously incorrect or inappropriate for the context. Systems need to be designed to point to specific articles or data sources 📚, with a human in the loop checking the results.
Privacy considerations. Generative AI’s use of clinical images and medical records could increase the risk that protected health information will leak, potentially violating regulations that require pharma companies to protect patient privacy 🔏🙅‍♂️.
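The recommendation that systems point to specific articles or data sources before a human checks them can be sketched as a toy grounding check: for each generated claim, look for supporting passages in a trusted corpus and flag unsupported claims for priority human review. The word-overlap score, the 0.5 threshold and the article IDs below are all hypothetical; production systems typically rely on proper retrieval (for example retrieval-augmented generation), not this naive matching.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical mini-corpus: article ID -> text.
CORPUS: Dict[str, str] = {
    "article-001": "Drug X reduced tumour size in a phase 2 trial.",
    "article-002": "Physicians reported frequent use of drug Y for migraine.",
}


def words(text: str) -> set:
    """Lower-case word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def find_sources(claim: str, corpus: Dict[str, str], threshold: float = 0.5) -> List[str]:
    """Return IDs of articles containing at least `threshold` of the claim's words."""
    claim_words = words(claim)
    hits = []
    for doc_id, text in corpus.items():
        overlap = len(claim_words & words(text)) / max(len(claim_words), 1)
        if overlap >= threshold:
            hits.append(doc_id)
    return hits


def review_queue(claims: List[str], corpus: Dict[str, str]) -> List[Tuple[str, List[str], bool]]:
    """Attach candidate sources to each claim; True in the last slot = no source, review first."""
    results = []
    for claim in claims:
        sources = find_sources(claim, corpus)
        results.append((claim, sources, not sources))
    return results


if __name__ == "__main__":
    generated = ["Drug X reduced tumour size.",
                 "Drug Z is very popular among physicians."]
    for claim, sources, unsupported in review_queue(generated, CORPUS):
        print(claim, sources, "-> priority human review" if unsupported else "-> cited")
```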
LLMs and antibodies
BioStrand (a subsidiary of ImmunoPrecise Antibodies Ltd, or IPA), a company at the intersection of biotech discovery, biotherapeutics and AI, has a patented LENSai™ Integrated Intelligence Technology, powered by HYFTs®, that integrates large volumes of data from diverse sources to enhance LLMs and accelerate antibody discovery 🔎🔬.
In particular, by analysing diverse data sources, LENSai can identify new antibody targets and predict binding affinities, as well as design and optimise antibody sequences and predict potential side effects, immunogenicity and therapeutic applications. The HYFT technology, powered by ML and NLP, then uncovers meaningful patterns, sequences, connections and insights in the data, helping scientists make faster and better-informed decisions. Moreover, HYFTs, BioStrand’s universal language for biological data, can unify, organise and standardise all omics data, which enables scientists to collaborate and perform more efficient and accurate analyses.
On March 16, 2023, IPA announced its financial results for the third quarter of fiscal year 2023 (which ended January 31, 2023) and highlighted the following about BioStrand:
BriaCell’s AI-based drug discovery program is now using BioStrand’s LENSai to analyse and prioritise potential drug targets for cancer treatment. BriaCell is a clinical-stage immunotherapy company developing treatments that destroy ☠️ cancerous tumours by boosting the body's own cancer-fighting cells.
BioStrand has achieved the first monetisation 💰 of its NLP and immunogenicity modules, which allow pharmaceutical companies to extract and analyse data from scientific publications and identify potential immunogenicity issues with drug candidates. And
BioStrand's SaaS-based data management platform (which lets researchers store, manage and analyse large amounts of genomic, transcriptomic and proteomic data using AI and HYFTs technology) has been selected in the first round of an RFP (Request for Proposal) from an undisclosed pharmaceutical company. If awarded 🥇, the program is expected to help the pharmaceutical company accelerate its drug discovery process and improve its R&D efficiency.
BioStrand (Belgium 🍫), which raised a total of €2.7M in funding over 3 rounds, was acquired last year by IPA (ImmunoPrecise Antibodies, Canada 🍁) for €20M.
Quantum AI for the exploration of the chemical space
Entos.AI’s platform combines innovations in AI-driven molecular design and high-throughput experimentation. According to the company, Entos OrbNet quantum AI offers 100x faster chemical exploration, high-throughput chemistry with 1000x faster synthesis, and high-throughput biology with 1000x faster assays.
On April 18, 2023, Entos presented preclinical data for its first clinical candidate, ENT-H1, a highly differentiated irreversible HER2 inhibitor that exhibits over 1000-fold selectivity against EGFR and was generated using the company’s proprietary AI-driven drug discovery platform.
Entos (a generative AI startup to watch 👀) was selected by Forbes for its 2022 list of groundbreaking AI companies, has a Caltech-NVIDIA collaboration (using NVIDIA GPUs and the NVIDIA TensorRT and RAPIDS SDKs), and has raised $53M so far.
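Fold selectivity, as in the “over 1000-fold selectivity against EGFR” quoted above, is commonly expressed as the ratio of a compound's potency on the off-target to its potency on the intended target, for example IC50(EGFR) / IC50(HER2). The sketch below computes that ratio; the IC50 values are invented for illustration and are not Entos's data. For irreversible (covalent) inhibitors such as ENT-H1, researchers often compare kinact/KI instead, since IC50 depends on incubation time.

```python
def fold_selectivity(ic50_off_target_nm: float, ic50_target_nm: float) -> float:
    """Ratio of off-target to on-target IC50 (both in nM); higher means more selective."""
    if ic50_target_nm <= 0 or ic50_off_target_nm <= 0:
        raise ValueError("IC50 values must be positive")
    return ic50_off_target_nm / ic50_target_nm


if __name__ == "__main__":
    # Hypothetical values: 2 nM on HER2 (target), 5,000 nM on EGFR (off-target).
    ratio = fold_selectivity(ic50_off_target_nm=5_000, ic50_target_nm=2)
    print(f"{ratio:.0f}-fold selective for HER2 over EGFR")  # 2500-fold
```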
Other Generative AI Drug Discovery Startups to Watch 👀
Alchemab uses ML and LLMs to read and process antibodies' amino acid sequences, focusing on the discovery and development of naturally occurring protective antibodies for hard-to-treat diseases.
On June 22, 2023, Alchemab announced data relating to ATLX-1088, a first-in-class human antibody targeting CD33, a cell surface protein understood to play a significant role in Alzheimer’s disease. Alchemab Therapeutics has raised a total of $85M in funding over 4 rounds.
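Protein language models typically read an antibody much as an LLM reads text, except that the tokens are usually single amino acids. The sketch below maps a sequence to integer token IDs using the 20 standard one-letter residue codes plus a few special tokens. The vocabulary layout and special tokens are assumptions for illustration, not Alchemab's (or any specific model's) tokenizer, and the sequence is just a short example fragment.

```python
from typing import Dict, List

STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids (one-letter codes)
SPECIAL = ["<pad>", "<cls>", "<eos>", "<unk>"]  # hypothetical special tokens

VOCAB: Dict[str, int] = {tok: i for i, tok in enumerate(SPECIAL + list(STANDARD_AA))}


def tokenize(sequence: str) -> List[int]:
    """Convert an amino acid sequence into token IDs, one per residue."""
    ids = [VOCAB["<cls>"]]  # start-of-sequence marker
    for residue in sequence.upper():
        ids.append(VOCAB.get(residue, VOCAB["<unk>"]))  # non-standard residues -> <unk>
    ids.append(VOCAB["<eos>"])  # end-of-sequence marker
    return ids


if __name__ == "__main__":
    fragment = "EVQLVESGG"  # short example heavy-chain-like fragment
    print(tokenize(fragment))
```

A model would then embed these IDs and learn patterns across many sequences, in the same way text LLMs learn from word pieces.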
Etcembly’s T-cell receptor design platform is powered by NVIDIA GPUs and uses generative AI approaches to produce candidates for T-cell receptor (TCR) immunotherapies.
On December 14, 2022, Etcembly announced the prediction and optimisation of multiple TCR assets targeting leukaemias and solid tumours.
Evozyne combines engineering and deep learning (DL) technology to design highly functional synthetic proteins. It uses NVIDIA BioNeMo, NVIDIA’s framework for large language models in biology, to generate high-quality proteins that can speed up drug design.
On January 12, 2023, it was announced that Evozyne, using a pre-trained AI model from NVIDIA BioNeMo, had created two proteins with significant potential in healthcare and clean energy. Evozyne has raised a total of $63.4M in funding over 2 rounds.
Innophore, a deep-tech drug and enzyme discovery company, is using its Catalophores™ platform, powered by NVIDIA’s BioNeMo service, in its product Cavitomix, a tool that lets users analyse protein cavities from any input structure.
BioNeMo for drug discovery is part of NVIDIA Clara, a suite of software and services that powers AI healthcare solutions and also includes Holoscan for medical devices, Parabricks for genomics and MONAI for medical imaging.
New Equilibrium Biosciences has built a hybrid AI-experimental platform to catalyse drug discovery for intrinsically disordered proteins, a so far undrugged class. It uses generative AI to seed simulations of protein dynamics, AI to model protein physics with quantum chemical accuracy, and ML to identify transient, druggable conformations of these experimentally invisible proteins, all powered by NVIDIA GPUs. New Equilibrium Biosciences has raised a total of $10.5M in funding over 4 rounds.
Peptilogics’ Nautilus™ is a generative AI platform that enables peptide drug design and lead optimisation, running on Peptilogics’ in-house N4 supercomputer featuring NVIDIA GPUs. Nautilus integrates Peptilogics’ proprietary peptide representation and generative algorithms with computational chemistry and biophysics to reduce the cost, time and risk of drug design.
On August 15, 2022, EpiAxis Therapeutics, a leading epigenetics company for the treatment, diagnosis and monitoring of cancer and the prevention of its recurrence, and Peptilogics announced a collaboration to leverage AI for drug discovery to inhibit epigenetic oncology targets. Peptilogics has raised a total of $44.1M in funding over 7 rounds.
Peptone, a computational biology company, uses an NVIDIA-powered supercomputer to apply physical modelling methods and ML algorithms to find the most probable conformations of specific disordered proteins.
On June 9, 2022, Peptone closed a $40M Series A funding round led by F-Prime and Bessemer Venture Partners, bringing its total funding to $42.4M.
Relation Therapeutics uses GPU-accelerated generative ML techniques to combine human genetics, single-cell omics, functional genomics and ML in a single engineered design (ActiveGraph ML) to transform drug discovery. Relation is also pioneering a “Lab-in-the-Loop” that can integrate active learning at every step of drug discovery, from predicting cell states to validating new targets.
On February 22, 2023, Relation announced the opening of its flagship 5,500 sq ft integrated wet–dry laboratory and offices, located in the heart of London’s Knowledge Quarter. Relation Therapeutics has raised a total of $25M in funding over 2 rounds.
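The “Lab-in-the-Loop” idea rests on active learning: a model proposes which experiments to run next, the lab runs them, and the results retrain the model. A common selection rule is uncertainty sampling, where the candidates on which an ensemble of models disagrees most are tested first. The sketch below shows one selection round under that assumption; the candidate names, scores and the ensemble itself are hypothetical and do not describe Relation's ActiveGraph ML.

```python
import statistics
from typing import Dict, List


def select_next_batch(predictions: Dict[str, List[float]], batch_size: int) -> List[str]:
    """Pick the candidates whose ensemble predictions disagree most (highest variance)."""
    uncertainty = {cand: statistics.pvariance(preds) for cand, preds in predictions.items()}
    ranked = sorted(uncertainty, key=uncertainty.get, reverse=True)
    return ranked[:batch_size]


if __name__ == "__main__":
    # Hypothetical scores from a 3-model ensemble for four candidate targets.
    ensemble_predictions = {
        "target_A": [0.90, 0.88, 0.91],  # models agree: little to learn
        "target_B": [0.20, 0.80, 0.50],  # strong disagreement
        "target_C": [0.40, 0.45, 0.42],
        "target_D": [0.10, 0.60, 0.90],  # strongest disagreement
    }
    print(select_next_batch(ensemble_predictions, batch_size=2))
```

In a full loop, the lab's measurements for the selected candidates would be added to the training data, the ensemble retrained, and the selection repeated.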
Vyasa provides a suite of DL-based applications that help customers accelerate analytics across the drug discovery pipeline, from small compound analysis to clinical trial design and from research to regulatory preparation.
Certara (a global leader in biosimulation and software solutions supporting the entire drug development lifecycle) acquired Vyasa.
Variational AI uses state-of-the-art, data-efficient ML to rapidly generate novel and diverse compounds that are optimised for multiple properties, to avoid the most common causes of drug attrition. Variational AI works with leading biopharmaceutical partners, is developing its own internal pipeline, and uses NVIDIA GPUs to train its generative AI platform, Enki, a variational autoencoder that generates de novo small-molecule therapeutics. Variational AI has raised a total of $3.7M in funding over 2 rounds.
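Since Enki is described as a variational autoencoder, it may help to see the two ingredients that make a VAE generative: the reparameterisation trick, which samples a latent vector z = μ + σ·ε with ε ~ N(0, 1), and the KL-divergence term that pulls each encoded distribution towards the standard normal prior, KL = -½ Σ(1 + log σ² - μ² - σ²). The sketch below computes both for a single latent vector in plain Python. It is a generic textbook VAE fragment, not Variational AI's model, and it omits the encoder/decoder networks and the molecule representation entirely; the μ and log σ² values are hypothetical.

```python
import math
import random
from typing import List


def reparameterize(mu: List[float], log_var: List[float], rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps, eps ~ N(0, 1).

    In an autodiff framework this form lets gradients flow through mu and log_var.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: List[float], log_var: List[float]) -> float:
    """KL(N(mu, sigma^2) || N(0, I)) for a diagonal Gaussian, summed over dimensions."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical encoder outputs for one molecule in a 3-D latent space.
    mu, log_var = [0.5, -1.0, 0.0], [0.0, -0.5, 0.2]
    z = reparameterize(mu, log_var, rng)
    print("latent sample:", [round(v, 3) for v in z])
    print("KL term:", round(kl_to_standard_normal(mu, log_var), 4))
```

During training, the KL term is added to a reconstruction loss; at generation time, new molecules are decoded from z vectors drawn from the prior.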
📌 The best 🥇🎖️ generative AI drug discovery company so far is Insilico Medicine, which offers an AI-driven, NVIDIA GPU-accelerated drug discovery platform. One of its offerings, Chemistry42, uses generative AI to identify drug-like molecular structures with suitable physicochemical properties. For more:
Insilico Medicine's A.I. Drug Discovery Progress
By AI Supremacy
By Forbes
NVIDIA Webinar 📺
Accelerating Gene Variant Detection With Deep Learning Webinar by NVIDIA.
Join Dr. Tychele Turner, Principal Investigator and Assistant Professor at the Washington University School of Medicine in St. Louis, on July 17, 2023, as she shares her lab's development of HAT, a computational hybrid CPU/GPU workflow for detecting de novo variants in whole-exome and whole-genome sequencing datasets. She’ll focus on the areas where genomics tools benefit most from GPU-based acceleration, including her lab’s use of NVIDIA® Parabricks®, a suite of GPU-accelerated and deep learning-based industry-standard genomic analysis tools for next-generation sequencing data.
By attending this webinar, you’ll learn:
What a de novo variant is and how it’s detected from sequencing data
Key uses of deep learning-based variant calling to detect de novo variants
Key differences between short-read and long-read sequencing
Key uses of GPU-based acceleration to assess long-read sequencing data
The webinar is presented by Dr. Tychele Turner and Harry Clifford, Head of Genomics Products, NVIDIA. 👉 Register.
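A de novo variant is one that appears in a child's genome but in neither parent's. Conceptually, detection compares the genotypes called for a parent-child trio at each site; the sketch below applies that rule to VCF-style genotype strings (“0/0” = homozygous reference, “0/1” = heterozygous). It is a deliberately naive filter under assumed inputs: real workflows, including deep learning-based callers, also weigh read depth, genotype quality and error models, and the positions below are hypothetical.

```python
from typing import Dict, List, Tuple

Genotype = str  # VCF-style, e.g. "0/0", "0/1", "1/1", "0|1", "./."


def has_alt(gt: Genotype) -> bool:
    """True if the genotype carries at least one non-reference allele."""
    alleles = gt.replace("|", "/").split("/")
    return any(a not in ("0", ".") for a in alleles)


def is_homref(gt: Genotype) -> bool:
    """True only for a fully called homozygous-reference genotype."""
    return gt.replace("|", "/") == "0/0"


def de_novo_candidates(sites: Dict[str, Tuple[Genotype, Genotype, Genotype]]) -> List[str]:
    """Return sites where the child carries an alt allele and both parents are 0/0."""
    return [site for site, (child, mother, father) in sites.items()
            if has_alt(child) and is_homref(mother) and is_homref(father)]


if __name__ == "__main__":
    # Hypothetical trio genotypes: site -> (child, mother, father).
    trio = {
        "chr1:1000": ("0/1", "0/0", "0/0"),  # candidate de novo
        "chr2:2000": ("0/1", "0/1", "0/0"),  # inherited from mother
        "chr3:3000": ("0/0", "0/0", "0/0"),  # no variant
        "chr4:4000": ("0/1", "./.", "0/0"),  # mother uncalled: not flagged
    }
    print(de_novo_candidates(trio))
```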
📌 The Decoding Bio Snapshot 2023 (by …) is a multi-contributor effort to contextualise the underlying shifts in culture, scale + automation, AI/ML and data accessibility, and what they mean for the broader therapeutic and synthetic biology verticals (Link 🔗 🖇️ to the full report here.) The snapshot aggregates learnings in areas such as AI-driven design, Infra for Bio, Extreme Biology, Target Discovery, Screening and Biomanufacturing, and provides contextual examples of companies building in those verticals. Over forty companies generously contributed their own company overviews, pipeline information and commercial traction.
…, Andrew Pannu, Patrick Malone, Shelby Newsad, Jesse Johnson, David Li and Luis Voloch.
Molecular Modelling
The …, a community of like-minded scientists sharing knowledge to accelerate the pace of innovation in AI-enabled drug discovery, has just released Graphium, an open-source library for training molecular GNNs at scale, into the datamol.io ecosystem, an open-source toolkit that simplifies molecular processing and featurization workflows for ML scientists in drug discovery (featurization is the set of techniques used to turn raw data, such as molecular structures, into numerical features that a model can learn from).
According to m2d2, Graphium (“Graphium: A Powerful and Flexible Python Library for Training Molecular GNNs at Scale”) stands out because it is designed for graph representation learning on real-world chemistry tasks. Graphium has rich and expressive built-in molecular featurizers and provides access to SOTA GNN architectures via an extensible API. With Graphium you can implement recent state-of-the-art GNN models through a configuration file, with the degree of flexibility that research requires.
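Rather than reproduce Graphium's API, here is a library-free sketch of what a molecular GNN does underneath: atoms become nodes with feature vectors, bonds become edges, and each message-passing layer updates every atom by combining its own features with those of its neighbours. The one-hot atom features, the hand-written ethanol graph (heavy atoms only) and the unweighted mean aggregation are illustrative assumptions; real featurizers and GNN layers are far richer and use learned weights.

```python
from typing import Dict, List, Tuple

# Hypothetical featurization: one-hot element type [is_carbon, is_oxygen].
ELEMENTS = ["C", "O"]


def featurize_atoms(atoms: List[str]) -> List[List[float]]:
    """Turn element symbols into one-hot node feature vectors."""
    return [[1.0 if atom == el else 0.0 for el in ELEMENTS] for atom in atoms]


def adjacency(n_atoms: int, bonds: List[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Build an undirected neighbour list from bond pairs."""
    neighbours: Dict[int, List[int]] = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    return neighbours


def message_passing_step(features: List[List[float]],
                         neighbours: Dict[int, List[int]]) -> List[List[float]]:
    """Update each atom as the mean of its own and its neighbours' features (no learned weights)."""
    updated = []
    for i, own in enumerate(features):
        group = [own] + [features[j] for j in neighbours[i]]
        updated.append([sum(vals) / len(group) for vals in zip(*group)])
    return updated


if __name__ == "__main__":
    # Ethanol (CH3-CH2-OH), heavy atoms only: bonds C0-C1 and C1-O2.
    atoms = ["C", "C", "O"]
    bonds = [(0, 1), (1, 2)]
    h = featurize_atoms(atoms)
    h = message_passing_step(h, adjacency(len(atoms), bonds))
    for atom, vec in zip(atoms, h):
        print(atom, [round(v, 3) for v in vec])
```

After one step, the central carbon already “knows” it is bonded to an oxygen; stacking layers spreads information further across the molecule, which is what lets a GNN learn molecular properties.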
Until next time ☀️🌞😎,