Artificial intelligence in drug discovery (part 1)
AI tools and startups for primary and secondary screening
Drug discovery is the process by which new medications are identified. Historically, drugs were mostly found by identifying the active ingredients of traditional medicines, often by chance. Subsequently, classical pharmacology was used to investigate chemical libraries (small molecules, natural products or plant extracts) and find those with therapeutic effects.
Today drug discovery involves:
screening of hits (high-throughput screening in vitro and secondary assays),
medicinal chemistry (design and synthesis),
optimisation of hits to reduce potential drug side effects (increasing affinity and selectivity), and
in silico studies that in combination with cellular functional tests are used to improve the functional properties of the drug candidates.
Moreover, drug discovery is broadly divided into four major stages: target selection, target validation, compound screening and lead optimisation, and AI-based methods are now increasingly used at each of them.
For example, AI is used for:
predicting the 3D structure of a target protein,
designing new molecules,
quantum mechanics calculation of compound properties,
computer-aided organic synthesis,
real-time image-based cell sorting,
developing assays and many others.
In this article I review some of the AI tools and startups used for primary and secondary screening during drug discovery.
🔳 AI in primary drug screening
The process of finding a new drug against a chosen target for a particular disease involves screening (high-throughput screening), wherein large libraries of chemicals are tested for their ability to modify the target. Drug screening itself is divided into primary screening, which allows direct high-throughput measurements of small compounds (chemical libraries) on cells, and secondary screening, which is designed to confirm hit efficacy through a series of functional cellular assays.
During screening, AI tools are mostly used for sorting and classification of cells by image analysis (image-based profiling). For image-based phenotyping after treatment, one can distinguish two approaches:
The first approach includes screening applications (often also called high-content screening) that focus on a predefined, specific phenotype, with the aim of identifying drugs (or drug targets) that can modulate it (e.g. the subcellular localisation of a specific protein).
The second application of high-throughput imaging is the more global profiling of perturbations after cell treatment, which is complementary to techniques like transcriptional profiling. To this end, subcellular structures are stained with fluorescent dyes, or fluorescently labelled antibodies are used to ‘paint’ and visualise cells and subcellular structures, and automated image analysis is then used to profile the phenotype of these cells.
In general, computer vision can extract multivariate feature vectors of cell morphology, such as cell size, shape, texture and staining intensity, without further human intervention. All large-scale studies employ segmentation approaches (cell images are segmented from the background by varying the image contrast) to accurately define cellular outlines prior to feature extraction. Tamura texture features (coarseness, contrast, directionality, line-likeness, regularity and roughness) and wavelet-based texture features are then extracted, and principal component analysis (PCA) is used to reduce the dimensionality of the extracted features. AI-based methods are then trained to classify different cell types. Among the tested methods, the least-squares support vector machine (LS-SVM), part of a family of supervised learning methods that analyse data and recognise patterns, shows the highest classification accuracy.
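As a rough illustration of that pipeline, the sketch below (pure Python, with made-up morphology feature values) projects per-cell feature vectors onto their first principal component via power iteration and then assigns cells with a simple nearest-centroid rule standing in for the LS-SVM:

```python
def pca_first_component(X, iters=200):
    """Top principal component of the rows of X, via power iteration."""
    n, d = len(X), len(X[0])
    mean = [sum(col) / n for col in zip(*X)]
    Xc = [[x - m for x, m in zip(row, mean)] for row in X]
    # sample covariance matrix (d x d)
    C = [[sum(Xc[k][i] * Xc[k][j] for k in range(n)) / (n - 1)
          for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return mean, v

def project(row, mean, v):
    return sum((x - m) * vi for x, m, vi in zip(row, mean, v))

# hypothetical per-cell morphology features: [area, perimeter, coarseness]
class_a = [[10.0, 12.0, 0.20], [11.0, 13.0, 0.25], [9.5, 11.5, 0.22]]
class_b = [[20.0, 25.0, 0.80], [21.0, 26.0, 0.85], [19.5, 24.0, 0.78]]
mean, v = pca_first_component(class_a + class_b)

# class centroids in the reduced (1-D) feature space
ca = sum(project(r, mean, v) for r in class_a) / len(class_a)
cb = sum(project(r, mean, v) for r in class_b) / len(class_b)

def classify(row):
    """Nearest-centroid stand-in for the trained LS-SVM classifier."""
    p = project(row, mean, v)
    return "A" if abs(p - ca) < abs(p - cb) else "B"

print(classify([10.2, 12.1, 0.21]))  # a class-A-like cell
```

The same structure carries over to real pipelines: features in, dimensionality reduction, then a trained classifier; only the feature extraction and model are far richer.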
For cell sorting, AI-based image analysis decision-making needs to be sufficiently rapid to accurately separate different cell types. For this reason, most modern image-activated cell sorting (IACS) devices measure optical, electrical and mechanical cell properties for highly flexible and scalable automation of cell sorting. These instruments allow high-speed digital image processing and decision-making within a few tens of milliseconds.
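A minimal sketch of that latency constraint, assuming a made-up scoring rule: each sorting decision is timed and checked against a budget of a few tens of milliseconds:

```python
import time

LATENCY_BUDGET_S = 0.03  # a few tens of milliseconds, as in IACS devices

def decide(cell_features, threshold=0.5):
    """Toy sort decision: keep the cell if a (hypothetical) score passes a threshold."""
    score = sum(cell_features) / len(cell_features)
    return score >= threshold

def sort_stream(stream):
    kept, worst_latency = [], 0.0
    for cell in stream:
        t0 = time.perf_counter()
        keep = decide(cell)
        worst_latency = max(worst_latency, time.perf_counter() - t0)
        if keep:
            kept.append(cell)
    # in hardware, a decision slower than the budget means the cell is gone
    assert worst_latency < LATENCY_BUDGET_S, "decision too slow for sorting hardware"
    return kept

cells = [[0.9, 0.8], [0.1, 0.2], [0.7, 0.6]]
print(sort_stream(cells))  # keeps the two high-scoring cells
```

Real IACS instruments enforce this budget in dedicated hardware pipelines rather than in a Python loop, but the design pressure is the same: the model must decide before the cell passes the sorting point.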
🔳 AI in secondary drug screening
The goal of secondary screening is to generate added information in order to transition from a relatively large number of hits generated during the primary screening to a more manageable number of higher quality compounds that can hopefully become lead candidates.
This secondary screening paradigm, also called focused or knowledge-based screening, involves selecting from the chemical library smaller subsets of molecules that are likely to have activity at the target protein, based on prior knowledge and the literature.
This type of knowledge-based screening, using pharmacophores (a pharmacophore is the 2D or 3D arrangement of chemical features essential for biological activity) and molecular modelling, has given rise to virtual screens of compound databases. Moreover, during secondary screening the small compound libraries are screened in specifically designed assays, such as:
biochemical — kinase/ATPase assays, protease assays, protein interaction assays — or
cell-based — reporter assays, viability assays, GPCR and ion channel assays, qPCR.
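At its crudest, a pharmacophore-based virtual screen is a set-containment filter over annotated compound features. The feature names and library below are hypothetical, and real screens match geometric 3D arrangements rather than flat feature sets:

```python
# hypothetical pharmacophore: the feature types a hit must present
PHARMACOPHORE = {"h_bond_donor", "aromatic_ring", "positive_charge"}

# toy library: compound name -> set of annotated chemical features
library = {
    "cmpd-1": {"h_bond_donor", "aromatic_ring", "positive_charge", "halogen"},
    "cmpd-2": {"aromatic_ring", "halogen"},
    "cmpd-3": {"h_bond_donor", "aromatic_ring", "positive_charge"},
}

def virtual_screen(library, pharmacophore):
    """Keep only compounds presenting every required pharmacophore feature."""
    return sorted(name for name, feats in library.items()
                  if pharmacophore <= feats)

print(virtual_screen(library, PHARMACOPHORE))
```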
Let’s see now how AI can help during secondary screening.
▪️For example, the toxicity profile of a compound is an important parameter during screening, so ensuring good absorption, distribution, metabolism, excretion and toxicity (ADMET) properties is crucial to a drug’s success.
An AI solution to this problem comes from the DeepTox algorithm, which gave outstanding results in the Tox21 Data Challenge, a contest in which participating groups attempted to computationally predict the toxicity of 12,000 environmental chemicals and drugs across 12 different toxic effects measured in specifically designed assays.
The DeepTox algorithm first normalises the chemical representations of the compounds, from which a large number of chemical descriptors are computed and used as the input to ML methods. A molecular descriptor is the final result of a logical and mathematical procedure that transforms chemical information encoded in a symbolic representation of a molecule into a useful number (or the result of some standardised experiment). The descriptors are categorised as:
Static descriptors, which include atom counts, surface areas and the presence or absence of predefined substructures in a compound. The presence or absence of 2,500 predefined toxicophore features (a toxicophore is a chemical structure, or portion of a structure, associated with the toxic properties of a chemical), together with other chemical features extracted from standard molecular fingerprint descriptors, is also calculated; and
Dynamic descriptors, which are computed in a prespecified way; although the number of possible dynamic features is potentially infinite, the algorithm keeps the dataset within manageable limits.
In typical test cases, the DeepTox algorithm shows good accuracy in predicting the toxicology of compounds.
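The static-descriptor idea can be sketched in a few lines. The "toxicophore" patterns below are invented, and real pipelines use proper substructure (SMARTS) matching and exact atom typing rather than this naive string search:

```python
# toy "toxicophore" patterns as SMILES substrings; real pipelines use
# proper substructure (SMARTS) matching, not string search
TOXICOPHORES = ["N(=O)=O", "C(=O)Cl", "[N+]"]

def static_descriptors(smiles):
    """Binary presence/absence vector for each pattern, plus a crude
    count of uppercase heavy-atom symbols as an extra static descriptor."""
    presence = [int(p in smiles) for p in TOXICOPHORES]
    heavy_atoms = sum(smiles.count(a) for a in "CNOS")
    return presence + [heavy_atoms]

print(static_descriptors("CC(=O)Cl"))         # acyl chloride-like string
print(static_descriptors("c1ccccc1N(=O)=O"))  # nitroaromatic-like string
```

Vectors like these, concatenated across thousands of patterns and fingerprints, form the input layer that DeepTox-style deep networks are trained on.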
▪️Another problem during secondary screening is assay interference caused by small molecules. A number of approaches have been developed to flag potentially “badly behaving compounds”, “bad actors” or “nuisance compounds”. These compounds are typically aggregators, reactive compounds and/or pan-assay interference compounds (PAINS), and many of them are frequent hitters.
A solution to this problem comes from Hit Dexter, a recently introduced ML approach that predicts how likely a small molecule is to trigger a positive response in biochemical assays (including the binding of compounds based on “privileged scaffolds” to multiple binding sites). The models used by Hit Dexter were derived from a dataset of 250,000 compounds with experimentally determined activity for at least 50 different protein groups.
The new Hit Dexter 2.0 web service is now available and covers both primary and secondary screening assays, providing user-friendly access to similarity-based methods for predicting aggregators and dark chemical matter, as well as a comprehensive collection of available rule sets for flagging frequent hitters and compounds containing undesired substructures.
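A toy version of such rule-based flagging is sketched below; the thresholds and field names are invented, whereas Hit Dexter itself relies on trained ML models and curated substructure rule sets:

```python
# toy nuisance-compound rules; real tools (Hit Dexter, PAINS filters) use
# curated substructure patterns and trained models, not these made-up cutoffs
def flag_compound(record):
    flags = []
    if record["hit_rate"] > 0.25:       # hits in >25% of assays screened
        flags.append("frequent hitter")
    if record["logp"] > 5.0:            # very lipophilic: aggregation-prone
        flags.append("possible aggregator")
    if record["reactive_group"]:
        flags.append("reactive compound")
    return flags

compounds = [
    {"name": "cmpd-1", "hit_rate": 0.40, "logp": 6.1, "reactive_group": False},
    {"name": "cmpd-2", "hit_rate": 0.02, "logp": 2.3, "reactive_group": False},
]
for c in compounds:
    print(c["name"], flag_compound(c))
```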
▪️Moreover, quantitative structure-activity relationship (QSAR) based approaches have proven to be very valuable in predicting physicochemical properties, biological activity, toxicity, chemical reactivity and metabolism of chemical compounds. QSAR studies attempt to build mathematical models relating the physical and chemical properties of compounds to their chemical structure. Such mathematical models can inform pharmacological studies by providing an in silico methodology to test or rank new compounds for desired properties without actual wet-lab experiments. In fact, QSAR studies are already used extensively to predict pharmacokinetic properties such as ADMET, and are now also increasingly accepted within the regulatory decision-making process as an alternative to animal testing for toxicity screening of chemicals.
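At its simplest, a QSAR model is just a fitted mapping from descriptors to activity. The sketch below fits a one-descriptor least-squares line on made-up data; real QSAR models use many descriptors and far richer regressors:

```python
# minimal QSAR sketch: fit activity = a * descriptor + b by least squares
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

logp      = [1.0, 2.0, 3.0, 4.0]   # hypothetical descriptor values
pActivity = [5.1, 6.0, 7.1, 7.9]   # hypothetical measured activities

a, b = fit_line(logp, pActivity)
predict = lambda x: a * x + b      # in silico estimate for a new compound
print(f"slope={a:.2f}, intercept={b:.2f}, predict(2.5)={predict(2.5):.2f}")
```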
In addition, deep neural networks (DNNs), complex computational models that power applications such as computer vision and natural language processing, have generated promising results for QSAR tasks (e.g. DeepNeuralNet-QSAR). Previous work showed that DNNs can routinely make better predictions than traditional methods, such as random forests, on a diverse collection of QSAR data sets. It was also found that multitask DNN models (those trained on and predicting multiple QSAR properties simultaneously) outperform DNNs trained separately on the individual data sets in many, but not all, tasks.
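The multitask architecture can be sketched as a shared hidden layer feeding one output head per property. The forward pass below uses random, untrained weights purely to show the structure, and the task names are hypothetical:

```python
import random

random.seed(0)  # deterministic toy weights

def relu(v):
    return [max(0.0, x) for x in v]

def linear(W, b, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def rand_layer(n_out, n_in):
    return ([[random.uniform(-0.5, 0.5) for _ in range(n_in)]
             for _ in range(n_out)],
            [0.0] * n_out)

n_desc, n_hidden = 4, 8
tasks = ["toxicity", "solubility", "clearance"]   # hypothetical QSAR tasks
W_shared, b_shared = rand_layer(n_hidden, n_desc)
heads = {t: rand_layer(1, n_hidden) for t in tasks}

def multitask_predict(descriptors):
    # one shared representation, then one output head per task: this shared
    # layer is what lets tasks borrow statistical strength from each other
    h = relu(linear(W_shared, b_shared, descriptors))
    return {t: linear(W, b, h)[0] for t, (W, b) in heads.items()}

preds = multitask_predict([0.1, 0.8, 0.3, 0.5])
print(sorted(preds))  # one prediction per QSAR task
```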
Additionally, in a typical QSAR study, matched molecular pairs (MMPs) are generated via retrosynthesis rules for de novo design tasks. (An MMP analysis investigates a single localised change to a drug candidate and its impact on the molecular properties and bioactivity of the molecule.) Then three ML methods,
namely random forest, RF (a classification algorithm consisting of many decision trees),
gradient boosting machines, GBMs (an ML technique for regression and classification problems), and
DNNs that were previously applied without MMP,
are used to extrapolate to new transformations, fragments and modifications.
Moreover, with the dramatic increase of public databases (such as ChEMBL — a manually curated database of bioactive molecules with drug-like properties and Pubchem — a database of chemical molecules and their activities against biological assays) that contain a large number of structure–activity relationship (SAR) analyses, MMP with ML has been used to predict many bioactivity properties such as oral exposure, distribution coefficient (logD), intrinsic clearance and ADMET.
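A toy MMP analysis reduces to grouping compounds that share a core and differ at a single site, then recording the activity change of each transformation. All structures and activity values below are invented:

```python
# toy MMP analysis: compounds encoded as (core, substituent, activity);
# pairs sharing a core yield a transformation and its activity change
compounds = [
    ("phenyl-core", "H",   5.0),
    ("phenyl-core", "Cl",  5.8),
    ("phenyl-core", "OMe", 6.4),
    ("other-core",  "H",   4.1),
]

def matched_pairs(compounds):
    pairs = []
    for i, (c1, s1, a1) in enumerate(compounds):
        for c2, s2, a2 in compounds[i + 1:]:
            if c1 == c2 and s1 != s2:
                # record the transformation and its activity delta
                pairs.append((f"{s1}>>{s2}", round(a2 - a1, 2)))
    return pairs

for transform, delta in matched_pairs(compounds):
    print(transform, delta)
```

Tables of such transformations, mined at scale from databases like ChEMBL and PubChem, are what the RF, GBM and DNN models above are trained on to extrapolate to new modifications.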
🔳 AI startups involved in screening
Delta 4 conducts in silico screening prior to experimental screening,
Micar Innovation utilises tools to shorten discovery and screening, lead optimisation and ADMET studies,
Synsight analyses data from molecular modelling and high-content screening,
Remedium uses AI to reduce dependence on mass screenings, both virtual and chemical, through rapid selection of small-molecule agonists, antagonists or functional mimics of protein drug candidates,
Quantitative Medicine analyses many drug discovery factors simultaneously, such as effects, side effects and toxicity,
Phenomic AI analyses cell and tissue phenotypes in microscopy data,
Aiforia uses images (tissues and cells) uploaded to their cloud,
CaroCure predicts toxicity to weed out toxic compounds and increase the efficacy of computational drug discovery,
Variational AI optimises pharmacological activity, synthesizability and ADMET, and
Peptic optimises development of peptide drugs, which have high selectivity and low toxicity.
H.C. Stephen Chan, Hanbin Shan, Thamani Dahoun, Horst Vogel, Shuguang Yuan, Advancing Drug Discovery via Artificial Intelligence (2019)
Amanda J. Minnich, Kevin McLoughlin, Margaret Tse, Jason Deng, Andrew Weber, Neha Murad, Benjamin D. Madej, Bharath Ramsundar, Tom Rush, Stacie Calad-Thomson, Jim Brase, and Jonathan E. Allen, AMPL: A Data-Driven Modeling Pipeline for Drug Discovery (2020)
Thank you for reading 💙
And if you liked this post why not share it?
#science #health #pharma #AI_drugdiscovery #drugdiscovery #AI #biotechAI #pharma_AI