Transforming drug discovery through the application of AI and ML to data
TechBio: Data Driven Drug Discovery
Top Stories
Nvidia dives deeper into AI drug development with Amgen, Recursion partnerships
NVIDIA Generative AI Is Opening the Next Era of Drug Discovery and Design
Isomorphic inks deals with Eli Lilly and Novartis for drug discovery
Why Is TikTok Parent ByteDance Moving Into Biology, Chemistry And Drug Discovery?
AI-driven drug discovery is poised to boom in 2024 | The AI Beat
Hi everyone, and welcome back to another edition of MetaphysicalCells, all about TechBio.
⚙️ New Atlantis
New Atlantis (US) is dedicated to marine metagenomics and to preserving marine biodiversity. It plans to quantify marine biodiversity 🐠 🪸 with open meta-omics in order to power sustainable and scalable ocean regeneration. The team believes that the phylogenetic diversity of the largely unsequenced marine microbes and planktonic communities could yield many novel secondary metabolites of potential commercial interest. Through a partnership with Bacalhau (a platform for faster, more cost-efficient and more secure distributed computation that lets users run arbitrary Docker containers and WebAssembly images as tasks), they want to study the genomes of marine microorganisms sampled directly from the ocean, using new high-throughput DNA sequencing technology and bioinformatics tools.
But there is more. Gordon Gould, the co-founder of New Atlantis, and his team don't just want to protect the Marine Protected Areas (MPAs), the national parks of the oceans around the globe. They also want to empower the communities around the MPAs, which most of the time lack the resources to adequately protect life in their waters because they rely on precarious government funding and traditional philanthropy. The plan is to give MPAs a viable business model that generates revenue based on conservation efforts and the outcomes achieved in their waters: an open marine metagenomics biodiversity analytics platform from which ecosystem health can be evaluated in order to issue blue carbon and biodiversity credits as a revenue source for MPAs.
But what do you get when you blend the blue economy with biology and data science? You get a trip back in time to our origins: the possibility of studying biological pathways, proteins and genes (of known and unknown metabolites) at different time points throughout evolution, which ultimately enables the pharma and tech sectors to build more efficient models for drug discovery.
⚙️ Atomic Tessellator
Atomic Tessellator (New Zealand, 2022) is a modern computational chemistry platform accelerated by AI. The team applies the latest in DL to Materials Science. They are an ab initio, in silico company focused on Thermodynamics, Molecular Mechanics, Kinematics, Catalysis, Surface Chemistry and Reaction Pathways. They offer:
AI Catalyst Discovery Lab (search, refine, verify): an AI-assisted similarity search for a desirable catalytic activity profile. You can upload your own slabs or query their database of tens of thousands, extend successful simulations to act as seeds for more refined searches, and automatically filter on reactivity, selectivity, surface stability and more.
GPU visualization tools: charge density grids scaling to millions of voxels, isosurfaces in real time, animated charge flow and support for uploading your own cube files.
Generative AI - Retrosynthetic Pathways
Fork simulations, and
Process Simulation (interoperable, refine): changeable compute backends for different reaction network discovery. View results on-platform or download them with the API. Extract candidate adsorbates for downstream analysis and for gas-phase, liquid-phase and surface-phase simulations, with support for multiple reaction vessels.
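To make the "search, refine, verify" idea in the first item more concrete, here is a minimal sketch of a similarity search with a stability filter and a seeded second pass. It is not Atomic Tessellator's method or API: the descriptor vectors, the `stability` scores, the slab names and the functions `search` and `refine` are all hypothetical, and plain cosine similarity stands in for whatever model the platform actually uses.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical database: each slab has a made-up activity descriptor
# vector and a made-up surface-stability score in [0, 1].
SLABS = {
    "slab_A": {"profile": [0.90, 0.10, 0.30], "stability": 0.8},
    "slab_B": {"profile": [0.20, 0.90, 0.40], "stability": 0.9},
    "slab_C": {"profile": [0.80, 0.20, 0.35], "stability": 0.4},
    "slab_D": {"profile": [0.85, 0.15, 0.25], "stability": 0.7},
}

def search(query, slabs, min_stability=0.0, top_k=3):
    """Rank slabs by similarity to the query, after filtering on stability."""
    hits = [
        (name, cosine_similarity(query, slab["profile"]))
        for name, slab in slabs.items()
        if slab["stability"] >= min_stability
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)[:top_k]

def refine(seed_name, slabs, **kwargs):
    """Use a successful hit as the seed for a new search, excluding the seed."""
    others = {name: slab for name, slab in slabs.items() if name != seed_name}
    return search(slabs[seed_name]["profile"], others, **kwargs)

target_profile = [0.9, 0.1, 0.3]  # hypothetical desired activity profile
first_pass = search(target_profile, SLABS, min_stability=0.6)
second_pass = refine(first_pass[0][0], SLABS, min_stability=0.6)
```

In this toy data, `slab_C` is close to the target but is dropped by the stability filter, which is the point of filtering before ranking.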
⚙️ Cradle Bio
Cradle Bio (Netherlands) exited stealth in 2022 and is using generative AI for protein engineering (enzymes, vaccines, peptides and antibodies) and cell factories 🏭. Cradle’s design platform makes it easy for everyone to start building products with biology instead of oil or animals, leveraging generative ML models to transform how biologists design and optimize proteins. In particular, synthetic biology can generate environmentally friendly alternatives to 60% of everything humans consume (including food, clothing, materials, medicines and chemicals), by adapting the genes of microorganisms such as bacteria and fungi to create ‘cell factories’ that use programmable proteins to produce a wide variety of everyday products: from milk and meat, grown without farming animals, to plastics created without petrochemicals, materials for clothing or electronic components, or even personalized medicines. The challenge of this process is that building these proteins is currently a costly and laborious iterative process based on trial and error.
Cradle solves this problem by giving scientists the ability to ‘reverse engineer’ proteins with the desired specific properties and has built a working platform that is already being used by a number of early stage design partners and so far nine leading industry partners, including Johnson & Johnson Innovation, Novozymes and Twist Bioscience. Cradle’s operations include a ‘wet lab’, which allows the company to generate data to train their ML models. It has offices in Delft, The Netherlands and Zurich, Switzerland and a team of ML and biotech research specialists with experience at many of the world’s leading technology and biotech companies, including Google, Google X, Zymergen, Uber and Perfect Day.
Co-founder and CEO Stef van Grieken and co-founder Elise de Reus presented their product last year at the Slush Product Launch. Cradle, a European biotech startup backed by Index Ventures and Kindred Capital, has raised a total of $33M so far.
⚙️ Peptone
Peptone—founded in 2018 and born out of 30 years of academic research at the universities of Cambridge, Oxford, ETH Zurich and Groningen—is a translational biophysics company focused on drugging Intrinsically Disordered Proteins. The focal point of their protein therapeutics platform (Oppenheimer) is an intersection of physics, molecular biology and next generation supercomputing.
Proprietary NMR (Nuclear Magnetic Resonance spectroscopy) experiments, coupled with HDX-MS (hydrogen-deuterium exchange mass spectrometry) and other experiments, are used to generate ambiguous restraints for modeling protein ensembles of Intrinsically Disordered Protein targets. Their proprietary Molecular Dynamics engine then uses the experimental restraints to simulate non-canonical behavior of the disordered targets and identify the most plausible drugging sites. Their ML methods generate compact but diverse libraries of high-quality protein binders against the best-ranking spatial disordered epitopes. Finally, their advanced biophysical laboratory performs lead selection and end-to-end testing. A data-driven decision is then made on whether the binder selection process requires more iterations.
Dr. Kamil Tamiola, Dr. Patrik Foerch and Dr. Ben Owens—who are key members of their executive team—will attend the annual JP Morgan Healthcare Investment Conference in San Francisco in 2024, to provide updates on the rollout of their innovative HDX-MS technology for both small and large-molecule drug discovery, which is specifically designed for intrinsically disordered proteins mostly known for their involvement in conditions such as Alzheimer’s and Parkinson’s.
Peptone—selected as one of Nature Biotechnology’s top 🎩 academic spinouts in 2023—has raised a total of $42.4M in funding over 2 rounds.
⚙️ Native Labs
Native Labs (UK), founded by Louis Brookstein, is on a mission to revolutionize bioprocessing. It uses the cutting edge in automation, data and AI to pioneer solutions that streamline and supercharge bioprocess development and manufacturing, bringing higher-quality products to market faster and more cheaply.
⚙️ CorrDyn
CorrDyn (US, 2015) is a data-driven technology consultancy 🏗️ that enables scalable and affordable growth. CorrDyn's team of experts in data, engineering and technology partners with clients to enable innovation, efficiency and scale. They provide strategic planning and implementation that allows for continual improvement on a sustainable budget, with a defined return on investment. Their capabilities span Data Engineering, Business Intelligence, Enterprise Software, Digital Transformation, ML, Systems Integration, Process Automation and Technology Strategy.
Their primary experience is in developing data infrastructure for biotech firms, especially high-throughput data pipelines, low-latency reporting, ML models and ML operations platforms. For example, for a healthcare e-commerce company the CorrDyn team:
Developed a data pipeline from each SaaS provider and proprietary system to a BigQuery data warehouse, utilizing serverless processing and storage methods to keep infrastructure costs manageable and forecastable.
Created complex metadata attribution rules to determine which product category, parent SKU, and child SKU should be associated with each revenue line item, cost line item, and return line item.
Normalized data into a single source of truth for sales, product expenses, advertising expenses, and returns at the product category, parent SKU, and child SKU levels, and
Built an automated business intelligence suite in Looker to provide visibility into sales, marketing, and operations to all decision-makers.
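The attribution and normalization steps above can be sketched in a few lines. This is an illustration only, not CorrDyn's implementation: the SKU catalog, the `LineItem` type and the functions `attribute`, `normalize` and `rollup` are hypothetical, and a real pipeline would run against the BigQuery warehouse rather than in-memory Python objects.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical lookup: child SKU -> (parent SKU, product category).
SKU_CATALOG = {
    "VIT-C-500-60": ("VIT-C-500", "Vitamins"),
    "VIT-C-500-120": ("VIT-C-500", "Vitamins"),
    "BP-CUFF-STD": ("BP-CUFF", "Devices"),
}

@dataclass(frozen=True)
class LineItem:
    kind: str        # "revenue", "cost", or "return"
    child_sku: str
    amount: float

def attribute(item):
    """Attribution rule: map a line item to (category, parent SKU, child SKU)."""
    parent, category = SKU_CATALOG.get(item.child_sku, ("UNKNOWN", "Unattributed"))
    return category, parent, item.child_sku

def normalize(items):
    """Build one table keyed by (category, parent, child), summed per kind."""
    totals = defaultdict(lambda: defaultdict(float))
    for item in items:
        totals[attribute(item)][item.kind] += item.amount
    return {key: dict(kinds) for key, kinds in totals.items()}

def rollup(normalized, level):
    """Aggregate to level 0 (category), 1 (parent SKU) or 2 (child SKU)."""
    out = defaultdict(lambda: defaultdict(float))
    for key, kinds in normalized.items():
        for kind, amount in kinds.items():
            out[key[: level + 1]][kind] += amount
    return {key: dict(kinds) for key, kinds in out.items()}

items = [
    LineItem("revenue", "VIT-C-500-60", 20.0),
    LineItem("revenue", "VIT-C-500-120", 35.0),
    LineItem("return", "VIT-C-500-60", 20.0),
    LineItem("cost", "BP-CUFF-STD", 12.5),
    LineItem("revenue", "MYSTERY-SKU", 5.0),  # not in the catalog
]
table = normalize(items)
by_category = rollup(table, 0)
by_parent = rollup(table, 1)
```

Routing unknown SKUs to an explicit "Unattributed" bucket, rather than dropping them, keeps the totals reconcilable with the source systems.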
Data in Biotech by CorrDyn is a fortnightly podcast exploring how companies leverage data to drive innovation in life sciences (guests: Cody Schiffer, Vera Mucaj, Harry Rickerby, Pradeep Ravindra and many more).
⚙️ Biolexis Therapeutics
Biolexis Therapeutics (US, 2021) is at the forefront of drug discovery, thanks to a groundbreaking AI-Enabled drug discovery process, the MolecuLern. By using their proprietary set of over half a million real, wet lab/empirical data points, they analyze a spectrum of protein/small molecule interactions to determine high-quality hits against the protein targets of interest, all with unprecedented speed, exceptional accuracy and IP-rich qualities.
The hit validation and selection process uses their set of target product profiles and ADME-Tox parameters, incorporating their MolecuLern™ Hot Spot residue validation, binding mode, binding energetics and human feedback loops. The best hit compounds are further optimized using ML and human-in-the-loop structure-based drug discovery methods, ADME-Tox property predictions and virtual/experimental screening, yielding better drugs with fewer iterations. Their lead optimization process relies heavily on real data, human feedback loops, structure-based activities and structure-activity relationships (SAR) for lead selection and new chemical entities.
They have a broad pipeline 🧳 of promising drug candidates: oncology (through their partner Signalexis), metabolic (through Metabolexis, a subsidiary of Biolexis), neurologic, and anti-inflammatory candidates for autoimmune diseases.
Biolexis Therapeutics raised a total of $14.73M.
⚙️ Digitalis Commons
Digitalis Commons (New York, San Francisco, Boston) is a non-profit organization that builds frontier-advancing, scalable solutions that have an outsized impact on important problems in health and health care. They partner with technical innovators, entrepreneurs, investors, philanthropic groups and funding agencies to tackle technical and commercial barriers to creating and implementing these solutions. Their foundational partnership is with the US Advanced Research Projects Agency for Health (ARPA-H). Equipped with $2.5 billion in federal dollars, ARPA-H is funding the research and development of new ways to tackle the hardest problems in health, including cancer, diabetes and Alzheimer’s disease.
On July 25, 2023, Digitalis Commons, the non-profit affiliate of Digitalis Ventures, entered into a Partnership Intermediary Agreement to provide commercialization services to the ARPA-H Project Accelerator Transition Innovation Office (PATIO). This first-of-its-kind agreement aims to speed up and smooth the process of getting breakthrough innovations in health to the American public. Digitalis Ventures invests in 👛:
Life Sciences:
Ascend (UK) was founded by Monograph Capital to address the need for quality manufacturing capacity for any gene and cell therapy developer. Their core expertise is in adeno-associated virus vectors and they are building capabilities by acquiring and merging with experienced teams with validated technology.
Base5 Genomics (US, 2018) is offering an immunogenomic platform for precision medicine: Data Generator™ is a sample-to-solution service that generates ground truth resolution reference sequences for immunogenomic regions and other polymorphic loci. Insight Lens™ is powered by the most complete immune reference graph available. It provides context to Data Generator output, is a force multiplier to existing short read sequencing data, and supports queries to unveil the ground truth characteristics for any areas of interest.
Bonum Therapeutics (US, 2021) is focused on a proven technology that utilizes allosteric regulation to create targeted, highly active and less toxic medicines. The therapeutic component consists of a protein-based drug, such as a cytokine, antibody binding domain, receptor, or enzyme. This therapeutic component is only active when the sensor component is bound to its target. The sensor component consists of an antibody binding domain directed against a tissue- or cell type-specific protein or metabolic marker. Sensor binding induces the therapeutic to undergo a conformational change into its active form. Sensors can be designed to target any entity that can be bound by an antibody, including peptides, proteins, and metabolites.
Code Ocean (New York, 2016, raised $37M) provides one trusted place for life science researchers to share research data and computational assets, to collaborate more quickly and to communicate more effectively. Code Ocean is an integrated, first-of-its-kind library and workbench that preserves a record of all code, data and software used in computational research, and guarantees reproducibility by maintaining complete visibility of the research lineage.
Simon Adar is the co-founder and CEO of Code Ocean, a researcher turned entrepreneur who founded Code Ocean as part of his postdoc at Cornell-Tech, Cornell University. His background is in signal and image processing, hyperspectral imaging, and spectroscopy. On November 15, 2023, Code Ocean announced its ongoing partnership with the Allen Institute. Less than a year into their partnership, the Allen Institute has achieved a 4x increase in workflow efficiency, in addition to recognizing many other benefits of increased reproducibility, interoperability and collaboration.
Elegen (US, 2017) is an innovative DNA synthesis company that rapidly produces long, linear DNA without compromising accuracy or quality. ENFINIA™ DNA is long, linear, NGS-verified DNA that arrives in just 6-8 business days and is >2x longer and >20x more accurate than gene fragments from industry-leading suppliers. ENFINIA DNA can be used for many synthetic biology applications:
plant engineering,
cell and gene therapy,
mRNA vaccines,
metagenomic mining,
enzyme/pathway engineering, and
CRISPR therapeutics.
Elemental Machines (US, 2015) connects virtually any lab equipment, from any brand, with any function, from any era, to monitor usage, temperature, humidity, light, vibration and more. Their cloud-connected dashboard and plug-and-play sensors begin transmitting data 60 seconds after unboxing, connecting you to your data anywhere in the world. Elemental Machines has raised a total of $59.9M and last year presented Element-D, the newest addition to its platform. It is trusted by PerkinElmer.
Galatea Bio (US, 2020) has a mission to leverage world class ancestry algorithms and direct them to better understand underserved Latin American populations all while maintaining a strong commitment to ethical research.
Girihlet (US, 2012) is decoding T cell biology to treat immune disorders with ImmuneScanner, a diagnostic tool that monitors autoimmune treatments to help patients and their doctors pick the best treatment possible. A collaboration with Mayo Clinic has been established to study the use of ImmuneScanner in assessing the effect of treatments on ankylosing spondylitis.
Second Genome is mining the microbe to discover novel medicines. The company was founded in 2010 by Corey Goodman, a venture capitalist and former Pfizer executive, and Todd DeSantis, the company's vice president for informatics as of 2019. They have a very sophisticated platform, to identify a specific strain or strains that elicit a specific biology, to identify genes and proteins that are responsible for the specific biology. Once a protein candidate with the best biological properties is selected then they identify and characterize the human target and develop a deep understanding of that bacterial protein - human target interaction. The therapeutic modality for the final drug candidate is selected based on deep understanding of that bacterial protein-human target interaction.
Second Genome has raised a total funding of $63.7M over 5 rounds from 14 investors.
Terray Therapeutics (US, 2018) has a small molecule screening and optimization platform. Their experimental platform is built for generative AI-driven drug discovery, integrating chemical experimentation and computation on an unprecedented scale. They explore molecules and targets more broadly and deeply with a sophisticated combination of ultra-high throughput experimentation, generative AI, biology, medicinal chemistry, automation and nanotechnology. Their experimental dataset includes more than two billion unique target-ligand binding measurements, growing by 150 million new measurements every month. They can screen 2M molecules against a target in four minutes and convert 25 TB of image data into binding affinities daily, allowing them to rapidly identify potent and selective molecules. Terray Therapeutics has raised a total of $80M.
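For a sense of scale, the throughput figures quoted above can be converted into per-second rates with some back-of-the-envelope arithmetic. The sketch below uses only the numbers in the text. The assumptions are mine: decimal terabytes (1 TB = 1e12 bytes), image processing spread evenly over 24 hours, and a constant growth rate for the dataset. Since the text says "more than two billion", the projection is a rough upper bound on the time needed, not a forecast.

```python
# Figures quoted in the text above.
MOLECULES_PER_SCREEN = 2_000_000
SCREEN_MINUTES = 4
IMAGE_TB_PER_DAY = 25
MEASUREMENTS_NOW = 2_000_000_000          # "more than two billion"
NEW_MEASUREMENTS_PER_MONTH = 150_000_000

# Screening rate in molecules per second (about 8,333).
molecules_per_second = MOLECULES_PER_SCREEN / (SCREEN_MINUTES * 60)

# Assumption: 1 TB = 1e12 bytes, processed evenly over 86,400 seconds.
# Sustained image throughput in MB/s (about 289).
image_mb_per_second = IMAGE_TB_PER_DAY * 1e12 / 86_400 / 1e6

def months_to_reach(target, current=MEASUREMENTS_NOW,
                    monthly=NEW_MEASUREMENTS_PER_MONTH):
    """Whole months needed to reach `target`, assuming constant growth."""
    remaining = max(0, target - current)
    return -(-remaining // monthly)  # ceiling division on integers

months_to_double = months_to_reach(2 * MEASUREMENTS_NOW)  # 14 months

print(f"{molecules_per_second:,.0f} molecules/s")
print(f"{image_mb_per_second:,.0f} MB/s of image data")
print(f"{months_to_double} months to double the dataset")
```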
Health technology and Services:
Onc.AI (US, 2020) is a privately held digital health company on a mission to radically improve oncology decision making. By leveraging a market-leading oncology real-world dataset (diagnostic imaging, EMR, labs, genomics), Onc.AI is developing a pipeline of AI models to fulfill the potential of precision oncology. Their first product will help medical oncologists make treatment decisions for metastatic lung cancer patients treated with PD-1 immunotherapy. Onc.AI has raised a total funding of $31M over 2 rounds from 13 investors.
Somatix (Israel/US, 2015) is harnessing wearables, AI and Big Data analytics, and real-time gesture detection technology to monitor unique behaviors such as medication intake, smoking 🚬 and daily hydration. In particular, SafeBeing™'s one-of-a-kind real-time gesture detection technology monitors a user’s hydration levels, sleep and activity data to provide insights into fall and ulcer risk, medication intake and smoking habits. As part of Roche's "Building Tomorrow Together" initiative, run by Beta-i, Somatix has been selected to provide its SafeBeing AI-enabled remote patient monitoring solution in the care of dementia patients in Portugal.
⚙️ John Snow Labs
John Snow Labs (US, 2015) is a healthcare AI and NLP company and the developer of the Spark NLP library—an open-source text processing library for advanced natural language processing for the Python, Java and Scala programming languages—that has more than 1,000 state-of-the-art pre-trained transformer models available in multiple languages. The company is also the creator and host of The NLP Summit, further educating and advancing the NLP community.
In April 2023, John Snow ❄️ Labs released a new LLM called BioGPT-JSL (the first closed-book medical Q&A LLM based on BioGPT) with capabilities tuned specifically to the medical domain (interpreting medical research, producing clinical text, condensing clinical encounters, simplifying patient inquiries). In April 2023, Oracle Health announced that its Cerner Enviza life sciences division will work with John Snow Labs to develop a new AI methodology to enhance computerized queries, or phenotyping, of digital patient data and clinical notes to support the FDA's Sentinel Initiative drug studies.