Digital imaging
In biology, imaging—the process of visualizing biological activity—is an umbrella term that may refer to biological imaging, bioimaging, pathology and medical imaging:
Biological imaging—refers to technologies for viewing with microscopy (light, fluorescence, confocal) biological substances that have been fixed or prepared for observation. Super-resolution microscopy, two-photon fluorescence excitation microscopy, fluorescence recovery after photobleaching and fluorescence resonance energy transfer are some of the recent advancements in biological imaging. Bioimaging in a broader sense is a digital-technology-based medical advancement concerned with real-time visualization of biological processes, and includes X-ray and ultrasound pictures, MRI, and 3D and 4D body images from Computed Tomography (CT) and DEXA scans.
Histopathology—refers to the examination of a biopsy or surgical specimen by a pathologist with a light or fluorescence microscope—and digital pathology, which includes the acquisition, management, sharing and interpretation of pathology information, including slides. For more: AI in Digital Pathology. And
Medical imaging is the technique and process of imaging the interior of a body for clinical analysis and medical intervention, as well as visual representation of the function of some organs or tissues, for example cardiovascular imaging, breast imaging, lung imaging etc. For more: AI-based medical imaging tools.
In general, data mining and knowledge extraction from all imaging data is not only about peer-reviewed literature and stored imaging data, but also about medical records—computed tomography and magnetic resonance imaging scans, signals from electroencephalograms, specimen analyses and clinical data from patients—and imaging data from histopathology labs and from unpublished proprietary lab data—observations and lab notes—making biomedical imaging data a real beast to tame!
In fact, the number of proprietary image file formats from which software must read and extract metadata is very large: the Bio-Formats library currently supports over 160 different file formats, with a table indicating the file extension to choose when opening or importing a dataset in a particular format. This table keeps growing, since the bioimaging community (unlike other domains) has not yet agreed on a single standard data format for collecting images that all acquisition systems can generate. Instead, each vendor has a proprietary file format for image collection.
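To illustrate the extension-based bookkeeping this implies, here is a minimal Python sketch of the idea; the mapping below is a tiny, hypothetical subset invented for illustration (Bio-Formats supports far more formats than shown):

```python
# Illustrative, hypothetical subset of vendor file extensions.
VENDOR_FORMATS = {
    ".czi": "Zeiss CZI",
    ".lif": "Leica LIF",
    ".nd2": "Nikon ND2",
    ".oib": "Olympus OIB",
    ".ome.tiff": "OME-TIFF",
    ".tiff": "TIFF",
}

def guess_format(filename: str) -> str:
    """Guess the acquisition format from the file extension alone."""
    name = filename.lower()
    # Check multi-part extensions (e.g. .ome.tiff) before simple ones.
    for ext in sorted(VENDOR_FORMATS, key=len, reverse=True):
        if name.endswith(ext):
            return VENDOR_FORMATS[ext]
    return "unknown"
```

Real libraries such as Bio-Formats go much further, of course: they parse the binary contents and metadata of each vendor format, not just the extension.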
In relation to image data mining, RDMkit—the online guide to good data management practices applicable to research projects from beginning to end—reviews everything you need for image collection, such as (source: Bioimaging data):
Open source libraries from vendors for parsing proprietary file formats,
Open source translators (Bio-Formats/Java that supports over 150 file formats, OpenSlide primarily for whole-slide imaging/WSI formats and AICSImageIO/Python that wraps vendor libraries),
Permanent conversion (the Open Microscopy Environment/OME, a consortium that has created solutions for bioimage data models, data translation and data transformation and has built the image data management system OME Remote Objects/OMERO; and the bioformats2raw & raw2ometiff toolchain provided by Glencoe Software, which allows more performant conversion of your data),
Cloud storage (OME is currently developing a next-generation file format/NGFF that you can use since most current image file formats are not suitable for cloud storage),
Metadata (the OME model, an XML-based representation of microscopy data for storing data on biological imaging; 4DN-BINA-OME-QUAREP/NBO-Q, the light Microscopy Metadata Specifications proposed by the 4DN Initiative and BINA that extend the OME Data Model; and Recommended Metadata for Biological Images/REMBI),
Solutions for data collection such as:
Agnostic platforms that can be used to bridge between domain data: iRODS and b2share.
Image-specific data management platforms: OMERO, Cytomine-IMS, XNAT, MyTARDIS and BisQue. For example, OMERO handles all your images in a secure central repository. You can view, organize, analyze and share your data from anywhere you have internet access. Work with your images from a desktop app (Windows, Mac or Linux), from the web or from 3rd party software. Over 150 image file formats are supported, including all major microscope formats.
Ontologies Resources available at: Zooma, Ontology Lookup Service and BioPortal. And
Solutions for Data publication and archiving, and many more.
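Since the OME model mentioned above is XML-based, pulling core metadata out of it needs nothing more exotic than an XML parser. A minimal Python sketch follows; the element and attribute names match the published OME schema, but the fragment itself is a toy example invented for illustration:

```python
import xml.etree.ElementTree as ET

# Toy OME-XML fragment; real files carry far richer metadata.
OME_XML = """\
<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06">
  <Image ID="Image:0" Name="demo">
    <Pixels ID="Pixels:0" Type="uint16"
            SizeX="512" SizeY="512" SizeZ="1" SizeC="2" SizeT="1"
            PhysicalSizeX="0.325" PhysicalSizeY="0.325"/>
  </Image>
</OME>
"""

NS = {"ome": "http://www.openmicroscopy.org/Schemas/OME/2016-06"}

def pixel_metadata(xml_text: str) -> dict:
    """Extract basic pixel metadata from an OME-XML string."""
    root = ET.fromstring(xml_text)
    pixels = root.find("ome:Image/ome:Pixels", NS)
    return {
        "shape_xyczt": tuple(int(pixels.get("Size" + d)) for d in "XYCZT"),
        "dtype": pixels.get("Type"),
        "pixel_size_um": float(pixels.get("PhysicalSizeX")),
    }
```

In practice you would read such XML from an OME-TIFF header or an OMERO server rather than a string literal, but the parsing step is the same.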
Moreover, new data resources are emerging constantly to create a central bioimage archive for biological data for all imaging modalities of life-sciences—that aims to maximize the use of valuable imaging data, to improve reproducibility of published results that rely on image data and to facilitate development of both novel biological insights from existing data and new image analysis methods—and include:
Euro-BioImaging: an EU-funded project hosted by EMBL that offers open access to imaging technologies, training and data services in biological and biomedical imaging. Euro-BioImaging consists of imaging facilities, called Nodes, that have opened their doors to all life science researchers.
BioImage Model Zoo—created by the AI4LIFE consortium—is a community-driven, fully open resource where standardized pre-trained models can be shared, explored, tested and downloaded for further adaptation or direct deployment in multiple end user-facing tools (e.g., ilastik, deepImageJ, QuPath, StarDist, ImJoy, ZeroCostDL4Mic, CSBDeep), aiming to lay the groundwork to make DL methods for bioimaging findable, accessible, interoperable and reusable across software tools and platforms (BioImage Model Zoo: A Community-Driven Resource for Accessible Deep Learning in BioImage Analysis).
Global BioImaging, an international network of imaging infrastructures and communities, initiated in 2015 by a European-funded project,
Quality Assessment and Reproducibility for Instruments and Images in Light Microscopy, QUAREP-LiMi, aiming at improving the reproducibility of light microscopy experiments in Life and Material Sciences,
German BioImaging - Society for Microscopy and Image Analysis e.V., GerBI-GMB, that connects and supports microscopists and bioimage analysts as well as core facilities throughout Germany,
XNAT has become the data archiving tool of choice for multiple neuroimaging projects and research labs around the world,
Mars provides a collection of ImageJ2/Fiji commands to find, fit, track and characterize single molecules. The algorithms provided are routinely used to analyze data arising from multicolor single molecule TIRF microscopy (that allows for visualization of single molecules by eliminating out-of-focus fluorescence and enhancing the signal-to-noise ratio) and DNA flow stretching experiments (that allows imaging of many DNA molecules). Primary image analysis commands generate Molecule Archives which contain collections of individual biomolecule and image metadata records,
Research Data Management for Microscopy RDM4mic group that deals with the topic of research data management for microscopy, and
The online community Image.sc around software-oriented aspects of scientific imaging, particularly (but not limited to) image analysis, processing, acquisition, storage and management of digital scientific images.
Image segmentation and AI
Image segmentation is a commonly used technique in digital image processing and analysis that partitions an image into multiple parts or regions, often based on the characteristics of its pixels. It uses both non-AI techniques and AI techniques such as random forests (shallow learning) and convolutional neural networks (DL), which have recently emerged as powerful methods for analysis.
In a fully non-AI/ML segmentation system, the researcher must use specialist image analysis software to predefine all of the different operations (filtering, edge detection, thresholding, etc.) as well as their parameters (filter radius, threshold value, etc.) to perform the segmentation (Artificial intelligence for image analysis in microscopy). On the other hand, shallow-learning segmentation aims to alleviate some of the burden of the fully hand-designed approach by automating a large part of the configuration process, except for "feature engineering", where the researcher must still use their domain knowledge to decide on a candidate set of operations. Contrary to shallow learning, DL segmentation goes a step further and places even the feature engineering under the control of the computer.
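A classic example of such a hand-designed, non-AI operation is intensity thresholding. The sketch below implements Otsu's method (which chooses the threshold maximizing between-class variance) in plain Python; it is an illustrative toy on a flat pixel list, not any particular software's pipeline:

```python
def otsu_threshold(pixels, levels=256):
    """Return the intensity threshold maximizing between-class variance (Otsu)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_bg = sum_bg = 0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w_bg += hist[t]            # background pixel count up to t
        if w_bg == 0:
            continue
        w_fg = total - w_bg        # foreground pixel count
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def segment(pixels, t):
    """Binary mask: 1 for foreground (above threshold), 0 for background."""
    return [1 if p > t else 0 for p in pixels]
```

Even here the researcher has hand-picked the operation (global thresholding) and its assumptions (a bimodal intensity histogram), which is exactly the burden shallow and deep learning try to lift.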
One particular DL architecture that has proved very effective is the U-Net, a convolutional neural network developed for biomedical image segmentation at the Computer Science Department of the University of Freiburg, which has become a real workhorse in image segmentation. In general, the training of a DL model is governed by "hyperparameters": external configuration variables, such as the learning rate or the number of training epochs, that data scientists set manually before training a model. The "weights", by contrast, are internal model parameters whose values the training process itself learns from the data.
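The hyperparameter/weight distinction can be made concrete with a toy one-weight model (fitting y = 3x by gradient descent); this is a deliberately minimal illustration, not a U-Net:

```python
def train(xs, ys, learning_rate=0.05, epochs=200):
    """Fit y = w * x by gradient descent on mean squared error.
    learning_rate and epochs are hyperparameters, fixed before training;
    w is a weight, adjusted by the training loop itself."""
    w = 0.0
    for _ in range(epochs):
        # Gradient of mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad
    return w
```

A real U-Net differs only in scale: millions of weights instead of one, and additional hyperparameters (batch size, depth, augmentation settings), but the same division of labor holds.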
In 2023, a group of researchers proposed a novel approach to 3D widefield microscopy reconstruction (and not with specialized equipment and techniques like confocal microscopy) through semantic segmentation of in-focus and out-of-focus pixels. For this, they explored a number of rule-based algorithms commonly used for software-based autofocusing and applied them to a dataset of widefield focal stacks, proposing a computation scheme allowing the calculation of lateral focus score maps of the slices of each stack using these algorithms. Furthermore, they identified algorithms preferable for obtaining such maps. Finally, they proposed a surrogate model based on a deep neural network, capable of segmenting in-focus pixels from the out-of-focus background in a fast and reliable fashion. The deep-neural-network-based approach allows a major speedup for data processing making it usable for online data processing (A weak-labelling and deep learning approach for in-focus object segmentation in 3D widefield microscopy).
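The rule-based focus scores explored in that work are typically simple image statistics. As a hedged illustration (a generic gradient-energy measure in the Tenengrad family, not the paper's exact algorithms), one can score and rank focal-stack slices like this:

```python
def focus_score(image):
    """Mean squared intensity gradient of a 2D image (list of rows).
    Sharp, in-focus content yields strong gradients and a high score."""
    rows, cols = len(image), len(image[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            gx = image[r][c + 1] - image[r][c] if c + 1 < cols else 0
            gy = image[r + 1][c] - image[r][c] if r + 1 < rows else 0
            total += gx * gx + gy * gy
    return total / (rows * cols)

def sharpest_slice(stack):
    """Index of the in-focus slice: the one with the highest focus score."""
    return max(range(len(stack)), key=lambda i: focus_score(stack[i]))
```

Computing such a score per pixel neighborhood rather than per slice gives the lateral focus-score maps described above; the paper's deep network then learns to reproduce that segmentation much faster.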
Another example is Mesmer, a DL-enabled segmentation algorithm trained on TissueNet, that is an image dataset containing >1 million paired whole-cell and nuclear annotations for tissue images from nine organs and six imaging platforms, giving us a DL-enabled segmentation algorithm for Whole-cell segmentation of tissue images with human-level performance using large-scale data annotation and deep learning. Mesmer achieved human-level performance for whole-cell segmentation, and enabled the automated extraction of key cellular features, such as subcellular localization of protein signal, which was challenging with previous approaches. Finally, ilastik is an open source toolkit for interactive ML, a point-and-click interface to help detect not just cells and nuclei but also features such as microtubules and vesicles.
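Downstream of any whole-cell segmentation, per-cell feature extraction starts by identifying the individual objects in the binary mask. A minimal stand-in for that step (plain connected-component counting with 4-connectivity, not Mesmer's actual post-processing) looks like this:

```python
from collections import deque

def count_cells(mask):
    """Count connected foreground regions (4-connectivity) in a binary mask,
    given as a list of rows of 0/1 values."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                count += 1                      # new cell found
                queue = deque([(r, c)])
                seen[r][c] = True
                while queue:                    # flood-fill its pixels
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count
```

Once each region is labelled, per-cell measurements such as area or marker intensity (and, in Mesmer's case, subcellular localization of protein signal) can be accumulated region by region.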
AI/ML solutions and tools for imaging
⚙ Altis Labs (Canada, 2019) is a computational imaging company accelerating clinical trials with its AI platform Nota, that enables sponsors to ingest imaging data from sites and CROs, creating the single source of truth for imaging data. Nota also hosts proprietary AI models that generate 100+ clinically meaningful predictions automatically from each scan for exploratory research.
Altis Labs is part of an international project that includes AstraZeneca and Bayer to advance the use of "digital twins" in clinical trials, and in June 2023 Altis Labs announced the closing of its $6M seed financing.
⚙ Owkin (France/US, 2016) is a French-American full-stack AI biotech that identifies new treatments, optimizes clinical trials and develops diagnostics using histology slides and omics. They are offering:
➡️ Multimodal patient data, like:
📎 MOSAIC, a landmark research project to create a multimodal dataset with spatial transcriptomes from 7,000 patients in 7 cancer types. This is the largest spatial omics atlas to date.
Spatial omics is an overarching term for different technologies that allow overlaying of -omics data onto tissue images. Beyond the identification of molecular subpopulations of cells, these approaches can provide information regarding the (spatial) localization of the identified subpopulations within the tissue of origin, their proximity to each other and to the extracellular matrix, blood vessels and other tissue components.
➡️ Subgroup discovery: They apply AI to multimodal, KOL-curated data to subtype patients and identify novel biomarkers to inform drug discovery, de-risk clinical trials and develop and deploy diagnostics in clinical practice.
➡️ AI drug discovery (for novel drug targets and drug positioning) and AI drug development (to increase the probability of success of clinical trials).
➡️ AI diagnostics: They pre-screen for biomarkers and predict outcomes—giving healthcare providers a fuller picture of a patient’s disease. For example,
📎 MSIntuit™ CRC is a CE-marked AI diagnostic that provides a pre-screening approach for digital pathologists. MSIntuit pre-screens for microsatellite instability (MSI), a key genomic biomarker that plays an important role in the treatment of colorectal cancer (CRC) patients, from hematoxylin-eosin-saffron-stained whole slide images (HES/WSI). On June 14, 2023, Owkin successfully validated MSIntuit™ CRC, which is now integrated into clinical workflows via Medipath, France's largest network of pathologists.
MSI is the condition of genetic hyper-mutability that results from impaired DNA mismatch repair (MMR).
📎 RlapsRisk™ BC is an AI diagnostic to help pathologists and oncologists determine the right treatment pathway for early breast cancer patients. RlapsRisk BC assesses the risk of distant relapse at 5 years of ER+/HER2- early invasive breast cancer patients, post surgery, from HES/WSI and clinical data.
Regarding WSI, the ACROBAT challenge aims to advance the development of whole-slide-image (WSI) registration algorithms that can align WSIs of breast cancer tissue sections stained with immunohistochemistry or haematoxylin and eosin.
In January 2024, Owkin and Evotec entered into an AI-powered strategic partnership in oncology, immunology and inflammation.
Owkin is founder-led by Thomas Clozel, MD, an oncologist, and Gilles Wainrib, PhD, a professor of ML; it is trusted by 8 biopharmas, first in class in AI diagnostics, and well-funded, with a total of $304.1M raised from leading biopharma companies (Sanofi & BMS) and venture funds (Fidelity, Google Ventures & BPI, among others).
⚙ Flywheel (US, 2012), a cloud-based company with a medical imaging AI platform, helps imaging and data scientists accelerate drug development initiatives and builds reliable, scalable medical imaging solutions that seamlessly integrate advanced AI technology from NVIDIA and other technology partners to enable robust imaging data management and analysis. During drug screening, Flywheel offers imaging research labs the following solutions:
Metadata management with search,
Automated pre-processing & pipelines,
ML workflow,
Customisation via APIs, Python, & Matlab,
Provenance,
BIDS support (Brain Imaging Data Structure is a standardized format for organizing and describing neuroimaging data and study outputs), and
Secure collaboration.
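The BIDS support mentioned above boils down to a strict directory and file-naming convention. The sketch below builds BIDS-style relative paths; it is a simplified illustration (the full BIDS specification defines a fixed ordering and vocabulary for entities that this toy helper does not enforce):

```python
def bids_path(sub, datatype, suffix, ext, ses=None, **entities):
    """Build a BIDS-style relative path like
    sub-01/ses-01/anat/sub-01_ses-01_T1w.nii.gz."""
    parts = [f"sub-{sub}"] + ([f"ses-{ses}"] if ses else [])
    # Filename = entity key-value pairs joined by underscores, then the suffix.
    name = "_".join(parts + [f"{k}-{v}" for k, v in entities.items()] + [suffix])
    return "/".join(parts + [datatype, f"{name}.{ext}"])
```

Because the layout is this predictable, BIDS Apps such as MRIQC and fMRIPrep can discover and process whole datasets without per-study configuration.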
Regarding brain research, Flywheel accelerates neuroimaging breakthroughs with:
BIDS conversion & curation
BIDS Apps (MRIQC, which leverages BIDS-compliant datasets to perform quality assessments; fMRIPrep, for preprocessing of functional magnetic resonance imaging; etc.)
Support for MRI, CT, PET, EEG, digital pathology and more
Robust viewer features; streamlined annotation and segmentation
Centralized multimodal database
Easy conversion of DICOM (the international standard to transmit, store, retrieve, print, process, and display medical imaging information) to NIfTI (a data format for the storage of Functional Magnetic Resonance Imaging (fMRI) and other medical images)
More than 50 Gears—standardized plug-in applications—specifically for neuro research functions and
Extensible by design, with open architecture for access by any platform.
On June 27, 2023, Flywheel announced it has raised $54M in Series D funding co-led by Novalis LifeSciences LLC and NVentures, NVIDIA’s venture capital arm. Microsoft also participated in the round, along with insiders Invenshure, 8VC, Beringea, Hewlett Packard Enterprise, Intuitive Ventures, iSelect, Gundersen Health System, Seraph, and Great North Ventures. Faegre Drinker Biddle & Reath LLP served as counsel to Flywheel in connection with the financing.
⚙ Paige (US, 2018) is a global leader in end-to-end digital pathology solutions and clinical AI, with the first large foundation model trained on over one billion images from half a million pathology slides across multiple cancer types. It is developing with Microsoft a new AI model, configured with billions of parameters, that is orders of magnitude larger than any other image-based AI model existing today. For this collaboration, Paige is incorporating up to four million digitized microscopy slides across multiple types of cancer from its unmatched petabyte-scale archive of clinical data, and will utilize Microsoft's advanced supercomputing infrastructure to train the technology at scale and ultimately deploy it to hospitals and laboratories across the globe using Azure.
On January 9, 2024, Paige announced the release of a groundbreaking product developed from its Pathology Foundation Model, Virchow, which can detect cancer across more than 17 different tissue types, including skin, lung and the gastrointestinal tract, along with multiple rare tumor types and metastatic deposits. The early success of their foundation model has been possible due to the size, quality and diversity of the datasets used to build it.
⚙ Synsight is a French deep-tech company developing a screening technology that enables the development of effective first-in-class drug candidates (for RNA targeting), based on a discovery platform combining AI and cell imaging. Synsight developed the Microtubule Bench technology (MT bench®), evolving it from a research tool into an industrialized high-content cell test that screens molecules by microscopy, making it possible to identify and quantify the modulation by small molecules of interactions between proteins and nucleic acids.
To target interactions between mRNA and RNA-binding proteins, MTBench® requires the fusion of a fluorescently tagged RNA-binding protein of interest with a microtubule-binding domain (MBD). When expressed in a cell, the fusion protein is directed onto the microtubules, where it can behave like a bait and bind prey mRNA. The MTBench® assay employs high-throughput microscopy to derive a correlation score from a single test condition, reflecting the colocalization between the fluorescence of baits and preys on the same microtubules. This score is then translated into a mathematical regression slope, serving as the assay output. In MTBench®, a positive slope indicates an interaction, while a reduced slope value signifies the inhibition of this specific interaction in the presence of an active compound. This straightforward method allows for easy adaptation to target other interfaces, like interactions between proteins or between a protein and a specific RNA sequence.
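The regression-slope readout described above can be sketched generically: given per-microtubule bait and prey fluorescence intensities, an interaction shows up as a positive least-squares slope of prey versus bait. This is an illustrative sketch of that kind of scoring only; the actual MTBench® analysis pipeline is Synsight's own:

```python
def colocalization_slope(bait, prey):
    """Least-squares slope of prey vs. bait fluorescence intensities.
    A clearly positive slope suggests bait-prey colocalization; a slope
    near zero suggests no interaction (or its inhibition by a compound)."""
    n = len(bait)
    mean_b = sum(bait) / n
    mean_p = sum(prey) / n
    cov = sum((b - mean_b) * (p - mean_p) for b, p in zip(bait, prey))
    var = sum((b - mean_b) ** 2 for b in bait)
    return cov / var
```

Comparing the slope with and without a candidate compound is then what flags an inhibitor: an active molecule reduces the slope.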
Since its initial introduction in 2015, Synsight has miniaturized and standardized the MTBench® technology through the application of lab automation and robotics, and has integrated it into its drug discovery process that incorporates their advanced computer-aided drug design algorithms and screening cascades. Last year the company secured $1.5 M from an international investor to expand its pipeline.
⚙ Pictura Bio (2021), a spin-out from the University of Oxford, combines the power of single-molecule fluorescent microscopy with ML to identify pathogens in under a minute. Pictura Bio is creating the first universal testing platform (PIC-ID) to provide accurate, digital imaging-based identification of infectious disease that is faster, simpler and more cost-effective than conventional molecular diagnostics, moving infectious disease testing out of the lab and taking it anywhere it's needed. It combines molecular labeling, computer vision and ML to create a universal diagnostic platform that looks directly at a patient sample and can identify which pathogen is present in a matter of seconds. This is not just a faster lateral flow test or respiratory infection PCR panel – it's a fundamentally different way of identifying ALL infectious agents.
With two patented innovations—a universal fluorescent coating and a DL neural network that can analyze and classify digital microscopy images—the Pictura Bio team is focusing on packaging the PIC-ID technology into a simple desktop ‘lab in a box’, named VISTA (Instant Recognition Identification System) around the size of a domestic microwave, comprising a single-purpose high-powered fluorescent microscope and image capture and processing technology.
Last year Pictura Bio announced that it has been awarded Defence and Security Accelerator (DASA) funding through the Point of Care Diagnostics at the Front Line competition and has raised a total of $3.87M over 5 rounds.
⚙ Eyenuk (US, 2010) is a pioneering medical AI company that recently gained FDA approval for an innovative technology that analyzes images of the back of the eye and immediately detects whether a patient has some form of diabetic retinopathy. The EyeArt system autonomously analyzes retinal images acquired with an integrated fundus camera, robustly detects signs of disease, and returns an easy-to-read report in under 60 seconds. By incorporating a number of DL and image analysis algorithms, the EyeArt system automatically assesses the quality of the images, detects the presence and extent of lesions, and determines the level of disease based on internationally recognised clinical scales. Eyenuk's total funding is over $43M.
For more: “Biomedical imaging and artificial intelligence (2nd part)”.
Until next time 💮,