Building the future of Digital Pathology from small data

Written by Ashna Ahmad

The process of analysing pathology images, also called slides, to diagnose cancer and other diseases is still largely driven by expert human pathologists. With one in two people developing cancer at some point in their lives, it is becoming increasingly important to diagnose and treat cancers early – but pathologists are in limited supply and can take several hours to analyse a single slide and find hallmarks of disease in patient samples (Figure 1). Relying exclusively on human pathologists lengthens the time to diagnosis and increases costs for public healthcare systems already strained by global health emergencies such as the COVID-19 pandemic.

Figure 1: A whole-slide image (WSI) of a prostatectomy specimen. Pathologists need to manually scan through the whole image to detect whether cancerous tissue is present. Cancerous tissue might be present only in small subregions (also called “patches”), such as the one highlighted in yellow. Scanning through the whole image might take many hours of exhausting precision work. If the region is missed, the patient could be misdiagnosed.

For this reason, digital pathology – the technology used to analyse information from a digital slide – presents us with incredible opportunities to make diagnostic processes much simpler and faster. Digital whole-slide imaging (WSI) provides a reliable platform for many basic analysis tasks, as well as image-sharing between different teams, and automated image analysis methods using WSI are growing in sophistication and capability. In 2021, the first FDA-approved digital pathology tool for prostate cancer detection was developed [1], but new breakthroughs are needed before digital tools become sufficiently accurate and low-cost to effectively assist pathologists in their diagnoses, which are currently carried out by manual light-microscopy evaluation, the industry standard.

One of the main costs associated with the development of digital pathology tools is the sheer amount of data curation required. Creating an AI model for automated image analysis may require years of model training and optimisation, and huge training sets made up of tens of thousands of individual WSIs. One such model needed more than 12,000 WSIs to be powerful enough to be used for diagnosis [2]. Populations and diseases also vary over time, so AI models need to be regularly and rapidly updated, which requires even more annotated WSIs. If we want to use digital pathology to cut down the time and expense involved in WSI analysis, it is crucial to develop models that can work with small labelled datasets.

Meet DeepMirror Spark. For Digital Pathology!

This is where DeepMirror Spark, our Breakthrough Discovery Platform, comes in. Our semi-supervised technology can build high-accuracy models with 10-100x less labelled data than conventional training methods. We previously showed how the platform performed for instance segmentation in microscopy image data. Strikingly, our technology is also effective for the most important WSI applications: semantic segmentation of sub-cellular structures in digital slide images (i.e. separating regions into different tissue types), and classification of image regions (i.e. answering whether a particular region contains cancerous tissue or what clinical outcome is associated with a given image). Accurate segmentation and classification are instrumental in quickly identifying potentially unhealthy tissue and separating patient images into groups for detailed analysis. Making these processes as fast and effective as possible could have an enormous impact on the diagnostic process.

We tested DeepMirror Spark on both classification and semantic segmentation of WSI patches. Figure 2 shows results from building an image classification model for the PatchCamelyon dataset, a collection of 200,000 images containing either some tumour tissue or only non-tumour tissue. Our model produces consistently more accurate classifications than conventional training, and performance plateaus at a dataset size of 1,000 images, 200x smaller than the original dataset, at an Area Under Curve (AUC) score of >0.90. Achieving highly accurate classification results with 1,000 training images, rather than 10x or even 100x that amount, would enable the rapid model development needed for large-scale deployment of AI classification models in digital pathology. We are carrying out pilot studies with partner organisations to bring these advances to the clinic.

Figure 2: Side-by-side comparison of DeepMirror Spark and conventional training for image classification. We trained a custom classifier neural net on the PatchCamelyon dataset, which contains 92×92 pixel images that depict tissue either containing tumour or not. In total, the dataset contained ~200,000 images. (A) Example classifications for conventional training and DeepMirror Spark compared with Ground Truth (GT) using only 1,000 images. (B) Cross-validated area under the curve (AUC) scores for varying amounts of labelled data. DeepMirror Spark outperforms conventional training at each dataset split.
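For readers less familiar with the AUC score, the sketch below shows one common way to compute it for a tumour/no-tumour classifier: as the fraction of (tumour, non-tumour) patch pairs in which the tumour patch receives the higher score. This is an illustration only, not the evaluation code used for Figure 2; the function name `roc_auc` and the toy labels and scores are assumptions made up for the example.

```python
def roc_auc(labels, scores):
    """Rank-based AUC for binary labels (1 = tumour, 0 = no tumour).

    Returns the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as half a win.
    Hypothetical helper for illustration, not part of any library.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): six patches with model scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
auc = roc_auc(labels, scores)  # 8 of the 9 pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of tumour and non-tumour patches, so a score above 0.90 means that more than nine in ten such pairs are ordered correctly.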

Using DeepMirror Spark for semantic segmentation produced even more ground-breaking results. Using a modified version of the popular DeepLabv3+ network, we again performed an ablation study in which we systematically increased the number of samples used for training with DeepMirror Spark and with a conventional training algorithm (Figure 3). As a benchmark, we used the PESO dataset, which contains patient prostate epithelium samples in which cancerous tissue has been segmented. Networks trained with DeepMirror Spark reached peak performance (a median IoU of ~0.93) with just 200 (!) samples. With conventional training, scores were highly variable between cross-validation steps and generally below 0.6 IoU, even when using 2,000 (10x more) samples.

Figure 3: Side-by-side comparison of DeepMirror Spark and conventional training for the semantic segmentation of tumours in prostate epithelium (PESO dataset). We trained a customised DeepLabv3+ semantic segmentation model with both DeepMirror Spark and conventional training. To do so, we sampled 10,000 patches from the full dataset and trained the network with an increasing number of samples using either DeepMirror Spark or conventional training. (A) Example segmentations for 256×256 pixel patches and the corresponding Jaccard Scores for models trained with 200 patches. (B) Cross-validated Jaccard Score (Intersection over Union) as a function of the number of patches used (256×256 pixels). As seen, conventional training performance is highly variable, while DeepMirror’s training reaches peak performance with as few as 200 patches.
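The Jaccard Score (IoU) reported above can be illustrated with a short sketch: for a binary mask, it is the number of pixels marked in both the prediction and the ground truth, divided by the number of pixels marked in either. The code below is a minimal example under assumed names (`jaccard`, the toy masks and the per-fold scores are all made up for illustration) and is not the evaluation code behind Figure 3.

```python
from statistics import median


def jaccard(pred, truth):
    """Intersection over Union for two binary masks given as nested lists.

    Hypothetical helper for illustration. Two empty masks are treated as
    a perfect match (score 1.0), which is one common convention.
    """
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    return 1.0 if union == 0 else intersection / union


# Toy 2×3 masks (made-up values): 2 pixels overlap, 4 are marked in total.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
iou = jaccard(pred, truth)  # 2 / 4 = 0.5

# Summarising made-up per-fold scores with the median, as in a
# cross-validated comparison.
fold_scores = [0.91, 0.93, 0.94]
median_iou = median(fold_scores)
```

Reporting the median across cross-validation folds, rather than a single run, is what makes the variability of a training method visible: a method whose folds scatter widely is harder to rely on than one whose folds cluster around the same score.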

These results demonstrate the breakthrough capabilities of our discovery platform to build the future of digital pathology. DeepMirror Spark is an exciting step towards a cost-effective AI solution capable of assisting medical professionals in the long, labour-intensive analysis workflows of current pathology.