DeepMirror Spark for single instance segmentation in biomedical images

Written by Max

Almost all biomedical image analyses start with the acquisition of images, be it a clinical trial in which MRI scans are acquired from patients or a laboratory trying out new treatments on cultured cells. To extract clinical & scientific insights from these images, researchers and clinicians could benefit from recent advances in Artificial Intelligence (AI), for example to automatically detect specific regions of interest, or to automatically classify images into multiple categories. While these tasks can be carried out reliably by human experts, this takes a lot of time and money. AI can help, but it typically needs to be trained on giant datasets that have been painstakingly curated by hand. Human-driven data curation (also called annotation) often takes months or years and poses a barrier to the application of AI to new problems.

But what if one could reduce the number of annotations required to train AI? With fewer annotations, researchers & clinicians would be able to get AI up and running for new applications in no time, accelerating breakthroughs in the biomedical sector. This would have tremendous real-world impact, since while the amount of raw data is often vast, the amount of validated data is not. At DeepMirror, we set out to tackle this challenge by considering how humans learn. To learn to recognize cats, for example, humans do not need a giant dataset of annotated cat images. Often a few are enough, and by cross-referencing with other images a human can accurately locate cats in their vicinity. We wondered if we could replicate this with an AI technique called semi-supervised learning. In semi-supervised learning, AI is trained on a few known & validated datapoints, for example a handful of images in which cats have been annotated, together with a large dataset of non-annotated examples. After almost 2 years of iterating on semi-supervised learning algorithms, we built a modern framework that focuses on real-world applicability, i.e., it is easily adjustable for any kind of AI task.

Meet DeepMirror Spark, a semi-supervised AI training platform that uses specialized data augmentations (i.e., distorting data so that it “looks” like new data), adversarial learning (pitting two competing networks against each other to improve performance), and other techniques to train AI on small datasets!
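
To give a feel for the consistency-based flavour of semi-supervised training described above, here is a deliberately minimal numpy sketch. The exact Spark objective is not public, so the function names, the toy augmentation, and the loss weighting below are purely illustrative assumptions: a supervised error on the few labelled images is combined with an agreement term on two augmented views of the same unlabelled image.

```python
import numpy as np

def augment(image, rng):
    """Toy augmentation: additive noise plus a random horizontal flip."""
    noisy = image + rng.normal(0.0, 0.05, image.shape)
    return noisy[:, ::-1] if rng.random() < 0.5 else noisy

def semi_supervised_loss(pred_labelled, targets, pred_view_a, pred_view_b, weight=1.0):
    # Supervised term: error on the few annotated images.
    supervised = np.mean((pred_labelled - targets) ** 2)
    # Consistency term: predictions on two augmented views of the same
    # unlabelled image should agree, even though its true label is unknown.
    consistency = np.mean((pred_view_a - pred_view_b) ** 2)
    return supervised + weight * consistency
```

In a real training loop, `pred_view_a` and `pred_view_b` would come from running the network on two independently augmented copies of each unlabelled image, so the unlabelled pool shapes the model even without annotations.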

Our early clients often struggled with instance segmentation, i.e., the accurate detection of individual object shapes in images. Instance segmentation can, for example, be used to measure the size of tumors in radiology images or the abundance of biomarkers in histopathology images. To test and benchmark our DeepMirror Spark platform, we applied it to a customized biomedical instance segmentation architecture based on the popular UNet (Figure 1).
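
For readers unfamiliar with the UNet, its defining idea is an encoder that downsamples the image, a decoder that upsamples it back, and skip connections that carry fine spatial detail across the "U". The numpy sketch below is a toy stand-in for that data flow only, not our actual architecture: real UNets use stacks of learned convolutions where `conv_like` here is just a scalar multiplication.

```python
import numpy as np

def conv_like(x, w):
    """Stand-in for a learned convolution block: a simple scaling."""
    return x * w

def down(x):
    """2x2 average pooling (encoder downsampling)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up(x):
    """Nearest-neighbour upsampling (decoder)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def tiny_unet(x):
    enc1 = conv_like(x, 1.0)                 # encoder, full resolution
    bottleneck = conv_like(down(enc1), 1.0)  # encoder, half resolution
    dec1 = up(bottleneck)                    # decoder, back to full resolution
    # Skip connection: fuse encoder features at the same resolution so
    # fine detail lost in the down/up path is restored.
    return conv_like(dec1 + enc1, 0.5)
```

The output has the same height and width as the input, which is what makes the UNet a natural fit for segmentation: every pixel gets a prediction.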

AI training performance

This UNet-based implementation of DeepMirror Spark alone can be used for a myriad of different image analysis tasks. In Figures 2 & 3, for example, we trained a UNet to detect islets of Langerhans in human pancreas images with just 46 annotated images, and brain tumors in MRI scans with only 75. In both cases, conventional (i.e., non-semi-supervised) training led to lower training performance and worse segmentations.

Ablation Studies on radiology & cell biology image data

To test DeepMirror Spark’s performance as a function of the number of annotated images, we performed ablation studies, i.e., we took several publicly available and internally generated image datasets and varied the number of labelled images. This experiment let us quantify the improvement DeepMirror Spark offers over conventional training on small datasets (Figures 4-6).
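
The ablation protocol above boils down to a simple loop. In this sketch, `train_and_evaluate` is a hypothetical stand-in for the real training-and-scoring cycle (our actual pipeline is not shown here); the point is only the experiment shape: nested labelled subsets of increasing size, each scored on the same held-out test set.

```python
def run_ablation(labelled_pool, unlabelled_pool, test_set, train_and_evaluate,
                 sizes=(5, 10, 20, 40, 80)):
    """Train at increasing labelled-set sizes and record the test score for each."""
    results = {}
    for n in sizes:
        # Nested subsets: each larger run extends the previous labelled set,
        # so score differences reflect dataset size rather than sampling luck.
        subset = labelled_pool[:n]
        results[n] = train_and_evaluate(subset, unlabelled_pool, test_set)
    return results
```

Keeping the test set fixed across all sizes is what makes the resulting curves (Figures 4-6) directly comparable.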

We observed that DeepMirror Spark reached its maximum Average Precision (AP) score with only 20-300 labelled images, depending on the complexity of the data. Conventional training of the same network failed to reach the same AP scores at any of the labelled dataset sizes we tested, implying that one can build a functional AI much faster with DeepMirror Spark.
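
The post does not spell out the exact AP definition used; a common convention in cell-segmentation benchmarks is TP / (TP + FP + FN) after matching predicted instances to ground-truth instances at an IoU threshold. A minimal sketch under that assumption:

```python
import numpy as np

def iou(mask_a, mask_b):
    """Intersection over union of two boolean instance masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 0.0

def average_precision(pred_masks, true_masks, iou_thresh=0.5):
    """AP at one IoU threshold: TP / (TP + FP + FN) after greedy matching."""
    matched = set()
    tp = 0
    for pred in pred_masks:
        for i, truth in enumerate(true_masks):
            if i not in matched and iou(pred, truth) >= iou_thresh:
                matched.add(i)  # each ground-truth instance matches at most once
                tp += 1
                break
    fp = len(pred_masks) - tp   # predictions with no matching ground truth
    fn = len(true_masks) - tp   # ground-truth instances that were missed
    denom = tp + fp + fn
    return tp / denom if denom else 1.0
```

Because missed instances (FN) sit in the denominator, a network that segments only the easy objects cannot score highly, which is why AP is a stricter summary than pixel accuracy.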

What’s next

We are now working on extending the DeepMirror Spark platform to tasks beyond instance segmentation. In the next few weeks, we will add image classification, semantic segmentation, and more. Additionally, we are working on integrating other data types, such as genomic sequences and molecular structures, for gene editing experiments and drug discovery! Stay tuned for further blogposts on these topics 🙂

If you want to add AI to your projects in the fastest way possible, get in touch with us at enquiries@deepmirror.ai to do a pilot study on your dataset!

Figures

Figure 1: Schematic – DeepMirror Spark for UNet-based instance segmentation. To use DeepMirror Spark for instance segmentation, we trained a custom UNet with the platform’s algorithm using both labelled and unlabelled images of biological cells. The trained UNet can then be used for instance segmentation on other images to generate annotations or perform analysis.

Figure 2: Detecting islets of Langerhans with DeepMirror Spark. We trained a UNet to detect islets of Langerhans (or pancreatic islets) in human pancreatic slices with both DeepMirror Spark and conventional, non-semi-supervised training. Cells inside pancreatic islets produce insulin, and measuring their abundance is important in diabetes research & diagnosis. While the network trained with Spark was able to pick up islets, the one trained conventionally did not. The Root Mean Squared Error (RMSE) on the validation dataset during training shows this difference. The networks were trained with 46 labelled images and 1087 unlabelled ones. Image source: Novo-Nordisk histologic image analysis challenge (https://www.innovitaresearch.com/2020/04/28/novo-nordisk-challenge-histologic-image-analysis-of-pancreatic-tissue/).

Figure 3: Detecting brain tumors with DeepMirror Spark. We trained a UNet to detect brain tumors in MRI scans with both DeepMirror Spark and conventional, non-semi-supervised training. Diagnosing tumors rapidly without constant expert supervision would free up radiologists’ time to focus on the important cases. While both networks were able to pick up the tumor in the example, the conventionally trained one also misclassified other parts of the image as tumors. The Root Mean Squared Error (RMSE) on the validation dataset during training shows this difference. The networks were trained with 75 labelled images and 750 unlabelled ones. Dataset obtained from: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0157112.

Figure 4: Ablation study of neuron segmentation. Here we iteratively increased the number of labelled images used for training to a maximum of 80 images. The left-hand side shows the raw image and a manual labelling. The Average Precision (AP) score for semi-supervised training with DeepMirror Spark outperformed conventional training at all labelled dataset sizes and reached a plateau after ~20 images. All quantifications were done on a separate test dataset. The dataset was generated and annotated by Eva Kreysing from the Franze laboratory at the University of Cambridge.

Figure 5: Ablation study of brain tumour segmentation. Here we iteratively increased the number of labelled images used for training to a maximum of 640 images. The left-hand side shows the raw image and a manual labelling. The Average Precision (AP) score for semi-supervised training with DeepMirror Spark outperformed conventional training at all labelled dataset sizes and reached a plateau after ~300 images. All quantifications were done on a separate test dataset. Dataset taken from: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0157112.

Figure 6: Ablation study of generic cell segmentation (Human Protein Atlas dataset). Here we iteratively increased the number of labelled images used for training to a maximum of 320 images. The left-hand side shows the raw image and a manual labelling. The Average Precision (AP) score for semi-supervised training with DeepMirror Spark outperformed conventional training at all labelled dataset sizes and reached a plateau after ~10 images. All quantifications were done on a separate test dataset. Data taken from https://www.proteinatlas.org and annotated by hand.