On the Detection of COVID-19 through Radiology and Unsupervised Learning

14 Jul 2020

By Weichen Huang

COVID-19 (Coronavirus Disease 2019) is an infectious viral disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) that has claimed over 572,000 lives worldwide and sickened many more. The first cases were seen in Wuhan, China, in late December 2019, before the disease spread globally and became a pandemic. The virus has had a devastating effect on daily life, public health, and the global economy. It is critical to detect positive cases as early as possible, both to prevent further spread and to treat affected patients quickly. Definitive diagnosis of COVID-19 requires a positive RT-PCR test. Current best practice advises that chest radiological imaging, such as computed tomography (CT) and X-ray, plays a vital role in early diagnosis and treatment of this disease. CT has been reported to be a sensitive method for detecting COVID-19 pneumonia and can be considered as a screening tool alongside RT-PCR. In this article, we present a novel method that combines human lung X-ray images and deep learning models to detect COVID-19 reliably and quickly, in real time.

Around two months ago, we proposed COVID-Efficientnet, which combines state-of-the-art deep learning architectures with high-quality labelled datasets to detect COVID-19 from X-ray images of lungs, and which can also detect other diseases such as pneumonia. However, a major drawback of that method is its reliance on supervision: the COVID-19 X-ray datasets have to be labelled by humans, and to this day this cannot be automated reliably. This is where our new method comes in: unsupervised contrastive learning. We use SimCLR, a recent method from Google Research (A Simple Framework for Contrastive Learning of Visual Representations, https://arxiv.org/pdf/2002.05709.pdf).

SimCLR is a contrastive self-supervised learning algorithm that does not require specialized architectures or a memory bank. Its authors showed that (1) the composition of data augmentations plays a critical role in defining effective predictive tasks, (2) introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and (3) contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning. By combining these findings, SimCLR considerably outperforms previous methods for self-supervised and semi-supervised learning on ImageNet. A linear classifier trained on self-supervised representations learned by SimCLR achieves 76.5% top-1 accuracy, a 7% relative improvement over the previous state of the art, matching the performance of a supervised ResNet-50.
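To make the contrastive objective more concrete, here is a minimal sketch of the normalized temperature-scaled cross-entropy loss (called NT-Xent in the SimCLR paper), written in plain Python. It is an illustration, not the code used for our experiments: the function names, the default temperature of 0.5, and the convention that rows 2k and 2k+1 hold the two augmented views of the same image are all assumptions made for this example.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def nt_xent_loss(embeddings, temperature=0.5):
    """NT-Xent loss over 2N projected embeddings.

    Assumed layout (hypothetical, for this sketch only): rows 2k and 2k+1
    are two augmented views of the same source image.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total = 0.0
    for i in range(n):
        j = i ^ 1  # index of the positive partner (0<->1, 2<->3, ...)
        # Temperature-scaled cosine similarity between view i and every view.
        sims = [sum(a * b for a, b in zip(z[i], z[k])) / temperature
                for k in range(n)]
        # Log-sum-exp over all views except i itself (numerically stable).
        others = [sims[k] for k in range(n) if k != i]
        m = max(others)
        log_denominator = m + math.log(sum(math.exp(s - m) for s in others))
        # Cross-entropy with the positive partner as the correct "class".
        total += log_denominator - sims[j]
    return total / n


# Toy check: embeddings where the two views of each image agree should
# give a lower loss than embeddings where they disagree.
aligned = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
misaligned = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
print(nt_xent_loss(aligned) < nt_xent_loss(misaligned))
```

The loss pulls the two views of each image together and pushes every other image in the batch away, which is one way to see why larger batches (more negatives per anchor) help.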

We exploit this by contrasting lung X-ray images of COVID-19 patients with those of healthy individuals. Using this method, we achieved 92% accuracy on the test dataset, which is competitive with its supervised predecessors. The method is not perfect, however: because training uses no labels, some of the images treated as negative samples may in fact belong to the same class as the image they are contrasted against. This issue has been addressed in recent papers such as Debiased Contrastive Learning (https://arxiv.org/abs/2007.00224), so our model can be greatly improved upon. Nonetheless, we present this particular work as a proof of concept and will be developing it over the coming weeks.
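One way to read the negative-sample issue, and the one that Debiased Contrastive Learning targets, is that every other image in a batch is treated as a negative even when it shares the anchor's class. The sketch below estimates how often that happens for a given batch. The function name and the example labels are hypothetical; labels are used here only to diagnose the problem, since the contrastive training itself never sees them.

```python
def same_class_negative_fraction(labels):
    """Fraction of (anchor, negative) pairs in a batch that share a class.

    `labels` holds one class label per source image in the batch. These
    labels are an assumption for illustration; they are not available
    during unsupervised training.
    """
    n = len(labels)
    if n < 2:
        raise ValueError("need at least two images to form a negative pair")
    same = sum(
        1
        for i in range(n)
        for k in range(n)
        if i != k and labels[i] == labels[k]
    )
    return same / (n * (n - 1))


# Hypothetical batch: two COVID-19 images and two normal images.
batch_labels = ["covid", "covid", "normal", "normal"]
print(same_class_negative_fraction(batch_labels))
```

With only two classes, as in our setting, a large share of the negatives for any anchor can come from its own class, which suggests that a debiased objective may matter more here than on a dataset with many classes such as ImageNet.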

In summary, we present a novel method that uses unsupervised contrastive learning for radiological detection of COVID-19. It achieves results competitive with supervised methods, though open issues remain, most notably the handling of negative samples during contrastive training.

The code for SimCLR is available here: https://github.com/google-research/simclr

Thanks for reading!