Handling Severe Data Imbalance in Chest X-Ray Image Classification With Transfer Learning Using SwAV Self-Supervised Pre-Training

Since the COVID-19 outbreak, numerous researchers have attempted to train accurate Deep Learning (DL) models, especially Convolutional Neural Networks (CNNs), to assist medical personnel in diagnosing COVID-19 infections from Chest X-Ray (CXR) images. However, data imbalance and small dataset sizes remain persistent obstacles when training DL models for medical image classification. Most researchers have focused on complex novel methods, and few have addressed this problem directly. In this research, we demonstrate how Self-Supervised Learning (SSL) during pre-training, followed by Transfer Learning (TL) on the target task, can produce models that are more robust to data imbalance. The Swapping Assignments between Views (SwAV) algorithm in particular is known to improve the accuracy of CNN models on classification tasks after TL. When trained on a severely imbalanced CXR dataset, a ResNet-50 model pre-trained using SwAV greatly outperformed its counterpart pre-trained in a standard supervised manner. The SwAV-TL ResNet-50 model attained 0.952 AUROC and a 0.821 macro-averaged F1 score on the imbalanced dataset. These results indicate that TL using models pre-trained through SwAV can achieve better accuracy even when the dataset is severely imbalanced, which is usually the case in medical image datasets.
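The abstract reports a macro-averaged F1 score alongside AUROC. The short Python sketch below illustrates why macro averaging is informative on imbalanced data: it averages per-class F1 with equal weight, so a neglected minority class pulls the score down. The class names, the 95/5 split, and the helper functions `macro_f1` and `accuracy` are hypothetical and chosen only for illustration; they are not taken from the paper's dataset or code.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (each class counts equally)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: 95 majority-class and 5 minority-class samples,
# scored against a degenerate classifier that always predicts the majority.
y_true = ["normal"] * 95 + ["covid"] * 5
y_pred = ["normal"] * 100

print("accuracy:", accuracy(y_true, y_pred))
print("macro F1:", round(macro_f1(y_true, y_pred), 3))
```

In this toy case the always-majority classifier reaches 0.95 accuracy but a macro F1 of only about 0.49, because the minority class contributes an F1 of zero. This is why a macro-averaged score is a more demanding measure than plain accuracy when classes are severely imbalanced.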

Communications in Mathematical Biology and Neuroscience

Hery Harjono Muljo, Bens Pardamean, Gregorius Natanael Elwirehardja, Alam Ahmad Hidayat, Digdo Sudigyo, Reza Rahutomo, Tjeng Wawan Cenggoro
