Exploring EfficientNet Variants in Auxiliary Signal Guided Knowledge Encoder-Decoder Framework

The growing volume of medical images has contributed to radiologist burnout, which seriously degrades radiologists’ performance. To address this issue, the Auxiliary Signal Guided Knowledge (ASGK) multimodal encoder-decoder framework was designed to generate medical reports automatically, based on a proposed medical graph and a natural language decoder. It uses DenseNet-121 as the image encoder. Because DenseNet-121 lacks computational and memory efficiency, this study explores EfficientNetB0 through EfficientNetB4 as substitute image encoders for ASGK. The framework is trained on the IU X-Ray dataset for 30 epochs using the Adam optimizer, a learning rate of 0.01 with a decay rate of 0.8, binary cross-entropy loss for the medical tags, and cross-entropy loss for the generated medical captions. For each image encoder, the parameters that achieve the highest CIDEr score on the validation set during training are taken as the best parameters for that encoder and are used on the test set. On the test set, ASGK with EfficientNetB3 as the image encoder achieves a CIDEr score of 0.35, a notable increase over the 0.28 obtained by ASGK with DenseNet-121. This score is only a 1% decrease from the best validation score, which suggests that EfficientNetB3 not only improves the framework’s performance but is also less prone to overfitting. This study demonstrates that EfficientNetB3 is a promising substitute for DenseNet-121 as the image encoder in the ASGK framework.
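
The training recipe in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. Several details are assumptions, because the abstract does not state them: that the 0.8 decay is applied once per epoch, that the two losses are summed without weights, and that ties in validation CIDEr resolve to the earliest epoch. All function names are hypothetical.

```python
import math

EPS = 1e-12  # guards against log(0)


def decayed_lr(epoch, base_lr=0.01, decay=0.8):
    """Learning rate at a 0-indexed epoch.

    Assumption: the decay rate is applied once per epoch (exponential decay).
    """
    return base_lr * decay ** epoch


def bce_loss(tag_probs, tag_targets):
    """Mean binary cross-entropy over predicted medical-tag probabilities."""
    total = 0.0
    for p, t in zip(tag_probs, tag_targets):
        p = min(max(p, EPS), 1.0 - EPS)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(tag_probs)


def ce_loss(token_dists, target_ids):
    """Mean cross-entropy over caption tokens.

    token_dists[i] is a probability distribution over the vocabulary
    for position i; target_ids[i] is the index of the reference token.
    """
    total = 0.0
    for dist, target in zip(token_dists, target_ids):
        total -= math.log(max(dist[target], EPS))
    return total / len(target_ids)


def total_loss(tag_probs, tag_targets, token_dists, target_ids):
    """Assumption: the tag loss and caption loss are summed with equal weight."""
    return bce_loss(tag_probs, tag_targets) + ce_loss(token_dists, target_ids)


def select_best_epoch(val_cider_by_epoch):
    """Index of the epoch with the highest validation CIDEr (earliest on ties)."""
    return max(range(len(val_cider_by_epoch)), key=lambda e: val_cider_by_epoch[e])
```

In use, one would record a validation CIDEr score after each of the 30 epochs, call `select_best_epoch` on that list, and evaluate the parameters saved at that epoch on the test set.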

Authors:
Matthew Martianus Henry, Nur Adhianti Heryanto, Bens Pardamean

2024 9th International Conference on Computer Science and Computational Intelligence (ICCSCI)
