Convolutional Neural Networks for Scops Owl Sound Classification
Adopting deep learning models for bird sound classification has become common practice in building robust automated bird sound detection systems. In this paper, we employ a four-layer Convolutional Neural Network (CNN) to classify different species of Indonesian scops owls based on their vocalizations. Two widely used representations of an acoustic signal, the log-scaled mel-spectrogram and Mel-Frequency Cepstral Coefficients (MFCC), are extracted from each sound file and fed into the network separately to compare model performance across inputs. We also propose a more complex CNN that processes the two acoustic representations simultaneously, providing a direct comparison with the baseline model. The dual-input network is the best-performing model in our experiment, achieving a Mean Average Precision (MAP) of 97.55%, while the baseline model achieves a MAP of 94.36% with the mel-spectrogram input and 96.08% with the MFCC input.
International Conference on Computer Science and Computational Intelligence 2020
Alam Ahmad Hidayat, Tjeng Wawan Cenggoro, and Bens Pardamean
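As a rough illustration of the pipeline summarized in the abstract, the sketch below extracts the two acoustic representations with librosa and builds a dual-input Keras CNN with two four-layer convolutional branches merged before the classifier. It is not the authors' implementation; the feature dimensions, filter counts, number of classes, and optimizer are illustrative assumptions rather than values reported in the paper.

```python
# Minimal sketch of the described pipeline; hyperparameters are assumptions,
# not the settings used in the paper.
import numpy as np
import librosa
import tensorflow as tf
from tensorflow.keras import layers, Model

N_MELS, N_MFCC, N_FRAMES, NUM_CLASSES = 128, 20, 128, 5  # hypothetical values

def extract_features(path, sr=22050):
    """Return (log-mel spectrogram, MFCC) for one sound file, cropped/padded to N_FRAMES."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=N_MELS)
    log_mel = librosa.power_to_db(mel, ref=np.max)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC)

    def fix_len(f):
        # Crop or zero-pad along the time axis so every clip has the same width.
        f = f[:, :N_FRAMES]
        return np.pad(f, ((0, 0), (0, N_FRAMES - f.shape[1])))

    return fix_len(log_mel)[..., np.newaxis], fix_len(mfcc)[..., np.newaxis]

def conv_branch(inp):
    """Four stacked Conv-Pool blocks, mirroring the four-layer CNN baseline."""
    x = inp
    for filters in (16, 32, 64, 64):  # filter counts are illustrative
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)
    return layers.GlobalAveragePooling2D()(x)

# Dual-input variant: one branch per acoustic representation, merged before the classifier.
mel_in = layers.Input(shape=(N_MELS, N_FRAMES, 1), name="log_mel")
mfcc_in = layers.Input(shape=(N_MFCC, N_FRAMES, 1), name="mfcc")
merged = layers.concatenate([conv_branch(mel_in), conv_branch(mfcc_in)])
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(merged)

model = Model(inputs=[mel_in, mfcc_in], outputs=outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

The single-input baseline corresponds to one `conv_branch` fed with either the log-mel spectrogram or the MFCC alone; the dual-input model simply concatenates the two branch outputs so both representations contribute to the final prediction.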