Abstract:
This paper addresses the task of identifying multiple bird species from audio recordings. The proposed approach uses a pre-trained Deep Convolutional Neural Network (DCNN), the VGG-16 model, to learn bird vocalizations through a sliding-window analysis of mel-spectrograms. To decide on a test file, we adopt an aggregation strategy in which the per-window sigmoid outputs are aggregated and normalized; the candidates with the highest probability scores are taken to be the species present in the recording. The proposed method is evaluated on the Xeno-canto bird sound database using calls from 10 different species. Mel-spectrograms (visual features) generated from the bird calls serve as input to VGG-16, and the performance is compared with an MFCC-DNN approach. The proposed visualization-based system achieves an average F1-score of 0.65, outperforming the acoustic cue-based MFCC-DNN approach.
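A minimal sketch of the windowing-and-aggregation pipeline summarized above, assuming librosa for mel-spectrogram extraction and torchvision's VGG-16. The window and hop sizes, the 0.5 decision threshold, and the `predict_species` helper are hypothetical illustrations, not the paper's exact configuration:

```python
import numpy as np
import librosa
import torch
import torchvision.models as models

NUM_SPECIES = 10                   # 10 Xeno-canto species, as in the paper
WIN_FRAMES, HOP_FRAMES = 224, 112  # hypothetical window/hop sizes (frames)

# VGG-16 with its final layer replaced by a 10-way sigmoid head.
# (weights=None here for brevity; the paper fine-tunes a pre-trained model.)
model = models.vgg16(weights=None)
model.classifier[6] = torch.nn.Linear(4096, NUM_SPECIES)
model.eval()

def predict_species(path, threshold=0.5):
    """Sliding-window mel-spectrogram -> per-window sigmoid scores,
    aggregated and normalized into one file-level decision."""
    y, sr = librosa.load(path, sr=22050)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=224)
    mel = librosa.power_to_db(mel, ref=np.max)

    # Slice the spectrogram into overlapping fixed-width windows.
    windows = [mel[:, s:s + WIN_FRAMES]
               for s in range(0, mel.shape[1] - WIN_FRAMES + 1, HOP_FRAMES)]
    batch = torch.tensor(np.stack(windows), dtype=torch.float32)
    batch = batch.unsqueeze(1).repeat(1, 3, 1, 1)  # replicate to 3 channels

    with torch.no_grad():
        scores = torch.sigmoid(model(batch))       # (n_windows, NUM_SPECIES)

    # Aggregate window scores and normalize; species whose normalized
    # score exceeds the threshold are taken to be present in the file.
    agg = scores.sum(dim=0)
    agg = agg / agg.max()
    return (agg >= threshold).nonzero(as_tuple=True)[0].tolist()
```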