Shanshan Xie, Jiangjian Xie, Yang Liu, Lianshuai Sha, Ye Tian, Jiahua Dong, Diwen Liang, Kaijun Pan, Junguo Zhang. 2025: Step-by-step to success: Multi-stage learning driven robust audiovisual fusion network for fine-grained bird species classification. Avian Research, 16(1): 100280. DOI: 10.1016/j.avrs.2025.100280

Step-by-step to success: Multi-stage learning driven robust audiovisual fusion network for fine-grained bird species classification

  • Bird monitoring and protection are essential for maintaining biodiversity, and fine-grained bird classification has become a key focus in this field. Audiovisual modalities provide critical cues for this task, but robust feature extraction and efficient fusion remain major challenges. We introduce a multi-stage fine-grained audiovisual fusion network (MSFG-AVFNet) for fine-grained bird species classification, which addresses these challenges through two key components: (1) an audiovisual feature extraction module, which adopts a multi-stage fine-tuning strategy to provide high-quality unimodal features and lays a solid foundation for modality fusion; (2) an audiovisual feature fusion module, which combines a max pooling aggregation strategy with a novel audiovisual loss function to achieve effective and robust feature fusion. Experiments were conducted on the self-built AVB81 dataset and the publicly available SSW60 dataset, which contain data from 81 and 60 bird species, respectively. Comprehensive experiments demonstrate that our approach achieves notable performance gains and outperforms existing state-of-the-art methods. These results highlight its effectiveness in leveraging audiovisual modalities for fine-grained bird classification and its potential to support ecological monitoring and biodiversity research.
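The abstract names a max pooling aggregation strategy for fusing audio and visual features but does not give its details. The following is a minimal sketch of generic element-wise max pooling across modality embeddings, not the MSFG-AVFNet implementation. It assumes that both unimodal encoders already output vectors of the same dimension. The function name `max_pool_fusion` and the shared-dimension assumption are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def max_pool_fusion(modality_features: Sequence[Sequence[float]]) -> List[float]:
    """Fuse per-modality embeddings by element-wise max pooling.

    Hypothetical sketch: assumes every embedding (e.g. audio, visual) has
    already been projected to the same dimension d.
    """
    if not modality_features:
        raise ValueError("need at least one modality embedding")
    dim = len(modality_features[0])
    if any(len(f) != dim for f in modality_features):
        raise ValueError("all embeddings must share the same dimension")
    # For each feature index, keep the strongest activation across modalities.
    return [max(values) for values in zip(*modality_features)]


audio = [0.2, 0.9, -0.1]   # illustrative audio embedding
visual = [0.5, 0.3, 0.4]   # illustrative visual embedding
print(max_pool_fusion([audio, visual]))  # [0.5, 0.9, 0.4]

# With only one modality available, the fused vector equals that embedding.
print(max_pool_fusion([audio]))  # [0.2, 0.9, -0.1]
```

One property of max pooling, as opposed to concatenation, is that the fused vector keeps the same dimension no matter how many modalities are present. In this sketch, that lets a single classifier head accept either one or both inputs. Whether the paper relies on this property for robustness is not stated in the abstract.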