Enhancing Kahikatea Aerial Imagery Classification through Vision Transformers with Registers

Dacrycarpus dacrydioides (widely known as Kahikatea) is a rare tree species native to New Zealand that plays a vital role in maintaining environmental equilibrium in wetland areas. Accurate classification of this species in aerial imagery is therefore critical for conservation and ecological monitoring. This study investigates the effectiveness of ViT-based models with register tokens (a memory mechanism designed to improve spatial representation) in identifying Kahikatea trees from full-size aerial images. Four variants of state-of-the-art ViT models (DINOv2-small and DINOv2-base, each with and without register tokens) are compared against a traditional CNN-based model, EfficientNetB0. Each model was fine-tuned on the Kahikatea aerial imagery dataset acquired by the Time-Evolving Data Science and Artificial Intelligence for Advanced Open Environmental Science (TAIAO) project and evaluated over five iterations. The results demonstrate that models with register tokens significantly outperform their counterparts without registers. DINOv2-small-register achieves the highest performance, with an average score of 0.94 ± 0.02 on all metrics, while the baseline EfficientNetB0 achieves 0.83 ± 0.04 in accuracy, precision, and F1-score and 0.84 ± 0.04 in recall across the five iterations. These findings highlight the advantage of ViTs with registers for fine-grained image classification, particularly when object regions are small and sparsely distributed across the image.
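The "mean ± standard deviation over five iterations" reporting used above can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes a binary setting with Kahikatea as the positive class and uses the sample standard deviation; the abstract does not state the averaging scheme or which deviation estimator was used. The function names `binary_metrics` and `summarize` are hypothetical.

```python
from statistics import mean, stdev


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for labels in {0, 1}.

    Assumption: class 1 is the positive (Kahikatea) class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard against empty denominators by falling back to 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def summarize(runs):
    """Aggregate per-run metric dicts into {metric: (mean, sample std)}.

    Assumption: sample standard deviation (n - 1 denominator);
    needs at least two runs.
    """
    summary = {}
    for name in runs[0]:
        values = [run[name] for run in runs]
        summary[name] = (mean(values), stdev(values))
    return summary
```

In use, one would call `binary_metrics` once per training iteration on that iteration's test predictions, collect the five resulting dictionaries in a list, and pass the list to `summarize` to obtain one "mean ± std" pair per metric.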
Authors:
Advendio Desandros, Alyssa Imani, Matthew Martianus Henry, Mahmud Isnan, Bens Pardamean
2025 International Conference on Computer Science and Computational Intelligence (ICCSCI)