[P33] Adaptive Vision Transformers for Enhanced Semantic Segmentation in River Landscape Monitoring and Disaster Management
| Affiliation | Taiwan Tech |
| --- | --- |
| Author | Sahoo Jyostnamayee |
| Co-Author | Hao-Yung Chan (Taiwan Tech), Meng-Han Tsai (Taiwan Tech) |
Keywords
- Vision Transformers
- Drone Imagery
- Riverscape Monitoring
Outline
Effective disaster management in riverscape ecosystems is crucial for sustainable development and requires innovative approaches to monitor and mitigate risks. Public engagement through volunteered geographic information (VGI) fosters disaster resilience and raises awareness of river ecosystems, supporting government efforts to generate actionable insights. High-resolution drone imagery enhances these initiatives by providing detailed visual data of river landscapes, helping identify critical features such as humans, vehicles, buildings, vegetation, and water bodies. However, the complexity of river landscapes in drone images presents challenges, as these environments often feature overlapping elements, varying scales, and dynamic conditions, and traditional segmentation techniques struggle to balance local detail with broader context. Vision Transformers (ViTs) have emerged as a solution to these limitations: by adopting the self-attention mechanism from natural language processing, ViTs treat images as sequences of patches, capturing both local and global dependencies. Their adaptive nature allows them to adjust to the complex, multi-scale, and dynamic features of river landscapes, ensuring effective segmentation across varied conditions. This approach has the potential to revolutionize disaster management by enhancing multi-class segmentation for precise identification of critical features, enabling faster, more efficient resource allocation and decision-making in dynamic river environments.
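To make the patch-based segmentation idea concrete, the sketch below illustrates a minimal ViT-style segmenter in PyTorch: the image is split into non-overlapping patches, the patch sequence is processed by a self-attention encoder, and per-patch logits are upsampled to per-pixel class predictions. This is an illustrative sketch only, not the authors' actual model; the image size, depth, and the hypothetical five-class label set (human, vehicle, building, vegetation, water) are assumptions chosen to match the classes mentioned above.

```python
import torch
import torch.nn as nn


class ViTSegmenter(nn.Module):
    """Minimal ViT-style model for multi-class semantic segmentation.

    Hypothetical setup: five classes (human, vehicle, building,
    vegetation, water); architecture choices are illustrative.
    """

    def __init__(self, image_size=256, patch_size=16, in_channels=3,
                 embed_dim=256, depth=6, num_heads=8, num_classes=5):
        super().__init__()
        self.patch_size = patch_size
        grid = image_size // patch_size  # patches per side

        # Split the image into non-overlapping patches and embed each one.
        self.patch_embed = nn.Conv2d(in_channels, embed_dim,
                                     kernel_size=patch_size, stride=patch_size)
        # Learned positional embedding so attention knows patch locations.
        self.pos_embed = nn.Parameter(torch.zeros(1, grid * grid, embed_dim))

        # Self-attention over the patch sequence captures both local and
        # global dependencies across the scene.
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads,
            dim_feedforward=4 * embed_dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)

        # Per-patch classifier; logits are upsampled back to pixel resolution.
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        b, _, h, w = x.shape
        tokens = self.patch_embed(x)                # (B, D, H/P, W/P)
        tokens = tokens.flatten(2).transpose(1, 2)  # (B, N, D)
        tokens = self.encoder(tokens + self.pos_embed)
        logits = self.head(tokens)                  # (B, N, num_classes)
        # Reshape patch logits to a coarse map, then upsample to full size.
        logits = logits.transpose(1, 2).reshape(
            b, -1, h // self.patch_size, w // self.patch_size)
        return nn.functional.interpolate(
            logits, size=(h, w), mode="bilinear", align_corners=False)


if __name__ == "__main__":
    model = ViTSegmenter()
    frame = torch.randn(1, 3, 256, 256)  # e.g. one drone image tile
    print(model(frame).shape)            # torch.Size([1, 5, 256, 256])
```

Because attention operates over the full patch sequence, every patch can attend to every other patch, which is what lets such a model combine fine local detail (e.g. a person near the bank) with scene-level context (the surrounding water body and vegetation) in a single pass.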