A Multi-Task Deep Learning Framework for Segmentation of Interrelated Structures

  • Zalouk, Ahmed (Birmingham City University)
  • Ismail, Khalid (Birmingham City University)
  • Zaalouk, Mohamed (Ain Shams University)
  • Krzyzanowski, Michal (Birmingham City University)
  • Vakaj, Edlira (Birmingham City University)


The accurate segmentation of complex structures in imaging data remains a central challenge in computer vision, particularly when multiple related targets must be identified simultaneously [1]. The challenge is amplified when target regions exhibit strong spatial dependency, high variability in scale and shape, and hierarchical relationships [2]. Such properties are common in applications that require consistent multi-output predictions under domain-specific constraints, such as medical imaging and autonomous driving [3,4]. In this work, we present a deep learning-based multitask segmentation framework designed for settings that require simultaneous prediction of multiple interrelated structures. The proposed approach adopts a shared encoder with multiple decoders, one dedicated to each task, so that the model can specialise in each output while benefiting from shared feature representations. To improve information flow across resolution scales, the framework applies enhanced feature fusion at every skip connection between the shared encoder and the task-specific decoders. This strengthens representation learning and allows the model to perform task-specific specialisation at the output level. The framework is particularly suited to scenarios in which one target structure is embedded within another, which calls for coordinated and anatomically coherent segmentation outputs. The methodology has been evaluated on two-dimensional imaging data from a medical imaging application, where it demonstrated the ability to handle related complex structures while supporting multi-output segmentation. Although motivated by a specific application domain, the framework is generalisable and can be applied to a wide range of computer vision problems involving interrelated multitask segmentation.
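The shared-encoder, multi-decoder layout described above can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: the abstract does not specify the layers, the fusion operator, or the number of scales, so average pooling, nearest-neighbour upsampling, and a weighted blend are used here purely as stand-ins, and every name (`shared_encoder`, `task_decoder`, `fuse`, the `weight` and `threshold` parameters, the task names) is a hypothetical choice made for illustration. The sketch only shows the data flow: the encoder runs once, and each task decoder reuses the same skip features at every scale.

```python
"""Toy sketch of a shared encoder feeding several task-specific decoders.

All operations are illustrative placeholders, not the method from the abstract.
"""
from typing import Dict, List

Grid = List[List[float]]  # a single-channel 2D feature map


def downsample(x: Grid) -> Grid:
    """2x2 average pooling (placeholder for a learned encoder stage)."""
    return [
        [(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
         for j in range(0, len(x[0]), 2)]
        for i in range(0, len(x), 2)
    ]


def upsample(x: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling (placeholder for a learned decoder stage)."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fuse(skip: Grid, up: Grid, weight: float) -> Grid:
    """Assumed fusion: blend the encoder skip feature with the upsampled decoder feature."""
    return [[weight * s + (1.0 - weight) * u for s, u in zip(row_s, row_u)]
            for row_s, row_u in zip(skip, up)]


def shared_encoder(image: Grid, depth: int) -> List[Grid]:
    """Return feature maps from full resolution down to the bottleneck.

    Height and width must be divisible by 2**depth.
    """
    feats = [image]
    for _ in range(depth):
        feats.append(downsample(feats[-1]))
    return feats


def task_decoder(feats: List[Grid], weight: float, threshold: float) -> List[List[int]]:
    """Decode from the bottleneck, fusing the shared skip feature at every scale."""
    x = feats[-1]
    for skip in reversed(feats[:-1]):
        x = fuse(skip, upsample(x), weight)
    # Binarise the final map into a segmentation mask for this task.
    return [[1 if v >= threshold else 0 for v in row] for row in x]


def multitask_segment(image: Grid, tasks: Dict[str, Dict[str, float]],
                      depth: int = 2) -> Dict[str, List[List[int]]]:
    """Run the encoder once, then one decoder per task on the shared features."""
    feats = shared_encoder(image, depth)
    return {name: task_decoder(feats, **cfg) for name, cfg in tasks.items()}


# Hypothetical usage: two related targets, one nested inside the other.
image = [
    [0.0, 0.2, 0.2, 0.0],
    [0.2, 0.9, 0.8, 0.2],
    [0.2, 0.8, 0.9, 0.2],
    [0.0, 0.2, 0.2, 0.0],
]
tasks = {
    "outer": {"weight": 0.5, "threshold": 0.3},
    "inner": {"weight": 0.5, "threshold": 0.7},
}
masks = multitask_segment(image, tasks, depth=2)
```

In a real model each placeholder would be a learned block, and the fusion step at each skip connection is where the enhanced feature fusion mentioned in the abstract would sit. Computing the encoder features once and passing the same list to every decoder is the design choice that lets the tasks share representations while keeping separate outputs.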