An Integrated Modal Decomposition and Vision Transformer Framework for Human Heart Failure Prediction Based on Limited Echocardiography Databases

  • Zhao, Zhuoqun (Universidad Politécnica de Madrid)
  • Bell-Navas, Andrés (Universidad Politécnica de Madrid)
  • Garicano-Mena, Jesús (Universidad Politécnica de Madrid)
  • Le Clainche, Soledad (Universidad Politécnica de Madrid)

Cardiovascular diseases (CVDs) are the leading cause of death worldwide, accounting for nearly 32\% of all global fatalities\cite{who1999cvds}. As clinical data volumes surge, there is an urgent need for automated diagnostic systems capable of early and precise cardiac risk assessment. This study presents a novel deep-learning framework designed for real-time analysis of human echocardiography sequences, focusing on the automated identification of cardiac disease types. The framework operates in two phases. In the first, raw echocardiography videos are converted into a collection of annotated images suitable for training machine learning algorithms; this step uses Higher Order Dynamic Mode Decomposition (HODMD)\cite{le2017higher}, a data-driven reduced-order modeling technique generally used to analyze nonlinear dynamical systems, for data augmentation and myocardial feature extraction\cite{Groun2022higher}. The second phase applies a Vision Transformer (ViT) trained with self-supervised learning (SSL), which mitigates the challenges posed by limited clinical datasets\cite{bell2025automatic}. The results show that the integrated HODMD-ViT method identifies cardiac disease types from human echocardiography data with significantly higher accuracy than Convolutional Neural Network (CNN) and other ViT models.
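
To make the first phase more concrete, the following is a minimal sketch of a delay-embedded dynamic mode decomposition, the core idea behind HODMD, applied to a snapshot matrix whose columns are flattened video frames. It is a simplification and not the authors' implementation: full HODMD adds further dimension-reduction and amplitude-thresholding steps. The function names (`delay_embed`, `delay_dmd_modes`) and the parameters `d` and `rank` are illustrative assumptions.

```python
import numpy as np

def delay_embed(X, d):
    """Stack d time-lagged copies of X (n_pixels x n_frames) on top of each other."""
    n, k = X.shape
    return np.vstack([X[:, i:k - d + 1 + i] for i in range(d)])

def delay_dmd_modes(X, d=4, rank=6):
    """Simplified delay-embedded DMD (hypothetical helper, not the paper's code).

    Returns spatial modes (n_pixels x r) and discrete-time eigenvalues (r,).
    """
    H = delay_embed(X, d)
    H1, H2 = H[:, :-1], H[:, 1:]              # consecutive snapshot pairs
    U, s, Vh = np.linalg.svd(H1, full_matrices=False)
    r = min(rank, int(np.sum(s > 1e-10 * s[0])))
    U, s, Vh = U[:, :r], s[:r], Vh[:r]
    V = Vh.conj().T
    A_tilde = U.conj().T @ H2 @ V / s         # reduced linear operator
    eigvals, W = np.linalg.eig(A_tilde)
    modes = (H2 @ V / s) @ W                  # exact DMD modes in delay space
    return modes[:X.shape[0]], eigvals        # keep the first delay block only

# Synthetic stand-in for a video: two spatial patterns oscillating in time.
rng = np.random.default_rng(0)
t = np.arange(60)
patterns = rng.standard_normal((50, 4))
frames = patterns @ np.vstack(
    [np.cos(0.3 * t), np.sin(0.3 * t), np.cos(0.7 * t), np.sin(0.7 * t)]
)
modes, eigvals = delay_dmd_modes(frames, d=4, rank=4)
```

Under these assumptions, each column of `modes` can be reshaped back to the frame dimensions and viewed as an image; mode images of this kind are one plausible way to obtain the additional, feature-rich training inputs that the abstract describes.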