An Integrated Modal Decomposition and Vision Transformer Framework for Human Heart Failure Prediction Based on Limited Echocardiography Databases
Cardiovascular diseases (CVDs) are a leading threat to human health, accounting for nearly 32\% of all global deaths\cite{who1999cvds}. As clinical data volumes grow, there is an urgent need for automated diagnostic systems capable of early and precise cardiac risk assessment. This study presents a novel deep-learning architecture designed for the real-time diagnostic analysis of human echocardiography sequences, with a focus on the automated identification of cardiac disease types. The framework operates in two phases. In the first phase, raw echocardiography videos are converted into a collection of annotated images suitable for training machine learning algorithms. This phase uses Higher Order Dynamic Mode Decomposition (HODMD)\cite{le2017higher}, a data-driven reduced-order model generally used to analyze nonlinear dynamical systems, for data augmentation and myocardial feature extraction\cite{Groun2022higher}. The second phase implements a Vision Transformer (ViT) trained with self-supervised learning (SSL), which mitigates the challenges posed by limited clinical datasets\cite{bell2025automatic}. The results show that the combined HODMD-ViT method identifies cardiac disease types from human echocardiography data with significantly higher accuracy than Convolutional Neural Network (CNN) and various ViT models.
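
For reference, the standard formulation of HODMD\cite{le2017higher} can be summarized as follows. This is a sketch of the general method, not of the specific settings used in this work; treating each echocardiography frame as a snapshot vector $\mathbf{v}_k$ is an assumption made here for illustration, as are the symbols $K$, $d$, and $M$. Given $K$ snapshots, HODMD relates each snapshot to the $d$ preceding ones through the higher-order Koopman assumption
\begin{equation}
\mathbf{v}_{k+d} \simeq \mathbf{R}_1 \mathbf{v}_k + \mathbf{R}_2 \mathbf{v}_{k+1} + \cdots + \mathbf{R}_d \mathbf{v}_{k+d-1}, \qquad k = 1, \ldots, K-d,
\end{equation}
where $\mathbf{R}_1, \ldots, \mathbf{R}_d$ are linear operators, and from it obtains a modal expansion of the sequence,
\begin{equation}
\mathbf{v}(t) \simeq \sum_{m=1}^{M} a_m \, \mathbf{u}_m \, e^{(\delta_m + i \omega_m) t},
\end{equation}
with modes $\mathbf{u}_m$, amplitudes $a_m$, growth rates $\delta_m$, and frequencies $\omega_m$. Setting $d = 1$ recovers standard DMD. Retaining only a subset of the $M$ modes yields filtered reconstructions of the original frames, which is one way such a decomposition can serve both feature extraction and data augmentation.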
