Dynamic Multi-Scale Graph Embeddings For Cross-Modal Representation of Physical Fields
Many inverse problems in computational mechanics and geophysics involve reconstructing physical fields from sparse, irregularly distributed sensors. Learning stable representations of such fields on graphs remains challenging because of well-known pathologies of deep message-passing networks, including over-smoothing and loss of expressivity under fixed-topology assumptions. Seismic source localization and magnitude estimation constitute a prototypical class of nonlinear inverse problems on irregularly sampled spatio-temporal fields, where network geometry strongly constrains inference. In this work, we propose a framework for learning dynamic multi-scale graph embeddings that represent spatio-temporal physical fields while enabling cross-modal representation learning. Node states evolve through Spatio-Temporal Graph Convolution blocks in which connectivity is adaptively updated from both geometric proximity and feature-space similarity. Intermediate embeddings are concatenated across layers to form a hierarchical multi-scale representation, a mechanism that acts as an implicit regularizer against over-smoothing. A variance decomposition based on repeated K-fold cross-validation and multiple optimization runs disentangles variability due to data partitioning from variability due to stochastic training. Dynamic graph construction reduces the data-induced variance, yielding more statistically consistent estimators in low-sample regimes. In practice, near-expert performance is already achieved with a number of events on the order of 1000, a behavior observed consistently across two seismological networks with different station densities and geometries. The RMSE on magnitude remains below 0.15, and predictions remain stable under strong input perturbations (narrow-band filtering, random temporal shifts) as well as in under-represented high-magnitude regimes, highlighting robustness to both signal degradation and sample imbalance.
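As an illustration of the adaptive connectivity update and the layer-wise concatenation described above, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a k-nearest-neighbour graph built from a blended distance, a mean aggregator standing in for the Spatio-Temporal Graph Convolution block, and no learned weights. The names `dynamic_knn_graph`, `mean_aggregate`, `multiscale_embedding`, and the parameters `k` and `alpha` are hypothetical.

```python
import math


def dynamic_knn_graph(coords, feats, k=2, alpha=0.5):
    """For each node, return the indices of its k nearest neighbours under
    alpha * (normalised geometric distance) + (1 - alpha) * (cosine distance).
    The blend and the kNN rule are assumptions made for illustration."""
    n = len(coords)

    def cosine_distance(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        if nu == 0.0 or nv == 0.0:
            return 1.0
        return 1.0 - dot / (nu * nv)

    # Pairwise sensor distances, scaled by the largest one so that both
    # terms of the blend are on comparable scales.
    geo = [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]
    max_geo = max(max(row) for row in geo) or 1.0

    neighbours = []
    for i in range(n):
        scored = []
        for j in range(n):
            if j == i:
                continue
            d = (alpha * geo[i][j] / max_geo
                 + (1.0 - alpha) * cosine_distance(feats[i], feats[j]))
            scored.append((d, j))
        scored.sort()
        neighbours.append([j for _, j in scored[:k]])
    return neighbours


def mean_aggregate(feats, neighbours):
    """One parameter-free message-passing step: average each node with its
    neighbours (a stand-in for a learned graph convolution)."""
    out = []
    for i, nbrs in enumerate(neighbours):
        group = [feats[i]] + [feats[j] for j in nbrs]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


def multiscale_embedding(coords, feats, num_layers=2, k=2, alpha=0.5):
    """Rebuild the graph from the current node states at every layer, then
    concatenate the input and all intermediate states per node."""
    layers = [[list(f) for f in feats]]
    h = layers[0]
    for _ in range(num_layers):
        nbrs = dynamic_knn_graph(coords, h, k=k, alpha=alpha)
        h = mean_aggregate(h, nbrs)
        layers.append(h)
    return [[x for layer in layers for x in layer[i]] for i in range(len(feats))]
```

Because the early-layer states are kept in the concatenated vector, the final embedding still carries node-specific information even when deeper states become nearly uniform, which is one way to read the regularizing effect against over-smoothing.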
A complementary mathematical analysis of the aggregation operator reveals an emergent spatial attention mechanism that selectively emphasizes the most informative sensors, in a manner consistent with expert analyst strategies. The resulting high-dimensional latent space supports contrastive, partially unsupervised cross-modal alignment with textual bulletins and reports, enabling bias correction and zero-shot classification of previously unseen events.
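The abstract does not specify the contrastive objective used for the cross-modal alignment. As a hedged illustration only, the sketch below assumes a symmetric InfoNCE loss over paired field and text embeddings, followed by nearest-label zero-shot classification in the shared space. The function names, the `temperature` value, and the label names in the usage note are hypothetical.

```python
import math


def _normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _diag_cross_entropy(logits):
    """Mean of -log softmax(row_i)[i]: each row's matching column is the target."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row maximum for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_info_nce(field_emb, text_emb, temperature=0.1):
    """Assumed objective: average of field-to-text and text-to-field InfoNCE,
    where row i of field_emb is paired with row i of text_emb."""
    f = [_normalize(v) for v in field_emb]
    t = [_normalize(v) for v in text_emb]
    logits = [[_dot(fi, tj) / temperature for tj in t] for fi in f]
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (_diag_cross_entropy(logits) + _diag_cross_entropy(logits_t))


def zero_shot_label(field_vec, label_embs):
    """Return the label whose text embedding has the highest cosine similarity
    with the field embedding; label_embs maps label name -> embedding."""
    f = _normalize(field_vec)
    return max(label_embs, key=lambda name: _dot(f, _normalize(label_embs[name])))
```

For example, with hypothetical label embeddings `{"earthquake": [1.0, 0.0], "quarry blast": [0.0, 1.0]}`, calling `zero_shot_label([0.9, 0.1], ...)` selects `"earthquake"`, since that label's embedding is closest in cosine similarity.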
