WCCM ECCOMAS 2026

Topology-Aware Multi-Agent Reinforcement Learning for Scalable Infrastructure Management via Zero-Shot Graph Transfer

Arcieri, Giacomo (ETH Zurich)
Duthé, Gregory (ETH Zurich)
Kalathiparambil Kennedy, Lidiya (ETH Zurich)
Boggia, Luca (ETH Zurich)
Papakonstantinou, Konstantinos G (The Pennsylvania State University)
Straub, Daniel (Technical University of Munich)
Chatzi, Eleni (ETH Zurich)

In session: MS139B - Optimization under Uncertainty II


Managing large-scale infrastructure systems, such as railway networks, wind farms, and traffic grids, presents a complex sequential decision-making challenge. The interconnected nature of these systems introduces critical dependencies, where local actions propagate through the network topology, influencing global performance and costs. While Multi-Agent Reinforcement Learning (MARL) offers a favorable framework for optimizing such decentralized systems, standard approaches suffer from the curse of dimensionality, rendering training computationally intractable for real-world networks comprising thousands of components. To address this bottleneck, we propose a scalable framework based on topology-aware MARL agents. By leveraging graph neural networks and graph transformers, our agents learn policies conditioned on local topological features rather than fixed agent identities. By design, this architectural choice unlocks a powerful capability: zero-shot transfer learning. We demonstrate that agents trained on small, computationally manageable graph patches can be deployed directly onto large-scale, unseen networks without retraining. This work benchmarks the proposed approach across two distinct infrastructure management domains: (i) railway maintenance planning, utilizing real-world data to optimize repairs under spatial deterioration correlations and economies of scale; and (ii) offshore wind farm maintenance, optimizing intervention logistics under aerodynamic wake effects. Results indicate that topology-aware agents significantly outperform decentralized heuristics and standard MARL baselines, successfully coordinating actions to exploit system-level efficiencies while reducing training time.
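To illustrate why conditioning on local topological features permits zero-shot transfer, the following minimal sketch applies one shared set of weights to graphs of different sizes. It is not the authors' implementation: the single round of mean-aggregation message passing, the two-dimensional node features, the three-action space, the line-graph topology, and all names (`make_params`, `policy`, `line_graph`) are assumptions chosen for brevity, and the weights are random rather than trained.

```python
import math
import random


def make_params(feat_dim, hidden_dim, n_actions, seed=0):
    """Random weights shared by every node (agent); stands in for a trained policy."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "w_self": mat(feat_dim, hidden_dim),   # transforms a node's own features
        "w_nbr": mat(feat_dim, hidden_dim),    # transforms the neighbour average
        "w_out": mat(hidden_dim, n_actions),   # maps hidden state to action logits
    }


def matvec(vec, mat):
    """Row vector times matrix."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


def policy(params, features, adjacency):
    """One round of mean-aggregation message passing, then a softmax per node.

    features:  list of per-node feature vectors (hypothetical, e.g. condition state)
    adjacency: dict mapping node index -> list of neighbour indices
    Returns one action-probability vector per node.
    """
    probs = []
    for i, x in enumerate(features):
        nbrs = adjacency.get(i, [])
        if nbrs:
            mean_nbr = [sum(features[j][k] for j in nbrs) / len(nbrs)
                        for k in range(len(x))]
        else:
            mean_nbr = [0.0] * len(x)
        own = matvec(x, params["w_self"])
        nbr = matvec(mean_nbr, params["w_nbr"])
        hidden = [math.tanh(a + b) for a, b in zip(own, nbr)]
        logits = matvec(hidden, params["w_out"])
        top = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs


def line_graph(n):
    """Hypothetical topology: n components connected in a chain."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


# The same parameters run on a 5-node patch and on a 500-node network.
params = make_params(feat_dim=2, hidden_dim=4, n_actions=3)
small = policy(params, [[0.1, 1.0]] * 5, line_graph(5))
large = policy(params, [[0.1, 1.0]] * 500, line_graph(500))
```

Because the weights depend only on the feature dimension and not on the number of nodes or on any agent index, nothing needs to be resized or retrained when the graph grows. In this sketch, two nodes with identical local neighbourhoods receive identical action distributions regardless of which graph they belong to.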