WCCM ECCOMAS 2026

Evaluating Large Language Models as Optimizers for Engineering Design and Control

Starostina, Alexandra (Helmholtz Center Hereon)
Bali, Kartik (Helmholtz Center Hereon)
Hermann, Alexander (Technical University Hamburg)
Shojaei, Arman (Helmholtz Center Hereon)
Nair, Raman (Technical University Hamburg)
Cyron, Christian (Technical University Hamburg)
Linka, Kevin (Technical University Hamburg)
Aydin, Roland (Technical University Hamburg)

In session: MS090F - Machine Learning For Computational Mechanics Across Scales VI

This study tests whether large language models (LLMs) can serve as reliable, lightweight optimizers for engineering design and control when paired with physics-based solvers. We adopt a simple propose–evaluate loop: the LLM proposes bounded candidates; an external solver returns objective and constraint values; and the accumulating interaction history guides subsequent proposals. The solver is treated as a black box, and no gradient information is required. We compare the LLM-driven approach against standard numerical baselines (e.g., L-BFGS-B with line search) under identical bounds, stopping rules, and evaluation budgets. Optimization is performed across a heterogeneous set of problems, including synthetic optimization benchmarks and representative engineering applications. Efficiency is measured by the number of solver calls, while accuracy and feasibility are summarized via medians and interquartile ranges across repeated runs with fixed random seeds. We additionally study prompt design through ablations that add or remove lightweight domain guidance (such as explicit constraint formatting, admissible parameter relations, and trend cues), and we evaluate a simple hybrid strategy that initializes a conventional optimizer from the LLM’s final proposal. Across the considered problems, the LLM consistently produces feasible, high-quality solutions and often requires fewer evaluations to reach practically relevant accuracy levels, while remaining competitive in terms of final solution quality. These effects are particularly visible for noisy or non-smooth objectives, where gradient-based methods can be fragile. We discuss limitations, including stochastic variability and the absence of formal convergence guarantees, and outline an evaluation framework aimed at reproducible, budget-aware benchmarking.
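
The propose–evaluate loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no code, prompts, or model details. Everything named here is an assumption made for illustration: `toy_solver` (a stand-in for the physics-based solver, with a made-up constraint), `mock_propose` (a random perturbation that merely occupies the place of the LLM call), and the helper names `propose_evaluate`, `best_feasible`, and `summarize`. The sketch shows only the structure: bounded proposals, black-box evaluation, an accumulating history, a fixed evaluation budget, and a median/interquartile-range summary over seeded runs.

```python
import random
import statistics


def toy_solver(x):
    """Hypothetical black-box solver: returns (objective, constraint violation).

    The objective is a sphere function; the constraint sum(x) >= 1 is invented
    for illustration. A violation of 0.0 means the candidate is feasible.
    """
    objective = sum(xi * xi for xi in x)
    violation = max(0.0, 1.0 - sum(x))
    return objective, violation


def mock_propose(history, bounds, rng):
    """Placeholder for the LLM call (assumption: NOT a language model).

    It perturbs the best candidate seen so far. A real setup would instead
    format `history` and `bounds` into a prompt and parse the model's reply.
    """
    if not history:
        return [rng.uniform(lo, hi) for lo, hi in bounds]
    best = min(history, key=lambda h: (h["violation"], h["objective"]))
    return [xi + rng.gauss(0.0, 0.1 * (hi - lo))
            for xi, (lo, hi) in zip(best["x"], bounds)]


def propose_evaluate(solver, propose, bounds, budget, seed):
    """Run the loop for exactly `budget` solver calls and return the history."""
    rng = random.Random(seed)  # fixed seed for repeatable runs
    history = []
    for _ in range(budget):
        candidate = propose(history, bounds, rng)
        # Clip to the bounds so every evaluated candidate is admissible.
        x = [min(hi, max(lo, xi)) for xi, (lo, hi) in zip(candidate, bounds)]
        objective, violation = solver(x)  # no gradients are requested
        history.append({"x": x, "objective": objective, "violation": violation})
    return history


def best_feasible(history):
    """Return the feasible entry with the lowest objective, or None."""
    feasible = [h for h in history if h["violation"] == 0.0]
    return min(feasible, key=lambda h: h["objective"]) if feasible else None


def summarize(values):
    """Median and interquartile range of a list with at least two values."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return statistics.median(values), q3 - q1


if __name__ == "__main__":
    bounds = [(-2.0, 2.0), (-2.0, 2.0)]
    results = []
    for seed in range(5):
        run = propose_evaluate(toy_solver, mock_propose, bounds, budget=40, seed=seed)
        best = best_feasible(run)
        if best is not None:
            results.append(best["objective"])
    if len(results) >= 2:
        median, iqr = summarize(results)
        print(f"feasible runs: {len(results)}, median: {median:.4f}, IQR: {iqr:.4f}")
```

Because the budget is the loop bound, the number of solver calls is identical for any proposer plugged into `propose_evaluate`, which is what makes a comparison against a baseline under the same evaluation budget straightforward in a setup like this.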