A Framework for Sketch-Based Generation and Validation of Multibody Models Using LLMs

  • Pieber, Michael (University of Innsbruck)
  • Manzl, Peter (University of Innsbruck)
  • Möltner, Tobias (University of Innsbruck)
  • Weyrer, Sebastian (University of Innsbruck)
  • Humer, Alexander (Johannes Kepler University Linz)
  • Gerstmayr, Johannes (University of Innsbruck)


Large Language Models (LLMs) have enabled new applications beyond conversational AI, including the generation of simulation models. While LLMs can generally generate code from sketches or descriptions, validating the physical correctness of the resulting multibody models remains an open challenge. The novelty of this work lies in a validation-driven, multi-stage framework that enables the reliable generation and assessment of physically correct multibody models from sketches, facilitated by parameterized reference systems. The proposed pipeline first employs a Vision-Language Model (VLM) to parse sketch images and identify candidate mechanical elements. In a second step, the VLM combines the sketch input with the detected items to construct a structured Intermediate Representation (IR) that captures physically relevant parameters and interconnections. Code generation then proceeds in two further steps using a coding-specialized LLM. First, the IR is used to systematically select the required EXUDYN simulation elements (e.g., point masses, springs, joints, gravity) from a predefined library with associated documentation and usage examples. This differs from standard Retrieval-Augmented Generation (RAG), as the context for each element is predefined rather than retrieved dynamically from a document corpus. Second, the IR and the selected simulation elements are translated into executable Python code for the EXUDYN multibody simulation framework. Validation covers both executability and physical correctness: generated models are executed and compared numerically against ground-truth reference implementations. Additionally, sketches can be rendered from generated models, enabling visual verification of initial configurations by the same VLM. To enable systematic evaluation, we automatically generate thousands of variants from parameterized ground-truth models by varying the model parameters shown in the sketches and applying image transformations.
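The abstract does not specify the IR schema or the library format, so the following is only a minimal sketch of how an IR and the predefined-context element selection might look. The class names, the field layout, the reserved name "ground", and the contents of `ELEMENT_LIBRARY` are all hypothetical. The point it illustrates is the contrast with RAG: the context is looked up by element kind in a fixed table rather than found by a search over documents.

```python
from dataclasses import dataclass, field


# Hypothetical IR schema: the abstract only states that the IR captures
# physically relevant parameters and interconnections.
@dataclass
class Element:
    name: str                                   # e.g. "mass1"
    kind: str                                   # e.g. "point_mass", "spring"
    params: dict = field(default_factory=dict)  # physical parameters


@dataclass
class Connection:
    element: str   # name of the connector element (e.g. a spring)
    bodies: tuple  # names of the two items it connects


@dataclass
class IntermediateRepresentation:
    elements: list
    connections: list


# Hypothetical predefined library: each element kind maps to a fixed
# documentation/usage snippet, so no similarity search is involved.
ELEMENT_LIBRARY = {
    "point_mass": "<documentation and usage example for point masses>",
    "spring": "<documentation and usage example for springs>",
    "joint": "<documentation and usage example for joints>",
    "gravity": "<documentation and usage example for gravity>",
}


def select_context(ir):
    """Return the predefined context for every element kind used in the IR."""
    kinds = sorted({e.kind for e in ir.elements})
    missing = [k for k in kinds if k not in ELEMENT_LIBRARY]
    if missing:
        raise KeyError(f"no library entry for: {missing}")
    return {k: ELEMENT_LIBRARY[k] for k in kinds}


# Example IR for a single mass hanging on a spring under gravity.
# "ground" is a hypothetical reserved name for the fixed reference frame.
ir = IntermediateRepresentation(
    elements=[
        Element("mass1", "point_mass", {"mass": 1.0, "position": (0.0, -1.0, 0.0)}),
        Element("spring1", "spring", {"stiffness": 100.0, "damping": 0.5}),
        Element("g", "gravity", {"vector": (0.0, -9.81, 0.0)}),
    ],
    connections=[Connection("spring1", ("ground", "mass1"))],
)
context = select_context(ir)
print(sorted(context))  # ['gravity', 'point_mass', 'spring']
```

Because the lookup is deterministic, the same IR always yields the same context for the code-generating LLM, and an unknown element kind fails loudly instead of silently pulling in loosely related documentation.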
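The numerical comparison against ground-truth references can be illustrated with a small sketch. In the described pipeline both models would be executed in EXUDYN; here a hand-written semi-implicit Euler integrator for a damped mass-spring oscillator stands in for the simulation, and the error metric (maximum deviation normalized by the reference amplitude) and the tolerance are assumptions chosen for illustration only.

```python
def simulate_mass_spring(mass, stiffness, damping, x0, t_end=1.0, h=1e-3):
    """Semi-implicit Euler integration of m*x'' + d*x' + k*x = 0 (stand-in for EXUDYN)."""
    n_steps = int(round(t_end / h))
    x, v = x0, 0.0
    trajectory = [x]
    for _ in range(n_steps):
        a = -(stiffness * x + damping * v) / mass
        v += h * a  # update velocity first ...
        x += h * v  # ... then position with the new velocity
        trajectory.append(x)
    return trajectory


def relative_error(candidate, reference):
    """Maximum absolute deviation, normalized by the reference amplitude."""
    if len(candidate) != len(reference):
        raise ValueError("trajectories must have equal length")
    scale = max(abs(r) for r in reference) or 1.0
    return max(abs(c - r) for c, r in zip(candidate, reference)) / scale


def is_physically_consistent(candidate, reference, tol=1e-6):
    """Accept a generated model only if it reproduces the reference within tol."""
    return relative_error(candidate, reference) <= tol


reference = simulate_mass_spring(mass=1.0, stiffness=100.0, damping=0.5, x0=0.1)
generated_ok = simulate_mass_spring(mass=1.0, stiffness=100.0, damping=0.5, x0=0.1)
generated_bad = simulate_mass_spring(mass=1.0, stiffness=120.0, damping=0.5, x0=0.1)  # wrong stiffness

print(is_physically_consistent(generated_ok, reference))   # True
print(is_physically_consistent(generated_bad, reference))  # False
```

A model that runs without errors but carries a wrong parameter (here a 20% stiffness error) passes the executability check yet fails this comparison, which is why the two criteria are evaluated separately.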
All experiments are conducted using open-weight models on local hardware, enabling reproducible, scalable generation of validated multibody datasets. Initial results demonstrate 100% executability on simple multibody problems, with future work targeting video-based verification for assessing dynamic behavior.
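Reproducible generation of large variant sets can be obtained by drawing all parameters from a seeded random generator. The sketch below assumes a hypothetical mass-spring template; the parameter ranges and the set of image transformations (rotation, scaling, mirroring) are illustrative assumptions, not the ones used in this work.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    mass: float          # physical parameter shown in the sketch
    stiffness: float     # physical parameter shown in the sketch
    rotation_deg: int    # image transformation applied to the rendered sketch
    scale: float         # image transformation applied to the rendered sketch
    mirrored: bool       # image transformation applied to the rendered sketch


def generate_variants(n, seed=0):
    """Draw n variants of a hypothetical mass-spring template from a seeded RNG."""
    rng = random.Random(seed)  # local generator: same seed -> same dataset
    variants = []
    for _ in range(n):
        variants.append(Variant(
            mass=round(rng.uniform(0.5, 5.0), 3),
            stiffness=round(rng.uniform(10.0, 500.0), 1),
            rotation_deg=rng.choice([0, 90, 180, 270]),
            scale=rng.uniform(0.8, 1.2),
            mirrored=rng.random() < 0.5,
        ))
    return variants


batch_a = generate_variants(1000, seed=42)
batch_b = generate_variants(1000, seed=42)
print(batch_a == batch_b)  # True
```

Using a local `random.Random` instance rather than the module-level generator keeps each dataset independent of any other code that consumes random numbers, so a stored seed is enough to regenerate the same variants and their ground-truth references.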