Polymer Inverse Design in Extreme Low-Data Regimes via Graph-Grammar Machine Learning
Traditional polymer discovery relies on trial-and-error experimentation, which is time-consuming and inefficient. Although machine learning (ML) approaches have accelerated materials discovery, state-of-the-art methods remain highly data-intensive, typically requiring training datasets on the order of 10^4–10^8 samples, an amount that is often impractical to obtain through experiments or simulations. We address a fundamental challenge: how can polymer molecules with target physical properties be reliably designed in extreme low-data regimes (e.g., fewer than 100 data points)? We introduce the Matrix-based Chemistry-Constrained Graph Grammar (MCG) framework, a zero-shot, non-parametric inverse-design approach that requires no pretraining or fine-tuning. MCG operates on graph-grammar–based representations of polymer molecules [1], which are inherently data-efficient for ML-driven molecular design. Existing graph-grammar approaches [1] often fail to enforce essential chemical constraints, producing unrealistic structures such as broken rings or missing functional groups. In contrast, MCG explicitly integrates domain-specific chemical knowledge into both grammar extraction and molecule generation. This integration yields chemical validity, class specificity, and synthesizability by construction. When coupled with Monte Carlo Tree Search and a lightweight property predictor trained on minimal data, MCG enables conditional inverse design of polymers with targeted properties. We benchmark MCG across diverse polymer classes, including acrylates, self-healing vitrimers, high-temperature polyimides, and covalent organic frameworks. MCG accurately designs polymers with targeted properties using as few as 100 data points, without any pretraining on large external datasets. Compared to state-of-the-art methods, the framework further achieves substantially improved synthesizability scores, high structural diversity, superior data efficiency, and faster computation.
Selected MCG-designed polymers are currently being pursued for synthesis, with preliminary assessments by chemists indicating experimental feasibility. Together, these results demonstrate that MCG enables chemically valid, property-targeted, and experimentally actionable polymer inverse design in extreme low-data regimes.
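
As an illustration of how a constrained grammar, a property predictor, and Monte Carlo Tree Search can be combined for property-targeted generation, the following is a minimal sketch and not the MCG implementation. It makes several simplifying assumptions: the grammar is a toy adjacency table over three abstract building blocks rather than a graph grammar extracted from polymer data, the property predictor is a hypothetical linear sum of per-block contributions, and all names (`RULES`, `WEIGHTS`, `predict`, `search`) are invented for this example.

```python
import math
import random

# Hypothetical toy grammar: each symbol lists the symbols allowed to follow it.
# "END" closes a derivation. These rules stand in for chemistry constraints.
RULES = {
    "START": ["A", "B"],
    "A": ["B", "C", "END"],
    "B": ["A", "C", "END"],
    "C": ["A", "END"],
}
# Hypothetical per-block contributions for a stand-in property predictor.
WEIGHTS = {"A": 1.0, "B": 2.5, "C": 4.0}
MAX_BLOCKS = 6


def predict(blocks):
    """Stand-in property predictor: sum of block contributions."""
    return sum(WEIGHTS[b] for b in blocks)


def actions(state):
    """Grammar-permitted next symbols for a partial derivation."""
    if state and state[-1] == "END":
        return []  # derivation is complete
    if len(state) >= MAX_BLOCKS:
        return ["END"]  # size limit reached, force termination
    return RULES[state[-1] if state else "START"]


def reward(state, target):
    """Reward in (0, 1], highest when the predicted property hits the target."""
    return 1.0 / (1.0 + abs(predict(state[:-1]) - target))


class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.total = {}, 0, 0.0


def search(target, iterations=2000, seed=0, c=1.4):
    """UCT-style tree search over grammar derivations; returns (blocks, value)."""
    rng = random.Random(seed)
    root = Node(())
    best_state, best_reward = None, -1.0
    for _ in range(iterations):
        node = root
        # Selection: follow the UCB1-maximizing child while fully expanded.
        while actions(node.state) and len(node.children) == len(actions(node.state)):
            parent = node
            node = max(
                parent.children.values(),
                key=lambda ch: ch.total / ch.visits
                + c * math.sqrt(math.log(parent.visits) / ch.visits),
            )
        # Expansion: add one untried grammar-permitted child, if any.
        untried = [a for a in actions(node.state) if a not in node.children]
        if untried:
            a = rng.choice(untried)
            node.children[a] = Node(node.state + (a,), node)
            node = node.children[a]
        # Simulation: random grammar-constrained completion of the derivation.
        state = node.state
        while actions(state):
            state = state + (rng.choice(actions(state)),)
        r = reward(state, target)
        if r > best_reward:
            best_state, best_reward = state, r
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.total += r
            node = node.parent
    blocks = list(best_state[:-1])  # drop the trailing "END"
    return blocks, predict(blocks)


if __name__ == "__main__":
    blocks, value = search(target=7.5)
    print(blocks, value)
```

Because every expansion and rollout step draws only from `actions`, each candidate the search returns satisfies the toy adjacency rules by construction; the search only decides which valid derivation best matches the target value. In a real setting the predictor would be fit to the available data and the rules would encode actual chemical constraints.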
