Hyperbolic State-Space Models for Hierarchical Representation Learning
Selective state-space models excel at long-sequence modeling, but their capacity for language representation, particularly in complex hierarchical reasoning, remains underexplored. Most large language models rely on flat Euclidean embeddings, which limits their ability to capture latent hierarchies. To address this, we propose Hierarchical Mamba (HiM), which integrates the efficient Mamba2 architecture with hyperbolic geometry to learn hierarchy-aware language embeddings for deeper linguistic understanding. Mamba2-processed sequences are projected onto the Poincaré ball or the Lorentzian manifold with learnable curvature and optimized with a hyperbolic loss. HiM captures relational distances across varying hierarchical levels, enabling effective long-range reasoning for tasks such as mixed-hop prediction and multi-hop inference in hierarchical classification. Experimental results show that both HiM variants effectively capture hierarchical relationships across four linguistic and medical datasets, surpassing Euclidean baselines: HiM-Poincaré provides fine-grained distinctions with higher h-norms, while HiM-Lorentz offers more stable, compact, and hierarchy-preserving embeddings that favor robustness. Our code is publicly available at https://github.com/BerryByte/HiM. By unifying selective state-space modeling with hyperbolic representation learning, HiM offers a scalable paradigm for hierarchy-aware language modeling, with implications for hierarchical reasoning in scientific text, biomedical knowledge discovery, and next-generation long-context LLMs.
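To make the geometric step concrete, the following is a minimal sketch of how a Euclidean sequence embedding could be placed on each of the two manifolds named above. The abstract does not specify the projection, so the exponential map at the origin, the hyperboloid lift, and every function name here (`expmap0`, `poincare_dist`, `h_norm`, `lorentz_lift`, `lorentz_dist`) are illustrative assumptions rather than the HiM implementation. The curvature parameter `c` (manifold curvature `-c`) is a fixed float in this sketch, whereas the abstract describes it as learnable.

```python
import math

def expmap0(v, c=1.0):
    """Exponential map at the origin: tangent vector v -> Poincare ball (curvature -c)."""
    norm = math.sqrt(sum(a * a for a in v))
    if norm == 0.0:
        return list(v)
    scale = math.tanh(math.sqrt(c) * norm) / (math.sqrt(c) * norm)
    return [scale * a for a in v]

def poincare_dist(x, y, c=1.0):
    """Geodesic distance between two points inside the Poincare ball."""
    sq_x = sum(a * a for a in x)
    sq_y = sum(a * a for a in y)
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    arg = 1.0 + 2.0 * c * sq_diff / ((1.0 - c * sq_x) * (1.0 - c * sq_y))
    return math.acosh(arg) / math.sqrt(c)

def h_norm(x, c=1.0):
    """Hyperbolic norm: geodesic distance from the origin (assumed meaning of 'h-norm')."""
    return poincare_dist([0.0] * len(x), x, c)

def lorentz_lift(v, c=1.0):
    """Lift a Euclidean vector onto the hyperboloid <x, x>_L = -1/c by solving for x0."""
    x0 = math.sqrt(1.0 / c + sum(a * a for a in v))
    return [x0] + list(v)

def lorentz_inner(x, y):
    """Lorentzian inner product: negative sign on the time-like coordinate."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def lorentz_dist(x, y, c=1.0):
    """Geodesic distance on the hyperboloid; the clamp guards against rounding below 1."""
    return math.acosh(max(1.0, -c * lorentz_inner(x, y))) / math.sqrt(c)

# Hypothetical embeddings: a general concept near the origin, a specific one farther out.
parent = expmap0([0.1, 0.0])
child = expmap0([0.8, 0.3])
print(h_norm(parent) < h_norm(child))
```

Under this convention a point's hyperbolic norm grows with its depth in the hierarchy, which is one way to read the "higher h-norms" reported for HiM-Poincaré. The Lorentz form avoids the division by `1 - c * ||x||^2` that becomes ill-conditioned near the ball's boundary, a commonly cited reason for its better numerical stability.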
