AI-Driven Density-Based Clustering for Large-Scale GPS Data Analysis

  • Chehoudi, Eya (UQAT)
  • Ahabchane, Chahid (UQAT)
  • Mrad, Hatem (UQAT)


The sheer volume of Global Positioning System (GPS) data poses a significant challenge for scientific machine learning in the age of high-frequency spatio-temporal sensing. Although these datasets offer unprecedented opportunities for modeling cities, raw trajectories are often corrupted by sensor noise, spatial drift, and excessive redundancy. These artifacts call for sophisticated data and dimensionality reduction techniques that convert raw measurements into organized, high-fidelity representations suitable for subsequent computational analysis. This paper presents a data-driven framework that uses density-based clustering as an unsupervised learning method to identify and exploit low-dimensional features in large-scale trajectory datasets. Density-based methods can handle the non-convex manifolds and varying sample densities that are common in real-world mobility data, which lets them identify important spatial features while discarding random noise. Our method serves as a robust data compression pipeline that turns dense point clouds into representative topological structures. This reduces the dimensionality of the data without losing important physical or spatial information. By preserving these structural features, the framework enables more computationally efficient and interpretable workflows for trajectory reconstruction and operational behavior analysis. Our approach demonstrates that unsupervised learning is an important part of data preprocessing and reduction, and it aligns with the demand for scalable, feature-preserving methods in modern computational research and engineering.

In addition, the proposed framework is designed to be generic and adaptable to various large-scale spatio-temporal datasets beyond mobility analysis, including environmental sensing, urban monitoring, and infrastructure analytics. The methodology can be integrated into existing data processing pipelines without requiring prior knowledge about data distributions. This flexibility makes the approach suitable for real-world applications where data characteristics may vary significantly across time and space. By improving data quality, the framework contributes to more reliable downstream learning, modeling, and decision-support systems in data-intensive scientific and engineering domains.
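
The abstract does not name a specific algorithm or parameter values, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes DBSCAN as the density-based method, a haversine distance in metres, and centroid-plus-count summaries as the compressed representation. The coordinates, the 20 m radius `eps_m`, and the threshold `min_pts` are made-up values chosen for the example.

```python
import math
from collections import deque

def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(a))

def dbscan(points, eps_m, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise.

    Brute-force O(n^2) neighbour search; a spatial index would be needed
    for genuinely large datasets.
    """
    n = len(points)
    # Each neighbourhood includes the point itself (distance 0).
    neighbours = [
        [j for j in range(n) if haversine_m(points[i], points[j]) <= eps_m]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(neighbours[i])
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # j is a core point: keep expanding
    return labels

def compress(points, labels):
    """Summarise each cluster as (mean_lat, mean_lon, count); noise is dropped."""
    sums = {}
    for (lat, lon), lab in zip(points, labels):
        if lab == -1:
            continue
        s = sums.setdefault(lab, [0.0, 0.0, 0])
        s[0] += lat
        s[1] += lon
        s[2] += 1
    return {lab: (s[0] / s[2], s[1] / s[2], s[2]) for lab, s in sums.items()}

# Made-up GPS fixes: two dense groups and one isolated fix between them.
points = [
    (48.2300, -79.0200), (48.2301, -79.0200),
    (48.2300, -79.0201), (48.2301, -79.0201),
    (48.2350, -79.0250),
    (48.2400, -79.0300), (48.2401, -79.0300),
    (48.2400, -79.0301), (48.2401, -79.0301),
]
labels = dbscan(points, eps_m=20.0, min_pts=3)
summary = compress(points, labels)
print(labels)
print(summary)
```

In this toy run the nine fixes reduce to two cluster summaries and the isolated fix is treated as noise. A simple mean of latitude and longitude is adequate only for clusters spanning a few metres; a real pipeline would likely keep richer structure than a single centroid per cluster.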