Proposal of a data augmentation method for small-sample quantitative data
Deep learning’s reliance on big data often leads to overfitting in small-sample contexts. While image augmentation is well established, applying techniques like SMOTE[1] to quantitative data remains problematic because of the risk of introducing noise. This study proposes a parametric data augmentation method that aims to improve generalization by preserving the underlying statistical structure of small-sample datasets. The procedure begins with Min-Max normalization, which scales each feature to the [0,1] range. We then estimate the class-specific mean and variance of each feature, assuming a Gaussian distribution. By sampling from the Gaussian distributions defined by these parameters, the method generates synthetic data that inherits the statistical structure of the original dataset, enabling robust training even with limited samples. Evaluations on the Iris dataset[2] and a medical rat feeding dataset[3] (Raman spectra regression) show that our method significantly outperformed both simple oversampling and baseline models trained without augmentation. By expanding the representational range while preserving the distribution, the proposed approach achieved superior predictive accuracy. These results demonstrate the method’s effectiveness in data-scarce domains where deep learning has previously been difficult to apply.
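The three steps described above (Min-Max normalization, class-specific estimation of mean and variance, and Gaussian sampling) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `min_max_normalize` and `augment_gaussian` are hypothetical, features are sampled independently of one another, the sample variance uses `ddof=1`, and synthetic values are clipped back to [0,1]. None of these choices is specified in the abstract, and the abstract does not say how targets are handled in the regression setting, so the sketch covers only class-labelled data.

```python
import numpy as np


def min_max_normalize(X):
    """Scale each feature (column) of X to the [0, 1] range."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    hi = X.max(axis=0)
    # Guard against constant features to avoid division by zero.
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span


def augment_gaussian(X, y, n_per_class, seed=0, clip=True):
    """Draw synthetic samples per class from per-feature Gaussians.

    Assumptions (not stated in the abstract): features are treated as
    independent, the standard deviation uses ddof=1, and samples are
    optionally clipped to [0, 1] to stay in the normalized range.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_features = X.shape[1]

    X_new, y_new = [], []
    for label in np.unique(y):
        Xc = X[y == label]
        mu = Xc.mean(axis=0)
        # A single-sample class has no variance estimate; fall back to zero.
        sigma = Xc.std(axis=0, ddof=1) if len(Xc) > 1 else np.zeros(n_features)
        samples = rng.normal(mu, sigma, size=(n_per_class, n_features))
        if clip:
            samples = np.clip(samples, 0.0, 1.0)
        X_new.append(samples)
        y_new.append(np.full(n_per_class, label))

    return np.vstack(X_new), np.concatenate(y_new)
```

In use, the normalized original data and the synthetic samples would typically be concatenated to form the training set, for example `X_train = np.vstack([X_norm, X_syn])`. The normalization bounds should be computed on the training split only, so that no information from held-out data leaks into the augmentation.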
