Sparse Machine Learning for Predicting Dry Matter Content in Food Hyperspectral Data

The richness of hyperspectral data, which captures both spectral and spatial information about the observed samples, often comes with challenges arising from high dimensionality and noise. Sparse machine learning techniques address these issues by retaining only the informative spectral bands, which reduces model complexity and improves generalization. This study evaluated two sparse regression methods, Ensemble Canonical Correlation Analysis (EnCCA) regression and Least Angle Regression (LARS), for predicting Dry Matter Content (DMC) from hyperspectral data in SpectroFood's Leek and Mushroom datasets. Each dataset was split into 80% training and 20% test data. Model performance was evaluated on the test set using the coefficient of determination (R²) and the Prediction Interval Coverage Probability (PICP) to measure accuracy and robustness under uncertainty, respectively. Compared to LARS, EnCCA regression achieved a higher R² with acceptable PICP on the Leek dataset (R²: 0.87, PICP: 69.97%) and higher scores on both metrics on the Mushroom dataset (R²: 0.71, PICP: 86.00%). Although LARS gave decent PICP scores, it did not surpass EnCCA regression's R² on either dataset. However, EnCCA selected a larger set of features than LARS, resulting in a slightly less interpretable and heavier regression model. These findings highlight EnCCA's potential application in the food industry, specifically as a tool to automate quality control processes and chemical analysis using hyperspectral imaging.
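The evaluation in the abstract relies on an 80/20 split and two test-set metrics. As a minimal sketch, the Python below implements an 80/20 index split, R² (1 − SS_res/SS_tot), and PICP (the percentage of test targets that fall inside their prediction intervals). The abstract does not say how the prediction intervals were built. The `gaussian_intervals` helper is therefore a hypothetical assumption: it produces symmetric intervals of ±z·σ around each prediction, with σ taken as a residual standard deviation. All function names are illustrative, and the EnCCA and LARS model fitting is not shown.

```python
import random
from statistics import NormalDist


def split_80_20(n_samples, seed=0):
    """Shuffle sample indices and return (train_idx, test_idx) for an 80/20 split."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    cut = int(round(0.8 * n_samples))
    return idx[:cut], idx[cut:]


def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def gaussian_intervals(y_pred, residual_std, coverage=0.95):
    """HYPOTHETICAL interval construction (not taken from the paper): assumes
    Gaussian residuals and returns symmetric intervals pred +/- z * residual_std."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    return [(p - z * residual_std, p + z * residual_std) for p in y_pred]


def picp(y_true, intervals):
    """Prediction Interval Coverage Probability, in percent: the share of true
    values lying inside their [lower, upper] interval."""
    hits = sum(1 for y, (lo, hi) in zip(y_true, intervals) if lo <= y <= hi)
    return 100.0 * hits / len(y_true)


if __name__ == "__main__":
    train_idx, test_idx = split_80_20(10)
    print(len(train_idx), len(test_idx))  # 8 2

    # Toy test-set targets and predictions (illustrative numbers, not paper data)
    y_true = [10.0, 12.0, 14.0, 16.0]
    y_pred = [10.5, 11.5, 14.5, 17.0]
    print(round(r2_score(y_true, y_pred), 4))  # 0.9125
    intervals = gaussian_intervals(y_pred, residual_std=0.3)
    print(picp(y_true, intervals))  # 75.0
```

The two metrics capture different properties. R² summarizes how close point predictions are to the targets. PICP only checks whether each target lands inside its interval. A model can therefore score well on one and poorly on the other, as the Leek results in the abstract illustrate.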
Authors:
Advendio Desandros, Matthew Martianus Henry, Alyssa Imani, Muhammad Rezki Rasyak, Mahmud Isnan, Bens Pardamean
2025 International Conference on Cybernetics and Intelligent Systems (ICORIS)