Automated Machine Learning Driven Stacked Ensemble Modeling for Forest Aboveground Biomass Prediction

Abstract:

Modeling and large-scale mapping of forest aboveground biomass (AGB) is complicated, challenging, and expensive. Forest characteristics vary considerably, so models differ in how well they perform and need comprehensive evaluation. Moreover, the human bias involved in modeling and evaluation affects how well models generalize at larger scales. In this article, we present an automated machine learning framework for modeling, evaluating, and stacking multiple base models for AGB prediction. We incorporate a hyperparameter optimization procedure that automatically extracts targeted features from multitemporal Sentinel-2 data, which minimizes human bias in the proposed modeling pipeline. We integrate two independent frameworks: one for automatic feature extraction and one for automatic model ensembling and evaluation. The results suggest that the extracted target-oriented features draw heavily on the red-edge and short-wave infrared regions of the spectrum. The feature importance scores indicate that summer-based features dominate those from other seasons. The automated ensembling and evaluation framework produced a stacked ensemble of base models that predicted forest AGB more accurately than any individual base model. The stacked ensemble delivered the best scores, R²cv = 0.71 and RMSE = 74.44 Mg ha⁻¹. The base models delivered R²cv and RMSE ranging between 0.38–0.66 and 81.27–109.44 Mg ha⁻¹, respectively. The evaluation metrics indicated that the stacked ensemble was more resistant to outliers and generalized better. Thus, the study demonstrates an effective automated modeling pipeline that predicts AGB with minimal human bias and is deployable over large and diverse forest areas.
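To make the idea of stacking concrete, the following is a minimal sketch of the general technique, not the automated framework described in the abstract. It assumes two hypothetical single-feature base models (simple linear regressions on made-up red-edge and short-wave infrared values), interleaved 3-fold out-of-fold predictions, and a linear meta-learner fitted by least squares. All data values and function names are illustrative assumptions. RMSE and R² follow their standard definitions, which may differ in detail from how the reported scores were computed.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error, in the units of y (here Mg/ha)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def fit_line(x, y):
    """Ordinary least squares for y ~ a + b*x with one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b

def out_of_fold(x, y, k):
    """Predict every sample from a line fitted on the other k-1 folds."""
    preds = [0.0] * len(x)
    for fold in range(k):
        train = [i for i in range(len(x)) if i % k != fold]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in range(fold, len(x), k):
            preds[i] = a + b * x[i]
    return preds

def fit_meta(p1, p2, y):
    """Least squares for y ~ c + w1*p1 + w2*p2 (assumes p1, p2 not collinear)."""
    n = len(y)
    m1, m2, my = sum(p1) / n, sum(p2) / n, sum(y) / n
    d1 = [v - m1 for v in p1]
    d2 = [v - m2 for v in p2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * t for a, t in zip(d1, dy))
    s2y = sum(b * t for b, t in zip(d2, dy))
    det = s11 * s22 - s12 * s12
    w1 = (s1y * s22 - s2y * s12) / det
    w2 = (s2y * s11 - s1y * s12) / det
    return my - w1 * m1 - w2 * m2, w1, w2

# Hypothetical toy data: two reflectance features and plot-level AGB (Mg/ha).
red_edge = [0.21, 0.25, 0.30, 0.34, 0.38, 0.41, 0.45, 0.50, 0.53, 0.58, 0.62, 0.66]
swir = [0.35, 0.31, 0.33, 0.27, 0.29, 0.22, 0.24, 0.19, 0.20, 0.15, 0.16, 0.11]
agb = [60, 95, 110, 150, 140, 190, 200, 240, 230, 290, 280, 330]

# Level 0: out-of-fold predictions from each base model.
oof_red_edge = out_of_fold(red_edge, agb, k=3)
oof_swir = out_of_fold(swir, agb, k=3)

# Level 1: the meta-learner combines the base predictions.
c, w1, w2 = fit_meta(oof_red_edge, oof_swir, agb)
stacked = [c + w1 * a + w2 * b for a, b in zip(oof_red_edge, oof_swir)]

for name, pred in [("red-edge", oof_red_edge), ("SWIR", oof_swir), ("stacked", stacked)]:
    print(f"{name}: R2 = {r2(agb, pred):.3f}, RMSE = {rmse(agb, pred):.2f}")
```

Training the meta-learner on out-of-fold predictions keeps it from rewarding base models that merely memorized their training samples. Note that the stacked score printed here is measured on the same samples the meta-learner was fitted on, so it is optimistic; an unbiased estimate would need an additional outer cross-validation loop or a held-out set.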