An Ensemble Model for Fake Online Review Detection Based on Data Resampling, Feature Pruning

ABSTRACT

With the proliferation of fake online reviews, their detection has become an active research
topic. Although existing studies have made progress on fake review detection, the issues of imbalanced
data and feature pruning have not received sufficient attention. To address these gaps, the present study proposes
an ensemble model for the detection of fake online reviews. The model consists of four steps, the first
three of which optimize the base classifiers: (i) Data resampling: We propose a novel way to
address the data imbalance problem by combining resampling with grid search. (ii) Feature
pruning: We use an ablation study to drop unimportant features. (iii) Parameter optimization: We apply
the grid search algorithm to determine suitable parameter values for each base classifier.
(iv) Classifier ensembling: We apply majority voting and stacking strategies to integrate the optimized base
classifiers. The proposed data resampling method is also applied to the meta-classifier in the stacking
ensemble model. The contribution of this study lies in combining these methods into a single
model. The results show that the proposed ensemble model outperforms some existing techniques,
thereby providing a new way to address the data imbalance and feature pruning issues in the field of fake
review detection.
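
Steps (i) and (iv) above can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the abstract does not say which resampling technique is used, so plain random oversampling stands in for it, and the names `random_oversample`, `grid_search_ratio`, `majority_vote`, and `score_fn` are all hypothetical. In practice `score_fn` would presumably be a cross-validated classifier score; here it is any callable that rates a resampled data set.

```python
import random
from collections import Counter


def random_oversample(samples, labels, ratio, seed=0):
    """Duplicate minority-class items until minority/majority reaches `ratio`.

    Assumes binary labels with both classes present (an assumption of this
    sketch, not a detail taken from the paper).
    """
    rng = random.Random(seed)
    (_, n_major), (minority, n_minor) = Counter(labels).most_common(2)
    target = int(round(ratio * n_major))
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    # Draw extra minority indices with replacement; never remove anything.
    extra = [rng.choice(minority_idx) for _ in range(max(0, target - n_minor))]
    idx = list(range(len(labels))) + extra
    return [samples[i] for i in idx], [labels[i] for i in idx]


def grid_search_ratio(samples, labels, ratios, score_fn):
    """Try each candidate ratio and keep the one with the highest score."""
    best_ratio, best_score = None, float("-inf")
    for ratio in ratios:
        resampled_x, resampled_y = random_oversample(samples, labels, ratio)
        score = score_fn(resampled_x, resampled_y)
        if score > best_score:
            best_ratio, best_score = ratio, score
    return best_ratio, best_score


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]
```

Treating the resampling ratio as one more quantity to search over, rather than fixing it at a balanced 1:1, is the idea this sketch tries to convey; with an odd number of base classifiers, `majority_vote` never has to break a tie.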