Abstract:
The popularity of the Android platform in smartphones and other Internet-of-Things devices has led to an explosive growth of malware attacks against it. Malware poses a serious threat to the security of these devices and the services they provide, e.g., by stealing privacy-sensitive data stored on mobile devices. This work proposes SEDMDroid, a stacking ensemble framework for identifying Android malware. Specifically, to ensure diversity among the individual learners, it uses random feature subspaces and bootstrap sampling to generate subsets, and runs Principal Component Analysis (PCA) on each subset. Accuracy is preserved by keeping all principal components and training each base learner, a Multi-Layer Perceptron (MLP), on the whole dataset. A Support Vector Machine (SVM) is then employed as the fusion classifier to learn the implicit complementary information in the outputs of the ensemble members and produce the final prediction. We report experimental results on two separate datasets collected through static analysis to demonstrate the effectiveness of SEDMDroid. The first dataset uses features widely exploited by Android malware, such as permissions, sensitive APIs, and monitored system events; on these multi-level static features, SEDMDroid achieves 89.07% accuracy. The second, a large public dataset, uses sensitive data-flow information as features, and SEDMDroid achieves an average accuracy of 94.92%. These promising results indicate that the proposed method is an effective way to identify Android malware.
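The pipeline described above (random feature subspaces plus bootstrap samples feeding PCA, all components kept, MLP base learners trained on the whole dataset, SVM fusion) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes scikit-learn and binary labels (0 = benign, 1 = malware). The helper names (`fit_rotation`, `make_mlp`, `fit_sedm_sketch`, `predict_sedm_sketch`), the MLP size, the number of feature subsets, the RBF kernel, and the out-of-fold stacking scheme are all hypothetical choices made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedKFold
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC


def fit_rotation(X, n_subsets, rng):
    """Hypothetical helper: split features into random disjoint subsets, fit PCA
    on a bootstrap sample of each subset, and keep ALL components, giving an
    orthogonal (permuted block-diagonal) rotation matrix."""
    n_samples, n_features = X.shape
    subsets = np.array_split(rng.permutation(n_features), n_subsets)
    R = np.zeros((n_features, n_features))
    for cols in subsets:
        boot = rng.integers(0, n_samples, size=n_samples)  # bootstrap rows
        pca = PCA(n_components=len(cols), svd_solver="full")
        pca.fit(X[np.ix_(boot, cols)])
        R[np.ix_(cols, cols)] = pca.components_.T  # keep every principal component
    return R


def make_mlp(seed):
    # Assumed base-learner configuration, chosen only for illustration.
    return MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=seed)


def fit_sedm_sketch(X, y, n_members=5, n_subsets=3, n_folds=5, seed=0):
    rng = np.random.default_rng(seed)
    # Simplification: rotations are fit once, before the folds are drawn.
    rotations = [fit_rotation(X, n_subsets, rng) for _ in range(n_members)]
    # Out-of-fold member probabilities become the SVM's meta-features
    # (an assumed stacking scheme to avoid training the fuser on fitted outputs).
    meta = np.zeros((len(X), n_members))
    folds = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    for tr, te in folds.split(X, y):
        for m, R in enumerate(rotations):
            mlp = make_mlp(seed + m).fit(X[tr] @ R, y[tr])
            meta[te, m] = mlp.predict_proba(X[te] @ R)[:, 1]
    # Each member is refit on the whole (rotated) dataset for prediction time.
    members = [make_mlp(seed + m).fit(X @ R, y) for m, R in enumerate(rotations)]
    fusion = SVC(kernel="rbf").fit(meta, y)
    return rotations, members, fusion


def predict_sedm_sketch(model, X):
    rotations, members, fusion = model
    meta = np.column_stack(
        [mlp.predict_proba(X @ R)[:, 1] for R, mlp in zip(rotations, members)]
    )
    return fusion.predict(meta)
```

Because every principal component is kept, each rotation only re-expresses the features rather than discarding any, which is how the abstract reconciles diversity (different subsets and bootstrap samples per member) with accuracy. In a real experiment the inputs would be the static features described above (e.g. permission or data-flow indicators) instead of synthetic data.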