Differential Gene Expression Prediction by Ensemble Deep Networks on Histone Modification Data

Abstract:

Predicting differential gene expression (DGE) from histone modification (HM) signals is crucial for understanding how HMs control cell functional heterogeneity by influencing differential gene regulation. Most existing prediction methods represent HM signals with fixed-length bins and feed these bins into a single machine learning model to predict differentially expressed genes for a single cell type or a cell type pair. However, an inappropriate bin length may split an important HM segment and lead to information loss. Furthermore, the bias of a single learning model may limit prediction accuracy. To address these problems, we propose an Ensemble deep neural network framework for predicting Differential Gene Expression (EnDGE). EnDGE applies different feature extractors to input HM signal data binned at different lengths and fuses the resulting feature vectors for DGE prediction. Ensembling multiple learning models with different HM signal cutting strategies helps preserve the integrity and consistency of the genetic information in each signal segment and offsets the bias of individual models. In addition to popular feature extractors, we propose a new Residual Network based model with higher prediction accuracy to increase the diversity of the feature extractors. Experiments on real datasets from the Roadmap Epigenomics Project (REMC) show that, for all cell type pairs, EnDGE significantly outperforms state-of-the-art baselines for differential gene expression prediction.
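
The multi-bin-length idea can be illustrated with a minimal sketch. This is not the EnDGE implementation: the function names `bin_signal` and `multi_resolution_features`, the window of 10,000 positions, the five HM marks, and the bin lengths 50, 100 and 200 are all assumptions chosen for illustration. Flattening stands in for the learned feature extractors, and concatenation stands in for the fusion step.

```python
import numpy as np

def bin_signal(signal, bin_length):
    """Average a per-position signal of shape (n_marks, n_positions) into fixed-length bins."""
    n_marks, n_pos = signal.shape
    n_bins = n_pos // bin_length  # trailing positions that do not fill a bin are dropped
    trimmed = signal[:, : n_bins * bin_length]
    return trimmed.reshape(n_marks, n_bins, bin_length).mean(axis=2)

def multi_resolution_features(signal, bin_lengths=(50, 100, 200)):
    """Bin the same signal at several lengths and concatenate the flattened views.

    In an ensemble model each view would pass through its own feature extractor
    before fusion; flattening is only a placeholder for that step.
    """
    views = [bin_signal(signal, b).ravel() for b in bin_lengths]
    return np.concatenate(views)

# Synthetic read counts for 5 hypothetical HM marks over a 10,000-position window.
rng = np.random.default_rng(0)
hm = rng.poisson(2.0, size=(5, 10_000)).astype(float)
features = multi_resolution_features(hm)
```

With these assumed settings the three views contain 200, 100 and 50 bins per mark, so a segment that straddles a boundary at one bin length may fall entirely inside a bin at another.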