A Comprehensive Analysis of Machine Learning Approaches With Fingerprint Amalgamation and Data Balan

Abstract:

Computational drug repurposing is an efficient way to use existing knowledge about drugs to understand and predict their effects on neurological diseases. The ability of a molecule to cross the blood-brain barrier (BBB) is a primary criterion for effective therapy. Accurate predictions from machine learning models can therefore identify drug candidates that could be repurposed for neurological conditions. This study comprehensively analyzes the performance of well-known machine learning models on two different datasets to overcome dataset-related biases. We found that random forest and extra trees (i.e., tree-based ensemble models) achieve the highest accuracy with the mol2vec fingerprint for BBB permeability prediction, attaining AUC_ROC of 0.9453 and 0.9601 on the BBB and B3DB datasets, respectively. Additionally, we analyzed the impact of a data balancing technique (i.e., SMOTE) used to improve the specificity of the models, and we explored the impact of different fingerprint combinations on accuracy. By employing SMOTE and a fingerprint combination, SVC attains the highest AUC_ROC of 0.9511 on the BBB dataset. Finally, we used the best-performing models on the B3DB dataset to evaluate the BBB permeability of drugs intended for repurposing. In this validation, the models predicted non-passage for most antihypertensive drugs and passage for CYP17A1 cancer drugs.
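
To make the two preprocessing ideas named above concrete, the following is a minimal illustrative sketch, not the implementation used in this study. It assumes that a fingerprint combination means concatenating the per-molecule fingerprint vectors, and it shows only the core interpolation step of SMOTE. The function names `concat_fingerprints` and `smote_oversample` are hypothetical, and a real pipeline would more likely rely on an established library implementation.

```python
import math
import random


def concat_fingerprints(*fingerprints):
    """Join several fingerprint vectors of one molecule into one feature vector.

    Assumption: "combination" is plain concatenation.
    """
    combined = []
    for fp in fingerprints:
        combined.extend(fp)
    return combined


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority-class samples (core SMOTE step).

    Each synthetic sample lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Toy vectors standing in for two fingerprints of the same molecule.
    features = concat_fingerprints([1, 0, 1], [0.25, 0.75])
    print(len(features))
    minority_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(len(smote_oversample(minority_class, n_new=5, k=2)))
```

Because the synthetic samples are interpolated, they are generally non-binary even when the input fingerprints are bit vectors. Oversampling of this kind is normally applied to the training split only, so that evaluation data contain no synthetic molecules.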