Building Minimal Classification Rules for Breast Cancer Diagnosis in Python

Abstract:

Rule-based classifiers are widely applied in breast cancer diagnosis, and classifiers with good disease classification performance have been developed and in high demand over the past decades. Because classification rules are derived from previous diagnoses described by a large number of features, it is challenging to build a minimal set of rules that performs well while retaining the diagnostic information. Principal Component Analysis (PCA) is known as a data reduction technique that preserves most of the information in the data and supports good classification performance. This paper therefore aims to find the best-performing classifier that yields minimal classification rules by employing PCA. Based on experimental results on the Wisconsin Breast Cancer data set, the J48 decision tree classifier is found to be the best among three classifiers: J48 decision tree, Reduced Error Pruning Tree, and Random Tree.
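
The approach outlined in the abstract (reduce the features with PCA, train a tree classifier, then count the resulting rules) can be sketched in Python as follows. This is a minimal illustration under several assumptions, not the experimental setup of the paper: it uses scikit-learn, whose DecisionTreeClassifier is a CART-style tree standing in for J48 (Weka's C4.5 implementation); it loads the diagnostic variant of the Wisconsin data bundled with scikit-learn (569 samples, 30 features), which may differ from the data set used in the paper; and the function name evaluate, the number of components, and the depth limit are hypothetical choices made only for illustration. Each leaf of the fitted tree corresponds to one root-to-leaf path, so the leaf count is used here as the number of classification rules.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def evaluate(n_components, max_depth, seed=0):
    """Return (mean 10-fold CV accuracy, number of rules) for a PCA + tree pipeline.

    Assumption: a CART tree stands in for J48, and the rule count is taken
    as the number of leaves of the tree fitted on the full data set.
    """
    X, y = load_breast_cancer(return_X_y=True)

    # Standardize first so that PCA is not dominated by large-scale features.
    model = make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components),
        DecisionTreeClassifier(max_depth=max_depth, random_state=seed),
    )

    # Stratified 10-fold cross-validation; the pipeline is refit in every fold,
    # so scaling and PCA never see the held-out samples.
    accuracy = cross_val_score(model, X, y, cv=10).mean()

    # Fit once on all data to inspect the size of the resulting tree.
    model.fit(X, y)
    n_rules = model[-1].get_n_leaves()
    return accuracy, n_rules


accuracy, n_rules = evaluate(n_components=2, max_depth=3)
print(f"mean accuracy: {accuracy:.3f}, rules (leaves): {n_rules}")
```

Varying n_components and max_depth in this sketch shows the trade-off the abstract describes: fewer components and shallower trees give fewer rules, possibly at the cost of accuracy.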