Preprocessing of Breast Cancer Images to Create Datasets for Deep CNN

Abstract:

Breast cancer is the most commonly diagnosed cancer in Australia, with crude incidence rates rising sharply from 62.8 cases per 100,000 women at ages 35-39 to 271.4 at ages 50-54. Various researchers have proposed methods and tools based on machine learning and convolutional neural networks (CNNs) for assessing mammographic images, but in real-world use these methods have produced detection and interpretation errors that result in false-positive and false-negative cases. We believe that this problem can potentially be resolved by applying effective image preprocessing techniques when creating training data for deep CNNs. The main aim of this research is therefore to propose effective image preprocessing methods for creating datasets that reduce the computation time of the neural network and improve its accuracy and classification rates. To this end, we propose methods for background removal, pectoral muscle removal, noise addition, and image enhancement. Adding noise without degrading the detail in the images makes the input images more representative, which may improve the performance of the neural network model in real-world use. For background removal we propose the “Rolling Ball Algorithm” combined with “Huang's Fuzzy Thresholding”, which removed the background from 100% of the images. For pectoral muscle removal we use “Canny Edge Detection” and the “Hough Line Transform”, which removed the muscle from 99.06% of the images. The “Invert”, “CTI_RAS” and “ISOCONTOUR” lookup tables (LUTs) were used for image enhancement to outline the regions of interest (ROIs) and the regions within them.
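To make the pectoral muscle step more concrete, the following is a minimal sketch of a Hough line transform in plain Python. It is an illustration under stated assumptions, not the implementation used in this research: the function names `hough_line_peak` and `mask_origin_side` are hypothetical, the edge pixels are assumed to come from a prior edge detector such as Canny, and masking the origin side of the line assumes the muscle occupies the top-left corner of the image.

```python
import math


def hough_line_peak(edge_pixels, width, height, n_theta=180):
    """Find the strongest straight line through a set of edge pixels.

    Lines are parameterised as rho = x*cos(theta) + y*sin(theta).
    Returns (rho, theta_in_degrees, votes) for the accumulator peak.
    """
    diag = int(math.ceil(math.hypot(width, height)))
    thetas = [math.pi * t / n_theta for t in range(n_theta)]  # angles in [0, pi)
    # Accumulator indexed as acc[theta_index][rho + diag]; rho may be negative.
    acc = [[0] * (2 * diag + 1) for _ in range(n_theta)]

    # Every edge pixel votes for all (rho, theta) lines passing through it.
    for x, y in edge_pixels:
        for ti, theta in enumerate(thetas):
            rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
            acc[ti][rho + diag] += 1

    # The cell with the most votes corresponds to the dominant line.
    best_votes, best_ti, best_ri = -1, 0, 0
    for ti in range(n_theta):
        for ri in range(2 * diag + 1):
            if acc[ti][ri] > best_votes:
                best_votes, best_ti, best_ri = acc[ti][ri], ti, ri
    return best_ri - diag, best_ti * 180.0 / n_theta, best_votes


def mask_origin_side(image, rho, theta_deg):
    """Zero every pixel lying on the origin (top-left) side of the line.

    Assumption for illustration only: the muscle sits in the top-left corner.
    """
    t = math.radians(theta_deg)
    return [
        [0 if x * math.cos(t) + y * math.sin(t) < rho else value
         for x, value in enumerate(row)]
        for y, row in enumerate(image)
    ]
```

Given a list of `(x, y)` edge coordinates, `hough_line_peak` returns the parameters of the dominant line, and `mask_origin_side` can then blank the corner that this line cuts off. In practice a library implementation would be much faster than this nested loop; the sketch only shows the voting principle.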