A Comprehensive Investigation of the Impact of Class Overlap on Software Defect Prediction

admin

Feb 3, 2024 - 17:33

Abstract:

Software Defect Prediction (SDP) is one of the most vital and cost-efficient activities for ensuring software quality. However, SDP datasets exhibit class overlap (i.e., defective and non-defective modules have similar metric values), which hinders both the performance and the practical use of SDP models. Although prior work has investigated the impact of overlap-removal techniques on SDP performance, many open issues remain unresolved. Therefore, we conduct an empirical study to comprehensively investigate the impact of class overlap on SDP. Specifically, we first propose an approach for identifying overlapping instances by analyzing the class distribution in the local neighborhood of a given instance. We then investigate the impact of class overlap and of two common techniques for handling overlapping instances on the performance and the interpretation of seven representative SDP models. Through an extensive case study on 230 diverse datasets, we observe that: i) 70.0% of SDP datasets contain overlapping instances; ii) different levels of class overlap have different impacts on the performance of SDP models; iii) class overlap changes the ranking of important features in SDP models, particularly at the top 2 and top 3 ranks; iv) class overlap handling techniques can yield statistically significant performance improvements for SDP models trained on datasets with overlap ratios above 12.5%. We suggest that future work apply our KNN method to measure the overlap ratio of a dataset before building SDP models.
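The abstract describes identifying overlapping instances from the class distribution in each instance's local neighborhood. A minimal sketch of that general idea is shown below. It is not the authors' implementation: the function name `overlap_ratio`, the neighborhood size `k`, the Euclidean distance, and the decision rule (flag an instance when at least `threshold` of its neighbors belong to the other class) are all assumptions chosen for illustration.

```python
import numpy as np

def overlap_ratio(X, y, k=5, threshold=0.5):
    """Flag instances whose k nearest neighbors are mostly of the other class.

    Hypothetical helper: k, threshold, and the distance metric are
    illustrative assumptions, not values taken from the paper.
    Returns (boolean mask of flagged instances, fraction flagged).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Pairwise Euclidean distances between all instances.
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # an instance is not its own neighbor

    # Indices of the k nearest neighbors of each instance.
    nn_idx = np.argsort(dist, axis=1)[:, :k]

    # Fraction of neighbors whose label differs from the instance's label.
    other_frac = (y[nn_idx] != y[:, None]).mean(axis=1)

    overlapping = other_frac >= threshold
    return overlapping, float(overlapping.mean())


# Toy data: one defective module (label 1) sits inside the non-defective cluster.
X_demo = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
          [10, 10], [10, 11], [11, 10], [11, 11]]
y_demo = [0, 0, 0, 0, 1, 1, 1, 1, 1]
mask, ratio = overlap_ratio(X_demo, y_demo, k=3)
print(mask, ratio)
```

The returned ratio could then be compared against a cutoff such as the 12.5% figure reported in the abstract to decide whether an overlap handling technique is worth applying. In practice, metric values would usually be scaled first so that no single metric dominates the distance.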
