DEFECTS - Comparable and Externally Valid Software Defect Prediction
Description
The comparability and reproducibility of empirical software engineering research are, for the most part, an open problem, and this holds true for software defect prediction. Current research shows that this leads to concrete problems with the external validity of defect prediction research: replications conducted by different groups of researchers arrived at findings that contradict the original studies, and problems with the commonly used data sets were discovered that can change the conclusions drawn from them. Defect prediction research therefore faces a replication crisis if these problems are ignored. Within this project, we plan to create a solid foundation for comparable and externally valid defect prediction research. Our approach rests on three pillars.

The first pillar is the quality of the data we use for defect prediction experiments. Current studies on data quality do not cover the impact of mislabeled data, although this kind of noise affects not only the creation of defect prediction models but also their evaluation. We will statistically evaluate the noise in current data sets. Based on our findings, we will improve the state of the art of defect labeling and generate a large data set with less noise, whose quality will be statistically validated. The resulting corpus will be larger than the currently available defect prediction data sets and thereby improve the generalizability and external validity of results.

The second pillar is the replication of the current state of the art. Since prior replications already contradicted the original experiments, we believe that a broader replication effort is necessary. Existing replications cover only parts of the state of the art, e.g., the impact of classifiers or cross-project defect prediction. Most of the state of the art has never been replicated and diligently compared against other approaches or naïve baselines, and most experiments used only small data sets, which is a key factor in the problems with external validity. We will conduct a conceptual replication of the state of the art of defect prediction. Through this, we will improve the external validity of the current state of the art and lay the groundwork for better external validity of future work.

The third pillar is guidelines for defect prediction research. If we cannot get researchers to avoid the anti-patterns that led to poor validity of results, our efforts to combat the replication crisis of defect prediction research will have only a short-term effect. To make our results sustainable, we will work with the defect prediction community to define guidelines that allow researchers to conduct their defect prediction experiments in such a way that such problems with replicability hopefully never arise again.
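To illustrate the concern behind the first pillar, the following minimal sketch (synthetic data, hypothetical noise rates, scikit-learn as an example toolkit, not the project's actual pipeline) shows how mislabeled defect data distorts both the training of a defect prediction model and the evaluation of its performance:

# Minimal sketch (assumptions: synthetic data, random label flips, MCC as metric).
# It is not the project's tooling, only an illustration of label-noise effects.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(42)

# Synthetic stand-in for static code metrics with a defect label per module.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.8, 0.2],
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5,
                                                    stratify=y, random_state=42)

def flip_labels(labels, rate, rng):
    """Randomly flip a fraction of labels to simulate mislabeled defect data."""
    noisy = labels.copy()
    idx = rng.choice(len(noisy), size=int(rate * len(noisy)), replace=False)
    noisy[idx] = 1 - noisy[idx]
    return noisy

for noise_rate in (0.0, 0.1, 0.3):
    y_train_noisy = flip_labels(y_train, noise_rate, rng)
    y_test_noisy = flip_labels(y_test, noise_rate, rng)

    model = RandomForestClassifier(random_state=42).fit(X_train, y_train_noisy)
    pred = model.predict(X_test)

    # Noise in the training labels degrades the model itself; noise in the test
    # labels additionally distorts the measured performance.
    print(f"noise={noise_rate:.1f}  "
          f"MCC vs. clean labels: {matthews_corrcoef(y_test, pred):.3f}  "
          f"MCC vs. noisy labels: {matthews_corrcoef(y_test_noisy, pred):.3f}")

As the noise rate grows, the model's performance against the clean labels typically drops, and the performance measured against the noisy labels diverges from it, which is why mislabeled data can change the conclusions of a defect prediction study.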
Project Details
Project Staff: Steffen Herbold, Alexander Trautsch
Funding Organization: Deutsche Forschungsgemeinschaft (DFG)
Related Publications
- Steffen Herbold, Alexander Trautsch, Fabian Trautsch: On the Feasibility of Automated Prediction of Bug and Non-Bug Issues. Empirical Software Engineering, 2020
- Alexander Trautsch, Steffen Herbold, Jens Grabowski: A Longitudinal Study of Static Analysis Warning Evolution and the Effects of PMD on Software Quality in Apache Open Source Projects. Empirical Software Engineering, 2020
- Steffen Herbold: On the cost and profit of software defect prediction. IEEE Transactions on Software Engineering, 2019
- Steffen Herbold: Exploring the relationship between performance metrics and cost saving potential of defect prediction models. International Conference on Mining Software Repositories (MSR) - Registered Reports Track, 2021
- Alexander Trautsch, Fabian Trautsch, Steffen Herbold, Benjamin Ledel, Jens Grabowski: The SmartSHARK Ecosystem for Software Repository Mining. Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Companion Proceedings, 2020
- Steffen Herbold: With Registered Reports Towards Large Scale Data Curation. Proceedings of the 42nd International Conference on Software Engineering (ICSE) - NIER Track, 2020
- Steffen Herbold, Alexander Trautsch, Benjamin Ledel: Large-Scale Manual Validation of Bugfixing Changes. 2020 IEEE/ACM 17th International Conference on Mining Software Repositories (MSR 2020), 2020
- Alexander Trautsch, Steffen Herbold, Jens Grabowski: Static source code metrics and static analysis warnings for fine-grained just-in-time defect prediction. 36th International Conference on Software Maintenance and Evolution (ICSME 2020), 2020
- Alexander Trautsch: Effects of Automated Static Analysis Tools: A Multidimensional View on Quality Evolution. Proceedings of the 41st International Conference on Software Engineering: Companion Proceedings, 2019