SmartSHARK
Description
Researchers from various research areas (e.g., defect prediction, sentiment mining, developer social networks) analyze software projects to develop new ideas or to test their assumptions by performing case studies. However, to analyze a software project, two different steps need to be taken:
- collection of the project data, including, e.g., pre-processing steps and the synthesis of intermediate results, and
- performing the analysis on the basis of this data.
Currently, the tooling for these steps is highly heterogeneous, which raises the problem that studies are often not replicable. As a consequence, meta-analyses, which would be needed, e.g., to create benchmarks for approaches, are often not possible. Hence, we developed our platform SmartSHARK, which helps to improve the validity and replicability of software mining studies. SmartSHARK combines the two essential steps in one platform: on the one hand, it enables researchers to easily collect data from various repositories; on the other hand, it uses Apache Spark as an analytical backend to analyze the collected data.
SmartSHARK is able to collect project-level data from:
- Version Control Systems (VCSs)
- Issue Tracking Systems (ITSs)
- Mailing Lists
Furthermore, it collects:
- Abstract Syntax Tree (AST) statistics
- Product metrics on different levels of abstraction (e.g., class level, method level)
- Clone data (detection of Type-2 clones)
- Clone metrics
This data is stored in a MongoDB. Furthermore, the different pieces of data are linked with each other, which simplifies the analysis (see the sketch below). On the analysis side, Apache Spark provides the efficiency and the algorithms needed to analyze data of this size.
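To make this concrete, the following is a minimal sketch of how both sides of the platform could be used together: reading linked data from the MongoDB with pymongo and aggregating it with Apache Spark. This is not the official SmartSHARK API; the connection URI, the project name commons-math, and the collection and field names (project, vcs_system, commit, author_id) are assumptions for illustration and may differ from the actual schema.

```python
# Minimal sketch, assuming a local MongoDB with SmartSHARK-style data.
# Collection and field names as well as the project name are illustrative
# assumptions, not the guaranteed SmartSHARK schema.
from pymongo import MongoClient
from pyspark.sql import SparkSession, functions as F

client = MongoClient("mongodb://localhost:27017")  # assumed local instance
db = client["smartshark"]

# Follow the links between collections: project -> vcs_system -> commit.
project = db["project"].find_one({"name": "commons-math"})
vcs = db["vcs_system"].find_one({"project_id": project["_id"]})
commits = [
    {"author": str(c.get("author_id")), "message": c.get("message", "")}
    for c in db["commit"].find({"vcs_system_id": vcs["_id"]})
]

# Analysis side: a simple Spark aggregation of commits per author.
spark = SparkSession.builder.appName("smartshark-sketch").getOrCreate()
df = spark.createDataFrame(commits)
(df.groupBy("author")
   .agg(F.count("*").alias("n_commits"))
   .orderBy(F.desc("n_commits"))
   .show())
```

In a deployed instance, a Spark job would run directly against the analytical backend rather than pulling documents through a client first; the sketch only illustrates how the linked collections relate.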
Instances:
An instance of SmartSHARK is currently deployed and can be reached via: https://smartshark2.informatik.uni-goettingen.de
Project Details
Related Publications
- Alexander Trautsch, Johannes Erbel, Steffen Herbold, Jens Grabowski: What really changes when developers intend to improve their source code: a commit-level study of static metric value and static analysis warning changes. Empirical Software Engineering, 2023
- Steffen Herbold, Alexander Trautsch, Benjamin Ledel, Alireza Aghamohammadi, Taher A. Ghaleb, Kuljit Kaur Chahal, Tim Bossenmaier, Bhaveet Nagaria, Philip Makedonski, Matin Nili Ahmadabadi, Kristof Szabados, Helge Spieker, Matej Madeja, Nathaniel Hoy, Valentina Lenarduzzi, Shangwen Wang, Gema Rodríguez-Pérez, Ricardo Colomo-Palacios, Roberto Verdecchia, Paramvir Singh, Yihao Qin, Debasish Chakroborti, Willard Davis, Vijay Walunj, Hongjun Wu, Diego Marcilio, Omar Alam, Abdullah Aldaeej, Idan Amit, Burak Turhan, Simon Eismann, Anna-Katharina Wickert, Ivano Malavolta, Matúš Sulír, Fatemeh Fard, Austin Z. Henley, Stratos Kourtzanidis, Eray Tuzun, Christoph Treude, Simin Maleki Shamasbi, Ivan Pashchenko, Marvin Wyrich, James Davis, Alexander Serebrenik, Ella Albrecht, Ethem Utku Aktas, Daniel Strüber, Johannes Erbel: A fine-grained data set and analysis of tangling in bug fixing commits. Empirical Software Engineering, 2022
- Steffen Herbold, Alexander Trautsch, Fabian Trautsch, Benjamin Ledel: Problems with SZZ and features: An empirical study of the state of practice of defect prediction data collection. Empirical Software Engineering, 2022
- Steffen Herbold, Alexander Trautsch, Fabian Trautsch: On the Feasibility of Automated Prediction of Bug and Non-Bug Issues. Empirical Software Engineering, 2020
- Alexander Trautsch, Steffen Herbold, Jens Grabowski: A Longitudinal Study of Static Analysis Warning Evolution and the Effects of PMD on Software Quality in Apache Open Source Projects. Empirical Software Engineering, 2020
- Fabian Trautsch, Steffen Herbold, Philip Makedonski, Jens Grabowski: Addressing problems with replicability and validity of repository mining studies through a smart data platform. Empirical Software Engineering, 2017
- Alexander Trautsch, Fabian Trautsch, Steffen Herbold, Benjamin Ledel, Jens Grabowski: The SmartSHARK Ecosystem for Software Repository Mining. Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Companion Proceedings, 2020
- Alexander Trautsch, Steffen Herbold, Jens Grabowski: Static source code metrics and static analysis warnings for fine-grained just-in-time defect prediction. 36th International Conference on Software Maintenance and Evolution (ICSME 2020), 2020
- Steffen Herbold, Alexander Trautsch, Benjamin Ledel: Large-Scale Manual Validation of Bugfixing Changes. 2020 IEEE/ACM 17th International Conference on Mining Software Repositories (MSR 2020), 2020
- Alexander Trautsch: Effects of Automated Static Analysis Tools: A Multidimensional View on Quality Evolution. Proceedings of the 41st International Conference on Software Engineering: Companion Proceedings, 2019
- Fabian Trautsch, Jens Grabowski: Are there any Unit Tests? An Empirical Study on Unit Testing in Open Source Python Projects. Proceedings of the 10th International Conference on Software Testing, Verification and Validation (ICST 2017), 2017
- Fabian Trautsch, Steffen Herbold, Philip Makedonski, Jens Grabowski: Addressing Problems with External Validity of Repository Mining Studies Through a Smart Data Platform. Proceedings of the 13th International Conference on Mining Software Repositories (MSR 2016), 2016
- Fabian Trautsch: An Analysis of the Differences between Unit and Integration Tests. 2019