SmartSHARK
Description
Researchers from various research areas (e.g., defect prediction, sentiment mining, developer social networks) analyze software projects to develop new ideas or to test their assumptions by performing case studies. However, to analyze a software project, two different steps need to be taken:
- collection of the project data, including, e.g., pre-processing steps and the synthesis of intermediate results, and
- performing the analysis on the basis of this data.
Currently, the tooling for these steps is highly heterogeneous, which raises the problem that studies are often not replicable. As a consequence, meta-analyses, which would be needed, e.g., to create benchmarks for approaches, are often not possible. Hence, we developed our platform SmartSHARK, which helps to improve the validity and replicability of software mining studies. SmartSHARK combines the two essential steps in one platform: on the one hand, it enables researchers to easily collect data from various repositories; on the other hand, it uses Apache Spark as an analytical backend to analyze the collected data.
SmartSHARK is able to collect project-level data from:
- Version Control Systems (VCSs)
- Issue Tracking Systems (ITSs)
- Mailing Lists
Furthermore, it collects:
- Abstract Syntax Tree (AST) statistics
- Product metrics on different levels of abstraction (e.g., class level, method level)
- Clone data (detection of Type-2 clones)
- Clone metrics
This data is stored in a MongoDB. Furthermore, the different pieces of data are linked with each other, which simplifies the analysis (see the sketch below). On the analysis side, Apache Spark provides the efficiency and the algorithms needed to analyze data of this size.
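To make this concrete, the following is a minimal sketch of how both sides of the platform could be used together: reading linked data from the MongoDB with pymongo and aggregating it with Apache Spark. This is not the official SmartSHARK API; the connection URI, the project name commons-math, and the collection and field names (project, vcs_system, commit, author_id) are assumptions for illustration and may differ from the actual schema.

```python
# Minimal sketch, assuming a local MongoDB with SmartSHARK-style data.
# Collection and field names as well as the project name are illustrative
# assumptions, not the guaranteed SmartSHARK schema.
from pymongo import MongoClient
from pyspark.sql import SparkSession, functions as F

client = MongoClient("mongodb://localhost:27017")  # assumed local instance
db = client["smartshark"]

# Follow the links between collections: project -> vcs_system -> commit.
project = db["project"].find_one({"name": "commons-math"})
vcs = db["vcs_system"].find_one({"project_id": project["_id"]})
commits = [
    {"author": str(c.get("author_id")), "message": c.get("message", "")}
    for c in db["commit"].find({"vcs_system_id": vcs["_id"]})
]

# Analysis side: a simple Spark aggregation of commits per author.
spark = SparkSession.builder.appName("smartshark-sketch").getOrCreate()
df = spark.createDataFrame(commits)
(df.groupBy("author")
   .agg(F.count("*").alias("n_commits"))
   .orderBy(F.desc("n_commits"))
   .show())
```

In a deployed instance, a Spark job would run directly against the analytical backend rather than pulling documents through a client first; the sketch only illustrates how the linked collections relate.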
Instances:
An instance of SmartSHARK is currently deployed and can be reached via: https://smartshark2.informatik.uni-goettingen.de
Project Details
Related Publications
- Alexander Trautsch, Johannes Erbel, Steffen Herbold, Jens Grabowski: What really changes when developers intend to improve their source code: a commit-level study of static metric value and static analysis warning changes. Empirical Software Engineering, 2023
- Steffen Herbold, Alexander Trautsch, Benjamin Ledel, Alireza Aghamohammadi, Taher A. Ghaleb, Kuljit Kaur Chahal, Tim Bossenmaier, Bhaveet Nagaria, Philip Makedonski, Matin Nili Ahmadabadi, Kristof Szabados, Helge Spieker, Matej Madeja, Nathaniel Hoy, Valentina Lenarduzzi, Shangwen Wang, Gema Rodríguez-Pérez, Ricardo Colomo-Palacios, Roberto Verdecchia, Paramvir Singh, Yihao Qin, Debasish Chakroborti, Willard Davis, Vijay Walunj, Hongjun Wu, Diego Marcilio, Omar Alam, Abdullah Aldaeej, Idan Amit, Burak Turhan, Simon Eismann, Anna-Katharina Wickert, Ivano Malavolta, Matúš Sulír, Fatemeh Fard, Austin Z. Henley, Stratos Kourtzanidis, Eray Tuzun, Christoph Treude, Simin Maleki Shamasbi, Ivan Pashchenko, Marvin Wyrich, James Davis, Alexander Serebrenik, Ella Albrecht, Ethem Utku Aktas, Daniel Strüber, Johannes Erbel: A fine-grained data set and analysis of tangling in bug fixing commits. Empirical Software Engineering, 2022
- Steffen Herbold, Alexander Trautsch, Fabian Trautsch, Benjamin Ledel: Problems with SZZ and features: An empirical study of the state of practice of defect prediction data collection. Empirical Software Engineering, 2022
- Steffen Herbold, Alexander Trautsch, Fabian Trautsch: On the Feasibility of Automated Prediction of Bug and Non-Bug Issues. Empirical Software Engineering, 2020
- Alexander Trautsch, Steffen Herbold, Jens Grabowski: A Longitudinal Study of Static Analysis Warning Evolution and the Effects of PMD on Software Quality in Apache Open Source Projects. Empirical Software Engineering, 2020
- Fabian Trautsch, Steffen Herbold, Philip Makedonski, Jens Grabowski: Addressing problems with replicability and validity of repository mining studies through a smart data platform. Empirical Software Engineering, 2017
- Alexander Trautsch, Fabian Trautsch, Steffen Herbold, Benjamin Ledel, Jens Grabowski: The SmartSHARK Ecosystem for Software Repository Mining. Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Companion Proceedings, 2020
- Alexander Trautsch, Steffen Herbold, Jens Grabowski: Static source code metrics and static analysis warnings for fine-grained just-in-time defect prediction. 36th International Conference on Software Maintenance and Evolution (ICSME 2020), 2020
- Steffen Herbold, Alexander Trautsch, Benjamin Ledel: Large-Scale Manual Validation of Bugfixing Changes. 2020 IEEE/ACM 17th International Conference on Mining Software Repositories (MSR 2020), 2020
- Alexander Trautsch: Effects of Automated Static Analysis Tools: A Multidimensional View on Quality Evolution. Proceedings of the 41st International Conference on Software Engineering: Companion Proceedings, 2019
- Fabian Trautsch, Jens Grabowski: Are there any Unit Tests? An Empirical Study on Unit Testing in Open Source Python Projects. Proceedings of the 10th International Conference on Software Testing, Verification and Validation (ICST 2017), 2017
- Fabian Trautsch, Steffen Herbold, Philip Makedonski, Jens Grabowski: Addressing Problems with External Validity of Repository Mining Studies Through a Smart Data Platform. Proceedings of the 13th International Conference on Mining Software Repositories (MSR 2016), 2016
- Fabian Trautsch: An Analysis of the Differences between Unit and Integration Tests. 2019