Replication package for the ICSME 2018 paper "Threats of Aggregating Software Repository Data" by Martin P. Robillard, Mathieu Nassif, Shane McIntosh.

Download the data here: (21 MB). The file is a zip archive that extracts to the directory icsme2018-data.

This data complements the replication package of Nassif and Robillard, ICSME 2017, page 272.

Available at:

Data for each project is contained in the directory named after the project. Each project data directory contains the following files:


Files containing the extended properties computed for each file present in the period. Each line is a comma-separated record describing one file present during the period. The file has headers that correspond to the properties in Figure 1 of the paper.
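As a rough illustration of reading such a file, the sketch below loads comma-separated records with a header row using Python's standard csv module. The column names (file, commits, authors) and the values are invented for the example; the actual headers are the properties listed in Figure 1 of the paper.

```python
import csv
import io


def load_properties(stream):
    """Read comma-separated records with a header row into a list of dicts."""
    return list(csv.DictReader(stream))


# Hypothetical sample: these column names and values are made up for
# illustration and do not come from the data set.
SAMPLE = "file,commits,authors\nsrc/A.java,12,3\nsrc/B.java,4,1\n"

rows = load_properties(io.StringIO(SAMPLE))

# All values are read as strings, so convert before doing arithmetic.
total_commits = sum(int(row["commits"]) for row in rows)
```

For a file on disk, opening it with open(path, newline="") and passing the handle to load_properties should work the same way.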


An automatically generated report that highlights key statistics for each event analyzed in the study.


A map between files and module names. Each module name starts with a # character. All files that start with the prefix(es) below the module name get mapped to the module, in order. The prefix below #TRIM is removed from all input file names prior to matching.
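The mapping rules above can be sketched as a small parser. This is one possible reading of the format, not the authors' tool: it assumes one prefix per line, that "in order" means the first matching prefix wins, and that files matching no prefix are left unmapped. The sample map and paths are hypothetical.

```python
def parse_module_map(text):
    """Return (trim_prefixes, rules); rules is a list of (prefix, module) in file order."""
    trim_prefixes = []
    rules = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # A line starting with '#' opens a new module (or the #TRIM section).
            current = line[1:].strip()
        elif current == "TRIM":
            trim_prefixes.append(line)
        elif current is not None:
            rules.append((line, current))
    return trim_prefixes, rules


def module_for(path, trim_prefixes, rules):
    """Map a file path to a module name, or None if no prefix matches."""
    # Remove the #TRIM prefix before matching.
    for trim in trim_prefixes:
        if path.startswith(trim):
            path = path[len(trim):]
            break
    # Assumption: the first matching prefix, in file order, wins.
    for prefix, module in rules:
        if path.startswith(prefix):
            return module
    return None


# Hypothetical module map, made up for illustration.
SAMPLE_MAP = """#TRIM
src/main/java/
#core
org/example/core/
#ui
org/example/ui/
org/example/widgets/
"""

trim_prefixes, rules = parse_module_map(SAMPLE_MAP)
ui_module = module_for("src/main/java/org/example/widgets/Button.java", trim_prefixes, rules)
core_module = module_for("src/main/java/org/example/core/Engine.java", trim_prefixes, rules)
unmapped = module_for("docs/index.html", trim_prefixes, rules)
```

Keeping the rules as an ordered list rather than a dictionary preserves the file order, which matters if prefixes overlap.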

License and Attribution

This data artifact is provided under the terms of the Creative Commons Attribution 4.0 International License.

If you use this data, please include the following reference:

Martin P. Robillard, Mathieu Nassif, and Shane McIntosh. Threats of Aggregating Software Repository Data. In Proceedings of the 34th IEEE International Conference on Software Maintenance and Evolution, 2018.