School of Computer Science

Non-Essential Changes in Version Histories

Companion Website


[July 26th, 2011] Clarified the README for the experimental data. Download data with new README here.
[July 26th, 2011] For bug fixes and enhancements, feel free to play with the latest stable code snapshot.
[March 26th, 2011] Release of data, tool, and viewer code (download code here)


Numerous techniques involve mining change data captured in software archives to assist engineering efforts. We observed that important changes to software artifacts are sometimes accompanied by numerous non-essential modifications, such as local variable refactorings or textual differences induced as part of a rename refactoring. We developed a tool-supported technique (called DiffCat) for detecting non-essential differences in the revision histories of software systems, and used our technique to investigate code changes in over 24 000 change sets gathered from the change histories of seven long-lived open-source systems. The details of our technique, as well as the observations supported by our investigation, were accepted for publication in the 33rd ACM/IEEE International Conference on Software Engineering.

This website allows readers to download DiffCat - our prototype Eclipse implementation that we used to scan change histories (CVS/SVN), to detect fine-grained structural differences in change sets from those change histories, and to identify which of those structural differences were non-essential. This website also allows readers to download our full experimental data package.


Our current implementation of DiffCat is a proof-of-concept prototype rather than a fully reusable third-party API. We hence release DiffCat as a suite of open-source Eclipse plugin projects. These can be downloaded here and consist of four plugin projects (diffcat, util, view, and DiffCompare). The first two projects make up DiffCat's diffing component. The last two projects make up DiffCat's experimental (and optional) Eclipse viewer. You can import either just the first two projects or all four. The code is distributed under the Eclipse EPL, version 1, except for the rebundled project (inside DiffCompare), which is released as-is under whatever license you find in the code.

DiffCat is implemented as an extension to the SemDiff repository analysis framework. SemDiff facilitates retrieving repository information and viewing the results of a diffing analysis. To use DiffCat, you'll need to run it via SemDiff (see below).

If you'd like to use DiffCat programmatically, you may download the latest snapshot (July 26th, 2011). This snapshot lets you use DiffCat's diffing service from your own Eclipse plugins without setting up any SemDiff repositories/DBs (you'll still need to install SemDiff, though, to resolve dependencies). To use DiffCat programmatically, please follow the instructions below.

DiffCat Installation Prerequisites

DiffCat is built on two existing research prototypes: SemDiff and ChangeDistiller. An installation of DiffCat will also require a prior installation of these two projects. We outline the requisite steps for a full installation below.

System requirements: Eclipse 3.6, Java 1.6, SemDiff, and ChangeDistiller. DiffCat has been tested on Linux, but I've managed to run it on Windows as well. I've restricted my testing of all DiffCat components to the Eclipse "RCP Development" release. Although this shouldn't be a problem for the main DiffCat components (diffcat + util), the viewer component (view + DiffCompare) might break on other Eclipse releases.

Installing SemDiff: Full instructions can be found here. Please make sure to install the latest version (2.3.1). SemDiff's licensing information can be found on its website. For the absolute best performance, I suggest that, after installing SemDiff, you overwrite its PPA distro with the latest PPA release from PPA's update site. I've found numerous PPA bugs in the past few months, and most have been fixed, but the fixes have not yet been released with SemDiff.

Installing ChangeDistiller:

  1. Register an account with the software evolution and architecture lab (s.e.a.l.) at the University of Zurich.
  2. Register the following site with the Eclipse update mechanism: You will need to enter your s.e.a.l. credentials.
  3. Select and install the "ChangeDistiller" and "Evolizer Core" projects and restart Eclipse.
ChangeDistiller's licensing information can be found on its website.

Installing DiffCat

To install DiffCat, just import the desired combination of plugin projects (see above) into your Eclipse workspace.

Running DiffCat via SemDiff

To try out DiffCat, follow these steps:
  1. Launch project ca.mcgill.cs.swevo.diffcat as an Eclipse application by right clicking on the project and selecting Run As ... | Eclipse Application. This will open up a new Eclipse test environment through which DiffCat can be used. The rest of the instructions should be carried out within this test environment.
  2. If you've already created a SemDiff database, you'll need to refresh that database every time you re-install DiffCat. This allows SemDiff to handle the results of your newly installed DiffCat instance. To do this, go to SemDiff | Update Database and re-enter the information for the database you'll be working with. Otherwise, if this is your first install, you don't need to refresh anything.
  3. Set up an SVN/CVS repository and select change sets via SemDiff's UI (SemDiff | Run Detectors ...).
  4. After entering the change set range, click Next and check the DiffCat detector.
  5. Launch the analysis.
  6. The results can be viewed in SemDiff's Transaction View or in our (experimental) viewer, as we describe below.

Viewing the Results

To view DiffCat's results, I recommend you use our experimental DiffCat viewer. To open and use this viewer, you'll need the ca.mcgill.cs.swevo.diffcat.view and DiffCompare projects in your workspace. Then follow the steps below:
  1. To open the view, select Window | Show View | Other ... | DiffCat View.
  2. Use the arrow buttons in the top right corner to navigate to the change sets that have been processed by DiffCat.
  3. If results are present, use the view to open up the files and methods that were found to have been modified.
  4. Double-click on any of the diffs. The viewer will use the Eclipse compare view (somewhat crazily) to show the diff as best as possible.

Programming against DiffCat

To program against DiffCat, make sure you have installed the latest code snapshot. Then, start an Eclipse plugin project and declare the following dependencies in your manifest file: You'll probably have to resolve some other dependencies as well. For these, look into DiffCat's manifest file to find the requisite plugins (it is straightforward).

Once you've resolved these dependencies, refer to DiffCat's ca.mcgill.cs.swevo.diffcat.MainController class and use its findStructuralDiffs method. This method requires you to specify the files you'd like to diff using (type-resolved) org.eclipse.jdt.core.CompilationUnit instances. These instances are passed in maps, and each map must associate a file path with each file, so that identical CompilationUnit instances can be properly distinguished during diffing.

DiffCat treats file versions with identical file paths (including the name of the file) as versions of the same file. The remaining, unmatched files are treated as either file insertions/deletions or class renames, depending on the similarity between them.
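The path-matching rule described above can be illustrated with a small, self-contained sketch. It is not DiffCat's code: the class and method names (FilePairingSketch, matchedPaths, unmatchedPaths) are hypothetical, and plain strings stand in for the CompilationUnit values so that the example runs without Eclipse. It only shows how two path-keyed maps split into "same file" pairs and leftover files; the similarity check that separates insertions/deletions from class renames is not modelled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class FilePairingSketch {

    /** Paths present in both maps: treated as two versions of the same file. */
    static <T> Set<String> matchedPaths(Map<String, T> left, Map<String, T> right) {
        Set<String> matched = new TreeSet<String>(left.keySet());
        matched.retainAll(right.keySet());
        return matched;
    }

    /** Paths present only in 'these': insertion/deletion or rename candidates. */
    static <T> Set<String> unmatchedPaths(Map<String, T> these, Map<String, T> others) {
        Set<String> unmatched = new TreeSet<String>(these.keySet());
        unmatched.removeAll(others.keySet());
        return unmatched;
    }

    public static void main(String[] args) {
        // Hypothetical inputs: strings stand in for type-resolved CompilationUnits.
        Map<String, String> before = new HashMap<String, String>();
        before.put("src/a/Foo.java", "class Foo {}");
        before.put("src/a/Bar.java", "class Bar {}");

        Map<String, String> after = new HashMap<String, String>();
        after.put("src/a/Foo.java", "class Foo { int x; }");
        after.put("src/a/Baz.java", "class Baz {}");

        System.out.println("Same file:        " + matchedPaths(before, after));
        System.out.println("Only in 'before': " + unmatchedPaths(before, after));
        System.out.println("Only in 'after':  " + unmatchedPaths(after, before));
    }
}
```

In this sketch, Foo.java is diffed as one file across both versions, while Bar.java and Baz.java are left over as candidates for a deletion plus an insertion or for a class rename.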

DiffCat's diff model is straightforward. Each DiffCatResult instance returned by the MainController embodies one fine-grained structural difference, as would be returned by ChangeDistiller. There's a bunch of self-explanatory getter methods to access various components of each diff, e.g., the change type, the left and right AST nodes that were affected by the change, their enclosing method, class, and field signatures (as applicable), their positions in the original code file, etc. To weed out non-essential differences (as we've defined them so far), just do:

Collection<DiffCatResult> results = ...
CollectionUtil.removeAll(results, DiffCodes.NON_ESSENTIAL_DIFF);
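For readers without DiffCat installed, the effect of this filtering step can be sketched with stand-in types. Everything here is hypothetical: Result and Code are simplified placeholders for DiffCatResult and DiffCodes, and the removeAll helper assumes the behaviour suggested above, namely that every result tagged with the given code is dropped from the collection in place. DiffCat's actual CollectionUtil may differ.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

public class NonEssentialFilterSketch {

    /** Hypothetical stand-in for DiffCat's diff codes. */
    enum Code { ESSENTIAL_DIFF, NON_ESSENTIAL_DIFF }

    /** Hypothetical stand-in for DiffCatResult: one fine-grained difference. */
    static final class Result {
        final String description;
        final Code code;

        Result(String description, Code code) {
            this.description = description;
            this.code = code;
        }
    }

    /** Removes, in place, every result tagged with the given code (assumed semantics). */
    static void removeAll(Collection<Result> results, Code code) {
        for (Iterator<Result> it = results.iterator(); it.hasNext();) {
            if (it.next().code == code) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        List<Result> results = new ArrayList<Result>();
        results.add(new Result("local variable renamed", Code.NON_ESSENTIAL_DIFF));
        results.add(new Result("condition changed", Code.ESSENTIAL_DIFF));

        removeAll(results, Code.NON_ESSENTIAL_DIFF);

        // Only the differences not tagged as non-essential remain.
        for (Result r : results) {
            System.out.println(r.description);
        }
    }
}
```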

Programming against DiffCat (within SemDiff)

If you'd like to use DiffCat within the SemDiff framework, you may develop your own SemDiff recommender, as outlined on SemDiff's help page. Your detector will need to declare a dependency on the DiffCat detector (id = ca.mcgill.cs.swevo.diffcat). You can then access the results using this id within your code.


A zip file containing our experimental data package can be downloaded here. The zip archive contains a README that provides details about our setup and the structure of our data. The archive also contains supplemental code used to process and generate portions of our data.


This content was generated by David Kawrykow as part of his Master's thesis while he was supervised by Martin Robillard.

Contact Information

For questions about DiffCat, the viewer, or the experimental data, please email David at dkawry at cs dot mcgill dot ca.
