Uttering Threats to Validity

29 April 2015 by Martin P. Robillard

One of the reviews for my ICSE 2015 submission included the demand:

There is no "threats to validity" or limitation section. Please add this.

When reporting on research results, it is of course necessary to state and discuss decisions of the experimental design that have major repercussions on how the results can be interpreted. A useful framework for organizing this thinking is the collection of "validity" criteria: construct validity, internal validity, external validity, etc.

But who decided that the corresponding prose must be corralled into a specific region of the paper? An obvious alternative is to discuss the implications of design decisions together with the description of the decisions. For instance, a paragraph on the sampling procedure could discuss the "threats" of this procedure, etc. It's the tyranny of the dominant decomposition all over again.

This "threats" trend appears to be particularly pronounced in software engineering. For example, of the 95 technical track papers that appeared at ICSE 2014, I counted 54 (57%) that had a section or subsection explicitly titled "Threats to Validity" (or some small variant). In contrast, looking at the ECOOP 2014 proceedings, I only found 2 of 27 accepted papers using this device. At CSCW 2014, roughly half the papers had some sort of "Limitations" section, but the term "threats to validity" was not to be seen. Clearly, other subdisciplines do things differently.

I got curious about when and how the practice of uttering threats to validity came about in software engineering. Mary Shaw's "Writing Good Software Engineering Research Papers" mini-tutorial doesn't say anything about threats. Vic Basili et al.'s 1986 classic "Experimentation in Software Engineering" is also completely un-threatening. The closest I got among methodology papers is Kitchenham et al.'s "Preliminary Guidelines for Empirical Research in Software Engineering". However, this paper advocates a discussion of the limitations of the work without prescribing a particular paper structure:

I4: Specify any limitations of the study. Researchers have a responsibility to discuss any limitations of their study. They usually need to discuss at least internal and external validity... It is encouraging that recent software research papers have included discussion of threats to validity.

Could it be that the idea of a "threats" section, pioneered by a few empirical researchers in the 1990s (e.g., Wohlin et al.), simply snowballed into what it is now?

Overall I don't think the concept of a "Threats to Validity" section is a bad one: it often works very well. But blindly requiring it will not necessarily encourage authors to engage in a deeper reflection on their work. In the worst case, it will simply lead to the waste of precious space on boilerplate gems such as:

The primary threat to external validity for this study involves the representativeness of our [subjects]. Other [subjects] may exhibit different behaviors and cost-benefit trade-offs.

In that sense, evaluating research papers based on easily checkable writing devices would be akin to the rating system used for hotels, where the mere presence of an in-room coffee maker, no matter how crappy, is necessary to move to a different class of service.