Martin Robillard · Blog

On-Demand Developer Documentation

6 November 2017 by Martin P. Robillard with Andrian Marcus, Christoph Treude, Gabriele Bavota, Oscar Chaparro, Neil Ernst, Marco Aurélio Gerosa, Michael Godfrey, Michele Lanza, Mario Linares-Vásquez, Gail Murphy, Laura Moreno, David Shepherd, Edmund Wong

Just about anyone who develops with software technology will need some sort of documentation. According to a recent survey, that's about 29 million IT workers. And there’s a lot of evidence that these IT workers need and want documentation, for example:

So it looks like we should be writing good documentation. The only problem is that this is mission impossible. At least four forces conspire to make this task an essential challenge in software development:

Whatever the exact causes, the consequences are by now well recognized in the literature that reports on developers' experiences with documentation, for example [Lethbridge et al., 2003] and [Uddin and Robillard, 2015].

When faced with costly challenges, we’ve now learned to turn to crowdsourcing as a potential option. However, things don't look too promising for crowdsourced documentation, as a major attempt in this direction was inconclusive. Stack Overflow tried to implement a documentation crowdsourcing system, which simply did not take off. Some of the reasons proposed include

In February 2017, 14 researchers interested in software documentation gathered at a workshop to discuss the state of software documentation, with the goal of making it more dynamic. People have been dreaming about automatic documentation for decades. However, a lot has happened in the last 40 years, from search engines to public source code repositories. So maybe it's time to try again?

We are thus proposing the concept of On-demand Developer Documentation (OD3) as a convenient way to summarize the research challenges that remain to make information on software more accessible to developers and cheaper to produce and maintain. We express these challenges by contrasting an OD3 system with a traditional information retrieval engine.

In a traditional search engine, a manually authored query is compared against individual documents, and the engine outputs the subset of documents it determines to be relevant to the query.
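To make the contrast concrete, here is a minimal Python sketch of this traditional pipeline. It is only an illustration: the term-counting `score` function, the `search` function, and the three-document corpus are assumptions made up for this example, and real search engines rank documents in far more sophisticated ways.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, document):
    """Count how often the query's distinct terms occur in one document."""
    counts = Counter(tokenize(document))
    return sum(counts[term] for term in set(tokenize(query)))


def search(query, documents, min_score=1):
    """Return the documents judged relevant to the query, best match first.

    Each document is scored on its own, independently of all the others.
    """
    scored = [(score(query, doc), doc) for doc in documents]
    relevant = [(s, doc) for s, doc in scored if s >= min_score]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in relevant]


# Hypothetical corpus, invented for illustration only.
corpus = [
    "Configuring a Maven build for a Java project",
    "Parsing JSON in Python with the json module",
    "Reading a text file line by line in Python",
]
results = search("read file python", corpus)
```

Note that the output is a ranked subset of existing documents, and that nothing in `search` looks at how one document relates to another. Both properties are what an OD3 system would have to move beyond.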

The first challenge to overcome in realizing a vision of on-demand developer documentation will be to enrich the query so that it can transparently include a detailed and multidimensional profile of the technical knowledge the user already has. There’s a lot of research on profiling users as economic and social entities, but much less on profiling people’s technical expertise. To help users search effectively, it will also be necessary to develop abstractions that provide insights about what they can hope (and not hope) to find in terms of content.
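As a rough idea of what an enriched query could look like as data, here is a small Python sketch. Everything in it is hypothetical: the `ExpertiseProfile` and `EnrichedQuery` classes, the 0.0 to 1.0 expertise scale, and the 0.5 threshold are assumptions chosen for illustration, and how such a profile would actually be built is precisely the open research question.

```python
from dataclasses import dataclass, field


@dataclass
class ExpertiseProfile:
    """Hypothetical model of the technical knowledge a user already has.

    Maps a topic name to a level between 0.0 (novice) and 1.0 (expert).
    """
    levels: dict = field(default_factory=dict)

    def level(self, topic):
        # Topics missing from the profile are treated as unknown to the user.
        return self.levels.get(topic, 0.0)


@dataclass
class EnrichedQuery:
    """A query that carries the user's profile along with its text."""
    text: str
    profile: ExpertiseProfile


def topics_to_explain(query, required_topics, threshold=0.5):
    """Return the prerequisite topics the user likely still needs explained."""
    return [t for t in required_topics if query.profile.level(t) < threshold]


profile = ExpertiseProfile({"java": 0.9, "concurrency": 0.2})
query = EnrichedQuery("How do I use CompletableFuture?", profile)
missing = topics_to_explain(query, ["java", "concurrency", "lambdas"])
```

With this profile, `missing` contains `concurrency` and `lambdas` but not `java`, so a generated answer could skip the Java basics and spend its space on the concepts this particular user lacks.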

For the cases where a high-quality answer to a query does not already exist, the second challenge will be to assemble an answer by inferring fragments of information and linking these together. In our vision, the information we infer would be an intermediate representation that would be combined into a multifaceted document. This is also in contrast to the usual logic of information retrieval, where documents are individually matched against a query. For documentation generation to work, we will need to analyze how individual documents relate.
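One simple way to picture this linking step is as a graph traversal over fragments. The Python sketch below is an assumption-laden illustration, not a proposed design: the `Fragment` class, the rule that two fragments are related when they mention a common API element, and the sample fragments are all invented for this example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Hypothetical unit of inferred information and where it came from."""
    source: str
    text: str
    elements: frozenset  # API elements the fragment mentions


def related(a, b):
    """Assumed linking rule: fragments are related if they share an element."""
    return bool(a.elements & b.elements)


def assemble(query_elements, fragments):
    """Collect fragments linked to the query, directly or via other fragments.

    Starts from the fragments that mention a queried element, then follows
    fragment-to-fragment links breadth first.
    """
    selected = [f for f in fragments if f.elements & query_elements]
    queue = deque(selected)
    while queue:
        current = queue.popleft()
        for candidate in fragments:
            if candidate not in selected and related(current, candidate):
                selected.append(candidate)
                queue.append(candidate)
    return selected


# Sample fragments, invented for illustration only.
fragments = [
    Fragment("API reference", "FileReader reads character files.",
             frozenset({"FileReader"})),
    Fragment("Q&A post", "Wrap a FileReader in a BufferedReader to read lines.",
             frozenset({"FileReader", "BufferedReader"})),
    Fragment("tutorial", "BufferedReader.readLine returns null at end of input.",
             frozenset({"BufferedReader"})),
    Fragment("bug report", "HashMap iteration order is not guaranteed.",
             frozenset({"HashMap"})),
]
answer = assemble({"FileReader"}, fragments)
sources = [f.source for f in answer]
```

Here the tutorial fragment never mentions `FileReader`, so matching each document individually against the query would miss it. It is pulled in only because of how it relates to the Q&A post, which is the kind of cross-document analysis the paragraph above calls for.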

Finally, the third major challenge will be to advance the state of the art in documentation generation. Here the major difference from search engines is that we don't want to return a list of matching documents, but rather a single document that provides an authoritative answer to the query (possibly with a few variants).

Regardless of whether a usable OD3 system sees the light of day in the near future, the concept of OD3 is exciting because it brings together different research specialties while also providing applications for existing research-based techniques.