Code Examples: Less is More, but What Should We Chop Off?

15 October 2014 by Martin P. Robillard and Annie Ying

A lot of the documents we need to understand and use software technologies include code examples. In fact, code examples are often what people are looking for. To decide whether to study a code example in detail, it's useful to have a summary of what it's about, just like for any other document. But most techniques for summarizing "normal" documents don't apply to code. This problem pops up everywhere one looks for code on the Internet. To illustrate the point, let's try to find code that shows how to handle drag and drop events in Android.

Using Google to find code examples shows results with the code in web pages summarized as text:

Stack Overflow does a similar thing: it folds the code snippets into the text when displaying search results, and provides no summarization of the code within the posts themselves.

Code search engines at least treat code as code, but even then good summarization is a challenge. On Black Duck Open Hub Code Search, the code summary is a number of lines around selected keyword matches, sometimes leaving out important context.
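To make the "lines around keyword matches" idea concrete, here is a minimal sketch in Python. It is not how Open Hub Code Search is implemented; the function name `keyword_windows`, the `context` parameter, and the small Android drag listener used as input are all assumptions made for illustration.

```python
# Hypothetical input: a small Android drag listener, written for this sketch.
EXAMPLE = """\
public boolean onDrag(View v, DragEvent event) {
    switch (event.getAction()) {
        case DragEvent.ACTION_DRAG_STARTED:
            v.setBackgroundColor(Color.BLUE);
            return true;
        case DragEvent.ACTION_DROP:
            v.setBackgroundColor(Color.GREEN);
            return true;
        default:
            return false;
    }
}
"""


def keyword_windows(source, keyword, context=1):
    """Keep each line containing `keyword`, plus `context` lines on either side.

    This is an assumed, simplified model of keyword-window summarization.
    """
    lines = source.splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if keyword in line:
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            keep.update(range(start, end))
    # Return the kept lines in their original order.
    return [lines[i] for i in sorted(keep)]


summary = keyword_windows(EXAMPLE, "ACTION_DROP", context=1)
for line in summary:
    print(line)
```

With a window of one line, the summary shows the matching `case` and its two neighbors, but not the enclosing `onDrag` method, which is the kind of context a reader would want.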

In contrast, Codota focuses on API calls, and initially elides all lines with no API call.

This technique works well for sequences of API calls, but can omit important logic. Here, the summary misses the selection of the event type:
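A rough way to reproduce this effect is to filter a snippet down to the lines that contain a method call. The sketch below is an approximation, not Codota's actual algorithm: the regular expression that stands in for "line with an API call", the function name `summarize_by_calls`, and the input snippet are all assumptions made for illustration.

```python
import re

# Hypothetical input: a small Android drag listener, written for this sketch.
EXAMPLE = """\
public boolean onDrag(View v, DragEvent event) {
    switch (event.getAction()) {
        case DragEvent.ACTION_DRAG_STARTED:
            v.setBackgroundColor(Color.BLUE);
            return true;
        case DragEvent.ACTION_DROP:
            v.setBackgroundColor(Color.GREEN);
            return true;
        default:
            return false;
    }
}
"""

# Assumption: a "line with an API call" is any line with `.name(` in it.
CALL = re.compile(r"\.\w+\(")


def summarize_by_calls(source):
    """Keep lines with a method call; collapse each run of other lines into '...'."""
    kept = []
    elided = False
    for line in source.splitlines():
        if CALL.search(line):
            kept.append(line.strip())
            elided = False
        elif not elided:
            kept.append("...")
            elided = True
    return kept


summary = summarize_by_calls(EXAMPLE)
for line in summary:
    print(line)
```

On this input, both `setBackgroundColor` calls survive, but the `case DragEvent.ACTION_...` labels are elided, so the summary no longer says which drag event triggers which call.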

So, what is the best way to summarize source code? And can we do it automatically? In her recent Ph.D. work, Annie Ying looked at the way people want to see source code summarized by asking 16 participants to produce a total of 156 code summaries. This research elicited a catalog of code summarization and presentation practices, documented with the reasons for applying them.

The example below shows how three different people wanted to see an Android code example summarized. Even this small excerpt from our data shows a good variety of summarization techniques. In the top summary, the participant used a comment to indicate to the reader to expect similar code for a conceptually similar callback method. Not so easy to do this automatically, but we'll be trying.