This page contains additional resources related to the manuscript "Discovering Information Explaining API Types Using Text Classification", submitted to ICSE 2015 by Petrosyan, Robillard, and De Mori.
To improve the part-of-speech (POS) tags that the Stanford Parser
assigns to technical concepts, we reimplemented a multi-word term
detection algorithm and ran it on the official Java Tutorials. We then
selected the top-ranked phrases and forced the POS tagger to tag them as nouns.
List of multi-word phrases:
multi_word_phrases.pdf
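The noun-forcing step can be illustrated with a minimal sketch. It is not
the implementation used in the manuscript: it assumes the tagger output is
available as a list of (word, tag) pairs, and the function name
force_noun_tags, the merge-into-one-token strategy, and the example phrase
are all hypothetical choices made for illustration.

```python
def force_noun_tags(tagged, phrases, noun_tag="NN"):
    """Merge known multi-word phrases into single tokens tagged as nouns.

    tagged  -- list of (word, tag) pairs produced by any POS tagger
    phrases -- iterable of phrases, each a space-separated string
    (Hypothetical helper; the real pipeline may constrain the tagger instead.)
    """
    # Store phrases as lowercase token tuples, longest first, so that the
    # longest matching phrase wins at each position.
    patterns = sorted({tuple(p.lower().split()) for p in phrases},
                      key=len, reverse=True)
    words = [w.lower() for w, _ in tagged]
    result, i = [], 0
    while i < len(tagged):
        for pat in patterns:
            n = len(pat)
            if n > 0 and tuple(words[i:i + n]) == pat:
                # Keep the original casing of the matched words.
                merged = " ".join(w for w, _ in tagged[i:i + n])
                result.append((merged, noun_tag))
                i += n
                break
        else:
            # No phrase starts here: keep the tagger's own decision.
            result.append(tagged[i])
            i += 1
    return result


# Illustrative input only; the phrase is not taken from multi_word_phrases.pdf.
sentence = [("The", "DT"), ("enhanced", "VBN"), ("for", "IN"),
            ("loop", "NN"), ("iterates", "VBZ")]
retagged = force_noun_tags(sentence, ["enhanced for loop"])
```

Matching is case-insensitive and greedy on phrase length, so a longer phrase
is preferred over any shorter phrase that starts at the same token.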
For our classification task we used dependency-based features.
To find useful dependencies, we used the Java tutorials to extract 1785
dependencies in which either the governor or the dependent was a
code-like term (CLT). After manual annotation, we calculated a z-score
for each dependency and normalized it to use as the dependency's weight.
Overall, the useful dependency instances mapped to 243 distinct typed
dependencies and 39 distinct relations.
List of useful dependencies used: dependencies.csv
List of useful relation types used: relations.csv
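The weighting step can be sketched as follows. This page does not state
which z-score formula or normalization was used, so the sketch makes two
assumptions: the z-score is a one-proportion z-test comparing each
dependency's share of instances annotated as useful against the overall
share, and the normalization is min-max scaling to [0, 1]. The function
name dependency_weights and the counts in the example are hypothetical.

```python
import math


def dependency_weights(counts):
    """Turn annotation counts into weights in [0, 1].

    counts -- dict mapping a dependency name to (useful, total), where
              total > 0 is the number of annotated instances and useful
              is how many of them were marked useful.
    Assumed formula: one-proportion z-score followed by min-max scaling.
    """
    total_useful = sum(u for u, _ in counts.values())
    total_all = sum(t for _, t in counts.values())
    p0 = total_useful / total_all  # overall share of useful instances

    z = {}
    for dep, (useful, total) in counts.items():
        se = math.sqrt(p0 * (1 - p0) / total)  # standard error under p0
        z[dep] = (useful / total - p0) / se if se > 0 else 0.0

    lo, hi = min(z.values()), max(z.values())
    if hi == lo:
        # All scores are equal, so no dependency is preferred over another.
        return {dep: 0.5 for dep in z}
    return {dep: (score - lo) / (hi - lo) for dep, score in z.items()}


# Invented counts for illustration; they are not taken from dependencies.csv.
weights = dependency_weights({
    "nsubj": (40, 50),
    "dobj": (10, 50),
    "prep_of": (25, 50),
})
```

Under these assumptions, a dependency that is marked useful more often than
average receives a weight closer to 1, and one marked useful less often
than average receives a weight closer to 0.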
To construct the training set for the classification task, we needed to
manually annotate the tutorials. To ensure a high level of rigour in
our annotation process, we wrote a detailed annotation guide.
Annotation Guide: annotation_guide.pdf
Studying how to discover tutorial sections relevant to API
types requires a corpus of tutorials. We selected five tutorials
covering four different Java APIs. Here are those five tutorials
after pre-processing and annotation.
Annotated tutorials:
JodaTime
Math
Col. Official
Col. Jenkov
Smack