Witt Project Web Site

This page contains additional resources related to the manuscript "Automatically Categorizing Software Technologies", submitted to TSE by Nassif, Treude, and Robillard.

Search Term Replacement

Replacements for search terms. This table refers to the heuristic described in Section IV.A, "Performing a General Wikipedia Search"
Original Sequence    Replacement
vb.net               visual basic .net
vb                   visual basic
vc                   microsoft visual c++
[single letter]      [single letter] programming
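
The replacement table can be read as a small lookup applied before the Wikipedia search. The following is a minimal sketch under stated assumptions, not the implementation used in the manuscript: the helper name replace_search_term is hypothetical, sequences are assumed to match whole space-separated tokens, and the single-letter rule is assumed to apply only when the entire term is one letter.

```python
import re

# Replacement pairs copied from the table on this page.
# How they are applied (token matching, lowercasing) is an assumption.
REPLACEMENTS = {
    "vb.net": "visual basic .net",
    "vb": "visual basic",
    "vc": "microsoft visual c++",
}


def replace_search_term(term: str) -> str:
    """Return the search string for a term (hypothetical helper)."""
    term = term.strip().lower()
    # Assumption: a term that is exactly one letter, such as "c" or "r",
    # becomes "<letter> programming".
    if re.fullmatch(r"[a-z]", term):
        return f"{term} programming"
    # Assumption: the other sequences replace whole space-separated tokens,
    # so "vb.net" and "vb" never interfere with each other.
    tokens = term.split()
    return " ".join(REPLACEMENTS.get(token, token) for token in tokens)
```

Under these assumptions, replace_search_term("vb.net") gives "visual basic .net" and replace_search_term("r") gives "r programming", while a term with no matching sequence is returned unchanged.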

Acronym Expansion

Expansions for Acronyms. This table refers to the heuristic described in Section VI.A "3) Acronyms"
osx      os x
os       operating system
ide      integrated development environment
dbms     database management system
rdbms    relational database management system
api      application programming interface
ui       user interface
gui      graphical user interface
sdk      software development kit
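
One way to picture the acronym expansion is a single regular-expression pass over a phrase. This is only an illustrative sketch: the function name expand_acronyms is hypothetical, and both the word-boundary matching and the single, non-recursive pass (so that "osx" becomes "os x" and is not expanded further) are assumptions rather than details confirmed by the manuscript.

```python
import re

# Expansion pairs copied from the table on this page.
EXPANSIONS = {
    "osx": "os x",
    "os": "operating system",
    "ide": "integrated development environment",
    "dbms": "database management system",
    "rdbms": "relational database management system",
    "api": "application programming interface",
    "ui": "user interface",
    "gui": "graphical user interface",
    "sdk": "software development kit",
}

# Longer acronyms are listed first; the \b anchors keep "os" from matching
# inside longer words such as "macos".
_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(a) for a in sorted(EXPANSIONS, key=len, reverse=True))
    + r")\b"
)


def expand_acronyms(phrase: str) -> str:
    """Expand known acronyms in a phrase in one pass (hypothetical helper)."""
    # Assumption: expansions are not re-scanned, so the "os" produced by
    # expanding "osx" is left as is.
    return _PATTERN.sub(lambda match: EXPANSIONS[match.group(0)], phrase.lower())
```

With this sketch, expand_acronyms("unix-like os") yields "unix-like operating system", which would let a short hypernym such as "ide" line up with its long form.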

Tag Examples from Each Set

To give a sense of what the actual data looks like, here are 50 tags from each of the 3 evaluation sets. The tags were chosen randomly from each set, with each tag as likely to be selected as any other tag from the same set. The order of the tags in the lists is also random.
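
The selection described above corresponds to uniform sampling without replacement. A minimal sketch, assuming each evaluation set is available as a list of tag names (the function sample_tags and its seed parameter are hypothetical and only serve the illustration):

```python
import random


def sample_tags(tags, k=50, seed=None):
    """Draw up to k distinct tags uniformly at random (hypothetical helper)."""
    pool = list(tags)
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    # random.sample picks without replacement and returns the items in
    # selection order, so the order of the result is random as well.
    return rng.sample(pool, min(k, len(pool)))
```

For example, calling sample_tags on the 301 popular tags would return 50 distinct tags in random order.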

Popular Tags (50 out of 301 tags)

Common Tags (50 out of 10,381 tags)

Rare Tags (50 out of 27,208 tags)

Coding Guide

Given a phrase and a list of candidate hypernyms, for each candidate in the list, give it one of three values:

Special Cases:

Terms Considered too General:

50 Largest Categories Produced by Witt

To understand the added value of WittC/CA over WittH, we provide the 50 largest categories returned by both variants, together with one member of each category to help understand potentially ambiguous terms. These members are provided only as examples of the corresponding category. They may also belong to other categories.

Category                                Size  Tag Example
open source                             298   torch
company                                 254   avaya
process                                 184   amalgamation
software                                146   powertab
programming language                    145   groovy-2.3
multi-paradigm programming language     120   perl5.12
web application framework               117   grails-2.0.4
functional programming language         113   haskell
integrated development environment      104   xcode4.1
imperative programming language         96    algol
library                                 90    rodf
object-oriented programming language    86    python-3.1
free software                           84    ikiwiki
website                                 82    justin.tv
computer program                        71    virus
tool                                    70    grako
framework                               62    aurora.js
operating system                        62    windows-3.1
procedural programming language         57    scheme48
method                                  56    linear-search
c                                       54    char32-t
open source web application framework   54    django-1.4
software framework                      53    lime
application programming interface       52    opengl-2.0
general-purpose programming language    51    ruby-2.1
javascript library                      51    planetary.js
function                                51    identity-operator
ide                                     48    visual-c++-2013
reflective programming language         48    r
api                                     48    webcl
technique                               47    record-locking
file format                             46    .war
class                                   46    uiwebview
term                                    45    truthiness
data structure                          44    binary-tree
way                                     40    orientation
unix-like os                            39    android
mobile operating system                 39    android
web browser                             37    internet-explorer
open-source                             36    git
graphical user interface                35    windows-8
command                                 35    kill
scripting programming language          34    javascript
compiler                                34    gcc4
content management system               33    joomla
web server                              33    httpserver
program                                 32    compiler
jquery plugin                           32    gamequery
algorithm                               31    binary-search
system                                  31    64bit

Category                            Size  Tag Example            Attributes
library                             1104  quantlib               numerical, free, open source, finance, software
tool                                889   zpt                    html/xml, generation
framework                           871   spring-2.5             open source, platform, application, development
system                              715   sourcegear             commercial, proprietary, revision control
programming language                642   python-3.3             general purpose, imperative, high level, multi paradigm, reflective, procedural, functional, object oriented
company                             555   iar                    swedish, computer, software
language                            486   xslt-1.0               transformation, xml
process                             397   magnify
program                             380   md5sum                 computer
platform                            332   azure-platform         durable, highly scalable, cloud based, storage
service                             306   online-storage         internet, hosting
operating system                    293   ubuntu-12.10           linux, debian based, server
class                               272   qtablewidget           qt
plugin                              261   jcarousel              jquery, open source
integrated development environment  260   phpstorm               php, commercial, cross platform
application programming interface   252   photon                 realtime, multiplayer, cross platform
function                            245   sha                    hash, cryptographic
protocol                            236   kermit                 computer, transfer, transfer/management, data
engine                              209   yahoo-search           web, search
web application framework           209   laravel-3              php, open source, free
method                              208   accessor
interface                           199   tkinter                python, standard
free software                       196   mcedit
package                             189   r-raster               r, data analysis
technique                           184   regression             collection of, statistical
c                                   179   crtp
project                             178   koneki                 incubator, eclipse
website                             174   yahoo-search
environment                         174   dymola                 simulation
format                              172   opentype               computer font
file format                         170   .doc                   document
algorithm                           167   sieve-of-eratosthenes  simple, ancient
extension                           166   semantic-mediawiki     mediawiki, open source, free
technology                          159   3g-network             third generation of, telecommunication, mobile
utility                             157   diff                   comparison
structure                           156   cpu-cache              hardware
term                                153   dynamic-html           umbrella
editor                              150   wymeditor              open source, html, text, wysiwym
module                              139   pyserial               python
object                              138   cube                   three dimensional, solid
database management system          136   sql-server-2014        relational
database                            134   sql-ce                 relational, compact
file                                132   assemblyinfo           standard
device                              121   touchpad               pointing
product                             121   winfax                 software, microsoft, windows based
compiler                            120   g++4.8
name                                116   retina-display         brand
client                              116   curl                   ftp, http
way                                 107   idioms
element                             105   noscript               html