This page contains additional resources related to the manuscript "Automatically Categorizing Software Technologies", submitted to TSE by Nassif, Treude, and Robillard.
Original Sequence | Replacement |
---|---|
ms | microsoft |
vb.net | visual basic .net |
vb | visual basic |
vc | microsoft visual c++ |
[single letter] | [single letter] programming |
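As an illustration only, the replacements in the table above could be applied as in the following sketch. The page does not say how the sequences are matched, so the whole-token matching, the lowercasing, the restriction of the single-letter rule to inputs that consist of exactly one letter, and the helper name `expand_sequences` are all assumptions.

```python
# Replacement pairs copied from the table above.
SEQUENCE_REPLACEMENTS = {
    "ms": "microsoft",
    "vb.net": "visual basic .net",
    "vb": "visual basic",
    "vc": "microsoft visual c++",
}


def expand_sequences(phrase: str) -> str:
    """Expand abbreviated sequences in a phrase (hypothetical helper).

    Assumption: the phrase is a lowercase, space-separated string and
    replacements apply to whole tokens only, never to substrings.
    """
    phrase = phrase.lower().strip()
    # "[single letter]" row: assumed to apply when the whole phrase is one letter.
    if len(phrase) == 1 and phrase.isalpha():
        return f"{phrase} programming"
    return " ".join(SEQUENCE_REPLACEMENTS.get(token, token) for token in phrase.split())


if __name__ == "__main__":
    for example in ["ms access", "vb.net", "r", "vba"]:
        print(example, "->", expand_sequences(example))
```

Matching whole tokens keeps a phrase such as `vba` untouched even though it starts with `vb`; a substring-based variant would need to order the rules so that `vb.net` is tried before `vb`.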
Acronym | Replacement |
---|---|
osx | os x |
os | operating system |
ide | integrated development environment |
dbms | database management system |
rdbms | relational database management system |
api | application programming interface |
ui | user interface |
gui | graphical user interface |
sdk | software development kit |
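The acronym table above can be read as a simple lookup. The sketch below shows one possible way to apply it with a single regular-expression pass; the word-boundary matching, the lowercasing, and the name `expand_acronyms` are assumptions, since the page lists only the pairs and not the matching procedure.

```python
import re

# Acronym pairs copied from the table above.
ACRONYMS = {
    "osx": "os x",
    "os": "operating system",
    "ide": "integrated development environment",
    "dbms": "database management system",
    "rdbms": "relational database management system",
    "api": "application programming interface",
    "ui": "user interface",
    "gui": "graphical user interface",
    "sdk": "software development kit",
}

# Longest acronyms first, and word boundaries on both sides, so that
# "rdbms" is never rewritten through the shorter "dbms" entry.
_ACRONYM_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(a) for a in sorted(ACRONYMS, key=len, reverse=True))
    + r")\b"
)


def expand_acronyms(text: str) -> str:
    """Replace each whole-word acronym with its expansion (hypothetical helper)."""
    return _ACRONYM_PATTERN.sub(lambda m: ACRONYMS[m.group(1)], text.lower())


if __name__ == "__main__":
    for example in ["rdbms", "osx ide", "gui toolkit", "macos"]:
        print(example, "->", expand_acronyms(example))
```

Because `re.sub` scans the text once, the `os` produced by expanding `osx` to `os x` is not expanded a second time.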
To give a sense of what the actual data looks like, here are 50 tags from each of the 3 evaluation sets. The tags were chosen randomly from each set, with each tag having a similar probability of being selected as every other tag in the same set. The order of the tags in each list is also completely random.
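A sample of this kind could be drawn as in the following sketch. It assumes uniform sampling without replacement, which is one way to give every tag a similar selection probability; the function name `sample_tags`, the seed parameter, and the placeholder tag names are hypothetical and not part of the original procedure.

```python
import random


def sample_tags(tags, k=50, seed=None):
    """Draw k distinct tags uniformly at random (hypothetical helper).

    random.Random.sample picks without replacement and returns the items
    in selection order, so the resulting list order is random as well.
    """
    rng = random.Random(seed)
    return rng.sample(list(tags), k)


if __name__ == "__main__":
    # Placeholder names standing in for the 301 tags of the popular set.
    popular_tags = [f"tag-{i}" for i in range(301)]
    print(sample_tags(popular_tags, k=50, seed=0)[:5])
```

Passing a fixed `seed` makes the draw repeatable; leaving it as `None` gives a different sample on each run.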
Popular Tags (50 out of 301 tags)
Common Tags (50 out of 10,381)
Rare Tags (50 out of 27,208)
Given a phrase and a list of candidate hypernyms, for each candidate in the list, give it one of three values:
Special Cases:
Terms Considered too General:
To understand the added value of WittC/CA over WittH, we provide the 50 largest categories returned by both variants, together with one member of each category to help understand potentially ambiguous terms. These members are provided only as examples of the corresponding category. They may also be in other categories.
Category | Size | Tag Example |
---|---|---|
open source | 298 | torch |
company | 254 | avaya |
process | 184 | amalgamation |
software | 146 | powertab |
programming language | 145 | groovy-2.3 |
multi-paradigm programming language | 120 | perl5.12 |
web application framework | 117 | grails-2.0.4 |
functional programming language | 113 | haskell |
integrated development environment | 104 | xcode4.1 |
imperative programming language | 96 | algol |
library | 90 | rodf |
object-oriented programming language | 86 | python-3.1 |
free software | 84 | ikiwiki |
website | 82 | justin.tv |
computer program | 71 | virus |
tool | 70 | grako |
framework | 62 | aurora.js |
operating system | 62 | windows-3.1 |
procedural programming language | 57 | scheme48 |
method | 56 | linear-search |
c | 54 | char32-t |
open source web application framework | 54 | django-1.4 |
software framework | 53 | lime |
application programming interface | 52 | opengl-2.0 |
general-purpose programming language | 51 | ruby-2.1 |
javascript library | 51 | planetary.js |
function | 51 | identity-operator |
ide | 48 | visual-c++-2013 |
reflective programming language | 48 | r |
api | 48 | webcl |
technique | 47 | record-locking |
file format | 46 | .war |
class | 46 | uiwebview |
term | 45 | truthiness |
data structure | 44 | binary-tree |
way | 40 | orientation |
unix-like os | 39 | android |
mobile operating system | 39 | android |
web browser | 37 | internet-explorer |
open-source | 36 | git |
graphical user interface | 35 | windows-8 |
command | 35 | kill |
scripting programming language | 34 | javascript |
compiler | 34 | gcc4 |
content management system | 33 | joomla |
web server | 33 | httpserver |
program | 32 | compiler |
jquery plugin | 32 | gamequery |
algorithm | 31 | binary-search |
system | 31 | 64bit |
Category | Size | Tag Example | Attributes |
---|---|---|---|
library | 1104 | quantlib | numerical, free, open source, finance, software |
tool | 889 | zpt | html/xml, generation |
framework | 871 | spring-2.5 | open source, platform, application, development |
system | 715 | sourcegear | commercial, proprietary, revision control |
programming language | 642 | python-3.3 | general purpose, imperative, high level, multi paradigm, reflective, procedural, functional, object oriented |
company | 555 | iar | swedish, computer, software |
language | 486 | xslt-1.0 | transformation, xml |
process | 397 | magnify | |
program | 380 | md5sum | computer |
platform | 332 | azure-platform | durable, highly scalable, cloud based, storage |
service | 306 | online-storage | internet, hosting |
operating system | 293 | ubuntu-12.10 | linux, debian based, server |
class | 272 | qtablewidget | qt |
plugin | 261 | jcarousel | jquery, open source |
integrated development environment | 260 | phpstorm | php, commercial, cross platform |
application programming interface | 252 | photon | realtime, multiplayer, cross platform |
function | 245 | sha | hash, cryptographic |
protocol | 236 | kermit | computer, transfer, transfer/management, data |
engine | 209 | yahoo-search | web, search |
web application framework | 209 | laravel-3 | php, open source, free |
method | 208 | accessor | |
interface | 199 | tkinter | python, standard |
free software | 196 | mcedit | |
package | 189 | r-raster | r, data analysis |
technique | 184 | regression | collection of, statistical |
c | 179 | crtp | |
project | 178 | koneki | incubator, eclipse |
website | 174 | yahoo-search | |
environment | 174 | dymola | simulation |
format | 172 | opentype | computer font |
file format | 170 | .doc | document |
algorithm | 167 | sieve-of-eratosthenes | simple, ancient |
extension | 166 | semantic-mediawiki | mediawiki, open source, free |
technology | 159 | 3g-network | third generation of, telecommunication, mobile |
utility | 157 | diff | comparison |
structure | 156 | cpu-cache | hardware |
term | 153 | dynamic-html | umbrella |
editor | 150 | wymeditor | open source, html, text, wysiwym |
module | 139 | pyserial | python |
object | 138 | cube | three dimensional, solid |
database management system | 136 | sql-server-2014 | relational |
database | 134 | sql-ce | relational, compact |
file | 132 | assemblyinfo | standard |
device | 121 | touchpad | pointing |
product | 121 | winfax | software, microsoft, windows based |
compiler | 120 | g++4.8 | |
name | 116 | retina-display | brand |
client | 116 | curl | ftp, http |
way | 107 | idioms | |
element | 105 | noscript | html |
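For readers who want to build a comparable summary from their own categorization output, the sketch below ranks categories by size and keeps one member per category as an example. The input format (a mapping from each tag to its list of categories), the function name `largest_categories`, and the toy tags are assumptions made for illustration; they are not the data or the code behind the tables above.

```python
from collections import defaultdict


def largest_categories(tag_to_categories, k=50):
    """Return (category, size, example tag) for the k largest categories.

    Ties in size are broken alphabetically by category name, and the
    example is simply the first tag seen for that category.
    """
    members = defaultdict(list)
    for tag, categories in tag_to_categories.items():
        for category in categories:
            members[category].append(tag)
    ranked = sorted(members.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(category, len(tags), tags[0]) for category, tags in ranked[:k]]


if __name__ == "__main__":
    # Toy mapping with hypothetical tags; a tag may belong to several categories.
    toy = {
        "tag-a": ["library", "free software"],
        "tag-b": ["library"],
        "tag-c": ["tool"],
    }
    for category, size, example in largest_categories(toy, k=2):
        print(f"{category} | {size} | {example}")
```

Since a tag can appear in several categories, the same tag may be listed as the example for more than one row, as `android` is in the first table.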