COMP 762 ML & NLP Methods for Software Engineering

 

News:

Enrollment is currently full. Please contact the instructor to add your name to the waiting list.

 

Course Information:

Instructor:    Jin Guo

Class Time:   TR 10:05 AM - 11:25 AM

Location:       McConnell Engineering Building 103

 

Description:

Modern software engineering projects produce large amounts of data, including use cases, specifications, source code, test cases, etc. With effective Machine Learning (ML) and Natural Language Processing (NLP) techniques, these data can be used to support a variety of software engineering activities. This course introduces students to cutting-edge research topics that use ML and NLP techniques to provide automated or semi-automated support for SE tasks. The course will focus on discussing seminal and state-of-the-art papers published in SE conferences and journals, covering topics such as code completion, feature location, trace link evaluation, etc. These works use a variety of ML and NLP techniques, such as association rule mining, topic modeling, natural language semantic parsing, language models, and deep neural networks. Students will read assigned papers ahead of time and write paper reports. During class, students will present the papers and participate in in-depth discussion. Students will also carry out a final project that either extends previous work or explores new directions in using the relevant ML and NLP techniques and tools for SE needs.

 

Outcome:

Students who successfully finish this course should be able to:

·      Demonstrate knowledge of the ML and NLP techniques that are frequently adopted in SE research. Understand the strengths and constraints of different techniques for solving specific SE problems.

·      Critically read scientific literature. Identify and articulate its context, techniques, findings, contributions and limitations.

·      Find and formulate good SE problems. Identify and implement solutions that potentially solve or mitigate those problems effectively.

·      Clearly summarize and communicate research findings through written reports and presentations.

 

Paper Reading Report

Before the in-class discussion, students will read the assigned papers, annotate them while reading, and write reports summarizing their content. The reading report should cover the following points:

·      Motivation of this work.

·      What are the assumptions this paper makes?

·      What is the proposed solution?

·      How is the solution evaluated?

·      What elements might threaten the validity of this work?

·      Limitations or extensions for this work

·      Major takeaway message

 

Paper Presentation

Starting in Week 2, one student per class will give a 20-minute presentation on 1-2 papers related to the assigned paper. The presenter is responsible for finding the related papers. The papers can either address the same or similar SE problems, or use the same or similar techniques in a different SE context. The rest of the class is NOT required to read those papers.

 

The presenter for each class is decided on a first-come, first-served basis. Please sign up using the link that will be sent to your McGill email address.

 

Final Project

Students will work alone or in pairs on the final project. For pair projects, students need to clearly define which person did which parts of the work, and each person should contribute equally.

 

The proposal report should be fewer than 2 pages and cover the basic ideas of what SE problem you intend to solve, why you think it's important, and how you plan to evaluate your solution. The proposal presentation will be 10 minutes, followed by discussion, in Week 6.

 

The final report should be 4-8 pages and follow the structure of papers in the SE literature, including sections on project goals, concrete methods, evaluation strategy (what the measures and baselines are, and why) and conclusions. The final project presentation will be 15 minutes, in Week 14 (presentations might start in Week 13, depending on the number of projects).

 

Grading

·      Paper reading report [20%]

·      Participation (survey, discussion and feedback) [20%]

·      Presentation (related paper presentation, proposal presentation, final project presentation) [20%]

·      Project Report (proposal, peer review, final report) [40%]

 

Schedule

Below is a tentative schedule of the papers to read and discuss. Dates are subject to minor modifications. Everyone is required to read the papers on this list before class.

 

Week | Date | Reading List | Slides
1 | 9-Jan | | Week 1
1 | 11-Jan | How to Read an Engineering Research Paper; Writing Good Software Engineering Research Papers |
2 | 16-Jan | Who should fix this bug |
2 | 18-Jan | Combining Formal Concept Analysis with Information Retrieval for Concept Location in Source Code |
3 | 23-Jan | “Concept location” paper continued | IR Intro
3 | 25-Jan | No class |
4 | 30-Jan | Automated Extraction and Clustering of Requirements Glossary Terms | Term Extraction and Clustering
4 | 1-Feb | An Evaluation of Constituency-based Hyponymy Extraction from Privacy Policies | Hyponym Extraction
5 | 6-Feb | Towards an intelligent domain-specific traceability solution | DoCIT
5 | 8-Feb | Discovering Information Explaining API Types Using Text Classification | Discover API Tutorial
6 | 13-Feb | Proposal Presentation |
6 | 15-Feb | |
7 | 20-Feb, 22-Feb | Improving code readability models with textual features; Mining Metrics to Predict Component Failures | Code Readability Model; Statistic Techniques
8 | 27-Feb | Guest Lecture by Martin Robillard |
8 | 1-Mar | Improving Trace Accuracy through Data-Driven Configuration and Composition of Tracing Features | GA for Trace Engine Configuration
9 | 6-Mar | Study Break |
9 | 8-Mar | |
10 | 13-Mar | On the naturalness of software; Natural Language Models for Predicting Programming Comments | Naturalness of Software; Predict Comment
10 | 15-Mar | Mining idioms from source code | Chalkboard Lecture by Carlos; Suggested Reading
11 | 20-Mar | Toward Deep Learning Software Repositories | Neural Language Model
11 | 22-Mar | Exploring API embedding for API usages and applications | API Embedding
12 | 27-Mar | A Convolutional Attention Network for Extreme Summarization of Source Code | RNN Background
12 | 29-Mar | Are Deep Neural Networks the Best Choice for Modeling Source Code | Nested Scoped LM
13 | 3-Apr | Easy over Hard: A Case Study on Deep Learning | Easy over Hard
13 | 5-Apr | Final Presentation and Discussion |
14 | 10-Apr | |
14 | 12-Apr | |