Machine Learning (COMP-652 and ECSE-608)

Project

Goal

The main goal of the project for this course is to encourage you to experiment with some of the machine learning methods that we discussed in class, in the context of problems that are of interest to you.

Requirements

The project is to be completed either individually or in groups of two or three students. Students whose research involves problems that are amenable to machine learning solutions are strongly encouraged to formulate a project related to their work. Students who do not have such problems should contact Doina to discuss possible projects.

All students will be required to write a project report and to give a final project presentation. The presentations will be scheduled in April, outside class time, and the projects will be due shortly thereafter.

By March 14, you should send Doina a short email with the title and a one-paragraph abstract of your project. Teams should send one email per team. The abstract should make clear what your goal is, what resources (data, software, papers) you will use, and what we should expect to see at the end of the project. A typical project will involve reading 2-3 papers and experimenting with machine learning algorithms on an interesting data set. You are strongly encouraged to use existing resources (e.g., software) and not to re-implement existing algorithms unless it is necessary.

The report should contain the following components:

An introduction/motivation section, in which you explain why the problem you are about to address is interesting and challenging.
A section describing the basic approach that you will take. You should assume that algorithms discussed in class (e.g., SVMs, HMMs) are known, but summarize any algorithms that were not discussed in class.
A section describing the experimental setup. Here you describe your data set, along with any pre-processing steps that you have taken (e.g., to remove noise or select attributes). You should describe this in enough detail that someone with access to your data could reproduce the results exactly.
A section describing your results, along with a discussion of what you observed. It is important to perform your experiments in such a way that the results are meaningful (e.g., make sure you use cross-validation and report test set results). If you use statistical testing to assess the significance of your results, make sure that the test you choose is appropriate for your data. If appropriate, report running time and memory usage in addition to performance.
A section of references
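
As an illustration of the evaluation protocol mentioned in the results section above, the following is a minimal sketch, not a required procedure. It assumes a synthetic two-class data set and a simple k-nearest-neighbour classifier, both of which are stand-ins chosen for this example; all function names are hypothetical. Cross-validation on the development portion is used to choose the hyperparameter k, and the held-out test set is used once at the end to report accuracy.

```python
import random

def make_data(n=200, seed=0):
    """Synthetic two-class 2-D data (a stand-in for a real data set)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 2.0 if label else -2.0
        point = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        data.append((point, label))
    return data

def knn_predict(train, x, k):
    """Majority vote among the k nearest training points."""
    def sq_dist(p):
        return (p[0][0] - x[0]) ** 2 + (p[0][1] - x[1]) ** 2
    nearest = sorted(train, key=sq_dist)[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0

def accuracy(train, test, k):
    """Fraction of test points classified correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)

def k_fold_indices(n, folds):
    """Split indices 0..n-1 into `folds` disjoint, nearly equal parts."""
    return [list(range(i, n, folds)) for i in range(folds)]

def cross_val_accuracy(data, k, folds=5):
    """Mean validation accuracy over the folds."""
    scores = []
    for held_out in k_fold_indices(len(data), folds):
        held = set(held_out)
        train = [d for i, d in enumerate(data) if i not in held]
        valid = [data[i] for i in held_out]
        scores.append(accuracy(train, valid, k))
    return sum(scores) / len(scores)

# Shuffle once, then set aside a test set that is never used for tuning.
data = make_data()
random.Random(1).shuffle(data)
split = int(0.8 * len(data))
dev, test = data[:split], data[split:]

# Choose k by cross-validation on the development set only.
candidates = [1, 3, 5, 7]  # odd values avoid tied votes
cv_scores = {k: cross_val_accuracy(dev, k) for k in candidates}
best_k = max(candidates, key=cv_scores.get)

# Report test accuracy once, with the selected k.
test_acc = accuracy(dev, test, best_k)
print(f"best k = {best_k}, test accuracy = {test_acc:.3f}")
```

The point of the split is that the test set plays no role in selecting k, so the reported number is not biased by the tuning. In a real project an existing library would normally replace the hand-written pieces here.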

There is no set format for the report; use your favorite. There are also no fixed length requirements: just make sure that you give enough information for someone to understand what you did. In the past, reports have averaged 8-10 pages in single-column format, but use this number only as a guideline. We expect that many of you will be working with very interesting and challenging data. As a result, the goal of the grading is to assess your competence in applying machine learning methods to a specific, large problem, not to look for a set level of performance. You need to ensure that you reference all sources appropriately (software, papers, ideas from colleagues, etc.).

The presentation should be 8 minutes long, with 2 minutes allowed for questions. This means that you should present at most 8 slides, including your title slide. For such a short presentation, you do not need an outline slide. You should motivate your topic, present the general approach, the results and a short discussion, and have a conclusion/future work slide. Please upload the presentation along with your report. Both documents (report and presentation) should be in PDF format.