Machine Learning (COMP-652)
Fall 2012

Project

Goal

The main goal of the project for this course is to encourage you to experiment with some of the machine learning methods that we discussed in class, in the context of problems that are of interest to you.

Requirements

The topic of the project should be chosen by Tuesday, November 6. You are strongly encouraged to choose a topic related to your intended line of research. If this is not possible, contact Doina and she will assign a project to you. By November 6, you should e-mail Doina your intended project title and a short (one-paragraph) abstract outlining what you are planning to do. The abstract should make it clear what your goal is, what resources (data, software) you will use, and what we should expect to see at the end of the project. A typical project will involve reading some machine learning papers and experimenting with machine learning algorithms on some interesting data set. You are strongly encouraged to use existing resources (e.g. software) and not to re-implement existing algorithms unless it is necessary.

During the week of December 10-14, you will turn in a project report and do a project presentation. The current tentative date for the presentations and reports is December 13, but this is subject to change.

You should e-mail the report, in PDF format, directly to Doina (not to the usual homework e-mail address).

The report should contain the following components:

  1. An introduction/motivation section, in which you explain why the problem you are about to address is interesting and challenging.
  2. A section describing the basic approach that you will take. You should assume that algorithms we discussed in class (e.g. SVMs, HMMs etc.) are known, but summarize any algorithms that were not discussed in class (e.g. conditional random fields, multi-dimensional scaling etc.).
  3. A section describing the experimental setup. Here you describe your data set, along with any pre-processing steps that you might have taken (e.g. to remove noise, select attributes etc.). You should describe this in enough detail that someone with access to your data could exactly reproduce the results.
  4. A section describing your results, along with a discussion of what you observed. It is important to ensure that you perform your experiments in such a way that results are meaningful (e.g., make sure you use cross-validation and report test set results). If you use statistical testing to assess the significance of your results, make sure that the test you choose is appropriate for your data. If appropriate, report running time in addition to performance.
  5. A section containing conclusions and possible future work directions.
  6. A section of references.
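
As an illustration of the evaluation advice in item 4, here is a minimal sketch of k-fold cross-validation in Python, using only the standard library. The function names (k_fold_indices, cross_validate, fit_majority, accuracy) and the majority-class baseline are assumptions made for this example and are not part of any course software; in practice you would most likely rely on an existing library instead.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(fit, score, X, y, k=5, seed=0):
    """Return the mean validation score over k folds.

    fit(X, y) -> model and score(model, X, y) -> float are
    hypothetical callables supplied by the caller.
    """
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for i, val in enumerate(folds):
        # Train on every fold except the i-th, validate on the i-th.
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = fit([X[j] for j in train], [y[j] for j in train])
        scores.append(score(model, [X[j] for j in val], [y[j] for j in val]))
    return mean(scores)


def fit_majority(X, y):
    """Toy baseline: the 'model' is simply the most frequent training label."""
    return max(set(y), key=y.count)


def accuracy(model, X, y):
    """Fraction of examples whose label equals the baseline prediction."""
    return sum(1 for label in y if label == model) / len(y)
```

One reasonable way to use this is to set aside a separate test set before calling cross_validate, use the cross-validation score only to choose among models or settings, and report the score on the untouched test set at the end.
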
There is no set format for the report; use your favorite. There are also no fixed length requirements: you should just make sure that you give the right amount of information for someone to understand what you did. In the past, reports have averaged 8-10 pages in single-column format, but use this number just as a guideline. We expect that many of you will be working with very interesting and challenging data. As a result, the goal of the grading will be to assess your competence in applying machine learning methods to a specific, large problem, and not to look for a set level of performance. You need to ensure that you reference all sources appropriately (software, papers, ideas from colleagues etc.).

The presentation should be 8 minutes long, with 2 minutes allowed for questions. This means that you can cover at most 8 slides, including your title slide. For such a short presentation, you usually do not need an outline slide. You should motivate your topic, present the general approach, the results and a short discussion, and have a conclusion/future work slide. Please e-mail the presentation along with your report to Doina. The presentation should be in PDF format as well.