Machine Learning for Bioinformatics (COMP 766-001, Fall 2006)


Resources / handouts:

Lecture schedule (in process of being reformatted):

Lecture Date Topic(s) Readings / materials
1 Sep 5 Introduction - What is machine learning? Course mechanics and outline. [Slides]
2 Sep 7 Cancelled
3 Sep 12 Brief review of probability theory. Parametric density estimation. Bishop Chapter 2
4 Sep 14 More parametric density estimation. Nonparametric density estimation. Bishop Chapter 2.
5 Sep 19 Paper discussion. Henikoff and Henikoff (1996) "Using substitution probabilities to improve position-specific scoring matrices" CABIOS, Vol. 12, No. 2, pp 135-143.
6 Sep 21 More nonparametric density estimation. Testing for associations between discrete variables: Chi-square test. For Chi-square - just about any stats book.
7 Sep 26 Testing for associations between discrete variables: information theory. MacKay
8 Sep 28 Paper discussion. Draghici et al. (2003) "Global functional profiling of gene expression" Genomics, Vol. 81, pp. 98-104.
9 Oct 3 More information theory. Begin prediction / regression: Linear & polynomial regression. Logistic regression. Naive Bayes. Gaussian discriminant analysis.
10 Oct 5
11 Oct 12
12 Oct 17 Paper discussion. Oberg et al. "Joint estimation of calibration and expression for high-density oligonucleotide arrays" Bioinformatics, Vol. 22, No. 19, pp. 2381-2387.
13 Oct 19
14 Oct 24
15 Oct 26 Decision and Regression Trees. Tests (for internal nodes.) Criteria for test selection. Greedy growing and pruning. Mitchell Ch. 3
16 Oct 31 Closing comments on decision/regression trees. Random Forests. Breiman "Random Forests" Machine Learning Vol. 45 pp. 5-32 (2001)
17 Nov 2 Boosting, especially AdaBoost. Paper reading. Schapire "Theoretical views of boosting" EuroCOLT '99 pp. 1-10 (1999)
Li et al. "Discovery of significant rules for classifying cancer diagnosis data" Bioinformatics Vol. 19 Suppl. 2 pp. ii93-ii102 (2003)
18 Nov 7 Continue boosting and paper discussion. See above.
19 Nov 9 Nearest neighbor methods. Begin support vector machines. Mitchell Chapter 8. For SVMs, any tutorial at
20 Nov 14 Finish support vector machines. See above.
21 Nov 16 Paper discussion. Rangwala and Karypis (2005) "Profile-based direct kernels for remote homology detection and fold recognition" Bioinformatics, Vol. 21, No. 23, pp. 4239-4247.
22 Nov 21
23 Nov 23
24 Nov 28
25 Nov 30
26 Dec 4

Course outline

Taught by: Prof. Theodore J. Perkins
Office: McGill Centre for Bioinformatics
Phone: 398-5018

Course web page:
Class location: MT3438 04 (That is, Room 4 of 3438 McTavish, which is between the McGill Bookstore and the Undergraduate Student Union)
Class time: 1:05 PM to 2:25 PM, Tue and Thu

What this course is about:

The purpose of this course is to introduce students with some background, or at least interest, in bioinformatics to the major principles and techniques of machine learning, and to look at how these can be applied to problems in bioinformatics. The course is intended to be accessible to students from life science departments as well as computer science or other technical departments. (Necessary technical background will be kept to a minimum. On the other hand, the course has previously been enjoyed by students who have already taken COMP 652 - Machine Learning, for example. See more on prerequisites below.) The specific topics to be covered include (not necessarily in this order, and subject to revision based on student interest):

The goals of the course are to:

Format: Approximately half of the classes will be lectures taught by Dr. Perkins, and half will be discussions of bioinformatics research papers that use machine learning.


Credits: 4

Prerequisites: Students should have studied calculus, at least one class on probability/statistics, and have a basic background in computer science. If you are unsure, email me or talk to me in class.

Primary course materials:

Secondary course materials: