Computational Biology Methods (COMP-462) 2019

Computational Biology Methods and Research(COMP-561) 2019

Monday and Wednesday 2:30-4:00, on Zoom (see zoom link on MyCourses).

 

3 credits (462) and 4 credits (561)

 

 

Mathieu Blanchette

 

 

Trottier 3107

 

 

McGill Centre for Bioinformatics

 

 

McGill University

 

 

Montreal, Quebec, Canada

 

 

 

 

 

blanchem@cs.mcgill.ca

http://www.cs.mcgill.ca/~blanchem/462

http://www.cs.mcgill.ca/~blanchem/561

 

 

 

Telephone: 514-398-5209

 

 

 



 

 

 

Course Abstract:

 

 

 

 

Computational biology is the sub-discipline of Bioinformatics that is closest in spirit to pure computer science. The main efforts in this field are two-fold. Firstly, we are concerned with creating models for problems from the biosciences (biology, biochemistry, medicine) that are both biologically and mathematically sound. Secondly, we are interested in the design and analysis of efficient, and accurate algorithms that solve these problems in practice and strategies for validation of results.

 

 

 

 

 

This course is designed to introduce upper-year undergraduate students and graduate students to this area by examining several classic problems from the field. The intention of the course is to act as a gateway whereby, upon completion of the course, students will have the necessary biology, mathematics and computer science background to attend graduate level courses in bioinformatics geared towards specific topics (phylogenetics, genomic evolution, functional genomics, proteomics). The course is designed in such a manner that no previous formal training in biology is required of the students.

 

 

 

 

 

The necessary mathematical background consists of the lower level discrete structures and probablity courses, since topics such as maximum likelihood estimation, hidden Markov models, and dynamic programming will be used repeatedly throughout the material. (Both maximum likelihood and hidden Markov models will be introduced at a basic level however.) Students will be required to have already taken the lower level algorithms/data-structures, numerical computing and theoretical computer science courses.

 

Prequisities:

 

 

 

 

COMP 251 Data Structures and Algorithms

 

 

MATH 323 Probability Theory (or equivalent)

 

 

 

 

 

 

 

Office Hours:

 

 

 

 

M. Blanchette: Monday and Wednesday 4:00-5:00pm, via Zoom (same link as lectures)

Dongjoon Lim

Zichao Yan

Yanlin Zhang

 

Book and Material:

 

 

 

 

Bernhard Haubold and Thomas Wiehe. Introduction to Computational Biology: An Evolutionary Approach , Burkhauser Basel, 2007

           

 

 

Probably the best introductory book out there. Its level is ideal for the course, but it does not go much beyond this.

 

 

 

 

 

Durbin, Eddy, Krogh, Michinson, Biological Sequence Analysis, Cambridge, 1998.

 

 

 

 

 

Also not required, this book is particularly good for learning some of the basics of statistical inference/machine learning.

 

 

 

 

 

Jones, N.C. and Pevzner, P. An Introduction to Bioinformatics Algorithms, MIT press, 2004

 

 

You are not required to buy this book, however it is a good book for understanding some of the classic problems in computational biology and the algorithms used to solve these problems.

 

 

 

 

 

Campbell, A. M., and Heyer, L.J. Discovering genomics, proteomics, and bioinformatics, Benjamin Cummings, 2002

 

 

Also not required, this is a good primary for computer scientists that covers the basics of genomics, genetics, and proteomics.

 

 

 

 

 

Alberts, Johnson, Lewis, Raff, Roberts, Walter Molecular Biology of the Cell, Garland, 2002

 

 

This is a widely used and comprehensive book covering the biology of the cell. It is a good place to start when you want to explore a new topic.

 

Evaluation for COMP 462:

 

 

 

4 assignments

40% total (10% each)

 

Take-home Midterm, October 19, held remotely

20%

 

Take Home Final Exam, during exam period, held remotely

40%

 

Evaluation for COMP 561:

 

 

 

3 assignments (the first three)

30% total (10% each)

 

Project

20%

 

Take-home Midterm, October 19, in class

20%

 

Take-home Final Exam, during exam period

30%

 

Computer Science/mathematics topics:

 

 

 

 

Basic probability and statistics (ubiquitous)

 

 

Dynamic programming (sequence alignment)

 

 

Approximation algorithms (string alignment)

 

 

Advanced data structures (suffix trees)

 

 

Numerical techniques (least squares fits)

 

 

Experimental design

 

 

Programming

 

Concepts from biology and biotechnologies

 

 

 

 

Models of evolution

 

 

Sequence comparison

 

 

Phylogenetics

 

 

Gene expression and regulation

 

 

Peptide identification

 

 

RNA secondary structure

 

 

DNA sequencing

 

 

Population genetics

 


Course outline

 

Lecture 1,2: Introduction to molecular biology and genomics.

 

 

 

 

Topics: Basic Questions, Basic Strategies, Introduction to molecular biology and genomics.

 

 

Background Reading: Chapter 1 of Artificial Intelligence and Molecular Biology , by L. Hunter

 

 

On-line Resources: Lecture notes by Dudoit and Gentleman

 

Lecture 3-7: Sequence evolution and sequence alignment.

 

 

 

 

Topics: Introduction to sequence evolution. Global and local alignment; Gapping; BLAST algorithm; Multiple Alignments.

 

 

Background Reading: Chapter 6 of Jones, Pevzner; Chapter 6.2 of Ewans, Grant.

 

 

Math/Algorithms: Dynamic Programming

 

 

Applications: Gene finding.

 

 

On-line Resources:

 

 

Additional material for COMP 561: Chapter 6 of Durbin and Eddy.

 

Lecture 8-11: Evolutionary models and phylogenetic Tree construction.

 

 

 

 

Topics: Discrete and continuous nucleotide and amino acid substitution models Distance-based methods; Parsimony; Maximum Likelihood.

 

 

Background Reading: Chapters 7-8 of Durbin et al. or Chapter 8 and 9.1-9.5 of Haubold and Wiehe.

 

 

Math/Algorithms: Discrete algorithm design; Maximum likelihood.

 

 

On-line Resources:

 

 

Midterm exam, October 19, in class (open book).

 

 

 

 

 

 

Lecture 13-16: Hidden Markov Models.

 

 

 

 

Topics: Forward, backward, Viterbi, Baum-Welch algorithms. Gene-finding HMMs. Profile HMMs.

 

 

Background Reading: Chapters 3 and 5 of Durbin et al.

 

 

Math/Algorithms: Markov processes; Dynamic programming; Parameter estimation.

 

 

Application: Gene finding

 

 

On-line Resources:

 

Lecture 17-18: Motif discovery.

 

 

 

 

Topics: Modelling and searching for signals in DNA..

 

 

Background Reading: Chapter 5 of Ewans, Grant; Chapter 4 of Jones, Pevzner

 

 

Math/Algorithms: Probability theory; Markov processes; exhaustive search; Gibb's sampling (intro)

 

 

Applications: Searching for repeats. Identifying transcription factor binding sites.

 

 

On-line Resources:

 

 

 

Lecture 19-20: Gene Expression Analysis.

 

 

 

 

Topics: Class distinction; Class prediction; Class discovery.

 

 

Background Reading: Chapter 10 of Jones, Pevzner.

 

 

Math/Algorithms: Differential expression; Principal Component Analysis; Clustering; Graph theory.

 

 

On-line Resources:

 

Lecture 21-22: Sequence assembly and Protein Identification.

 

 

 

 

Topics: DNA sequencing; Sequence assembly problem; Peptide identification; Peptide sequencing; Protein Identification.

 

 

Background Reading: Chapter 8.10-8.15 of Jones, Pevzner.

 

 

Math/Algorithms: Graph theory; Dynamic programming..

 

 

On-line Resources:

 

 

Lecture 23,24: Introduction to population genetics

 

 

 

 

Topics: Polymorphisms, haplotypes. Genotyping by sequencing. Phasing.

 

 

Background Reading: TBD

 

 

Math/Algorithms:

 

Final exam, during the exam session.

 

 

 

Whereas, McGill University values academic integrity;

Whereas, every term, there are new students who register for the first time at McGill and who need to be informed about academic integrity;

Whereas, it is beneficial to remind returning students about academic integrity;

Be it resolved that instructors include the following statement on all course outlines: McGill University values academic integrity. Therefore all students must understand the meaning and consequences of cheating, plagiarism and other academic offences under the Code of Student Conduct and Disciplinary Procedures (see www.mcgill.ca/integrity for more information).

Be it further resolved that failure by an instructor to include a statement about academic integrity on a course outline shall not constitute an excuse by a student for violating the Code of Student Conduct and Disciplinary Procedures.

 

 

Mathieu Blanchette (based on Mike Hallett's syllabus for COMP 462)


2020-09-02