Natural Language Processing
Fall, 2017
Instructor: Jackie Chi Kit Cheung
Time: Tuesdays and Thursdays, 4pm – 5:30pm
Location: MAASS 217
Office hours: Tuesdays, 2:30pm – 3:45pm, McConnell 108N
Course outline
TAs : Ali Emami, Jad Kabbara, Kian Kenyon-Dean, Krtin Kumar
This course presents an introduction to the computational modelling of natural language. Topics covered include: computational morphology, language modelling, syntactic parsing, lexical and compositional semantics, and discourse analysis. We will consider selected applications such as automatic summarization, machine translation, and speech processing. We will also study machine learning algorithms that are used in natural language processing.
Prerequisites: MATH 323 or ECSE 305; COMP 251 or COMP 252.
Useful but not required: Background in artificial intelligence (e.g., COMP 424); introductory course in linguistics (LING 201).
Announcements
- TA office hours for A4: Monday, November 27, 2:30-4:30 PM in McConnell 321.
-
Extra TA office hours before the midterm:
11:30 to 13:30 on Monday, Nov 6
12:00 to 14:00 on Tuesday, Nov 7
Both will be in Trottier 3104. - TA office hours for A2 feedback: 2pm-4pm on Friday, October 27th in Trottier 3104.
- TA office hours for A2: 12pm-2pm on Monday, October 16th, in Trottier 3104.
- Regular office hours cancelled on Oct 17. TA office hours will be announced for A2.
- Final project specifications have been posted. Start finding your partner and project now!
- 24-hour extension for A1 to September 30, 2017, 11:59pm.
- TA office hours for A1: 2pm-4pm on Thursday, September 28th, in Trottier 3104.
- Because enrollment is now beyond the seating capacity of the lecture hall, I ask that auditors not attend lectures for the first two weeks. I expect that enrollment will stabilize by then, and that there will be room in the lecture hall. Lectures will not be recorded, as the room is not equipped for such.
- No office hours Sept 12. Please send me e-mail regarding any issues.
Lectures and Readings
Draft chapters of the 3rd edition of Jurafsky and Martin are available here.
Date | Topic | Readings |
---|---|---|
Sept 5 | 1 - Introduction to Natural Language Processing | J&M Ch 1 (both 1st ed and 2nd ed) |
Sept 7 | 2 - Morphology, FSAs and FSTs – Lecture by Krtin Kumar | J&M Ch 2.2, Ch 3 (both 1st ed and 2nd ed) |
Sept 12 | 3 - Article prediction, Python intro – Lecture by Jad Kabbara | NLTK |
Sept 14 | 4 - Language models and N-grams | J&M Ch 6.1, 6.2 (1st ed); J&M Ch 4.1 – 4.4 (2nd or 3rd ed) |
Sept 19 | 5 - Smoothing and model complexity | J&M Ch 6.3 (1st ed); J&M Ch 4.5 (2nd ed) Notes by Kevin Murphy |
Sept 21 | 6 - Feature extraction and classification | |
Sept 26 | 7 - Part of speech tagging: Markov chains and hidden Markov models | J&M Ch. 8.1–8.3 (1st ed); J&M Ch. 5.1–5.3 (2nd ed); J&M Ch. 10.1–10.4 (3rd ed) |
Sept 28 | 8 - Part of speech tagging: Algorithms | J&M Ch. 7.2-7.3, 8.5 (1st ed); J&M Ch. 5.5, 6.1–6.5 (2nd ed) |
Oct 3 | 9 - Linear-chain conditional random fields | Tutorial by Sutton and McCallum. Sections 1, 2–2.3, 3–3.1; J&M Ch. 10.5-10.6 (3rd ed) |
Oct 5 | 10 - Recurrent neural networks | Primer on neural networks for NLP by Yoav Goldberg |
Oct 10 | 11 - Introduction to context-free grammars | J&M Ch. 9 (1st ed); J&M Ch. 12 (2nd ed); J&M Ch. 11 (3rd ed) |
Oct 12 | 12 - CYK parsing and probabilistic context-free grammars | J&M Ch. 10, 12, especially 12.1 (1st ed); J&M Ch. 13, 14, especially 14.2 (2nd ed); J&M Ch. 12, especially 12.2 (3rd ed) |
Oct 17 | 13 - Lexical semantics – Lecture by Jad Kabbara | J&M Ch. 16, 17–17.3 (1st ed); J&M Ch. 19, 20–20.5 (2nd ed); J&M Ch. 17 (3rd ed); WordNet; Yarowsky, 1995 |
Oct 19 | Guest lecture by Prof. Timothy O'Donnell | |
Oct 24 | 14 - Hearst patterns and distributional semantics | J&M Ch. 15, 16 (3rd ed) |
Oct 26 | 15 - Compositional semantics | J&M Ch. 14 – 14.3 (1st ed); J&M Ch. 17 – 17.4 (2nd ed) |
Oct 31 | 16 - Compositional semantics | J&M Ch. 15 (1st ed); J&M Ch. 18 (2nd ed) |
Nov 2 | 17 - Coreference resolution | J&M Ch. 18.1 (1st ed); J&M Ch. 21.3–21.8 (2nd ed) |
Nov 7 | 18 - Discourse coherence | &M Ch. 18.2, 18.3 (1st ed); J&M Ch. 21.1, 21.2 (2nd ed); Barzilay and Lapata, 2008 |
Nov 9 |
Midterm Please go to the following location, depending on the first two letters of your family name:
AA - LE: ENGTR 0060 (Trottier building) |
Study materials: List of practice questions Fall 2015 midterm |
Nov 14 | 19 - Automatic summarization | J&M Ch. 23.3-23.7 (2nd ed); Survey by Nenkova and McKeown, 2011, Chapters 1 and 6 |
Nov 16 | 20 - Abstractive summarization and natural language generation | |
Nov 21 | 21 - Machine translation | |
Nov 23 | 22 - Word alignment, decoding, and neural MT | |
Nov 28 | 23 - Evaluation in NLP | |
Nov 30 | 24 - Bias in NLP systems | Bolubaksi et al., 2016 Zhao et al., 2017 |
Assignments
Assignment 1
Due September 29 30, 2017 at 11:59pm – note 24-hour extension (not shown in pdf)
- Handout
- Link to file for Q3 from dataset by Pang and Lee. Full website here.
Assignment 2
Due October 20, 2017 at 11:59pm
Assignment 3
Due November 13, 2017 at 11:59pm
Assignment 4
Due November 30, 2017 at 11:59pm
Final Project
Proposal due October 26, 2017 at 11:59pm
Project due November 30, 2017 at 11:59pm
- Project specifications
- You and your partner must enroll into a group on MyCourses in order to make submissions.