A graduate course on advanced techniques for automatically interpreting large amounts of structured, semi-structured, and unstructured data, inferring useful knowledge from it, and presenting this knowledge to users in a convenient form, with a special focus on applications to software engineering problems.
Software developers, like many other types of knowledge workers, have sophisticated information needs, while at the same time being overwhelmed with information. For example, searching for "How do I send an email with Java" finds 143,000,000 related documents including articles, forum posts, mailing list archives, etc. This could be too much information. Recommendation Systems are tools that help users navigate large information spaces by providing guidance and assistance in the form of recommendations, or pieces of information estimated to be relevant in the context of a given task. Recommendation Systems for Software Engineering, or RSSEs, provide recommendations in highly technical contexts where analyses of structured data (such as source code) must often complement traditional data mining techniques. For a more detailed overview of RSSEs, read this IEEE Software article, and check out this website.
The course will cover topics in three major areas:
The course will involve a combination of "roadmap" lectures and invited lectures on selected topics, student presentations and discussion of research papers, and "work-in-progress" project presentations. The course will involve a major project: the development of a prototype RSSE. The final grade will take into account the project (50%), class participation (30%), and a take-home exam (20%). The course will be based on the book "Recommender Systems: An Introduction" by Jannach et al. (2010) and on selected scientific papers.
Official Academic Integrity Statement McGill University values academic integrity. Therefore all students must understand the meaning and consequences of cheating, plagiarism and other academic offenses under the Code of Student Conduct and Disciplinary Procedures (see www.mcgill.ca/students/srr for more information).
Language Policy In accord with McGill University’s Charter of Students’ Rights, students in this course have the right to submit in English or in French any written work that is to be graded.
The course project is to develop a prototype RSSE. You can choose whatever application and technique you like, as long as it involves the analysis of software engineering artifacts. Although you will be expected to develop a complete and functional RSSE, you are encouraged to focus on a specific aspect that corresponds to your research area of interest (e.g., mining algorithms, data preprocessing, UIs, etc.).
At the same time as you develop the technical aspects of your project, you will write a report on it using ACM's conference formatting guidelines. There are three milestones:
Details on the format of the reports and presentation, and general guidelines and advice, will be provided in class.
A one-page essay answering a synthesis question, to be completed on your own within a 24-hour period at some point after the project demos.
This schedule is subject to change. Seminar articles are in bold.
|Mon 9 Jan||Introduction to software engineering research. Roadmap: Recommendation systems. Overview of the project.||[RWZ2010]|
|Wed 11 Jan||Roadmap: Data mining software repositories||[XTL2009]|
|Mon 16 Jan||Seminar: Early Systems: CodeBroker and ExpertiseBrowser||[YF2002] [MH2002]|
|Wed 18 Jan||Seminar: Recommendations for the web: tags and shortcuts||[LM2010] [BCC2009]|
|Mon 23 Jan||Seminar: Applications of content-based recommendations: features and bug reports||[AHM2006] [DGH2011]|
|Wed 25 Jan||Seminar: Code comprehension: reuse and debugging||[HRR2009] [AJL2009]|
|Mon 30 Jan||Project proposals|
|Wed 1 Feb||Seminar: Mining code usage||[LZ2005] [BMM2009]|
|Mon 6 Feb||Seminar: Finding code examples||[SC2006] [BOL2010]|
|Wed 8 Feb||Seminar: Synthesizing code examples||[MXB2005] [DR2011]|
|Mon 13 Feb||Invited Lecture: Partial program analysis and the SemDiff recommender||[DR2008]|
|Wed 15 Feb||Invited Lecture: Mining user interaction data||[YR2011]|
|Mon 20 Feb||No class - Study break|
|Wed 22 Feb||No class - Study break|
|Mon 27 Feb||Work in progress presentations|
|Wed 29 Feb||Seminar: Specification Mining||[ABL2002] [GS2009]|
|Mon 5 Mar||Seminar: API property inference||[ZZX2009] [HST2010]|
|Wed 7 Mar||Seminar: Code Quality||[ECH2001] [KR2009]|
|Mon 12 Mar||Seminar: Bug prediction||[SZW2007] [BMN2011]|
|Wed 14 Mar||Seminar: Software Evolution||[ZWD2004] [KN2009]|
|Mon 19 Mar||Roadmap: Metrics and evaluation||[RRS2009] Chapter 8 [JZF2010] Chapter 7|
|Wed 21 Mar||Seminar: Personalization||[FYW2004] [TDH2005]|
|Mon 26 Mar||Seminar: Interaction traces||[PG2006] [FOM2010]|
|Wed 28 Mar||Seminar: User interfaces||[KRW2011] [SS2011]|
|Mon 2 Apr||Seminar: Explanation||[HKR2000] [VSR2009]|
|Wed 4 Apr||Roundtable: Privacy Issues in Recommender Systems||Selected by students|
|Mon 9 Apr||No class - Easter Monday|
|Wed 11 Apr||Project presentations|
|Mon 16 Apr||Project presentations|
Sources not explicitly discussed as part of the seminar, but that will provide useful additional background on the course in general, or on specific topics.
|[AT2005]||G. Adomavicius and A. Tuzhilin, “Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions,” IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, Jun. 2005.|
|[CACM1997]||Communications of the ACM, Special Issue on Recommender Systems, vol. 40, no. 3, Mar. 1997.|
|[DR2008]||B. Dagenais and M. P. Robillard, “Recommending adaptive changes for framework evolution,” in Proceedings of the 30th ACM/IEEE International Conference on Software Engineering, 2008, pp. 481–490.|
|[JZF2010]||D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich, Recommender Systems: An Introduction. Cambridge Univ Press, 2010.|
|[RRS2009]||F. Ricci, L. Rokach, B. Shapira, and P.B. Kantor, Recommender systems handbook. Springer, 2009.|
|[RWZ2010]||M. Robillard, R. Walker, and T. Zimmermann, “Recommendation systems for software engineering,” IEEE Software, vol. 27, no. 4, pp. 80-86, Aug. 2010.|
|[XTL2009]||T. Xie, S. Thummalapenta, D. Lo, and C. Liu, “Data mining for software engineering,” IEEE Computer, vol. 42, no. 8, pp. 35–42, 2009.|
|[YR2011]||A. T.T. Ying and M. P. Robillard, “The Influence of the task on programmer behaviour,” in Proceedings of the 19th IEEE International Conference on Program Comprehension, 2011, pp. 31-40.|
|[ABL2002]||G. Ammons, R. Bodík, and J. R. Larus, “Mining specifications,” in Proceedings of the 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2002, pp. 4–16.|
|[AHM2006]||J. Anvik, L. Hiew, and G. C. Murphy, “Who should fix this bug?,” in Proceedings of the 28th ACM/IEEE International Conference on Software Engineering, 2006, pp. 361–370.|
|[AJL2009]||B. Ashok, J. Joy, H. Liang, S. K. Rajamani, G. Srinivasa, and V. Vangala, “DebugAdvisor: a recommender system for debugging,” in Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, 2009, pp. 373–382.|
|[BCC2009]||R. Baraglia et al., “Search shortcuts: a new approach to the recommendation of queries,” in Proceedings of the 3rd ACM Conference on Recommender Systems, 2009, pp. 77–84.|
|[BMM2009]||M. Bruch, M. Monperrus, and M. Mezini, “Learning from examples to improve code completion systems,” in Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, 2009, pp. 213–222.|
|[BMN2011]||C. Bird, N. Nagappan, B. Murphy, H. Gall, and P. Devanbu, “Don’t touch my code! Examining the effects of ownership on software quality,” in Proceedings of the 8th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, 2011.|
|[BOL2010]||S. K. Bajracharya, J. Ossher, and C. V. Lopes, “Leveraging usage similarity for effective retrieval of examples in code repositories,” in Proceedings of the 18th ACM SIGSOFT International Symposium on the Foundations of Software Engineering, 2010, pp. 157–166.|
|[DGH2011]||H. Dumitru et al., “On-demand feature recommendations derived from mining public product descriptions,” in Proceedings of the 33rd ACM/IEEE International Conference on Software Engineering, 2011, pp. 181–190.|
|[DR2011]||E. Duala-Ekoko and M. Robillard, “Using structure-based recommendations to facilitate discoverability in APIs,” in Proceedings of the European Conference on Object-Oriented Programming, 2011, pp. 79–104.|
|[ECH2001]||D. Engler, D. Y. Chen, S. Hallem, A. Chou, and B. Chelf, “Bugs as deviant behavior: a general approach to inferring errors in systems code,” in Proceedings of the 18th ACM Symposium on Operating Systems Principles, 2001, pp. 57–72.|
|[FOM2010]||T. Fritz, J. Ou, G. C. Murphy, and E. Murphy-Hill, “A degree-of-knowledge model to capture source code familiarity,” in Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering, 2010, pp. 385–394.|
|[FYW2004]||F. Liu, C. Yu, and W. Meng, “Personalized Web search for improving retrieval effectiveness,” IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 1, pp. 28–40, Jan. 2004.|
|[GS2009]||M. Gabel and Z. Su, “Symbolic mining of temporal specifications,” in Proceedings of the 30th ACM/IEEE International Conference on Software Engineering, 2009, pp. 51–60.|
|[HKR2000]||J. L. Herlocker, J. A. Konstan, and J. Riedl, “Explaining collaborative filtering recommendations,” in Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work, 2000, pp. 241–250.|
|[HRR2009]||R. Holmes, T. Ratchford, M. P. Robillard, and R. J. Walker, “Automatically Recommending Triage Decisions for Pragmatic Reuse Tasks,” in Proceedings of the 2009 IEEE/ACM International Conference on Automated Software Engineering, 2009, pp. 397–408.|
|[HST2010]||H. Zhong, S. Thummalapenta, T. Xie, L. Zhang, and Q. Wang, “Mining API Mapping for Language Migration,” in Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering, 2010, pp. 195–204.|
|[KN2009]||M. Kim and D. Notkin, “Discovering and representing systematic code changes,” in Proceedings of the 31st ACM/IEEE International Conference on Software Engineering, 2009, pp. 309–319.|
|[KR2009]||D. Kawrykow and M. P. Robillard, “Improving API usage through automatic detection of redundant code,” in Proceedings of the 2009 IEEE/ACM International Conference on Automated Software Engineering, 2009, pp. 111–122.|
|[KRW2011]||B. P. Knijnenburg, N. J. M. Reijmer, and M. C. Willemsen, “Each to his own: how different users call for different interaction methods in recommender systems,” in Proceedings of the 5th ACM Conference on Recommender Systems, 2011, pp. 141–148.|
|[LM2010]||M. Lipczak and E. Milios, “Learning in efficient tag recommendation,” in Proceedings of the 4th ACM Conference on Recommender Systems, 2010, pp. 167–174.|
|[LZ2005]||Z. Li and Y. Zhou, “PR-Miner: automatically extracting implicit programming rules and detecting violations in large software code,” in Proceedings of the 10th European Software Engineering Conference held jointly with the 13th ACM SIGSOFT International Symposium on the Foundations of Software Engineering, 2005, pp. 306–315.|
|[MH2002]||A. Mockus and J. D. Herbsleb, “Expertise browser: a quantitative approach to identifying expertise,” in Proceedings of the 24th ACM/IEEE International Conference on Software Engineering, 2002, pp. 503–512.|
|[MXB2005]||D. Mandelin, L. Xu, R. Bodík, and D. Kimelman, “Jungloid mining: helping to navigate the API jungle,” in Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2005, pp. 48–61.|
|[PG2006]||C. Parnin and C. Gorg, “Building usage contexts during program comprehension,” in Proceedings of the 14th IEEE International Conference on Program Comprehension, 2006, pp. 13-22.|
|[SC2006]||N. Sahavechaphan and K. Claypool, “XSnippet: mining for sample code,” in Proceedings of the 21st ACM SIGPLAN Conference on Object-oriented Programming Systems, Languages, and Applications, 2006, pp. 413–430.|
|[SZW2007]||S. Kim, T. Zimmermann, E. J. Whitehead, and A. Zeller, “Predicting faults from cached history,” in Proceedings of the 29th ACM/IEEE International Conference on Software Engineering, 2007, pp. 489–498.|
|[SS2011]||E. I. Sparling and S. Sen, “Rating: how difficult is it?,” in Proceedings of the 5th ACM Conference on Recommender Systems, 2011, pp. 149–156.|
|[TDH2005]||J. Teevan, S. T. Dumais, and E. Horvitz, “Personalizing search via automated analysis of interests and activities,” in Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2005, pp. 449–456.|
|[VSR2009]||J. Vig, S. Sen, and J. Riedl, “Tagsplanations: explaining recommendations using tags,” in Proceedings of the 14th International Conference on Intelligent User Interfaces, 2009, pp. 47–56.|
|[YF2002]||Y. Ye and G. Fischer, “Supporting reuse by delivering task-relevant and personalized information,” in Proceedings of the 24th ACM/IEEE International Conference on Software Engineering, 2002, pp. 513–523.|
|[ZWD2004]||T. Zimmermann, P. Weissgerber, S. Diehl, and A. Zeller, “Mining version histories to guide software changes,” in Proceedings of the 26th ACM/IEEE International Conference on Software Engineering, 2004, pp. 563–572.|
|[ZZX2009]||H. Zhong, L. Zhang, T. Xie, and H. Mei, “Inferring resource specifications from natural language API documentation,” in Proceedings of the 2009 IEEE/ACM International Conference on Automated Software Engineering, 2009, pp. 307–318.|
This course draws inspiration from many sources, and in particular: discussions on RSSEs with the co-organizers of the RSSE workshop (Walid Maalej, Rob Walker, and Tom Zimmermann), joint work on API property inference with Mira Mezini and Eric Bodden at TU Darmstadt, Ahmed Hassan's course on Mining Software Engineering Data, and the exciting work of my graduate students.