Overview

Software developers, like many other knowledge workers, have sophisticated information needs while at the same time being overwhelmed with information. For example, searching for "How do I send an email with Java" finds roughly 143,000,000 related documents, including articles, forum posts, and mailing list archives: far more than anyone can usefully sift through. Recommendation Systems are tools that help users navigate large information spaces by providing recommendations, that is, pieces of information estimated to be relevant in the context of a given task. Recommendation Systems for Software Engineering, or RSSEs, provide recommendations in highly technical contexts where analyses of structured data (such as source code) must often complement traditional data mining techniques. For a more detailed overview of RSSEs, see the IEEE Software article [RWZ2010] in the reading list below.
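To make this concrete, here is a small, purely illustrative sketch (in Python, with invented usage data and method names) of one kind of analysis an RSSE might perform: mining which API methods tend to be used together in client code, and recommending the most frequent companions of a method the developer is already using. It is a toy example in the spirit of the usage-mining systems covered later in the course, not a description of any particular system.

    # Toy usage-mining recommender; the usage data below is invented for illustration.
    from collections import Counter
    from itertools import combinations

    # Each entry lists the API methods called together in one (hypothetical) client method.
    usage_contexts = [
        {"Session.getInstance", "MimeMessage.setFrom", "Transport.send"},
        {"Session.getInstance", "MimeMessage.setSubject", "Transport.send"},
        {"Session.getInstance", "Transport.send"},
        {"File.open", "File.read", "File.close"},
    ]

    # Count how often each pair of methods co-occurs in the same context.
    cooccurrence = Counter()
    for context in usage_contexts:
        for a, b in combinations(sorted(context), 2):
            cooccurrence[(a, b)] += 1

    def recommend(query, k=3):
        """Return up to k methods most frequently used together with `query`."""
        scores = Counter()
        for (a, b), count in cooccurrence.items():
            if a == query:
                scores[b] += count
            elif b == query:
                scores[a] += count
        return scores.most_common(k)

    # A developer calling Session.getInstance would be shown Transport.send first.
    print(recommend("Session.getInstance"))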

Learning Outcomes

The course will cover topics in three major areas:

The course will also help you practice and improve important soft skills for researchers:

Course Work and Evaluation

The course will involve a combination of "roadmap" lectures and invited lectures on selected topics, student presentations and discussion of research papers, and "work-in-progress" project presentations. The course also involves a major project: the development of a prototype RSSE. The final grade will take into account the project (50%), class participation (30%), and a take-home exam (20%). The course will be based on the book Recommender Systems: An Introduction by Jannach et al. [JZF2010] and on selected scientific papers.

Official Academic Integrity Statement

McGill University values academic integrity. Therefore all students must understand the meaning and consequences of cheating, plagiarism and other academic offenses under the Code of Student Conduct and Disciplinary Procedures (see www.mcgill.ca/students/srr for more information).

Language Policy

In accord with McGill University’s Charter of Students’ Rights, students in this course have the right to submit in English or in French any written work that is to be graded.

Seminars

Most class meetings are reserved for the presentation and discussion of research papers ("seminars"). For each paper, each student will be assigned the role of "presenter", "discussant", or "audience". The class participation grade will be based on performance in these roles.

Project

The course project is to develop a prototype RSSE. You can choose whatever application and technique you like, as long as it involves the analysis of software engineering artifacts. Although you will be expected to develop a complete and functional RSSE, you are encouraged to focus on a specific aspect that corresponds to your research area of interest (e.g., mining algorithms, data preprocessing, or user interfaces).
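To give a rough sense of the expected scope, the sketch below shows one possible decomposition of a project RSSE into the aspects mentioned above (data preprocessing, mining, and a user interface). The stage names and types are placeholders chosen for illustration, not a required architecture.

    # Hypothetical skeleton of an RSSE prototype; names and types are placeholders.
    from dataclasses import dataclass
    from typing import Iterable, List

    @dataclass
    class Recommendation:
        item: str       # e.g., a method, code example, bug report, or developer
        score: float    # estimated relevance to the developer's current task
        rationale: str  # short explanation shown alongside the recommendation

    def collect_artifacts(source: str) -> Iterable[str]:
        """Data preprocessing: load and clean software engineering artifacts
        (source files, commits, bug reports, ...) from the given source."""
        raise NotImplementedError

    def mine(artifacts: Iterable[str], query: str) -> List[Recommendation]:
        """Mining: analyze the artifacts and rank candidate recommendations
        for the developer's current task or query."""
        raise NotImplementedError

    def present(recommendations: List[Recommendation]) -> None:
        """User interface: display the top recommendations with their rationale."""
        for r in recommendations[:10]:
            print(f"{r.score:.2f}  {r.item}  ({r.rationale})")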

At the same time as you develop the technical aspects of your project, you will write a report on it using ACM's conference formatting guidelines. There are three milestones:

  1. Proposal: Email the instructor (by 26 January 11:53pm) a 3-page description of the RSSE you want to build. Pitch your project proposal to your classmates on 30 January and collect their feedback and reactions. Would they fund your project?
  2. Midterm report: Email the instructor (by 24 February 11:55pm) a 5-page report (extended from your proposal) that focuses on the motivation, main techniques, and general architecture of the RSSE. Your report should include details of the data sources and mining algorithms you use, references to reused packages, and illustrations of non-standard techniques or algorithms employed. The report should also include a brief discussion of (at least) the two or three most related works, with bibliographic references. Describe your system in class on 27 February.
  3. Grand finale: Present your completed system, including a live demo, on 11 or 16 April. Email your final, 8-page report (extended from the midterm report) to the instructor before 17 April, 11:59pm.

Details on the format of the reports and presentation, and general guidelines and advice, will be provided in class.

Final Exam

A one-page essay answering a synthesis question, to be completed on your own within a 24-hour period at some point after the project demos.

Schedule

This schedule is subject to change. The seminar readings are the articles listed under Seminar Articles in the Reading List below.

Date | Class Topics | Reading
Mon 9 Jan | Introduction to software engineering research. Roadmap: Recommendation systems. Overview of the project. | [RWZ2010]
Wed 11 Jan | Roadmap: Data mining software repositories | [XTL2009]
Mon 16 Jan | Seminar: Early Systems: CodeBroker and ExpertiseBrowser | [YF2002] [MH2002]
Wed 18 Jan | Seminar: Recommendations for the web: tags and shortcuts | [LM2010] [BCC2009]
Mon 23 Jan | Seminar: Applications of content-based recommendations: features and bug reports | [AHM2006] [DGH2011]
Wed 25 Jan | Seminar: Code comprehension: reuse and debugging | [HRR2009] [AJL2009]
Mon 30 Jan | Project proposals
Wed 1 Feb | Seminar: Mining code usage | [LZ2005] [BMM2009]
Mon 6 Feb | Seminar: Finding code examples | [SC2006] [BOL2010]
Wed 8 Feb | Seminar: Synthesizing code examples | [MXB2005] [DR2011]
Mon 13 Feb | Invited Lecture: Partial program analysis and the SemDiff recommender | [DR2008]
Wed 15 Feb | Invited Lecture: Mining user interaction data | [YR2011]
Mon 20 Feb | No class - Study break
Wed 22 Feb | No class - Study break
Mon 27 Feb | Work in progress presentations
Wed 29 Feb | Seminar: Specification Mining | [ABL2002] [GS2009]
Mon 5 Mar | Seminar: API property inference | [ZZX2009] [HST2010]
Wed 7 Mar | Seminar: Code Quality | [ECH2001] [KR2009]
Mon 12 Mar | Seminar: Bug prediction | [SZW2007] [BMN2011]
Wed 14 Mar | Seminar: Software Evolution | [ZWD2004] [KN2009]
Mon 19 Mar | Roadmap: Metrics and evaluation | [RRS2009] Chapter 8; [JZF2010] Chapter 7
Wed 21 Mar | Seminar: Personalization | [FYW2004] [TDH2005]
Mon 26 Mar | Seminar: Interaction traces | [PG2006] [FOM2010]
Wed 28 Mar | Seminar: User interfaces | [KRW2011] [SS2011]
Mon 2 Apr | Seminar: Explanation | [HKR2000] [VSR2009]
Wed 4 Apr | Roundtable: Privacy Issues in Recommender Systems | Selected by students
Mon 9 Apr | No class - Easter Monday
Wed 11 Apr | Project presentations
Mon 16 Apr | Project presentations

Reading List

General References

Sources not explicitly discussed as part of the seminars, but that provide useful additional background on the course in general or on specific topics.

[AT2005]G. Adomavicius and A. Tuzhilin, “Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions,” IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, Jun. 2005.
[CACM1997]Communications of the ACM, Special Issue on Recommender Systems, vol. 40, no. 3, Mar. 1997.
[DR2008]B. Dagenais and M. P. Robillard, “Recommending adaptive changes for framework evolution,” in Proceedings of the 30th ACM/IEEE International Conference on Software Engineering, 2008, pp. 481–490.
[JZF2010]D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich, Recommender Systems: An Introduction. Cambridge University Press, 2010.
[RRS2009]F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, Recommender Systems Handbook. Springer, 2009.
[RWZ2010]M. Robillard, R. Walker, and T. Zimmermann, “Recommendation systems for software engineering,” IEEE Software, vol. 27, no. 4, pp. 80–86, Aug. 2010.
[XTL2009]T. Xie, S. Thummalapenta, D. Lo, and C. Liu, “Data mining for software engineering,” IEEE Computer, vol. 42, no. 8, pp. 35–42, 2009.
[YR2011]A. T. T. Ying and M. P. Robillard, “The influence of the task on programmer behaviour,” in Proceedings of the 19th IEEE International Conference on Program Comprehension, 2011, pp. 31–40.

Seminar Articles

[ABL2002]G. Ammons, R. Bodík, and J. R. Larus, “Mining specifications,” in Proceedings of the 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2002, pp. 4–16.
[AHM2006]J. Anvik, L. Hiew, and G. C. Murphy, “Who should fix this bug?,” in Proceedings of the 28th ACM/IEEE International Conference on Software Engineering, 2006, pp. 361–370.
[AJL2009]B. Ashok, J. Joy, H. Liang, S. K. Rajamani, G. Srinivasa, and V. Vangala, “DebugAdvisor: a recommender system for debugging,” in Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, 2009, pp. 373–382.
[BCC2009]R. Baraglia et al., “Search shortcuts: a new approach to the recommendation of queries,” in Proceedings of the 3rd ACM Conference on Recommender Systems, 2009, pp. 77–84.
[BMM2009]M. Bruch, M. Monperrus, and M. Mezini, “Learning from examples to improve code completion systems,” in Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, 2009, pp. 213–222.
[BMN2011]C. Bird, N. Nagappan, B. Murphy, H. Gall, and P. Devanbu, “Don’t touch my code! Examining the effects of ownership on software quality,” in Proceedings of the 8th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, 2011.
[BOL2010]S. K. Bajracharya, J. Ossher, and C. V. Lopes, “Leveraging usage similarity for effective retrieval of examples in code repositories,” in Proceedings of the 18th ACM SIGSOFT International Symposium on the Foundations of Software Engineering, 2010, pp. 157–166.
[DGH2011]H. Dumitru et al., “On-demand feature recommendations derived from mining public product descriptions,” in Proceedings of the 33rd ACM/IEEE International Conference on Software Engineering, 2011, pp. 181–190.
[DR2011]E. Duala-Ekoko and M. P. Robillard, “Using structure-based recommendations to facilitate discoverability in APIs,” in Proceedings of the European Conference on Object-Oriented Programming, 2011, pp. 79–104.
[ECH2001]D. Engler, D. Y. Chen, S. Hallem, A. Chou, and B. Chelf, “Bugs as deviant behavior: a general approach to inferring errors in systems code,” in Proceedings of the 18th ACM Symposium on Operating Systems Principles, 2001, pp. 57–72.
[FOM2010]T. Fritz, J. Ou, G. C. Murphy, and E. Murphy-Hill, “A degree-of-knowledge model to capture source code familiarity,” in Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering, 2010, pp. 385–394.
[FYW2004]F. Liu, C. Yu, and W. Meng, “Personalized Web search for improving retrieval effectiveness,” IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 1, pp. 28–40, Jan. 2004.
[GS2009]M. Gabel and Z. Su, “Symbolic mining of temporal specifications,” in Proceedings of the 30th ACM/IEEE International Conference on Software Engineering, 2009, pp. 51–60.
[HKR2000]J. L. Herlocker, J. A. Konstan, and J. Riedl, “Explaining collaborative filtering recommendations,” in Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work, 2000, pp. 241–250.
[HRR2009]R. Holmes, T. Ratchford, M. P. Robillard, and R. J. Walker, “Automatically recommending triage decisions for pragmatic reuse tasks,” in Proceedings of the 2009 IEEE/ACM International Conference on Automated Software Engineering, 2009, pp. 397–408.
[HST2010]H. Zhong, S. Thummalapenta, T. Xie, L. Zhang, and Q. Wang, “Mining API mapping for language migration,” in Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering, 2010, pp. 195–204.
[KN2009]M. Kim and D. Notkin, “Discovering and representing systematic code changes,” in Proceedings of the 31st ACM/IEEE International Conference on Software Engineering, 2009, pp. 309–319.
[KR2009]D. Kawrykow and M. P. Robillard, “Improving API usage through automatic detection of redundant code,” in Proceedings of the 2009 IEEE/ACM International Conference on Automated Software Engineering, 2009, pp. 111–122.
[KRW2011]B. P. Knijnenburg, N. J. M. Reijmer, and M. C. Willemsen, “Each to his own: how different users call for different interaction methods in recommender systems,” in Proceedings of the 5th ACM Conference on Recommender Systems, 2011, pp. 141–148.
[LM2010]M. Lipczak and E. Milios, “Learning in efficient tag recommendation,” in Proceedings of the 4th ACM Conference on Recommender Systems, 2010, pp. 167–174.
[LZ2005]Z. Li and Y. Zhou, “PR-Miner: automatically extracting implicit programming rules and detecting violations in large software code,” in Proceedings of the 10th European Software Engineering Conference held jointly with the 13th ACM SIGSOFT International Symposium on the Foundations of Software Engineering, 2005, pp. 306–315.
[MH2002]A. Mockus and J. D. Herbsleb, “Expertise browser: a quantitative approach to identifying expertise,” in Proceedings of the 24th ACM/IEEE International Conference on Software Engineering, 2002, pp. 503–512.
[MXB2005]D. Mandelin, L. Xu, R. Bodík, and D. Kimelman, “Jungloid mining: helping to navigate the API jungle,” in Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2005, pp. 48–61.
[PG2006]C. Parnin and C. Görg, “Building usage contexts during program comprehension,” in Proceedings of the 14th IEEE International Conference on Program Comprehension, 2006, pp. 13–22.
[SC2006]N. Sahavechaphan and K. Claypool, “XSnippet: mining for sample code,” in Proceedings of the 21st ACM SIGPLAN Conference on Object-oriented Programming Systems, Languages, and Applications, 2006, pp. 413–430.
[SS2011]E. I. Sparling and S. Sen, “Rating: how difficult is it?,” in Proceedings of the 5th ACM Conference on Recommender Systems, 2011, pp. 149–156.
[SZW2007]S. Kim, T. Zimmermann, E. J. Whitehead, and A. Zeller, “Predicting faults from cached history,” in Proceedings of the 29th ACM/IEEE International Conference on Software Engineering, 2007, pp. 489–498.
[TDH2005]J. Teevan, S. T. Dumais, and E. Horvitz, “Personalizing search via automated analysis of interests and activities,” in Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2005, pp. 449–456.
[VSR2009]J. Vig, S. Sen, and J. Riedl, “Tagsplanations: explaining recommendations using tags,” in Proceedings of the 14th International Conference on Intelligent User Interfaces, 2009, pp. 47–56.
[YF2002]Y. Ye and G. Fischer, “Supporting reuse by delivering task-relevant and personalized information,” in Proceedings of the 24th ACM/IEEE International Conference on Software Engineering, 2002, pp. 513–523.
[ZWD2004]T. Zimmermann, P. Weissgerber, S. Diehl, and A. Zeller, “Mining version histories to guide software changes,” in Proceedings of the 26th ACM/IEEE International Conference on Software Engineering, 2004, pp. 563–572.
[ZZX2009]H. Zhong, L. Zhang, T. Xie, and H. Mei, “Inferring resource specifications from natural language API documentation,” in Proceedings of the 2009 IEEE/ACM International Conference on Automated Software Engineering, 2009, pp. 307–318.

Acknowledgements

This course draws inspiration from many sources, and in particular: discussions on RSSEs with the co-organizers of the RSSE workshop (Walid Maalej, Rob Walker, and Tom Zimmermann), joint work on API property inference with Mira Mezini and Eric Bodden at TU Darmstadt, Ahmed Hassan's course on Mining Software Engineering Data, and the exciting work of my graduate students.