School of Computer Science

Computer Science COMP 420 (Fall term)
Secondary Storage Algorithms and Data Structures

"Secondary Storage Algorithms and Data Structures", COMP 420, introduces data structures and algorithms for data on secondary storage, notably queries and processing. The course looks at sequential files, tree-structured files, and direct-access files and at algorithms for applications requiring various levels of volatility, symmetry, and activity.

We also examine file access across the Internet, using sockets and TCP/IP.

File and socket programming is in Java. (The class package documentation is especially useful.)

Based on the categories covered, we discuss design techniques for specific applications (which have in the past included student records and employment databases, natural language retrieval, astronomical queries, musicology queries, air traffic control, restaurant guide, city property registry, world geographical database of towns, personnel database, hospital ward monitor, and a medical insurance database).

The course also briefly introduces databases, using McGill's advanced relational database system, relix.

Fundamental to this course is that secondary storage is memory organized differently from main memory (RAM), so that data structures, algorithms, complexity considerations and languages must all be rebuilt differently from the familiar computer science for RAM.

COMP 420                        Secondary Storage Algorithms and Data Structures

                            COURSE SUMMARY


Week            Topic                                                     Asst.
       
1               Overview of Aldat [PDF slides 1797K]
                Java classes and sequential I/O                            1
                Java and UNIX
                Files, records, blocks, RAM and secondary storage
		 [Slides ps 291K pdf 214K] [Exercises ps 133K pdf 61K]
		 [Text ps 303K pdf 203K]
2               Sequential Files:
		 [Slides ps 109K pdf 68K] [Exercises ps 124K pdf 52K]
		 [Text ps 150K pdf 133K]
                   searching                                               2
                   merging                                                 3
                   sorting
                Random access in Java                                      4
3               Logarithmic Files:
		 [Slides ps 314K pdf 245K] [Exercises ps 206K pdf 102K]
		 [Text ps 530K pdf 302K]
                   B-trees: searching, inserting, splitting, overflows     5
                   Volatility, activity
                   Tries: text; variable-resolution [Notes ps 340K pdf 202K]
                   (Supplementary notes: Building and Searching Full Tries
                    Slides on the pointerless representation for tries)
                   Symmetry: quad trie, kd-tree, kd-trie, z-order          6
5               Direct Files:
		 [Slides ps 499K pdf 674K] [Exercises ps 122K pdf 125K]
		 [Text ps 444K pdf 302K]
                   Hashing: division/remainder, multiplicative
                            collision resolution - linear probing,         7
                                                   separate chaining
                   (Supplementary notes: Hash Delete with Linear Probing)
                   Volatility: virtual hashing, linear hashing
                   Activity: tidying, distributions
                   Symmetry: multipaging - static and dynamic
                   (Supplementary notes: Multipage Search and Insert)
7               Cost Analysis
                   Usage distributions: sequential/direct breakeven
                   Probe and load factors: B-trees and hashing
                   Hashing overflows - theory and measurement              8
                   (Supplementary notes: Cost Analysis for Trees)
                File Design [Exercises ps 179K pdf 74K]
                   Using  Access Categories, Activity, Volatility, Symmetry
                   and Cost Analysis					   9
                   (Supplementary notes: Design Problem and Answers)
8               Relational Databases:
		 [Slides ps 191K pdf 261K] [Exercises ps 42K pdf 48K]
                   Relations
                   Relational Algebra 
                   relix                                                  10
9               Domain Algebra                                         11
                   Misc:
                     File support for concurrency [Slides ps 136K pdf 101K];
                     High dimensionality [Slides ps 139K pdf 101K]
                     Trie Join [PDF slides 70K]

The links from this web page constitute the text for the course. They are reasonably complete, but some omissions and errors have been left deliberately, and will be addressed in the lectures.
Note on plagiarism

Computer Science COMP 420 (Fall term) Secondary Storage Algorithms and Data Structures

Computer Science COMP 420 (Fall term)
Secondary Storage Algorithms and Data Structures