Optimizing data placement for distributed computation

Lukasz Golab - University of Waterloo

Feb. 28, 2014, 1 p.m. to 2 p.m.

MC103


I will discuss the following problem: given a set of data items, a set of tasks that reference these data items, and a set of servers with finite storage and processing capacities, allocate the data items to the servers so as to minimize the amount of data that must be transferred among servers during task execution. This problem arises in many practical settings, including cloud databases. I will show that it can be reduced to the well-studied graph partitioning problem, which is NP-hard but admits efficient approximation algorithms. I will also discuss how to handle load balancing and data replication. This is joint work with Marios Hadjieleftheriou (AT&T), Howard Karloff (Yahoo!) and Barna Saha (AT&T).
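To make the optimization objective in the abstract concrete, here is a minimal sketch that scores a placement and finds the best one on a toy instance. The cost model is an assumption of ours, not the talk's: each task runs on the server that already stores the largest share (by size) of its data items, every other item it references is shipped to that server, and processing capacity is ignored (only storage capacity is enforced). The function names `transfer_cost`, `fits` and `best_placement` are hypothetical. The exhaustive search is for illustration only; it is not the graph-partitioning approximation presented in the talk.

```python
from itertools import product

def transfer_cost(placement, sizes, tasks):
    """Data moved between servers, under an ASSUMED model: each task runs on
    the server already holding the most of its data (by size), and every
    other item the task references is shipped to that server."""
    total = 0
    for task in tasks:
        if not task:
            continue
        local = {}  # server -> size of this task's data stored there
        for item in task:
            server = placement[item]
            local[server] = local.get(server, 0) + sizes[item]
        needed = sum(sizes[item] for item in task)
        total += needed - max(local.values())
    return total

def fits(placement, sizes, capacity):
    """True if no server stores more than its storage capacity."""
    used = {}
    for item, server in placement.items():
        used[server] = used.get(server, 0) + sizes[item]
    return all(load <= capacity[server] for server, load in used.items())

def best_placement(sizes, tasks, capacity):
    """Exhaustive search over all placements; exponential, toy sizes only."""
    items = sorted(sizes)
    servers = sorted(capacity)
    best, best_cost = None, None
    for choice in product(servers, repeat=len(items)):
        placement = dict(zip(items, choice))
        if not fits(placement, sizes, capacity):
            continue
        cost = transfer_cost(placement, sizes, tasks)
        if best_cost is None or cost < best_cost:
            best, best_cost = placement, cost
    return best, best_cost

if __name__ == "__main__":
    sizes = {"a": 4, "b": 3, "c": 2, "d": 1}
    tasks = [{"a", "b"}, {"c", "d"}, {"b", "c"}]
    capacity = {"s1": 6, "s2": 6}
    print(best_placement(sizes, tasks, capacity))
    # prints ({'a': 's1', 'b': 's2', 'c': 's2', 'd': 's2'}, 3)
```

In this toy instance, co-locating a and b would exceed a server's storage capacity of 6, so the best feasible placement isolates a, and only task {a, b} spans servers (cost 7 - 4 = 3). Without capacity limits, placing everything on one server would trivially cost 0; the capacities are what make the problem hard. The search examines |servers|^|items| placements, which grows exponentially and is impractical beyond toy sizes.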

Lukasz Golab is a faculty member in the Management Sciences department at the University of Waterloo. Previously he was a Senior Member of Research Staff at AT&T Labs. He obtained a BSc from the University of Toronto in 2001 and a PhD from the University of Waterloo in 2006. His research interests are in database systems, data mining and energy data management.