In-Network Support for microsecond-scale Remote Procedure Calls

Marios Kogias - MSR Cambridge

Sept. 24, 2021, 2:30 p.m. - Sept. 24, 2021, 3:30 p.m.

Zoom (see link below)

Hosted by: Oana Balmau


Abstract: Online services play a major role in our everyday lives for communication, entertainment, socializing, e-commerce, etc. These services run inside datacenters under strict tail-latency service level objectives in order to remain interactive. The emergence of new hardware for I/O has enabled microsecond-scale datacenter communications that challenge the efficiency of existing operating systems and network mechanisms. In addition, new in-network programmable devices are starting to be deployed in datacenters, introducing a new computing paradigm that shifts functionality traditionally performed at the endpoints into the network.

In this talk, I am going to focus on network support for microsecond-scale Remote Procedure Calls (RPCs) and explore the possibilities of pushing functionality into the network by leveraging in-network compute. In the first part of the talk, I will present R2P2 [ATC 2019]. R2P2 is a transport protocol specifically designed for datacenter RPCs that exposes the RPC abstraction to the endpoints and the network, making RPCs first-class datacenter citizens. R2P2 is specifically designed for in-network policy enforcement. I will show how using R2P2 allowed us to offload RPC scheduling to programmable switches that can schedule requests directly on individual CPU cores. In the second part of the talk, I will introduce HovercRaft [EuroSys 2020]. HovercRaft proposes a new way to build fault-tolerant services by implementing fault tolerance at the RPC layer. HovercRaft extends the Raft protocol and carefully eliminates CPU and I/O bottlenecks while using in-network compute for fan-out and fan-in management. As a result, HovercRaft manages to increase both the resilience and the performance of general state-machine replication by adding nodes to the cluster.

Speaker bio: Marios Kogias is currently a researcher at MSR Cambridge. He is joining Imperial College London as a Lecturer (Assistant Professor) in summer 2022. He graduated from EPFL in August 2020. His main research focus is at the intersection of operating systems and networking in the datacenter. He works on building and understanding systems with strict tail-latency SLOs by leveraging emerging hardware. He was an IBM Ph.D. Fellow and won the best student paper award at EuroSys 2020 and the honorable mention for the Roger Needham Ph.D. award in 2021. Before joining EPFL, he received his undergraduate degree from the National Technical University of Athens.

Zoom link: https://mcgill.zoom.us/j/84280715850 (Zoom login required)

Reception after the talk in Gather Town: https://gather.town/app/tYHHMh7tPcPw9037/reception