Network Requirements for Resource Disaggregation

Peter X. Gao, Akshay Narayan, Sagar Karandikar, Joao Carreira, Sangjin Han, Rachit Agarwal, Sylvia Ratnasamy, and Scott Shenker. 2016. Network requirements for resource disaggregation. In Proceedings of the 12th USENIX conference on Operating Systems Design and Implementation (OSDI’16). USENIX Association, USA, 249–264.

Recent industry trends suggest a paradigm shift to a disaggregated datacenter (DDC) architecture containing a pool of resources, each built as a standalone resource blade and interconnected using a network fabric. This work derives the minimum latency and bandwidth requirements that the network in DDCs must provide to avoid degrading application-level performance, and explores the feasibility of meeting these requirements with existing system designs and commodity networks, in the context of 10 workloads spanning 7 open-source systems:

The workloads can be classified into two classes based on their performance characteristics.

The work makes the following assumptions:

The design knobs in the study are:

For I/O traffic to storage devices, the current latency and bandwidth requirements allow it to be consolidated into the network fabric with low performance impact, assuming a 40 Gbps or 100 Gbps network. The dominant impact on application performance comes from CPU-memory disaggregation.

The work compares the performance of applications in a server-centric architecture to their performance in the disaggregated context. The applications run unmodified, which represents the worst case in terms of potential degradation compared to rewriting them for disaggregation. To emulate remote memory accesses, the authors implement a special swap device backed by some amount of physical memory rather than disk. It intercepts all page faults and injects artificial delays to emulate network round-trip latency and bandwidth for each paging operation. This emulation does not model queueing delays.
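The delay injected per paging operation can be sketched as a round-trip latency plus the time needed to push one page through the link. This is a minimal illustration, not the authors' implementation: the function name, the 4 KB page size, and the example RTT and bandwidth values are assumptions chosen for the arithmetic.

```python
# Assumed page size (4 KB is typical on Linux/x86; not stated in these notes).
PAGE_SIZE_BYTES = 4096


def injected_delay_us(rtt_us: float, bandwidth_gbps: float,
                      page_bytes: int = PAGE_SIZE_BYTES) -> float:
    """Delay to inject for one paging operation, in microseconds.

    Modeled as network round-trip latency plus the serialization time of
    one page. Queueing delay is deliberately left out, matching the
    emulation described above.
    """
    bits = page_bytes * 8
    # 1 Gbps = 1e9 bits/s = 1e3 bits per microsecond.
    transfer_us = bits / (bandwidth_gbps * 1e3)
    return rtt_us + transfer_us


# Illustrative values only: a 5 us RTT on 40 Gbps and 100 Gbps links.
delay_40g = injected_delay_us(rtt_us=5.0, bandwidth_gbps=40.0)
delay_100g = injected_delay_us(rtt_us=5.0, bandwidth_gbps=100.0)
```

Under these assumed numbers the page transfer itself costs under a microsecond at either link speed, so the round-trip latency term dominates the injected delay.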

The work shows the following:

The work then evaluates the impact of queueing delay. The authors collect a remote memory access trace from their instrumentation tool, a network access trace using tcpdump, and a disk access trace using blktrace. They then translate these traces into network flows in their simulated disaggregated cluster. They consider five protocols and use simulation to evaluate their network-layer performance, measured as flow completion time (FCT), under the generated traffic workloads:
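The trace-to-flow translation can be pictured with a small sketch. The record layout, the `Flow` type, and the endpoint names below are hypothetical, and the FCT computed here is only the idle-path lower bound (serialization plus one-way delay). The protocol simulation described above would add queueing on top of it.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    src: str
    dst: str
    size_bytes: int
    start_us: float


def trace_to_flows(records):
    """Turn trace records into flows.

    Each record is assumed to be a (timestamp_us, src, dst, size_bytes)
    tuple; this format is hypothetical, not the tools' native output.
    """
    return [Flow(src, dst, size, ts) for ts, src, dst, size in records]


def ideal_fct_us(flow: Flow, bandwidth_gbps: float,
                 one_way_delay_us: float) -> float:
    """FCT on an otherwise idle path: serialization time plus one-way delay."""
    # 1 Gbps = 1e3 bits per microsecond.
    return flow.size_bytes * 8 / (bandwidth_gbps * 1e3) + one_way_delay_us


# Two illustrative page-sized transfers from CPU blades to a memory blade.
records = [(0.0, "cpu0", "mem3", 4096), (1.5, "cpu1", "mem3", 4096)]
flows = trace_to_flows(records)
fcts = [ideal_fct_us(f, bandwidth_gbps=40.0, one_way_delay_us=1.0) for f in flows]
```

Comparing a simulated FCT against this idle-path value is one way to see how much of a flow's completion time comes from queueing rather than from the link itself.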

They find the following:

The authors also built a kernel-space RDMA block device driver that serves as a swap device, so the local CPU can swap to remote memory instead of disk.

Strengths

Weaknesses

Future Work

There are numerous directions for future work: