PRISM: Rethinking the RDMA Interface for Distributed Systems

Matthew Burke, Sowmya Dharanipragada, Shannon Joyner, Adriana Szekeres, Jacob Nelson, Irene Zhang, and Dan R. K. Ports. 2021. PRISM: Rethinking the RDMA Interface for Distributed Systems. In ACM SIGOPS 28th Symposium on Operating Systems Principles (SOSP ’21), October 26–29, 2021, Virtual Event, Germany. ACM, New York, NY, USA, 15 pages. https://doi.org/10.1145/3477132.3483587

The past few years have seen a continual increase in network bandwidth, while CPU speeds have begun to stagnate as Moore’s law slows. It has therefore become critical to reduce the CPU's involvement in networking. RDMA achieves this by providing a standard accelerated interface for accessing remote memory directly over the network. Following these hardware trends, recent years have seen a plethora of distributed systems redesigned to use RDMA. These include key-value stores, distributed transactions, and replicated storage systems.

The RDMA interface has two kinds of ops:

They both come with their own tradeoffs:

Motivating Example

Many implementations of remote data structures use indirection. For example, the hash tables used in the Pilaf (ATC’13) and FaRM (NSDI’14) key-value stores, two state-of-the-art RDMA systems, are hash-indexed data structures that hold pointers to the actual values stored in the key-value store. A get op could be implemented using RDMA as follows:

Additional network roundtrips can make one-sided implementations slower. What can we do about this?
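The cost of indirection described above can be sketched with a toy in-process model. Everything here is an assumption made for illustration: `RemoteMemory`, `rdma_read`, `server_put`, the address layout, and the CRC-based bucket function are hypothetical stand-ins, not the Pilaf or FaRM code. The point is only that a one-sided get over a pointer-based index needs one read for the pointer and a second read for the value.

```python
import zlib

NUM_BUCKETS = 8      # hypothetical index size
INDEX_BASE = 0       # hypothetical address of the first index slot
HEAP_BASE = 1000     # hypothetical address where values are stored


class RemoteMemory:
    """Toy stand-in for a server's RDMA-registered memory."""

    def __init__(self):
        self.cells = {}        # address -> stored pointer or value
        self.round_trips = 0   # one-sided reads issued by the client

    def rdma_read(self, addr):
        # Each one-sided read costs the client one network round trip.
        self.round_trips += 1
        return self.cells.get(addr)


def slot_for(key):
    # Deterministic bucket choice; real systems use their own hash functions.
    return INDEX_BASE + zlib.crc32(key.encode()) % NUM_BUCKETS


def server_put(mem, key, value, value_addr):
    # Server-side setup: store the value, then point the index slot at it.
    mem.cells[value_addr] = (key, value)
    mem.cells[slot_for(key)] = value_addr


def get(mem, key):
    ptr = mem.rdma_read(slot_for(key))       # round trip 1: fetch the pointer
    if ptr is None:
        return None
    stored_key, value = mem.rdma_read(ptr)   # round trip 2: fetch the value
    return value if stored_key == key else None


mem = RemoteMemory()
server_put(mem, "alice", b"v1", HEAP_BASE)
result = get(mem, "alice")
```

In this model a single successful get leaves `mem.round_trips` at 2, which is the overhead the question above is asking about.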

PRISM: Primitives for Remote Interaction with System Memory

This work considers standard extensions to the RDMA interface. How to design the API?

They present PRISM: Primitives for Remote Interaction with System Memory based on:

The PRISM API: list of proposed extensions to the traditional RDMA interface

These were chosen based on their applicability to a large class of systems as well as their feasibility to implement across different platforms.

Indirect Reads with PRISM

Going back to the example, we can issue an indirect read PRISM call. The target address specified by the indirect read is interpreted as an address of a pointer. This pointer is then dereferenced to obtain the value that is returned to the client.
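The semantics of an indirect read can be sketched in the same toy style. `PrismMemory`, `read`, and `indirect_read` are hypothetical names used only for illustration and do not reflect the actual PRISM interface; the sketch assumes the dereference happens on the server side, so the client pays for a single round trip.

```python
class PrismMemory:
    """Toy model of remote memory offering a plain and an indirect read."""

    def __init__(self):
        self.cells = {}        # address -> pointer or value
        self.round_trips = 0   # operations issued by the client

    def read(self, addr):
        # Plain one-sided read: returns whatever is stored at addr.
        self.round_trips += 1
        return self.cells[addr]

    def indirect_read(self, addr):
        # The target address holds a pointer; it is dereferenced remotely
        # and the pointed-to value is returned in the same round trip.
        self.round_trips += 1
        pointer = self.cells[addr]
        return self.cells[pointer]


mem = PrismMemory()
mem.cells[0] = 1000          # slot 0 holds a pointer
mem.cells[1000] = b"value"   # the pointer's target holds the value

# Without indirection support: two dependent reads.
two_step = mem.read(mem.read(0))
trips_plain = mem.round_trips

# With an indirect read: one operation.
mem.round_trips = 0
one_step = mem.indirect_read(0)
trips_indirect = mem.round_trips
```

Both paths return the same value; the difference is that the plain path issues two dependent reads while the indirect path issues one.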

More complicated patterns of indirection can be implemented using operation chaining.

Prototyped PRISM primitives on two different network stacks:

Three case studies to demonstrate the benefits of PRISM:

Each of these apps is widely used in practice. The proposed extensions allow us to efficiently implement all of these apps entirely in terms of one-sided PRISM primitives.

Two kinds of benefits to using PRISM:

Transaction Processing with PRISM - PRISM-TX

FaRM is a state-of-the-art transaction processing system over RDMA. We consider a storage system where data is partitioned among multiple servers and clients group their ops into transactions. During the execution phase, clients read and write data on different servers. Reads use one-sided RDMA, and writes are buffered locally during the execution phase. After all the operations finish executing, the transaction enters the commit phase. FaRM uses a variant of the two-phase commit protocol:

In FaRM, the Update phase & Lock phase have to use RPCs but the read set can be validated using one-sided RDMA reads.
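The lock, validate, and update steps mentioned above follow the general shape of optimistic concurrency control. The sketch below is a simplified single-process model under stated assumptions: the `Record` class, the `commit` function, and per-record version counters are hypothetical, and replication, logging, and networking are omitted entirely. It is not FaRM's implementation.

```python
class Record:
    """A stored object with a version counter and a lock flag (hypothetical)."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self.locked = False


def commit(store, read_set, write_set):
    """read_set maps key -> version seen during execution;
    write_set maps key -> new value buffered during execution."""
    locked = []

    def abort():
        for key in locked:
            store[key].locked = False
        return False

    # Lock phase: acquire every record in the write set, or abort.
    for key in sorted(write_set):
        rec = store[key]
        if rec.locked:
            return abort()
        rec.locked = True
        locked.append(key)

    # Validate phase: every record read must be unchanged and not
    # locked by another transaction.
    for key, seen_version in read_set.items():
        rec = store[key]
        if rec.version != seen_version:
            return abort()
        if rec.locked and key not in write_set:
            return abort()

    # Update phase: install buffered writes, bump versions, release locks.
    for key, value in write_set.items():
        rec = store[key]
        rec.value = value
        rec.version += 1
        rec.locked = False
    return True


store = {"x": Record(1), "y": Record(2)}
ok1 = commit(store, {"x": 0}, {"y": 20})   # reads x at version 0, writes y
ok2 = commit(store, {"y": 0}, {"x": 10})   # read of y is now stale
```

In this model the first transaction commits and bumps the version of `y`, so the second transaction, which read `y` at version 0, fails validation and releases its lock on `x`.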

Can we build something better than FaRM?

The execution phase of FaRM needs no further improvement since all communication that occurs between client & servers here uses only one-sided ops. There is scope for improvement in the commit phase. We can replace the RPCs in the Lock & Update phases with the PRISM API. But we need to make some modifications first.

With these changes, all the OCC checks in the commit phase can use PRISM’s Enhanced CAS primitives with no additional CPU involvement.
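To illustrate how a richer compare-and-swap could express such a check in one operation, here is a minimal sketch. It assumes an enhanced CAS that accepts bitmasks and a comparison operator; the function name `enhanced_cas`, its parameters, and the 64-bit word layout (version in the high 32 bits, payload in the low 32 bits) are all hypothetical and are not taken from the PRISM API.

```python
import operator

VERSION_MASK = 0xFFFFFFFF00000000   # hypothetical: high 32 bits hold a version
FULL_MASK = 0xFFFFFFFFFFFFFFFF      # whole 64-bit word


def enhanced_cas(memory, addr, compare_value, swap_value,
                 compare_mask, swap_mask, op=operator.eq):
    """Swap the masked bits at addr if op(stored, compare_value) holds
    on the bits selected by compare_mask. Returns (succeeded, old_word)."""
    old = memory[addr]
    if not op(old & compare_mask, compare_value & compare_mask):
        return False, old
    memory[addr] = (old & ~swap_mask & FULL_MASK) | (swap_value & swap_mask)
    return True, old


def pack(version, payload):
    return (version << 32) | payload


memory = {0: pack(5, 111)}

# Install a newer word only if the stored version is strictly smaller.
newer = pack(7, 222)
ok_new, _ = enhanced_cas(memory, 0, newer, newer,
                         VERSION_MASK, FULL_MASK, operator.lt)

# A stale writer with version 6 fails the same check.
stale = pack(6, 333)
ok_stale, _ = enhanced_cas(memory, 0, stale, stale,
                           VERSION_MASK, FULL_MASK, operator.lt)
```

The comparison and the write happen as one step against the stored word, which is the property a version or lock check needs in order to run without involving the server's CPU.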

The primitives enable a new class of one-sided OCC mechanisms.

They compared PRISM-TX with their own implementation of the FaRM protocol, running FaRM over both hardware and software RDMA implementations. PRISM-TX outperforms FaRM in both throughput and latency, owing to the lower message complexity enabled by PRISM’s primitives: its latency is about 5 microseconds lower than FaRM's, and it reaches about 1M more transactions per second before saturating the network.

PRISM significantly expands the design space for RDMA-based applications by offering a middle ground between the restrictive RDMA read/write interface and the full generality of RPC communication.

Strengths

Weaknesses

Future Work