P4: Programming Independent Packet Processors

Pat Bosshart, Dan Daly, Glen Gibb, Martin Izzard, Nick McKeown, Jennifer Rexford, Cole Schlesinger, Dan Talayco, Amin Vahdat, George Varghese, and David Walker. 2014. P4: programming protocol-independent packet processors. SIGCOMM Comput. Commun. Rev. 44, 3 (July 2014), 87–95. DOI:https://doi.org/10.1145/2656877.2656890

The promise of SDN is that a single control place can directly control a whole network of switches. OpenFlow (OF) supports this goal by providing a single, vendor-agnostic API. But the control plane cannot express how the packets howd be processed to best meet the needs of control apps. OF v1.0 (Dec 2009) started simple with abstractions of a single table of rules that could match packets on 12 header field, e.g. MAC addresses, IP addresses, protocol, TCP/UDP port numbers, etc. OF v1.4 (Oct 2013) specification requires 41 header fields. Datacenter network operators want to apply new forms of packet encapsulation (e.g. NVGRE, VXLAN, STT) requiring more header fields. This requires repeatedly extending the OF specification. Instead P4 enables supporting flexible mechanisms for parsing packets & matching header fields, allowing controller apps to leverage these capabilities.

The challenge is to find a sweet spot that balances:

need for expressiveness
- e.g. Click but it is difficult to infer dependencies & map to h/w
ease of implementation across wide range of h/w & s/w switches
- e.g. OpenFlow 1.0 but it is impossible to reconfigure protocol processing

P4 has three goals:

reconfigurability in the field
- controller can re-define packet parsing & processing in the field
protocol independence
- switch not tied to specific packet formats
- controller can specify packet parser for extracting header fields with specific names & types
- controller can specify a collection of typed match+action tables that process these headers
target independence
- a compiler should take switch’s capabilities into account when turning a target-independent P4 description into a target-dependent microcode program.

The work first describes an abstract forwarding model.

Switches forward packets via a programmable parser followed by multiple stages of match+action, arranged in series, parallel, or a combination of both.

OF assumes a fixed parser. P4 supports a programmable parser to allow new headers to be defined.
OF assumes match+action stages are in series. In P4, they can be in parallel or in series.
P4 assumes actions are composed of protocol-independent primitives supported by the switch.

This generalizes how packets are processed on different forwarding devices - Ethernet switches, load-balancers, routers, etc. & by different technologies - fixed-function switch ASICs, NPUs, reconfigurable switches, s/w switches, FPGAs. This allows using a common language, P4, to represent how packets are processed.

The forwarding model is controlled by 2 ops:

Configure
- program the parser
- set the order of match+action stages
- specify the header fields processed by each stage
- determines which protocols are supported
Populate:
- add/remove entries to the match+action tables specified during configuration
- determines the policy applied to the packets

Arriving packets are first handled by the parser. The packet body is buffered separately, unavailable for matching. The parser extracts the fields from the header. The extracted header fields are passed to match+action tables (ingress & egress).

ingress processing: forwarded, replicated (for multicast, span, or to control plane), dropped, or trigger flow control
egress processing: per-instance modifications to header - e.g. for multicast copies

A packet processing language must allow the programmer to express any serial dependencies between header fields. Dependencies determine which tables can be executed in parallel. Dependencies can be identified by analyzing the Table Dependency Graphs (TDG) which describe the field inputs, actions, and control flow between tables.

They propose a 2-step compilation process:

programmers express packet processing programs using an imperative language representing the control flow (P4)
a compiler translates the P4 representation to TDGs to facilitate dependency analysis and then maps the TDG to a specific swtich target using a target specific back-end.

The key concepts in P4 are: headers, parsers, tables, actions, control programs.

They explore P4 by examining a simple example. They consider a L2 network deployment with ToR switches at the edge connected by a two-tier core. The number of end-hosts is growing and the core L2 tables are overflowing. P4 lets us express a custom solution with minimal changes to the network architecture. The routes through the core are encoded by a 32-bit tag.

Header: First, we add a new header, mTag with fields 8-bit up1, up2, down1, down2, and 16-bit ethertype.
Packet Parser: P4 assumes the underlying switch can implement a state machine that traverses packet headers from start to finish, extracting field values as it goes. Parsing starts in the start state and proceeds until an explicit stop state is reached or an unhandled case is encountered (error). The extracted headers are forwarded to match+action processing in the next-half of the switch pipeline.
Table Specification: The programmer then describes how the defined header fields are matched in match+action stages (e.g. exact, prefix, wildcards, etc.) and what actions should be performed when a match occurs.
Action Specification: P4 defines a collection of primitive actions from which more complicated actions are built. The programmer specifies the complex actions made up of primitive actions such as set_field, copy_field, add_header, remove_header, increment, checksum. P4 assumes parallel execution of primitives within an action function.
Control Program: The programmer then specifies the control flow from one table to another via an imperative representation using functions, conditionals and table references.

Strengths

This work unlocks the potential of SDNs further by proposing a configuration language & compilers that generate low-level target-specific configurations including h/w optimizations to provide greater expressivity instead of repeatedly extending the OpenFlow specification to meet the needs of control plane applications.
The work stresses on resolving dependencies between match+action tables to parallelize the execution as much as possible on the switch h/w. P4 is designed to make it easy to translate an imperative control program into a table dependency graph. The design also lets the compiler optimize for the specific target: support RAM & TCAM, parallel tables, few tables, final writes.

Weaknesses

SDN is successful because one can control a whole network of switches, which led to cost savings as well (by simplifying the switch functionality to only provide forwarding), even though one had to re-write interfaces for applications. Even though P4 supports the goal of SDN offering more flexibility to the controller applications, P4 would have limited or no benefits with low-cost targets like fixed-function switches. The potential targets of P4 are expensive.

Future Work

The work emphasizes that P4 is a first step towards more expressivity in the data plane. In this proposal, several aspects of a switch are undefined: e.g. congestion control primitives, queueing disciplines, traffic monitoring. Each of these are areas of future work.