Data center networks are critical infrastructure for cloud computing, and there is a long line of work on topology design.
Previous designs focused on achieving high performance at low cost, but paid little attention to designing for manageability. This work considers the management complexity of data center topology design.
How does the complexity of managing a data center depend on its topology? This work focuses on lifecycle management of data center topologies: the process of building a network, physically deploying it on the data center floor, and expanding it over several years so that it remains available to a constantly growing set of services.
Lifecycle management complexity is an important but largely overlooked aspect of topology design.
To understand lifecycle management, several challenges must be addressed.
Lifecycle management has many dimensions: packaging, cable wiring, placement, and cable rewiring during expansion. All of these must respect constraints imposed by data center components such as switches, racks, patch panels, and cable trays.
They derive metrics that capture the complexity of deployment and expansion of topologies.
Packaging Metric: Number of switches
Packaging is the first step in deployment: carefully arranging servers and switches into data center racks. The number of switches directly determines packaging complexity, so it is used as the metric.
Wiring Complexity Metric: Number of cable bundle types (bundle capacity, bundle length)
After servers and switches are packed into racks, they are connected. Switches within the same rack can be connected with short, cheap cables. Switches in different racks, possibly far apart on the data center floor, are connected with inter-rack links routed over cable trays above the racks. Most of the wiring complexity comes from these inter-rack links.
At data center scale there are too many fibers to handle individually, so cable bundling is used. A cable bundle is a fixed number of identical-length fibers between two clusters of network devices, characterized by its capacity (number of fibers) and its length.
As the number of bundle types grows, the complexity of manufacturing, packaging, and routing cables grows with it, so the number of bundle types is used as a metric for wiring complexity.
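To make the metric concrete, here is a minimal sketch in Python of how one might count bundle types from a list of inter-rack fibers. The `links` and `rack_pos` inputs and the Manhattan tray-distance assumption are illustrative, not the paper's exact procedure.

```python
from collections import defaultdict

def count_bundle_types(links, rack_pos):
    """Group inter-rack fibers into bundles by (rack, rack) pair, then count
    distinct (capacity, length) bundle types.

    links: iterable of (rack_a, rack_b) pairs, one per fiber (hypothetical input).
    rack_pos: dict mapping rack id -> (x, y) floor coordinates (hypothetical).
    """
    bundles = defaultdict(int)                  # (rack_a, rack_b) -> number of fibers
    for a, b in links:
        bundles[tuple(sorted((a, b)))] += 1

    types = set()
    for (a, b), capacity in bundles.items():
        ax, ay = rack_pos[a]
        bx, by = rack_pos[b]
        length = abs(ax - bx) + abs(ay - by)    # Manhattan tray distance (assumption)
        types.add((capacity, length))
    return len(types)

# Toy example: three racks on a line, a few fibers between them.
links = [("r1", "r2")] * 4 + [("r2", "r3")] * 4 + [("r1", "r3")] * 2
rack_pos = {"r1": (0, 0), "r2": (1, 0), "r3": (2, 0)}
print(count_bundle_types(links, rack_pos))      # -> 2 bundle types
```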
Wiring Complexity Metric: Number of patch panels
A patch panel is a device that aggregates fibers: fibers from different bundles are manually connected at the panel. The number of patch panels used is therefore another measure of wiring complexity.
Expansion Complexity Metric: Number of expansion steps
Topology expansion leads to a capacity drop. To maintain a given residual capacity, expansion is conducted in multiple steps. The number of expansion steps is therefore a natural metric for expansion complexity.
Expansion Complexity Metric: Number of rewired links per patch panel rack
It is hard to move existing links that have already been routed over cable trays. Patch panels solve this: all existing bundles terminate at patch panels, and during expansion new links are connected to the existing panels, so rewiring can be completed entirely at the panels. This rewiring is manual and dominates the time taken by each expansion step, so the number of rewired links per patch panel rack is used as the metric for the complexity of a single expansion step.
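A toy sketch of the step-count metric under a simplified model, where every drained link is assumed to cost live capacity equally; this is not the paper's expansion algorithm.

```python
import math

def expansion_steps(links_to_rewire, total_edge_links, residual_capacity):
    """Toy estimate of the number of expansion steps.

    Each step may drain (and rewire) only as many edge links as keeps the
    live capacity at or above `residual_capacity` (a fraction, e.g. 0.75).
    """
    drainable_per_step = math.floor((1.0 - residual_capacity) * total_edge_links)
    if drainable_per_step == 0:
        raise ValueError("residual capacity target too tight to make progress")
    return math.ceil(links_to_rewire / drainable_per_step)

# Rewire 600 of 1000 edge links while keeping >= 75% capacity:
print(expansion_steps(600, 1000, 0.75))   # -> 3 steps of up to 250 links each
```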
The paper studies Clos, Jellyfish, and Xpander. Clos performs better on patch panel usage and bundling; Jellyfish performs better on switch count, rewired links per patch panel rack, and expansion complexity. No topology dominates on all the metrics.
Principle 1: Importance of topology regularity
Jellyfish is a random graph, which leads to non-uniform bundles between switch clusters. At large scale, Jellyfish has an order of magnitude more bundle types than Clos.
Principle 2: Importance of maximizing intra-rack links
In Clos, regular, densely connected portions of the topology can be packed into a single rack, turning many links into cheap, short, easy-to-manage intra-rack links. In Jellyfish, by contrast, most links are inter-rack, which leads to more patch panel usage and higher wiring complexity.
Principle 3: Importance of fat edge
A fat edge can significantly reduce the capacity drop during expansion. A switch at the network edge has two kinds of links: links to servers, called southbound links, and links to other switches, called northbound links. A thin edge has an equal number of northbound and southbound links; a fat edge has more northbound links than southbound links.
Suppose we want to keep residual capacity at 75% during expansion. Since rewiring causes a capacity drop, traffic must be drained before rewiring. With a thin edge, maintaining the residual capacity target means draining and rewiring at most 25% of the edge links, which already causes a 25% capacity drop. With a fat edge, northbound capacity exceeds the server-facing southbound capacity, so even draining 50% of the northbound links causes no capacity drop, and more links can be rewired in a single expansion step.
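The arithmetic behind this can be sketched as follows, assuming a single edge switch and equal link speeds; the numbers are illustrative.

```python
def capacity_drop(southbound, northbound, drained_northbound):
    """Fraction of server-facing (southbound) capacity lost when some
    northbound links at an edge switch are drained for rewiring.
    Assumes all links have equal speed (illustrative simplification)."""
    remaining_north = northbound - drained_northbound
    # Servers can only use as much uplink capacity as remains northbound.
    usable = min(southbound, remaining_north)
    return 1.0 - usable / southbound

# Thin edge: 8 southbound, 8 northbound. Draining 25% of uplinks costs 25% capacity.
print(capacity_drop(southbound=8, northbound=8, drained_northbound=2))   # 0.25

# Fat edge: 8 southbound, 16 northbound. Even draining 50% of uplinks costs nothing.
print(capacity_drop(southbound=8, northbound=16, drained_northbound=8))  # 0.0
```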
Jellyfish has a fat edge and therefore needs fewer expansion steps, while Clos has a thin edge and hence needs more.
FatClique is a topology with three levels of hierarchy. At the lowest level, a sub-block is a clique of switches, sized so that multiple sub-blocks can be packed into a single rack to maximize intra-rack links. At the second level, a block is a clique of sub-blocks. The whole network is a clique of blocks, with blocks made large enough to form uniform bundles between one another.
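To make the three-level clique structure concrete, here is a heavily simplified sketch using networkx. The real FatClique carefully sizes the number of links at each level to meet capacity and manageability constraints, whereas this sketch uses a single representative link per clique pair at the two upper levels.

```python
import itertools
import networkx as nx

def fatclique_like(num_blocks, subblocks_per_block, switches_per_subblock):
    """A much-simplified FatClique-style graph: cliques at three levels, with one
    representative link per pair at the upper levels (not the real link counts).
    Nodes are (block, sub-block, switch) tuples."""
    g = nx.Graph()

    # Level 1: each sub-block is a clique of switches.
    for b in range(num_blocks):
        for s in range(subblocks_per_block):
            nodes = [(b, s, i) for i in range(switches_per_subblock)]
            g.add_edges_from(itertools.combinations(nodes, 2))

    # Level 2: within each block, the sub-blocks form a clique.
    for b in range(num_blocks):
        for s1, s2 in itertools.combinations(range(subblocks_per_block), 2):
            g.add_edge((b, s1, 0), (b, s2, 0))

    # Level 3: the blocks themselves form a clique.
    for b1, b2 in itertools.combinations(range(num_blocks), 2):
        g.add_edge((b1, 0, 1), (b2, 0, 1))
    return g

g = fatclique_like(num_blocks=3, subblocks_per_block=3, switches_per_subblock=4)
print(g.number_of_nodes(), g.number_of_edges())   # 36 switches, 66 links
```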
By construction, FatClique satisfies the regularity principle. However, maximizing intra-rack links conflicts with achieving a fat edge: a fat edge requires decreasing the number of intra-rack links per switch in a sub-block. Achieving a fat edge also conflicts with minimizing switch count.
There are other constraints in topology design, such as providing the right amount of capacity, minimizing rack fragmentation, and minimizing overall cable length.
The solution is a constraint-based search. The whole design space can be described by five variables. An example constraint is imposing a fat edge at each switch: the number of northbound links must exceed the number of southbound links. This formulation is used to search for a topology that meets a given capacity target while optimizing the manageability objectives.
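A toy illustration of this kind of constraint-based search, reduced to two hypothetical design variables (southbound and northbound links per edge switch) and two constraints (port budget and fat edge); the paper's actual search spans five variables and more objectives.

```python
import itertools

def search(target_servers, radix=32):
    """Toy brute-force search minimizing switch count subject to a port budget,
    a fat-edge constraint, and a server-capacity target. All names and the
    two-variable design space are illustrative."""
    best = None
    for southbound, northbound in itertools.product(range(1, radix), repeat=2):
        if southbound + northbound > radix:
            continue                    # switch port budget
        if northbound <= southbound:
            continue                    # fat edge: more uplinks than server links
        switches = -(-target_servers // southbound)   # ceil division: enough server ports
        if best is None or switches < best[0]:
            best = (switches, southbound, northbound)
    return best

print(search(target_servers=10_000))    # -> (667, 15, 16) for a 32-port switch
```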
They designed highly optimized placement, expansion, and search algorithms for each topology family and evaluated the families at large scale. FatClique performs best on all deployment complexity metrics.
FatClique is also as good as expander topologies at expansion: it needs only 3 or 4 expansion steps even when the residual capacity requirement is tight, whereas Clos, when the residual capacity requirement is high, needs more than 20 steps to expand the whole topology. FatClique therefore enables higher availability.
In particular, at large scale FatClique uses 50% fewer switches and 33% fewer patch panels than Clos, has 23% lower cabling cost, and permits fast expansion while degrading network capacity only slightly (2.5-10%). Clos, in contrast, can take 5x longer to expand, and each Clos expansion step requires 30-50% more links to be rewired per patch panel rack.
This work is a first step toward designing for manageability, a critical dimension of data center topology design. The paper defines several metrics to quantify management complexity and uses the principles learned from existing topology families to design a new family that matches their performance while faring well on all the management complexity metrics.
The study developed several optimal algorithms for the competing topology families (Clos generation, Jellyfish placement, FatClique expansion, optimal expansion for Clos, and FatClique topology synthesis) and also quantified other properties of the topologies: the spectral gap, as an approximation of edge expansion, and path diversity, measured as the number of shortest paths between two ToR switches in different pods of Clos and, for the other topologies, the number of paths no longer than that. Jellyfish, Xpander, and FatClique have similar path diversity, higher than Clos, and have shorter paths than Clos.
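As a rough illustration of the spectral gap computation (this assumes a roughly regular graph and uses a random regular graph as a stand-in for Jellyfish; it is not the paper's evaluation code):

```python
import networkx as nx
import numpy as np

def spectral_gap(g):
    """Spectral gap of a (roughly regular) graph: the difference between the two
    largest adjacency eigenvalues. A larger gap indicates better edge expansion."""
    eigs = np.sort(np.linalg.eigvalsh(nx.to_numpy_array(g)))[::-1]
    return eigs[0] - eigs[1]

# Random regular graphs (a stand-in for Jellyfish) typically have a large gap.
jellyfish_like = nx.random_regular_graph(d=8, n=64, seed=1)
print(round(spectral_gap(jellyfish_like), 2))
```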
The theoretical metric used to quantify latency is network diameter. FatClique has a smaller diameter, but it does not have many shortest paths. The study does not consider practical routing, so it cannot draw conclusions about FatClique's latency.
Although the management complexity metrics help in deciding among competing topology designs, cost-performance numbers for each design would be even more useful. Large data center operators could provide ballpark figures for translating the complexity metrics into cost, and the resulting cost-performance curves would be interesting to study.
The paper leaves good pointers for further work on the manageability of data center topology designs.