FairCloud:
Sharing the Network in Cloud Computing
In cloud computing, networks are shared in a best-effort manner,
making it hard for both tenants & cloud providers to reason about
how network resources are allocated. Networks are more difficult to
share because the network allocation of a VM X depends not only on the
VMs running on the same machine as X, but also on the other VMs that X
communicates with and on the cross-traffic on each link used by X.
The work identifies the desirable requirements for bandwidth
allocation across multiple tenants:
- min-guarantee: provide a minimum absolute bandwidth guarantee for
each VM. This is key to achieving predictable app performance.
- high utilization: do not leave network resources underutilized when
there is unsatisfied demand. This is important for throughput-sensitive
apps like MapReduce.
- network proportionality: share bandwidth between tenants in
proportion to their payments, as is done for CPU & memory resources.
Fully utilized links are termed congested links.
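To make these requirements concrete, here is a toy formalization of
them as checks on a single link. This is my own sketch, not code from
the paper; alloc, demand, guarantee & payment are hypothetical maps
from VMs (or tenants) to rates (or payments):

    from math import isclose

    def meets_min_guarantee(alloc, guarantee):
        # every VM receives at least its absolute bandwidth guarantee
        return all(alloc[v] >= guarantee[v] for v in alloc)

    def high_utilization(alloc, demand, capacity):
        # whenever some demand is unsatisfied, the link must be fully used
        unsatisfied = any(alloc[v] < demand[v] for v in alloc)
        return (not unsatisfied) or isclose(sum(alloc.values()), capacity)

    def network_proportional(tenant_alloc, payment):
        # tenants' aggregate allocations are in proportion to their payments
        ratios = [tenant_alloc[t] / payment[t] for t in payment]
        return all(isclose(r, ratios[0]) for r in ratios)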
There is a hard trade-off between achieving network proportionality
& providing each VM a useful bandwidth guarantee: given a network
proportional allocation, one tenant can arbitrarily reduce another
tenant's bandwidth by increasing the number of her own VMs
communicating over the shared links.
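A toy numeric illustration (mine, not from the paper): on a 1 Gbps
link, tenant A has 2 unit-weight VMs communicating over the link. If
tenant B also has 2, A's proportional share is 500 Mbps; if B grows to
198 VMs communicating over the same link, A's share shrinks to 2/200
of the link, i.e. 10 Mbps, so no fixed guarantee can be sustained.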
Even in the absence of the min-guarantee requirement, there is a
tradeoff between network proportionality and high utilization: a
tenant can be disincentivized from using an uncongested path, because
the traffic she sends there counts toward her network proportional
total and thus decreases her allocation on congested links. This hurts
overall network utilization. In other words, network proportional
allocation provides no utilization incentives.
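A toy numeric illustration (mine, not from the paper): tenants A & B
pay equally and share a congested link of capacity 1, so each gets
0.5. If A additionally sends 0.5 over an otherwise idle path,
proportionality must keep the tenants' totals equal (x_A + 0.5 =
1 - x_A), so A's congested-link share falls from 0.5 to 0.25. If the
congested-link traffic is what A actually cares about, she is better
off leaving the idle path unused.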
- Congestion Proportionality: network proportionality restricted to
congested links that involve more than one tenant. But even here, a
tenant can modify her demand to get a higher allocation & decrease
system utilization, i.e., it is not strategy-proof.
- Link Proportionality: network proportionality restricted to each
individual link. The only constraint is that the weight of a tenant K
on a link L must be the same for any communication pattern between K's
VMs communicating over L and for any distribution of those VMs as
sources and destinations. Since the allocation is independent across
links, this can achieve high utilization.
Traditional allocation policies don't work:
- Per-flow fairness: a tenant can obtain unfairly large allocations at
VM granularity simply by instantiating more flows.
- Per-SD (source-destination pair): each SD pair is allocated an equal
share of a link's b/w regardless of the number of flows between them.
Not communication-pattern independent: among n VMs, many-to-many
communication yields O(n^2) SD pairs & hence O(n^2) of the b/w, while
one-to-one yields only O(n) (see the sketch after this list).
- Per-source & Per-destination: fair to sources or to destinations,
but not to the other, i.e., asymmetric. Neither provides a
min-guarantee: one VM can arbitrarily reduce another VM's allocation.
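A small numeric sketch (mine, not from the paper) of both failure
modes, with two tenants A & B sharing one link whose allocation units
(flows for per-flow, SD pairs for per-SD) are shared equally:

    def share_of_A(units_A, units_B):
        # fraction of the link tenant A receives under equal per-unit sharing
        return units_A / (units_A + units_B)

    n = 10
    # per-flow: A opens 5 parallel flows per VM pair, B opens 1
    print(share_of_A(5 * n, 1 * n))  # ~0.83: extra flows buy extra bandwidth
    # per-SD: A's n VMs talk all-to-all (n^2 pairs), B's one-to-one (n pairs)
    print(share_of_A(n * n, n))      # ~0.91: the pattern buys extra bandwidth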
Network sharing properties:
- Work conservation: A link is either fully allocated, or it satisfies
all demands.
- Strategy-proofness: Tenants cannot improve their allocations by
lying about their demands.
- Utilization incentives: Restricted form of strategy-proofness.
Tenants are never incentivized to reduce their actual demands on
uncongested paths or to artificially leave links underutilized.
- Communication-pattern independence: allocation of a VM depends only
on the VMs it communicates with & not on the communication
pattern.
- Symmetry: if the directions of all the flows in the network are
reversed, the new allocation should match the original one. This
matters because different apps value one direction over the other.
Three allocation policies in the tradeoff space:
- PS-L, Proportional Sharing at the Link level: each switch implements
WFQ (weighted fair queueing) with one queue per tenant. The weight of
tenant A's queue on link L is the sum of the weights of A's VMs that
communicate over L. One can either take A's weight on every link to be
the total weight of all of A's VMs, or apply PS-L at a per-VM
granularity to remove the incentive to send traffic between all of
one's VMs. Not strategy-proof.
- PS-N, Proportional Sharing at the Network level: extends PS-L with
information about the global communication pattern of a tenant's VMs,
so that communication between the VMs in a set has the same total
weight through the network irrespective of the communication pattern
between them (see the sketch after this list). The drawback is that
each VM's weight is statically divided across its flows to other VMs
irrespective of traffic demands, so PS-N lacks utilization incentives.
- PS-P, Proportional Sharing on Proximate links: offers useful min b/w
guarantees, giving up the proportionality of the two policies above.
It prioritizes the VMs close to a given link: per-source fair sharing
for traffic towards the root of the tree (core) & per-destination fair
sharing for traffic from the root (core). Can be implemented per
tenant.
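A minimal sketch (assumptions mine, not the paper's code) of the
weight computations behind PS-L & PS-N on one link: W maps VMs to
payment weights, tenant_of maps VMs to tenants, flows lists the
(src, dst) VM pairs crossing the link, and degree[v] is the number of
VMs that v communicates with anywhere in the network:

    from collections import defaultdict

    def psl_tenant_weights(flows, W, tenant_of):
        # PS-L: a tenant's queue weight on the link is the sum of the
        # weights of her VMs that communicate over it
        vms = defaultdict(set)
        for s, d in flows:
            vms[tenant_of[s]].add(s)
            vms[tenant_of[d]].add(d)
        return {t: sum(W[v] for v in vs) for t, vs in vms.items()}

    def psn_flow_weight(s, d, W, degree):
        # PS-N: each endpoint's weight is split evenly across that VM's
        # peers, so a VM's total weight through the network is the same
        # for any communication pattern
        return W[s] / degree[s] + W[d] / degree[d]

Link bandwidth is then divided in proportion to these weights; PS-P
instead weighs traffic towards the core by source VM only & away from
the core by destination VM only.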
The above policies can be deployed with:
- full switch support: each switch must provide a number of queues
>= the number of tenants communicating through it & support
WFQ.
- partial switch support: implementation using CSFQ (core-stateless
fair queueing).
- no switch support: hypervisor-only implementations, either a
centralized controller enforcing rate-limiters at the hypervisors
based on the current network traffic, or a distributed mechanism like
Seawall (a controller-side sketch follows below).
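For the no-switch-support option, a minimal sketch (mine; the paper
does not prescribe this exact interface) of the controller side:
per-flow rates for a congested link are recomputed from the policy
weights & pushed to rate limiters at the source hypervisors:

    def weighted_rates(capacity, flow_weights):
        # split a congested link's capacity across flows in proportion
        # to their policy weights (e.g. the PS-N weights sketched earlier)
        total = sum(flow_weights.values())
        return {f: capacity * w / total for f, w in flow_weights.items()}

    # e.g. weighted_rates(1000, {("A1", "A2"): 1.5, ("B1", "B2"): 0.5})
    # -> {("A1", "A2"): 750.0, ("B1", "B2"): 250.0}; each hypervisor caps
    # its flow at the given rate until the controller's next update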
The work evaluates the proposed allocation policies using simulation
& a software switch implementation, driven by hand-crafted examples
and traces of MapReduce jobs from a production cluster.
Strengths
- The work provides a rigorous formal framework and unifies several
published works addressing the challenge of network allocation in data
centers. It identifies 3 main requirements: min-guarantee,
proportionality, and high utilization; it also extracts the low-level
desirable properties behind the tradeoffs and formally defines
them.
- They propose 3 different allocation policies that navigate the
tradeoff space, giving cloud providers a variety of VM pricing models,
from flat-rate per VM to per-byte pricing. These policies can serve as
fundamental building blocks for more complicated allocation policies,
such as providing guarantees while sharing proportionally only the
bandwidth left unused by the guarantees.
Weaknesses
- The work relies on hand-crafted examples, and uses them both to
motivate and to evaluate some of the desirable properties. For
example, the emphasis on tenants underutilizing uncongested paths is
the basis of utilization incentives, but this might be just one of
many possible underutilization scenarios in data centers. Another is
applying PS-L at VM granularity to fix the problem of a tenant
inflating her allocation by sending traffic between all of her VMs.
The desirable properties are not a complete set by any means, and a
real-world example motivating the need to consider some of them is
lacking.
Future Work
Many possible areas of future work:
- The biggest one is the refinement of the allocation policies and the
evaluation of a real-world deployment of such a policy, specifically
selecting suitable values for the parameters of PS-P to generalize it
to other topologies such as BCube or DCell.
- Hypervisor-only deployment for the policies described.
- Improve PS-N to incorporate utilization incentives by assigning no
weight to flows traversing uncongested paths.
- Find a work-conserving bandwidth allocation policy that is
strategy-proof; this paper only addresses utilization incentives, a
restricted version of strategy-proofness.