Attention: the feature described in this document is still under development or at a very early stage. Please stay tuned for updates!
The flexible services discussed in this article mainly refer to load balancing on the consumer side and traffic limiting on the provider side. In previous versions of Dubbo, load balancing considered mainly fairness and stability rather than provider throughput, and traffic limiting relied on a statically configured maximum concurrency value. We made improvements to address these issues.
In the original Dubbo versions, there were five load balancing schemes available: Random, ShortestResponse, RoundRobin, LeastActive, and ConsistentHash. Except for ShortestResponse and LeastActive, the others mainly considered fairness and stability.
For ShortestResponse, the design aims to select the provider with the shortest response time in order to improve overall system throughput. However, issues arise with this approach in practice; in particular, response time alone does not fully reflect a machine's throughput capacity.
As for LeastActive, it assumes that traffic should be allocated to the machines with fewer active connections, but it similarly fails to reflect the machines' throughput capacity.

Based on this analysis, we propose two new load balancing algorithms. One is a simple P2C algorithm based on fairness, while the other is an adaptive method that attempts to dynamically assess the throughput capacity of the provider machines and allocate traffic accordingly to improve overall system performance.
The effectiveness experiments for load balancing were conducted under two different conditions: balanced provider machine configurations and configurations with large discrepancies.
In the Dubbo Java implementation, usage is the same as for the original load balancing methods: simply set "loadbalance" to "p2c" or "adaptive" on the consumer side.
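For illustration, a minimal consumer-side sketch using the standard @DubboReference annotation; DemoService is a hypothetical service interface included only to keep the sketch compilable:

```java
import org.apache.dubbo.config.annotation.DubboReference;

public class DemoConsumer {
    // Select the P2C strategy for this reference; use loadbalance = "adaptive"
    // for the adaptive algorithm instead.
    @DubboReference(loadbalance = "p2c")
    private DemoService demoService;
}

// Hypothetical service interface used only for this example.
interface DemoService {
    String greet(String name);
}
```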
A load balancing algorithm only needs to implement the LoadBalance interface of the existing load balancing framework.
The Power of Two Choices (P2C) algorithm is simple yet classic: for each call, randomly pick two providers from the candidate list, and select the one that currently has fewer in-flight requests.
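The following is a minimal sketch of that idea plugged into the framework, not Dubbo's actual built-in implementation. It extends Dubbo's AbstractLoadBalance helper base class and reads the active (in-flight) request count from RpcStatus, as LeastActiveLoadBalance does; the class name is hypothetical.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

import org.apache.dubbo.common.URL;
import org.apache.dubbo.rpc.Invocation;
import org.apache.dubbo.rpc.Invoker;
import org.apache.dubbo.rpc.RpcStatus;
import org.apache.dubbo.rpc.cluster.loadbalance.AbstractLoadBalance;

// A sketch of Power of Two Choices as a custom load balancing strategy.
public class P2CDemoLoadBalance extends AbstractLoadBalance {

    @Override
    protected <T> Invoker<T> doSelect(List<Invoker<T>> invokers, URL url, Invocation invocation) {
        if (invokers.size() == 1) {
            return invokers.get(0);
        }
        ThreadLocalRandom random = ThreadLocalRandom.current();
        // Step 1: randomly pick two distinct candidates.
        int i = random.nextInt(invokers.size());
        int j = random.nextInt(invokers.size() - 1);
        if (j >= i) {
            j++;
        }
        Invoker<T> a = invokers.get(i);
        Invoker<T> b = invokers.get(j);
        // Step 2: select the candidate with fewer in-flight requests.
        int activeA = RpcStatus.getStatus(a.getUrl(), invocation.getMethodName()).getActive();
        int activeB = RpcStatus.getStatus(b.getUrl(), invocation.getMethodName()).getActive();
        return activeA <= activeB ? a : b;
    }
}
```

To make such a class selectable by name, it would also need to be registered through Dubbo's SPI mechanism under META-INF/dubbo/org.apache.dubbo.rpc.cluster.LoadBalance.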
cpuLoad: The CPU load of the provider machine. This metric is obtained on the provider side and passed to the consumer via the invocation's attachment.
rt: The time taken for an RPC call, measured in milliseconds.
timeout: The remaining timeout for this RPC call, measured in milliseconds.
weight: The configured weight of the service.
currentProviderTime: The provider-side time at the moment cpuLoad is calculated, measured in milliseconds.
currentTime: The time of the last load calculation, initialized to currentProviderTime, measured in milliseconds.
multiple
lastLatency: The latency observed for the most recent call.
beta: The smoothing parameter, defaulting to 0.5.
ewma: The smoothed value of lastLatency (see the sketch below).
inflight: The number of requests sent by the consumer that have not yet returned.
load: For a candidate backend machine x, if the time since its last call is greater than 2 * timeout, its load value is 0. Otherwise, the load value is calculated from the metrics defined above.
Candidate selection is still based on the P2C algorithm: two providers are randomly chosen, and the one with the smaller load value is selected.
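To make the ewma metric above concrete, here is a minimal, self-contained sketch of the smoothing step, assuming the standard exponentially weighted moving average form with beta as the smoothing parameter; the exact form used inside Dubbo's adaptive load calculation may differ.

```java
// A sketch of EWMA smoothing for lastLatency; seeding the average with
// the first sample is an assumption made for illustration.
final class EwmaLatency {
    private final double beta;   // smoothing parameter, default 0.5
    private double ewma = -1.0;  // smoothed latency, unset until first sample

    EwmaLatency(double beta) {
        this.beta = beta;
    }

    double update(double lastLatency) {
        if (ewma < 0) {
            // First observation seeds the smoothed value.
            ewma = lastLatency;
        } else {
            // Blend the previous smoothed value with the new sample.
            ewma = beta * ewma + (1 - beta) * lastLatency;
        }
        return ewma;
    }
}
```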
Unlike load balancing, which runs on the consumer side, traffic limiting operates on the provider side. Its purpose is to limit the maximum number of requests that a provider processes concurrently. The processing capacity of a server machine theoretically has an upper limit, so when a large number of requests arrive in a short period, two issues can arise: requests go unserved, and the machine becomes overloaded.
Therefore, when there is a risk of overload, it is better to reject some requests. In previous versions of Dubbo, traffic limiting was achieved by setting a static maximum concurrency value on the provider side. However, in scenarios with many services, complex topologies, and dynamically changing processing capacity, it is difficult to calculate a suitable static value.
For these reasons, we need an adaptive algorithm that can dynamically adjust the maximum concurrency value of a server machine, allowing it to process as many incoming requests as possible while ensuring that the machine does not become overloaded. We therefore implemented two adaptive traffic limiting algorithms in the Dubbo framework: HeuristicSmoothingFlowControl, which is based on heuristic smoothing, and AutoConcurrencyLimier, which is based on a sampling window.
The effectiveness experiments for adaptive traffic limiting were conducted under conditions of large provider machine configurations. To highlight the effects, we increased the complexity of individual requests, set the timeout as large as possible, and enabled the retry functionality on the consumer side. Note that the traffic limiting function performs better when there are multiple nodes on the server side and the retry strategy is enabled on the consumer side.
In the Dubbo Java implementation, enabling adaptive traffic limiting works much like setting a static maximum concurrency value: simply set "flowcontrol" to "autoConcurrencyLimier" or "heuristicSmoothingFlowControl" on the provider side.
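For illustration, a minimal provider-side sketch using the standard @DubboService annotation, passing the "flowcontrol" key through the annotation's generic parameters list (alternating key and value); DemoService and DemoServiceImpl are hypothetical names:

```java
import org.apache.dubbo.config.annotation.DubboService;

// Enable adaptive traffic limiting for this service; use
// "heuristicSmoothingFlowControl" for the other algorithm.
@DubboService(parameters = {"flowcontrol", "autoConcurrencyLimier"})
public class DemoServiceImpl implements DemoService {
    @Override
    public String greet(String name) {
        // Business logic; traffic limiting is applied by the framework.
        return "Hello, " + name;
    }
}

// Hypothetical service interface used only for this example.
interface DemoService {
    String greet(String name);
}
```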
alpha: The acceptable increase in latency, defaulting to 0.3.
minLatency: The minimum latency value within a time window.
noLoadLatency: The latency without any queueing time. This is an inherent property of a server machine, but it is not constant. In the HeuristicSmoothingFlowControl algorithm, the current noLoadLatency is determined from CPU usage: when CPU usage is low, minLatency is taken as noLoadLatency; when CPU usage is moderate, noLoadLatency is smoothly updated toward minLatency; when CPU usage is high, noLoadLatency remains unchanged (see the sketch after this list).
maxQPS: The maximum QPS within a time window.
avgLatency: The average latency over a time window, measured in milliseconds.
maxConcurrency: The calculated maximum concurrency value for the current service provider.
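The CPU-gated update of noLoadLatency described above can be sketched as follows; the concrete CPU thresholds and the smoothing factor here are assumptions for illustration, not values from the Dubbo source.

```java
// A sketch of the CPU-usage-gated noLoadLatency update.
final class NoLoadLatencyEstimator {
    private static final double LOW_CPU = 0.3;   // hypothetical threshold
    private static final double HIGH_CPU = 0.7;  // hypothetical threshold
    private static final double SMOOTH = 0.1;    // hypothetical smoothing factor

    private double noLoadLatency = Double.NaN;

    void update(double cpuUsage, double minLatency) {
        if (Double.isNaN(noLoadLatency) || cpuUsage < LOW_CPU) {
            // Low CPU usage: take minLatency directly as noLoadLatency.
            noLoadLatency = minLatency;
        } else if (cpuUsage < HIGH_CPU) {
            // Moderate CPU usage: smoothly move noLoadLatency toward minLatency.
            noLoadLatency = (1 - SMOOTH) * noLoadLatency + SMOOTH * minLatency;
        }
        // High CPU usage: noLoadLatency remains unchanged.
    }

    double get() {
        return noLoadLatency;
    }
}
```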
When the server receives a request, it first checks whether the CPU usage exceeds 50%. If not, the request is accepted for processing. If it does exceed 50%, the current maxConcurrency value is obtained from the HeuristicSmoothingFlowControl algorithm. If the currently processed request count exceeds maxConcurrency, the request is rejected.
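That admission check can be sketched as follows; this illustrates the behavior described above rather than the actual Dubbo code, and the CPU metric is assumed to come from elsewhere.

```java
import java.util.concurrent.atomic.AtomicInteger;

// A sketch of the CPU-gated admission check.
final class AdmissionGate {
    private final AtomicInteger processing = new AtomicInteger();

    boolean tryAdmit(double cpuUsage, int maxConcurrency) {
        // Below 50% CPU usage, accept the request unconditionally.
        if (cpuUsage <= 0.5) {
            processing.incrementAndGet();
            return true;
        }
        // Above 50%, reject once the currently processed request count
        // exceeds the algorithm's maxConcurrency estimate.
        if (processing.incrementAndGet() > maxConcurrency) {
            processing.decrementAndGet();
            return false;
        }
        return true;
    }

    void onRequestComplete() {
        processing.decrementAndGet();
    }
}
```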
MaxExploreRatio: Defaults to 0.3.
MinExploreRatio: Defaults to 0.06.
SampleWindowSizeMs: The duration of the sampling window, defaulting to 1000 milliseconds.
MinSampleCount: The minimum request count for a sampling window, defaulting to 40.
MaxSampleCount: The maximum request count for a sampling window, defaulting to 500.
emaFactor: The smoothing parameter, defaulting to 0.1.
exploreRatio: The exploration rate, initially set to MaxExploreRatio. If avgLatency <= noLoadLatency * (1.0 + MinExploreRatio) or qps >= maxQPS * (1.0 + MinExploreRatio), then exploreRatio = min(MaxExploreRatio, exploreRatio + 0.02); otherwise, exploreRatio = max(MinExploreRatio, exploreRatio - 0.02). (See the sketch after this list.)
maxQPS: The maximum QPS within the window.
noLoadLatency: The latency without any queueing time, as defined above.
halfSampleIntervalMs: The half-sampling interval, defaulting to 25000 milliseconds.
resetLatencyUs: The timestamp at which all values are next reset, including the values within the window and noLoadLatency. Measured in microseconds and initialized to 0.
remeasureStartUs: The time at which the window is next reset.
startSampleTimeUs: The time at which sampling starts, measured in microseconds.
sampleCount: The number of requests within the current sampling window.
totalSampleUs: The sum of the latencies of all requests within the sampling window, measured in microseconds.
totalReqCount: The total number of requests within the sampling window (distinct from sampleCount).
samplingTimeUs: The sampling timestamp of the current request, measured in microseconds.
latency: The latency of the current request.
qps: The QPS within the current time window.
avgLatency: The average latency within the window.
maxConcurrency: The maximum concurrency value calculated in the previous window, applied to the current period.
nextMaxConcurrency: The maximum concurrency value calculated in the current window for the next period.
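As a concrete rendering of the exploreRatio update rule quoted in the list above, here is a minimal sketch; the state is passed in as plain values rather than read from Dubbo's internal window bookkeeping.

```java
// A sketch of the exploreRatio update rule.
final class ExploreRatioUpdater {
    static final double MAX_EXPLORE_RATIO = 0.3;
    static final double MIN_EXPLORE_RATIO = 0.06;

    static double update(double exploreRatio, double avgLatency, double noLoadLatency,
                         double qps, double maxQPS) {
        if (avgLatency <= noLoadLatency * (1.0 + MIN_EXPLORE_RATIO)
                || qps >= maxQPS * (1.0 + MIN_EXPLORE_RATIO)) {
            // The service is keeping up: probe for more remaining capacity.
            return Math.min(MAX_EXPLORE_RATIO, exploreRatio + 0.02);
        }
        // Latency is rising: back off the exploration rate.
        return Math.max(MIN_EXPLORE_RATIO, exploreRatio - 0.02);
    }
}
```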
The execution process of AutoConcurrencyLimier is similar to that of HeuristicSmoothingFlowControl. The main difference is that AutoConcurrencyLimier is window-based: only when a certain amount of sample data has accumulated within the window is the window data used to update the maxConcurrency value. In addition, it uses exploreRatio to explore the remaining capacity.
Moreover, every so often maxConcurrency is automatically reduced for a period of time in order to handle a rise in noLoadLatency; since estimating noLoadLatency requires the service to be in a low-load state, this reduction of maxConcurrency is unavoidable.
Because maxConcurrency < concurrency during this period, the service rejects all incoming requests. The traffic limiting algorithm sets the time allowed for "draining all queued requests" to 2 * latency, which ensures that most of the samples used for minLatency did not spend time waiting in a queue.