
Heuristic Flow Control

Overview

The flexible services discussed in this article primarily refer to load balancing on the consumer side and rate limiting on the provider side. In previous versions of Dubbo,

  • The load balancing component focused primarily on fairness, meaning that the consumer would choose among providers as evenly as possible; this did not perform well in certain situations.
  • The rate limiting component provided only static schemes, requiring users to set a fixed maximum concurrency value on the provider side, which is difficult to choose reasonably.

We have made improvements to address these issues.

Load Balancing

Introduction

In the original version of Dubbo, there were five load balancing schemes to choose from: Random, ShortestResponse, RoundRobin, LeastActive, and ConsistentHash. Except for ShortestResponse and LeastActive, the other schemes mainly consider fairness and stability in selection.

ShortestResponse is designed to select, from all available providers, the one with the shortest response time, in order to improve overall system throughput. However, it has two issues:

  1. In most scenarios, the response times of different providers do not differ significantly, causing the algorithm to degrade to random selection.
  2. The response time does not, by itself, represent the machine’s throughput capability.

LeastActive assumes that traffic should be allocated to the machines currently handling the fewest concurrent requests, but it has the same problem as ShortestResponse: the number of in-flight requests alone does not indicate the machine’s throughput capability.

Based on this analysis, we propose two new load balancing algorithms: a P2C algorithm based purely on fairness, and an adaptive algorithm that tries to measure each provider machine’s throughput capability and allocate traffic to the machines with higher throughput, improving overall system performance.

Overall Effect

The effectiveness experiments for load balancing were conducted in two different scenarios: one with relatively balanced provider configurations and another with significant disparities in provider configurations.


Usage Method

The usage method is the same as the original load balancing methods. Simply set “loadbalance” to “p2c” or “adaptive” on the consumer side.
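
As an illustration, a programmatic consumer configuration might look like the following sketch; the DemoService interface and the registry address are hypothetical, and the only relevant line is the loadbalance setting:

```java
import org.apache.dubbo.config.ApplicationConfig;
import org.apache.dubbo.config.ReferenceConfig;
import org.apache.dubbo.config.RegistryConfig;

public class P2cConsumerExample {

    // Hypothetical service interface, used only for illustration.
    public interface DemoService {
        String sayHello(String name);
    }

    public static void main(String[] args) {
        ReferenceConfig<DemoService> reference = new ReferenceConfig<>();
        reference.setApplication(new ApplicationConfig("demo-consumer"));
        reference.setRegistry(new RegistryConfig("zookeeper://127.0.0.1:2181"));
        reference.setInterface(DemoService.class);

        // Consumer-side strategy selection; "adaptive" works the same way.
        reference.setLoadbalance("p2c");

        DemoService demoService = reference.get();
        System.out.println(demoService.sayHello("world"));
    }
}
```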

Code Structure

On the load balancing side, an algorithm implementation only needs to implement the LoadBalance interface within the existing load balancing framework.
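
As a rough sketch of where such an implementation sits, assuming the standard org.apache.dubbo.rpc.cluster.LoadBalance SPI and its AbstractLoadBalance helper base class (the class name and extension name below are illustrative):

```java
import java.util.List;

import org.apache.dubbo.common.URL;
import org.apache.dubbo.rpc.Invocation;
import org.apache.dubbo.rpc.Invoker;
import org.apache.dubbo.rpc.cluster.loadbalance.AbstractLoadBalance;

// Illustrative skeleton of a custom strategy. It would be registered through
// the SPI file META-INF/dubbo/org.apache.dubbo.rpc.cluster.LoadBalance with a
// line such as "myp2c=com.example.MyP2cLoadBalance".
public class MyP2cLoadBalance extends AbstractLoadBalance {

    @Override
    protected <T> Invoker<T> doSelect(List<Invoker<T>> invokers, URL url, Invocation invocation) {
        // Selection logic goes here; see the P2C sketch below.
        return invokers.get(0);
    }
}
```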

Principles

P2C Algorithm

The Power of Two Choices algorithm is simple yet classic, and its main idea is as follows:

  1. For each call, randomly pick two nodes, providerA and providerB, from the available provider list.
  2. Compare providerA and providerB and select the one with the smaller “current number of connections being processed”.
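
A minimal, framework-independent sketch of this selection step might look as follows; the provider type and the active-count accessor are placeholders, not Dubbo’s actual API:

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.ToIntFunction;

public final class PowerOfTwoChoices {

    // Pick two distinct random candidates and keep the one with the smaller
    // "currently processing" count supplied by the accessor.
    public static <T> T select(List<T> providers, ToIntFunction<T> activeCount) {
        if (providers.isEmpty()) {
            throw new IllegalArgumentException("provider list must not be empty");
        }
        if (providers.size() == 1) {
            return providers.get(0);
        }
        ThreadLocalRandom random = ThreadLocalRandom.current();
        int i = random.nextInt(providers.size());
        int j = random.nextInt(providers.size() - 1);
        if (j >= i) {
            j++; // ensure the second index differs from the first
        }
        T a = providers.get(i);
        T b = providers.get(j);
        return activeCount.applyAsInt(a) <= activeCount.applyAsInt(b) ? a : b;
    }
}
```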

Adaptive Algorithm

Code GitHub Link

Relevant Metrics
  1. cpuLoad: the CPU load of the provider. This metric is obtained on the provider side and passed to the consumer side through the invocation’s attachments.

  2. rt: the time taken for a single RPC call, measured in milliseconds.

  3. timeout: the remaining timeout for the current RPC call, measured in milliseconds.

  4. weight: the configured service weight.

  5. currentProviderTime: the time at which the provider side calculates cpuLoad, measured in milliseconds.

  6. currentTime: the last time load was calculated, initialized to currentProviderTime, measured in milliseconds.

  7. multiple

  8. lastLatency: the latency of the most recent call.

  9. beta: smoothing parameter, default 0.5.

  10. ewma: the smoothed value of lastLatency, i.e. ewma = beta * ewma + (1 - beta) * lastLatency.

  11. inflight: the number of requests on the consumer side that have been sent but not yet returned.

  12. load: for a candidate backend machine x, if the time since the last call is greater than 2 * timeout, its load value is 0. Otherwise, load is computed from the metrics above.

Algorithm Implementation

The adaptive algorithm is still based on P2C:

  1. Randomly pick two nodes, providerA and providerB, from the candidate list.
  2. Compare the load values of providerA and providerB and choose the one with the smaller load.
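
The consumer-side bookkeeping behind these steps can be sketched roughly as below. The ewma update and the 2 * timeout idle rule follow the metric definitions above, while the final combination of cpuLoad, ewma, and inflight into a single load value is deliberately simplified here; the exact formula is in the linked source.

```java
// Illustrative consumer-side bookkeeping for a single provider; the real
// implementation lives in the adaptive load balance module linked above.
public final class ProviderStats {

    private static final double BETA = 0.5; // smoothing parameter

    private double ewma = -1;      // smoothed lastLatency
    private long lastCallTimeMs;   // time of the last call to this provider
    private int inflight;          // requests sent but not yet returned

    public synchronized void onRequest() {
        inflight++;
    }

    public synchronized void onResponse(long rtMillis) {
        inflight--;
        lastCallTimeMs = System.currentTimeMillis();
        // ewma = beta * ewma + (1 - beta) * lastLatency
        ewma = ewma < 0 ? rtMillis : BETA * ewma + (1 - BETA) * rtMillis;
    }

    // If the provider has not been called for more than 2 * timeout, its load
    // is treated as 0 so that it gets a chance to be selected again.
    public synchronized double load(long timeoutMillis, double cpuLoad) {
        if (System.currentTimeMillis() - lastCallTimeMs > 2 * timeoutMillis) {
            return 0;
        }
        if (ewma < 0) {
            return 0; // no sample yet
        }
        // Simplified combination for illustration only: higher cpuLoad, higher
        // smoothed latency and more in-flight requests all increase the load.
        return cpuLoad * ewma * (inflight + 1);
    }
}
```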

Adaptive Rate Limiting

Unlike load balancing, which runs on the consumer side, the rate limiting feature operates on the provider side. Its purpose is to limit the maximum number of concurrent requests processed by the provider.

Theoretically, a server’s processing capacity has an upper limit. When a large number of requests arrive in a short period of time, a backlog of unprocessed requests builds up and the machine becomes overloaded. In such cases, two issues may arise:

  1. Because of the backlog, every request has to wait a long time to be processed, causing the entire service to go down.
  2. Long-term overload puts the server machine at risk of crashing.

Therefore, when there is a risk of overload, rejecting some requests may be the better choice.

In previous versions of Dubbo, rate limiting was implemented by setting a static maximum concurrency value on the provider side. However, in situations with numerous services and complex topology, where processing capacity can change dynamically, it is hard for users to pick such a static value.

For these reasons, we need an adaptive algorithm that dynamically adjusts the maximum concurrency of each server machine, allowing it to process as many incoming requests as possible while ensuring that the machine does not become overloaded. We therefore implemented two adaptive rate limiting algorithms in the Dubbo framework: the heuristic-smoothing-based “HeuristicSmoothingFlowControl” and the window-based “AutoConcurrencyLimier”.

Code GitHub Link

Overall Effect

The effectiveness experiments for adaptive rate limiting were conducted with the provider’s machine configuration made as large as possible. To highlight the effect, we increased the complexity of a single request, set the timeout as large as possible, and enabled the retry feature on the consumer side.

Usage Method

Ensuring that multiple nodes exist on the provider side and that the retry strategy is enabled on the consumer side allows the rate limiting feature to perform better.

The configuration method is similar to setting the static maximum concurrency value; simply set “flowcontrol” to “autoConcurrencyLimier” or “heuristicSmoothingFlowControl” on the provider side.
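
One plausible way to express this programmatically is sketched below; it assumes the “flowcontrol” setting is passed as a provider-side parameter, and the service interface, implementation, and registry address are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.dubbo.config.ApplicationConfig;
import org.apache.dubbo.config.ProtocolConfig;
import org.apache.dubbo.config.RegistryConfig;
import org.apache.dubbo.config.ServiceConfig;

public class FlowControlProviderExample {

    // Hypothetical service interface and implementation, for illustration only.
    public interface DemoService {
        String sayHello(String name);
    }

    public static class DemoServiceImpl implements DemoService {
        public String sayHello(String name) {
            return "hello " + name;
        }
    }

    public static void main(String[] args) {
        ServiceConfig<DemoService> service = new ServiceConfig<>();
        service.setApplication(new ApplicationConfig("demo-provider"));
        service.setRegistry(new RegistryConfig("zookeeper://127.0.0.1:2181"));
        service.setProtocol(new ProtocolConfig("dubbo", 20880));
        service.setInterface(DemoService.class);
        service.setRef(new DemoServiceImpl());

        // Pass the rate limiting choice as a provider-side parameter;
        // "heuristicSmoothingFlowControl" selects the other algorithm.
        Map<String, String> parameters = new HashMap<>();
        parameters.put("flowcontrol", "autoConcurrencyLimier");
        service.setParameters(parameters);

        service.export();
    }
}
```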

Code Structure

  1. FlowControlFilter: The filter on the provider side responsible for implementing rate limiting based on the algorithm’s results.
  2. FlowControl: The interface for rate limiting algorithms, based on Dubbo’s SPI. Concrete rate limiting algorithms need to implement this interface and can then be used through Dubbo’s SPI.
  3. CpuUsage: Periodically fetches CPU-related metrics.
  4. HardwareMetricsCollector: Methods for obtaining hardware metrics.
  5. ServerMetricsCollector: Methods for obtaining the metrics necessary for rate limiting based on sliding windows, such as QPS, etc.
  6. AutoConcurrencyLimier: The specific implementation algorithm for adaptive rate limiting.
  7. HeuristicSmoothingFlowControl: The specific implementation method for adaptive rate limiting.

Principles

HeuristicSmoothingFlowControl

Relevant Metrics
  1. alpha: the acceptable increase in latency, defaulting to 0.3.

  2. minLatency: the minimum latency value within a time window.

  3. noLoadLatency: the latency of purely processing a task, excluding queuing time. This is an inherent property of the server machine, but it is not static. In the HeuristicSmoothingFlowControl algorithm, we determine the current noLoadLatency from the machine’s CPU usage: when CPU usage is low, minLatency is taken as noLoadLatency; when CPU usage is moderate, minLatency is used to smoothly update noLoadLatency; when CPU usage is high, noLoadLatency is left unchanged (see the sketch after this list).

  4. maxQPS: the maximum QPS within a time window cycle.

  5. avgLatency: the average latency within a time window cycle, measured in milliseconds.

  6. maxConcurrency: the current maximum concurrency value calculated for the service provider.
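
A sketch of the noLoadLatency update rule described in item 3; the CPU-usage thresholds and the smoothing factor used here are assumptions for illustration, not the constants used by the actual implementation:

```java
// Illustrative update of noLoadLatency driven by CPU usage, following the rule
// described above; the thresholds and smoothing factor are assumed values.
public final class NoLoadLatencyEstimator {

    private static final double SMOOTHING = 0.1; // assumed smoothing factor

    private double noLoadLatency = -1;

    public void update(double minLatencyInWindow, double cpuUsage) {
        if (cpuUsage < 0.5) {
            // Low CPU usage: take the window minimum as the no-load latency.
            noLoadLatency = minLatencyInWindow;
        } else if (cpuUsage < 0.8) {
            // Moderate CPU usage: move noLoadLatency smoothly towards the window minimum.
            noLoadLatency = noLoadLatency < 0
                    ? minLatencyInWindow
                    : (1 - SMOOTHING) * noLoadLatency + SMOOTHING * minLatencyInWindow;
        }
        // High CPU usage: noLoadLatency is left unchanged.
    }

    public double get() {
        return noLoadLatency;
    }
}
```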

Algorithm Implementation

When the server receives a request, it first checks whether CPU usage exceeds 50%. If it does not, the request is accepted and processed. If it does, the current load is considered high, so the current maxConcurrency value is obtained from the HeuristicSmoothingFlowControl algorithm; if the number of requests currently being processed exceeds maxConcurrency, the request is rejected.
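
In code form, this admission check reads roughly as follows; the names are illustrative, not Dubbo’s actual API:

```java
// Illustrative admission check performed by the provider-side filter;
// the names here are not Dubbo's actual API.
public final class AdmissionCheck {

    public static boolean tryAccept(double cpuUsage, int currentlyProcessing, int maxConcurrency) {
        if (cpuUsage <= 0.5) {
            // CPU usage at or below 50%: accept without consulting the limiter.
            return true;
        }
        // CPU usage above 50%: enforce the adaptively computed maxConcurrency.
        return currentlyProcessing < maxConcurrency;
    }
}
```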

AutoConcurrencyLimier

Relevant Metrics
  1. MaxExploreRatio: defaults to 0.3.

  2. MinExploreRatio: defaults to 0.06.

  3. SampleWindowSizeMs: the length of the sampling window, defaults to 1000 milliseconds.

  4. MinSampleCount: the minimum number of requests in the sampling window, defaults to 40.

  5. MaxSampleCount: the maximum number of requests in the sampling window, defaults to 500.

  6. emaFactor: smoothing parameter, defaults to 0.1.

  7. exploreRatio: the exploration rate, initially set to MaxExploreRatio. If avgLatency <= noLoadLatency * (1.0 + MinExploreRatio) or qps >= maxQPS * (1.0 + MinExploreRatio), then exploreRatio = min(MaxExploreRatio, exploreRatio + 0.02); otherwise exploreRatio = max(MinExploreRatio, exploreRatio - 0.02).

  8. maxQPS: the maximum QPS observed within a window cycle.

  9. noLoadLatency: the latency of purely processing a task, excluding queuing time (as in HeuristicSmoothingFlowControl above).

  10. halfSampleIntervalMs: the half sampling interval, defaults to 25000 milliseconds.

  11. resetLatencyUs: the timestamp of the next reset of all values, including the window values and noLoadLatency. Measured in microseconds, initialized to 0.

  12. remeasureStartUs: the start time of the next window reset.

  13. startSampleTimeUs: the time at which sampling starts, measured in microseconds.

  14. sampleCount: the number of requests within the current sampling window.

  15. totalSampleUs: the sum of the latencies of all requests in the sampling window, measured in microseconds.

  16. totalReqCount: the total number of requests within the sampling window; note the distinction from sampleCount.

  17. samplingTimeUs: the timestamp at which the current request is sampled, measured in microseconds.

  18. latency: the latency of the current request.

  19. qps: the QPS within the time window.

  20. avgLatency: the average latency within the window (totalSampleUs / sampleCount).

  21. maxConcurrency: the maximum concurrency value for the current cycle, calculated from the previous window.

  22. nextMaxConcurrency: the next maximum concurrency value, calculated from the current window.

Little’s Law
  • When the service is in a stable state: concurrency = latency * qps. This is the theoretical basis for adaptive rate limiting.
  • When requests do not overload the machine, latency is roughly stable, and qps and concurrency have a linear relationship.
  • When a burst of requests exceeds what the machine can handle and the service becomes overloaded, both concurrency and latency rise while qps levels off.
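
As an illustrative example: if a provider’s latency holds steady at 50 ms (0.05 s) while it serves 200 requests per second, Little’s Law gives concurrency = 0.05 * 200 = 10 in-flight requests; if overload pushes latency to 200 ms while qps levels off at 200, concurrency rises to 40.
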
Algorithm Implementation

The overall flow of AutoConcurrencyLimier is similar to that of HeuristicSmoothingFlowControl. The major difference is that:

AutoConcurrencyLimier is window-based: only when a certain amount of sample data has been accumulated within the window is the data used to update maxConcurrency. In addition, it uses exploreRatio to explore the remaining capacity.

Furthermore, maxConcurrency is automatically lowered every once in a while and kept low for some time, to deal with the possibility that noLoadLatency has risen; this is hard to avoid otherwise, because estimating noLoadLatency requires the service to be in a low-load state.

Since maxConcurrency is then smaller than the current concurrency, the service will reject all requests during this period; the rate limiting algorithm also sets the “waiting time for all queued requests” to 2 * latency, ensuring that the majority of the samples used for minLatency have not experienced queuing.
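
A sketch of the exploreRatio adjustment described in the metric list above, together with the per-window statistics it relies on; how nextMaxConcurrency is finally derived from these values is omitted here and can be found in the linked source:

```java
// Illustrative exploreRatio adjustment for a window-based limiter;
// the constants follow the defaults listed in the metric section.
public final class ExploreRatioUpdater {

    private static final double MAX_EXPLORE_RATIO = 0.3;
    private static final double MIN_EXPLORE_RATIO = 0.06;

    private double exploreRatio = MAX_EXPLORE_RATIO;

    /**
     * Called once per sampling window.
     *
     * @param avgLatency    average latency of the window (totalSampleUs / sampleCount)
     * @param qps           QPS observed in the window
     * @param noLoadLatency current no-load latency estimate
     * @param maxQPS        maximum QPS observed so far
     */
    public double adjust(double avgLatency, double qps, double noLoadLatency, double maxQPS) {
        if (avgLatency <= noLoadLatency * (1.0 + MIN_EXPLORE_RATIO)
                || qps >= maxQPS * (1.0 + MIN_EXPLORE_RATIO)) {
            // The service still has headroom: explore a little more aggressively.
            exploreRatio = Math.min(MAX_EXPLORE_RATIO, exploreRatio + 0.02);
        } else {
            // Latency is rising without a QPS gain: back off the exploration.
            exploreRatio = Math.max(MIN_EXPLORE_RATIO, exploreRatio - 0.02);
        }
        return exploreRatio;
    }
}
```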