With the emergence of Docker and Kubernetes, a large monolithic application can be split into multiple microservices, each packaged into its own container and deployed independently. These services communicate with one another to jointly deliver a piece of functionality. The benefits of microservices architecture and containerized deployment are evident: they reduce coupling between services, ease development and maintenance, and make better use of computing resources. However, the microservices architecture also has corresponding drawbacks:
To address these pain points, Service Mesh emerged. Taking the classic Sidecar mode as an example, it injects a Sidecar container into the business Pod and governs and controls traffic through that proxy, thereby decoupling the framework's governance capabilities from the business system. This makes it easy to achieve unified traffic control, monitoring, and similar needs across multiple languages and protocols. By extracting SDK capabilities into an independent process, it reduces the dependency on SDKs and lets developers focus on the business itself. The foundational framework capabilities are thus sunk into the mesh, as illustrated in the following figure (source: Dubbo official website):
The classic Sidecar Mesh deployment architecture has many advantages, such as reduced SDK coupling and minimal intrusion into business code, but the extra proxy layer it adds introduces its own problems, such as:
To solve these pain points, the Proxyless Service Mesh mode was introduced. A traditional service mesh intercepts all business network traffic through proxies, which must be aware of the configuration resources issued by the control plane in order to steer network traffic as required. In the Proxyless mode, taking Istio as an example, applications communicate directly with the istiod process that serves as the control plane. The istiod process watches and obtains Kubernetes resources such as Service and Endpoint, and distributes these resources uniformly via the xDS protocol to the different RPC frameworks, enabling service discovery and service governance. The Dubbo community was one of the first in China to explore the Proxyless Service Mesh mode; because the Proxyless mode has a lower implementation cost than the Sidecar mode, it is a good option for small and medium-sized enterprises. In version 3.1, Dubbo added Proxyless support by parsing the xDS protocol. xDS is the collective name for a family of discovery services: through the xDS APIs, an application can dynamically obtain Listener, Route, Cluster, Endpoint, and Secret configurations.
Through the Proxyless model, Dubbo establishes direct communication with the Control Plane, thereby achieving unified management over traffic control, service governance, observability, and security, avoiding the performance loss and deployment complexity associated with the Sidecar model.
Overall, the interaction sequence diagram between the Istio control plane and Dubbo is shown above. The main xDS-handling logic in Dubbo resides in the PilotExchanger and in the concrete implementations of each DS protocol (LDS, RDS, CDS, EDS). The PilotExchanger is responsible for tying the whole flow together, covering three main pieces of logic:
For instance, for LDS and RDS, the PilotExchanger invokes the getResource method of LDS to establish communication with Istio, send data, and parse the response from Istio. The parsed resource is then used as an argument to RDS's getResource method, which in turn sends its request to Istio. When LDS changes, the observeResource method of LDS triggers changes in both itself and RDS. The existing interaction is as follows, corresponding to the red-line flow in the figure above:
After successfully acquiring resources for the first time, each DS will continuously send requests to Istio via scheduled tasks, parse response results, and maintain interaction with Istio. This process corresponds to the blue line part of the figure above.
Dubbo Proxyless mode has proven its reliability after validation. However, existing Dubbo Proxyless implementation schemes face the following issues:
The interaction logic after the transformation:
Currently, Dubbo's resource types include LDS, RDS, and EDS. For a given process, the full set of resources listened to under each of the three types corresponds one-to-one with the resource listening list that Istio caches for that process. We therefore design a separate local resource cache pool for each of the three types. When Dubbo needs a resource, it first checks the cache pool and returns directly on a hit; otherwise, it aggregates the requested resources with the resource list already in the local cache pool and sends the combined list to Istio to update its listening list. The cache pool is as follows, where the key represents a single resource and T is the result type of the particular DS:
protected Map<String, T> resourcesMap = new ConcurrentHashMap<>();
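As a hedged sketch, the cache-first lookup described above might look like the following. The class name `XdsResourceCache` and the method structure are illustrative assumptions, not Dubbo's actual implementation:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative cache-first resource lookup; names are hypothetical.
public class XdsResourceCache<T> {
    protected final Map<String, T> resourcesMap = new ConcurrentHashMap<>();

    /** Returns cached resources; cache misses would trigger a request to Istio. */
    public Map<String, T> getResource(Set<String> resourceNames) {
        Map<String, T> result = new HashMap<>();
        Set<String> missing = new HashSet<>();
        for (String name : resourceNames) {
            T cached = resourcesMap.get(name);
            if (cached != null) {
                result.put(name, cached);
            } else {
                missing.add(name);
            }
        }
        if (!missing.isEmpty()) {
            // Aggregate the cached resource names with the missing ones and send
            // a DiscoveryRequest to Istio to refresh its listening list (omitted).
        }
        return result;
    }
}
```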
With a cache pool in place, we also need a structure or container for listening on it. Here we design it as a Map, as follows:
protected Map<Set<String>, List<Consumer<Map<String, T>>>> consumerObserveMap = new ConcurrentHashMap<>();
Here the key represents the set of resources to be observed, and the value is a List; the List is used so that repeated subscriptions are supported. The items stored in the List are Consumer instances (introduced in JDK 8), which carry a function or behavior whose parameter is Map<String, T>, so that results can be retrieved from the cache pool. As mentioned, the PilotExchanger is responsible for linking the complete flow, and the update relationships between the different DS types can be conveyed through these Consumers. Taking observation of LDS as an example, the code is roughly as follows:
// Listen
void observeResource(Set<String> resourceNames, Consumer<Map<String, T>> consumer, boolean isReConnect);

// Observe LDS updates
ldsProtocol.observeResource(ldsResourcesName, (newListener) -> {
    // LDS data is inconsistent
    if (!newListener.equals(listenerResult)) {
        // Update LDS data
        this.listenerResult = newListener;
        // Trigger RDS listening
        if (isRdsObserve.get()) {
            createRouteObserve();
        }
    }
}, false);
Once the stream model is transformed into a persistent connection, we also need to store the behavior of this Consumer in the local cache pool. When Istio receives a push request from Dubbo, it refreshes its cached resource listening list and returns a response. The response returned by Istio is an aggregated result; upon receiving it, Dubbo splits the response into finer-grained resources and pushes them to the corresponding Dubbo applications to notify them of the changes.
Pitfalls:
When the first request is sent to Istio, the getResource method is called to query the cache. On a miss, the data is aggregated into a request to Istio, which then returns the corresponding results to Dubbo. There are two possible approaches to processing Istio's responses:
public class ResponseObserver implements XXX {
    ...
    public void onNext(DiscoveryResponse value) {
        // Accept the data from Istio and split it
        Map<String, T> newResult = decodeDiscoveryResponse(value);
        // Local cache pool data
        Map<String, T> oldResource = resourcesMap;
        // Refresh the cache pool data
        discoveryResponseListener(oldResource, newResult);
        resourcesMap = newResult;
        // ACK the response
        requestObserver.onNext(buildDiscoveryRequest(Collections.emptySet(), value));
    }
    ...
    public void discoveryResponseListener(Map<String, T> oldResult,
                                          Map<String, T> newResult) {
        ....
    }
}
// Decoding is left to the concrete LDS, RDS, and EDS implementations
protected abstract Map<String, T> decodeDiscoveryResponse(DiscoveryResponse response);

// Notify the listeners registered in consumerObserveMap (helper name illustrative)
private void notifyListeners(Map<String, T> newResult) {
    // Compare the new data with the cache pool and extract resources absent from either side
    ...
    for (Map.Entry<Set<String>, List<Consumer<Map<String, T>>>> entry : consumerObserveMap.entrySet()) {
        // Skip entries not present in the local cache pool
        ...
        // Aggregate the resources for this listener
        Map<String, T> dsResultMap = entry.getKey()
                .stream()
                .collect(Collectors.toMap(k -> k, v -> newResult.get(v)));
        // Push the refreshed data to the listener
        entry.getValue().forEach(o -> o.accept(dsResultMap));
    }
}
Pitfalls:
The listeners in the consumerObserveMap and resourcesMap cache pools are prone to concurrency conflicts. For resourcesMap, since put operations are concentrated in the getResource method, a pessimistic lock on the corresponding resource is enough to avoid concurrent modification. consumerObserveMap, however, sees simultaneous put, remove, and traversal operations, so a read-write lock fits better: take the read lock for traversal and the write lock for put and remove. Thus a pessimistic lock suffices for resourcesMap, while the operations on consumerObserveMap are as follows:
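The read-write-lock scheme described above can be sketched as follows. This is illustrative, not Dubbo's exact code; the class and method names are assumptions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;

// Sketch: guard consumerObserveMap with a read-write lock (names hypothetical).
public class ObserveRegistry<T> {
    protected final Map<Set<String>, List<Consumer<Map<String, T>>>> consumerObserveMap =
            new ConcurrentHashMap<>();
    private final ReadWriteLock observeLock = new ReentrantReadWriteLock();

    // put and remove take the write lock
    public void addObserver(Set<String> names, Consumer<Map<String, T>> consumer) {
        observeLock.writeLock().lock();
        try {
            consumerObserveMap.computeIfAbsent(names, k -> new ArrayList<>()).add(consumer);
        } finally {
            observeLock.writeLock().unlock();
        }
    }

    // traversal takes the read lock
    public void notifyObservers(Map<String, T> newResult) {
        observeLock.readLock().lock();
        try {
            consumerObserveMap.values()
                    .forEach(list -> list.forEach(c -> c.accept(newResult)));
        } finally {
            observeLock.readLock().unlock();
        }
    }
}
```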
Pitfalls:
Handling disconnection and reconnection only requires a scheduled task that interacts with Istio periodically, attempting to obtain an authorization certificate. Successfully obtaining the certificate means Istio is back online. Dubbo then aggregates its local resources to request data from Istio, parses the response, refreshes the local cache pool, and finally cancels the scheduled task.
Pitfalls:
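The reconnection loop described above can be sketched like this. It is a simplified assumption of the flow, not Dubbo's actual implementation; the certificate check and cache refresh are passed in as callbacks:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Illustrative reconnection task: poll until the certificate check succeeds,
// then refresh local caches and stop polling. Names are hypothetical.
public class ReconnectTask {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void start(Supplier<Boolean> tryFetchCert, Runnable refreshCache) {
        scheduler.scheduleAtFixedRate(() -> {
            if (tryFetchCert.get()) {   // certificate obtained: Istio is back online
                refreshCache.run();      // re-aggregate local resources, refresh caches
                scheduler.shutdown();    // stop polling once recovered
            }
        }, 0, 3, TimeUnit.SECONDS);
    }
}
```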
During this transformation I genuinely lost a bit of hair; running into bugs I couldn't track down was not uncommon. Besides the pitfalls mentioned above, other issues included, but were not limited to:
…… However, it must be acknowledged that Proxyless Service Mesh has its own advantages and broad market prospects. Since the release of Dubbo 3.1.0, Dubbo has offered Proxyless Service Mesh capabilities. Going forward, the Dubbo community will work closely with businesses to address more real-world production pain points and further refine its service mesh capabilities.