NotNets: Accelerating Microservices by Bypassing the Network

arXiv:2404.06581

Published 4/11/2024 by Peter Alvaro, Matthew Adiletta, Adrian Cockroft, Frank Hady, Ramesh Illikkal, Esteban Ramos, James Tsai, Robert Soulé
Abstract

Remote procedure calls are the workhorse of distributed systems. However, as software engineering trends, such as micro-services and serverless computing, push applications towards ever finer-grained decompositions, the overhead of RPC-based communication is becoming too great to bear. In this paper, we argue that point solutions that attempt to optimize one aspect of RPC logic are unlikely to mitigate these ballooning communication costs. Rather, we need a dramatic reappraisal of how we provide communication. Towards this end, we propose to emulate message-passing RPCs by sharing message payloads and metadata on CXL 3.0-backed far memory. We provide initial evidence of feasibility and analyze the expected benefits.

Overview

  • The paper "NotNets: Accelerating Microservices by Bypassing the Network" explores a new approach to improving the performance of microservices by avoiding the network overhead.
  • Microservices, a popular architectural style for building complex software systems, can suffer from significant performance degradation due to the network communication overhead between individual services.
  • The researchers propose "NotNets," a system that enables microservices to communicate directly without going through the network, bypassing the associated latency and throughput penalties.

Plain English Explanation

Microservices are a way of building software where different parts of an application are broken up into smaller, independent services that can be developed, deployed, and scaled separately. This approach has many benefits, like making it easier to update and maintain the software. However, it also introduces a significant downside: the need for these services to communicate with each other over a network, which can slow things down.

The paper "NotNets: Accelerating Microservices by Bypassing the Network" proposes a solution to this problem. Instead of having the services talk to each other over the network, the researchers propose a way for them to communicate directly, without going through the network. This "direct communication" approach is meant to let the services work together much faster, without the delays and bottlenecks that can happen when data has to travel across a network.

The key idea behind NotNets is to take advantage of recent advances in hardware, specifically CXL 3.0-backed far memory, to enable this direct communication between microservices. Instead of sending a message over the network, services share the message payload and its metadata in that memory. The paper provides initial evidence that this is feasible and analyzes the expected benefits of avoiding the network overhead.

This kind of optimization can be particularly valuable for applications that are very latency-sensitive, like real-time systems or interactive services, where even small delays can negatively impact the user experience. By reducing communication latency, NotNets could help make these types of applications more responsive and efficient.

Technical Explanation

The paper "NotNets: Accelerating Microservices by Bypassing the Network" presents a novel approach to improving the performance of microservices-based applications by avoiding the network communication overhead.

The researchers identify the network as a key source of overhead in microservices architectures, where individual services need to communicate with each other over the network. This network communication can introduce significant latency and throughput penalties, which can degrade the overall performance of the application.

To address this issue, the NotNets system enables microservices to communicate directly, without going through the network. As the abstract describes, it emulates message-passing RPCs by sharing message payloads and metadata in CXL 3.0-backed far memory, rather than sending them through the network stack.
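As a rough illustration of the shared-memory idea, the sketch below passes one request through a shared region instead of a socket. It is not the paper's implementation: Python's multiprocessing.shared_memory stands in for CXL-backed far memory, and the slot layout (a ready flag, a method id, and a payload length, followed by the payload), the 4096-byte slot size, and the names write_request, read_request, and demo are all assumptions made for this example.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical slot layout (an assumption, not taken from the paper):
# metadata header = ready flag, method id, payload length (three uint32),
# followed by the raw payload bytes.
HEADER = struct.Struct("<III")
SLOT_SIZE = 4096


def write_request(buf, method_id, payload):
    """Copy the payload into the shared slot, then write its metadata."""
    if len(payload) > SLOT_SIZE - HEADER.size:
        raise ValueError("payload too large for slot")
    start = HEADER.size
    buf[start:start + len(payload)] = payload
    # The header is written last so the ready flag appears after the payload.
    # Real hardware would need atomics and memory fences for this step.
    HEADER.pack_into(buf, 0, 1, method_id, len(payload))


def read_request(buf):
    """Return (method_id, payload) if a request is ready, else None."""
    ready, method_id, length = HEADER.unpack_from(buf, 0)
    if not ready:
        return None
    payload = bytes(buf[HEADER.size:HEADER.size + length])
    HEADER.pack_into(buf, 0, 0, 0, 0)  # mark the slot as free again
    return method_id, payload


def demo():
    """Write through one handle and read through a second, attached by name."""
    region = shared_memory.SharedMemory(create=True, size=SLOT_SIZE)
    try:
        # A second process would attach the same way, using the region name.
        peer = shared_memory.SharedMemory(name=region.name)
        try:
            write_request(region.buf, 7, b"get_user:42")
            return read_request(peer.buf)
        finally:
            peer.close()
    finally:
        region.close()
        region.unlink()


if __name__ == "__main__":
    print(demo())
```

The caller and the callee never serialize the message onto a network connection; they only agree on where the payload and metadata live. A real design would also need atomic publication of the ready flag, multiple slots, and a way to notify the receiver, all of which this single-process sketch leaves out.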

The NotNets architecture consists of several key components:

  1. Service Placement: NotNets intelligently places microservices on physical hosts to maximize the opportunities for direct communication, considering factors such as service dependencies and resource requirements.
  2. Direct Communication: NotNets establishes direct communication channels between co-located microservices, allowing them to exchange data without going through the network.
  3. Network Bypass: For microservices that cannot be co-located, NotNets relies on high-speed interconnects and specialized communication libraries to exchange data without traversing the conventional network stack.
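The components listed above can be sketched as a simple placement heuristic followed by a channel choice. This is a hypothetical greedy strategy written for illustration, not the algorithm from the paper: the function names place_services and channel, the use of call frequency as the only dependency signal, and a host capacity counted in services are all assumptions.

```python
def place_services(services, calls, host_capacity):
    """Greedily co-locate the most frequently communicating services.

    services: list of service names
    calls: list of (caller, callee, calls_per_second) tuples
    host_capacity: maximum number of services per host (a simplification
                   of real resource requirements)
    Returns a dict mapping each service name to a host index.
    """
    # Every service starts in its own group; a group becomes one host.
    group = {s: {s} for s in services}

    # Visit the chattiest pairs first and merge their groups if they fit.
    for a, b, _rate in sorted(calls, key=lambda c: c[2], reverse=True):
        ga, gb = group[a], group[b]
        if ga is gb or len(ga) + len(gb) > host_capacity:
            continue
        ga |= gb
        for s in gb:
            group[s] = ga

    # Assign host indices in the order services were listed.
    placement = {}
    next_host = 0
    for s in services:
        if s in placement:
            continue
        for member in group[s]:
            placement[member] = next_host
        next_host += 1
    return placement


def channel(placement, a, b):
    """Pick a communication path for a pair of services."""
    if placement[a] == placement[b]:
        return "shared-memory"  # co-located: direct communication
    return "interconnect"       # separate hosts: bypass via interconnect


if __name__ == "__main__":
    services = ["frontend", "cart", "catalog", "payments"]
    calls = [
        ("frontend", "cart", 100),
        ("frontend", "catalog", 80),
        ("cart", "payments", 10),
    ]
    placement = place_services(services, calls, host_capacity=2)
    for a, b, _rate in calls:
        print(a, b, channel(placement, a, b))
```

With a capacity of two services per host, the heuristic keeps the busiest pair together and leaves the remaining pairs to the interconnect path. A production scheduler would weigh CPU, memory, and failure domains as well.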

According to the abstract, the researchers provide initial evidence that the approach is feasible and analyze its expected benefits for latency, throughput, and overall application responsiveness compared to traditional microservices architectures.

Critical Analysis

The NotNets paper presents a compelling approach to addressing the network overhead in microservices architectures, which is a well-known challenge in the field. The researchers have identified a relevant problem and proposed a practical solution that leverages emerging hardware and software technologies.

One potential limitation of the NotNets approach is the reliance on specialized hardware, namely CXL 3.0-backed far memory, which may not be readily available in all deployment environments. Further research may be needed to explore the feasibility and cost-effectiveness of this approach in different contexts.

Additionally, this summary found no comprehensive analysis of the security and privacy implications of the NotNets system. As microservices often handle sensitive data, it would be important to investigate privacy-preserving techniques for microservice communication and to ensure that the direct communication channels do not introduce new vulnerabilities.

Overall, the NotNets paper presents a promising approach to improving the performance of microservices-based applications. The researchers provide initial evidence of feasibility and analyze the potential benefits of their solution, and their work could have a significant impact on the design and deployment of modern software systems.

Conclusion

The "NotNets: Accelerating Microservices by Bypassing the Network" paper introduces a novel approach to addressing the network overhead in microservices architectures. By enabling direct communication between microservices, the NotNets system aims to improve the latency, throughput, and overall responsiveness of microservices-based applications.

The key innovation of NotNets is its ability to leverage recent advancements in hardware and software technologies to bypass the network and establish high-performance communication channels between co-located microservices. This approach has the potential to benefit a wide range of applications, particularly those that are latency-sensitive or require real-time performance.

While the approach has some potential limitations, such as the reliance on specialized hardware, the researchers provide initial evidence of feasibility and analyze the expected benefits of their solution. As the adoption of microservices continues to grow, the NotNets approach could become an important tool for building efficient and scalable software systems.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Full-Stack Allreduce on Multi-Rail Networks

Enda Yu, Dezun Dong, Xiangke Liao

The high communication costs impede scalability in distributed systems. Multimodal models like Sora exacerbate this issue by requiring more resources than current networks can support. However, existing network architectures fail to address this gap. In this paper, we provide full-stack support for allreduce on multi-rail networks, aiming to overcome the scalability limitations of large-scale networks by facilitating collaborative data transfer across various networks. To achieve this, we propose the Nezha system, which integrates TCP, in-network computing protocol SHARP, and RDMA-based protocol GLEX. To maximize data transfer rates, Nezha incorporates a load balancing data allocation scheme based on cost feedback and combines exception handling to achieve reliable data transmission. Our experiments on a six-node cluster demonstrate that Nezha significantly enhances allreduce performance by 58% to 87% in homogeneous dual-rail configurations and offers considerable acceleration in heterogeneous settings, contingent on the performance variance among networks.

5/29/2024

Relay Mining: Incentivizing Full Non-Validating Nodes Servicing All RPC Types

Daniel Olshansky, Ramiro Rodríguez Colmeiro

Relay Mining presents a scalable solution employing probabilistic mechanisms, crypto-economic incentives, and new cryptographic primitives to estimate and prove the volume of Remote Procedure Calls (RPCs) made from a client to a server. Distributed ledgers are designed to secure permissionless state transitions (writes), highlighting a gap for incentivizing full non-validating nodes to service non-transactional (read) RPCs. This leads applications to have a dependency on altruistic or centralized off-chain Node RPC Providers. We present a solution that enables multiple RPC providers to service requests from independent applications on a permissionless network. We leverage digital signatures, commit-and-reveal schemes, and Sparse Merkle Sum Tries (SMSTs) to prove the amount of work done. This is enabled through the introduction of a novel ClosestMerkleProof proof-of-inclusion scheme. A native cryptocurrency on a distributed ledger is used to rate limit applications and disincentivize over-usage. Building upon established research in token bucket algorithms and distributed rate-limiting penalty models, our approach harnesses a feedback loop control mechanism to adjust the difficulty of mining relay rewards, dynamically scaling with network usage growth. By leveraging crypto-economic incentives, we reduce coordination overhead costs and introduce a mechanism for providing RPC services that are both geopolitically and geographically distributed. We use common formulations from rate limiting research to demonstrate how this solution in the Web3 ecosystem translates to distributed verifiable multi-tenant rate limiting in Web2.

4/30/2024

Optimizing Distributed Protocols with Query Rewrites [Technical Report]

David Chu, Rithvik Panchapakesan, Shadaj Laddad, Lucky Katahanas, Chris Liu, Kaushik Shivakumar, Natacha Crooks, Joseph M. Hellerstein, Heidi Howard

Distributed protocols such as 2PC and Paxos lie at the core of many systems in the cloud, but standard implementations do not scale. New scalable distributed protocols are developed through careful analysis and rewrites, but this process is ad hoc and error-prone. This paper presents an approach for scaling any distributed protocol by applying rule-driven rewrites, borrowing from query optimization. Distributed protocol rewrites entail a new burden: reasoning about spatiotemporal correctness. We leverage order-insensitivity and data dependency analysis to systematically identify correct coordination-free scaling opportunities. We apply this analysis to create preconditions and mechanisms for coordination-free decoupling and partitioning, two fundamental vertical and horizontal scaling techniques. Manual rule-driven applications of decoupling and partitioning improve the throughput of 2PC by $5\times$ and Paxos by $3\times$, and match state-of-the-art throughput in recent work. These results point the way toward automated optimizers for distributed protocols based on correct-by-construction rewrite rules.

4/4/2024

Optimizing Layerwise Microservice Management in Heterogeneous Wireless Networks

Haojie Yan, Yuedong Xu, Lianggui Dai

Small cells with edge computing are densely deployed in 5G mobile networks to provide high throughput communication and low-latency computation. The flexibility of edge computation is empowered by the deployment of lightweight container-based microservices. In this paper, we take the first step toward optimizing the microservice management in small-cell networks. The prominent feature is that each microservice consists of multiple image layers and different microservices may share some basic layers, thus bringing deep coupling in their placement and service provision. Our objective is to minimize the expected total latency of microservice requests under the storage, communication and computing constraints of the sparsely interconnected small cell nodes. We formulate a binary quadratic program (BQP) with the multi-dimensional strategy of the image layer placement, the access selection and the task assignment. The BQP problem is then transformed into an ILP problem, and is solved by use of a novel sphere-box alternating direction multipliers method (ADMM) with reasonable complexity $O(q^{4})$, where $q$ is the number of variables in the transformed problem. Trace-driven experiments show that the gap between our proposed algorithm and the optimal is reduced by 35% compared with benchmark algorithms.

5/21/2024