Litep2p Network Backend Updates

In this post, we’ll review the latest updates to the litep2p network backend and compare its performance with libp2p. Feel free to navigate to any section of interest.

Section 1. Updates

We are pleased to announce the release of litep2p version 0.7, which brings significant new features, improvements, and fixes to the litep2p library. Highlights include enhanced error handling, configurable connection limits, and a new API for managing public addresses. For a comprehensive breakdown, please see the full litep2p release notes. This update is also integrated into Substrate via PR #5609.

Public Addresses API

A new PublicAddresses API has been introduced, enabling developers to manage the node’s public addresses. This API allows for adding, removing, and retrieving public addresses shared with peers through the Identify protocol. It aims to address or reduce long-standing connectivity issues in litep2p.
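
As an illustration of the semantics only (the struct and method names below are hypothetical, not the actual litep2p types; see the release notes for the real API), a public-address set shared via Identify behaves roughly like this:

```rust
use std::collections::HashSet;

// Hypothetical model of a public-address set that a node advertises to
// peers through the Identify protocol. Not the litep2p implementation.
#[derive(Default)]
struct PublicAddresses {
    addresses: HashSet<String>,
}

impl PublicAddresses {
    /// Add a public address to share with peers. Returns false if already present.
    fn add_address(&mut self, address: &str) -> bool {
        self.addresses.insert(address.to_string())
    }

    /// Remove a previously added public address. Returns false if absent.
    fn remove_address(&mut self, address: &str) -> bool {
        self.addresses.remove(address)
    }

    /// Retrieve the addresses currently advertised via Identify.
    fn get_addresses(&self) -> Vec<String> {
        self.addresses.iter().cloned().collect()
    }
}

fn main() {
    let mut public = PublicAddresses::default();
    public.add_address("/ip4/203.0.113.7/tcp/30333");
    assert_eq!(public.get_addresses().len(), 1);
    public.remove_address("/ip4/203.0.113.7/tcp/30333");
    assert!(public.get_addresses().is_empty());
}
```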

Enhanced Error Handling

The DialFailure event now includes a DialError enum for more granular error reporting when a dial attempt fails. Additionally, a ListDialFailures event has been added, which lists all dialed addresses and their corresponding errors in the case of multiple failures.

We’ve also focused on providing better error reporting for immediate dial failures and rejection reasons for request-response protocols. This marks a shift away from the general litep2p::error::Error enum, improving overall error management. For more details, see PR #206 and PR #227.
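
To illustrate how callers can act on the granular errors, here is a hedged sketch; the `DialError` variant names are illustrative, not the exact litep2p enum:

```rust
// Illustrative dial-error variants; the real litep2p enum differs.
#[derive(Debug)]
enum DialError {
    Timeout,
    AddressNotReachable,
    PeerIdMismatch,
}

// Map a granular dial error to a human-readable diagnostic, the kind of
// reporting the new DialFailure event enables.
fn describe(error: &DialError) -> &'static str {
    match error {
        DialError::Timeout => "dial timed out",
        DialError::AddressNotReachable => "address could not be reached",
        DialError::PeerIdMismatch => "remote reported an unexpected peer id",
    }
}

fn main() {
    assert_eq!(describe(&DialError::Timeout), "dial timed out");
}
```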

Configurable Connection Limits

The Connection Limits feature now lets developers control the number of inbound and outbound connections, helping optimize resource management and performance.
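
The accounting can be sketched as follows; this is a simplified model of the idea, not the litep2p configuration API:

```rust
// Simplified model of connection limits: cap inbound and outbound
// connection counts and reject anything past the limit.
struct ConnectionLimits {
    max_inbound: usize,
    max_outbound: usize,
    inbound: usize,
    outbound: usize,
}

impl ConnectionLimits {
    fn new(max_inbound: usize, max_outbound: usize) -> Self {
        Self { max_inbound, max_outbound, inbound: 0, outbound: 0 }
    }

    /// Returns true if a new inbound connection is accepted.
    fn accept_inbound(&mut self) -> bool {
        if self.inbound < self.max_inbound {
            self.inbound += 1;
            true
        } else {
            false
        }
    }

    /// Returns true if a new outbound dial is allowed.
    fn accept_outbound(&mut self) -> bool {
        if self.outbound < self.max_outbound {
            self.outbound += 1;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut limits = ConnectionLimits::new(1, 1);
    assert!(limits.accept_inbound());
    assert!(!limits.accept_inbound()); // limit reached, rejected
}
```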

Feature Flags for Optional Transports

With Feature Flags, developers can now selectively enable or disable transport protocols. By default, only TCP is enabled, with the following optional transports available:

  • quic - Enables QUIC transport
  • websocket - Enables WebSocket transport
  • webrtc - Enables WebRTC transport
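
Assuming these map to Cargo features (the version number below is illustrative), a dependency declaration might look like:

```toml
[dependencies]
# TCP is enabled by default; opt into the extra transports as needed.
litep2p = { version = "0.7", features = ["quic", "websocket"] }
```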

Configurable Keep-Alive Timeout

Developers can now configure the keep-alive timeout for connections, allowing more control over connection lifecycles. Example usage:


    let litep2p_config = Config::default()
        .with_keep_alive_timeout(Duration::from_secs(30));

Section 2. Performance Comparison

To gauge performance, we ran a side-by-side test with two Polkadot nodes — one using the litep2p backend and the other using libp2p — on the Kusama network. Both nodes were configured with the following CLI parameters: --chain kusama --pruning=1000 --in-peers 50 --out-peers 50 --sync=warp --detailed-log-output.

While network fluctuations and peer dynamics introduce some variability, this experiment offers an approximation of how the two network backends perform in real-world scenarios.

CPU Usage

One of litep2p’s key advantages is its lower CPU consumption, using 0.203 CPU time compared to libp2p’s 0.568, making it roughly 2.8 times more resource-efficient.

Network Throughput

Litep2p handled 761 GiB of inbound traffic, while libp2p processed 828 GiB, giving libp2p an 8% edge in this category. However, litep2p outperformed libp2p in outbound traffic, handling 76.9 GiB versus libp2p’s 71.5 GiB, providing litep2p a 7% advantage for outbound requests.

Sync Peers

The chart below shows the number of peers each node connected with for sync purposes. Litep2p maintained more stable sync connections, whereas libp2p exhibited periodic disconnection spikes, which took longer to recover from. This may be due to litep2p’s increased network discovery via Kademlia queries.

Request Responses

Both backends achieved a similar number of successful request responses, with libp2p holding a slight edge in this area.

Litep2p encountered more outbound request errors, primarily due to substreams being closed before executing the request.

Preliminary CPU-constrained parachain testing showed worse performance for litep2p; for more details, see Issue #5035.

With recent improvements in error handling, we expect to address these issues in future releases.

Other Performance Metrics

  • Warp Sync Time

    The warp sync process saw litep2p completing in 526 seconds, compared to libp2p’s 803 seconds, indicating a significant performance gain for litep2p. The warp sync time was measured using the sub-triage-logs tool, and you can find more details in PR #5609.

  • Kademlia Query Performance

    The Kademlia component facilitates network discoverability. In an experiment to benchmark network discoverability, litep2p located 500 peers (about 25% of the Kusama network) in 12-14 seconds, while libp2p completed the same task in 3-6 seconds.

    The experiment still produces quite a lot of noise, and we’ll take a closer look once we have a better benchmarking system. In the meantime, the subp2p-explorer tool was used for this experiment. The bench-cli tool can also spawn a local litep2p network to reproduce the experiment, providing additional opportunities for optimization.

A special thanks to Dmitry for his exceptional work on litep2p, @alexggh for testing litep2p from the parachain perspective, and @AndreiEres for his efforts in improving benchmarking systems to help drive further network optimizations :pray:


We’re excited to announce litep2p version 0.8.0, which introduces support for content provider advertisement and discovery in the Kademlia protocol, aligning with the libp2p spec. This enables nodes to publish and discover specific content providers on the network. Alongside this feature, the release brings notable improvements in stability, performance, and memory management.

For a full list of changes, refer to the litep2p changelog.

Content Provider Advertisement and Discovery

With this release, litep2p supports content provider advertisement and discovery via the Kademlia protocol: content providers can publish provider records to the network, and other nodes can locate and retrieve those records with the GET_PROVIDERS query. This feature is crucial for storing parachain bootnodes in the relay chain DHT.

    // Start providing a record to the network.
    // This stores the record in the local provider store and starts advertising it to the network.
    kad_handle.start_providing(key.clone());

    // Wait for some condition to stop providing...

    // Stop providing a record to the network.
    // The record is removed from the local provider store and is no longer advertised.
    // Note that remote nodes drop the record only after its TTL expires.
    kad_handle.stop_providing(key.clone());

    // Retrieve providers for a record from the network.
    // This returns a query ID; the result is produced later when polling the `Kademlia` instance.
    let query_id = kad_handle.get_providers(key.clone());

Connection Stability

The release includes several improvements to enhance the stability of connections in the litep2p library:

  • Connection Downgrading: Inactive connections are now downgraded only after extended inactivity, reducing interruptions and improving long-term stability.

  • Enhanced Peer State Management: A refactored state machine with smoother transitions enhances the management of peer connections, preventing issues like state mismatches that could lead to rejected connections.

  • Address Store Improvements: Address tracking is now more precise, with a new eviction algorithm to manage unreachable addresses and better control memory usage.
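
A minimal sketch of such an eviction policy, assuming a simple per-address reachability score (this is not the litep2p implementation):

```rust
use std::collections::HashMap;

// Illustrative address store: each address carries a score, failed dials
// lower it, and the lowest-scoring entry is evicted once the store is full.
struct AddressStore {
    capacity: usize,
    scores: HashMap<String, i32>,
}

impl AddressStore {
    fn new(capacity: usize) -> Self {
        Self { capacity, scores: HashMap::new() }
    }

    fn insert(&mut self, address: &str) {
        if self.scores.len() >= self.capacity && !self.scores.contains_key(address) {
            // Evict the lowest-scoring (least reachable) address.
            if let Some(worst) = self
                .scores
                .iter()
                .min_by_key(|(_, score)| **score)
                .map(|(addr, _)| addr.clone())
            {
                self.scores.remove(&worst);
            }
        }
        self.scores.entry(address.to_string()).or_insert(0);
    }

    fn report_failure(&mut self, address: &str) {
        if let Some(score) = self.scores.get_mut(address) {
            *score -= 1;
        }
    }
}

fn main() {
    let mut store = AddressStore::new(2);
    store.insert("/ip4/10.0.0.1/tcp/30333");
    store.insert("/ip4/10.0.0.2/tcp/30333");
    store.report_failure("/ip4/10.0.0.1/tcp/30333");
    store.insert("/ip4/10.0.0.3/tcp/30333"); // evicts the failing address
    assert!(!store.scores.contains_key("/ip4/10.0.0.1/tcp/30333"));
}
```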

Optimizations

  • Improved Dialing Logic: Dialing across TCP, WebSocket, and QUIC is now more resource-efficient, with canceled attempts terminating immediately to save resources.

  • Kademlia Data Handling: Data handling is now more efficient by replacing unnecessary data cloning with reference-based retrievals for Kademlia messages.

  • Memory Leak Fixes: Addressed memory leaks across the TCP, WebSocket, and QUIC transports, especially in canceled connections. Unremoved pending operations were resolved in both the ping and identify modules. See the relevant PRs: #272, #271, #274, #273.

I want to extend my thanks to everyone who contributed to making this release possible. Special thanks to Dmitry @dmitry-markin for his outstanding work on implementing the content provider advertisement and discovery feature, and to Alex @alexggh for his dedicated testing efforts and for detecting high memory consumption. Thanks also to Andrei @sandreim for his valuable suggestions on investigating memory cloning, and to Andrei @AndreiEres for his ongoing commitment to enhancing benchmarking! :pray:


This v0.8.1 release includes key fixes that enhance the stability and performance of the litep2p library. The focus is on long-running stability and improvements to polling mechanisms.

For a full list of changes, refer to the litep2p changelog.

Long Running Stability Improvements

Addressed a bug in the connection limits functionality that incorrectly tracked connections due for rejection. Rejected inbound peers were not removed from the connection limit count, artificially inflating the number of inbound connections. Over time, this caused long-running nodes to reject all incoming connections, impacting overall stability.

The fix ensures more accurate tracking and management of peer connections #286.

Polling implementation fixes

This release provides multiple fixes to the polling mechanism, improving how connections and events are processed:

  • Resolved an overflow issue in TransportContext’s polling index for streams, preventing potential crashes (#283).

  • Fixed a delay in the manager’s poll_next function that prevented immediate polling of newly added futures (#287).

  • Corrected an issue where the listener did not return Poll::Ready(None) when it was closed, ensuring proper signal handling (#285).
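
The listener fix restores the usual stream-termination convention: a closed listener must yield Poll::Ready(None) so callers know no further connections will arrive, instead of pending forever. A simplified sketch, without any real async machinery:

```rust
use std::task::Poll;

// Simplified model of a listener that queues incoming connections and
// signals end-of-stream once closed.
struct Listener {
    queued: Vec<&'static str>,
    closed: bool,
}

impl Listener {
    fn poll_next(&mut self) -> Poll<Option<&'static str>> {
        if let Some(conn) = self.queued.pop() {
            return Poll::Ready(Some(conn));
        }
        if self.closed {
            // Signal end-of-stream so the caller can clean up,
            // rather than returning Pending on a dead listener.
            Poll::Ready(None)
        } else {
            Poll::Pending
        }
    }
}

fn main() {
    let mut listener = Listener { queued: vec!["conn-1"], closed: true };
    assert_eq!(listener.poll_next(), Poll::Ready(Some("conn-1")));
    assert_eq!(listener.poll_next(), Poll::Ready(None));
}
```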

Dashboards

This dashboard provides a comprehensive view of litep2p’s performance compared to libp2p. Here, litep2p demonstrates a remarkable speed advantage, handling notifications with various payload sizes 10x to 30x faster than libp2p.

The next dashboard highlights the improved connection stability of a long-running node with a high connection load (500 inbound and 500 outbound connections). Litep2p is represented by the green line, showcasing its stability, while libp2p is represented by the yellow line.

Finally, the CPU consumption dashboard reveals a significant reduction in CPU usage for litep2p, using half the CPU resources compared to libp2p. Here, litep2p is represented by the yellow line, and libp2p by the magenta line.

As always, thanks @dmitry-markin for in-depth reviews and suggestions, @AndreiEres for implementing the dashboards that show a significant notification performance improvement #6455! :pray:
