Elastic Scaling Rollout: Asset Hub Kusama & Asset Hub Polkadot

As part of the ongoing low-latency roadmap, we are moving forward with enabling Elastic Scaling on Asset Hub Kusama (AHK) and Asset Hub Polkadot (AHP).

The primary goal of this initiative is to achieve approximately 2-second block times using 3 cores. The rollout follows a gradual strategy, moving from testnets (Versi, Westend, Paseo) to AHK, and finally to AHP.
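For intuition, the 2-second target follows directly from dividing the relay chain's 6-second slot across the assigned cores. The sketch below is purely illustrative (the function name and signature are not part of the polkadot-sdk API), assuming a parachain with N cores can get roughly N candidates backed per relay slot:

```rust
/// Hypothetical helper, not a real polkadot-sdk API: with elastic scaling,
/// a parachain assigned `cores` cores can have up to `cores` candidates
/// backed per relay-chain slot, so the effective parachain block time is
/// roughly the relay slot duration divided by the number of cores.
fn effective_block_time_ms(relay_slot_ms: u64, cores: u64) -> u64 {
    relay_slot_ms / cores
}

fn main() {
    // 3 cores on a 6s relay slot -> ~2s parachain blocks.
    assert_eq!(effective_block_time_ms(6000, 3), 2000);
}
```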

Testnet Findings

We have conducted extensive triaging on several test networks to ensure stability before the mainnet rollout:

  • Asset Hub Westend: Elastic scaling was enabled (polkadot/PR #9880). While we encountered some network instability (e.g., collations not advertised, validator connectivity issues), these appear unrelated to the elastic scaling feature itself. Work is ongoing to stabilize the chain. Thanks to Eduard Spataru and Nikola Djoric for handling this :folded_hands:

  • Versi: Stress testing with 2k periodic transactions showed stable 2s block times. We observed some connectivity instability and latency spikes, but the core elastic scaling logic held up, producing stable blocks for 3 cores and 600ms blocks for 12 cores (with specific optimizations).

  • Paseo: A new chain was deployed to test patched versions of the runtime. The chain has been able to sustain ~2s blocks on average. Thanks to @ArshamTeymouri for providing the cloud machines to run this chain and @alejandro for providing the cores :folded_hands:

Based on the data from the testnets, we have decided to proceed with enabling elastic scaling on Kusama. We believe the observed connectivity issues on testnets are due to specific validator setups and networking conditions rather than the elastic scaling implementation itself.

  • Kusama: We are preparing to enable Elastic Scaling with 3 cores on Asset Hub Kusama (runtimes/PR #1018). Please note that we’ll assign 3 cores only after polkadot/PR #10311 is merged.

  • Polkadot: Asset Hub Polkadot will follow shortly after the Kusama deployment has been triaged and verified.

Key Optimizations

Several improvements have been identified and implemented during this testing phase to ensure smoother performance:

  • Safety Buffer for Block Authoring: To prevent blocks from being authored too late for inclusion, we reduced the authoring duration for the last produced block. This creates a 1-second safety buffer before the scheduled slot change (see polkadot/PR #10154).

  • Fix for Premature Disconnection: We identified an issue where connections were dropped at the end of a slot even if collations were pending. A fix has been merged to keep connections open a bit longer (see polkadot/PR #10446). Thanks @sandreim for the help with debugging and implementing the fix :folded_hands:

  • Instant Reserved Peer Connections: We optimized SetReservedPeers to eliminate an artificial delay of up to 1 second. The node now attempts to open substreams to reserved peers (validators) immediately upon receiving the command, rather than waiting for the next slot timer tick (see polkadot/PR #10362).

  • Local Address Checks: Enhanced checks were added to prevent the node from attempting to add its own local addresses (e.g., loopback interfaces) to the store, reducing log spam and unnecessary connection attempts (see litep2p/PR #480).
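To make the safety-buffer idea above concrete, here is a minimal sketch of the timing logic, assuming a fixed 1-second buffer; the names and signature are illustrative and not the actual polkadot-sdk implementation:

```rust
/// Illustrative constant: the 1-second safety buffer described above.
const SAFETY_BUFFER_MS: u64 = 1000;

/// Hypothetical helper, not the real polkadot-sdk code: shorten the
/// authoring duration of the last block produced before a scheduled slot
/// change, so the block is still submitted in time for inclusion instead
/// of being authored too late.
fn authoring_duration_ms(default_ms: u64, is_last_block_in_slot: bool) -> u64 {
    if is_last_block_in_slot {
        // Leave a safety buffer before the slot change; saturate at zero
        // so a short default duration cannot underflow.
        default_ms.saturating_sub(SAFETY_BUFFER_MS)
    } else {
        default_ms
    }
}

fn main() {
    // A 2s authoring window shrinks to 1s for the slot's last block.
    assert_eq!(authoring_duration_ms(2000, true), 1000);
    // Earlier blocks in the slot keep the full window.
    assert_eq!(authoring_duration_ms(2000, false), 2000);
}
```

The `saturating_sub` guards the edge case where the configured authoring window is shorter than the buffer itself.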

Future Improvements

Looking ahead, we have identified further improvements for connection stability that we will be tackling next. These upcoming changes will span across litep2p, Substrate, and the collator protocol.

You can track the full progress and view the technical details in the following issue: polkadot-sdk/issues/10425

Thanks again to everyone involved in this effort! :folded_hands: