Overcoming SEA network shaping

Hello fellows,

I’ve been facing an issue for almost a year now with running validators and services in the SEA region, which has proven to be quite a challenge to overcome. I must admit I’m not the best communicator, but it seems nearly impossible to secure a contract in Thailand with an ISP that doesn’t implement network shaping for international traffic. I have a 400/400 Mbps commit on a 1 Gbps port, costing around $1,000 USD monthly. However, in reality, the performance per single-threaded connection is akin to that of a 90s asymmetric digital subscriber line (ADSL).

After negotiating with and benchmarking the major local providers in Thailand, it seems to be the rule rather than the exception that international connectivity peaks at around 10 Mbps per connection. The best solution seems to be getting a cross-connect with a global IP transit provider. However, this is a bit much for the entry-level objective we aim for by running validators locally, outside of cloud services.

My question for the forums is: how can we mitigate this in the Polkadot design and networking stack? As I understand it, we are moving towards a more linear networking stack from the initial gossip-based design, and there seems to be no way around the fact that one paravalidator can be located in Novosibirsk and another in Phnom Penh. Consequently, network traffic ends up routing through Finland due to firewalls and network shaping, with latency reaching somewhere past 300 ms instead of the ~20 ms it would take light to travel the distance. However, more problematic than the large latencies themselves, in my opinion, is the degraded bandwidth over distance due to network shaping and oversubscribed subsea cables.
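To put rough numbers on the physics, here is a back-of-the-envelope sketch. The coordinates are approximate and it ignores real cable routes, which is exactly the point: the geography allows a few tens of milliseconds, the shaped and detoured path delivers 300+ ms.

```rust
// Ideal propagation delay between Novosibirsk and Phnom Penh (approximate
// coordinates), ignoring routing, queuing and shaping entirely.
fn haversine_km(lat1: f64, lon1: f64, lat2: f64, lon2: f64) -> f64 {
    let (lat1, lon1, lat2, lon2) = (
        lat1.to_radians(),
        lon1.to_radians(),
        lat2.to_radians(),
        lon2.to_radians(),
    );
    let a = ((lat2 - lat1) / 2.0).sin().powi(2)
        + lat1.cos() * lat2.cos() * ((lon2 - lon1) / 2.0).sin().powi(2);
    2.0 * 6371.0 * a.sqrt().asin()
}

fn main() {
    let d = haversine_km(55.03, 82.92, 11.56, 104.92); // Novosibirsk -> Phnom Penh
    let c_vacuum_km_s = 299_792.0;
    let c_fiber_km_s = c_vacuum_km_s * 2.0 / 3.0; // light in fiber is ~0.67c
    println!("great-circle distance: {:.0} km", d);                      // ~5200 km
    println!("one-way, vacuum:       {:.1} ms", d / c_vacuum_km_s * 1e3); // ~17 ms
    println!("one-way, fiber:        {:.1} ms", d / c_fiber_km_s * 1e3);  // ~26 ms
}
```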

I feel that we should at least try to provide some resolution to this in our software stack. Could we perhaps have a flag in the clients to allow multithreaded fetching of PoVs inside muxed channels? Or does the client perhaps already do this? Either way, I am missing quite a few PoV fetches due to timeouts (logs below), and solving the problem isn't as easy as upgrading hardware. It requires a significant amount of business-development effort to reach out to ISPs and get their engineers to benchmark connectivity in the hope of sealing deals. Most of the ISPs seem absolutely clueless when I present them with the result data and deny any intentional shaping on their part.

Apr 12 10:26:13 pso06 polkadot[1348794]: 2024-04-12 10:26:13 fetch_pov_job err=FetchPoV(NetworkError(Network(Timeout))) para_id=Id(4000) pov_hash=0x7d242d8ff9c1e8bd8b7b10da953dc5c43632c1ddc2d3974676c0de9f75afc380 authority_id=Public(629f9fd0dd7279c7af7470472d1208a13e33239b484974d47cffce4ad4785644 (13EK7STm...))
Apr 12 10:30:49 pso06 polkadot[1348794]: 2024-04-12 10:30:49 fetch_pov_job err=FetchPoV(NetworkError(Network(Timeout))) para_id=Id(2015) pov_hash=0x9f98b2b6dbc2cefffac1784ee67d16ac1922a2d43aaee6318949c9fb6209523b authority_id=Public(e806a160805fd12fece90fedf9f49406593fa603e934cd636eb48e0ae033a035 (16FE5aNU...))
Apr 12 10:35:13 pso06 polkadot[1348794]: 2024-04-12 10:35:13 fetch_pov_job err=FetchPoV(NetworkError(Network(Timeout))) para_id=Id(4006) pov_hash=0x02d5a2093fef5e2681a462d2519429ff3442a31d12d5ddfdf6b0f3354978396e authority_id=Public(fedde28f6f994db7050ed42dba0e2825ef3ccc1bb2a1ad230ff36e81244ada2a (16mB5RmY...))
Apr 12 10:35:25 pso06 polkadot[1348794]: 2024-04-12 10:35:25 fetch_pov_job err=FetchPoV(NetworkError(Network(Timeout))) para_id=Id(4006) pov_hash=0x042fca13e8f12c8e0e62edde80ca8e23a261446d0209db556f9987e94c86c6b3 authority_id=Public(fedde28f6f994db7050ed42dba0e2825ef3ccc1bb2a1ad230ff36e81244ada2a (16mB5RmY...))
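To make concrete what I mean by multithreaded fetching inside muxed channels: split the transfer into chunks and keep them all in flight concurrently over separate substreams, so one shaped TCP stream no longer caps the whole PoV. The sketch below is purely illustrative; every name in it is made up for the example and it is not the actual polkadot-sdk API.

```rust
// Illustrative only: concurrent chunked PoV fetching. Hypothetical names,
// NOT the real polkadot-sdk networking API.
use futures::future::try_join_all;

struct ChunkRequest {
    index: u32,
}

async fn fetch_chunk(req: ChunkRequest) -> Result<Vec<u8>, String> {
    // In a real client this would open (or reuse) a muxed substream to a peer
    // and request one chunk; here we just fake a payload.
    Ok(vec![req.index as u8; 1024])
}

async fn fetch_pov_parallel(n_chunks: u32) -> Result<Vec<u8>, String> {
    let requests = (0..n_chunks).map(|index| fetch_chunk(ChunkRequest { index }));
    // All chunk requests are in flight at once, so per-stream shaping limits
    // the chunk, not the whole PoV transfer.
    let chunks = try_join_all(requests).await?;
    Ok(chunks.concat())
}

#[tokio::main]
async fn main() {
    let pov = fetch_pov_parallel(16).await.expect("all chunks fetched");
    println!("reassembled PoV: {} bytes", pov.len());
}
```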

I believe Westend and Rococo both run on high-bandwidth cloud services, where these networking challenges are unlikely to come to the attention of developers. I propose that on the community network Paseo we begin to collect/export Prometheus data from all compensated validators, to understand whether this is a broader issue worldwide. In any case, it would be good practice for the testnet.

In issue #1498, Jeff mentions the roughly quadrupling effect of the radix-16 trie on the storage of dense PoVs, which also carries over to networking requirements. Has any consensus/strategy been developed for transitioning from the current network- and storage-intensive radix-16 trie to a more efficient radix-2 trie/JMT? During a recent discussion at Sub0, Jeff introduced a novel concept involving the use of Blake3 for Merkle tree construction, which he suggested could improve data transmission efficiency by three to four times. To be honest, I didn't understand much of the math in it, but the Blake3 benchmark results are incredible, so that's probably a great option.
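My back-of-the-envelope understanding of where the 3-4x comes from: a radix-16 branch proof carries up to 15 sibling hashes per level, but the trie is only a quarter as deep as a binary one, so per-leaf proof data is roughly 15/4 that of a binary trie. This ignores extension nodes, key compression and value sizes, so treat it as a rough model only.

```rust
// Rough model of per-leaf Merkle proof size for a trie of a given radix.
fn proof_hash_bytes(n_leaves: f64, radix: f64, hash_len: f64) -> f64 {
    let depth = n_leaves.log2() / radix.log2(); // log_radix(n) levels
    depth * (radix - 1.0) * hash_len            // siblings per level * levels
}

fn main() {
    let n = 1_000_000.0; // illustrative state size
    let hex = proof_hash_bytes(n, 16.0, 32.0);
    let bin = proof_hash_bytes(n, 2.0, 32.0);
    println!("radix-16: ~{:.0} bytes per leaf proof", hex);
    println!("binary:   ~{:.0} bytes per leaf proof", bin);
    println!("ratio:    ~{:.2}x", hex / bin); // ~3.75x
}
```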

Some test results from the region (speedtest plots not shown, captions below):

- 1 Gbit/s port, 400/400 commit, 1 thread
- 1 Gbit/s port, 400/400 commit, 32 threads

Other providers in the region (tested with single-threaded networking):

- 10 Gbit/s port, 1000/1000 commit, 1 thread
- 10 Gbit/s port, 1000/1000 commit, 1 thread
- 2x10 Gbit/s in SG, 1 thread

The tests were conducted by the ISPs with intspeed, a script I wrote that simply runs speedtest-go against worldwide locations and plots the results into an image.


I feel like multi-part / chunked PoV transfers are the solution to this and will improve the decentralization of the network. I worry that the solution is going to be complex, but it is needed. Even in the US, I need to compensate for transatlantic delays and throttling with superior hardware, and I'm still outperformed by lower-end hardware simply because it sits in the EU, close to the primary clusters of validators.

Thanks for the detailed analysis - making it possible to sustainably operate a validator in a more remote region is quite important. At the moment, however, bandwidth and low-latency requirements are indeed growing. This is partly due to the inefficiency of the radix-16 MPT used across Substrate, but a binary trie would only alleviate it, never solve it entirely. I think we should treat PoV space according to Jevons' paradox: more efficient tries will mean we use more, not less.

So a couple of points.

One of the goals of asynchronous backing is to reduce latency requirements on validators. At the moment, parachain candidate blocks must be collated, circulated, and backed, and those backing statements circulated, all within a tight 6-second window. This locks high-latency validators out of being effective in the backing process. Asynchronous backing should help: when parachains build their blocks slightly in advance of their inclusion in the relay chain, the network should comfortably tolerate latencies of even seconds without impacting performance.
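As a very rough illustration of how the time budget changes: if a candidate can be built against a relay parent a couple of blocks in the past, the collate-circulate-back pipeline gets several relay slots of wall-clock time instead of one. The numbers and the parameter name below are illustrative, not protocol constants.

```rust
// Rough model: the backing pipeline budget grows with how old a relay parent
// a candidate is allowed to be built against. Illustrative numbers only.
fn backing_budget_secs(relay_slot_secs: u64, allowed_relay_parent_age: u64) -> u64 {
    (allowed_relay_parent_age + 1) * relay_slot_secs
}

fn main() {
    println!("sync backing:  ~{} s budget", backing_budget_secs(6, 0)); // ~6 s
    println!("async backing: ~{} s budget", backing_budget_secs(6, 2)); // ~18 s
}
```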

We have discussed multi-part PoV transfers for a long time and it would also help here.

Side note: I am currently working on a novel binary trie database which I hope to bring upstream to Polkadot-SDK, though it is likely not the solution to this problem.


One could make an argument for SEA being the most important/central region due to its population density. That same population density, together with corruption, is the main cause of the ongoing shaping on connections that hit the subsea cables.

Right, that’s true. Got sidetracked in despair.

Aren't the PoVs already erasure-coded into chunks for candidate backing? Wouldn't it be a somewhat trivial addition to have a flag allowing multi-part fetching of these chunked PoVs?
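Rough numbers for what per-chunk sizes would look like, based on my understanding that any f+1 of n = 3f+1 availability chunks reconstruct the PoV (treat the exact threshold and overheads as an assumption):

```rust
// Back-of-the-envelope chunk size if a PoV is erasure-coded into one chunk
// per validator and any f+1 of n >= 3f+1 chunks reconstruct it. Proof and
// encoding overheads are ignored.
fn approx_chunk_size(pov_bytes: u64, n_validators: u64) -> u64 {
    let f = (n_validators - 1) / 3;
    pov_bytes / (f + 1)
}

fn main() {
    let pov = 5 * 1024 * 1024; // a 5 MiB PoV
    for n in [100u64, 300, 500] {
        println!(
            "{} validators -> ~{} KiB per chunk",
            n,
            approx_chunk_size(pov, n) / 1024
        );
    }
}
```

Pulling many small chunks from many peers in parallel is exactly the access pattern that a single shaped long-haul TCP stream cannot provide.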

I will probably be forced to relocate to Singapore to resolve this and enable future growth as well, but it would be really great if multi-part PoV fetching were included at least in the JAM spec, to allow participation in securing the network from outside Europe.

@hitchhooker

I am still under the impression that asynchronous backing will completely solve this issue once most parachains have enabled it. Can you run some tests at that point and check back in?
