Parachain gateway stuck on transaction revalidation

lach · March 3, 2023, 4:05pm

We stumbled upon a problem during transaction throughput testing on our Kusama parachain: transactions get stuck in a revalidation loop on the gateway nodes (?).

Context

Our network consists of 7 nodes:

3 collators - parachain-collator-swe🇸🇪, parachain-collator-deu🇩🇪, parachain-collator-ita🇮🇹
3 gateways - parachain-gateway-kor🇰🇷, parachain-gateway-deu🇩🇪, parachain-gateway-usa🇺🇸
1 extra archive node - parachain-archive

Only the gateways peer with the collators, so there is no direct connectivity between parachain-archive and the collators.

Here is a graph of the txpool_validations_scheduled node metric; the numbers in parentheses below mark points on it.
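A minimal sketch of reading this metric straight from a node, outside a dashboard: scrape the node's Prometheus endpoint and pick the sample out of the text exposition. It assumes the node exposes metrics in the standard Prometheus text format and that the exported name ends with txpool_validations_scheduled (a node may add a prefix; the full name in the sample string is illustrative, not confirmed). The rate() helper further assumes the metric is a monotonically increasing counter, so two scrapes give validations scheduled per second. Both function names are hypothetical.

```python
from typing import Optional


def read_metric(exposition: str, suffix: str = "txpool_validations_scheduled") -> Optional[float]:
    """Return the first sample whose metric name ends with `suffix`, or None."""
    for raw in exposition.splitlines():
        line = raw.strip()
        # Skip blank lines and `# HELP` / `# TYPE` comment lines.
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            # Labelled sample: `name{label="..."} value [timestamp]`
            name, _, after = line.partition("{")
            _, _, rest = after.rpartition("}")
        else:
            # Unlabelled sample: `name value [timestamp]`
            name, _, rest = line.partition(" ")
        if name.endswith(suffix):
            return float(rest.split()[0])
    return None


def rate(v_start: float, v_end: float, t_start: float, t_end: float) -> float:
    """Per-second increase between two scrapes of a counter (assumes no reset in between)."""
    return (v_end - v_start) / (t_end - t_start)


# Illustrative exposition text; the metric prefix and label are assumptions.
sample = """\
# HELP substrate_sub_txpool_validations_scheduled Illustrative help text
# TYPE substrate_sub_txpool_validations_scheduled counter
substrate_sub_txpool_validations_scheduled{chain="local"} 1234
"""
print(read_metric(sample))  # 1234.0
```

In practice the exposition string would come from an HTTP GET against the node's metrics endpoint; its address depends on how the node was started and is not given in the post.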

During testing, we spammed one of the gateways with a large number of balance.transfer calls. The client was located in Europe for all of the following steps except (3).
(Load script: benchmark.ts on GitHub)

We started with parachain-gateway-usa🇺🇸 as our first target, and everything went smoothly: the collators handled every transaction (first spike on the graph, 1).

Then we proceeded with parachain-gateway-kor🇰🇷, and everything went well (second spike, 2).

Then we retried parachain-gateway-usa🇺🇸 with the client located in North America, and it also succeeded (third spike, 3), so the sender location doesn’t make a difference.

But then, interesting things started to happen.

During the parachain-gateway-deu🇩🇪 test (orange line on the graph), we filled the transaction pool (4), and the collators executed a couple of transactions… Then the rest of the transactions got stuck in a loop on the gateway, being revalidated over and over and moving from the Ready state to the Future state and back (spiky orange line on the graph, 5).
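The Ready/Future flip-flopping in step (5) can be illustrated with a toy model. This is a simplified sketch, not Substrate's actual pool implementation: it assumes transfers from one sender are ordered purely by nonce, so a transaction is stale if its nonce is below the sender's nonce in the state the node validates against, ready if every lower nonce is already on chain or itself ready, and future if there is a gap. The classify() function and the sender names are hypothetical.

```python
from collections import defaultdict

Tx = tuple[str, int]  # (sender, nonce)


def classify(pending: list[Tx], chain_nonce: dict[str, int]) -> dict[str, list[Tx]]:
    """Split pending transactions into ready / future / stale by nonce."""
    by_sender: dict[str, set[int]] = defaultdict(set)
    for sender, nonce in pending:
        by_sender[sender].add(nonce)

    result: dict[str, list[Tx]] = {"ready": [], "future": [], "stale": []}
    for sender, nonces in by_sender.items():
        expected = chain_nonce.get(sender, 0)  # next nonce the chain would accept
        for nonce in sorted(nonces):
            if nonce < expected:
                # Nonce already used on chain: the transaction is stale.
                result["stale"].append((sender, nonce))
            elif nonce == expected:
                # No gap below this nonce, so it can be included next.
                result["ready"].append((sender, nonce))
                expected += 1
            else:
                # A lower nonce is missing: wait in Future until it arrives.
                result["future"].append((sender, nonce))
    return result


pool = [("alice", n) for n in range(5, 10)]
print(classify(pool, {"alice": 5})["ready"])   # all five are ready
print(classify(pool, {"alice": 3})["future"])  # gap at nonces 3 and 4, so all five wait
```

Re-running classify() on the same pool against different account nonces moves transactions between Ready and Future, which is one possible reading of the spiky orange line; the model does not show that this is what actually happened on parachain-gateway-deu🇩🇪.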

Then we restarted parachain-gateway-kor🇰🇷 (6): the collators executed another part of the initial batch of transactions, and the rest got stuck in the same loop again. Now two nodes were in this loop: parachain-gateway-deu🇩🇪 and parachain-gateway-kor🇰🇷 (7).

Then we restarted parachain-gateway-usa🇺🇸, with the same result as for parachain-gateway-kor🇰🇷 (8).

Then we restarted parachain-archive (9), and no transactions were executed at all. So restarting a gateway seems to act as a pump: the restarted gateway picks up part of the transaction pool, manages to send some of those transactions to the collators, and then stops for some reason?

And finally, we restarted parachain-collator-deu🇩🇪 (10), and every transaction was finally processed (11).

This behaviour is reproducible, and we found that adding 100 ms of latency to the network of parachain-gateway-deu🇩🇪 makes the issue go away.
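The post does not say how the 100 ms of latency was added. On Linux, one common way is the netem queueing discipline through iproute2's tc; the sketch below only builds the command and, by default, does not run it. The interface name eth0 and the helper names are placeholders, running it for real needs root, and a root netem qdisc delays only traffic leaving that interface.

```python
import shlex
import subprocess


def netem_delay_cmd(interface: str, delay_ms: int, remove: bool = False) -> list[str]:
    """Build a `tc` command that adds (or removes) a fixed egress delay."""
    if remove:
        # Deleting the root qdisc restores the interface's default queueing.
        return ["tc", "qdisc", "del", "dev", interface, "root"]
    # `replace` installs netem as the root qdisc, or updates it if already present.
    return ["tc", "qdisc", "replace", "dev", interface, "root",
            "netem", "delay", f"{delay_ms}ms"]


def apply(cmd: list[str], dry_run: bool = True) -> str:
    """Return the command as a shell string; execute it only if dry_run is False."""
    if not dry_run:
        subprocess.run(cmd, check=True)  # needs root and iproute2 installed
    return shlex.join(cmd)


print(apply(netem_delay_cmd("eth0", 100)))
# tc qdisc replace dev eth0 root netem delay 100ms
```

Passing remove=True and dry_run=False would undo the change on the same interface.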

Summary

parachain-gateway-deu🇩🇪 gets stuck under load, and only a collator restart resolves the issue.
We tested with transactions sent from Europe and from North America (to parachain-gateway-deu🇩🇪 in both cases) with the same result, so the sender location doesn’t make a difference; only transactions sent to parachain-gateway-deu🇩🇪 get stuck.

What might be causing this behaviour? And what can we do to prevent a malicious actor from putting our gateways into this state?

rphmeier · March 5, 2023, 10:09pm

This is more suitable as a Substrate/Cumulus GitHub issue, rather than a forum post. Not dismissing it, but just want to make sure it lands in the right place for any investigation/bugfixing.
