Robust chain upgrades: Impossible or Uptane for Substrate (Parachains)

0mm-mark · November 27, 2022, 5:49am

I’ve come across several reports of a parachain update/upgrade going awry, leaving the chain bricked and unable to produce blocks; hi-jinx ensue, and everyone lives happily ever after.

One such case is discussed in the thread linked below. The consensus there appears to be that automatic rollback is not possible, and that being inoperable for some period is simply the way things will be (hi-jinx required):

I’d assumed any such change works like a kernel update with A/B partitions: if the reboot fails, the previous setup is restored.
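The A/B fallback described above can be sketched as a small state machine. This is a minimal illustration, assuming a bootloader that counts boot attempts for a newly installed image and an image that confirms itself once it is healthy. `BootState`, `stage_update`, `select_slot` and `confirm_boot` are hypothetical names, not the API of any real bootloader.

```rust
// Hypothetical A/B slot selection with a boot-attempt counter.
// Not modeled on any specific bootloader's API.

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Slot {
    A,
    B,
}

impl Slot {
    fn other(self) -> Slot {
        match self {
            Slot::A => Slot::B,
            Slot::B => Slot::A,
        }
    }
}

#[derive(Debug)]
struct BootState {
    active: Slot,          // slot holding the known-good image
    pending: Option<Slot>, // slot holding a freshly installed, unconfirmed image
    tries_left: u8,        // boot attempts still allowed for the pending image
}

impl BootState {
    fn new() -> Self {
        BootState { active: Slot::A, pending: None, tries_left: 0 }
    }

    /// Install an update into the inactive slot and mark it as pending.
    fn stage_update(&mut self, max_tries: u8) {
        self.pending = Some(self.active.other());
        self.tries_left = max_tries;
    }

    /// Pick the slot to boot. The pending slot is tried until its attempts run
    /// out; after that the update is abandoned and the known-good slot is used.
    fn select_slot(&mut self) -> Slot {
        if let Some(slot) = self.pending {
            if self.tries_left > 0 {
                self.tries_left -= 1;
                return slot;
            }
            // Out of attempts: revert to the previous, known-good slot.
            self.pending = None;
        }
        self.active
    }

    /// Called by the new image once it has come up healthy.
    fn confirm_boot(&mut self) {
        if let Some(slot) = self.pending.take() {
            self.active = slot;
            self.tries_left = 0;
        }
    }
}

fn main() {
    let mut state = BootState::new();

    // A bad update never confirms, so after two attempts we fall back to A.
    state.stage_update(2);
    assert_eq!(state.select_slot(), Slot::B);
    assert_eq!(state.select_slot(), Slot::B);
    assert_eq!(state.select_slot(), Slot::A);

    // A good update confirms on first boot, and B becomes the known-good slot.
    state.stage_update(2);
    assert_eq!(state.select_slot(), Slot::B);
    state.confirm_boot();
    assert_eq!(state.select_slot(), Slot::B);
    println!("final active slot: {:?}", state.active);
}
```

The key design point is that the known-good slot is never overwritten until the new image has positively confirmed itself; a failed boot needs no external action to recover.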

I am not suggesting this would be straightforward. In fact, the details of Uptane indicate the scope of the issue, albeit in a different context:

Nonetheless, is something like “The Update Framework + failsafe rollbacks for Parachains” possible?

Is there anything along these lines underway that can be tracked?
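For reference, the core acceptance check that TUF-style signed metadata would add to an upgrade path looks roughly like the following. It is a simplified sketch, assuming signatures have already been verified cryptographically elsewhere, so it only does the bookkeeping: a threshold of distinct trusted keys, an expiry, and a strictly increasing version number. `Role`, `UpgradeMetadata`, `check_upgrade` and `Rejection` are hypothetical and are not the API of any TUF implementation.

```rust
use std::collections::HashSet;

// Hypothetical, heavily simplified TUF-style role: which keys are trusted
// to sign, and how many distinct trusted keys must sign.
struct Role {
    trusted_keys: HashSet<&'static str>,
    threshold: usize,
}

// Hypothetical upgrade metadata. `signed_by` lists key IDs whose signatures
// are ASSUMED to have been verified cryptographically already.
struct UpgradeMetadata {
    version: u64,
    expires_at: u64, // e.g. a block number; metadata is stale from this point on
    signed_by: Vec<&'static str>,
}

#[derive(Debug, PartialEq)]
enum Rejection {
    BelowThreshold { valid: usize, needed: usize },
    Expired,
    VersionNotIncreasing,
}

fn check_upgrade(
    role: &Role,
    meta: &UpgradeMetadata,
    trusted_version: u64,
    now: u64,
) -> Result<(), Rejection> {
    // Count distinct trusted signers, so a key signing twice (or an unknown
    // key) cannot help reach the threshold.
    let valid: HashSet<&str> = meta
        .signed_by
        .iter()
        .copied()
        .filter(|key| role.trusted_keys.contains(key))
        .collect();
    if valid.len() < role.threshold {
        return Err(Rejection::BelowThreshold { valid: valid.len(), needed: role.threshold });
    }
    // Reject stale metadata (protection against freeze attacks).
    if now >= meta.expires_at {
        return Err(Rejection::Expired);
    }
    // Reject anything not newer than what is already trusted (protection
    // against rollback attacks). This sketch requires a strictly higher version.
    if meta.version <= trusted_version {
        return Err(Rejection::VersionNotIncreasing);
    }
    Ok(())
}

fn main() {
    let role = Role {
        trusted_keys: ["release-1", "release-2", "release-3"].iter().copied().collect(),
        threshold: 2,
    };
    let good = UpgradeMetadata { version: 8, expires_at: 1_000, signed_by: vec!["release-1", "release-3"] };
    let padded = UpgradeMetadata { version: 8, expires_at: 1_000, signed_by: vec!["release-1", "release-1", "unknown"] };

    assert_eq!(check_upgrade(&role, &good, 7, 500), Ok(()));
    assert_eq!(check_upgrade(&role, &padded, 7, 500), Err(Rejection::BelowThreshold { valid: 1, needed: 2 }));
    assert_eq!(check_upgrade(&role, &good, 8, 500), Err(Rejection::VersionNotIncreasing));
    println!("all checks behaved as expected");
}
```

One tension worth noting: TUF's version check exists precisely to block rollback attacks (replaying older, signed metadata), so a failsafe revert to a last known-good runtime would likely have to be an explicit, newly authorized transition rather than a re-serving of old metadata.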

Update:
There is a Rust implementation of TUF (beta):

albi · November 27, 2022, 7:48am

No development has been done yet. We are currently discussing what the best mechanism would be, and we will host a session at the Barcamp next week.

At the moment, multiple options have been proposed in the discussion you linked:

1. A mechanism in Cumulus that enables parachains to recover without external help.
2. A mechanism on the Relay Chain that allows parachains to delegate recovery powers to an entity (suggested by Bryan Chen).

Both options have their pros and cons. Option 1 might not solve every error that could occur, while Option 2 is difficult to implement in a decentralized fashion: the token holders of the bricked parachain should be able to vote, which is hard when their own chain is stalled.

An interesting discussion point regarding Option 2 is also how much power should be moved to the relay chain. The code of the parachain defines the rules that must be followed on that chain. If we move parts of this to the relay chain, the parachain gives up a portion of its sovereignty, and it also becomes more complicated to reason about the rules on a parachain: you would need to take into account the parts that now live on the relay chain.

Another issue is that a stalled parachain might even be the luckiest failure case. A security vulnerability that lets you mint tokens could be even worse. Rollbacks might not be possible in such cases, since the tokens could already have been moved to other chains via XCM.

0mm-mark · November 27, 2022, 9:05am

I won’t be at the Barcamp, and I understand it’s held under the Chatham House Rule, which is fine.

My understanding is that neither of those options provides an automatic rollback by the relay chain and parachain. Both options require voting. Correct?

Agreed, not every mishap will be reversible. Maybe restrict the initial scope to those that are.

Am I correct that, if the upgrade protocol had an immediate ‘block-production’ validation step, and the relay chain and parachain kept the last known-good Wasm to revert to, then some of the ‘hi-jinx’ might have been avoided and you would have been alerted to the issue immediately?
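The proposal above (check that new code actually produces a block, otherwise fall back to the last known-good Wasm) can be sketched as relay-chain-side bookkeeping. This is a hypothetical model, not Polkadot's actual upgrade logic: `UpgradeGuard`, `Status`, `schedule`, `on_parablock_included` and `on_relay_block` are invented names, block numbers are relay-chain block numbers, and the grace period is an assumed parameter.

```rust
// Hypothetical relay-chain-side record of one parachain's code upgrade.
// Not Polkadot's actual logic; all names are invented for illustration.

#[derive(Debug, PartialEq)]
enum Status {
    Stable,              // no upgrade in flight
    AwaitingFirstBlock,  // new code scheduled, deadline not yet passed
    Confirmed,           // new code produced an included block; now known-good
    RevertedToKnownGood, // deadline passed without a block; pending code dropped
}

struct UpgradeGuard {
    known_good: Vec<u8>,
    // (new code, last relay block at which a block built on it may still be included)
    pending: Option<(Vec<u8>, u32)>,
}

impl UpgradeGuard {
    fn new(code: Vec<u8>) -> Self {
        UpgradeGuard { known_good: code, pending: None }
    }

    /// Stage new code; it must prove itself within `grace_blocks` relay blocks.
    fn schedule(&mut self, code: Vec<u8>, now: u32, grace_blocks: u32) {
        self.pending = Some((code, now + grace_blocks));
    }

    /// The code that parachain blocks are currently validated against.
    fn active_code(&self) -> &[u8] {
        match &self.pending {
            Some((code, _)) => code.as_slice(),
            None => self.known_good.as_slice(),
        }
    }

    /// Called when a parachain block validated against the active code is included.
    fn on_parablock_included(&mut self) -> Status {
        match self.pending.take() {
            Some((code, _)) => {
                self.known_good = code;
                Status::Confirmed
            }
            None => Status::Stable,
        }
    }

    /// Called once per relay-chain block.
    fn on_relay_block(&mut self, now: u32) -> Status {
        let expired = matches!(&self.pending, Some((_, deadline)) if now > *deadline);
        if expired {
            // Fall back to the known-good code. A real system would also emit
            // an event here so that operators are alerted immediately.
            self.pending = None;
            Status::RevertedToKnownGood
        } else if self.pending.is_some() {
            Status::AwaitingFirstBlock
        } else {
            Status::Stable
        }
    }
}

fn main() {
    let mut guard = UpgradeGuard::new(b"v1".to_vec());

    // A broken upgrade: no block within 10 relay blocks, so it is dropped.
    guard.schedule(b"v2-broken".to_vec(), 100, 10);
    assert_eq!(guard.on_relay_block(110), Status::AwaitingFirstBlock);
    assert_eq!(guard.on_relay_block(111), Status::RevertedToKnownGood);
    assert_eq!(guard.active_code(), &b"v1"[..]);

    // A working upgrade: its first block is included, so it becomes known-good.
    guard.schedule(b"v2".to_vec(), 200, 10);
    assert_eq!(guard.active_code(), &b"v2"[..]);
    assert_eq!(guard.on_parablock_included(), Status::Confirmed);
    assert_eq!(guard.on_relay_block(300), Status::Stable);
    println!("known-good code: {:?}", String::from_utf8_lossy(guard.active_code()));
}
```

This only covers the "stalled" category of mishap. It leaves out, among other things, how the parachain's own state (which also records its runtime code) would be kept consistent with a relay-side revert, and it cannot help when the new code produces blocks but is wrong in other ways, such as the token-minting case raised earlier.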

I’m not suggesting that Uptane-like recovery functionality is trivial, nor a cure-all. But it is a well-defined starting point, and it means not every recovery is blocked on a vote.

Would it make sense to address the two categories of mishaps (reversible and irreversible) separately?

0mm-mark · November 27, 2022, 11:10pm

@albi, I have drafted an RFP and submitted it to the Web 3.0 Foundation Grants Program.
Would you be kind enough to bring it to the attention of attendees of the Polkadot Summit: Barcamp (30 Nov, 1 Dec) topic Parachain Emergency Recovery?

In addition to general feedback, attendees will likely know of teams/people that could deliver the RFP:

github.com/w3f/Grants-Program

RFP: Designing Upchain - a framework for securing Substrate runtime upgrades and Substrate network upgrades


Opened by taqtiqa-mark at 10:48 PM UTC on 27 Nov 2022


# Request for Proposals

## Abstract

The Upchain Specification, a framework for securing Substrate runtime upgrades and Substrate network upgrades, by extending The Update Framework and modeled on Uptane (The Update Framework Specification extended for automobiles).

## Background

* [How to recover a parachain](<https://forum.polkadot.network/t/how-to-recover-a-parachain/673>)
* [Polkadot Summit: Barcamp (30 Nov, 1 Dec) topic Parachain Emergency Recovery](https://forum.polkadot.network/t/polkadot-summit-barcamp-submit-agenda-topics-30-nov-1-dec/669/8)

## Checklist

- [x] I have checked the [open](https://github.com/w3f/General-Grants-Program/tree/master/rfps) and [implemented](https://github.com/w3f/General-Grants-Program/tree/master/rfps/implemented) RFPs to make sure this is not a duplicate.
- [x] I have read and followed the [RFP Suggestion instructions](https://github.com/w3f/General-Grants-Program#mailbox_with_mail-request-for-proposals-rfp-suggestions).
