How to Recover a Parachain

I’m with @rphmeier here. Think of it this way: you have a decentralized parachain that stops because of some bug. You then want some external entity to vote on a Wasm blob that fixes the bug. This fix could be controversial, and the people affected by it would have no voting power, or not as much voting power as they have on the parachain itself.

As a note, this PR has been merged, which makes it one step easier for teams to manage their Wasm and Hash on the relay chain.

The steps needed:

  • Register your parachain with an account managed by your team.
  • Get a parachain slot, which will “lock” your parachain, the default and safe behavior for the network.
  • Have your parachain send an XCM message to the relay chain, unlocking the chain, and giving access to the parachain registrant.
  • Make scheduled changes to your parachain wasm or head using the new extrinsics available to the registrant of an unlocked parachain.
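
The four steps above can be read as a small permission state machine. Here is a minimal Python sketch of that reading. It is not the real registrar pallet: the class `ParaRegistration` and its method names are hypothetical and only model who may change the code, and when.

```python
from dataclasses import dataclass


@dataclass
class ParaRegistration:
    """Toy model of a relay-chain registration entry (hypothetical names)."""

    manager: str            # account that registered the parachain
    locked: bool = False    # a locked para rejects changes from the manager
    code_hash: str = ""     # stand-in for the registered Wasm

    def win_slot(self) -> None:
        # Step 2: getting a slot locks the parachain by default.
        self.locked = True

    def unlock(self, origin_is_parachain: bool) -> None:
        # Step 3: only a message sent by the parachain itself may unlock it.
        if not origin_is_parachain:
            raise PermissionError("unlock must come from the parachain")
        self.locked = False

    def schedule_code_upgrade(self, caller: str, new_code_hash: str) -> None:
        # Step 4: only the registrant of an unlocked parachain may do this.
        if self.locked:
            raise PermissionError("parachain is locked")
        if caller != self.manager:
            raise PermissionError("caller is not the registrant")
        self.code_hash = new_code_hash
```

In this model the unlock has to be sent while the parachain is still able to send messages, so it is something a team would do ahead of time.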

I would like to note that chains which are “unlocked” are basically the same as those with “sudo”, and should be considered permissioned and somewhat centrally controlled chains. BUT, these are still useful, especially in the early days of the network. Looking forward to seeing more writing and guidance on using these tools as parachain teams actually use them.

I also fully support building better tools to keep this from happening, but the recent incident has shown us that we cannot live without the political part, especially when we have no way to speed up a critical update like this one. I would argue that the changes are not that hard to see, even for novice users, when the fix is small. In the case of our fix there are only two changes:

  1. Comparing v12.0.0...v12.0.1 · galacticcouncil/Basilisk-node · GitHub
  2. dont wipe authorities · galacticcouncil/substrate@2cb01a5 · GitHub

We could make sure there are tools to make it much easier to explain and show the diff to users.

But even if this is solved, I see three main problems here which are fundamentally technical and are not solved by gov v2, AFAIK:

  1. If a parachain stalls, there is no way for its users to vote on a new upgrade apart from a relay chain vote.
  2. If there is a lot of KSM in that parachain, including KSM belonging to the voters, there is no way for them to vote on unlocking their funds.
  3. There is no way for this to be quick.

I was proposing to host the voting on a separate chain with separate tokens, but @bkchr pointed out that an off-chain solution could be built, and it would probably be much more efficient.

I think we should really think about this, because if a DeFi chain with a lot of liquidity and a money market on top of it stalls, even for a day, it could lead to a catastrophic chain of liquidations and loss-of-funds events across the whole ecosystem. We should have a way to fix things quickly, because even if we have the best tools, it will probably happen, and a single time could be enough.

Very good points. We are hosting a workshop at the barcamp on this topic. Would love it if we could work on a solution there. :slight_smile:

Maybe that doesn’t need to be fully off-chain. With the manual para lock we could probably introduce some kind of “move control to an external body at parachain X”. Let’s assume that your parachain has done this and then stopped. You could then send your users to parachain X, where they could prove, based on the latest block of your parachain, that they own some amount of tokens and are eligible to vote on recovering your chain. When enough users of your parachain have voted, the recovery could be carried out. This would need some extra checks: for example, when the state of your parachain changes, all previous votes are removed, because they tried to “recover” a working chain.
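
To make the vote-reset rule concrete, here is a minimal Python sketch of the tally that parachain X might keep. Everything in it is an assumption made for illustration: the class `RecoveryVote`, the simple-majority threshold, and the idea that balances arrive already verified (the proof check itself is not modeled).

```python
from typing import Dict, Optional


class RecoveryVote:
    """Tallies recovery votes on "parachain X" (hypothetical model)."""

    def __init__(self, total_supply: int, threshold: float = 0.5) -> None:
        self.total_supply = total_supply
        self.threshold = threshold
        self.head: Optional[str] = None    # para head the votes refer to
        self.votes: Dict[str, int] = {}    # account -> proven balance

    def vote(self, account: str, balance: int,
             proven_head: str, latest_head: str) -> None:
        # The balance proof must be made against the latest known para head.
        if proven_head != latest_head:
            raise ValueError("proof is not based on the latest para block")
        # If the chain produced a new block, earlier votes tried to
        # "recover" a working chain, so they are dropped.
        if self.head != latest_head:
            self.votes.clear()
            self.head = latest_head
        self.votes[account] = balance

    def passed(self) -> bool:
        return sum(self.votes.values()) > self.threshold * self.total_supply
```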

I think there are some ways to do this kind of thing, but for sure it requires much more thinking.

Maybe we can have a common good parachain for this purpose.

Every other parachain can opt in to authorize this rescue parachain to have the upgrade permission IFF no para block has been finalized on the relaychain for more than X mins.

And then this rescue parachain can implement some simple voting method to allow people to vote on the rescue Wasm runtime. The Interchain Proof Oracle Network will be used to prove holdings of funds.

So then, instead of seeking DOT/KSM holders’ approval, the parachain’s token holders can rescue their parachain themselves.
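
The opt-in plus stall condition could look roughly like this. This is a minimal Python sketch, assuming the rescue chain can observe when a para block was last finalized. `RescueChain`, its method names, and the use of seconds for the "X mins" timeout are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RescueChain:
    """Hypothetical opt-in registry kept by the rescue parachain."""

    # para_id -> stall timeout in seconds chosen by that parachain
    timeouts: Dict[int, int] = field(default_factory=dict)

    def opt_in(self, para_id: int, timeout_secs: int) -> None:
        self.timeouts[para_id] = timeout_secs

    def opt_out(self, para_id: int) -> None:
        self.timeouts.pop(para_id, None)

    def may_upgrade(self, para_id: int, last_finalized_at: int, now: int) -> bool:
        # Permission exists only for opted-in paras, and only while stalled.
        timeout = self.timeouts.get(para_id)
        if timeout is None:
            return False
        return now - last_finalized_at > timeout
```

A parachain that has not opted in, or that is still producing blocks, never grants the upgrade permission in this model.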

I agree with this idea. I believe a common good “parachain recovery” chain would be a nice compromise. Since this is not a quick solution to implement, though, it would be wise to explore what the tools available to us in the near future make possible, specifically parathreads. There might be a better approach in which parachains maintain some sort of emergency recovery “mini runtime”, ready to run in a parathread, that is triggered automatically when the parachain stalls and offers network participants a more sovereign way to recover the chain. Perhaps this could even allow basic usage of the network (token transfers and other features that are not in constant change), which then gets pushed back to the main parachain along with the fixed validation code that unbricks the chain. This could allow for zero downtime on the parachain with a sovereign recovery solution. It’s just a quick idea, but in my opinion worth exploring in more depth.

How should that work? How would the parathread be authorized to do this upgrade?

Maybe we should really integrate some kind of recovery mode in the Cumulus PoV logic. The only problem there is, how should we authorize the enabling of this mode? It needs to be something that doesn’t involve too many entities, maybe 3/4 of the collator set or something similar. I mean in the end it would be configured by the Parachain on what logic to use for authorizing this. The recovery mode could then be some simple token voting mechanism to authorize a runtime upgrade to fix the broken chain.
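
As an illustration of the “3/4 of the collator set” idea, here is a minimal Python sketch of such an authorization check. The function name and the default fraction are assumptions. In practice each parachain would configure its own rule.

```python
from typing import Set


def recovery_mode_authorized(collators: Set[str], approvals: Set[str],
                             numerator: int = 3, denominator: int = 4) -> bool:
    """Return True if at least numerator/denominator of collators approve."""
    if not collators:
        return False
    # Ignore approvals from accounts that are not in the collator set.
    valid = approvals & collators
    # Integer cross-multiplication avoids floating-point rounding.
    return len(valid) * denominator >= len(collators) * numerator
```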

So this is obviously a very complex & political topic.

The de-facto approach to governance in most blockchain systems is more or less that code is law. There have always been exceptions when the system itself was threatened, e.g. when Bitcoin had an infinite mint bug in 2010. Ethereum has hardened on this position over time: in 2016, the DAO hack was enough to necessitate a hard fork and in 2018/19 the Parity wallet hack was not. At this point in time, it doesn’t seem that even a $500m DeFi hack is enough for Ethereum governance to get involved and push a hard fork.

In Polkadot, we have on-chain governance, which carries with it a social contract: the token-holders or ‘citizens’ of the chain have ultimate authority. Other lesser bodies may have some minor privileges, which are scoped. The relay-chain governance has the ability to overwrite any parachain deployed on Polkadot. This is a power that should be used with extreme caution.

When evaluating governance authority, it is important to evaluate the worst cases for abuse as well as use. Convenience often gives way to tyranny. And it is extremely difficult to account for the actual intentions of a parachain in a broad technical mechanism. The best way we have of doing that is whether the chain is proceeding as planned according to its state-transition function. If the chain stalls, there is no shortcut for human intervention.

Essentially, I see parachain teams asking for the relay-chain to automatically evaluate proxy signals such as a token-holder vote or collator referendum to get the chain started up again. Given that parachains are a general mechanism akin to smart contracts, there is no impartial way of evaluating whether these signals actually encapsulate the will of the parachain. For instance, if the parachain has deliberately set its code to void as a way of shutting down, that should not be overruled. It goes beyond the social contract.

It seems to me that parachains don’t want to add ‘admin-multisig’ style recovery paths to their own chain, but would like for the relay-chain governance to function as a fast-response admin-multisig, without having the proper interfaces to do this in a general way that suits all use-cases. I think it would be better for admin/recovery to be managed in the parachain Wasm, as @bkchr suggests, and to expose fallback/recovery infrastructure within the Wasm blob itself, to handle corrupted storage or bugs, if that is what’s desired. I assume this would solve for 80-90% of such cases that we have seen historically, with mild errors in parachain logic or storage. Over time, as systems get more stable, they might choose to remove or limit their recovery/admin infrastructure in favor of more decentralization. For cases that aren’t covered by parachain-scope recovery paths, we will just have to wait longer for the top-level relay-chain governance to decide. Which is not even an option in other ecosystems.

I don’t believe these are technical problems at all.

I can ask these political questions about each of these points:

  1. Who should have the power to upgrade a parachain against the explicit observable behavior of the parachain’s Wasm code itself?
  2. Let’s say the parachain intended to burn those users’ KSM by stopping itself. Should those users alone have the ability to override this behavior? Or should they vote alongside all other KSM holders? Is there an impartial way to determine which KSM is intended to be owned by which account on the parachain, even if the parachain code or storage itself is erroneous?
  3. Is it not dangerous to be able to quickly upgrade a parachain or change its storage? From a technical perspective, we could easily all vote to change the governance system to operate 100x faster. This is not a technical issue but a political one, because there needs to be enough time for all interested participants.

In my proposal, this functionality needs to be opt-in, i.e. enabled by the parachain’s governance. So if the parachain wants to die on purpose, it can simply opt out first.

Another way to look at this: I am requesting a generalized governance chain whose sole purpose is to allow people to vote and dispatch XCM to other parachains or the relaychain. A parachain then may not need to implement any native token or governance. It just needs to issue a token on Statemint and use this generalized governance chain for any governance actions. In fact, this is the goal of Polkadot: the core relaychain should have no functionality other than finalizing para blocks, and all the governance and token functionality lives on system parachains. This shares the same design, except that the system parachain could also be used by other community parachains.

Or, when parathreads are mature, parachains can also deploy an alternative governance body on a parathread that can be used to rescue the main parachain under special circumstances.

I think we agree that relaychain voting should only be used in extreme circumstances. In that sense:

  1. Parachain users. This was the point of my post, and I would like to find a solution for this.

I know it’s not possible right now, but it seems to me that we could find one. As very generic ideation: once we have state proofs on the relay chain for parachain state, can we devise a generic way to gather the balances of its users? (Parachains might want to choose the rules for this, e.g. who owns lent voting power? Can you use only free balance?) If this were readable by the relaychain or a governance parachain, users of a given parachain could vote with their balances on the next state of that chain. Again, this might not be the best solution.

  2. Vote alongside everybody else. If the storage has errors, that is probably the extreme case, and relaychain governance could step in as it would now. A relaychain vote has precedence over a parachain vote (at least now, and I don’t see an immediate reason to change this).

  3. It depends, and I’m not sure… Is there a way to speed up technical fixes if a collective of people deemed technical agrees that the upgrade is non-malicious and that it fixes the protocol? If that is kept in governance v2, why not for parachains? If not, how can we find a way to make fixes secure and relatively fast? I think parachains should have a say on these parameters for themselves and should choose parameters they deem reasonable.

All in all, I completely agree we should not re-use or abuse relaychain governance for these things. All I’m trying to find out is: can we find a way to make parachain governance more robust and use relaychain governance as a last resort only? It might not be the right way, or the best reward for the effort. That’s why we need this discussion, because there might be better ideas. The only thing I know is that having a DeFi parachain stall for days means it probably never comes back up again.

“Once we have state proofs on the relay chain for parachain state, can we devise a generic way to gather balance of its users”

This is a pretty big presupposition. Polkadot is deliberately designed to be general over storage formats of parachains as well as user schema. There isn’t a good way to ‘prove’ users to the relay chain without effectively locking in parachains to a particular type of merkle trie or storage schema. Most chains use hex tries now, although binary tries are strictly better. And not all parachains will follow the current FRAME schema. There will be parachains written with things that are not FRAME, or with future versions of FRAME. It also seems unlikely that balance accounting can be done impartially without having the relay chain reason about specific pallets on specific parachains.

The two tools that have been suggested in this thread seem to me the most viable path as well as the least likely to incur technical debt.

  1. Basti’s suggested parachain-side recovery mode (solves 80-90% of cases)
  2. Bryan’s suggestion: Parachain admin capabilities, which already exist but should be extended to support an arbitrary account ID as the manager. Over XCM this could be controlled by a multisig or a voting mechanism on another parachain.

These can be used in conjunction with each other and use relay-chain tokenholder vote as a fallback.

When it comes to allowing tokenholders on the parachain to vote, this is actually a more general problem and may need a different solution that is relevant here but somewhat beyond the scope of the topic.

Strictly speaking, DOT held on a parachain is owned by the parachain account and claims are internally delegated to its users according to the mechanisms of the parachain. I propose that we only need two functionalities on the relay-chain in order to solve this issue:

  1. Accounts delegate governance voting rights (already exists)
  2. Accounts should be able to cast many governance votes

With these two functionalities, any parachain can delegate all of its governance voting rights to some account which is controlled on another parachain. This account will be a smart contract, which operates according to the following rules:

  1. If an XCM message is received from the parachain account directing it to vote, it votes according to the parachain’s command.
  2. If no XCM message is received from the parachain (e.g. if the parachain is down or non-operating), it provides logic tailored to the parachain for users of the parachain to prove % holding of the parachain’s DOT and votes according to their will.

There is a lot of flexibility for the specific governance voting mechanism. This smart contract can also be the controller of the relay-chain ‘para-admin’ capabilities of the parachain, if that is desired. It is totally opt-in, and will need to be upgraded whenever the parachain’s own rules, storage format, or storage schema changes, in order to make the accounting correctly.
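
The two rules could be sketched as follows. This is a minimal Python illustration, not contract code for any particular chain: `decide_votes`, its parameters, and the “aye”/“nay” labels are assumptions, and the user holdings are taken as already proven by whatever logic is tailored to the parachain.

```python
from typing import Dict, Optional, Tuple


def decide_votes(delegated_dot: int,
                 para_instruction: Optional[str],
                 user_votes: Dict[str, Tuple[int, str]]) -> Dict[str, int]:
    """Return a mapping from vote choice to the amount of DOT voting that way."""
    # Rule 1: the parachain is live and told us how to vote.
    if para_instruction is not None:
        return {para_instruction: delegated_dot}
    # Rule 2: no instruction, so split the vote by proven user holdings.
    split: Dict[str, int] = {}
    for amount, choice in user_votes.values():
        split[choice] = split.get(choice, 0) + amount
    # Never vote with more than was delegated to this account.
    if sum(split.values()) > delegated_dot:
        raise ValueError("proven holdings exceed the delegated DOT")
    return split
```

Returning a split rather than a single choice is why the second functionality (casting many governance votes from one account) is needed in the fallback case.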

Note that perfect tracking of claims by users is probably impossible. Users on one parachain might own tokens which correspond to ownership of DOT held by another parachain, or other such cases. While these users might feel like they own DOT, and there is a case to be made that they effectively do, this type of accounting is difficult and can at best be approximate, given opt-in and coordination of delegations by many parachains.

Reviving this thread, as this has now also become an issue on Kintsugi, and with gov2 it looks like there’s no practically feasible way to fast-track a proposal (which is a separate issue).

From my perspective, parachain recovery should have the following properties:

  1. Each parachain should have a way to recover itself from not producing blocks.
  2. Relay-chain governance should not play any part in parachain recovery.

Parachains receive a service from the relay-chain: safety and liveness. I agree with @rphmeier that there is a political dimension. Relay chain governance is not the right place to decide (politically or otherwise) about the fate of a parachain. Hence, each parachain should determine how to recover itself. What the wider ecosystem can offer, though, is a common good chain that implements a way for parachains to recover.

I think @xlc’s proposal is pretty good and could even be done with a very simple approach at first.

  • Assume there’s a common good chain that is allowed to call paras.forceSetCurrentCode on the relay-chain, based on (1) a check that no block has been produced for that parachain for x minutes and (2) approval by some sort of authority.
  • Parachains would determine that authority on their own chains prior to being bricked. IMO, in the very first iteration that authority could just be the TC, if it’s elected on a parachain. More involved voting, e.g. verifying governance token holdings from the parachain, can be added in a later iteration.

I broadly agree, @dom - but also, to be sure, it seems quite possible to build this infrastructure on an existing smart contract chain such as Moonbeam or Astar without the addition of any new common-good chain.

Both OpenGov and Gov1 provide the capability for tokens to be delegated to an external account. I believe that a sensible architecture is to build a smart contract which controls an account on the relay-chain, which is delegated to by the parachain, and obeys the following rules:

  • If the parachain is not down, simply forward governance votes from the parachain. The parachain intending to vote would just send an XCM to the smart contract, which would in turn send an XCM to the relay chain to vote.
  • If the parachain is down, according to relay-chain state proofs, the smart contract can implement an alternative fallback mechanism for deciding how to vote. This could be a vote according to account holders on the parachain itself, using parachain state proofs.
  • The smart contract can also act as the parachain manager, and retain the ability to upgrade code/state on the parachain but only when the parachain is down. I believe that pallet-preimage on the relay-chain can be used to avoid sending the code blob itself over XCM, which is not viable. Needs some investigation.

The main reason to do this in a smart contract instead of on a common-good chain is that the alternative fallback voting mechanism will need to be pluggable and customizable to the particular needs and storage formats of the client chain it acts on behalf of. There is no one-size-fits-all solution here; it needs to be programmable.
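
One way to picture “pluggable” is a registry of per-chain decoding logic, sketched below in Python rather than in a contract language. The para IDs and the two storage encodings are made up for illustration. A real deployment would also have to verify state proofs, which is not modeled here.

```python
from typing import Callable, Dict

# A decoder turns one raw storage value into a balance.
BalanceDecoder = Callable[[bytes], int]

_decoders: Dict[int, BalanceDecoder] = {}


def register_decoder(para_id: int, decoder: BalanceDecoder) -> None:
    """Plug in the decoding logic for one client chain."""
    _decoders[para_id] = decoder


def decode_balance(para_id: int, raw: bytes) -> int:
    if para_id not in _decoders:
        raise KeyError(f"no fallback logic registered for para {para_id}")
    return _decoders[para_id](raw)


# Two made-up client chains with different storage encodings.
register_decoder(2000, lambda raw: int.from_bytes(raw, "little"))
register_decoder(2001, lambda raw: int.from_bytes(raw, "big"))
```

Each client chain would have to re-register its decoder whenever its storage format or schema changes.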

I agree that whatever the solution is, it needs to be programmable. Personally, I’d much prefer a pallet over a smart contract, since we could add it to e.g. FRAME or ORML, and many parachains, not just the ones with smart contracts, could then support the recovery mechanism.

The pallet (or whatever implements the code) should be the acting parachain manager. The fallback mechanism should not rely on relay chain governance as is.

My proposal for how this could work:

  1. Parachains can add a pallet to their runtime (if they opt-in) that allows other parachains to delegate emergency rights to them. Implementation can be based on parachain requirements. Some parachains might just want to have privileged accounts like a TC that can invoke emergency procedures. Some might want to mirror their governance and be able to vote on the measures.
  2. In case of an emergency, the pallet on the recovery parachain is able to call paras.forceSetCurrentCode on the relay chain (as decided by whatever emergency procedure a particular parachain has configured).

One way around sending the blob over XCM would be if there was a “parachain” governance track:

  • The parachain team could note a preimage with paras.forceSetCurrentCode and open a “parachain” track proposal
  • From a recovery parachain, the parachain that wants to recover sends an XCM message that approves the upgrade. For this, the relay-chain would have to track which parachain(s) and which origins from that chain are allowed to approve.
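
The preimage idea can be sketched like this: the large call is stored once and later approved by hash only. This is a minimal Python model under stated assumptions. `ParachainTrack` and its methods are hypothetical, no such track exists yet, and the choice of Blake2b with a 32-byte digest as the hash is an assumption of this sketch.

```python
import hashlib
from typing import Dict, List, Set, Tuple


class ParachainTrack:
    """Toy model of the proposed "parachain" track (hypothetical API)."""

    def __init__(self) -> None:
        self.preimages: Dict[str, bytes] = {}
        # target para_id -> set of (recovery para_id, origin) allowed to approve
        self.approvers: Dict[int, Set[Tuple[int, str]]] = {}
        self.enacted: List[Tuple[int, bytes]] = []

    def note_preimage(self, call: bytes) -> str:
        # The large call (including the Wasm blob) is stored once, on-chain.
        call_hash = hashlib.blake2b(call, digest_size=32).hexdigest()
        self.preimages[call_hash] = call
        return call_hash

    def allow(self, target: int, recovery_para: int, origin: str) -> None:
        self.approvers.setdefault(target, set()).add((recovery_para, origin))

    def approve(self, target: int, recovery_para: int, origin: str,
                call_hash: str) -> None:
        # The XCM approval carries only the hash, never the blob itself.
        if (recovery_para, origin) not in self.approvers.get(target, set()):
            raise PermissionError("origin may not approve for this parachain")
        if call_hash not in self.preimages:
            raise KeyError("preimage was not noted")
        self.enacted.append((target, self.preimages[call_hash]))
```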

Instead of using each other’s parachains for recovery, this could also be placed on an existing common good chain. To be very resilient, a parachain could also have a recovery option from two or more parachains.

As we discussed last week, I would also be in favor of putting this onto a common good parachain dedicated to this purpose. This parachain could provide ways to create DAOs that are controlled in different ways. However, as this chain should not have to support every possible DAO configuration, it could also host pallet_contracts so that custom DAO contracts can run there.

The good thing about running this as a common good chain: if there are problems with this chain, there are no issues involved with the relay chain token holders rescuing it :stuck_out_tongue:

However, the entire rescue part should really be the last resort. We need to improve the testing of parachains. We need to see what try-runtime, fudge and chopsticks already provide us with, and then work together in the community to extend these tools. I don’t think there should be only one tool. Having multiple tools, each of which supports one special use case really well, is better than having one tool that does it all, you know :wink: We could even come up with shared test cases that serve as basic tests for all chains using certain pallets, or whatever. The possibilities are endless when it comes to testing. If we all work together on this, I think we can create great tools!
