Recover corrupted Staking ledgers in Polkadot and Kusama

TLDR: This post motivates the deployment of a new migration in the fellowship runtimes to fix the corrupted ledgers in Polkadot and Kusama. It also explains why/how the ledgers got corrupted, how they can be restored and the ledger’s final state once the migrations are executed.

For more information about this issue and recovery process, check:

Background

Note: in the context of Polkadot’s staking, a “ledger” is a data structure that keeps track of data and metadata of a staker in the system, such as the amount of stake bonded, associated stash accounts, etc.

Between blocks #19551000 and #20181758 in Polkadot and blocks #21570000 and #22515962 in Kusama (i.e. through releases v1.1.0 to v1.1.3), the staking logic did not prevent a controller from becoming the stash of another ledger (a regression introduced by removing this check). Since the rest of the code assumes this never happens, bonding a ledger with a stash that is already the controller of another ledger may lead to data inconsistencies and data loss in bonded ledgers. For a more detailed technical explanation of this issue, check this hackmd.

In a nutshell, when fetching a ledger with a given controller, there could be some paths where the wrong ledger was returned, which could lead to unexpected/wrong ledger states.
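To make the failure mode concrete, here is a minimal, self-contained sketch. It is a toy model, not the pallet-staking code: it assumes two maps (stash to controller, and controller to ledger), and the `Staking` struct, the `guard` flag and the error strings are illustrative names chosen for this example.

```rust
use std::collections::HashMap;

type AccountId = &'static str;

/// Simplified stand-in for a staking ledger (hypothetical fields).
#[derive(Clone, Debug, PartialEq)]
struct Ledger {
    stash: AccountId,
    active: u128,
}

/// Toy model of the two lookups involved in fetching a ledger.
#[derive(Default)]
struct Staking {
    bonded: HashMap<AccountId, AccountId>, // stash -> controller
    ledger: HashMap<AccountId, Ledger>,    // controller -> ledger
}

impl Staking {
    /// Bonds `stash` with `controller`.
    /// `guard == false` mimics the regression: nothing stops `stash`
    /// from already being the controller of another ledger.
    fn bond(
        &mut self,
        stash: AccountId,
        controller: AccountId,
        active: u128,
        guard: bool,
    ) -> Result<(), &'static str> {
        if self.bonded.contains_key(stash) {
            return Err("AlreadyBonded");
        }
        if guard && self.ledger.contains_key(stash) {
            return Err("AlreadyPaired");
        }
        self.bonded.insert(stash, controller);
        // Without the guard, this insert can overwrite another stash's ledger.
        self.ledger.insert(controller, Ledger { stash, active });
        Ok(())
    }

    /// Fetches the ledger of a stash by going through its controller.
    fn ledger_of_stash(&self, stash: AccountId) -> Option<&Ledger> {
        let controller = self.bonded.get(stash)?;
        self.ledger.get(controller)
    }
}
```

In this model, if stash `S` is bonded with controller `C` and `C` later bonds as a stash with itself as controller while the guard is off, looking up the ledger of `S` returns a ledger whose stash is `C`, and the original data for `S` is gone. With the guard on, the second bond is rejected and the ledger of `S` stays intact.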

This PR and the v1.1.3 release upgrade in Polkadot and Kusama fixed this regression and blocked the corrupted ledgers to avoid further corruption.

Recovering the corrupted ledgers

The extrinsic Staking.restore_ledger was introduced in PR#3706 as a mechanism to automatically i) restore and ii) unlock the corrupted ledgers. How a ledger is restored depends on its current corruption type and path (see this walkthrough to check all the potential corruption cases).

For a detailed explanation and recovery strategies, check the following docs:

The current Staking.restore_ledger is missing an important check: it does not ensure that the final state of the restored ledger has no more active stake than the current free balance of the stash (see this patch in the staking pallet). To avoid waiting for the whole polkadot-sdk and fellowship-runtimes release train to ship the patch, there are currently two options that only require a release of the fellowship runtime:

1. Option A: Runtime migration

Deploy one-time migrations in the fellowship runtime that call into Staking.restore_ledger for the list of corrupted ledgers and perform the additional checks missing in the current pallet-staking. The migration consists of:

  1. For every ledger that needs recovery:
    1.1. Call into Staking.restore_ledger to restore the ledger
    1.2. Perform the remaining checks missing in the current pallet-staking

Check the PR against the fellowship runtimes with the migrations for Polkadot and Kusama.
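The migration steps above can be sketched roughly as follows. This is a standalone illustration, not the code in the PR: `run_migration`, `Outcome` and the two closures are hypothetical stand-ins (one for the call into Staking.restore_ledger, assumed here to return the restored ledger's active stake, and one for a free balance lookup). The only post-restore check modelled is the one named in this post, that active stake must not exceed the stash's free balance.

```rust
type AccountId = u32;

/// Per-ledger result reported by the sketch (illustrative names).
#[derive(Debug, PartialEq)]
enum Outcome {
    Restored,
    RestoreFailed,
    FailedPostCheck,
}

/// Walks the list of corrupted stashes, restores each ledger and then
/// applies the check that the current pallet is missing.
fn run_migration(
    corrupted: &[AccountId],
    // Stand-in for Staking.restore_ledger; yields the restored active stake.
    mut restore_ledger: impl FnMut(AccountId) -> Result<u128, &'static str>,
    // Stand-in for reading the stash's free balance.
    free_balance: impl Fn(AccountId) -> u128,
) -> Vec<(AccountId, Outcome)> {
    corrupted
        .iter()
        .map(|&stash| {
            let outcome = match restore_ledger(stash) {
                // Step 1.1 failed: nothing to check afterwards.
                Err(_) => Outcome::RestoreFailed,
                // Step 1.2: active stake must not exceed the free balance.
                Ok(active) if active > free_balance(stash) => Outcome::FailedPostCheck,
                Ok(_) => Outcome::Restored,
            };
            (stash, outcome)
        })
        .collect()
}
```

What the real migration does with a ledger that fails the post-restore check is not covered here; the sketch only records the outcome per stash.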

2. Option B: Temporary deployment of fixer pallet in fellowship runtimes

Another option is to add a temporary pallet to the fellowship runtime that exposes an extrinsic that:

  1. Performs checks of whether the ledger needs to be recovered
  2. Calls into Staking.restore_ledger to restore the ledger
  3. Performs the remaining checks missing in the current pallet-staking

This temporary extrinsic can be called by any signed origin. The checks in step 1 ensure that only ledgers that are both corrupted and whitelisted can be mutated and recovered.
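As a rough sketch of how such gating could look, the function below accepts any signed caller but only acts on stashes that are whitelisted and corrupted. It is plain Rust rather than a FRAME pallet, and every name in it (`fix_ledger`, `Origin`, `FixError`, the closures) is hypothetical; the restore closure stands in for Staking.restore_ledger and is assumed to return the restored active stake.

```rust
use std::collections::HashSet;

type AccountId = u32;

/// Minimal origin model: either a signed account or no origin.
enum Origin {
    Signed(AccountId),
    None,
}

#[derive(Debug, PartialEq)]
enum FixError {
    BadOrigin,
    NotWhitelisted,
    NotCorrupted,
    RestoreFailed,
    ActiveExceedsFreeBalance,
}

/// Returns the caller on success so the example can show who triggered the fix.
fn fix_ledger(
    origin: Origin,
    stash: AccountId,
    whitelist: &HashSet<AccountId>,
    is_corrupted: impl Fn(AccountId) -> bool,
    restore_ledger: impl FnOnce(AccountId) -> Result<u128, &'static str>,
    free_balance: u128,
) -> Result<AccountId, FixError> {
    // Any signed origin may call; unsigned calls are rejected.
    let caller = match origin {
        Origin::Signed(who) => who,
        Origin::None => return Err(FixError::BadOrigin),
    };
    // Step 1: only whitelisted and corrupted ledgers may be touched.
    if !whitelist.contains(&stash) {
        return Err(FixError::NotWhitelisted);
    }
    if !is_corrupted(stash) {
        return Err(FixError::NotCorrupted);
    }
    // Step 2: restore the ledger.
    let active = restore_ledger(stash).map_err(|_| FixError::RestoreFailed)?;
    // Step 3: the check missing in the current pallet.
    if active > free_balance {
        return Err(FixError::ActiveExceedsFreeBalance);
    }
    Ok(caller)
}
```

Because the whitelist and corruption checks run before anything is mutated, the identity of the signed caller does not matter: a call on a healthy or non-whitelisted ledger returns an error without reaching the restore step.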

Rewarding the corrupted ledgers retroactively

The ledgers that have been blocked due to corruption may not have been able to take part in staking as a nominator/validator. To compensate the owners of the corrupted ledgers, we propose rewarding the ledger accounts with funds from the treasury. The calculation and the rewarding referenda will be discussed in a separate thread.
