Security vulnerability: Executing unauthorized XCM messages

TDLR: Last Wednesday, September 21, 2022 we received a security vulnerability report by SRLabs about adding inherents containing fake data to a block. For example, potentially adding a timestamp that is multiple years in the future, which would lead to stalling the chain and require intervention to get the chain running. This type of vulnerability could only be exploited by block producers. After informing the Parachain teams about this vulnerability, the team behind Moonbeam brought to our attention that an attacker could even execute unauthorized XCM messages on a Parachain. At the time of the report, this vulnerability could be exploited on Kusama and Polkadot resulting in the chain halting.

Vulnerability

The vulnerability was a result of two bugs:

  1. Insufficient validation of inherents, allowing fake data to be included in a block.
  2. Cumulus Parachain block validation enabling signed inherents.

Insufficient validation of inherents, allowing fake data to be included in a block

An inherent is an extrinsic that is used by block producers to add data to a block that cannot be verified on chain. A good example of this is a timestamp when a block producer creates a block. Considering the on chain logic cannot verify this timestamp (as it doesn’t know the current time) the other nodes in the network need to verify the inherents on the block import. This is done using the check_inherents runtime function. The function is basically doing the following:

fn check_inherents(block: Block) {
    for ext in block.extrinsics() {
        if ext.is_inherent() {
            ext.check_inherent()
        }
    }
}

This is_inherent call in FRAME implemented in the following way:

impl RuntimeCall {
    fn is_inherent(&self) {
        match self {
            Self::PalletX(call) if PalletX::is_inherent(call) => return true,
            Self::PalletY(call) if PalletY::is_inherent(call) => true,
            // All other pallets in the runtime
            ..,
            _ => false,
        }
    }
}

RuntimeCall being the FRAME type that represents all pallets and their transactions that can be executed in a runtime.

The problem with the above code is when you have some pallet that supports wrapping calls like pallet-utility in batch etc, it stops working. This is_inherent function is not being able to look into the batch call to recursively check all wrapped calls if they are an inherent and need to be checked.

There exists a wide variety of pallets that supports wrapping calls in different ways, so how was the pallet-utility special? The batch function was looking like this:

fn batch(origin: Origin, calls: Vec<Call>) {
    let is_root = ensure_root(origin).is_ok();
    
    // Dispatch the calls
}

The problem is that we only check if the origin is root and then forward this origin directly to the calls at dispatch. Every other pallet is always requiring that the origin is at least signed or some other special origin, but never that the origin is allowed to be None.

So, a block producer could build a block with batch(Origin::None, [timestamp(timestamp_of_2050)]) and every node would import this block as check_inherents and not be able to find this wrapped inherent to flag it as invalid.

Cumulus Parachain block validation enabling signed inherents

Every Cumulus based Parachain uses the set_validation_data inherent. This inherent contains information like the XCM messages from the relay chain and some values from the state of the relay chain. Parachain blocks are not only validated by all of the parachains nodes, but also by the relay chain to ensure that the state transition is valid. This validation by the relay chain also includes checking the inherents, especially of the set_validation_data inherent. The block validation code from Cumulus checks the validity of the inherent against the input to the validate_block function from the relay chain. This validity check also includes an existence check. This existence check is similar to the is_inherent check not being able to recursively look into calls to find the set_validation_data call. The good thing is that the block validation would panic when it cannot find the set_validation_data call as it is required. The bad thing is that we didn’t check that the origin of the call is None.FRAME supports that every pallet can declare its own origins, but outside the runtime it can only be passed in two origins directly: Signed and None. This is basically controlled by putting a Signature into the extrinsic or not (note: this is abstracted for simplicity of this explanation).

Finding the set_validation_data was implemented as:

let validation_data = block
    .extrinsics()
    .iter()
    .find_map(|ext| if ext.call() == "set_validation_data" {
        Some(ext.call())
    } else { 
        None
    })
    .expect("Couldn't find `set_validation_data`");

The code of set_validation_data is implemented like this:

fn set_validation_data(origin, data) {
    ensure_none(origin)?;
    
    // Do all the processing
}

So, a block producer could have included an extrinsic like set_validation_data(Origin::Signed(some_account), correct_data) which would make the set_validation_data findable, but the call return an error on applying it. The block validation would also use the correct_data to validate it against the input to validate_block and wouldn’t find any issue,

Crafting the block

Combining both of the present vulnerabilities is quick and easy. A block producer would have needed to build the following block:

[
   set_validation_data(Origin::Signed(some_account), correct_data),
   batch(Origin::None, [set_validation_data(Origin::None, fake_data)])
]

All checks would pass for this block and the block producer could have included any XCM messages into the fake_data payload that would have been executed by the Parachain runtime.

NOTE: Any Parachain that used the CheckWeight signed extension would have been protected against the signed inherent as well. As the signed inherent would fail the ensure_none check, CheckWeight would return an error as it ensures that no inherent is returning an error.

Recap

The report was published on Wednesday 21.09.2022. The same day we included two patches into Substrate and Cumulus to fix the two bugs. At the time of writing, Wednesday 28.09.2022 all Parachains affected by this vulnerability have upgraded their chain, including all the needed fixes, thanks to the forkless upgrades system from Substrate/Polkadot.

Polkadot and Kusama were only affected by the flaw in check_inherents. So, it would have been possible by a validator to stall the block production, because other nodes would have failed at checking that the timestamp increased by some minimum when building a new block. Both chains were updated on the 28.09.2022 to a new runtime. Polkadot applied a runtime upgrade which only containing the required fixes, based on the latest release. Kusama, that had a runtime upgrade scheduled for the 28.09.2022, was upgraded to the planed runtime upgrade with the required fixes included.

Thank you to everyone who helped identify and fix this vulnerability. Also, Thank you to everyone that helped with a timely deployment to multiple chains.

17 Likes

Added this to a new category called ‘postmortem’ so that such incidents can be discoverable, as they are invaluable educational resources.

having written this, perhaps we need a new category name like ‘vulnerabilities’ or something like that. This was not exploited, so there was no ‘mortem’ involved.

3 Likes

Would it make sense to try and detect this pattern with a linting rule? Maybe using dylint? I don’t know Substrate’s full testing pipeline, but could make sense to search for in a daily/weekly regression loop.

The pattern is not wrong and is valid code. The problem here is more that multiple things need to come together to make this work. I also relaxed my initial fix pr later: pallet-utility: Only disallow the `None` origin by bkchr · Pull Request #12351 · paritytech/substrate · GitHub So, we now only reject the None origin.

I think we should also make all inherents required, meaning if they don’t appear in the extrinsics the block is invalid. This would also allow us again to bring back the None origin to batch. So, a lint is not that easy :stuck_out_tongue:

Nevertheless, playing around with these lints would probably nice. I read the docs once, but not that detailed and don’t remember everything. However, I think a good lint would be that a pallet that provides a SignedExtension throws an error lint if the SignedExtension is not added to the runtime. I this happens, I would vote for some large tip :slight_smile:

@bkchr Nobody reached out to us to inform us about this vulnerability at all - and our chain is potentially vulnerable still.