Common Vulnerabilities in Substrate/Polkadot Development

Security in development with Substrate

Substrate is a software framework for building blockchain networks. Because it is the base layer for many different types of projects, it is crucial to ensure that it is secure. Just as the OWASP Top 10 list highlights the key security risks for web applications, there are relatively common security concerns specific to Substrate/Polkadot. These include how data is stored, how users gain access to the system, and how transactions are processed.

It is not possible to completely secure any system, nor to predict every weakness a system might have. It is, however, possible to analyse past audits and issues across multiple repositories of the Substrate/Polkadot ecosystem and to compile a curated list of the most frequently found security risks and vulnerabilities.

The main objective of this list is to help developers, new or not so new, build safer code on Substrate: knowing where the most common weak spots are, they can make sure to check and secure them.

This is a short introduction to the items on this list, the risks they imply and recommendations on how to mitigate them, in no particular order. It is not exhaustive, but it is an excellent place to start securing your systems.

Thanks a lot to all the developers who helped put this list together, including on the remediation/mitigation side, and to all the security experts partnering on this.

List of common security risks to be aware of while developing with Substrate/Polkadot
Insecure Randomness

  • Challenges and Risks
    Weak random numbers can undermine key features such as lotteries or voting, allowing attackers to predict or manipulate the numbers to trick the system.
  • Mitigation
    Choose a method for generating random numbers in your pallets that is strong enough for your needs. If you cannot trust all validators, use a trusted oracle. Otherwise, you can use a VRF-based (Verifiable Random Function) method, which Polkadot uses in processes such as auctions. Check regularly that these methods remain secure and work as intended.
  • Example
    An example of an insecure approach is the Randomness Collective Flip pallet from Substrate, which provides a random function that generates low-influence random values based on the block hashes of the previous 81 blocks. Because those block hashes are public, anyone can compute the same values.

    A more secure approach is to use the VRF-based randomness from Pallet BABE, as Polkadot does in its auctions.
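
To illustrate why randomness derived only from block hashes is weak, here is a toy sketch. It is not the pallet's actual algorithm: it mixes a subject with a list of recent block hashes using Rust's std DefaultHasher, purely for illustration, and the function name and hash values are hypothetical. The point is that every input is public, so an observer can compute the "random" value just as the runtime does.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy stand-in for block-hash-based randomness (NOT the real pallet algorithm):
/// mixes a subject with recent block hashes, all of which are public data.
fn toy_block_hash_random(subject: &[u8], recent_block_hashes: &[[u8; 32]]) -> u64 {
    let mut hasher = DefaultHasher::new();
    subject.hash(&mut hasher);
    for block_hash in recent_block_hashes {
        block_hash.hash(&mut hasher);
    }
    hasher.finish()
}

fn main() {
    // Hypothetical hashes of the last 81 blocks (in reality, read from the chain).
    let hashes: Vec<[u8; 32]> = (0u8..81).map(|i| [i; 32]).collect();

    // The runtime computes the "random" lottery draw...
    let runtime_value = toy_block_hash_random(b"lottery", &hashes);
    // ...and anyone holding the same public data predicts it exactly.
    let attacker_prediction = toy_block_hash_random(b"lottery", &hashes);

    assert_eq!(runtime_value, attacker_prediction);
    println!("predicted draw: {}", attacker_prediction);
}
```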

Storage Exhaustion

  • Challenges and Risks
    If the charge for storage is set too low, it does not effectively deter malicious use of storage. In that case, attackers might exploit this to make the system slow and costly to run.
  • Mitigation
    Make sure you charge an adequate amount, and that you charge explicitly for storage (a deposit). Check regularly that these rules are applied correctly and, where possible, limit the amount of data that can be saved to storage.
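
A minimal sketch of an explicit storage deposit, assuming a flat base deposit plus a per-byte deposit and a hard cap on item size. The constants, the `Ledger` type and `store_item` are hypothetical; in a FRAME pallet the deposit would typically be reserved through the currency traits rather than tracked in HashMaps.

```rust
use std::collections::HashMap;

// Hypothetical pricing: a base deposit per item plus a deposit per stored byte.
const BASE_DEPOSIT: u128 = 1_000;
const DEPOSIT_PER_BYTE: u128 = 10;
// Hypothetical hard cap on the size of a single stored item.
const MAX_ITEM_LEN: usize = 1_024;

#[derive(Debug, PartialEq)]
enum StoreError {
    TooLarge,
    AlreadyExists,
    InsufficientBalance,
    Overflow,
}

/// Toy stand-in for pallet storage and account balances.
struct Ledger {
    free: HashMap<u32, u128>,
    reserved: HashMap<u32, u128>,
    items: HashMap<u32, Vec<u8>>,
}

impl Ledger {
    /// Deposit required to store `len` bytes.
    fn deposit_for(len: usize) -> Option<u128> {
        (len as u128)
            .checked_mul(DEPOSIT_PER_BYTE)?
            .checked_add(BASE_DEPOSIT)
    }

    /// Stores one item per account, reserving a size-proportional deposit first.
    fn store_item(&mut self, who: u32, data: Vec<u8>) -> Result<(), StoreError> {
        if data.len() > MAX_ITEM_LEN {
            return Err(StoreError::TooLarge);
        }
        // Overwriting would require refunding the old deposit; this sketch simply refuses.
        if self.items.contains_key(&who) {
            return Err(StoreError::AlreadyExists);
        }
        let deposit = Self::deposit_for(data.len()).ok_or(StoreError::Overflow)?;
        let free = self.free.get(&who).copied().unwrap_or(0);
        let new_free = free
            .checked_sub(deposit)
            .ok_or(StoreError::InsufficientBalance)?;
        let reserved = self.reserved.get(&who).copied().unwrap_or(0);
        let new_reserved = reserved.checked_add(deposit).ok_or(StoreError::Overflow)?;
        // Mutate state only after every check has passed.
        self.free.insert(who, new_free);
        self.reserved.insert(who, new_reserved);
        self.items.insert(who, data);
        Ok(())
    }
}

fn main() {
    let mut ledger = Ledger {
        free: HashMap::from([(1, 5_000)]),
        reserved: HashMap::new(),
        items: HashMap::new(),
    };
    // 100 bytes cost 1_000 + 100 * 10 = 2_000 units of deposit.
    assert_eq!(ledger.store_item(1, vec![0; 100]), Ok(()));
    assert_eq!(ledger.free[&1], 3_000);
    assert_eq!(ledger.reserved[&1], 2_000);
    // An account without funds cannot fill storage for free.
    assert_eq!(ledger.store_item(2, vec![0; 10]), Err(StoreError::InsufficientBalance));
}
```

Because the deposit is reserved rather than burned, it can be returned when the item is removed, while still making large-scale storage spam expensive.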

Unbounded Decoding

  • Challenges and Risks
    Failing to set a depth limit when decoding can break the system: attackers might craft deeply nested input that forces a stack overflow and stops the network from working properly.
  • Mitigation
    Set a depth limit when decoding nested objects such as calls (extrinsics), and make sure the code follows secure best practices.
  • Example
    Consider an extrinsic that dispatches a call passed in as encoded bytes, so the call to be dispatched is decoded with the decode method. Someone could craft a deeply nested call and cause a stack overflow in the pallet, which could leave validators unable to produce new blocks. To fix this, decode can be replaced with decode_with_depth_limit.
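
In parity-scale-codec, decode_with_depth_limit is provided by the DecodeLimit trait. The std-only toy below only illustrates the idea behind it and does not use that crate: a recursive decoder for a hypothetical call format in which a `Wrap` call contains another call, with a depth budget that rejects over-nested input before recursion can exhaust the stack. The byte format and `MAX_DEPTH` value are assumptions.

```rust
/// Hypothetical call type: `Wrap` contains another call, so calls can nest.
#[derive(Debug, PartialEq)]
enum Call {
    Remark(u8),
    Wrap(Box<Call>),
}

#[derive(Debug, PartialEq)]
enum DecodeError {
    UnexpectedEof,
    UnknownVariant(u8),
    DepthLimitExceeded,
}

// Hypothetical nesting limit for this toy format.
const MAX_DEPTH: u32 = 64;

/// Reads one byte and advances the input slice.
fn take_byte(input: &mut &[u8]) -> Result<u8, DecodeError> {
    let slice: &[u8] = *input;
    let (&byte, rest) = slice.split_first().ok_or(DecodeError::UnexpectedEof)?;
    *input = rest;
    Ok(byte)
}

/// Toy format: `0x00 <byte>` is a remark, `0x01 <call>` wraps a nested call.
fn decode_with_depth(input: &mut &[u8], depth_left: u32) -> Result<Call, DecodeError> {
    // Stop before recursing further, instead of letting deep input exhaust the stack.
    if depth_left == 0 {
        return Err(DecodeError::DepthLimitExceeded);
    }
    match take_byte(input)? {
        0 => Ok(Call::Remark(take_byte(input)?)),
        1 => Ok(Call::Wrap(Box::new(decode_with_depth(input, depth_left - 1)?))),
        other => Err(DecodeError::UnknownVariant(other)),
    }
}

fn decode_call(bytes: &[u8]) -> Result<Call, DecodeError> {
    let mut input = bytes;
    decode_with_depth(&mut input, MAX_DEPTH)
}

fn main() {
    // Three levels of nesting decode fine.
    let shallow: [u8; 5] = [1, 1, 1, 0, 42];
    println!("{:?}", decode_call(&shallow));
    // 100_000 levels are rejected instead of overflowing the stack.
    let mut deep = vec![1u8; 100_000];
    deep.extend_from_slice(&[0, 42]);
    assert_eq!(decode_call(&deep), Err(DecodeError::DepthLimitExceeded));
}
```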

Insufficient Benchmarking

  • Challenges and Risks
    Incorrect or missing benchmarking can slow down the network and it can also let attackers spam the system.
  • Mitigation
    Run benchmarks under worst-case conditions. For example, the benchmark should cover the execution path of an extrinsic that performs the most database reads and writes. The primary goal is to keep the runtime safe; the secondary goal is to be as accurate as possible to maximise throughput.
  • Example
    Consider an extrinsic, a modified version of the remark pallet, that stores data on-chain.

However, its weight does not take the size of the remark (the data to be stored) into account. As a result, the weight is miscalculated: operations like blake2_256, whose computational cost depends heavily on the size of the input, end up underweighted. To fix it, the benchmark should take the length of the remark into account.

In such a benchmark, the length (l) of the remark is a component, and a vector of that size is created as input. The benchmark tries multiple values of l, so the resulting weight grows linearly with it. Then it is only necessary to pass the length of the remark to the weight function, like this: T::WeightInfo::store(remark.len() as u32)
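
The effect of that change can be sketched with plain arithmetic. Assuming the benchmark yields a linear weight of the form base + per_byte × l (the coefficients below are made up, not real benchmark output), a size-aware weight grows with the remark while a constant weight does not, so large remarks would be charged far less than they cost. `store_weight` is a hypothetical stand-in for T::WeightInfo::store.

```rust
// Hypothetical coefficients, standing in for the output of a benchmark run.
const BASE_WEIGHT: u64 = 10_000;
const WEIGHT_PER_BYTE: u64 = 50;

/// Size-aware weight, analogous in spirit to `T::WeightInfo::store(l)`.
fn store_weight(len: u32) -> u64 {
    BASE_WEIGHT.saturating_add(WEIGHT_PER_BYTE.saturating_mul(len as u64))
}

/// Flawed alternative: a constant weight measured with an empty remark.
fn fixed_weight(_len: u32) -> u64 {
    store_weight(0)
}

fn main() {
    for len in [0u32, 1_000, 100_000] {
        println!(
            "len = {:>6}: size-aware = {:>9}, fixed = {:>6}",
            len,
            store_weight(len),
            fixed_weight(len)
        );
    }
}
```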

XCM Arbitrary Execution

  • Challenges and Risks
    Poorly configured XCM (Cross-Consensus Messaging) might allow attackers to tamper with the system or perform unauthorised actions.
  • Mitigation
    Restrict access to the XCM execute and send operations (access control) until their use has been proven safe.
  • Example
    A configuration that sets the XCM executor's SafeCallFilter to Everything could enable the execution of any call through a Transact instruction.

To fix this, Everything can be replaced with a more restrictive SafeCallFilter, for example one that filters out every call that is not remark_with_event.
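
A std-only sketch of the difference, assuming a trait shaped like FRAME's Contains (a single `contains(&T) -> bool` function) and a hypothetical `RuntimeCall` enum whose variant names are invented for illustration. In a real runtime the filter would be set in the XCM executor configuration rather than called directly.

```rust
/// Mirrors the shape of a `Contains`-style filter: return true to allow.
trait Contains<T> {
    fn contains(t: &T) -> bool;
}

/// Hypothetical subset of a runtime's calls.
#[allow(dead_code)]
#[derive(Debug)]
enum RuntimeCall {
    RemarkWithEvent { remark: Vec<u8> },
    TransferAll { dest: u32 },
    SetCode { code: Vec<u8> },
}

/// Permissive filter: lets every call through a `Transact` instruction.
struct Everything;
impl<T> Contains<T> for Everything {
    fn contains(_: &T) -> bool {
        true
    }
}

/// Restrictive filter: only `remark_with_event` may be dispatched.
struct SafeCallFilter;
impl Contains<RuntimeCall> for SafeCallFilter {
    fn contains(call: &RuntimeCall) -> bool {
        matches!(call, RuntimeCall::RemarkWithEvent { .. })
    }
}

fn main() {
    let calls = [
        RuntimeCall::RemarkWithEvent { remark: b"hello".to_vec() },
        RuntimeCall::TransferAll { dest: 2_000 },
        RuntimeCall::SetCode { code: vec![0xde, 0xad] },
    ];
    for call in &calls {
        println!(
            "{:?}: Everything = {}, SafeCallFilter = {}",
            call,
            <Everything as Contains<RuntimeCall>>::contains(call),
            SafeCallFilter::contains(call)
        );
    }
}
```

An allow-list like this fails closed: any call added to the runtime later is rejected until someone deliberately permits it.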

XCM DoS

  • Challenges and Risks
    Inadequate XCM set-up might allow attackers to overload the system, slowing it down or even stopping it.
  • Mitigation
    Configure XCM to filter incoming messages and to allow interaction only with trusted parachains. This helps avoid DoS (Denial of Service) attacks.
  • Example
    A parachain affected by the XCM Arbitrary Execution risk makes it possible for an attacker to use the send function to spam other chains with XCMs. If the receiving chain does not implement a good filter, the messages could create a bottleneck in its XCM queues, potentially preventing it from receiving any new messages. It can even lead to the chain dropping incoming messages.
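
A toy model of the receiving side, assuming a bounded inbound queue and an allow-list of trusted parachain IDs. Both are hypothetical simplifications; real XCM queues and barriers differ in detail. Rejecting untrusted senders before enqueueing keeps a spammer from consuming the queue space that trusted senders depend on.

```rust
use std::collections::{HashSet, VecDeque};

#[derive(Debug, PartialEq)]
enum Rejected {
    UntrustedOrigin,
    QueueFull,
}

struct InboundQueue {
    trusted: HashSet<u32>, // hypothetical allow-list of parachain IDs
    capacity: usize,       // hypothetical bound on queued messages
    queue: VecDeque<(u32, Vec<u8>)>,
}

impl InboundQueue {
    fn receive(&mut self, origin_para: u32, msg: Vec<u8>) -> Result<(), Rejected> {
        // Drop messages from unknown chains before they take up any queue space.
        if !self.trusted.contains(&origin_para) {
            return Err(Rejected::UntrustedOrigin);
        }
        // The bound keeps the queue from growing without limit even for trusted senders.
        if self.queue.len() >= self.capacity {
            return Err(Rejected::QueueFull);
        }
        self.queue.push_back((origin_para, msg));
        Ok(())
    }
}

fn main() {
    let mut q = InboundQueue {
        trusted: HashSet::from([1_000]),
        capacity: 2,
        queue: VecDeque::new(),
    };
    // A flood from an untrusted chain leaves the queue untouched.
    for _ in 0..100 {
        let _ = q.receive(2_000, vec![0]);
    }
    assert_eq!(q.queue.len(), 0);
    assert_eq!(q.receive(1_000, vec![1]), Ok(()));
}
```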

Unsafe Arithmetic

  • Challenges and Risks
    Unsafe maths operations can produce wrong results due to overflows/underflows (in Rust release builds, integer overflow wraps around silently by default instead of panicking). This might open a door for attackers to trick the system and cause serious inconsistencies.
  • Mitigation
    Use checked maths functions such as checked_add or checked_sub, which return None instead of overflowing, and review the code for unchecked operations and fix them.
  • Example
    Consider a sample extrinsic used to transfer tokens that does not check the amount being transferred before changing the balances, which can lead to an overflow or an underflow. This can be fixed by using checked_sub in the new_sender_balance calculation and checked_add in the new_receiver_balance calculation.
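
A minimal, std-only sketch of that fix, assuming balances are plain u64 values in a hypothetical HashMap rather than pallet storage. Both updates use checked arithmetic, and nothing is written until every check has passed.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum TransferError {
    InsufficientBalance,
    Overflow,
}

/// Hypothetical balances map: account id -> balance.
fn transfer(
    balances: &mut HashMap<u32, u64>,
    from: u32,
    to: u32,
    amount: u64,
) -> Result<(), TransferError> {
    // A self-transfer is a no-op; otherwise the two writes below would mint `amount`.
    if from == to {
        return Ok(());
    }
    let sender_balance = balances.get(&from).copied().unwrap_or(0);
    let receiver_balance = balances.get(&to).copied().unwrap_or(0);

    // checked_sub returns None instead of wrapping below zero.
    let new_sender_balance = sender_balance
        .checked_sub(amount)
        .ok_or(TransferError::InsufficientBalance)?;
    // checked_add returns None instead of wrapping past u64::MAX.
    let new_receiver_balance = receiver_balance
        .checked_add(amount)
        .ok_or(TransferError::Overflow)?;

    // Write only after both calculations succeeded.
    balances.insert(from, new_sender_balance);
    balances.insert(to, new_receiver_balance);
    Ok(())
}

fn main() {
    let mut balances = HashMap::from([(1u32, 100u64), (2u32, u64::MAX - 10)]);
    // Underflow is rejected rather than wrapping to a huge sender balance.
    assert_eq!(transfer(&mut balances, 1, 3, 150), Err(TransferError::InsufficientBalance));
    // Overflow is rejected rather than wrapping the receiver's balance to near zero.
    assert_eq!(transfer(&mut balances, 1, 2, 50), Err(TransferError::Overflow));
    assert_eq!(transfer(&mut balances, 1, 3, 40), Ok(()));
    println!("{:?}", balances);
}
```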

Unsafe Conversion

  • Challenges and Risks
    Converting a number from one type to another without checks can introduce errors that attackers may take advantage of.
  • Mitigation
    Check every numeric conversion carefully to make sure there are no mistakes, and avoid downcasting conversions where possible. Review the code for unsafe conversions and fix them.
  • Example
    An extrinsic from the Frontier pallet used to call low_u64 to downcast the gas_limit variable so that it could be used in the Runner::call method. However, low_u64 keeps only the lower 64 bits, so larger values silently wrap around. To fix it, the code was changed to use unique_saturated_into, which caps the value at the maximum value of u64 if it was higher.
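
The difference between truncating and saturating can be shown with std types only. This sketch uses u128 as a stand-in for a 256-bit integer, `as u64` to mimic keeping only the low 64 bits, and `u64::try_from(..).unwrap_or(u64::MAX)` to mimic a saturating conversion; the helper names are hypothetical.

```rust
use std::convert::TryFrom;

/// Mimics a low-bits downcast: keeps only the low 64 bits, silently discarding the rest.
fn truncate_to_u64(gas_limit: u128) -> u64 {
    gas_limit as u64
}

/// Mimics a saturating conversion: anything above u64::MAX becomes u64::MAX.
fn saturate_to_u64(gas_limit: u128) -> u64 {
    u64::try_from(gas_limit).unwrap_or(u64::MAX)
}

fn main() {
    let huge: u128 = u64::MAX as u128 + 5;
    println!("truncated: {}", truncate_to_u64(huge));
    println!("saturated: {}", saturate_to_u64(huge));
}
```

Here a request for slightly more than u64::MAX gas becomes a limit of 4 after truncation, while the saturating version keeps it at u64::MAX, which preserves the caller's intent of "as much as possible".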

Replay Issues

  • Challenges and Risks
    Bad handling of nonces might allow attackers to repeat transactions and slow down the system.
  • Mitigation
    Always make sure nonces are handled correctly in your system's logic, and add checks so that transactions cannot be replayed.
  • Example
    A security issue discovered in Frontier was that transactions were not validated in the State Transition Function (STF), the logic executed when a block is built and imported, due to the use of a new function, validate_self_contained, which is not part of the STF. This meant that a malicious validator could include invalid transactions in a block, and even reuse transactions from a different chain. To fix this, the transaction validations (including nonce checks) were added back to the STF.
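
A toy sketch of checks that belong in block execution itself, assuming each transaction carries a chain ID and a per-account nonce. The types and the `CHAIN_ID` value are hypothetical, and signature verification is omitted. A transaction is applied only if it targets this chain and its nonce equals the sender's next expected nonce, which is then incremented, so the same transaction cannot be applied twice.

```rust
use std::collections::HashMap;

const CHAIN_ID: u64 = 42; // hypothetical ID of this chain

struct Tx {
    chain_id: u64,
    sender: u32,
    nonce: u64,
}

#[derive(Debug, PartialEq)]
enum TxError {
    WrongChain,
    BadNonce { expected: u64, got: u64 },
    NonceOverflow,
}

struct State {
    nonces: HashMap<u32, u64>,
}

impl State {
    /// Validation that runs as part of applying the transaction (in the STF),
    /// not only in a pre-check that a block author could skip.
    fn apply(&mut self, tx: &Tx) -> Result<(), TxError> {
        // Rejects transactions signed for a different chain.
        if tx.chain_id != CHAIN_ID {
            return Err(TxError::WrongChain);
        }
        let expected = self.nonces.get(&tx.sender).copied().unwrap_or(0);
        if tx.nonce != expected {
            return Err(TxError::BadNonce { expected, got: tx.nonce });
        }
        let next = expected.checked_add(1).ok_or(TxError::NonceOverflow)?;
        self.nonces.insert(tx.sender, next);
        // ... apply the transaction's effects here ...
        Ok(())
    }
}

fn main() {
    let mut state = State { nonces: HashMap::new() };
    let tx = Tx { chain_id: CHAIN_ID, sender: 1, nonce: 0 };
    assert_eq!(state.apply(&tx), Ok(()));
    // Replaying the exact same transaction fails because the nonce has moved on.
    println!("{:?}", state.apply(&tx));
}
```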

Outdated Crates

  • Challenges and Risks
    Using old, unsafe or incompatible dependencies opens up many risks: attackers could exploit known weaknesses to harm the system.
  • Mitigation
    Keep your dependencies (crates) on up-to-date, secure versions, and keep track of newly disclosed vulnerabilities and fixes.
  • Example
    A pallet that uses dependencies from different versions of Substrate can run into serious incompatibility problems. To fix this, always use the same version for all Substrate dependencies.

Verbosity Issues

  • Challenges and Risks
    Lack of detailed logs from collators, nodes, or RPC can make it difficult to diagnose issues, especially when crashes or network halts happen.
  • Mitigation
    Add logging to the critical parts of your pallets, review the logs regularly to identify any suspicious activity, and check that they are verbose enough.
  • Example
    When a parachain halts, its engineers need to check all the logs to understand what caused it. Without sufficient logging, engineers might spend days getting to the root of the problem, wasting precious time. Consensus systems are complex and rarely halt, but when they do, it is difficult to recreate the scenario that led to it. A good logging system can therefore help to reduce downtime.

Security is the bedrock of any blockchain system. Taking it for granted can have dire results, even defeating the system's entire purpose, so it should never be an afterthought. Investing time and resources in understanding these risks and how to mitigate them is crucial for anyone involved in Substrate/Polkadot development who wants to build systems that last.

This is why we invite you to take a closer look with us at our workshop at the Sub0 event on the 19th September, called Common Substrate/Polkadot Vulnerabilities and Hands-on Mitigations. There we will discuss security risks often seen in Substrate/Polkadot development and, together, dive deep into application security topics. It will be a great chance to learn more about how to secure your Substrate/Polkadot projects, and we will keep you updated after the workshop, so stay tuned!

Stay safe and see you at Sub0!

@gioyik @patricio


Amazing list! I would also like to contribute.

Based on my experience auditing several different Substrate chains, one additional issue I would add is:

  • Batch Processing DoS

I recently found this vulnerability again in one of my engagements.

It typically occurs in privileged extrinsics or hooks.

It happens when one invalid item in a batch causes the whole batch to fail. A typical mistake is returning an error and stopping execution instead of skipping the invalid item. In some cases this is intended, but in others it can lead to issues.

If a batch operation regularly reads items (put there by users) from storage, a malicious actor can intentionally craft an invalid item in storage that causes the extrinsic to fail on every run.
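
A minimal sketch of the skip-instead-of-abort pattern, assuming a hypothetical list of user-submitted items processed by a hook-like function and a hypothetical validation rule (zero amounts are invalid). Invalid items are counted and skipped, so one crafted item cannot block the rest. Whether skipping is correct depends on the batch's intended semantics, as noted above.

```rust
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Item {
    owner: u32,
    amount: u64,
}

#[derive(Debug, PartialEq)]
struct BatchReport {
    processed: usize,
    skipped: usize,
}

/// Hypothetical per-item validation: zero amounts are invalid.
fn validate(item: &Item) -> Result<(), &'static str> {
    if item.amount == 0 {
        Err("zero amount")
    } else {
        Ok(())
    }
}

/// Vulnerable pattern: the first invalid item aborts the whole batch.
fn process_all_or_nothing(items: &[Item]) -> Result<usize, &'static str> {
    for item in items {
        validate(item)?;
    }
    Ok(items.len())
}

/// Safer pattern: invalid items are skipped so valid ones still go through.
fn process_skipping_invalid(items: &[Item]) -> BatchReport {
    let mut report = BatchReport { processed: 0, skipped: 0 };
    for item in items {
        match validate(item) {
            Ok(()) => report.processed += 1, // apply the item's effect here
            Err(_) => report.skipped += 1,   // e.g. emit an event and drop the item
        }
    }
    report
}

fn main() {
    let items = vec![
        Item { owner: 1, amount: 10 },
        Item { owner: 2, amount: 0 }, // crafted by a malicious user
        Item { owner: 3, amount: 5 },
    ];
    println!("{:?}", process_all_or_nothing(&items));
    println!("{:?}", process_skipping_invalid(&items));
}
```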

In general, I would love to chat and contribute to better security in the Substrate/Polkadot ecosystem.

Hi Timur,

Thanks very much for the contribution. We would love to chat and will be in touch for a call.

Best,

Hello, thank you.
About the second issue, Storage Exhaustion:
There is no such issue in Ethereum, so how could I read and learn more about it?

It would be great to add some resources, for example for Unbounded Decoding and Insufficient Benchmarking, as these issues are new to me.
For Unbounded Decoding, I couldn't find why a revert in a single transaction could impact validators, and for Insufficient Benchmarking, I couldn't find how it could slow down the network.

Hello, thanks for the questions.

Ethereum, like Frontier in Polkadot, mitigates the risk of storage exhaustion through its gas fee mechanism, which charges for the computational and storage resources used by transactions and smart contracts. This discourages wasteful use of storage. However, when developing a parachain, you have more flexibility, and for custom pallets you need to define your own "gas fee mechanism" for storage usage. You can learn more in the related questions and answers on Stack Exchange.


Thanks for the interest. You can check this presentation for a more detailed view of the security risks. To learn more, I suggest checking out the Substrate and Polkadot Stack Exchange, where you can find quite useful information.

About Unbounded Decoding: the issue here is that the transaction will not revert, because the runtime will panic. Panics need to be avoided in Substrate because they can lead to data corruption (if storage was written just before the panic) or parachain stalling (if the logic of a mandatory extrinsic causes a panic). For example, if you have an unbounded decoding in the on_initialize hook, the runtime will panic and, as a consequence, your parachain will stall (every attempt to execute the block panics again), requiring governance intervention to fix the system.

About Insufficient Benchmarking: it can slow down the network if your extrinsic weights are overestimated, because the nodes running the parachain/relay chain will include fewer extrinsics per block to avoid being overloaded.
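
The throughput effect is simple arithmetic: the number of extrinsics that fit in a block is roughly the block's weight limit divided by the weight charged per extrinsic. A sketch with made-up numbers (not real Polkadot limits, and ignoring block length limits and weight reserved for mandatory work) shows that doubling the charged weight halves capacity:

```rust
/// How many extrinsics with the same charged weight fit within a block's weight limit.
/// A zero weight is treated as invalid input here and yields 0.
fn max_extrinsics_per_block(block_weight_limit: u64, weight_per_extrinsic: u64) -> u64 {
    block_weight_limit.checked_div(weight_per_extrinsic).unwrap_or(0)
}

fn main() {
    let block_limit = 2_000_000_000; // hypothetical block weight limit
    let accurate = 1_000_000; // hypothetical benchmarked weight
    let overestimated = 2 * accurate;

    println!("accurate weights:      {}", max_extrinsics_per_block(block_limit, accurate));
    println!("overestimated weights: {}", max_extrinsics_per_block(block_limit, overestimated));
}
```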