Security in development with Substrate
Substrate is a software framework used for building blockchain networks. Because it is a base layer for many different types of projects, it is crucial to ensure that it is secure. Just as the OWASP Top 10 list highlights key security risks for web applications (among other areas), there are relatively common security concerns specific to Substrate/Polkadot. These include how data is stored, how users gain access to the system, and how transactions are made.
It is not possible to completely secure any system, nor to predict all weaknesses that a system might have. It is, however, possible to analyse past audits and issues in multiple repositories of the Substrate/Polkadot ecosystem and compile and curate a list of the most often found security risks and vulnerabilities.
The main objective of this list is to help developers, new or not, build safer code on Substrate. Knowing where the most common weak spots are, they can make sure to check and secure them.
This is a short introduction to the items on this list, the risks they imply and recommendations on how to mitigate them, in no particular order. It is not exhaustive, but it is an excellent place to start securing your systems.
Thanks a lot to all the developers for their help in putting this list together, including on the remediation/mitigation side. And thanks to all the security experts partnering on this.
List of common security risks to be aware of while developing with Substrate/Polkadot
Insecure Randomness
- Challenges and Risks
Weak random numbers can compromise key features like lotteries or voting, allowing attackers to guess or manipulate the numbers to trick the system.
- Mitigation
Choose a strong method for generating random numbers in your pallets depending on your needs. If you cannot trust all validators, find a custom trusted oracle. Otherwise, you can use a method such as a VRF (Verifiable Random Function), which Polkadot uses in processes like auctions. To be sure the system is secure, regularly check that these methods are safe and working as they should.
- Example
An example of an insecure approach is the Randomness Collective Flip pallet from Substrate, which provides a random function that generates low-influence random values based on the block hashes from the previous 81 blocks. Here is how this pallet generates the randomness:
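The pallet's own code is not reproduced here. As a rough stand-in, the sketch below models the underlying weakness in plain Rust: a value derived only from public block hashes can be recomputed by anyone who sees the same hashes. The function names, the u64 stand-in for a block hash and the use of the standard library's DefaultHasher are assumptions made for illustration; none of them is the pallet's API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for a block hash (assumption: real chains use 32-byte hashes).
type BlockHash = u64;

/// Derives a "random" value from public inputs only: the recent block
/// hashes and a caller-chosen subject. Nothing here is secret.
fn random_from_block_hashes(recent_hashes: &[BlockHash], subject: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    subject.hash(&mut hasher);
    for block_hash in recent_hashes {
        block_hash.hash(&mut hasher);
    }
    hasher.finish()
}

/// An observer repeats exactly the same computation off-chain, so the
/// outcome is known before any extrinsic relying on it is executed.
fn attacker_predicts(recent_hashes: &[BlockHash], subject: &[u8]) -> u64 {
    random_from_block_hashes(recent_hashes, subject)
}
```

Calling both functions with the same 81 hashes and the same subject yields the same number, which is why a block author who can influence block hashes can also influence the result.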
A more secure approach would be to use VRF from Pallet BABE as Polkadot uses in its auctions. Here is how this pallet generates randomness:
Storage Exhaustion
- Challenges and Risks
If the charge for storage is set too low, it is not an effective barrier against malicious use of storage. Attackers might exploit this to make the system slow and costly to run.
- Mitigation
Make sure you charge an adequate amount, and that you charge explicitly for storage (deposit). Regularly check that these rules are followed correctly and, if possible, limit the amount of data that can be saved to storage.
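A minimal sketch of this mitigation in plain Rust, assuming a deposit made of a base amount plus a per-byte amount, and a hard cap on the stored data. The constants, the error type and the store function are hypothetical and only model the idea; a real pallet would reserve the deposit through its currency abstraction.

```rust
/// Hypothetical parameters; real values depend on the chain's economics.
const DEPOSIT_BASE: u128 = 1_000;
const DEPOSIT_PER_BYTE: u128 = 10;
const MAX_DATA_LEN: usize = 1_024;

#[derive(Debug, PartialEq)]
enum StoreError {
    TooLarge,
    InsufficientBalance,
    Overflow,
}

/// Deposit required to store `len` bytes, or None on arithmetic overflow.
fn required_deposit(len: usize) -> Option<u128> {
    DEPOSIT_PER_BYTE
        .checked_mul(len as u128)?
        .checked_add(DEPOSIT_BASE)
}

/// Rejects oversized data, then takes the deposit from `free_balance`.
/// Returns the reserved amount; the balance is untouched on error.
fn store(free_balance: &mut u128, data: &[u8]) -> Result<u128, StoreError> {
    if data.len() > MAX_DATA_LEN {
        return Err(StoreError::TooLarge);
    }
    let deposit = required_deposit(data.len()).ok_or(StoreError::Overflow)?;
    *free_balance = free_balance
        .checked_sub(deposit)
        .ok_or(StoreError::InsufficientBalance)?;
    Ok(deposit)
}
```

With these example values, storing 50 bytes reserves 1,500 units, so filling storage with junk becomes proportionally expensive.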
Unbounded Decoding
- Challenges and Risks
Failing to set a depth limit for decoding can break the system, as attackers might stop the network from working properly by forcing a stack overflow.
- Mitigation
Set a depth limit for decoding objects such as calls (extrinsics), and always make sure that the code follows secure best practices.
- Example
In the following example an extrinsic can be used to dispatch a call and the arguments are encoded, so the call to be dispatched is decoded using the decode method. Someone could craft a highly nested call and cause a stack overflow in the pallet that could lead to the validators not being able to generate new blocks. To fix this, the decode method can be substituted by decode_with_depth_limit.
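The original listing is not reproduced here. To illustrate the idea behind a depth limit, the sketch below hand-writes a tiny recursive decoder in plain Rust. The Call type, the one-byte tag format and the function decode_call_with_depth_limit are invented for this example; in a real pallet the limit would come from the codec library's own depth-limited decoding rather than from custom code.

```rust
#[derive(Debug, PartialEq)]
enum Call {
    Remark,
    /// A call that wraps another call, e.g. a dispatch-style wrapper.
    Nested(Box<Call>),
}

#[derive(Debug, PartialEq)]
enum DecodeError {
    DepthLimitReached,
    UnexpectedEnd,
    UnknownTag(u8),
}

/// Toy format (assumption): tag 0 is a leaf call, tag 1 wraps the call
/// that follows. Each level of nesting consumes one unit of `limit`.
fn decode_call_with_depth_limit(limit: u32, input: &mut &[u8]) -> Result<Call, DecodeError> {
    let bytes: &[u8] = *input;
    let (tag, rest) = bytes.split_first().ok_or(DecodeError::UnexpectedEnd)?;
    *input = rest;
    match *tag {
        0 => Ok(Call::Remark),
        1 => {
            // Refuse to recurse further instead of exhausting the stack.
            if limit == 0 {
                return Err(DecodeError::DepthLimitReached);
            }
            let inner = decode_call_with_depth_limit(limit - 1, input)?;
            Ok(Call::Nested(Box::new(inner)))
        }
        other => Err(DecodeError::UnknownTag(other)),
    }
}
```

A payload of a thousand nested wrappers is rejected with DepthLimitReached after a handful of recursion steps, while a shallow call decodes normally.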
Insufficient Benchmarking
- Challenges and Risks
Incorrect or missing benchmarking can slow down the network and can also let attackers spam the system.
- Mitigation
Run benchmarks under worst-case conditions. For example, the benchmark should cover the execution path in which an extrinsic performs the most DB reads and writes. The primary goal is to keep the runtime safe; the secondary goal is to be as accurate as possible to maximise throughput.
- Example
The following extrinsic, from a modified version of the remark pallet, stores data on-chain.
However, it doesn’t consider the size of the remark (the data to be stored). As a result the weight is not calculated correctly: operations like blake2_256, whose computation cost depends heavily on the size of the input, end up underweighted. To fix it, benchmarks like this one can be implemented:
Here it is possible to see that the length (l) of the remark is considered, and a vector of that size is created. The benchmark linearly tries multiple values for this size. Now it is only necessary to pass the length of the remark in the weight function like this: T::WeightInfo::store(remark.len() as u32)
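As a simplified model of the difference, the sketch below compares a constant weight with a weight that grows linearly with the remark length. The constants are hypothetical placeholders for benchmark results, and the plain functions stand in for a generated weight function such as the store weight mentioned above.

```rust
/// Hypothetical benchmark results, in abstract weight units.
const BASE_WEIGHT: u64 = 10_000;
const WEIGHT_PER_BYTE: u64 = 50;

/// Under-weighted variant: ignores the size of the remark entirely.
fn constant_weight() -> u64 {
    BASE_WEIGHT
}

/// Length-aware variant: a fixed base plus a component that is linear
/// in the remark length `l`.
fn store_weight(l: u32) -> u64 {
    BASE_WEIGHT.saturating_add(WEIGHT_PER_BYTE.saturating_mul(l as u64))
}
```

With these example numbers a 1,000-byte remark costs 60,000 units under the length-aware function but only 10,000 under the constant one, which is the gap an attacker could use to spam large remarks cheaply.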
XCM Arbitrary Execution
- Challenges and Risks
A poorly configured XCM (Cross-Consensus Messaging) set-up might allow attackers to tamper with the system or perform unauthorised actions.
- Mitigation
Limit the usage (Access Control) of the XCM execute and send operations until they have been proven to be safe.
- Example
The following configuration could enable the execution of any Transact instruction.
To fix this, the Everything value can be replaced with a more restrictive SafeCallFilter like this one:
This filter rejects every call other than remark_with_event.
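The original configuration listings are not reproduced here. The sketch below models the two filters in plain Rust so that the difference can be seen in isolation. The Contains trait, the RuntimeCall enum and the transact_allowed helper are local stand-ins written for this example and are not the runtime's actual types.

```rust
/// Toy set of runtime calls (assumption: a real runtime has many more).
enum RuntimeCall {
    RemarkWithEvent(Vec<u8>),
    Transfer { to: u64, amount: u128 },
    SetCode(Vec<u8>),
}

/// Local stand-in for a "does this set contain the value?" filter trait.
trait Contains<T> {
    fn contains(t: &T) -> bool;
}

/// Permissive filter: lets every call through.
struct Everything;
impl<T> Contains<T> for Everything {
    fn contains(_: &T) -> bool {
        true
    }
}

/// Restrictive filter: only remark_with_event is allowed.
struct SafeCallFilter;
impl Contains<RuntimeCall> for SafeCallFilter {
    fn contains(call: &RuntimeCall) -> bool {
        matches!(call, RuntimeCall::RemarkWithEvent(_))
    }
}

/// Decides whether a Transact-style instruction may dispatch `call`.
fn transact_allowed<F: Contains<RuntimeCall>>(call: &RuntimeCall) -> bool {
    F::contains(call)
}
```

With Everything as the filter, a remote message could dispatch SetCode or Transfer; with SafeCallFilter only the harmless remark passes.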
XCM DoS
- Challenges and Risks
Inadequate XCM set-up might allow attackers to overload the system, slowing it down or even stopping it.
- Mitigation
Set up XCM correctly to filter incoming calls and allow only interaction with trusted parachains. This will help avoid DoS (Denial of Service) attacks.
- Example
A parachain affected by the XCM Arbitrary Execution risk makes it possible for an attacker to use the send function to spam XCMs to other chains. If the chain receiving the messages does not implement a good filter, the messages could cause a bottleneck in its XCM queues, potentially stopping the chain from receiving any new ones. It can even lead to the chain dropping incoming messages.
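A minimal sketch of the receiving side, assuming an allowlist of trusted parachain IDs that is checked before a message is queued. The IDs, the InboundQueue type and its receive method are hypothetical and only model the filtering step, not a real XCM configuration.

```rust
/// Hypothetical allowlist of trusted parachain IDs.
const TRUSTED_PARAS: [u32; 2] = [1000, 2000];

#[derive(Debug, PartialEq)]
enum Rejected {
    UntrustedOrigin,
}

/// Toy inbound queue holding (origin parachain, message bytes) pairs.
struct InboundQueue {
    messages: Vec<(u32, Vec<u8>)>,
}

impl InboundQueue {
    fn new() -> Self {
        InboundQueue { messages: Vec::new() }
    }

    /// Queues a message only when its origin is on the allowlist, so spam
    /// from other chains never occupies queue space.
    fn receive(&mut self, origin_para: u32, message: Vec<u8>) -> Result<(), Rejected> {
        if !TRUSTED_PARAS.contains(&origin_para) {
            return Err(Rejected::UntrustedOrigin);
        }
        self.messages.push((origin_para, message));
        Ok(())
    }
}
```

In this model a thousand messages from an unlisted parachain leave the queue empty, while a message from a trusted one is accepted.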
Unsafe Arithmetic
- Challenges and Risks
Unsafe maths operations can lead to wrong calculation results due to overflows/underflows. This might open a door for attackers to trick the system and cause serious inconsistencies.
- Mitigation
Use safe maths functions that check for errors, like checked_add or checked_sub. Review the code for unsafe maths operations and fix them.
- Example
The following sample extrinsic can be used to transfer tokens. However, it does not check the amount being transferred before changing the balances, which can lead to an overflow or an underflow. This can be fixed by using checked_sub in the new_sender_balance calculation and checked_add in the new_receiver_balance calculation.
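The original listing is not reproduced here. The fix described above can be sketched in plain Rust as follows, with a HashMap standing in for the pallet's balance storage. The transfer function, the TransferError type and the u32 balance type are assumptions made for this example.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum TransferError {
    InsufficientBalance,
    Overflow,
}

/// Moves `amount` from `sender` to `receiver`. Both new balances are
/// computed with checked arithmetic before anything is written, so a
/// failed transfer leaves the map unchanged.
fn transfer(
    balances: &mut HashMap<u64, u32>,
    sender: u64,
    receiver: u64,
    amount: u32,
) -> Result<(), TransferError> {
    let sender_balance = balances.get(&sender).copied().unwrap_or(0);
    let new_sender_balance = sender_balance
        .checked_sub(amount)
        .ok_or(TransferError::InsufficientBalance)?;

    // A self-transfer changes nothing once the balance check has passed.
    if sender == receiver {
        return Ok(());
    }

    let receiver_balance = balances.get(&receiver).copied().unwrap_or(0);
    let new_receiver_balance = receiver_balance
        .checked_add(amount)
        .ok_or(TransferError::Overflow)?;

    balances.insert(sender, new_sender_balance);
    balances.insert(receiver, new_receiver_balance);
    Ok(())
}
```

With plain subtraction and addition instead, the same inputs would either panic or wrap around, depending on how the code is built.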
Unsafe Conversion
- Challenges and Risks
Converting one numeric type to another without checking the result can lead to errors, which attackers may take advantage of.
- Mitigation
Double-check every conversion between numeric types to make sure that there are no mistakes. Try to avoid downcasting conversions. Review the code for any unsafe conversions and fix them.
- Example
The following extrinsic, from the Frontier pallet, used to be implemented like this. It used low_u64 to downcast the gas_limit variable so that it could be passed to the Runner::call method. However, this keeps only the lowest 64 bits, so a larger value silently wraps around to an unrelated smaller one. To fix it, the call was changed to unique_saturated_into, which saturates the value to the maximum of u64 if the original value was higher.
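The original listing is not reproduced here. The difference between the two conversions can be shown in plain Rust, using u128 as a stand-in for the wider integer type. The helper names truncate_to_u64 and saturate_to_u64 are made up for this sketch and only mimic the behaviour described above.

```rust
/// Keeps only the lowest 64 bits, discarding everything above them.
fn truncate_to_u64(value: u128) -> u64 {
    value as u64
}

/// Clamps to u64::MAX when the value does not fit into 64 bits.
fn saturate_to_u64(value: u128) -> u64 {
    if value > u64::MAX as u128 {
        u64::MAX
    } else {
        value as u64
    }
}
```

For a gas limit of 2^64 + 21,000, truncation yields 21,000, a plausible-looking but wrong value, while saturation yields u64::MAX, which at least preserves the fact that the value was too large.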
Replay Issues
- Challenges and Risks
Bad handling of nonces might allow attackers to repeat transactions and slow down the system.
- Mitigation
Always ensure nonces are correctly set up in the logic of your system, and use checks to make sure transactions cannot be repeated.
- Example
A security issue discovered in Frontier was that transactions were not validated in the State Transition Function (STF), which is important when a block is being made, because validation relied on a new function, validate_self_contained, that is not part of the STF. This means that a malicious validator could submit invalid transactions, and even reuse transactions from a different chain. To fix this, the transaction validations (nonces) were added back to the STF.
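As a simplified model of the checks involved, the sketch below validates the chain ID and the nonce inside the same function that applies the transaction, so that no execution path can skip them. The Tx type, the CHAIN_ID value and the validate_and_apply function are hypothetical and are not Frontier's implementation.

```rust
use std::collections::HashMap;

/// Hypothetical identifier of the chain this runtime belongs to.
const CHAIN_ID: u64 = 42;

#[derive(Debug, PartialEq)]
enum TxError {
    WrongChain,
    StaleNonce,
    FutureNonce,
    NonceOverflow,
}

struct Tx {
    chain_id: u64,
    sender: u64,
    nonce: u64,
}

/// Validates the transaction and, only if it is valid, bumps the
/// sender's nonce. A replayed transaction fails with StaleNonce.
fn validate_and_apply(nonces: &mut HashMap<u64, u64>, tx: &Tx) -> Result<(), TxError> {
    if tx.chain_id != CHAIN_ID {
        return Err(TxError::WrongChain);
    }
    let expected = nonces.get(&tx.sender).copied().unwrap_or(0);
    if tx.nonce < expected {
        return Err(TxError::StaleNonce);
    }
    if tx.nonce > expected {
        return Err(TxError::FutureNonce);
    }
    let next = expected.checked_add(1).ok_or(TxError::NonceOverflow)?;
    nonces.insert(tx.sender, next);
    Ok(())
}
```

Submitting the same transaction twice succeeds the first time and is rejected the second time, and a transaction signed for another chain ID is rejected outright.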
Outdated Crates
- Challenges and Risks
Using old, unsafe or incompatible code parts can open up many risks. Attackers could use known weaknesses to harm the system.
- Mitigation
Always use the newest and safest versions of your dependencies (crates), and keep track of any new risks and fixes.
- Example
The following pallet is using dependencies from different versions of Substrate. This could lead to serious incompatibility problems. To fix this, always ensure the usage of the same version in all Substrate dependencies.
Verbosity Issues
- Challenges and Risks
Lack of detailed logs from collators, nodes, or RPC can make it difficult to diagnose issues, especially when crashes or network halts happen.
- Mitigation
Implement logs in the critical parts of your pallets, regularly review them to identify any suspicious activity, and determine if there is sufficient verbosity.
- Example
When a parachain halts, its engineers need to check all the logs to understand what caused it. If logging is insufficient, engineers might need to spend days getting to the root of the problem, wasting precious time. Consensus systems are complex and almost never halt, but when they do, it is difficult to recreate the scenario that led to it. Implementing a good logging system can therefore help to reduce downtime.