Offchain Workers: Design Assumptions & Vulnerabilities


I joined Parity in late 2021. I was initially on the Ecosystem Success team, and now I’m part of the Delivery Services team. During this time, I’ve been working closely with the Substrate Builders Program (SBP), mostly doing Milestone Reviews, where I dive into a team's codebase and give them feedback on Substrate development best practices.

With enough repetition, patterns start to emerge. Many of them are small things that are common among developers who weren’t previously familiar with Substrate, which is understandably hard to grasp at first.

But one particular pattern has caught my attention: how Offchain Workers (OCWs) are being (mis)used, and the design assumptions made by some teams. There seems to be a systematic misunderstanding of what OCWs are, and (most importantly) what they are not.

This is a sign that there is probably something wrong with the way that the Substrate docs and examples are selling this feature to our community. And something must be done about it, so I’m starting this thread with the intent of:

  • Providing a detailed description of the issues I’m frequently seeing.
  • Creating a space so we can have a community discussion on potential solutions.

Intro on Offchain Workers

Offchain Workers have been a Substrate feature since 2019. They are a subsystem of components that enable the execution of long-running and possibly non-deterministic tasks, such as:

  • website service requests
  • encryption, decryption, and signing of data
  • random number generation
  • CPU-intensive computations
  • enumeration or aggregation of on-chain data

Basically, they’re a convenient service that nodes can provide.

OCWs provide many features, and my intent is not to cover every single one of them. More specifically, I’m choosing to explicitly ignore:

  • Offchain Indexing
  • Offchain Storage
  • Concurrency primitives

Instead, I want to focus on the context of submitting the result of OCW computation into On-Chain Storage via:

  • Signed Transactions
  • Unsigned Transactions w/ Signed Payloads
  • Unsigned Transactions

It is also worth noting that the current implementation of OCWs still lacks some desirable features, namely:

  • Guaranteed execution: currently OCWs are at the will of the client’s “major sync oracle”, which means OCWs will not execute if the node is undergoing a “major sync” event.
  • Execution on finality: many use cases would prefer that OCWs act on finalized blocks as opposed to every imported (but potentially discarded) block.

Common Misconceptions on Offchain Workers

The term “Runtime” is used somewhat ambiguously in Substrate development. Depending on the context, it can mean:

  • The WASM blob
  • The State Transition Function (STF)

So a lot of confusion comes from the fact that the State Transition Function (a.k.a. “Runtime”) and the Offchain Worker live in the same WASM blob (a.k.a. “Runtime”).

Many developers tend to interpret this as:

the OCW lives inside the “Runtime”, therefore it has execution privileges.

Which is absolutely NOT TRUE! This only means that OCW logic will also be updated during “Runtime Upgrades” and that OCW-enabled nodes will execute it as a service.

However, the STF still treats the OCW as a foreign entity, with absolutely NO execution privileges!

Understanding this conceptual differentiation is extremely important to avoid misleading Design Assumptions that will definitely result in vulnerabilities when taken into production.
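To make the distinction concrete, here is a deliberately simplified model in plain std Rust (not FRAME; `Origin`, `Call`, `Runtime`, `apply_extrinsic` and `offchain_worker` are hypothetical stand-ins). The OCW entry point ships in the same code as the STF, but all it can do is build a transaction, which is then dispatched exactly like one crafted by anybody else:

```rust
// Simplified model, not FRAME: every name here is a hypothetical stand-in.

#[derive(Debug, Clone, PartialEq)]
pub enum Origin {
    Signed(u64), // a transaction signed by some account
    None,        // an unsigned transaction
}

#[derive(Debug, Clone, PartialEq)]
pub enum Call {
    SubmitPrice(u32),
}

#[derive(Default)]
pub struct Runtime {
    pub prices: Vec<u32>,
}

impl Runtime {
    /// STF side: the only way to change state. Every transaction goes
    /// through here, regardless of who built it.
    pub fn apply_extrinsic(&mut self, _origin: Origin, call: Call) -> Result<(), &'static str> {
        match call {
            // `_origin` carries no "built by an OCW" flag, so there is
            // nothing to check it against.
            Call::SubmitPrice(price) => {
                self.prices.push(price);
                Ok(())
            }
        }
    }

    /// OCW side: same code base, but it only gets `&self`, so it cannot
    /// write state. All it can do is return a transaction to be submitted.
    pub fn offchain_worker(&self, fetched_price: u32) -> (Origin, Call) {
        (Origin::None, Call::SubmitPrice(fetched_price))
    }
}
```

From the dispatcher's point of view, a transaction returned by `offchain_worker` and an identical one built by hand are indistinguishable.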

Naive Design Assumptions on Offchain Workers

A very popular OCW use-case is writing the output of some OCW computation into the on-chain Storage via:

  • Signed Transactions
  • Unsigned Transactions
  • Unsigned Transactions w/ Signed Payload

Every time this use-case is written into some Substrate chain, careful adversarial modeling must be done. The rationale of such modeling can be summarized as:

What if someone could benefit from writing tampered data into the On-chain Storage, while “pretending” to be an OCW? :thinking: :face_with_monocle:

Signed Transactions

  • Keys are loaded by Admin into Node’s Keystore.
  • Each Transaction pays fees.
    • The on-chain address associated with the OCW keys must have funds to pay the fees.
  • Fees impose a cost for writing data On-Chain.

Naive assumption:

This OCW transaction pays fees. Therefore nobody can spam my network and my Runtime is safe. :angel:

:x: Unless there’s some verification on the signature before execution (which introduces a factor of centralization), anyone with a funded account could be sending this tx.

Adversarial modeling:

What if the benefit of writing tampered data outweighs the fee costs? :money_mouth_face:
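A minimal sketch of the naive signed-transaction handler, assuming a flat per-transaction fee (`FEE`, `Chain`, `submit_price_signed` and `attack_is_profitable` are hypothetical names; real fees come from pallet-transaction-payment and are not flat). The only gate is the ability to pay, so the question really is an economic one:

```rust
use std::collections::HashMap;

// Hypothetical flat fee, for illustration only.
pub const FEE: u64 = 10;

#[derive(Default)]
pub struct Chain {
    pub balances: HashMap<u64, u64>,
    pub prices: Vec<u32>,
}

impl Chain {
    /// Naive handler: only requires *a* signer that can pay the fee.
    pub fn submit_price_signed(&mut self, who: u64, price: u32) -> Result<(), &'static str> {
        let balance = self.balances.get_mut(&who).ok_or("unknown account")?;
        if *balance < FEE {
            return Err("cannot pay fee");
        }
        *balance -= FEE;
        // No check that `who` is the OCW's key: any funded account gets here.
        self.prices.push(price);
        Ok(())
    }
}

/// Tampering is rational whenever the gain exceeds the total fees paid.
pub fn attack_is_profitable(gain_from_tampering: u64, txs_needed: u64) -> bool {
    gain_from_tampering > FEE * txs_needed
}
```

Both the OCW's account and a disposable attacker account pass through the same code path; the fee only changes the attacker's arithmetic, not their ability to write.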

Unsigned Transactions

  • Feeless transaction.
  • Requires a custom implementation of the ValidateUnsigned trait.

Naive assumption:

I implemented the ValidateUnsigned trait. Therefore only the OCW could be sending this extrinsic, and my Runtime is safe. :angel:

:x: All that a ValidateUnsigned implementation achieves is deciding which unsigned calls are accepted, and on what terms (priority, longevity, etc). There’s no validation with regard to who sent them, because an unsigned transaction has no sender by definition.

Adversarial modeling:

Unsigned Transactions are an OPEN DOOR to the Runtime, ANYONE could be sending them FOR FREE. What if someone could benefit from writing tampered data? :skull_and_crossbones:
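Here is a simplified model of a shape-only `validate_unsigned`. The real `ValidateUnsigned::validate_unsigned` takes the same two inputs (a `TransactionSource` and a reference to the call); the enums below are hand-rolled stand-ins for the sp_runtime types, and `Validity` is a hypothetical simplification of `TransactionValidity`. Note that nothing among the inputs identifies a sender:

```rust
// Hand-rolled stand-ins for sp_runtime's types, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TransactionSource {
    InBlock,
    Local,
    External,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Call {
    SubmitPriceUnsigned { price: u32 },
    Transfer { to: u64, amount: u64 },
}

#[derive(Debug, PartialEq)]
pub enum Validity {
    Valid { priority: u64 },
    Invalid,
}

/// Naive validation: accepts the call purely by its *shape*.
/// Its inputs are a source and a call; there is no sender to inspect.
pub fn validate_unsigned_naive(_source: TransactionSource, call: &Call) -> Validity {
    match call {
        Call::SubmitPriceUnsigned { .. } => Validity::Valid { priority: 100 },
        _ => Validity::Invalid,
    }
}
```

An honest OCW submission and a forged one gossiped in from the network receive exactly the same verdict.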

Naive assumption:

I implemented the ValidateUnsigned trait so that it checks for TransactionSource::Local. Therefore only a Validator could be sending this extrinsic, and my Runtime is safe. :angel:

:x: Validators are only subject to slashing if they create a block that violates the STF. Creating valid transactions with tampered data would not result in slashing, and therefore the possibility of malicious Validators in this context is real.

Adversarial modeling:

What if some Validator could benefit from writing tampered data into the on-chain Storage? :smiling_imp:
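A sketch of the `TransactionSource::Local` variant under the same simplifications (`validate_local_only`, `author_block` and `import_block` are hypothetical helpers). It also accepts `InBlock`, because FRAME's default `ValidateUnsigned::pre_dispatch` re-validates unsigned calls with that source when a block is executed; rejecting it would make the author's own blocks fail to import:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TransactionSource {
    InBlock,
    Local,
    External,
}

/// Hypothetical "local only" validation: accepts transactions this node
/// submitted itself, plus those already inside a block being executed.
pub fn validate_local_only(source: TransactionSource, _price: u32) -> bool {
    matches!(source, TransactionSource::Local | TransactionSource::InBlock)
}

/// The block author submits its own transactions locally; honest or not,
/// they all pass.
pub fn author_block(local_prices: &[u32]) -> Vec<u32> {
    local_prices
        .iter()
        .copied()
        .filter(|&p| validate_local_only(TransactionSource::Local, p))
        .collect()
}

/// Every other node re-validates the block's transactions as `InBlock`.
pub fn import_block(block: &[u32]) -> bool {
    block.iter().all(|&p| validate_local_only(TransactionSource::InBlock, p))
}
```

Nothing in this path inspects the price itself, so a validator running modified node software can put any value it likes into its own blocks, and the rest of the network will import them.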

Unsigned Transaction with Signed Payload

  • Keys are loaded by Admin into Node’s Keystore.
  • Feeless transaction, but the payload is signed.
  • Requires a custom implementation of the ValidateUnsigned trait.

Naive assumption:

This OCW transaction has a signature. Therefore nobody will dare send a malicious transaction, because they would be leaving an on-chain trace. :angel:

:x: Sure, there will be an on-chain record of their actions. But anyone can create some disposable keypair! Unless there’s some verification on the signature before execution (which introduces a factor of centralization), malicious actors could be sending this tx.

Adversarial modeling:

What if the benefit of writing tampered data outweighs the cost of leaving a trace? :disguised_face:
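The difference between “has a valid signature” and “is authorized” can be shown with a toy model. The signature scheme below is NOT cryptography (the “public key” equals the secret); it only mimics the sign/verify interface of a real scheme such as sr25519 so the logic runs with std Rust alone. The payload carries the signer's public key, similar in spirit to the example pallet's signed payload, and all function names are hypothetical:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Payload: (price, signer's public key).
pub type Payload = (u32, u64);

// TOY scheme, NOT secure: the "public key" is the secret itself.
// It only mimics the interface of a real signature scheme.
pub fn toy_sign(key: u64, payload: &Payload) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    payload.hash(&mut h);
    h.finish()
}

pub fn toy_verify(public: u64, payload: &Payload, sig: u64) -> bool {
    toy_sign(public, payload) == sig
}

/// Naive check: proves only that the holder of `payload.1` signed it.
pub fn validate_naive(payload: &Payload, sig: u64) -> bool {
    toy_verify(payload.1, payload, sig)
}

/// Also requires the signer to be on an allowlist (e.g. set by governance).
pub fn validate_permissioned(payload: &Payload, sig: u64, authorized: &HashSet<u64>) -> bool {
    authorized.contains(&payload.1) && toy_verify(payload.1, payload, sig)
}
```

A freshly generated key passes `validate_naive` just as the OCW's key does. The allowlist is exactly the “verification on the signature before execution” mentioned above, and it is also where the centralization comes in: someone has to decide who goes into `authorized`.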

On-Chain Storage Finality

Naive assumption:

My OCW made some calculations based on some data that came from the on-chain storage, so whatever it writes back into the on-chain storage is correct. :angel:

:x: The current implementation of OCWs is triggered at every block import, regardless of whether such block is final or not.

There’s no meaningful adversarial modeling on this case, but this naive assumption is still worth mentioning.
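A small simulation illustrates the point. Two blocks compete at the same height, the OCW runs on both because both were imported, and only one of them ends up finalized. The `Block` type, `on_import` and `keep_if_final` are hypothetical; the filter merely sketches what execution on finality would buy:

```rust
/// Hypothetical block: its hash and a value read from its post-state.
#[derive(Debug, Clone)]
pub struct Block {
    pub hash: &'static str,
    pub state_value: u32,
}

/// Stand-in for an OCW invoked on *every* imported block. Returns the value
/// it would submit, tagged with the hash of the block whose state it read.
pub fn on_import(block: &Block) -> (u32, &'static str) {
    (block.state_value * 2, block.hash)
}

/// Sketch of "execution on finality": keep only results whose source
/// block is finalized.
pub fn keep_if_final(results: &[(u32, &'static str)], finalized: &[&str]) -> Vec<u32> {
    results
        .iter()
        .filter(|(_, hash)| finalized.contains(hash))
        .map(|(value, _)| *value)
        .collect()
}
```

Without such a filter, the result computed from the discarded fork is just as likely to be submitted as the one computed from the canonical chain.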


FRAME Offchain Worker Example

FRAME provides an OCW example pallet. It was originally written by Tomasz Drwięga, who also wrote the original OCW implementation on the Substrate Client.

This example is based on a fictitious BTC/USD price oracle use case. The pallet’s README comes with a warning that:

In this example we are going to build a very simplistic, naive and definitely NOT production-ready oracle for BTC/USD price

While I can’t speak for Tomasz, I did have a chat with him to confirm that his original intention was to simply have some tangible scenario where it was possible to showcase the available OCW APIs and give people some hints on a conceptual level.

The problem is that the Substrate docs on OCWs were written mostly based on this example, and the caveats of this naive oracle aren’t obvious to untrained eyes. So a game-of-telephone effect starts emerging, where the community reads the docs, gets some interpretation that wasn’t what the original author intended to convey, and developers end up writing insecure code as a consequence.

While we at Parity have recently made some efforts to add warnings to the current OCW docs, a wider community discussion about this problem is still needed.

With the purpose of highlighting all the unwanted consequences coming from this OCW example pallet, I wrote github.com/bernardoaraujor/naive-offchain-worker. It consists of:

  • A node-template equipped with FRAME’s OCW example pallet.
  • naive-ocw-exploiter: a subxt-based crate that writes false BTC/USD prices into the chain state.
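To see why a single forged submission matters, consider a simplified oracle that averages every price it has stored. This `average` function is a stand-in written for illustration, not the example pallet's actual storage code, but it shows the kind of damage that writing false prices into the chain state can do:

```rust
/// Simplified stand-in for a naive oracle that averages every stored price.
/// Sums in u64 so that many large prices cannot overflow.
pub fn average(prices: &[u32]) -> Option<u32> {
    if prices.is_empty() {
        return None;
    }
    let sum: u64 = prices.iter().map(|&p| p as u64).sum();
    Some((sum / prices.len() as u64) as u32)
}
```

With three honest prices around 30,000, appending one forged price of 1 drags the reported average down to 22,500, a quarter lower, at the cost of a single transaction.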

Trail of Bits’ building-secure-contracts repository also mentions this design pattern as a potential vulnerability, although in less detail (only unsigned txs are covered).


Proposed Solutions

The purpose of this forum thread is to discuss what to do about these issues. A few ideas that crossed my mind while working on this:

  1. Write new OCW examples:

    • Some very generic and abstract example written from scratch.
      • Does not attempt to illustrate any use-case (not opinionated).
      • All design assumptions are carefully documented.
      • Consequences of misuse are highlighted.
    • Refactor the BTC/USD price oracle example into a permissioned solution.
      • A SignedExtension makes sure that only a limited set of accounts (defined by root/governance) are allowed to write prices into storage.
      • Arguably a centralized solution.
  2. Finish the current OCW implementation:

    • Guaranteed execution during “major sync” events.
    • Execution on finality so that only final blocks trigger the execution of OCWs.
    • Would definitely require oversight by the Fellowship.
    • Potentially a treasury bounty?
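As a rough idea of what the permissioned refactor in item 1 could look like, here is a plain-Rust model (not FRAME code; `PermissionedOracle`, `add_authority`, `check_authorized` and `submit_price` are hypothetical names). `Origin::Root` stands in for governance and manages the set of accounts, while a pre-dispatch check plays the role that a SignedExtension would:

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Origin {
    Root,
    Signed(u64),
}

#[derive(Default)]
pub struct PermissionedOracle {
    authorized: BTreeSet<u64>,
    pub prices: Vec<u32>,
}

impl PermissionedOracle {
    /// Only Root (standing in for governance) may grant write access.
    pub fn add_authority(&mut self, origin: Origin, who: u64) -> Result<(), &'static str> {
        if origin != Origin::Root {
            return Err("BadOrigin");
        }
        self.authorized.insert(who);
        Ok(())
    }

    /// Plays the role of a SignedExtension check: runs before dispatch and
    /// rejects signers that are not on the list.
    pub fn check_authorized(&self, who: u64) -> Result<(), &'static str> {
        if self.authorized.contains(&who) {
            Ok(())
        } else {
            Err("unauthorized signer")
        }
    }

    pub fn submit_price(&mut self, origin: Origin, price: u32) -> Result<(), &'static str> {
        let who = match origin {
            Origin::Signed(who) => who,
            Origin::Root => return Err("BadOrigin"),
        };
        self.check_authorized(who)?;
        self.prices.push(price);
        Ok(())
    }
}
```

In a real runtime the check would live in the extension's validation step, so that unauthorized transactions are rejected before they reach the pool. The trade-off noted in the list still applies: this is arguably a centralized solution.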

Nevertheless, it would be great to get more input from the community. Perhaps other people can think of better solutions on how to go about the issues described here.


Nice write up and very good ideas to improve the docs around these things!

Not exactly sure what you mean by “Validators are only subject to slashing if they create a block that violates the STF”, but if you want to say that validators get slashed for blocks that fail to import, then this isn’t the case. Validators are not getting slashed for invalid blocks.


Thanks for the correction @bkchr !

Indeed, Validators that create blocks which fail to import simply won’t get the rewards for those blocks; however, they will not be explicitly slashed for this.

A more accurate way of conveying the originally intended meaning would probably be:

There’s no consensus mechanism for penalizing malicious Validators in this context. Creating blocks with valid transactions that carry tampered data would not result in slashing, and therefore the possibility of malicious Validators in this context is real.

Nevertheless, the main message that I wanted to preach remains the same: trusting the tx to be honest just because it came from an OCW that lives in a Validator node is not a safe assumption.

(It seems the forum doesn’t allow me to edit the original post anymore, so I’ll leave this message as correction.)

Are there plans to “doc-ify” this blog-style post? It seems like it would be a huge win (and a preventive measure against misuse of OCWs) in the place where most people learn about them:

https://docs.substrate.io/learn/offchain-operations/ (and the various linked how-to guides there)


Thanks for this write-up @bernardoaraujor - I hear the common misconceptions about what off-chain workers can and can’t do all the time. Especially the idea that transactions created through off-chain workers are somehow “special” and are gated against being created by malicious or alternative code. Off-chain workers make implementation easier but don’t replace solid mechanism design.

Agreed with @NukeManDan that it would be great for this content to land in the Substrate docs.


Thanks for the inputs @NukeManDan @rphmeier

My first step on this journey was to flag this issue to the Docs team back in January. Our hot-fix action was to add several warnings to the OCW tutorial, which I believe is where most footguns live, since it’s the first place people go for practical reference.

But indeed, ideally these clarifications should appear everywhere OCWs are discussed, be it on a practical or on a conceptual level.

As soon as time allows, I want to re-write the FRAME OCW example so that it is a bit more neutral and not biased towards the naive BTC/USD price oracle example (which has been a big source of noise).

And from that, I want to go through all OCW docs with a fine comb to mitigate every possible source of confusion.

But meanwhile, I’m also curious to hear back from the community. Perhaps teams in our ecosystem have extra insights coming from their own experience with OCWs, which could make this effort even more effective at raising the bar of how OCWs will be used going forward.


This PR aims to refactor FRAME’s OCW examples to avoid the confusion discussed above: