RFC: trustless in-code auditing annotations

I noticed the other day that all the Rust AST structs, including ItemMod and others used by syn and quote, etc., all #[derive(Hash)]. In fact, I ran a quick test in the base FRAME #[pallet] macro, and found that we can easily get a hash code for the current AST of a pallet or really any piece of rust code.

This gave me a very interesting idea – what if we were to add auditing annotations directly to pallets, and other things in substrate/polkadot, based on hashes such as these?

The basic principle is simple, and I illustrate it below for the FRAME use case, but similar approaches could work elsewhere in the ecosystem:

For FRAME, let’s say we define some attribute macro, maybe one that looks like below, to be applied to pallets that have been audited. The attribute would specify a hashcode of the pallet AST, preferably minus any superfluous doc-related AST nodes, that represents the version of the pallet that that has been audited. Then if the hash code changes we simply issue a compiler error saying the pallet needs to be re-audited (the hash code would only change if a non-doc change is made to the AST):

pub mod pallet {

We could also optionally pair this with some sort of signature from the auditing team, but I don’t think that is even necessary – we can just only allow the auditing team to change #[audited()] lines or something like that.

Regardling the flexibility of this approach, we could also do whatever transformations to the AST we want before hashing, so in principle we could even ignore certain portions of pallets systematically if that was desirable before generating a hash code.

One of the deliverables from a pallet audit would then become a member of the auditing team adding a commit where a #[audited()] attribute is added or updated. This would make things nice and traceable right from the code and would allow us to associate audits with a commit and a line in the repo.

If a change is ever made to one of those pallets, we’ll know right away they need to be re-audited because the hash code will change and we will get a compile error, triggering the need for a new audit.

What could we do with this? Lots of things. For one, when proc macro warnings are stablizied, we could issue a warning in certain scenarios when a non-audited pallet is included in a runtime. We could also introduce some opt-in analogue of construct_runtime! like construct_audited_runtime! which would require that all pallets used in the runtime are audited and have passing hash codes. Since an overhaul of construct_runtime! is also on the horizon, this could be worked into that as an additional feature.

In general, this would provide extremely strong safety guarantees for runtime authors as they will know that if they use construct_audited_runtime! and their runtime compiles, every pallet they are using has passed an audit for the exact version being used in their code. This would be much better in my opinion than the current situation where we don’t track this in the code in a consistent and trustless way and have to go around asking or checking old PRs to see what was audited when.

In the wider ecosystem, for example on parachains and situations where people outside of Parity are making and using their own pallets, third party devs would be of course free to conduct their own audits and use this machinery on their own runtimes/pallets, so I think this would be a win for them as well. Since it is an opt-in system, they would be free to use it or not.

There is also no reason we couldn’t apply this sort of auditing process elsewhere outside of FRAME. In principle, any block of rust code could be audited and annotated in this manner :exploding_head:

This approach should also be immune to issues related to feature gating, as the #[cfg( directives at this stage of compilation are still just AST nodes, so these won’t result in different hash codes depending on what features we compile with – you should get the same hash code every time regardless of what features you enable :sunglasses:

Anyway I just wanted to set this up as a starting point for a discussion. I can confirm that generating hash codes of entire pallets is doable and that I’ve done it locally :slight_smile:

I’ll be creating an issue in FRAME suggesting some version of this #[audited()] / construct_audited_runtime! syntax, but would also love to discuss here any other places in the wider substrate and/or polkadot ecosystem that could benefit from this sort of auditing hash code syntax.

What do you think?

1 Like

Interesting idea. However I would like to note that you can still have serious bugs when building a runtime with only audited pallets. There is no guarantee they are all compatible. So that will likely give you a false sense of safety. You still need to audit the runtime.

1 Like

Although slower, I’d suggest going through the Rust secure code WG for discussions like this, as they think much more about language level security than any of us (zulip, github, rustsec)

I will follow up with them and see what they think. Thanks!

update - issue posted here: https://github.com/rust-secure-code/wg/issues/47

update: I did some tinkering over the weekend to develop these ideas into a crate called “audited”. I have arrived at the following syntax, with some additional extended options regarding how to handle “use” and “extern crate” statements:

    sig: "895476084D37BC4C9EE5C469EB1BD1178AE631DA7903481123F379AB2A921A0F\
    timestamp: "Tue, 29 Nov 2022 05:11:31 +0000",
    signed_by: "Sam Johnson <sam@durosoft.com>",
    public: "2193B7E4EE81686E4FE7FA700967A4E142259152265449E5AE2D69B959464317",
mod my_mod {

I have also figured out how to resolve issues with externally defined items and have come up with a granular system that supports signatures, and direct and indirect signing (different options for how to handle the scenario where an external signed module/file is updated, and specifically, whether that should mean you now have to re-sign all signed things that use that thing or whether merely re-signing the sub-thing is sufficient). The crate should be done after another weekend or two of tinkering.

With these planned features, you can be certain that 100% of the code that makes up an item has been audited, and I don’t think I will need to rely on special/fragile compiler behavior like macro_state.

FYI rustc has a hash on the AST called SVH that is calculated for free. It’s not outputted for rlibs at the moment (or at least wasn’t) though this PR would achieve that https://github.com/rust-lang/rust/pull/75594 . (I was after solving the docker mtime issue but got bogged down with cargo and the interactions with build.rs)