When implementing a smart contract that handles a lot of economic value, you really want to make sure it isn’t vulnerable. A smart contract is immutable, and once a vulnerability is exploited there’s no rollback. Rolling back state changes caused by a bug in the protocol itself, like Bitcoin’s Overflow Incident, is acceptable; rolling back due to a user’s fault, like losing funds or deploying a vulnerable smart contract, is not. Ethereum learned this during the controversial DAO Hard Fork: a single vulnerable smart contract caused Ethereum and Ethereum Classic to split into two competing systems.
Since then, new practices have been adopted by the industry. One is hiring renowned audit companies to review the code before publishing; for example, the contract used in Polkadot’s ICO, polkadot-claims, was audited by ChainSecurity.
Usually what is audited is the high-level source code; however, only the compiled binary is stored on-chain. How can you be sure that binary was generated from the same high-level code that was audited?
Maybe your first thought is: “Check out the source code locally, compile it, then check whether the local binary matches the on-chain binary. Easy peasy!”
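That naive check can be sketched in a few lines. The sketch below assumes a Substrate-style chain where uploaded code is identified by its BLAKE2b-256 hash; the byte string standing in for the compiled artifact is hypothetical.

```python
import hashlib

def code_hash(binary: bytes) -> str:
    # Substrate-based chains identify uploaded code by its BLAKE2b-256 hash
    # (assumption for this sketch; other chains may use different hashes).
    return "0x" + hashlib.blake2b(binary, digest_size=32).hexdigest()

def verify(local_binary: bytes, on_chain_hash: str) -> bool:
    # The naive check: compile locally, hash the result, compare with chain.
    return code_hash(local_binary) == on_chain_hash

# Hypothetical compiled artifact standing in for something like
# target/ink/contract.wasm:
local = b"\x00asm...compiled contract bytes..."
print(verify(local, code_hash(local)))  # matches only if bytes are identical
```

A single flipped byte in the local build makes the comparison fail, which is exactly why the compilation steps below must be reproduced so strictly.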
Essentially this is what revive does using the `source.hash`, but it isn’t that simple; very strict requirements must be followed:
1. You must somehow have access to the off-chain source code (obviously).
2. You must reproduce the exact steps used to compile the original binary, meaning the exact same compiler version, settings and tools.
3. You must guarantee all steps are “pure”, or deterministic, meaning a given source always generates the exact same binary. While this is straightforward with solc, it is not simple in Rust: you must use the same compiler version and the right Rust toolchain version, there are global dependencies like cargo-contract you may need to replace, etc. That’s why ink! suggests using a Docker image.
4. Assuming we have a fully deterministic compilation pipeline, are we done? Not yet. While we have guaranteed that the same code always generates the same binary, we haven’t guaranteed that this is the ONLY possible code that generates that exact binary. It is a subtle difference with big implications: without this guarantee, anyone can produce multiple sources for the same binary. I can pretend a given contract contains some arbitrary code, yet still match the binary by using Rust shenanigans to keep that arbitrary code out of the final binary. This makes tools like Etherscan’s Verify Contract inviable in the ink! ecosystem, because Etherscan assumes there exists only one valid source+metadata per binary; that is what lets it allow anyone to trustlessly publish the source and metadata of any on-chain binary.
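To make the one-binary/many-sources problem concrete, here is a small Python analogy (not Rust, but the same effect): two source texts that visibly differ, because the second carries a comment that never reaches the compiled output, produce byte-identical bytecode, so the bytecode alone cannot tell you which source is “the” original.

```python
# Two different sources. The comment in src_b stands in for anything the
# compiler strips out (in Rust, think cfg'd-out or dead code).
src_a = "def transfer(amount):\n    return amount\n"
src_b = "def transfer(amount):\n    return amount  # pretend: also drains funds\n"

def bytecode(src: str) -> bytes:
    module = compile(src, "<contract>", "exec")
    # co_consts[0] is the code object of the single function defined above.
    return module.co_consts[0].co_code

print(src_a == src_b)                      # False: the sources differ
print(bytecode(src_a) == bytecode(src_b))  # True: the compiled bytes match
```

Verifying that a source matches a binary is therefore not the same as verifying that it is the unique source for that binary.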
Polkadot and ink! mainly rely on Docker images to compile the source deterministically; Docker-based tools like srtool and cargo-contract are used to compile wasm and RISC-V binaries deterministically.
Advantage:
The cargo-contract Docker image contains more than just the compiler; it also includes all the development tools needed.
Downsides:
Incomplete: The ink! metadata doesn’t contain sufficient information to fully deterministically regenerate the same binary; for example, it includes the compiler version, but not the hash of the Docker image used to compile, if any. Personally, I don’t think Docker images should be the only possible way to create deterministic binaries; that was fine for runtime code, but it is inconvenient for smart contracts.
Disk Space: Docker images are big and not easy to distribute. contracts-verifiable:6.0.0-beta.1 is 500 MB compressed and 1.5 GB after you instantiate the actual container on aarch64-macos.
Slow: The official image was built for amd64; running it on aarch64 machines requires emulation, which is very slow even on high-end Apple Silicon machines.
Not “always” reproducible: If Docker’s registry goes offline, simply rebuilding the same Dockerfile locally doesn’t guarantee you get the same image, because fetching external dependencies isn’t a pure step. If you attempt to build an old ubuntu:14.04 Dockerfile today, it doesn’t work: the /etc/apt/sources.list entries no longer exist, apt-get update no longer works, etc.
The Solidity compiler appends a CBOR-encoded blob containing the metadata hash at the end of every smart contract’s bytecode; this allows tools like Etherscan to verify and index deployed contracts.
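The trailer layout is simple: the last two bytes of the bytecode are the big-endian length of the CBOR blob that sits immediately before them. A sketch of slicing it out, using hand-built stand-in bytes rather than a real contract (the “CBOR” payload here is fake, only the length framing is realistic):

```python
def split_metadata(bytecode: bytes):
    # Per the Solidity metadata encoding, the final two bytes are the
    # big-endian length of the CBOR-encoded metadata preceding them.
    cbor_len = int.from_bytes(bytecode[-2:], "big")
    code = bytecode[: -(cbor_len + 2)]
    metadata = bytecode[-(cbor_len + 2):-2]
    return code, metadata

# Hand-built example: fake EVM code followed by a fake 6-byte CBOR blob.
fake_cbor = b"\xa1\x64ipfs"  # stand-in bytes, not real metadata
blob = b"\x60\x80\x60\x40" + fake_cbor + len(fake_cbor).to_bytes(2, "big")
code, meta = split_metadata(blob)
print(code.hex(), meta == fake_cbor)
```

Because the trailer is self-describing, an indexer can strip it from any deployed bytecode and recover the metadata hash without knowing anything else about the contract.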
The Solidity compiler is deterministic and self-contained; every release is compiled for all major targets, including wasm (solc-js).
Instead of the compiled code’s codehash, Solidity uses the contract metadata hash, which includes the compiler settings, the compiled bytecode and, believe it or not, the plain-text source and dependencies. This means any change in the plain-text source results in a different hash. This has downsides too, but it is how Solidity guarantees a one-source-to-one-binary relationship and thus allows “full verification”, as well as pinning the files publicly on IPFS so they can be retrieved with the metadata hash.
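A toy illustration of why hashing the plain-text source pins one source to one binary: any edit, even a comment, changes the hash. The metadata layout and hash function below are simplified stand-ins, not the real Solidity metadata JSON.

```python
import hashlib
import json

def metadata_hash(source: str, settings: dict) -> str:
    # Simplified stand-in for Solidity's metadata: because the verbatim
    # source text is part of what gets hashed, the hash commits to exactly
    # one source, not merely to the compiled output.
    metadata = json.dumps({"settings": settings, "sources": source},
                          sort_keys=True)
    return hashlib.sha256(metadata.encode()).hexdigest()

settings = {"optimizer": True, "runs": 200}
h1 = metadata_hash("contract A { }", settings)
h2 = metadata_hash("contract A { } // harmless comment", settings)
print(h1 != h2)  # True: even a comment-only change yields a new hash
```

Contrast this with the previous Python-bytecode example: hashing only the compiled output lets many sources share a hash, while hashing the metadata (source included) does not.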
The solc-js compiler allows tools like the Remix IDE to exist; it provides a complete developer experience directly in the browser, which is very appealing for beginners.
Compared to Docker images, solc runs natively and the binaries are small (less than 10 MB); they are easily distributed using GitHub releases or package managers like the Solidity Version Manager (SVM).