I reduced polkadot-sdk (incremental) compile times by 35%

tl;dr

I forked and modified the Rust compiler, speeding up incremental builds of polkadot-sdk (and the Substrate framework) by 35%. Implementation details and results of my proc macro expansion caching are at How I reduced (incremental) Rust compile times by up to 40%. I’d love to have Substrate developers try it out!

The edit-compile-run cycle for polkadot-sdk is slow

Hi all, Kapil here. I’ve been building in the Polkadot ecosystem for 18 months and have written cross-chain apps like PrivaDEX and Koingaroo. For the past few months, I have been building Rust developer-productivity tooling to speed up your edit-compile-run cycle. While I love Rust (and Substrate) for its customizability and runtime performance, I have long been frustrated with the resulting compile times.

This frustration hit a tipping point a few months ago, as my VSCode rust-analyzer plugin was taking 20+ seconds to do a simple cargo check. From the Rust/Cargo/rustc books and the Rust community’s blog posts, I picked up and applied most of the standard tricks to speed up builds, but saw only marginal improvement. Continued frustration led me to profile and analyze compile times, and soon I landed in the rustc (compiler) source code…

I modified rustc, yielding 35% faster builds for polkadot-sdk

I modified the compiler and implemented procedural macro caching, which has considerably sped up incremental builds for projects with macro-heavy dependencies like polkadot-sdk. My fork of the compiler is private for now while I think through the licensing model; I am packaging it into a build/development server for a few Rust dev teams and have already shared it with a few self-hosters (including some Polkadot developers). In my first blog post, I discuss my macro caching implementation and results in detail.

If build times are an issue for you, I want you to try this out! I’d love to hear your thoughts, and I’m happy to talk further about my compiler modifications.


Awesome! Would be amazing if a variant of this hits the mainline at some point. In any case, much appreciated effort!

Thanks! I have reached out to the Rust compiler team, but there are challenges in getting this to stable Rust.
But that’s not at all a blocker for Substrate developers – particularly if they’re willing to use it on development/build servers I’ve set up (putting the ‘Remote’ in ‘CodeRemote’) or perhaps run a custom toolchain locally.


I’d think rustc would need proc macros to declare if they’re deterministic enough.

Maybe I’m blind, but I do not see the rustc patches anywhere?

@burdges you’re totally correct, and that is macro caching’s main blocker for the pathway to stable Rust. I’ve raised this in Zulip with many of the compiler developers, and they’ve pointed me to some in-progress initiatives to be able to differentiate pure from impure macros:
https://rust-lang.zulipchat.com/#narrow/stream/247081-t-compiler.2Fperformance/topic/Caching.20proc.20macro.20expansion.3A.20up.20to.2040.25.20speed-up/near/427807121

That’s partly why I’ve kept my changes in a private fork. More than happy to share it with you, though, if you’re interested or want to try it out.

We’ve a whole ecosystem, so maybe our approach should be to write a crate that memoizes ASTs into files in the ./target directory, and which integrates directly with pure proc macro crates. All 100% local, and no pure-vs-impure concerns, since each proc macro crate’s author chooses. It’s completely transparent, so our whole ecosystem benefits without most even knowing. It’d need to purge old AST artifacts eventually too, so some other metadata file in the ./target directory.

We’d gain enormously just from using this with internal proc macros, but we could upstream these changes into pure proc macro dependencies too. And maybe the whole Rust ecosystem benefits eventually. It’ll miss dependencies whose authors never upstream those changes, but maybe regular conditional compilation covers them.

It’s also obvious most users should never trust a build server controlled by others, like they should never do curl .. | sh either. lol

I am also offering my compiler mod to users in a self-hostable form that is source-available; I have a small handful of users set up this way. But I genuinely think the remote server is the best user experience because it
(a) doesn’t require the user to deal with configuration or machine management, and
(b) opens the door for team-wide shared caches.

More specifically on your link to my HN post (which brings up great points, I’m glad you came across it!), please see this response to your linked comment, by one of the rustc contributors. In practice, every company chooses applications and services they can trust: proprietary compilers, proprietary IDEs, proprietary databases, and remote CI/CD servers are all pretty common – and I’ve worked with the latter three at a larger firm.

I’m not sure I understand your idea on memoizing ASTs. Are you suggesting that the Polkadot ecosystem develop a crate that wraps many hand-picked “pure” macro crates? That would be extremely tricky, since polkadot-sdk relies on several macros that are technically impure, e.g. include_bytes!

Individual companies here have service providers, like all companies, but Polkadot’s business model is people not trusting as many people, because that’s what “distributed systems” and “blockchain” mean. We cannot encourage the whole ecosystem to use specific dev servers, nor could any other security or blockchain company.

Yes, any proc macro developer could add the memoization call into their proc macro if they felt their proc macro was pure enough.

Why is include_bytes slow? That sounds bizarre.

Ah, gotcha: you were talking about getting this mod to everyone. Agreed that isn’t doable with a private fork, but it’s certainly doable to onboard a few teams if it’s a big enough problem!
[Edit: To be clear, I intend this as a replacement for their local development, i.e. cargo check/build/test/run and rust-analyzer. This would not touch production releases.]

What you are referring to for the second one sounds backwards. In the ideal future scenario, developers simply mark their proc macro crate as pure. Then the compiler knows to cache it. The proc macro authors should not be responsible for memoization themselves, and instead should just mark that it’s a candidate. But this necessarily requires compiler modifications because the invocation context is only known by the compiler.
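To make that direction concrete, here is a purely hypothetical syntax — no such attribute exists in Rust today, and the name `proc_macro_pure` is invented for illustration:

```rust
// HYPOTHETICAL SYNTAX: `proc_macro_pure` is not a real attribute.
// The author only *declares* that the output depends solely on the
// input tokens; the compiler decides when and how to cache expansions.
#[proc_macro_derive(Encode)]
#[proc_macro_pure]
pub fn derive_encode(input: TokenStream) -> TokenStream {
    // ... normal derive logic, with no memoization code in the macro itself ...
}
```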

I meant to say that include_bytes! is an impure macro (and one of a few such impure macros that are heavily used throughout polkadot-sdk), not that it is slow.