One repo to rule them all: A data-driven look at Polkadot’s Monorepo

The following article, spearheaded by @joyce, is the outcome of an excellent collaboration between Parity Data and Parity Engineering (with, of course, the support of @oliverbrett111). It delves into the quantifiable impact of a significant technical shift made a few months back. We’re sharing these insights with you because we think they showcase a novel approach to looking at GitHub data beyond just counting commits.

Technical decisions very often boil down to finding a middle ground among differing opinions. Such is the case, for example, when a group of engineers debates whether it makes more sense to organize their work in a monorepo or across multiple repositories.

To illustrate my point, these are, to date, the top Google results for “monorepo vs repo”, at least in the SEO perimeter I am currently in:

These articles have one thing in common: they mostly focus on qualitative arguments for one approach or the other and leave it to the reader to decide which is better for their particular case.

In our case, since we’re working on decentralized, open source infrastructure, it is a lot easier to collect data on these key decisions and derive quantitative insights from them.

In August 2023, the development of Polkadot underwent a major transformation: three distinct repositories (Substrate, Cumulus, Polkadot) were merged into one, the Polkadot-SDK monorepo. While developers sensed an immediate shift in their workflow, we needed some time to gather enough data to fully evaluate the impact of this change. (more info here)

We now have enough data to judge whether this was a good decision, and we’re sharing the results with you. So what do you think, was it the right call? I invite you to read more on Joyce’s blog linked in this tweet:

Link if you don’t use Twitter: One Repo to Rule Them All: A Data-Driven Look at Polkadot's Monorepo

The data behind this analysis is mainly sourced from GitHub and sits within the off-chain side of the DotLake. We’ve enhanced our capabilities for collecting off-chain data and persisting it alongside on-chain data to provide the most comprehensive insights into the Polkadot ecosystem (as you might have seen in the Polkadot 2023 End Of Year Report).
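To give a feel for what "beyond just counting commits" can mean, here is a minimal sketch that compares the median time-to-merge of pull requests opened before and after a cutoff date. It is not the method used in the article: the metric, the record fields (`created_at`, `merged_at`), the exact cutoff day and the sample records are all assumptions made for illustration.

```python
from datetime import datetime
from statistics import median

# Assumed cutoff: the post only says "August 2023", so the exact day is a guess.
CUTOFF = datetime(2023, 8, 1)


def hours_to_merge(pr):
    """Hours between a pull request being opened and being merged."""
    return (pr["merged_at"] - pr["created_at"]).total_seconds() / 3600


def median_hours_by_period(prs, cutoff=CUTOFF):
    """Median time-to-merge for merged PRs opened before vs. on/after the cutoff."""
    before = [hours_to_merge(p) for p in prs
              if p["merged_at"] and p["created_at"] < cutoff]
    after = [hours_to_merge(p) for p in prs
             if p["merged_at"] and p["created_at"] >= cutoff]
    return {
        "before": median(before) if before else None,
        "after": median(after) if after else None,
    }


# Made-up records for illustration only, not real Polkadot data.
sample_prs = [
    {"created_at": datetime(2023, 6, 1, 9), "merged_at": datetime(2023, 6, 3, 9)},
    {"created_at": datetime(2023, 7, 10, 9), "merged_at": datetime(2023, 7, 11, 9)},
    {"created_at": datetime(2023, 9, 5, 9), "merged_at": datetime(2023, 9, 5, 21)},
    {"created_at": datetime(2023, 10, 2, 9), "merged_at": None},  # never merged, skipped
]

result = median_hours_by_period(sample_prs)
print(result)
```

The median is used here rather than the mean so that a handful of long-lived pull requests does not dominate the comparison; a real analysis would of course work on the full GitHub history rather than a hand-written list.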

This includes information from various sources such as Twitter, Telegram, Polkassembly and many more. Each source offers a unique perspective, and combining them can reveal profound insights into the effectiveness, progress and impact of our ecosystem’s initiatives and decisions. Think of each data source as a puzzle piece: put together, they paint a full picture of the development of Polkadot. :wink:

We’re really excited to share more of these insights around engineering, infrastructure and data over the coming months.

Who knows what else the data holds?

Note: Want to collaborate on similar analyses or articles? Please reach out! :hugs:


The data team killing it YET AGAIN. :fire: :fire:
