Release of Dotlake - Community Edition

parity-data · December 19, 2024, 7:14pm

Build your own Polkadot data ingestion pipeline with Dotlake Community Edition

In order to democratize the ability to ingest, process, and visualize Substrate data, we are open-sourcing a community edition of Dotlake (our data lake of blockchains).

The Dotlake Community Edition is designed to simplify access to Polkadot and other Substrate-based blockchain data. With this release, we aim to make it easier for developers, analysts, and researchers to ingest, process, and visualize blockchain data without unnecessary complexity.

This version combines robust components—a Substrate API Sidecar, a Custom Block Ingest Service, and Apache Superset—to create a complete pipeline for blockchain data ingestion and analytics.

In this post, we’ll walk you through what dotlake-community is, how to set it up, and how you can start pulling data like a blockchain wizard.

Introducing dotlake-community

Before we dive into details, this is what’s in the package:

A data ingestion pipeline for Substrate-based blockchains (like Polkadot).
A simple combo of tools: Substrate API Sidecar, a custom block ingest service, and Apache Superset for analytics.
The ability to turn raw blockchain data into something insightful and visually pleasing.

Sounds cool? Let’s get started!

Prerequisites: What You Need in Your Toolbox

Make sure you’ve got these essentials in place:

Docker and Docker Compose — For container magic.
Access to a Substrate-based blockchain node — You’ll need a WebSocket (WSS) endpoint.

Got everything? Let’s dive in!

The Building Blocks of dotlake-community

Here’s a quick peek under the hood. dotlake-community is made up of three key components:

Substrate API Sidecar — Your seamless gateway to blockchain data, providing a user-friendly REST API for easy access.
Custom Block Ingest Service — This is where the magic happens. It pulls, processes, and stores blockchain data.
Apache Superset — Your data visualization sidekick. Think dashboards, charts, and all the pretty graphs.

Getting Started: Let’s Build That Pipeline!

Follow these steps:

Step 1: Clone the Repository

First things first—grab the code from GitHub:

git clone https://github.com/paritytech/dotlake-community.git
cd dotlake-community

Step 2: Tweak Your Config

Now, open up config.yaml and customize it to your needs. Here’s a cheat sheet:

relay_chain: Polkadot
chain: Polkadot
wss: wss://polkadot-rpc.dwellir.com
create_db: true  # Set to true if database needs to be created
retain_db: true  # Set to true to retain database after the end of process.
ingest_mode: live  # live/historical
start_block: 1
end_block: 100

Change the wss endpoint, along with the relay_chain and chain names, to match your target blockchain. Easy peasy.
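Before launching, it can help to sanity-check the values you edited. The sketch below is not part of dotlake-community: `parse_flat_config` and `check_config` are hypothetical helper names, the parser only handles the flat `key: value` layout shown in the cheat sheet, and the validation rules are assumptions inferred from that sample rather than the project's documented schema.

```python
# Hypothetical helpers for illustration; not shipped with dotlake-community.

SAMPLE = """\
relay_chain: Polkadot
chain: Polkadot
wss: wss://polkadot-rpc.dwellir.com
create_db: true  # Set to true if database needs to be created
retain_db: true  # Set to true to retain database after the end of process.
ingest_mode: live  # live/historical
start_block: 1
end_block: 100
"""


def parse_flat_config(text):
    """Parse flat `key: value` lines, dropping `#` comments and blank lines."""
    config = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        # Split on the first colon only, so URLs such as wss://... stay intact.
        key, _, value = line.partition(":")
        config[key.strip()] = value.strip()
    return config


def check_config(config):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if not config.get("wss", "").startswith(("ws://", "wss://")):
        problems.append("wss must be a WebSocket URL (ws:// or wss://)")
    if config.get("ingest_mode") not in ("live", "historical"):
        problems.append("ingest_mode must be 'live' or 'historical'")
    for key in ("create_db", "retain_db"):
        if config.get(key) not in ("true", "false"):
            problems.append(f"{key} must be true or false")
    # Assumed rule: when a block range is given, it must be ordered integers.
    if "start_block" in config and "end_block" in config:
        try:
            if int(config["start_block"]) > int(config["end_block"]):
                problems.append("start_block must not exceed end_block")
        except ValueError:
            problems.append("start_block and end_block must be integers")
    return problems


if __name__ == "__main__":
    for problem in check_config(parse_flat_config(SAMPLE)):
        print("config problem:", problem)
```

For a real file, reading `config.yaml` with a proper YAML library would be more robust than this line-based parser.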

Step 3: Fire Up the Ingestion Pipeline

You’re almost there! Run the following script to kick things off:

sh dotlakeIngest.sh

Kick back and let dotlake-community handle the heavy lifting—it’s built to fetch, process, and store your blockchain data effortlessly.

How It All Works: A Peek at the Architecture

Curious about what’s happening behind the scenes? Here’s the lowdown:

1. Substrate API Sidecar

Connects to your blockchain node via WebSocket.
Exposes a user-friendly REST API (port 8080) for fetching data.
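To get a feel for that REST API, here is a minimal Python sketch that requests a single block from Sidecar's `/blocks/{blockId}` endpoint. It assumes Sidecar is reachable at `http://localhost:8080` from where you run it; `block_url` and `fetch_block` are illustrative names, and this is not how the ingest service itself is implemented.

```python
import json
from urllib.request import urlopen

# Assumption: Sidecar is reachable on localhost at the port mentioned above.
SIDECAR_URL = "http://localhost:8080"


def block_url(block_id="head", base_url=SIDECAR_URL):
    """Build the Sidecar URL for a block number, a block hash, or 'head'."""
    return f"{base_url.rstrip('/')}/blocks/{block_id}"


def fetch_block(block_id="head", base_url=SIDECAR_URL, timeout=10):
    """Fetch one block from Sidecar and return the decoded JSON as a dict."""
    with urlopen(block_url(block_id, base_url), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Requires a running Sidecar; prints the hash of the latest block.
    print(fetch_block("head")["hash"])
```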

2. Custom Block Ingest Service

The heavy lifter of the pipeline. It handles:

Data Extraction — Grabs data from the Sidecar API.
Transformation & Enrichment — Preps the data for storage.
Storage — Saves the processed data into your PostgreSQL database.
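The three stages above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual ingest service: the table layout and the names `block_to_row` and `store_rows` are hypothetical, the sample block uses placeholder hashes, and an in-memory SQLite database stands in for PostgreSQL so the example runs without any setup.

```python
import sqlite3


def block_to_row(block):
    """Flatten a Sidecar-style block dict into a tuple ready for storage."""
    return (
        int(block["number"]),  # int() accepts a decimal string or an integer
        block["hash"],
        block["parentHash"],
        len(block.get("extrinsics", [])),
    )


def store_rows(connection, rows):
    """Create the (hypothetical) blocks table if needed and upsert the rows."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS blocks ("
        "number INTEGER PRIMARY KEY, hash TEXT, "
        "parent_hash TEXT, extrinsic_count INTEGER)"
    )
    connection.executemany(
        "INSERT OR REPLACE INTO blocks VALUES (?, ?, ?, ?)", rows
    )
    connection.commit()


# Placeholder data shaped like a Sidecar block response; the hashes are made up.
sample_block = {
    "number": "100",
    "hash": "0xabc",
    "parentHash": "0xdef",
    "extrinsics": [{}, {}],
}

if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # SQLite stand-in for PostgreSQL
    store_rows(connection, [block_to_row(sample_block)])
    print(connection.execute("SELECT * FROM blocks").fetchall())
```

Keying the table on the block number makes re-ingesting the same block range idempotent, which is one reasonable design choice for a pipeline that may be restarted.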

3. Apache Superset

Plug into your database and build stunning dashboards.
Explore, analyze, and unlock insights from your blockchain data with ease.

Why dotlake-community? Why Not!

Whether you’re a blockchain developer, data analyst, or just someone who loves charts (no judgment here), dotlake-community gives you the tools to:

Extract blockchain data easily.
Process and store it efficiently.
Visualize it beautifully.

This tool can also be helpful for newly launched parachains that don’t yet have all the necessary integrations in place (e.g., indexers, block explorers) and need to access data quickly, easily, and on their own terms.

Ready to Jump In?

If you’ve made it this far, you’re officially ready to start your dotlake-community journey. Head over to the GitHub repository to grab the code, give it a spin, and let us know what you think!
We’re looking forward to your feedback to keep improving Dotlake for everyone! Feel free to post questions, comments, and suggestions!

—

Happy data wrangling, blockchain explorers!

somedude · December 20, 2024, 2:21am

All right, sounds good!

Can you add System Requirements to the README.md (or here, but it’s better if it’s in the repo)? CPU, RAM, disk space.
