Build your own Polkadot data ingestion pipeline with Dotlake Community Edition
To democratize the ability to ingest, process, and visualize Substrate data, we are open-sourcing a community edition of Dotlake (our data lake of blockchains).
The Dotlake Community Edition is designed to simplify access to Polkadot and other Substrate-based blockchain data. With this release, we aim to make it easier for developers, analysts, and researchers to ingest, process, and visualize blockchain data without unnecessary complexity.
This version combines three components (a Substrate API Sidecar, a Custom Block Ingest Service, and Apache Superset) to create a complete pipeline for blockchain data ingestion and analytics.
In this post, we’ll walk you through what dotlake-community is, how to set it up, and how you can start pulling data like a blockchain wizard.
Introducing dotlake-community
Before we dive into details, this is what’s in the package:
- A data ingestion pipeline for Substrate-based blockchains (like Polkadot).
- A simple combo of tools: Substrate API Sidecar, a custom ingest service, and Apache Superset for analytics.
- The ability to turn raw blockchain data into something insightful and visually pleasing.
Sounds cool? Let’s get started!
Prerequisites: What You Need in Your Toolbox
Make sure you’ve got these essentials in place:
- Docker and Docker Compose — For container magic.
- Access to a Substrate-based blockchain node — You’ll need a WebSocket (WSS) endpoint.
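If you'd like to double-check before moving on, here is a minimal Python sketch of a preflight check. It is not part of dotlake-community, and the helper names `missing_tools` and `is_websocket_endpoint` are made up for this example. It only confirms that the `docker` executable is on your PATH and that your endpoint looks like a WebSocket URL. It does not check Docker Compose (recent Docker releases ship it as the `docker compose` plugin rather than a separate binary), and it does not open a connection to the node.

```python
import shutil
from urllib.parse import urlparse


def missing_tools(tools=("docker",)):
    """Return the names of required executables that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def is_websocket_endpoint(url):
    """Return True if the URL uses the ws:// or wss:// scheme and names a host."""
    parsed = urlparse(url)
    return parsed.scheme in ("ws", "wss") and bool(parsed.hostname)
```

For example, `missing_tools()` returns an empty list when Docker is installed, and `is_websocket_endpoint("wss://polkadot-rpc.dwellir.com")` returns `True`.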
Got everything? Let’s dive in!
The Building Blocks of dotlake-community
Here’s a quick peek under the hood. dotlake-community is made up of three key components:
- Substrate API Sidecar — Your seamless gateway to blockchain data, providing a user-friendly REST API for easy access.
- Custom Block Ingest Service — This is where the magic happens. It pulls, processes, and stores blockchain data.
- Apache Superset — Your data visualization sidekick. Think dashboards, charts, and all the pretty graphs.
Getting Started: Let’s Build That Pipeline!
Follow these steps:
Step 1: Clone the Repository
First things first—grab the code from GitHub:
git clone https://github.com/paritytech/dotlake-community.git
cd dotlake-community
Step 2: Tweak Your Config
Now, open up config.yaml and customize it to your needs. Here’s a cheat sheet:
relay_chain: Polkadot
chain: Polkadot
wss: wss://polkadot-rpc.dwellir.com
create_db: true # Set to true if database needs to be created
retain_db: true # Set to true to retain database after the end of process.
ingest_mode: live # live/historical
start_block: 1
end_block: 100
Change the wss endpoint (along with the relay_chain and chain names) to match your target blockchain. Easy peasy.
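A typo in the config is easier to catch before the containers start. The sketch below is an illustration, not code from the repository. `parse_flat_config` is a hypothetical helper that only understands flat `key: value` lines like the ones in the cheat sheet (a real YAML parser handles far more), and `config_problems` encodes two assumptions of ours: that `ingest_mode` accepts only `live` or `historical`, and that `start_block` and `end_block` matter only in historical mode.

```python
def parse_flat_config(text):
    """Parse flat `key: value  # comment` lines into a dict of strings."""
    config = {}
    for raw in text.splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)  # split once so wss:// URLs stay intact
        config[key.strip()] = value.strip()
    return config


def config_problems(config):
    """Return a list of human-readable problems; an empty list means it looks fine."""
    problems = []
    if not config.get("wss", "").startswith(("ws://", "wss://")):
        problems.append("wss must be a ws:// or wss:// URL")
    mode = config.get("ingest_mode")
    if mode not in ("live", "historical"):
        problems.append("ingest_mode must be 'live' or 'historical'")
    if mode == "historical":
        # Assumption: the block range is only relevant in historical mode.
        try:
            start, end = int(config["start_block"]), int(config["end_block"])
        except (KeyError, ValueError):
            problems.append("start_block and end_block must be integers")
        else:
            if start > end:
                problems.append("start_block must not exceed end_block")
    return problems
```

You could feed it the file contents with `config_problems(parse_flat_config(open("config.yaml").read()))` and only launch the pipeline when the result is empty.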
Step 3: Fire Up the Ingestion Pipeline
You’re almost there! Run the following script to kick things off:
sh dotlakeIngest.sh
Kick back and let dotlake-community handle the heavy lifting—it’s built to fetch, process, and store your blockchain data effortlessly.
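The first run can take a while if Docker has to pull images first. One way to tell when a service is up is to poll its port until it accepts connections. The sketch below is a generic TCP check, not something shipped with dotlake-community, and `port_is_open` and `wait_for_port` are hypothetical names. Pointing it at `localhost:8080` assumes the Sidecar port (8080) is published on your machine.

```python
import socket
import time


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host, port, attempts=30, delay=2.0):
    """Poll host:port until it accepts connections or the attempts run out."""
    for attempt in range(attempts):
        if port_is_open(host, port):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # pause between attempts, not after the last one
    return False
```

Calling `wait_for_port("localhost", 8080)` after starting the script returns `True` as soon as something is listening on that port, or `False` once the attempts are used up.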
How It All Works: A Peek at the Architecture
Curious about what’s happening behind the scenes? Here’s the lowdown:
1. Substrate API Sidecar
- Connects to your blockchain node via WebSocket.
- Exposes a user-friendly REST API (port 8080) for fetching data.
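Once Sidecar is running, you can poke at it yourself. Here is a small sketch using only the Python standard library. It assumes Sidecar is reachable at `http://localhost:8080` (the host is our assumption; the port is the one listed above) and uses Sidecar's `/blocks/head` and `/blocks/{number}` routes. `block_url` and `fetch_block` are names made up for this example.

```python
import json
from urllib.request import urlopen

SIDECAR_URL = "http://localhost:8080"  # assumption: Sidecar is published on localhost


def block_url(block="head", base_url=SIDECAR_URL):
    """Build the Sidecar URL for a block number, a block hash, or 'head'."""
    return f"{base_url.rstrip('/')}/blocks/{block}"


def fetch_block(block="head", base_url=SIDECAR_URL, timeout=10):
    """GET one block from Sidecar and decode the JSON response body."""
    with urlopen(block_url(block, base_url), timeout=timeout) as response:
        return json.load(response)
```

With the pipeline up, `fetch_block()` returns the head block as Sidecar reports it, and `fetch_block(100)` returns block 100, each as a Python dictionary.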
2. Custom Block Ingest Service
The heavy lifter of the pipeline. It handles:
- Data Extraction — Grabs data from the Sidecar API.
- Transformation & Enrichment — Preps the data for storage.
- Storage — Saves the processed data into your PostgreSQL database.
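To make the transformation step concrete, here is a rough sketch of flattening one block into table-friendly rows. This is not the ingest service's actual code or schema: `flatten_block` and the column names are invented for illustration, and the input is assumed to follow the JSON shape of Sidecar's `/blocks` responses (numbers encoded as strings, extrinsics carrying a `method` object with `pallet` and `method` fields).

```python
def flatten_block(block):
    """Split one Sidecar-style block dict into a block row and extrinsic rows."""
    number = int(block["number"])  # Sidecar encodes numbers as strings
    extrinsics = block.get("extrinsics", [])
    block_row = {
        "number": number,
        "hash": block["hash"],
        "parent_hash": block["parentHash"],
        "extrinsic_count": len(extrinsics),
    }
    extrinsic_rows = [
        {
            "block_number": number,
            "index": index,
            "pallet": ext["method"]["pallet"],
            "method": ext["method"]["method"],
            "success": bool(ext.get("success")),
        }
        for index, ext in enumerate(extrinsics)
    ]
    return block_row, extrinsic_rows


# Hand-written sample with placeholder hashes, not real chain data.
sample_block = {
    "number": "100",
    "hash": "0xabc",
    "parentHash": "0xdef",
    "extrinsics": [
        {"method": {"pallet": "timestamp", "method": "set"}, "success": True},
    ],
}
block_row, extrinsic_rows = flatten_block(sample_block)
```

Keeping blocks and extrinsics in separate row sets mirrors the usual one-table-per-entity layout, which tends to make later aggregation simpler.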
3. Apache Superset
- Plug into your database and build stunning dashboards.
- Explore, analyze, and unlock insights from your blockchain data with ease.
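Superset charts are ultimately backed by SQL queries against your database. The sketch below shows the flavor of such a query. Everything about it is illustrative: it uses an in-memory SQLite database as a stand-in for PostgreSQL so that it runs anywhere, and the `extrinsics` table with its columns is hypothetical rather than the schema dotlake-community actually creates.

```python
import sqlite3


def extrinsics_per_pallet(rows):
    """Load (block_number, pallet, method) rows and count extrinsics per pallet."""
    conn = sqlite3.connect(":memory:")  # stand-in for the pipeline's PostgreSQL
    conn.execute(
        "CREATE TABLE extrinsics (block_number INTEGER, pallet TEXT, method TEXT)"
    )
    conn.executemany("INSERT INTO extrinsics VALUES (?, ?, ?)", rows)
    # The kind of aggregate a bar chart of activity by pallet would sit on.
    result = conn.execute(
        "SELECT pallet, COUNT(*) AS n FROM extrinsics "
        "GROUP BY pallet ORDER BY n DESC, pallet"
    ).fetchall()
    conn.close()
    return result
```

In Superset you would write only the `SELECT` part against your real tables and let the chart builder handle the rest.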
Why dotlake-community? Why Not!
Whether you’re a blockchain developer, data analyst, or just someone who loves charts (no judgment here), dotlake-community gives you the tools to:
- Extract blockchain data easily.
- Process and store it efficiently.
- Visualize it beautifully.
This tool can also help newly launched parachains that don’t yet have all the necessary integrations in place (e.g., indexers, block explorers) and need to access data quickly, easily, and on their own terms.
Ready to Jump In?
If you’ve made it this far, you’re officially ready to start your dotlake-community journey. Head over to the GitHub repository (https://github.com/paritytech/dotlake-community) to grab the code, give it a spin, and let us know what you think!
We’re looking forward to your feedback to keep improving Dotlake for everyone! Feel free to post questions, comments, and suggestions!
—
Happy data wrangling, blockchain explorers!