Polkadot Native Storage Updates

One last point regarding performance. My understanding is that Polkadot Native Storage is intended to replace essentially all storage, including historical block data – I would suggest running some benchmarks on retrieval while PoSt is ongoing.

If the size of the proof window were reduced, you could force Gen 3/4 NVMe (or better) requirements on storage providers. If you look at other provider networks, they’ve all focused on capacity rather than performance. The result is an overabundance of “cold store” quality storage: low bandwidth, low IOPS, but extremely high capacity.

Since this will be utilized for things such as Pixelproof (polkadot #1631) and, IIUC, to replace RPC – it’s critical that we have not just a sufficient quantity of storage but storage that is sufficiently responsive (high mixed 70/30 read/write IOPS) and sufficiently accessible, with high bandwidth and throughput.
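To illustrate what a mixed 70/30 probe looks like, here’s a toy sketch (the file path and sizes are placeholders; a real provider benchmark should use fio with direct I/O, since this version mostly measures the page cache):

```python
# Toy mixed-I/O probe: 70% random reads / 30% random writes of 4 KiB
# against a pre-allocated test file, reporting rough IOPS. Unix-only
# (uses os.pread/os.pwrite). Illustrative, not a real benchmark.
import os
import random
import time

PATH = "testfile.bin"   # placeholder test file
FILE_SIZE = 1 << 30     # 1 GiB
BLOCK = 4096            # 4 KiB blocks
OPS = 10_000

# Pre-allocate (sparse) so reads have a region to hit.
if not os.path.exists(PATH) or os.path.getsize(PATH) < FILE_SIZE:
    with open(PATH, "wb") as f:
        f.truncate(FILE_SIZE)

fd = os.open(PATH, os.O_RDWR)
buf = os.urandom(BLOCK)
blocks = FILE_SIZE // BLOCK

start = time.perf_counter()
for _ in range(OPS):
    offset = random.randrange(blocks) * BLOCK  # block-aligned offset
    if random.random() < 0.7:                  # 70% reads
        os.pread(fd, BLOCK, offset)
    else:                                      # 30% writes
        os.pwrite(fd, buf, offset)
os.close(fd)

elapsed = time.perf_counter() - start
print(f"{OPS / elapsed:,.0f} IOPS (page-cached; use fio + O_DIRECT for real numbers)")
```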

I would strongly recommend not rushing this aspect and taking the time to get it right, as it will impact the performance of every application that builds on it.

We can consider it, but the current plan is just DOT.

Please do – it will be difficult to gain traction on a service where providers need to script constant updates to their market pricing, and where the amount of DOT needed varies from payment to payment. The underlying cost basis for providers is in stables; that’s how providers will look at the economics, and it provides a less volatile medium of exchange for customers of DA. The treasury won’t care whether it receives USD or DOT as long as it’s revenue.


(off-topic, discussing JAM with @olanod)

One pitfall you should be aware of is that a WP is a rather large amount of computation and DA access that is granted to you. So the service is configured such that any registered user of this service will pass authorization and can provide a WP whenever they like? What will be in the rest of the WP?

This is, from the developer’s perspective, definitely interesting, and it is currently the gap. Let’s see how @jmg-duarte clarifies the plan; maybe it will fit this.

Touching-the-DA-as-a-service is interesting and could be as good as we can get with respect to storing more data such that it is available to JAM services.

Thank you all for your feedback. You raise very valid points, please allow us some time to thoroughly review them and discuss internally before we share a response pushing the discussion forward.

Hey @kianenigma, thank you for diving into the project, we appreciate your time. Firstly, we wrote about the JAM integration idea 10 months ago, which in JAM time is an eternity. The wording is probably raw and needs refining; we aim to improve both the idea itself and its presentation by iterating through discussions like these.

One of the original goals was to store all the expunged or forgotten data, keeping a historical archive in case it was later needed. We called it “resupply of unrequested data that needs to be provided to validators”, which we think is relevant to what you said: “once a hash is solicited, someone has to give the preimage to the JAM validators to provide the E_{p}, and to us this seems like a potential gap”. Our sentence may mix both ideas into one; we were unsure at the time whether expunged data needed to be re-supply-able.

We also wanted to make the storage available to all services, which we think is your #3 type of storage: “Service storage, fully kept offchain (analogous to parachain state). Can be as large as it wants to be, but the issue is that all of this has to be part of the work-package”. @olanod puts it into words better than we did: “but I’d like apps to be able to persist documents for longer simply by moving/writing files to a different location (e.g. /archive/*), that action should trigger some kind of preimage pinning that uses a separate storage service”.

That is the goal: to be persistent storage to the temporary DA – the SSD to its RAM. The challenge then becomes technical: how to integrate it with JAM clients.

@tom.stakeplus Good point on performance and on how people calculate the economic model in stables. We plan to run these performance and economic-model tests now, in Phase 4.


2025-07-17 Update

Since our last update we’ve massively simplified the parachain, which previously relied on libp2p DHTs, Bitswap, and so on.

We’ve moved more information on-chain, and discovery now relies on on-chain information – no more PeerIDs!

We’ve also removed the market account, no more layers!

Likewise, we’ve implemented support for all these changes in Delia.

Hey everyone! I’m super excited about the opportunity to participate in the future of web3 storage. I’ve been reading through the docs and I just want to make sure I understand this revolutionary model correctly:

So, as I understand it, as a provider I’ll be storing complete, unencrypted, full files on my server (with a direct connection leaking my IP on uploads). This is great, because I’ve always wanted to know exactly what legal liability I’m assuming! None of that pesky plausible deniability that Filecoin/Tahoe-LAFS/IPFS pioneered.

The permanent blockchain record linking my real identity to every file I store is particularly innovative. When someone uploads “ThailandKingMemes.pdf” and I get that 3am knock on my door, at least the prosecutors will have immutable proof of my involvement.

tl;dr – who the fuck is going to be chad enough to run provider nodes for this architecture, risking CSAM being uploaded, for pennies?


Any idea when we will get a response to this? I’m naying any refs that use PNS or involve PNS until there is a resolution.

Hi @hitchhooker, thank you for taking the time to dive into the details; it’s a valid concern we’ve been thinking about for a long time too.

The old design was an intentionally naïve implementation to reach a working system, not the end goal. Now, in Phase 4, we are working on a new proof system, erasure coding, data shards, and either native or client-side encryption. We talked about this briefly in the Phase 3 proposal, but we’ll soon publish documents detailing the approach.

We’ll try to do a better job of clarifying these plans in the documentation; thanks for raising this.


Apologies in advance for any errors or misunderstandings on my part; I have limited first-hand experience with Filecoin. If you need any help brainstorming, I’d be more than happy to help – send an invite on Matrix ( @stakeplus:matrix.org ). I suspect the other IBP members would also be more than happy to help, but I’m not going to speak for them.

Anyway – I have been giving this some thought since your last reply.

It appears that Filecoin is switching from PoRep to “PDP” (Proof of Data Possession)? PDP involves no sealing/unsealing, uses incremental proofs, and relies on SHA-2 (hardware-accelerated).

If you need to redesign this from scratch (seemingly) anyway – I was curious, why not interact directly with Ceph BlueStore? EC would be built into Ceph. IIUC, EC is only necessary on Filecoin for sealing and redundancy (recovery). With PDP we no longer need to do sealing, so we could probably just let Ceph itself deal with EC.

Each file could (and should) be stored as its own object, allowing files to be independently read and written. For example, Pixel Proof would be able to view, edit, delete, or add a single file/image without needing intermediate access from a Phala TEE. The client would just need to decrypt the streamed files on the fly.
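As a sketch of the one-object-per-file idea, here’s what direct access through the python-rados (librados) binding could look like; the pool name and object keys are placeholders, not anything from the proposal:

```python
# Sketch: one object per file in a Ceph pool via librados (python3-rados).
# Assumes a running Ceph cluster, a readable /etc/ceph/ceph.conf, and an
# existing pool named "pns-store" (all names illustrative).
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx("pns-store")

# Each file is an independent object: it can be written, read, or
# deleted without touching any other file in the store.
ioctx.write_full("photos/beach.jpg.enc", b"<ciphertext bytes>")
data = ioctx.read("photos/beach.jpg.enc", length=1 << 20)  # read up to 1 MiB
ioctx.remove_object("photos/beach.jpg.enc")

ioctx.close()
cluster.shutdown()
```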

Storing all files as independent objects would require maintaining a metadata file in the store itself so the client can quickly know every file currently part of that store. The provider would maintain this metadata file as the end user reads, writes, adds, and deletes files in the store.
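The metadata file itself could be as simple as something like this (field names are purely illustrative, not a defined format):

```python
# Hypothetical shape of the per-store metadata object the provider would
# maintain. Every field name and value here is illustrative.
store_metadata = {
    "store_id": "<store identifier>",
    "updated_at": "2025-07-17T12:00:00Z",
    "files": {
        "photos/beach.jpg.enc": {
            "size": 482_113,        # ciphertext size in bytes
            "sha256": "<integrity hash of the object>",
            "acl": ["owner", "public-read"],
        },
    },
}
```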

With Proof of Data Possession, it would only be necessary to grab some bytes of a file (the examples I saw were < 1 KB) and perform the proof. Very lightweight, it seems.
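To make that concrete, here’s a minimal sketch of what such a spot check could look like. This is illustrative only, not Filecoin’s actual PDP construction: real PDP schemes use precomputed tags so the verifier never needs the data, whereas this toy version assumes the verifier holds a reference copy.

```python
# Toy PDP-style spot check: the verifier picks a random offset/length,
# sends a fresh nonce, and the provider must return SHA-256(nonce || bytes).
# The nonce prevents precomputing answers; reading < 1 KiB keeps it cheap.
import hashlib
import os
import secrets

def provider_prove(path: str, offset: int, length: int, nonce: bytes) -> bytes:
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read(length)
    return hashlib.sha256(nonce + chunk).digest()

def verifier_check(reference: bytes, offset: int, length: int,
                   nonce: bytes, proof: bytes) -> bool:
    # Toy simplification: the verifier recomputes from its own copy.
    expected = hashlib.sha256(nonce + reference[offset:offset + length]).digest()
    return secrets.compare_digest(expected, proof)

# Example round: challenge a random 512-byte window of a stored object.
data = os.urandom(1 << 20)            # stand-in for the stored file
with open("stored.bin", "wb") as f:
    f.write(data)
offset = secrets.randbelow(len(data) - 512)
nonce = secrets.token_bytes(32)
proof = provider_prove("stored.bin", offset, 512, nonce)
assert verifier_check(data, offset, 512, nonce, proof)
```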

For encryption – HPKE with X25519 key agreement appears to be the recommended solution to allow full ACL/UAC controls: public access, private access, and everything in between.
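A rough sketch of what that could look like, using the `cryptography` package. This is an ECIES-style approximation of HPKE base mode, not RFC 9180 wire-compatible, and note X25519 only does key agreement; write authorization would need a separate signing key (e.g. Ed25519).

```python
# Sketch: encrypt a file to a recipient's X25519 public key
# (ephemeral DH + HKDF + ChaCha20-Poly1305). Illustrative only.
import os
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric.x25519 import (
    X25519PrivateKey, X25519PublicKey,
)
from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def seal(recipient_pk: X25519PublicKey, plaintext: bytes):
    eph = X25519PrivateKey.generate()        # fresh key per message
    shared = eph.exchange(recipient_pk)      # X25519 key agreement
    key = HKDF(algorithm=hashes.SHA256(), length=32,
               salt=None, info=b"pns-file-key").derive(shared)
    nonce = os.urandom(12)
    ct = ChaCha20Poly1305(key).encrypt(nonce, plaintext, None)
    eph_raw = eph.public_key().public_bytes(
        serialization.Encoding.Raw, serialization.PublicFormat.Raw)
    return eph_raw, nonce, ct

def open_sealed(recipient_sk: X25519PrivateKey, eph_raw: bytes,
                nonce: bytes, ct: bytes) -> bytes:
    shared = recipient_sk.exchange(X25519PublicKey.from_public_bytes(eph_raw))
    key = HKDF(algorithm=hashes.SHA256(), length=32,
               salt=None, info=b"pns-file-key").derive(shared)
    return ChaCha20Poly1305(key).decrypt(nonce, ct, None)

sk = X25519PrivateKey.generate()
eph, nonce, ct = seal(sk.public_key(), b"file bytes")
assert open_sealed(sk, eph, nonce, ct) == b"file bytes"
```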

The storage customer’s flow would be –
A) Create a store
B) Modify ACL / UAC of the store
C) Upload (Encrypted) files to the store based on ACL/UAC
D) Optionally modify ACL/UAC of individual files

The storage provider’s flow would be –
A) Accept storage request from user (create the store)
B) Provider updates UAC/ACL in metadata
C) Provider starts spot checks with PDP on arbitrary files, arbitrary locations, and arbitrary data segment sizes
D) Provider provides PDP results to network

The end user / website enjoyer / storage downloader / receiver flow would be –
A) User visits a website, or uses an application, that downloads data from storage. The application would embed a “public viewing key” (by which I do not mean the public half of a keypair, but a decryption key deliberately published because the files were meant to be public), allowing the JS on the site to fetch and decrypt the data that needs to be displayed. The JS itself would have that viewing key built into it.
B1) If the user has a private ACL/UAC entry for an address they own, they would download and decrypt files with that address.
B2) If the user is connecting via a public assignment, there would be a key in the HPKE ACL usable by any software or website. That key becomes the “public viewing key”: it is technically a private key, but one you don’t care if anyone knows, because the file(s) were meant to be public anyway.

ACLs & UAC should definitely have read, write, or read & write permissions assigned to each address listed/included in the HPKE (regardless of whether ACL/UAC inclusion is by store or by file).
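Putting the ACL and HPKE pieces together, an entry per recipient could look roughly like this (layout is entirely hypothetical). Note that read access is enforced cryptographically by key wrapping, while write permissions would still have to be enforced by the provider checking signatures:

```python
# Hypothetical ACL layout: each entry wraps the store (or file) key to a
# recipient's X25519 public key via an HPKE-style seal (see sketch above)
# and records that recipient's permissions. The "public-read" entry
# publishes its key openly, matching the "public viewing key" idea above.
acl = {
    "store": "<store identifier>",
    "entries": [
        {"recipient": "<owner X25519 pk>",  "perms": ["read", "write"],
         "wrapped_key": "<HPKE ciphertext of store key>"},
        {"recipient": "<editor X25519 pk>", "perms": ["read", "write"],
         "wrapped_key": "<HPKE ciphertext of store key>"},
        {"recipient": "public-read",        "perms": ["read"],
         "wrapped_key": "<key shipped inside the app itself>"},
    ],
}
```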

---------------------------------------------------------------------------

Additional Thought: assuming this works the way I think it would, there would be almost no difference between utilizing Ceph BlueStore and Redis Cluster (different from regular Redis). Assuming these changes are applied successfully, Polkadot Native Storage should be renamed to something like Polkadot Disk Storage, Polkadot NVMe Storage, etc., and we should launch an additional service called Polkadot Memory Storage. With Polkadot Memory Storage we could roll out apps that are insanely fast. My current use case for this is to develop our own Element/Matrix/Discord service. We could probably also use it for an email-like service, but I’m not sure that needs to be stored in memory.

We could also launch something like Polkadot Cold Store (also using Ceph BlueStore) for very-high-capacity, very-low-cost, long-term HDD/spindle disk storage.

The only difference between Memory, NVMe, and HDD (other than Ceph BlueStore vs Redis Cluster) would be the PoSt & PDP configurations: optimally setting proof size, proof window, etc., so that certain minimum hardware is required to complete the proof within the necessary window.
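As a back-of-envelope example of how the window selects hardware (all numbers assumed, not from any spec):

```python
# If each proof window demands N random reads within W seconds,
# the provider needs N / W sustained random-read IOPS.
challenges_per_window = 50_000   # assumed
window_seconds = 30              # assumed
required_iops = challenges_per_window / window_seconds
print(f"required ≈ {required_iops:,.0f} random-read IOPS")
# ≈ 1,667 IOPS: trivial for NVMe (100k+), comfortable for SATA SSDs,
# impossible for HDDs (~100-200 random IOPS). Tightening the window or
# raising the challenge count selects for faster media.
```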

Update: since Ceph RGW is AWS S3 compatible, it would be possible to remote-mount the Ceph store as a POSIX-“compatible-ish” filesystem using a FUSE adapter ( s3fs-fuse ). It wouldn’t have the full range of POSIX compatibility (it lacks full permissions and some other things) but would provide many desirable features like random reads/writes, multi-part transfers, etc.
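For example, the same RGW endpoint that s3fs-fuse would mount can also be reached programmatically with boto3; the endpoint, bucket, and credentials below are placeholders:

```python
# Sketch: talking to Ceph RGW through its S3-compatible API with boto3.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://rgw.example.com",    # RGW gateway (placeholder)
    aws_access_key_id="<rgw access key>",
    aws_secret_access_key="<rgw secret key>",
)

# Standard S3 object calls work against RGW unchanged.
s3.put_object(Bucket="pns-store", Key="photos/beach.jpg.enc", Body=b"<ciphertext>")
obj = s3.get_object(Bucket="pns-store", Key="photos/beach.jpg.enc")
print(obj["Body"].read()[:16])
```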

2025-11-03 Update

I would like to bring the community up to date on where Polka Storage stands. Currently the team is preparing a Phase 4 proposal based on the extensive research done as part of Phase 3, along with some industry interviews, to determine the best course of action for Phase 4. More granular updates will be provided moving forward; however, to reconcile the gap in communication, we will be sharing our learnings over a series of updates. More to come.