Additionally, I wanted to give a quick project update.
Here is a list of the main areas of smoldot’s code, and what I think of each of them:
- Trie: Everything related to encoding/decoding trie nodes or trie proofs is more or less fleshed out. Two things remain to be done: splitting the proof decoding code into multiple steps, in order to avoid freezing the browser if a big proof needs to be decoded, and implementing compact trie proofs (which is low priority since, if I’m not mistaken, they are a parachain implementation detail).
- Wasm execution: Executing the runtime, including the changes overlay, is in pretty good shape. Unfortunately, a lot of tests are missing, because it is often unclear what the expected behavior is. One change that remains to be done is to report the storage changes made during the execution in a streaming way rather than buffering them in memory, so that we aren’t limited by memory when a block performs a lot of storage changes.
- Syncing: “Syncing” is what we call the algorithm that synchronizes the local chain with that of the other nodes we’re connected to (including warp syncing). The main difficulty is to avoid attacks of all kinds. While the code is relatively robust, it is in the middle of a refactoring that would allow a warp sync to be performed if it turns out that we are far behind the head of the chain, which is important for example in case of a netsplit. I am also unsatisfied with the API of the syncing code, as this API causes a few pieces of code to be `O(n)` (`n` being the number of peers) or even `O(m*n)` (`m` being the number of non-finalized blocks).
- Low-level networking: This covers encryption, multiplexing substreams over a single connection, interacting with the operating system, etc. I personally strongly dislike the “idiomatic” Rust way of doing things through the `AsyncRead` and `AsyncWrite` traits. Instead, I have created a “read-write” API that requires far fewer copies. This system works well now, but I’m still a bit conflicted over some very small details of this API. Apart from this, smoldot doesn’t support clean connection shutdowns: disconnecting from a peer is always abrupt (through a TCP RST, for example). Given that Substrate doesn’t do clean disconnections either, I consider this low priority.
- High-level networking: This covers choosing which peers to connect to, a ban system for misbehaving nodes, sending requests and retrying them if peers are unresponsive, and so on. After a lot of trial and error, I believe that the high-level networking is finally in pretty good shape. Since it has recently been refactored, some debug assertions unfortunately trigger every now and then, but they’re all slowly but surely getting fixed. Both the full node and the light client should be able to properly recover if Internet connectivity is lost, which for a long time wasn’t the case. Some small features are missing, such as sending periodic identify requests to peers for debugging purposes. Note that none of the parachain-related networking is implemented. Kademlia is also not completely implemented, as we only use it for discovering other nodes, and implementing it fully will be necessary if the smoldot full node is ever to become production-ready.
- Chain spec, light client checkpoint, etc.: The existing “checkpoint” and “database” system of the light client will be reworked so that you can instead ask smoldot to send back an updated version of the chain specification. This system is simpler overall, as the only remaining concept would be chain specs, and you would just be manipulating chain specs.
- JSON-RPC server: I am overall pretty dissatisfied with the code quality of the JSON-RPC servers of both the full node and the light client. This code has gone through several refactorings, and I’ve had trouble finding a design that leads to easy-to-read code. My latest attempt at simplifying it is to completely separate the code that answers requests from the code that deals with potential attacks, so that the former doesn’t have to worry about the latter. This change is in progress.
- Light client transaction pool: The code that validates and then tracks transactions is way more complicated than one might initially think. Some very rare corner cases are unfortunately not handled properly, but overall the code is pretty robust. Proper tests are missing.
- Full node database: The full node uses an SQLite database. The code is missing two important features: proper block pruning (i.e. removing old blocks and storage that we no longer need, in order to save space), and an in-memory cache. The lack of an in-memory cache unfortunately makes the full node so slow that it is currently unusable.
- Light client and full node in general: I’ve been working on unifying all the light client and full node code to use the same code paradigm, which I find easy to read. I’ve also been working on making the light client more robust to internal panics by restarting services that crash. This of course needs to be done carefully in order not to break any logic. For the moment I am not doing this for the full node, as I think it makes more sense for the full node to simply crash given that it is connected to a single chain, as opposed to the light client, which can be connected to many different chains at once. The light client is also awaiting a refactoring: at the moment, adding a chain performs a lot of CPU-heavy operations synchronously, while it would be better to perform them in a background task and make the `Client` object a simple wrapper that sends messages to that task.
- JavaScript code on top of the light client: The JavaScript code that actually provides the API to users of smoldot is overall in good shape, and there’s not much to say about it.
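To illustrate the multi-step proof decoding mentioned in the trie item, here is a minimal Rust sketch of a decoder that processes at most a fixed number of entries per call and then hands control back, so that a browser caller could yield to the event loop between calls. It assumes a deliberately simplified format (each entry is a one-byte length followed by that many bytes), which is not the real trie proof encoding, and `StepDecoder`, `Progress` and `decode_all` are hypothetical names rather than smoldot’s API.

```rust
/// Error returned when the input is truncated.
#[derive(Debug, PartialEq)]
pub struct DecodeError;

/// Result of one decoding step.
pub enum Progress {
    /// Work remains: call `step` again later (e.g. after yielding to the event loop).
    InProgress(StepDecoder),
    /// Decoding is complete.
    Finished(Vec<Vec<u8>>),
}

/// Resumable decoder for a simplified format: each entry is a one-byte length
/// followed by that many bytes. This is NOT the real trie proof encoding.
pub struct StepDecoder {
    input: Vec<u8>,
    offset: usize,
    entries: Vec<Vec<u8>>,
}

impl StepDecoder {
    pub fn new(input: Vec<u8>) -> Self {
        StepDecoder { input, offset: 0, entries: Vec::new() }
    }

    /// Decodes at most `max_entries` entries, then returns control to the caller.
    pub fn step(mut self, max_entries: usize) -> Result<Progress, DecodeError> {
        for _ in 0..max_entries {
            if self.offset == self.input.len() {
                break;
            }
            let len = usize::from(self.input[self.offset]);
            let start = self.offset + 1;
            // A truncated entry is reported as an error instead of panicking.
            let entry = self.input.get(start..start + len).ok_or(DecodeError)?;
            self.entries.push(entry.to_vec());
            self.offset = start + len;
        }
        if self.offset == self.input.len() {
            Ok(Progress::Finished(self.entries))
        } else {
            Ok(Progress::InProgress(self))
        }
    }
}

/// Drives a decoder to completion and also returns the number of steps taken.
/// A browser caller would instead yield to the event loop between two steps.
pub fn decode_all(
    input: Vec<u8>,
    max_entries: usize,
) -> Result<(Vec<Vec<u8>>, usize), DecodeError> {
    assert!(max_entries > 0, "a zero budget would never make progress");
    let mut decoder = StepDecoder::new(input);
    let mut steps = 0;
    loop {
        steps += 1;
        match decoder.step(max_entries)? {
            Progress::Finished(entries) => return Ok((entries, steps)),
            Progress::InProgress(next) => decoder = next,
        }
    }
}
```

Returning the decoder by value from `step` means a half-finished decode is just an ordinary value that can be stored and resumed later, without any callback or thread.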
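The streaming idea from the Wasm execution item can be sketched by having the executor report each storage change to a sink as it happens, instead of accumulating all changes in a map. `StorageChangeSink`, `BufferAll`, `StreamingStats` and `execute_block` are all hypothetical, and `execute_block` merely replays a list of writes in place of actually running a runtime.

```rust
/// Hypothetical callback interface: the executor reports each storage change
/// as soon as it happens instead of accumulating them.
pub trait StorageChangeSink {
    /// `value` is `None` when the key is deleted.
    fn on_change(&mut self, key: &[u8], value: Option<&[u8]>);
}

/// Buffering approach: memory usage grows with the number of changes.
#[derive(Default)]
pub struct BufferAll {
    pub changes: Vec<(Vec<u8>, Option<Vec<u8>>)>,
}

impl StorageChangeSink for BufferAll {
    fn on_change(&mut self, key: &[u8], value: Option<&[u8]>) {
        self.changes.push((key.to_vec(), value.map(|v| v.to_vec())));
    }
}

/// Streaming approach: each change is processed (here, only counted) and then
/// dropped, so memory usage stays constant no matter how many changes occur.
#[derive(Default)]
pub struct StreamingStats {
    pub writes: u64,
    pub deletes: u64,
    pub bytes_written: u64,
}

impl StorageChangeSink for StreamingStats {
    fn on_change(&mut self, _key: &[u8], value: Option<&[u8]>) {
        match value {
            Some(v) => {
                self.writes += 1;
                self.bytes_written += v.len() as u64;
            }
            None => self.deletes += 1,
        }
    }
}

/// Stand-in for runtime execution: replays a list of storage operations.
pub fn execute_block<S: StorageChangeSink>(ops: &[(&[u8], Option<&[u8]>)], sink: &mut S) {
    for (key, value) in ops {
        sink.on_change(key, *value);
    }
}
```

The executor code stays the same in both cases; only the sink decides whether changes are kept in memory or forwarded (for example to a database writer) as they arrive.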
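The copy savings of the “read-write” approach from the low-level networking item can be illustrated as follows. With `AsyncRead`, protocol code typically asks the socket to copy bytes into a buffer of its own; in this sketch, the caller instead lends its receive and send buffers to the protocol, which parses frames in place and records how many bytes it consumed. This is a hypothetical, heavily simplified structure with a made-up one-byte-length framing, not smoldot’s actual definition.

```rust
/// Hypothetical, simplified "read-write" buffer pair handed to a protocol
/// state machine (not smoldot's actual definition).
pub struct ReadWrite<'a> {
    /// Bytes received from the socket and not yet processed by the caller.
    pub incoming: &'a [u8],
    /// Number of bytes at the start of `incoming` that the protocol consumed.
    pub read_bytes: usize,
    /// Bytes the protocol wants to send; the caller flushes them to the socket.
    pub outgoing: &'a mut Vec<u8>,
}

/// Example protocol: echoes back every complete frame, where a frame is a
/// one-byte length followed by that many bytes. Frames are parsed directly
/// from `incoming`, with no intermediate copy into a protocol-owned buffer.
/// Returns the number of frames handled.
pub fn echo_frames(rw: &mut ReadWrite<'_>) -> usize {
    let mut frames = 0;
    loop {
        let remaining = &rw.incoming[rw.read_bytes..];
        let Some((&len, rest)) = remaining.split_first() else { break };
        let len = usize::from(len);
        if rest.len() < len {
            // Incomplete frame: wait for the caller to provide more data.
            break;
        }
        rw.outgoing.push(len as u8);
        rw.outgoing.extend_from_slice(&rest[..len]);
        rw.read_bytes += 1 + len;
        frames += 1;
    }
    frames
}
```

After the call, the caller discards the first `read_bytes` bytes of its receive buffer and sends `outgoing`; a partial frame simply stays in the caller’s buffer until more data arrives.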
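The separation described in the JSON-RPC server item could look roughly like the sketch below: a guard layer that only enforces anti-abuse limits, and a handler (here, any closure) that only answers requests and can assume its input has already been vetted. The specific limits (maximum request length, maximum in-flight requests per client) and all names are assumptions chosen for illustration.

```rust
use std::collections::HashMap;

/// Reasons the guard layer can reject a request before it reaches the handler.
#[derive(Debug, PartialEq)]
pub enum Rejected {
    TooLarge,
    TooManyInFlight,
}

/// Anti-abuse layer: enforces limits and knows nothing about JSON-RPC methods.
pub struct Guard {
    max_request_len: usize,
    max_in_flight_per_client: usize,
    /// Client id -> number of requests currently being answered.
    in_flight: HashMap<u64, usize>,
}

impl Guard {
    pub fn new(max_request_len: usize, max_in_flight_per_client: usize) -> Self {
        Guard { max_request_len, max_in_flight_per_client, in_flight: HashMap::new() }
    }

    /// Checks the limits and, if they pass, records the request as in flight.
    pub fn admit(&mut self, client: u64, request: &str) -> Result<(), Rejected> {
        if request.len() > self.max_request_len {
            return Err(Rejected::TooLarge);
        }
        let count = self.in_flight.entry(client).or_insert(0);
        if *count >= self.max_in_flight_per_client {
            return Err(Rejected::TooManyInFlight);
        }
        *count += 1;
        Ok(())
    }

    /// Must be called once the handler has produced a response.
    pub fn complete(&mut self, client: u64) {
        if let Some(count) = self.in_flight.get_mut(&client) {
            *count = count.saturating_sub(1);
        }
    }
}

/// Glue: the guard vets the request, then the handler answers it. The handler
/// never has to think about limits or misbehaving clients.
pub fn serve(
    guard: &mut Guard,
    client: u64,
    request: &str,
    handler: impl FnOnce(&str) -> String,
) -> Result<String, Rejected> {
    guard.admit(client, request)?;
    let response = handler(request);
    guard.complete(client);
    Ok(response)
}
```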
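For the light client’s restart-on-panic behavior, a minimal sketch of a supervisor that reruns a service after it panics is shown below, using the standard library’s `std::panic::catch_unwind`. This is an assumption about one possible mechanism, not a description of how smoldot does it. Note in particular that `catch_unwind` cannot catch anything when compiling with `panic = "abort"`, and that restarting safely requires the service to rebuild its state, which is the “carefully” part mentioned above.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Runs `service` and restarts it each time it panics, up to `max_restarts`
/// times. Returns the value of the first run that completes normally, or
/// `None` if the restart budget is exhausted.
pub fn supervise<T>(max_restarts: usize, mut service: impl FnMut() -> T) -> Option<T> {
    for _ in 0..=max_restarts {
        // `AssertUnwindSafe`: we promise that any state left behind by a
        // panicking run is discarded or rebuilt by the service on restart.
        if let Ok(value) = panic::catch_unwind(AssertUnwindSafe(&mut service)) {
            return Some(value);
        }
        // The run panicked: a real implementation would log the panic and
        // reset the service's state here before retrying.
    }
    None
}
```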
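The planned light client refactoring, where the `Client` object becomes a simple wrapper that sends messages, might look like the sketch below. OS threads and `std::sync::mpsc` channels stand in for the asynchronous background task a browser build would use, and the `Message` enum, `add_chain` method and error strings are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

/// Messages the `Client` wrapper sends to the background task.
enum Message {
    AddChain {
        chain_spec: String,
        reply: mpsc::Sender<Result<usize, String>>,
    },
}

/// Thin handle: every method only sends a message and waits for the reply,
/// so the CPU-heavy work never runs on the caller's side.
pub struct Client {
    to_background: mpsc::Sender<Message>,
}

impl Client {
    pub fn new() -> Self {
        let (tx, rx) = mpsc::channel::<Message>();
        thread::spawn(move || background_task(rx));
        Client { to_background: tx }
    }

    /// Returns an identifier for the new chain, or an error message.
    pub fn add_chain(&self, chain_spec: &str) -> Result<usize, String> {
        let (reply_tx, reply_rx) = mpsc::channel();
        self.to_background
            .send(Message::AddChain { chain_spec: chain_spec.to_owned(), reply: reply_tx })
            .map_err(|_| "background task stopped".to_owned())?;
        reply_rx.recv().map_err(|_| "background task stopped".to_owned())?
    }
}

/// Owns all chains; runs until every `Client` handle has been dropped.
fn background_task(rx: mpsc::Receiver<Message>) {
    let mut chains: Vec<String> = Vec::new();
    for message in rx {
        match message {
            Message::AddChain { chain_spec, reply } => {
                // Placeholder for the expensive part (parsing the chain spec, etc.).
                let result = if chain_spec.trim().is_empty() {
                    Err("empty chain spec".to_owned())
                } else {
                    chains.push(chain_spec);
                    Ok(chains.len() - 1)
                };
                // The caller may have stopped waiting; ignore send errors.
                let _ = reply.send(result);
            }
        }
    }
}
```

Because `add_chain` blocks on the reply here, a real API would more likely return a future or promise; the blocking version just keeps the sketch short.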