My thoughts on Substrate
Originally posted on Medium
I’ve been playing around with Substrate over the past couple of weeks. Substrate is a Rust-based framework for building blockchains, and it powers Polkadot and its associated blockchains. I’ve been a blockchain skeptic, but a few changes in the ecosystem are changing my opinion.
So, why would I be interested in the world’s worst database? Read on.
On Blockchains
As many of you know, blockchains became popular after Bitcoin and Ethereum showed that a permissionless network can run and reach consensus without a central authority.
I’m less interested in the financial/speculative aspects of these and rather more interested in its technical applications. Bitcoin had a very limited set of op-codes as a part of its sigscript field to allow arbitrary mathematical puzzles to be inserted in the transaction. This allowed other developers to come up with interesting applications by embedding all sorts of data within those transactions. These could embed things that need not relate to cryptocurrency as such. For example, reserving a name or pointing to a hash of a file.
Ethereum then entered the scene and allowed a full Turing-complete language (Solidity) as the script. Technically, what goes on chain is the compiled EVM bytecode, but for all intents and purposes, Ethereum contracts are almost exclusively written in Solidity (and yes, I know about Vyper).
The whole point of blockchains is to ensure that there is no central authority and hence no single point of failure. However, to achieve consistency and linearizability of data, these systems use Proof-of-Work based consensus. This is where computers brute-force their way to a number (a nonce) that, when hashed together with the block, satisfies a constraint (often called the difficulty).
Since a good hash algorithm produces uniformly distributed output, there is no shortcut: the only way to solve this problem is brute-force guessing. Not only is it an absolute waste of energy, but it is also not scalable.
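To make the brute-force search concrete, here is a minimal sketch of Proof-of-Work in Rust. It is a toy under stated assumptions: the names (toy_hash, meets_difficulty, mine) are made up for this post, the difficulty is expressed as a number of leading zero bits, and the standard library's DefaultHasher stands in for a real cryptographic hash, which it is not.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash the block payload together with a candidate nonce.
/// ASSUMPTION: DefaultHasher is a non-cryptographic stand-in used only to
/// keep the sketch dependency-free. Real chains use cryptographic hashes.
fn toy_hash(payload: &str, nonce: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    payload.hash(&mut hasher);
    nonce.hash(&mut hasher);
    hasher.finish()
}

/// The constraint: the hash must start with at least `difficulty_bits` zero bits.
fn meets_difficulty(hash: u64, difficulty_bits: u32) -> bool {
    hash.leading_zeros() >= difficulty_bits
}

/// Brute force: try nonces one by one until the constraint is satisfied.
fn mine(payload: &str, difficulty_bits: u32) -> u64 {
    let mut nonce = 0u64;
    while !meets_difficulty(toy_hash(payload, nonce), difficulty_bits) {
        nonce += 1;
    }
    nonce
}
```

Calling mine("block 1", 12) loops until it finds a nonce whose hash has 12 leading zero bits. Each extra bit of difficulty doubles the expected number of guesses, while checking a claimed nonce stays a single hash. That asymmetry is the whole trick, and also where the energy goes.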
Many people claim that this is what gives bitcoin its value which is more real than a fiat-based currency. This gets into an entire topic of monetary and fiscal policies and the role of central banks and Keynesian economics. That is not my area of expertise, so I will reserve my opinions on that.
Permissioned — Anti Blockchains
As blockchain captured people’s attention and imagination, enterprises were not far behind. And as it happens, the Hyperledger foundation was created and people started to talk about proof-of-authority based blockchains. Proof-of-authority based blockchains lose the decentralized nature of blockchains and make a few nodes more equal than others. Hyperledger Fabric uses Docker to isolate chain code, which is essentially a normal binary.
This completely flies in the face of blockchains. I feel that most problems solved by a proof-of-authority based blockchain can be easily solved with a well-thought-out distributed system with better databases and with far less overhead. Just my $0.02.
Changes driving new blockchain technologies
I have been quite skeptical about the current solutions. However, there are a few changes that have put blockchain back on my radar. Firstly, there is WASM. WebAssembly (abbreviated WASM) is a binary instruction format for a stack-based virtual machine. It is designed as a portable compilation target for programming languages, enabling deployment on the web for client and server applications.
It was created for the browser, for porting applications and games written in C++ to run in a sandboxed container. WebAssembly is not only extremely performant but can also be sandboxed very efficiently. Starting a WebAssembly based call should have overheads in terms of microseconds instead of milliseconds. I believe one of the first blockchains that I heard of using WASM for its contracts was EOS. But now, pretty much every new blockchain is using WASM for custom logic. Even Hyperledger has a project called Sawtooth that uses a WASM-based smart contract library called Sabre.

Secondly, the community is evolving to support Proof-of-Stake based consensus using a variation of the Ouroboros Praos consensus system. Each blockchain calls it by a different name, but the basics are pretty much the same. Instead of miners, we now have block authors and validators. We also have some nodes that police these producers or authors and are incentivized to catch fraud. Block authors have strict uptime and block production frequency requirements, which depend on the blockchain implementation. Once produced, blocks can then be finalized using an algorithm based on practical Byzantine fault tolerance.
While anybody can be a producer, you need modest hardware or cloud-based VMs and some technical expertise to be a block producer. This is not very different from Proof-of-Work based setups where you needed expertise and the money to produce and run specialized hardware like GPU racks or ASIC miners.
One can argue that Proof-of-Stake based networks are more democratized than current PoW based chains and have a much lower bar to entry. Most blockchains also support nominated or delegated Proof-of-Stake where people who hold these tokens or currency can nominate producers who have a good reputation on the network to operate these nodes and share rewards with them.
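Here is a rough sketch of the nomination idea, under loud assumptions: the function names (elect, split_reward) and the rules are hypothetical and written for this post. The sketch simply sums the stake behind each candidate, takes the best-backed ones, and splits a reward in proportion to stake. Real networks use more elaborate election and payout rules.

```rust
use std::collections::BTreeMap;

/// Pick the `seats` candidates with the most stake behind them.
/// Each nomination is (nominator, candidate, stake).
/// ASSUMPTION: plain "highest total backing wins", ties broken by name.
fn elect(nominations: &[(&str, &str, u64)], seats: usize) -> Vec<String> {
    let mut backing: BTreeMap<&str, u64> = BTreeMap::new();
    for &(_nominator, candidate, stake) in nominations {
        *backing.entry(candidate).or_insert(0) += stake;
    }
    let mut ranked: Vec<(&str, u64)> = backing.into_iter().collect();
    // Highest backing first; equal backing falls back to name order.
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
    ranked
        .into_iter()
        .take(seats)
        .map(|(name, _)| name.to_string())
        .collect()
}

/// Split a reward in proportion to stake. Integer division rounds down,
/// so a small remainder may stay undistributed in this sketch.
fn split_reward(reward: u64, stakes: &[u64]) -> Vec<u64> {
    let total: u128 = stakes.iter().map(|s| *s as u128).sum();
    if total == 0 {
        return vec![0; stakes.len()];
    }
    stakes
        .iter()
        .map(|s| ((reward as u128) * (*s as u128) / total) as u64)
        .collect()
}
```

The point of the sketch is that a token holder who never runs a node still influences who produces blocks, and gets a share of the reward for backing a well-behaved producer.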
Finally, these blockchain networks have recognized that having a single chain is a bottleneck, and most of them have mechanisms for candidate chains to participate and scale accordingly. Ethereum supports shards and Polkadot supports parachains/parathreads. While the exact economics of these sharded chains are still in flux, I’m confident that this is the right step forward to solve the scalability problem. This also means that enterprises can target permissionless networks and benefit from having validators in the open ecosystem.
Hello Substrate
I picked Substrate almost on a whim, but I am glad that I started with it. Here are my opinions as a developer about Substrate. I’ve been learning Rust for the past few months, so I was familiar, if not comfortable, with Rust when I started learning Substrate. But even then, I found that the amount of proc-macro black magic in Substrate makes it a bit difficult to reason about. Also, I have been a distributed systems developer building microservices based backends and deploying them on cloud solutions like AWS or Google Cloud. Developing for the blockchain forces you to change your perspective.
By default, a blockchain built on Substrate is backed by RocksDB, which stores the key-value pairs of a Patricia tree (similar to Ethereum). Insertions and retrievals are both O(log n) operations. It uses WASM for its runtime but can compile WASM to native code via the Cranelift compiler. It uses libp2p for its networking stack. For finality it uses a variant of practical Byzantine fault tolerance (pBFT), which can tolerate fewer than a third of the network being malicious or unavailable.
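The "third of the network" figure comes from the classic Byzantine fault tolerance bound: with n validators, at most f of them may be faulty where n >= 3f + 1. A tiny sketch of that arithmetic (the function names are mine, not Substrate's):

```rust
/// Largest number of faulty validators a BFT protocol can tolerate,
/// from the classic bound n >= 3f + 1.
fn max_faulty(validators: u32) -> u32 {
    validators.saturating_sub(1) / 3
}

/// Votes that can still be collected when the maximum tolerated number of
/// validators is faulty or silent. This sketch uses it as the quorum size.
fn quorum(validators: u32) -> u32 {
    validators - max_faulty(validators)
}
```

With 4 validators the network survives 1 bad actor and needs 3 votes. With 100 validators it survives 33 and needs 67. Note that 3 validators tolerate none, which is why the bound is strictly less than a third.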
Most of the database handling and networking code is implemented natively, while the block transition and state computation logic are implemented in WebAssembly. This allows Substrate-based chains to be upgraded in flight for all logic relating to state transitions. We only need to update the node binary if we change anything that has to do with the database or network handling bits of the code.
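The split between a fixed native node and upgradable logic is easier to see in a toy model. In the sketch below, which is purely illustrative, the "runtime" is a list of bytes kept in storage and the native node is a trivial interpreter that adds each byte to a counter. The only detail borrowed from Substrate is the storage key ":code", where the runtime blob lives. The Node type and its methods are hypothetical.

```rust
use std::collections::HashMap;

/// Storage key holding the runtime. Substrate keeps its WASM blob under `:code`.
const CODE_KEY: &str = ":code";
/// Hypothetical key for the one piece of application state in this toy.
const COUNTER_KEY: &str = "counter";

/// Toy node: a fixed native "executor" plus storage that contains the runtime itself.
struct Node {
    storage: HashMap<String, Vec<u8>>,
}

impl Node {
    fn new(initial_code: Vec<u8>) -> Self {
        let mut storage = HashMap::new();
        storage.insert(CODE_KEY.to_string(), initial_code);
        storage.insert(COUNTER_KEY.to_string(), vec![0]);
        Node { storage }
    }

    fn counter(&self) -> u8 {
        self.storage[COUNTER_KEY][0]
    }

    /// Native part: load whatever code is in storage and interpret it.
    /// Each code byte means "add this much to the counter".
    fn execute_block(&mut self) {
        let code = self.storage[CODE_KEY].clone();
        let mut value = self.counter();
        for step in code {
            value = value.wrapping_add(step);
        }
        self.storage.insert(COUNTER_KEY.to_string(), vec![value]);
    }

    /// An upgrade is just a state change: overwrite the stored code.
    fn set_code(&mut self, new_code: Vec<u8>) {
        self.storage.insert(CODE_KEY.to_string(), new_code);
    }
}
```

A node created with Node::new(vec![1]) adds 1 per block. After set_code(vec![10]) the same node binary adds 10 per block, without a restart or a fork. A real node swaps a WASM interpreter in for the byte loop, but the shape is the same.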
Substrate offers a set of libraries and primitives called “FRAME”. A horrible name in my opinion. FRAME apparently stands for “Framework for Runtime Aggregation of Modularized Entities”. It used to be called SRML before. It’s one of those acronyms where you are hunting for words to fit an acronym.
FRAME provides a set of primitives that are required for most blockchains. For example, it defines the concept of a runtime and an associated Trait that the runtime has to satisfy. It then has a few proc_macro black magic incantations that can generate the runtime (construct_runtime). It also has a set of “modularized entities” called pallets. Pallet is a much better name than “Modularized Entity” and is in keeping with the cargo/crate theme. The support library in FRAME has more proc_macro magic to declare the runtime modules with decl_module, decl_storage, decl_error and decl_event. These expand to other types like Call, Module, etc.
The thing that threw me off first was the decl_storage macro. It was bizarre: the code inside the macro was similar enough to Rust to look familiar but alien enough to utterly confuse me. It took my brain a while to learn to read that code and to see how it is used in the runtime. To be fair, the amount of work the macro does to simplify development is impressive. I’ve come to understand that now. But I cannot say if I truly find it amazing or if I am under the effect of Stockholm syndrome. Love it or hate it, you’ll still need those macros to work with FRAME and Substrate. There is a newer pallet library being released as part of the next version of Substrate that changes the current macros to attribute-style proc macros. This promises to reduce the peculiarity of the storage definition macro.
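What helped me was thinking of each storage declaration as a typed handle over a key-value store. The following is a hand-written analogue in plain Rust, not the actual macro expansion. The Storage type, the Something item, and the key format are hypothetical. Real FRAME derives hashed keys from the pallet and item names and SCALE-encodes the values.

```rust
use std::collections::HashMap;

/// Stand-in for the node's key-value storage.
struct Storage {
    kv: HashMap<Vec<u8>, Vec<u8>>,
}

impl Storage {
    fn new() -> Self {
        Storage { kv: HashMap::new() }
    }
}

/// Hand-written analogue of one storage item holding an optional u32.
/// The macro generates a type like this for each declared item.
struct Something;

impl Something {
    /// ASSUMPTION: a readable key for the sketch; real keys are hashes.
    fn key() -> Vec<u8> {
        b"TemplateModule/Something".to_vec()
    }

    /// Read and decode the value, or None if nothing is stored.
    fn get(storage: &Storage) -> Option<u32> {
        let bytes = storage.kv.get(&Self::key())?;
        if bytes.len() != 4 {
            return None;
        }
        Some(u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
    }

    /// Encode the value as little-endian bytes and write it.
    fn put(storage: &mut Storage, value: u32) {
        storage.kv.insert(Self::key(), value.to_le_bytes().to_vec());
    }

    /// Remove the value from storage.
    fn kill(storage: &mut Storage) {
        storage.kv.remove(&Self::key());
    }
}
```

Seen this way, the odd syntax in the macro is mostly a compact way of saying "this name, this type, this key layout", with the boilerplate above generated for you.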
Another cool thing I like about Substrate is the idea of off-chain workers. It is quite evident that any blockchain logic should be completely deterministic. Its outputs should always be a function of the previous state and the inputs. However, the real world is quite a bit messier and has a nasty habit of having side effects. For example, if you’d like to send an email or make an HTTP call, you’d lose all guarantees of making the targeted block production time.
So to get around this, Substrate allows you to package such off-chain logic into an off-chain worker. If we did not have such a worker, we’d have to run an external service for such things and we’d invariably centralize such logic. Substrate allows such off-chain logic to be baked into the runtime. But as you recall, the runtime is compiled to WASM so you’d only be able to work with things that are exposed into the WASM context from the node.
Right now it has a bespoke HTTP client and some storage access logic. Since it runs within the blockchain runtime, we can bypass the RPC layer and call the storage layer (the RocksDB abstraction layer) directly. As it stands today, the off-chain worker is run by all validators on the network, and other full nodes or archive nodes can choose to run it when starting the node. Such workers are run every block, and multiple workers may execute in parallel if one execution does not complete within a single block.
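The division of labour can be sketched in a few lines. This is a conceptual model, not Substrate's API: State, Transaction, execute_block and offchain_worker are names I made up, and the fetch closure stands in for a slow, fallible HTTP call. The on-chain function is a pure function of the previous state and its inputs, while the worker does the messy part and hands its result back as an ordinary transaction.

```rust
/// On-chain state: the block number and the last price reported on chain.
#[derive(Debug, Clone, PartialEq)]
struct State {
    block: u64,
    price: Option<u64>,
}

/// Inputs to a block. The only way an off-chain result reaches the chain.
enum Transaction {
    SubmitPrice(u64),
}

/// Deterministic state transition: depends only on `prev` and `txs`.
fn execute_block(prev: &State, txs: &[Transaction]) -> State {
    let mut next = State { block: prev.block + 1, price: prev.price };
    for tx in txs {
        match tx {
            Transaction::SubmitPrice(p) => next.price = Some(*p),
        }
    }
    next
}

/// Off-chain worker: allowed to do side effects that may fail or take long.
/// `fetch` is a hypothetical stand-in for an HTTP request.
/// On success it produces a transaction for a later block; on failure nothing.
fn offchain_worker<F>(fetch: F) -> Option<Transaction>
where
    F: Fn() -> Result<u64, String>,
{
    fetch().ok().map(Transaction::SubmitPrice)
}
```

If the fetch times out, the worker simply returns nothing and block production carries on unaffected. Every node replaying the chain sees only the submitted transaction, so the state stays reproducible.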
The default template project also includes a test harness by creating a mock runtime that allows us to write unit tests. This is quite nice. There are always Substrate’s own pallets to fall back on for reference.
Substrate flavored Rust
Since the runtime compiles down to WASM, we need to stick to a subset of Rust called no_std Rust. As the name suggests, you do not get access to the standard library. That also means that you do not get an allocator by default, so anything in the standard library that allocates on the heap, like strings or vectors, is off limits. However, there is an sp_std library that provides common types like Vec and str that abstract over the WASM memory. I expect that as the WASI types interface is standardized, we’d be able to write using standard library objects.
The trouble is that an average developer will run into the idiosyncrasies of no_std land rather quickly. If they do not understand the constraints of WASM and its memory model, it’d be quite frustrating to understand why things are the way they are.
The other weirdness with Substrate, as I mentioned above, is the syntax in the decl_storage macro. There are a lot of concerns covered by that macro. Any newcomer to Substrate, regardless of their knowledge of Rust, will find it a bit annoying at first. But there is a method to the madness, and it becomes significantly easier later.
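For a feel of what no_std-friendly code looks like, here is a small sketch that touches nothing from std and pulls its heap types from the alloc crate instead. The function names are made up. In a Substrate runtime you would import the same types from sp_std, as described above.

```rust
// In a runtime crate the file would start with the crate-level attribute
//   #![cfg_attr(not(feature = "std"), no_std)]
// It is left as a comment here so the snippet also builds as a normal program.
extern crate alloc;

use alloc::string::String;
use alloc::vec::Vec;

/// Collect the first `n` square numbers into a heap-allocated vector.
fn squares(n: u32) -> Vec<u32> {
    (1..=n).map(|i| i * i).collect()
}

/// Build an owned string without reaching for std::string.
fn shout(name: &str) -> String {
    let mut out = String::from(name);
    out.push('!');
    out
}
```

The code reads like ordinary Rust. The friction shows up when a dependency assumes std, or when you reach for something like file or network access that simply does not exist inside the WASM sandbox.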
In Conclusion
I’m planning to invest more in Substrate in 2021. We are planning to build operational tooling to set up a validator cluster for Substrate-based chains and monitor it with Grafana. We are also planning to set up cloud-agnostic K3S-based clusters for running observable Substrate chains. Our goal is to have a working validator for Kusama, Rococo, and Polkadot. We’ll open source our Terraform and Kubernetes operator codebases to encourage others to run their own nodes using our setup.
We are also looking to work with interesting projects based on Substrate. Please feel free to get in touch with us if you’d like to work with us.