Mintlayer Moves Off Substrate
2022-02-01
In view of irreconcilable incompatibility between Substrate and Mintlayer’s long-term goals, Mintlayer has decided to port its existing feature base off Substrate and on to its own node written from scratch. In this article, we explain the rationale behind this decision in fair detail.
We understand that Substrate is a complex and very advanced piece of software; here we present our educated opinion, based on months of research and development, as to why, to the best of our understanding, Substrate is unsuitable for our needs.
Finality, what is it?
Users of any financial system want to know when their transactions are confirmed. Having no central authority, blockchains must rely on some mechanism for the parties involved in a transaction to consider it final. The Bitcoin protocol relies on probabilistic finality. Let us briefly explain what this means.
Probabilistic Finality
In Bitcoin, members of the network, known as miners, listen for transactions on the network and bundle them into proposed blocks. Miners compete against each other to find a solution to a specific cryptographic puzzle involving their proposed block. This process is called mining a block. A miner who has successfully mined a block publishes their solution to the network, causing other nodes to update their internal state. This entails appending the newly mined block to their local copy of the blockchain, as well as updating a local database of coin owners that the node software uses to process incoming transactions and blocks. In technical terms, we call this “updating the blockchain state”; the newly appended block then becomes the tip of the node’s main chain.
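To give a feel for the "cryptographic puzzle" described above, here is a minimal toy sketch of mining. It is not Bitcoin's actual block header format or hashing scheme: the function names `mine` and `is_valid`, the single SHA-256 pass, and the `difficulty_bits` parameter are all assumptions made for illustration.

```python
import hashlib


def mine(block_data: bytes, difficulty_bits: int) -> tuple:
    """Search for a nonce whose hash falls below a target (toy puzzle).

    A higher difficulty_bits means a smaller target and more expected work.
    """
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(block_data + nonce.to_bytes(8, "little")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce, digest.hex()
        nonce += 1


def is_valid(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Verification takes one hash, while finding the nonce takes many."""
    target = 1 << (256 - difficulty_bits)
    digest = hashlib.sha256(block_data + nonce.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "big") < target


nonce, digest_hex = mine(b"proposed block", 8)
print(nonce, digest_hex)
```

The asymmetry is the point: any node can check a published solution cheaply, which is why a valid solution is accepted as evidence that work was done.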
However, there is a nuance that needs to be addressed. Since miners are independent actors, different miners can mine different blocks on top of the same initial block. Because of network synchronization issues such as latency, a node will often observe “branches” in the state of the network, where two or more competing blocks extend the same parent.
How does a node know which state to regard as the true state of the network? When branching occurs and there are chains of equal lengths, nodes listen for more blocks until one of the chains “wins.” An arbitrary rule in Bitcoin determines that the winner is always the longest chain, which is, by definition, the one with the most accumulated work. This is called a selection rule, or, in Substrate terminology, a fork choice rule.
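A selection rule of this kind can be sketched in a few lines. The example below is only an illustration: the `Block` class, its per-block `work` field, and the `best_tip` function are hypothetical names, not Bitcoin Core's or Substrate's data structures.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Block:
    block_id: str
    parent: Optional["Block"]
    work: int  # work contributed by this block alone (assumed unit)


def chain_work(tip: Block) -> int:
    """Sum the work of every block from the tip back to genesis."""
    total = 0
    node: Optional[Block] = tip
    while node is not None:
        total += node.work
        node = node.parent
    return total


def best_tip(tips: List[Block]) -> Block:
    """Fork choice: prefer the tip with the most accumulated work.

    On a tie, max() returns the first candidate, i.e. the earliest-seen tip.
    """
    return max(tips, key=chain_work)


genesis = Block("genesis", None, 1)
a1 = Block("a1", genesis, 1)
b1 = Block("b1", genesis, 1)
b2 = Block("b2", b1, 1)
print(best_tip([a1, b2]).block_id)  # b2: its branch has more accumulated work
```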
Consequently, if a node sees a branch on the network that is longer than the branch it chose earlier, it will regard this new longer branch as the true state of the network, reverting the blocks on its own branch, and applying the transactions in the new chain’s blocks. In blockchain terminology, the process of switching branches to a new main chain is called “reorganization,” or “reorg.”
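The reorganization described above boils down to finding the fork point and then deciding which blocks to revert and which to apply. The following is a minimal sketch, assuming the node knows each block's parent through a hypothetical `parents` mapping; `plan_reorg` is an illustrative name, not an API of any real node.

```python
from typing import Dict, List, Optional, Tuple


def path_to_genesis(tip: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Return block IDs from the tip back to genesis (tip first)."""
    path = []
    node: Optional[str] = tip
    while node is not None:
        path.append(node)
        node = parents[node]
    return path


def plan_reorg(
    old_tip: str, new_tip: str, parents: Dict[str, Optional[str]]
) -> Tuple[List[str], List[str]]:
    """Compute the blocks to revert (newest first) and to apply (oldest first)."""
    old_path = path_to_genesis(old_tip, parents)
    new_path = path_to_genesis(new_tip, parents)
    on_new_branch = set(new_path)
    # The first block of the old branch that is also on the new one is the fork point.
    fork = next(block for block in old_path if block in on_new_branch)
    to_revert = old_path[: old_path.index(fork)]
    to_apply = new_path[: new_path.index(fork)][::-1]
    return to_revert, to_apply


parents = {"g": None, "a1": "g", "a2": "a1", "b1": "g", "b2": "b1", "b3": "b2"}
print(plan_reorg("a2", "b3", parents))
# (['a2', 'a1'], ['b1', 'b2', 'b3'])
```

Note that the plan is only executable if every block in the first list can actually be reverted, which is the property the rest of this article is concerned with.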
Besides network synchronization issues, reorgs can also happen for other reasons. For instance, a set of nodes can become isolated from the main network by attackers or government-mandated firewalls. If this happens, these nodes will sync to the “true” longest chain once they are made aware of it according to the selection rules.
As another example, consider the infamous 51% attacks, where an attacker controls more than half of the network’s computing power. Such an attacker can mine blocks without broadcasting them, while at the same time spending money on the publicly known chain. Once they complete their public transactions, the attacker broadcasts their secretly-mined blocks, which replace the public chain and revert those payments, freeing the attacker to spend the same funds again. Thus, the attacker obtains the products they purchased via their transactions while cheating the merchant of their payment.
Since it is always possible that a new and different longer chain will be published, transactions are technically never finalized. However, the computationally intensive nature of Proof-of-Work means that each block mined on top of a given block serves as a confirmation of that block’s validity. The more confirmations a block has, the harder it is for a competing branch to overwrite it. This is known as probabilistic finality, since the probability of a transaction or block being reverted decreases as more blocks are mined on top of it. It is then up to the trading parties to agree on how many block confirmations are required for the transaction to be considered “finalized.”
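To make "the probability decreases" concrete, the sketch below uses the simple gambler's-ruin expression from the Bitcoin whitepaper: an attacker holding a fraction q of the computing power, who is z blocks behind, ever catches up with probability (q/p)^z, where p = 1 - q, provided q < p. This is a simplified model that ignores the blocks the attacker may already have mined in secret, and the function name is ours.

```python
def catch_up_probability(q: float, z: int) -> float:
    """Probability that an attacker with hash-power share q ever catches up
    from z blocks behind (simplified gambler's-ruin model)."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a fraction between 0 and 1")
    if z < 0:
        raise ValueError("z must be non-negative")
    p = 1.0 - q
    if q >= p:
        return 1.0  # a majority attacker always catches up eventually
    return (q / p) ** z


for confirmations in (1, 3, 6):
    print(confirmations, catch_up_probability(0.1, confirmations))
```

With q = 0.1, six confirmations bring the value down to roughly 1.9e-6, while for any q of one half or more the value stays at 1 no matter how many confirmations accumulate.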
While probabilistic finality might sound like an undesirable “feature,” it is rooted in the nature of decentralized systems. For example, the well-known CAP theorem states that a distributed system cannot simultaneously guarantee consistency, availability, and partition tolerance. Thus, attempts to avoid probabilistic finality will invariably involve trade-offs. These include centralization, violation of selection rules, and potential contentious chain splits (as in the example of government firewalls isolating a network). In any reasonable blockchain, there must always be an unbiased and fair way to reconcile a network that has been split for any reason.
Accepting the inevitability of probabilistic finality, Mintlayer is focusing its efforts on realistic solutions to blockchain congestion issues, such as second layers.
What makes reorgs possible in a blockchain?
A node’s ability to select a new longest chain as the true state of the blockchain depends on its ability to revert blocks, which in turn relies on the chain’s transactions having a simple and well-defined structure. In Bitcoin, for example, a transaction consists of inputs and outputs, where inputs represent a user’s coins that they wish to spend, and outputs represent the destinations of these coins. When a block updates the state of the blockchain, inputs are marked as spent, while outputs are created and become available for spending by the recipient. Under this model, reverting a block is as easy as marking all of its transaction inputs as unspent and deleting all outputs. In the same way, blockchains such as Ethereum and Monero rely on the simplicity of their transaction structure for the ability to revert blocks.
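The input/output model described above can be sketched as follows. This is a toy illustration only: `Tx`, `Coin`, `apply_block`, and `revert_block` are hypothetical names, amounts are plain integers, and signatures, fees, and other validation are left out entirely.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

OutPoint = Tuple[str, int]  # (transaction id, output index)


@dataclass
class Coin:
    amount: int
    spent: bool = False


@dataclass
class Tx:
    txid: str
    inputs: List[OutPoint]
    outputs: List[int]  # output amounts


def apply_block(coins: Dict[OutPoint, Coin], block: List[Tx]) -> None:
    """Mark inputs as spent and create the new outputs."""
    for tx in block:
        for outpoint in tx.inputs:
            coin = coins.get(outpoint)
            if coin is None or coin.spent:
                # A real node would validate before mutating any state.
                raise ValueError(f"missing or already spent: {outpoint}")
            coin.spent = True
        for index, amount in enumerate(tx.outputs):
            coins[(tx.txid, index)] = Coin(amount)


def revert_block(coins: Dict[OutPoint, Coin], block: List[Tx]) -> None:
    """Undo apply_block: delete the outputs and mark the inputs as unspent."""
    for tx in reversed(block):
        for index in range(len(tx.outputs)):
            del coins[(tx.txid, index)]
        for outpoint in tx.inputs:
            coins[outpoint].spent = False


coins = {("coinbase", 0): Coin(50)}
block = [Tx("t1", [("coinbase", 0)], [30, 20]), Tx("t2", [("t1", 0)], [30])]
apply_block(coins, block)
revert_block(coins, block)
print(coins)  # back to the single unspent coinbase output
```

Because every state change is either "flip a flag" or "add an entry", the inverse operation is known in advance for any block, which is what makes reverting cheap.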
Reorgs in Substrate
We established previously that blocks, and hence the transactions within them, modify the node’s local database when they become part of the main chain. Substrate transactions, by design, can update the database in an arbitrary manner, as chosen by the developer building on the Substrate framework. However, Substrate’s database interface does not provide a mechanism for reverting such arbitrary transaction changes. This is contrary to Bitcoin and Ethereum, for example, where transactions have a specific, well-defined, effect on the database, making them reversible. So how does Substrate handle reorganization?
Deterministic Finality and Canonicalization
The core issue leading to Mintlayer’s decision to cease building on Substrate is the following:
Substrate assumes that in every possible blockchain built with it, using any consensus system, blocks will eventually be deterministically finalized, that is to say, irreversible.
A block in Substrate is called finalized if enough nodes in the network agree that it will never change in the future. From a technical point of view, all candidate non-finalized blocks are stored in memory. When a block is finalized, all competing candidates are erased from memory. Since transactions are irreversible, this means that finalized blocks are also irreversible. This process is called canonicalization in Substrate terminology. Canonicalization differs from finalization in that finalization is a consensus-related concept, while canonicalization is a technical database concept.
Figure: Non-finalized blocks held in memory before canonicalization.
Although the depth for non-finalized blocks can be configured, it’s not practically possible to keep the entire history of block candidates in memory. Therefore, Substrate can decide to canonicalize blocks that are not even finalized to prevent memory exhaustion.
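The consequence of a bounded in-memory window can be shown with a deliberately simplified toy model. It is based only on the behavior described in this article and is not Substrate's implementation; the class `ToyBackend`, its methods, and the `max_depth` parameter are all hypothetical.

```python
from typing import Dict, List


class ToyBackend:
    """Keeps at most max_depth non-canonical blocks of the best branch in memory."""

    def __init__(self, max_depth: int) -> None:
        self.max_depth = max_depth
        self.canonical: List[str] = ["genesis"]  # irreversible prefix
        self.pending: Dict[str, str] = {}        # block id -> parent id (in memory)

    def import_block(self, block_id: str, parent_id: str) -> bool:
        """Accept a block only if it builds on the canonical tip or a pending block."""
        if parent_id != self.canonical[-1] and parent_id not in self.pending:
            return False
        self.pending[block_id] = parent_id
        return True

    def _connects(self, block_id: str) -> bool:
        node = block_id
        while node in self.pending:
            node = self.pending[node]
        return node == self.canonical[-1]

    def set_best(self, tip: str) -> None:
        """Canonicalize the oldest blocks of the best branch until it fits the window."""
        branch = []
        node = tip
        while node in self.pending:
            branch.append(node)
            node = self.pending[node]
        branch.reverse()  # oldest first
        while len(branch) > self.max_depth:
            winner = branch.pop(0)
            self.canonical.append(winner)
            del self.pending[winner]
            # Competing candidates no longer connect to the canonical tip: erase them.
            for dead in [b for b in self.pending if not self._connects(b)]:
                del self.pending[dead]


backend = ToyBackend(max_depth=2)
backend.import_block("a1", "genesis")
backend.import_block("b1", "genesis")  # competing branch
backend.import_block("a2", "a1")
backend.import_block("a3", "a2")
backend.set_best("a3")                 # a1 is canonicalized, b1 is erased
print(backend.canonical, backend.import_block("b2", "b1"))
```

In this model, once `a1` has been canonicalized, a block extending `b1` can no longer be imported at all, regardless of how much work the competing branch accumulates.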
This behavior directly conflicts with the principles of a decentralized blockchain. For example, although Substrate advertises a Proof-of-Work pallet (a pallet is a module in Substrate), it is impossible to create a true Proof-of-Work system with Substrate because the canonicalization process interferes with the selection rules, making reorgs technically impossible. For this reason, building on Substrate would render it impossible for Mintlayer to become a side-chain to Bitcoin.
Substrate’s documentation on finality and canonicalization is sparse, and it took a great deal of time and energy on the part of the Mintlayer team, as well as community outreach, to arrive at only a partial understanding of how Substrate treats these notions. Even after this investigation, it is still not clear to us how a Substrate-based system should behave in the face of a long chain split. Should the system resync from the genesis block every time there is a long reorg? Should nodes consider each other malicious and ban each other? Even if Mintlayer were to test this scenario, the outcome would be of no use since the intended behavior of the system is not documented anywhere.
Secondary Issues
Although the assumptions regarding block finality were the main catalyst for our decision to move away from Substrate, a number of other considerations served to reinforce this decision. As mentioned previously, we have found Substrate’s documentation sparse in certain areas. Also, Mintlayer aims to be runnable on any hardware platform, but Substrate’s support for ARM is limited. In addition, Windows is not well-supported. Another issue our team encountered was Substrate’s dependence on RocksDB, which resulted in crashes on node shutdown that severely impaired our testing. Although Polkadot’s ParityDB is said to resolve these issues, it is explicitly labeled as an experimental piece of software: according to Parity, “ParityDB is still in development and should not be used in production.” Finally, removing Substrate as a dependency ensures Mintlayer will not be impacted by any future decision by Substrate which is not in line with our goals.
The Future of Mintlayer
In an effort to maintain its existing Substrate-based codebase, the Mintlayer team investigated the possibility of removing Substrate’s canonicalization feature. However, an exploration of the source code revealed that canonicalization is ingrained into every interaction with Substrate’s database layer, resulting in a practically unbounded number of changes required to remove it. Discussions with members of the Substrate team confirmed this endeavor to be undetermined in scope.
After expending significant time and energy investigating the above issues without arriving at satisfactory answers, the Mintlayer team has realized that it is simply not practical to continue building Mintlayer on Substrate. Substrate’s fundamental assumptions about finality, seemingly incompatible with the principles of a decentralized blockchain, together with a lack of documentation and limited availability of community support, indicate that it is in Mintlayer’s best long-term interest to build its own infrastructure.
Bonus Content
In this part, we delve into further technical details related to our decision.
Does Proof-of-Stake justify deterministic finality?
For Mintlayer, the primary issue with Substrate is this: despite advertising that finality can be opted out of by removing the finality gadget (e.g. GRANDPA), the canonicalization mechanism makes it impossible to have probabilistic finality.
Mintlayer understands that finality is sometimes used in Proof-of-Stake systems to protect against vulnerabilities such as long-range attacks. A typical and accepted solution to this problem is manual checkpointing: the developers add a table of block IDs of the main branch blocks at specific heights, ensuring that the correct chain will always be selected when syncing. The usefulness of checkpointing is not restricted to Proof-of-Stake. In fact, checkpointing is often used in Proof-of-Work blockchains to future-proof against long-range attacks in case of a compromised Proof-of-Work mining algorithm. Some prominent examples are Bitcoin Core, which offers optional support for checkpointing, and Monero.
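Manual checkpointing as described above amounts to a lookup table consulted while syncing. Below is a minimal sketch; the function name `respects_checkpoints` and the block IDs are placeholders, not real checkpoints from any chain.

```python
from typing import Dict, List

# Hypothetical checkpoint table shipped with the node software: height -> block ID.
CHECKPOINTS: Dict[int, str] = {0: "id-genesis", 2: "id-main-2"}


def respects_checkpoints(chain_ids: List[str], checkpoints: Dict[int, str]) -> bool:
    """Reject any candidate chain that disagrees with a checkpoint it reaches.

    chain_ids[h] is the ID of the block at height h on the candidate chain.
    """
    for height, expected_id in checkpoints.items():
        if height < len(chain_ids) and chain_ids[height] != expected_id:
            return False
    return True


main_chain = ["id-genesis", "id-main-1", "id-main-2", "id-main-3"]
rewritten = ["id-genesis", "id-evil-1", "id-evil-2", "id-evil-3", "id-evil-4"]
print(respects_checkpoints(main_chain, CHECKPOINTS))  # True
print(respects_checkpoints(rewritten, CHECKPOINTS))   # False, despite being longer
```

The check runs before the selection rule, so a rewritten history is discarded even if it is longer than the honest chain.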
The problem of long-range attacks is much worse in Proof-of-Stake systems. In theory, a validator owning a significant fraction of the total supply of coins could rebuild the chain over a long period of history, similar to an “easy” 51% attack.
It seems the reason many communities use manual checkpointing rather than automated checkpointing is the absence of a smart enough algorithm for the latter. On one hand, a system finalizing blocks after a fixed number of confirmations and creating checkpoints automatically risks chain splits if the number of confirmations is too low. The cost of a potential chain split is extremely high: namely, destroying the reputation of the blockchain. On the other hand, a number that is too high defeats the purpose of the mechanism. Ultimately, no one knows how to achieve this safely, and any attempt at automated checkpointing or finalization is inherently optimistic in nature.
Archive mode and disabling pruning
Given that blocks are non-reversible, Substrate offers an “archive node” mode that stores all the serialized database operations required to reverse a block. According to Polkadot’s documentation, as measured on the Kusama chain, this can take up to 20 GB of storage for 1.6 million blocks. This amounts to just 4 months of data, with a 6-second block time, in a relatively new blockchain with mostly empty blocks. Compared to simply storing blocks, this represents a dramatic increase in storage requirements, potentially by orders of magnitude (for perspective, the Bitcoin blockchain is 380 GB in size after 12 years of operation with predominantly full blocks). This approach is not feasible for normal users, and thus is impractical as a solution.
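The "4 months" figure follows directly from the numbers quoted above, as the short calculation below shows. The decimal interpretation of a gigabyte and the average month length of 30.44 days are our assumptions.

```python
BLOCKS = 1_600_000
BLOCK_TIME_SECONDS = 6
ARCHIVE_BYTES = 20 * 10**9  # 20 GB, assuming decimal gigabytes

elapsed_days = BLOCKS * BLOCK_TIME_SECONDS / 86_400
elapsed_months = elapsed_days / 30.44          # assumed average month length
bytes_per_block = ARCHIVE_BYTES / BLOCKS

print(f"{elapsed_days:.0f} days, about {elapsed_months:.1f} months")
print(f"up to {bytes_per_block:.0f} bytes of archive data per block")
```

That works out to roughly 111 days of chain history and up to about 12.5 kB of archive data per block on average.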
Although Substrate allows the user to configure the pruning “depth,” that is, how many blocks are kept, the Mintlayer team does not consider this a reasonable design. A node operator should not have control over consensus-related decisions.