How does Solana attempt to solve the 'downtime' problem?

Solana (SOL) is one of the few public blockchains whose mainnet has gone down multiple times.

How to solve the downtime problem

Direction 1: Improve node stability

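QUIC replaces raw UDP: Solana moved transaction ingestion from raw UDP to QUIC, which itself runs on top of UDP but adds connections, flow control, and retransmission. This makes communication with nodes more reliable and helps prevent nodes from dropping out because of packet loss.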

Stricter leader selection: Introduces a leader failover mechanism, so that if the current leader cannot produce blocks, the network quickly switches to the next leader.
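The failover idea can be sketched in a few lines. This is only an illustration under simplifying assumptions: the fixed rotation `schedule`, the `is_responsive` health check, and the `max_skips` limit are hypothetical names for this sketch, not part of any Solana client.

```python
from typing import Callable, Optional, Sequence


def pick_leader(
    schedule: Sequence[str],
    slot: int,
    is_responsive: Callable[[str], bool],
    max_skips: int = 4,
) -> Optional[str]:
    """Return the first responsive leader at or after `slot` in the rotation.

    All names here are hypothetical; a real client derives its leader
    schedule and liveness signals very differently.
    """
    if not schedule:
        return None
    for offset in range(max_skips + 1):
        candidate = schedule[(slot + offset) % len(schedule)]
        if is_responsive(candidate):
            return candidate
    return None  # nobody responded within the allowed number of skips


if __name__ == "__main__":
    down = {"validator-B"}
    rotation = ["validator-A", "validator-B", "validator-C"]
    # Slot 1 belongs to validator-B, which is down, so the next one takes over.
    print(pick_leader(rotation, 1, lambda v: v not in down))
```

The point of the sketch is that an unresponsive leader costs only its own turn: the rotation moves on instead of waiting.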

Direction 2: Optimize parallel execution mechanism

Solana already executes transactions in parallel through its Sealevel runtime. The architecture is in place, but the scheduling algorithm needs further optimization to reduce resource conflicts.

Detect account conflicts in advance (in a pre-execution phase) and group transactions that may conflict, so that they do not deadlock or block one another during execution.
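As a rough illustration of up-front conflict detection, the sketch below assumes each transaction declares the accounts it reads and writes, and places transactions into batches so that no two transactions in the same batch conflict. The `Tx` type and `schedule_batches` function are hypothetical names for this example and do not reflect the actual Sealevel scheduler.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Tx:
    """Hypothetical transaction with declared account access sets."""
    name: str
    reads: FrozenSet[str] = frozenset()
    writes: FrozenSet[str] = frozenset()


def conflicts(a: Tx, b: Tx) -> bool:
    """Two transactions conflict if either writes an account the other touches."""
    return bool(a.writes & (b.reads | b.writes) or b.writes & a.reads)


def schedule_batches(txs: List[Tx]) -> List[List[Tx]]:
    """Group transactions into batches that can each run in parallel.

    A transaction is placed in the batch right after the last batch that
    contains a conflicting transaction, so conflicting transactions keep
    their original relative order.
    """
    batches: List[List[Tx]] = []
    for tx in txs:
        last_conflict = -1  # index of the last batch with a conflict, if any
        for i, batch in enumerate(batches):
            if any(conflicts(tx, other) for other in batch):
                last_conflict = i
        if last_conflict + 1 == len(batches):
            batches.append([])
        batches[last_conflict + 1].append(tx)
    return batches


if __name__ == "__main__":
    txs = [
        Tx("t1", writes=frozenset({"alice"})),
        Tx("t2", reads=frozenset({"alice"}), writes=frozenset({"bob"})),
        Tx("t3", writes=frozenset({"carol"})),
        Tx("t4", reads=frozenset({"bob"})),
    ]
    for i, batch in enumerate(schedule_batches(txs)):
        print(i, [tx.name for tx in batch])
```

Because conflicts are resolved before execution starts, transactions inside a batch never wait on each other's locks; batches run one after another.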

Direction 3: Introduce partial modularity and decoupled design

For example, Firedancer proposes a 'module separation' approach:

Decouple the consensus layer from the execution layer, so that contract execution failures do not affect consensus.

Rewrite the node client to avoid the historical bugs and performance bottlenecks of the original Rust client.

Technical Update 1: Firedancer Client

Firedancer is a Solana validator client developed by Jump Crypto and rewritten from scratch in C.

Advantages:

Higher performance, lower latency.

Fewer memory leaks and stability issues.

Once Firedancer goes live, it will serve as a fault-tolerant backup client, so the network can fall back on a second, independent implementation if the original client encounters issues.

Technical Update 2: Retry Mechanism and Transaction Review Optimization

The new version of the client adds an automatic transaction retry mechanism.
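A retry mechanism of this kind is often built as a loop with exponential backoff. The sketch below is a generic illustration, not the client's actual logic: `send` stands in for whatever call submits the transaction, and `max_attempts` and `base_delay` are assumed parameters.

```python
import time
from typing import Callable


def send_with_retry(
    send: Callable[[], bool],
    max_attempts: int = 5,
    base_delay: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `send` until it reports success; return the number of attempts used.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
    and raises RuntimeError if every attempt fails. `send` is a hypothetical
    stand-in for the real submission call.
    """
    for attempt in range(1, max_attempts + 1):
        if send():
            return attempt
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"transaction not accepted after {max_attempts} attempts")


if __name__ == "__main__":
    outcomes = iter([False, False, True])  # simulated: fails twice, then lands
    print(send_with_retry(lambda: next(outcomes), base_delay=0.01))
```

Doubling the delay keeps retries from piling onto a node that is already congested, and the attempt cap keeps a doomed transaction from being resent forever.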

The client also adds a module that detects and defends against malicious spam transactions; for example, a flood of invalid transactions is rate-limited.
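One common way to rate-limit a flood of transactions is a token bucket per sender. The sketch below shows that general technique only; it is an assumption that a per-sender bucket resembles the real defense module, and `SenderRateLimiter`, `capacity`, and `refill_per_sec` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TokenBucket:
    capacity: float        # maximum burst size
    refill_per_sec: float  # sustained rate allowed
    tokens: float          # tokens currently available
    last_refill: float     # timestamp of the last refill, in seconds

    def allow(self, now: float) -> bool:
        """Refill based on elapsed time, then spend one token if available."""
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class SenderRateLimiter:
    """Keeps one bucket per sender so a spammer cannot starve other senders."""

    def __init__(self, capacity: float = 3.0, refill_per_sec: float = 1.0) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, sender: str, now: float) -> bool:
        bucket = self.buckets.get(sender)
        if bucket is None:
            # New senders start with a full bucket.
            bucket = TokenBucket(self.capacity, self.refill_per_sec, self.capacity, now)
            self.buckets[sender] = bucket
        return bucket.allow(now)


if __name__ == "__main__":
    limiter = SenderRateLimiter(capacity=2.0, refill_per_sec=1.0)
    print([limiter.allow("spammer", 0.0) for _ in range(4)])
    print(limiter.allow("honest", 0.0))
```

Passing `now` explicitly keeps the sketch deterministic; a real deployment would read a monotonic clock instead.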

Solana is still pursuing 'extreme performance' and 'stability refinement' in parallel, but as Firedancer rolls out and the leader replacement mechanism is strengthened, large-scale downtime events have become significantly less frequent.