Written by: Story @IOSG
TL;DR
Data challenge: Block-time competition among high-performance public chains has entered the sub-second era. High concurrency from consumer-facing (C-end) trading, volatile traffic, and multi-chain heterogeneity add complexity on the data side, forcing data infrastructure to shift toward real-time incremental processing plus dynamic scaling. Traditional batch ETL introduces delays of minutes to hours, which cannot satisfy real-time trading needs; emerging solutions such as The Graph, Nansen, and Pangea are introducing stream computing to compress delays to near real time.
The paradigm shift in data competition: The previous cycle served the need to 'understand'; this cycle emphasizes 'making money.' Under the Bonding Curve model, a one-minute delay can multiply entry costs. Tools have iterated from manually setting slippage → sniper bots → GMGN-style integrated terminals. On-chain execution is becoming a commodity, so the competitive frontier is shifting to the data itself: whoever captures signals faster helps users profit.
The dimensional expansion of trading data: Meme is essentially the financialization of attention, hinging on narrative, attention, and how both spread. Closing the loop between off-chain sentiment and on-chain data—narrative-tracking summaries and quantified sentiment—is becoming the core of trading. 'Underwater data'—fund flows, role profiling, smart money/KOL address labels—reveals the hidden competition behind anonymous on-chain addresses. The next generation of trading terminals will align on-chain and off-chain multidimensional signals at millisecond granularity, improving entry and risk judgment.
AI-driven executable signals: From information to profits. The new competitive bar: fast enough, automated, and able to deliver excess returns. LLMs plus multimodal AI can automatically extract decision signals and combine them with copy trading and take-profit/stop-loss execution. Risks and challenges: hallucinations, short signal lifespans, execution delays, and risk control. Balancing speed and accuracy, together with reinforcement learning and simulation backtesting, is key.
The survival choice of data dashboards: Lightweight data-aggregation/dashboard applications lack moats, and their room to survive is shrinking. Downward: deepen high-performance underlying pipelines and unified research-and-development platforms. Upward: extend to the application layer, own user scenarios directly, and raise data usage activity. Future landscape: either become Web3's 'utilities'-style infrastructure or become the user-facing platform of a Crypto Bloomberg.
The moat is shifting toward 'executable signals' and 'underlying data capabilities.' The data-to-trading loop around long-tail assets presents a unique opportunity for crypto-native entrepreneurs. The opportunity window for the next 2–3 years:
Upstream infrastructure: Web2 level processing capabilities + Web3 native demand → Web3 Databricks/AWS.
Downstream execution platform: AI Agent + multidimensional data + seamless execution → Crypto Bloomberg Terminal.
Thanks to projects like Hubble AI, Space & Time, OKX DEX for their support of this research report!
Introduction: The triple resonance of Meme, high-performance public chains, and AI
During the previous cycle, the growth of on-chain transactions relied mainly on infrastructure iteration. Entering the new cycle, as infrastructure matures, super applications represented by Pump.fun are becoming the crypto industry's new growth engine. This asset-issuance model, with its unified issuance mechanism and carefully designed liquidity, has created a seemingly fair trading arena that keeps producing get-rich-quick stories. The replicability of this high-multiplier wealth effect is profoundly changing users' yield expectations and trading habits. Users need not only faster entry opportunities but also the ability to acquire, parse, and act on multidimensional data within a very short window, while existing data infrastructure struggles to support this density and real-time requirement.
This is accompanied by higher demands on the trading environment: lower friction, faster confirmation, deeper liquidity. Trading venues are rapidly migrating to high-performance public chains and Layer 2 rollups represented by Solana and Base. Transaction volumes on these chains have grown more than tenfold compared with Ethereum in the previous cycle, posing harsher data-performance challenges for existing data providers. With the imminent launch of next-generation high-performance chains such as Monad and MegaETH, demand for on-chain data processing and storage will grow exponentially.
At the same time, the rapid maturing of AI is accelerating the democratization of intelligence. GPT-5's intelligence has reached doctoral level, and models like Gemini can readily read K-lines... With AI tools, trading signals that were once complex can now be understood and executed by ordinary users. Traders are beginning to rely on AI for trading decisions, and those decisions are inseparable from multidimensional, highly efficient data. AI is evolving from an 'auxiliary analysis tool' into a 'trading decision hub,' and its adoption further amplifies the demand for real-time, explainable, and scalable data processing.
Under the triple resonance of the Meme trading frenzy, the expansion of high-performance public chains, and the commercialization of AI, the on-chain ecosystem's demand for a new data infrastructure is becoming increasingly urgent.
Addressing the data challenge of 100,000 TPS and millisecond block times
With the rise of high-performance public chains and high-performance Rollups, the scale and speed of on-chain data have entered a new phase.
With high-concurrency, low-latency architectures now widespread, daily transaction counts easily exceed tens of millions, and raw data runs to hundreds of GB per day. For example, Solana has averaged over 1,200 TPS over the past 30 days, with daily transactions exceeding 100 million; on August 17 it even set an all-time high of 107,664 TPS. By one estimate, Solana's ledger is growing at 80–95 TB per year, roughly 210–260 GB per day.
▲ Chainspect, 30-day average TPS
▲ Chainspect, 30-day trading volume
Not only is throughput rising; block times on emerging chains have also entered the millisecond range. BNB Chain's Maxwell upgrade shortened block times to 0.8 seconds, and Base's Flashblocks compresses them to 200 ms. In the second half of this year, Solana plans to replace PoH with Alpenglow, cutting block confirmation to 150 ms, while the MegaETH mainnet targets real-time blocks at 10 ms. These consensus and engineering breakthroughs greatly improve transaction real-time behavior but also place unprecedented demands on block data synchronization and decoding.
However, downstream data infrastructure still relies largely on batch ETL pipelines, which inevitably introduce delays. On Dune, for example, contract-interaction event data on Solana typically lags by about 5 minutes, and aggregated protocol-level data can take up to an hour to appear. Transactions confirmed on-chain in 400 ms thus become visible in analytics tools only after a delay several hundred times longer, which is close to unacceptable for real-time trading applications.
▲ Dune, Blockchain Freshness
To address these supply-side challenges, some platforms have shifted to streaming, real-time architectures. The Graph uses Substreams and Firehose to compress data delays to near real time. Nansen has achieved an order-of-magnitude speedup for Smart Alerts and real-time dashboards by adopting real-time processing stacks such as ClickHouse and stream processing. Pangea aggregates compute, storage, and bandwidth from community nodes to serve B-end clients—market makers, quant firms, and central limit order books (CLOBs)—with streaming data at latencies below 100 ms.
▲ Chainspect
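To make the batch-versus-streaming contrast concrete, below is a minimal Python sketch of streaming incremental aggregation: each new block updates the queryable view immediately, so staleness stays at block-time scale instead of a batch window. The block source, schema, and `IncrementalAggregator` class are hypothetical illustrations, not any provider's actual pipeline.

```python
# Minimal sketch of streaming incremental aggregation vs. fixed-interval batch ETL.
# The block source and schema are hypothetical; a real pipeline would consume a
# websocket/Firehose-style feed and persist results to an OLAP store.
import time
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Swap:
    token: str
    amount_usd: float
    block_time: float  # unix seconds

class IncrementalAggregator:
    """Updates per-token volume on every block instead of every batch window."""
    def __init__(self):
        self.volume_usd = defaultdict(float)
        self.last_update = 0.0

    def on_block(self, swaps: list[Swap]) -> None:
        for s in swaps:
            self.volume_usd[s.token] += s.amount_usd
        self.last_update = time.time()

    def query(self, token: str) -> tuple[float, float]:
        """Returns (volume, data staleness in seconds) at query time."""
        return self.volume_usd[token], time.time() - self.last_update

# Usage: each new block updates the view immediately, so staleness stays at
# block-time scale (sub-second) rather than the minutes of a batch ETL window.
agg = IncrementalAggregator()
agg.on_block([Swap("WIF", 1200.0, time.time()), Swap("BONK", 300.0, time.time())])
print(agg.query("WIF"))
```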
Beyond sheer volume, on-chain transactions also show sharply uneven traffic. Over the past year, Pump.fun's weekly trading volume has swung nearly 30x between its lowest and highest weeks. In 2024, the Meme trading platform GMGN suffered six server 'overload' incidents within four days, forcing it to migrate its underlying database from AWS Aurora to TiDB, an open-source distributed SQL database. After the migration, horizontal scaling capacity and compute elasticity improved markedly—business agility rose by roughly 30%—which greatly relieved pressure during trading peaks.
▲ Dune, Pumpfun Weekly Volume
▲ Odaily, TiDB's Web3 service case
The multi-chain ecosystem further compounds the complexity. Because log formats, event structures, and transaction fields differ across chains, every newly supported chain requires customized parsing logic, testing the flexibility and scalability of data infrastructure. Some data providers have therefore adopted a 'customer-first' strategy: wherever trading activity is concentrated, they prioritize supporting that chain, trading off flexibility against scalability.
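The usual engineering answer to this heterogeneity is a per-chain decoding adapter that normalizes raw events into one unified schema, so adding a chain means adding a decoder rather than rewriting the pipeline. The sketch below is illustrative only; the field names, `EvmDecoder`/`SolanaDecoder` classes, and payload shapes are assumptions, not any specific provider's format.

```python
# A minimal sketch of per-chain decoding adapters that normalize heterogeneous
# raw events into one unified schema. Field names and payloads are illustrative.
from dataclasses import dataclass
from typing import Protocol

@dataclass
class UnifiedTransfer:
    chain: str
    token: str
    sender: str
    receiver: str
    amount: float

class ChainDecoder(Protocol):
    def decode(self, raw: dict) -> UnifiedTransfer: ...

class EvmDecoder:
    """EVM-style chains: transfers arrive as log topics plus hex data."""
    def __init__(self, chain: str):
        self.chain = chain
    def decode(self, raw: dict) -> UnifiedTransfer:
        return UnifiedTransfer(
            chain=self.chain,
            token=raw["address"],
            sender="0x" + raw["topics"][1][-40:],      # address packed in topic
            receiver="0x" + raw["topics"][2][-40:],
            amount=int(raw["data"], 16) / 10 ** raw.get("decimals", 18),
        )

class SolanaDecoder:
    """Solana: transfers arrive as parsed SPL-token instructions."""
    def decode(self, raw: dict) -> UnifiedTransfer:
        info = raw["parsed"]["info"]
        return UnifiedTransfer(
            chain="solana",
            token=info["mint"],
            sender=info["source"],
            receiver=info["destination"],
            amount=float(info["tokenAmount"]["uiAmount"]),
        )

# Adding a new chain means registering one more decoder, not rewriting the pipeline.
DECODERS: dict[str, ChainDecoder] = {
    "base": EvmDecoder("base"),
    "solana": SolanaDecoder(),
}
```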
If, against this high-performance-chain backdrop, data processing remains stuck at fixed-interval batch ETL, it will face accumulating delays, decoding bottlenecks, and query latency, and will be unable to meet the demand for real-time, fine-grained, interactive data consumption. On-chain data infrastructure must therefore evolve toward streaming incremental processing and real-time computing architectures, with load-balancing mechanisms to absorb the concurrency spikes of crypto's periodic trading peaks. This is not just the natural next step on the technical path; it is key to keeping real-time queries stable, and it will become the true watershed in the competition among next-generation on-chain data platforms.
Speed is wealth: The paradigm shift in on-chain data competition
The core proposition of on-chain data has shifted from 'visualization' to 'executability.' In the last cycle, Dune was the standard tool for on-chain analysis. It met the needs of researchers and investors to 'understand,' as people used SQL charts to stitch together on-chain narratives.
GameFi and DeFi players relied on Dune to track fund inflows and outflows, calculate mining yields, and exit in time before market turning points.
NFT players used Dune to analyze volume trends, whale holdings, and distribution characteristics to gauge market heat.
In this cycle, however, Meme players are the most active consumer group. They have driven the phenomenal application Pump.fun to roughly $700 million in cumulative revenue, nearly twice the total revenue of OpenSea, the leading consumer application of the previous cycle.
In the Meme track, the market's time sensitivity has been amplified to the extreme. Speed is no longer a luxury; it is the core variable determining profit and loss. In a primary market priced by a Bonding Curve, speed equals cost: the token price rises exponentially with buying demand, and even a one-minute delay can multiply entry costs. According to Multicoin's research, the most profitable players in this game are often willing to pay 10% slippage to land in blocks three slots ahead of competitors. The wealth effect and get-rich-quick stories drive players to chase millisecond K-lines, same-block execution engines, and one-stop decision panels, competing on information gathering and order speed.
▲ Binance
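A stylized calculation shows why delay is so costly on a bonding curve. The sketch below assumes a simple exponential curve p(s) = p0·e^{k·s}; the parameters, the one-minute buying pressure, and the formula itself are illustrative and are not Pump.fun's actual pricing.

```python
# A stylized bonding-curve example illustrating why delay multiplies entry cost.
# Curve shape and parameters are illustrative only (p(s) = p0 * exp(k * s)).
import math

P0 = 1e-7   # initial price in SOL per token (hypothetical)
K = 5e-8    # curve steepness per token sold (hypothetical)

def cost_to_buy(tokens_sold: float, amount: float) -> float:
    """Integral of the price curve from tokens_sold to tokens_sold + amount."""
    return (P0 / K) * (math.exp(K * (tokens_sold + amount)) - math.exp(K * tokens_sold))

buy_amount = 1_000_000                               # tokens the trader wants (hypothetical)
early_entry = cost_to_buy(0, buy_amount)             # first into the curve
late_entry = cost_to_buy(50_000_000, buy_amount)     # after a minute of buying pressure

print(f"early cost: {early_entry:.4f} SOL, late cost: {late_entry:.4f} SOL, "
      f"ratio: {late_entry / early_entry:.1f}x")     # roughly a 12x cost difference here
```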
In the manual-trading era of Uniswap, users set slippage and gas themselves, the front end showed no price chart, and trading felt more like buying a lottery ticket. In the sniper-bot era of BananaGun, automated sniping and slippage handling put retail players on the same starting line as the 'scientists' (bot builders). Then came the PepeBoost era, in which bots not only pushed pool-opening information instantly but also pushed front-row holder data alongside it. Finally came the current GMGN era, with an integrated terminal combining K-line information, multidimensional data analysis, and trade execution—the 'Bloomberg Terminal' of meme trading.
As trading tools continue to iterate, execution thresholds gradually dissolve, and the competitive frontier inevitably shifts towards the data itself: those who can capture signals faster and more accurately will establish trading advantages in rapidly changing markets and help users profit.
Dimensions equal advantages: The truth beyond K-lines
The essence of Memecoin is the financialization of attention. A good narrative keeps breaking out of its circle and aggregating attention, which in turn lifts price and market cap. For Meme traders, real-time data matters, but getting meaningful results depends even more on answering three questions: What is this token's narrative? Who is paying attention? How will that attention keep amplifying? The K-line carries only traces of these; the real drivers must be read from multidimensional data—off-chain sentiment, on-chain addresses and holder structures, and the precise mapping between the two.
On-chain × Off-chain: From attention to transaction closed-loop
Attention is captured off-chain and transactions are completed on-chain; closing the data loop between the two is becoming the core edge in Meme trading.
#Narrative Tracking and Dissemination-Chain Identification
On social platforms such as Twitter, tools like XHunt help Meme players analyze a project's KOL follower lists to infer who is behind it and how attention might spread. 6551 DEX aggregates Twitter activity, official websites, tweet comments, order records, KOL follows, and more into complete, real-time AI reports, helping traders capture the narrative precisely.
#Sentiment Indicator Quantification
InfoFi tools such as Kaito and Cookie.fun aggregate content from Crypto Twitter and run sentiment analysis on it, producing quantifiable metrics for Mindshare, Sentiment, and Influence. Cookie.fun, for instance, overlays these metrics directly on price charts, turning off-chain sentiment into readable 'technical indicators.'
▲ Cookie.fun
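As a rough illustration of how such indicators can be built, the sketch below buckets tweets by hour into a reach-weighted 'mindshare' series and a naive keyword-based 'sentiment' series that can be overlaid on a price chart. The schema, keyword lists, and weighting are toy assumptions; Kaito and Cookie.fun use far richer models.

```python
# A toy sketch of turning off-chain mentions into hourly "mindshare" and
# "sentiment" series that can be plotted next to price candles.
from collections import defaultdict
from datetime import datetime, timezone

POSITIVE = {"moon", "send", "bullish"}   # illustrative keyword lists
NEGATIVE = {"rug", "dump", "bearish"}

def hour_bucket(ts: float) -> str:
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:00")

def build_indicators(tweets: list[dict]) -> dict[str, dict[str, float]]:
    """tweets: [{'ts': unix_seconds, 'text': str, 'follower_count': int}, ...]
    (hypothetical schema)."""
    buckets: dict[str, dict[str, float]] = defaultdict(
        lambda: {"mindshare": 0.0, "sentiment": 0.0, "n": 0}
    )
    for t in tweets:
        b = buckets[hour_bucket(t["ts"])]
        b["mindshare"] += t["follower_count"]                 # reach-weighted attention
        words = set(t["text"].lower().split())
        b["sentiment"] += len(words & POSITIVE) - len(words & NEGATIVE)
        b["n"] += 1
    for b in buckets.values():                                # normalize per tweet
        b["sentiment"] = b["sentiment"] / max(b["n"], 1)
    return dict(buckets)
```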
#On-chain and Off-chain are equally important
OKX DEX displays a Vibes analysis alongside market data, aggregating KOL shout-out timestamps, the leading KOLs involved, Narrative Summaries, and a composite score, shortening the time spent retrieving off-chain information. Narrative Summary has become the best-received AI feature among its users.
Surfacing underwater data: Turning 'visible ledgers' into 'usable alpha'
In traditional finance, order-flow data is held by large brokers, and quantitative firms pay hundreds of millions of dollars a year to access it for strategy optimization. Crypto's transaction ledger, by contrast, is fully open and transparent—effectively open-sourcing that expensive intelligence and leaving an open-pit gold mine waiting to be worked.
The value of underwater data lies in extracting invisible intent from visible transactions. This covers fund flows and role characterization—clues to whether a dealer is accumulating or distributing, KOL sub-accounts, whether holdings are concentrated or dispersed, bundled trades (bundles), and abnormal capital flows—as well as address-profile linkage: tagging each address as smart money, KOL/VC, developer, phishing, wash trading, and so on, and binding those tags to off-chain identities to connect on-chain and off-chain data.
These signals are hard for ordinary users to detect, yet they can significantly shape short-term market moves. By parsing address labels, holding characteristics, and bundled transactions in real time, trading-assistance tools are revealing the competition 'beneath the surface,' helping traders dodge risks and hunt alpha in millisecond markets.
GMGN, for example, layers smart money, KOL/VC addresses, developer wallets, wash trading, phishing addresses, and bundled transactions on top of its real-time on-chain trade and token-contract data collection, maps on-chain addresses to social media accounts, and aligns capital flows, risk signals, and price action at millisecond granularity, helping users make faster entry and risk judgments.
▲ GMGN
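For illustration, the sketch below shows two simplified 'underwater data' heuristics: flagging bundled buys (many wallets funded by the same parent buying in one block) and attaching labels from an address-label table. The thresholds, schema, and label table are hypothetical and do not reflect GMGN's actual logic.

```python
# A simplified sketch of two "underwater data" heuristics: bundle detection and
# address labeling. Thresholds and the label table are illustrative only.
from collections import defaultdict

ADDRESS_LABELS = {          # would come from a continuously updated labeling service
    "7xKX...": "smart_money",
    "9pQm...": "kol",
    "3dDv...": "developer",
}

def detect_bundles(trades: list[dict], min_wallets: int = 8) -> list[dict]:
    """trades: [{'block': int, 'buyer': str, 'funding_parent': str}, ...]
    (hypothetical schema). A block where many buyers share one funding parent
    is flagged as a likely bundle."""
    by_block: dict[int, dict[str, set]] = defaultdict(lambda: defaultdict(set))
    for t in trades:
        by_block[t["block"]][t["funding_parent"]].add(t["buyer"])
    alerts = []
    for block, parents in by_block.items():
        for parent, buyers in parents.items():
            if len(buyers) >= min_wallets:
                alerts.append({"block": block, "funding_parent": parent,
                               "wallets": len(buyers)})
    return alerts

def label(address: str) -> str:
    """Attach an off-chain-linked label to an on-chain address, if known."""
    return ADDRESS_LABELS.get(address, "unlabeled")
```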
AI-driven executable signals: From information to profits
'In the next round of AI, what is sold is not tools, but returns.' — Sequoia Capital
This judgment holds in crypto trading as well. Once data speed and dimensionality are up to standard, the next competitive question is whether complex, multidimensional data can be turned directly into executable trading signals in the decision-making process. The evaluation criteria can be summarized in three points: fast enough, automated, and generating excess returns.
Fast enough: As AI capabilities keep improving, the advantages of natural-language and multimodal LLMs come into play here. They can integrate and understand vast amounts of data, build semantic links between data points, and automatically extract decision-ready conclusions. In on-chain markets that are high-intensity and shallow in depth, every signal has a short shelf life and limited capital capacity; speed directly determines how much return a signal can produce.
Automated: Humans cannot watch the market 24 hours a day, but AI can. On the Senpi platform, for example, users can place copy-trading buy orders with take-profit and stop-loss conditions. This requires the AI to poll or monitor data in real time in the background and place an order automatically when it detects a qualifying signal (see the sketch after this list).
Returns: Ultimately, the value of any trading signal depends on whether it keeps delivering excess returns. The AI must not only understand on-chain signals well enough but also build in risk control to maximize the risk-reward ratio in a highly volatile environment—for example, accounting for slippage, execution delays, and other factors peculiar to on-chain returns.
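As referenced above, the sketch below shows the skeleton of such an automation loop: follow a smart-money signal, open a copy position, and enforce take-profit/stop-loss in the background. `SignalFeed` and `Exchange` are hypothetical stand-ins for a real data stream and execution engine, not the Senpi API.

```python
# A minimal sketch of the automation loop behind copy trading with TP/SL.
# SignalFeed and Exchange are hypothetical stand-ins, not a real platform's API.
import time

class SignalFeed:
    def poll(self) -> list[dict]:
        """Returns new smart-money buys, e.g. [{'token': 'WIF', 'price': 2.1}]."""
        return []

class Exchange:
    def market_buy(self, token: str, usd: float) -> float:
        """Returns the fill price (hypothetical execution wrapper)."""
        ...
    def market_sell(self, token: str) -> float:
        ...
    def price(self, token: str) -> float:
        ...

def run(feed: SignalFeed, ex: Exchange, usd_per_trade=100.0,
        tp=1.5, sl=0.8, poll_s=0.5):
    positions: dict[str, float] = {}              # token -> entry price
    while True:
        for sig in feed.poll():                   # 1) follow the smart-money signal
            if sig["token"] not in positions:
                positions[sig["token"]] = ex.market_buy(sig["token"], usd_per_trade)
        for token, entry in list(positions.items()):   # 2) enforce TP / SL
            p = ex.price(token)
            if p >= entry * tp or p <= entry * sl:
                ex.market_sell(token)
                del positions[token]
        time.sleep(poll_s)                        # in production: event-driven, not polling
```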
This capability is reshaping the commercial logic of data platforms: from selling 'data access rights' to selling 'profit-driven signals.' The competitive focus of the next generation of tools will no longer be data coverage but the executability of signals—whether they can truly complete the last mile from 'insight' to 'execution.'
Some emerging projects have begun exploring this direction. For example, Truenorth, as an AI-driven discovery engine, incorporates 'decision execution rates' into the evaluation of information validity, continuously optimizing output through reinforcement learning to minimize ineffective noise, helping users build directly executable information streams for orders.
▲ Truenorth
Although AI has great potential in generating executable signals, it also faces multiple challenges.
Hallucinations: On-chain data is highly heterogeneous and noisy. LLMs are prone to 'hallucinations' or overfitting when parsing natural-language queries or multimodal signals, hurting signal yield and accuracy. When multiple tokens share a name, for instance, AI often fails to resolve a ticker mentioned on CT (Crypto Twitter) to the right contract address; likewise, many AI signal products misattribute general AI discussion on CT to the token Sleepless AI (a simple disambiguation sketch follows this list).
Signal lifespan: The trading environment changes constantly, and any delay erodes returns; the AI must complete data extraction, inference, and execution in a very short window. Even the simplest copy-trading strategy can flip from profitable to losing if its orders land behind the smart money it follows.
Risk control: In highly volatile conditions, if the AI's transactions repeatedly fail to land on-chain or suffer excessive slippage, it may not only miss excess returns but burn through all of its capital within minutes.
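One common mitigation for the same-name-token problem is to resolve a ticker against on-chain candidates using simple liquidity, age, and holder heuristics, as in the rough sketch below. The scoring weights and candidate schema are illustrative assumptions, not any product's actual resolver.

```python
# A rough sketch of ticker-to-contract disambiguation for same-name tokens.
# Weights, thresholds, and the candidate schema are illustrative only.
def resolve_ticker(candidates: list[dict]) -> dict | None:
    """candidates: [{'address': str, 'liquidity_usd': float,
                     'age_hours': float, 'holders': int}, ...] (hypothetical schema)."""
    def score(c: dict) -> float:
        return (
            c["liquidity_usd"] * 1.0                 # deep pools dominate the score
            + c["holders"] * 50.0                    # broad distribution over fresh clones
            - max(0.0, 1.0 - c["age_hours"]) * 1e5   # penalize minutes-old copycats
        )
    viable = [c for c in candidates if c["liquidity_usd"] > 10_000]
    return max(viable, key=score) if viable else None
```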
Finding the balance between speed and accuracy—and using mechanisms such as reinforcement learning, transfer learning, and simulation backtesting to reduce error rates—is therefore the key battleground for AI in this field.
Upward or downward? The survival choice of data dashboards
As AI becomes able to generate executable signals and even assist with order placement, 'light middle-layer applications' that merely aggregate data face an existential crisis. Whether they assemble on-chain data into dashboards or bolt execution logic onto aggregation trading bots, they fundamentally lack a sustainable moat. In the past such tools survived on convenience or user mindshare (users were used to checking a token's CTO status on Dexscreener, for example); now that the same data is available in many places, execution engines are increasingly commoditized, and AI can generate decision signals and trigger execution on that same data, their competitiveness is being rapidly diluted.
Efficient on-chain execution engines will keep maturing and lowering trading thresholds. In this trend, data providers must choose: go downward, focusing on faster data acquisition and processing infrastructure, or go upward, extending into the application layer to own user scenarios and consumer traffic directly. Those stuck in the middle, doing only data aggregation and lightweight packaging, will see their room to survive squeezed ever tighter.
Going downward means building an infrastructure moat. While building trading products, Hubble AI realized that relying on a TG Bot alone could not create a lasting edge, so it moved upstream into data processing, aiming to become a 'Crypto Databricks.' Having pushed Solana data-processing speed to the limit, Hubble AI is now moving from data processing toward a unified research-and-development platform, occupying an upstream position in the value chain and providing underlying support for the U.S. 'finance on-chain' narrative and the data needs of on-chain AI agent applications.
Going upward means extending into application scenarios and locking in end users. Space and Time initially focused on sub-second SQL indexing and oracle pushes, but has recently begun exploring C-end consumer scenarios, launching Dream.Space on Ethereum—a 'vibe coding' product with which users can write smart contracts or generate data-analysis dashboards in natural language. The move not only raises the call frequency of its own data services but also builds direct stickiness with users through the end experience.
Players stuck in the middle, selling nothing but data interfaces, are visibly losing their room to survive. The future B2B2C data track will be dominated by two kinds of players: those who control the underlying pipelines and become 'on-chain utilities'-style infrastructure companies, and those who sit close to user decision scenarios and turn data into application experiences.
Summary
In the threefold resonance of the Meme frenzy, the explosion of high-performance public chains, and the commercialization of AI, the on-chain data track is undergoing a structural shift. Iteration of trading speed, data dimensions, and execution signals has made 'visible charts' no longer the core competitiveness; the real moat is shifting towards 'executable signals that can help users make money' and 'the underlying data capabilities that support all of this.'
In the next 2–3 years, the most attractive entrepreneurial opportunities in the field of crypto data will emerge at the intersection of Web2 level infrastructure maturity and Web3 native execution models.
At the same time, the data around Meme coins and long-tail on-chain assets is highly non-standardized and fragmented—from community narratives and on-chain sentiment to cross-chain liquidity—and has to be interpreted together with on-chain address profiles, off-chain social signals, and even millisecond-level execution. It is precisely this difference that makes the processing-and-trading loop for long-tail and Meme data a unique window of opportunity for crypto-native entrepreneurs.
We are optimistic about projects deeply cultivating in the following two directions:
Upstream infrastructure — on-chain data companies with streaming data pipelines comparable to Web2 giants, ultra-low latency indexing, and cross-chain unified parsing frameworks. These projects are expected to become the Web3 version of Databricks/AWS. As users gradually migrate on-chain, transaction volumes are expected to grow exponentially, and the B2B2C model has long-term compounding value.
Downstream execution platform — applications integrating multidimensional data, AI Agent, and seamless trading execution. By transforming fragmented signals on-chain/off-chain into directly executable trades, these products have the potential to become the Crypto-native Bloomberg Terminal, with their business model no longer relying on data access fees, but monetizing through excess returns and signal delivery.
We believe that these two types of players will dominate the next generation of crypto data tracks and build sustainable competitive advantages.