Author: Story, IOSG Ventures
TL;DR
Data challenge: The competition for block time among high-performance public chains has entered the sub-second era. The increase in high concurrency, high traffic volatility, and multi-chain heterogeneous demands has added complexity to the data side, necessitating a shift in data infrastructure towards real-time incremental processing + dynamic scaling. Traditional batch processing ETL has delays ranging from minutes to hours, making it difficult to meet real-time trading needs. Emerging solutions like The Graph, Nansen, and Pangea introduce stream processing, compressing delays to near real-time tracking levels.
The paradigm shift in data competition: The last cycle met the need to 'understand'; this cycle rewards the ability to 'make money'. Under the Bonding Curve model, a one-minute delay can multiply entry cost several times over. Tool iteration: from manually setting slippage → sniper bots → GMGN-style integrated terminals. On-chain trade execution is gradually being commoditized, and the core competitive frontier is shifting toward the data itself: whoever captures signals faster helps users profit.
Expansion of trading data dimensions: Meme is essentially the financialization of attention, centered on narrative, attention, and how that attention spreads. The closed loop of off-chain sentiment × on-chain data: narrative tracking and sentiment quantification are becoming the core of trading. 'Underwater data' (capital flows, role profiling, smart money/KOL address tagging) reveals the implicit games behind anonymous on-chain addresses. The new generation of trading terminals will align multi-dimensional on-chain and off-chain signals at second-level granularity, sharpening entry and risk-avoidance judgments.
AI-driven executable signals: From information to profit. New stage competitive goals: Fast enough, automated, capable of generating excess returns. LLM + multi-modal AI can automatically extract decision signals and combine with Copy Trading, take profit, and stop-loss execution. Risk challenges: hallucinations, short signal lifespan, execution delays, and risk control. Balancing speed and accuracy is key, and reinforcement learning and simulation backtesting are crucial.
The survival choice of data dashboards: Lightweight data aggregation/dashboard applications lack moats, and their survival space is being squeezed. Downward: dig into high-performance underlying pipelines and integrated data-research platforms. Upward: extend to the application layer, directly owning user scenarios and increasing data consumption. Future landscape: either become Web3's utility-grade infrastructure (its 'water, electricity, and coal'), or become a user-facing 'Crypto Bloomberg' platform.
The moat is shifting toward 'executable signals' and 'underlying data capabilities,' with the closed loop of long-tail assets and trading data representing a unique opportunity for crypto-native entrepreneurs. The opportunity window in the next 2-3 years:
Upstream infrastructure: Web2-level processing capabilities + Web3 native needs → Web3 Databricks/AWS.
Downstream execution platform: AI Agent + multi-dimensional data + seamless execution → Crypto Bloomberg Terminal.
Thanks to projects like Hubble AI, Space & Time, OKX DEX for their support of this research report!
Introduction: The triple resonance of Meme, high-performance public chains, and AI.
In the previous cycle, the growth of on-chain trading relied mainly on infrastructure iteration. Entering the new cycle, as infrastructure matures, super applications represented by Pump.fun are becoming the crypto industry's new growth engine. This asset issuance model, with its unified issuance mechanism and carefully designed liquidity, creates a relatively level trading arena where overnight-wealth stories emerge frequently. The replicability of this high-multiple wealth effect is profoundly changing users' profit expectations and trading habits: users need not only faster entry but also the ability to acquire, analyze, and act on multi-dimensional data within a very short window, while existing data infrastructure struggles to support that density and real-time demand.
This brings about a higher-level demand for trading environments: lower friction, faster confirmations, and deeper liquidity. Trading venues are rapidly migrating to high-performance public chains and Layer 2 Rollups represented by Solana and Base. The trading data volume of these public chains has increased more than tenfold compared to the previous Ethereum cycle, presenting more severe data performance challenges for existing data providers. With the imminent launch of new generation high-performance public chains like Monad and MegaETH, on-chain data processing and storage demands will grow exponentially.
At the same time, the rapid maturation of AI is accelerating the equalization of intelligence. GPT-5 reasons at a doctoral level, and multi-modal large models like Gemini can readily interpret K-lines. With AI tools, trading signals that were once complex can now be understood and executed by ordinary users. Traders are beginning to rely on AI for trading decisions, and those decisions are inseparable from multi-dimensional, highly timely data. AI is evolving from an 'auxiliary analysis tool' into the 'central hub of trading decisions,' and its popularization further amplifies demand for real-time data, interpretability, and scalable processing.
Under the triple resonance of the meme trading frenzy, the expansion of high-performance public chains, and the commercialization of AI, the demand for a new data infrastructure in the on-chain ecosystem is becoming increasingly urgent.
Meeting the data challenges of 100,000 TPS and millisecond block times
With the rise of high-performance public chains and high-performance rollups, the scale and speed of on-chain data have entered a new phase.
With the widespread adoption of high-concurrency, low-latency architectures, daily transaction counts easily surpass tens of millions, with raw data measured in hundreds of GB per day. For example, Solana's average TPS has exceeded 1,200 over the past 30 days, with daily transactions exceeding 100 million; on August 17, it set an all-time high of 107,664 TPS. Statistics show Solana's ledger data growing at 80-95 TB per year, roughly 210-260 GB per day.
▲ Chainspect, 30-day average TPS
▲ Chainspect, 30-day trading volume
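The throughput figures above can be sanity-checked with simple arithmetic. The sketch below uses the report's own numbers as inputs and assumes 1 TB = 1,000 GB:

```python
# Back-of-the-envelope check of the Solana throughput figures cited above.
SECONDS_PER_DAY = 86_400

avg_tps = 1_200
daily_txs = avg_tps * SECONDS_PER_DAY  # transactions implied per day

ledger_growth_tb_per_year = (80, 95)   # reported annual ledger growth range
daily_gb = tuple(round(tb * 1_000 / 365) for tb in ledger_growth_tb_per_year)

print(f"{daily_txs:,} txs/day")  # 103,680,000 txs/day
print(daily_gb)                  # (219, 260) GB/day
```

At ~1,200 sustained TPS the implied daily count (~103.7M) matches the "exceeding 100 million" figure, and the annual ledger growth converts to roughly 220-260 GB per day.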
Not only has throughput increased, but the block time of emerging public chains has also entered the millisecond level. The Maxwell upgrade of BNB Chain has reduced the block time to 0.8s, while Base Chain's Flashblocks technology compresses it to 200ms. In the second half of this year, Solana plans to replace PoH with Alpenglow, reducing block confirmation time to 150ms, while MegaETH's mainnet aims for real-time block generation at 10ms. These breakthroughs in consensus and technology have greatly improved the real-time nature of trading but have also posed unprecedented demands on block data synchronization and decoding capabilities.
However, most downstream data infrastructure still relies on batch ETL pipelines and inevitably suffers data delays. In Dune, for instance, contract interaction events on Solana are typically delayed by about 5 minutes, while protocol-layer aggregated data can wait up to 1 hour. A transaction confirmed on-chain in 400ms thus waits hundreds of times longer (300s / 0.4s ≈ 750×) before becoming visible in analysis tools, which is nearly unacceptable for real-time trading applications.
▲ Dune, Blockchain Freshness
To address these supply-side challenges, some platforms have shifted to streaming, real-time architectures. The Graph compresses data delays to near real time with Substreams and Firehose. Nansen achieved a performance improvement of dozens of times on Smart Alerts and real-time dashboards by introducing streaming technologies such as ClickHouse. Pangea aggregates compute, storage, and bandwidth from community nodes to serve real-time streams with sub-100ms delay to B-end users such as market makers, quantitative analysts, and central limit order books (CLOBs).
▲ Chainspect
Beyond sheer volume, on-chain trading flow is also highly uneven. Over the past year, Pump.fun's weekly trading volume varied by nearly 30x between its lowest and highest weeks. In 2024, the meme trading platform GMGN suffered six server 'crashes' within four days, forcing it to migrate its underlying database from AWS Aurora to the open-source distributed SQL database TiDB. After the migration, horizontal scalability and compute elasticity improved significantly, business agility rose by about 30%, and peak-time pressure was greatly relieved.
▲ Dune, Pumpfun Weekly Volume
▲ Odaily, TiDB's Web3 service case
The multi-chain ecosystem further exacerbates this complexity. Differences in log formats, event structures, and transaction fields among different public chains mean that each new chain requires customized parsing logic, greatly testing the flexibility and scalability of data infrastructure. Some data providers have therefore adopted a 'customer-first' strategy: wherever there is active trading activity, they will prioritize integrating that chain's services, weighing trade-offs between flexibility and scalability.
If, on high-performance chains, data processing remains stuck in the fixed-interval model of batch ETL, it will face delay backlogs, decoding bottlenecks, and query lags, failing to meet the demand for real-time, fine-grained, interactive data consumption. On-chain data infrastructure must therefore evolve toward streaming incremental processing and real-time compute architectures, with load-balancing mechanisms to absorb the concurrency spikes of periodic trading peaks. This is not only a natural extension of the technical path but also key to keeping real-time queries stable, and it will form the true watershed in the competition among next-generation on-chain data platforms.
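The contrast between fixed-interval batch ETL and streaming incremental processing can be sketched minimally. The event source and `decode_event` below are hypothetical stand-ins, not any real chain's API:

```python
# Minimal sketch: incremental stream processing with a rolling aggregate.
# In a batch ETL pipeline the same aggregate would only refresh on the next
# scheduled job (minutes later); here it is query-ready on every event.
import time
from collections import deque

def decode_event(raw):
    # Hypothetical decoder; in practice this is per-chain custom parsing logic.
    return {"slot": raw["slot"], "amount": raw["amount"]}

def stream_pipeline(source, window_s=1.0):
    """Fold each event into a rolling volume window the moment it arrives."""
    window = deque()
    for raw in source:
        evt = decode_event(raw)
        now = time.monotonic()
        window.append((now, evt["amount"]))
        # Evict entries older than the rolling window.
        while window and now - window[0][0] > window_s:
            window.popleft()
        # Incremental result, no batch delay.
        yield sum(amount for _, amount in window)

# Usage with a synthetic event source:
events = [{"slot": i, "amount": 10} for i in range(5)]
print(list(stream_pipeline(events))[-1])  # rolling volume after 5 events: 50
```

Real systems would add persistence, backpressure, and horizontal sharding on top of this shape; the point is that the aggregate updates per event rather than per batch interval.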
Speed equals wealth: The paradigm shift in on-chain data competition
The core proposition of on-chain data has shifted from 'visualization' to 'executability'. In the previous cycle, Dune was the standard tool for on-chain analysis. It met the needs of researchers and investors who 'could understand' by using SQL charts to stitch together on-chain narratives.
GameFi and DeFi players rely on Dune to track capital inflows and outflows, calculate yield rates, and retreat promptly before market turning points.
NFT players analyze transaction volume trends, whale holdings, and distribution characteristics through Dune to predict market heat.
However, in this cycle, meme players are the most active consumer group. They have driven the phenomenal application Pump.fun to accumulate $700 million in revenue, nearly twice the total revenue of OpenSea, the leading consumer application of the previous cycle.
In the meme track, the market's time sensitivity is amplified to the extreme. Speed is no longer a bonus but the core variable determining profit and loss. In a primary market priced by a Bonding Curve, speed equals cost: token prices surge exponentially with buying demand, and even a one-minute delay can multiply entry cost several times over. According to Multicoin research, the most profitable players in this game often pay 10% slippage just to land in a block three slots ahead of competitors. The wealth effect and get-rich-quick stories drive players to chase second-level K-lines, same-block execution engines, and one-stop decision panels, competing on the speed of both information gathering and order placement.
▲ Binance
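The cost of delay on a bonding curve can be illustrated with a constant-product (x·y = k) sketch in the style of pump.fun's virtual-reserve mechanism. The reserve numbers and buy sizes below are made up for illustration, not real pool parameters:

```python
# Why delay is costly on a constant-product bonding curve (illustrative).

def spot_price(sol_reserve, token_reserve):
    return sol_reserve / token_reserve

def buy(sol_reserve, token_reserve, sol_in):
    """Swap sol_in into the curve; reserves move along x*y = k."""
    k = sol_reserve * token_reserve
    new_sol = sol_reserve + sol_in
    return new_sol, k / new_sol

sol, tok = 30.0, 1_000_000.0  # hypothetical virtual reserves
p0 = spot_price(sol, tok)

# One minute of frenzied buying: 60 buys of 1 SOL each land before you do.
for _ in range(60):
    sol, tok = buy(sol, tok, 1.0)

p1 = spot_price(sol, tok)
print(f"entry price multiple after delay: {p1 / p0:.1f}x")  # 9.0x
```

Under these toy numbers, arriving one minute late means entering at 9x the price, which is the "several times the entry cost" dynamic the text describes.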
In the manual-trading era of Uniswap, users had to set slippage and gas themselves, the front end showed no prices, and trading felt like buying a lottery ticket. In the bot-hunting era of BananaGun, automatic sniping and anti-slippage techniques let retail players start from the same line as the 'scientists' (professional bot operators). Then came the PepeBoost era, with bots pushing pool-opening information the moment it appeared along with front-row holder data; and finally today's GMGN era, which integrates K-line data, multi-dimensional analysis, and trade execution into a single terminal, the 'Bloomberg Terminal' of meme trading.
As trading tools continue to iterate, execution barriers gradually dissolve, and the competitive frontier inevitably shifts toward the data itself: those who can capture signals faster and more accurately will establish trading advantages in the rapidly changing market, helping users profit.
Dimensions as advantages: The truth beyond K-lines
The essence of Memecoin is the financialization of attention. A strong narrative keeps breaking out of its niche and aggregating attention, pushing up price and market cap. For meme traders, real-time data matters, but outsized results hinge on three questions: what is this token's narrative, who is paying attention, and how will that attention keep amplifying? The K-line shows only shadows of these forces; the real drivers live in multi-dimensional data: off-chain sentiment, on-chain addresses and holding structures, and the precise mapping between them.
On-chain × off-chain: The closed loop from attention to execution
Users attract attention off-chain and complete transactions on-chain, with the closed-loop data becoming the core advantage of Meme trading.
# Narrative tracking and transmission chain identification
On social platforms like Twitter, tools such as XHunt help meme players analyze which KOLs follow a project, inferring the people associated with it and the likely chains of attention dissemination. 6551 DEX generates complete, real-time AI reports for traders by aggregating Twitter activity, official websites, tweet comments, issuance records, KOL follows, and more, helping traders capture narratives precisely.
# Quantifying sentiment indicators
Tools like Kaito and Cookie.fun aggregate content and sentiment analysis across Crypto Twitter, providing quantifiable metrics such as Mindshare, Sentiment, and Influence. Cookie.fun, for example, overlays mindshare and sentiment directly onto price charts, turning off-chain sentiment into readable 'technical indicators'.
▲ Cookie.fun
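A toy version of this sentiment quantification: compress a batch of posts into a single score that could sit next to a price chart. The keyword lists are fabricated stand-ins; products like those named above use trained models rather than keyword matching:

```python
# Toy sentiment indicator: (positive - negative) / total keyword mentions.
POS = {"moon", "bullish", "send", "lfg"}   # fabricated "bullish" lexicon
NEG = {"rug", "dump", "scam", "exit"}      # fabricated "bearish" lexicon

def sentiment_score(posts):
    """Return a score in (-1, 1); 0.0 when no keywords are present."""
    pos = neg = 0
    for text in posts:
        words = {w.strip(".,!?") for w in text.lower().split()}
        pos += len(words & POS)
        neg += len(words & NEG)
    total = pos + neg
    return (pos - neg) / total if total else 0.0

posts = [
    "this one is going to moon, so bullish",
    "dev looks sketchy, possible rug",
    "lfg send it",
]
print(sentiment_score(posts))  # 0.6
```

Plotting such a score on the same time axis as price is what turns off-chain sentiment into a readable 'technical indicator'.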
# On-chain and off-chain are equally important
OKX DEX displays Vibes analysis alongside market conditions, aggregating KOL shout-out timestamps, top associated KOLs, Narrative Summary, and comprehensive scores, reducing off-chain information retrieval time. Narrative Summary has already become the most well-received AI product feature among users.
Underwater data display: Turning 'visible ledgers' into 'usable Alpha'
Order-flow data in traditional finance is concentrated in the hands of large brokers, and trading firms pay hundreds of millions of dollars a year for it to optimize their strategies. In contrast, the crypto trading ledger is completely open and transparent, effectively 'open-sourcing' this expensive intelligence into an open-pit gold mine waiting to be worked.
The value of underwater data lies in extracting invisible intent from visible transactions. This includes capital-flow and role characterization: whether a market maker is accumulating or distributing, KOL alt addresses, concentrated versus dispersed holder distribution, bundled trades, and abnormal capital flows. It also includes address-profile linkage: labeling each address as smart money, KOL/VC, developer, phishing, or wash trading, and binding those labels to off-chain identities, linking on-chain and off-chain data.
These signals are often difficult for ordinary users to perceive but can significantly influence short-term market trends. By real-time parsing address tags, holding characteristics, and bundled trades, trading assistance tools are revealing the gaming trends 'underwater,' helping traders avoid risks and seek alpha in second-level markets.
For example, GMGN further integrates smart money, KOL/VC addresses, developer wallets, wash trading, phishing addresses, and bundled trading tags on top of on-chain real-time trading and token contract data sets, mapping on-chain addresses to social media accounts, aligning capital flow, risk signals, and price behavior to the second level, helping users make faster entry and risk avoidance judgments.
▲ GMGN
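The tag-overlay idea described above can be sketched as a join between a trade stream and an address-tag table. The addresses, tags, and trades below are fabricated examples, not real data or any platform's schema:

```python
# Toy "underwater data" signal: net flow from tagged smart-money addresses.
TAGS = {
    "7xKq...a1": {"smart_money"},
    "9mPd...b2": {"kol", "smart_money"},
    "3fGh...c3": {"wash_trading"},
}

def annotate(trades, tags):
    """Attach the tag set (empty if unknown) to each trade."""
    for t in trades:
        yield dict(t, tags=tags.get(t["address"], set()))

def smart_money_netflow(trades, tags):
    """Net buy volume from smart-money addresses, excluding wash traders —
    one of the capital-flow + role-profiling signals the text describes."""
    net = 0.0
    for t in annotate(trades, tags):
        if "smart_money" in t["tags"] and "wash_trading" not in t["tags"]:
            net += t["amount"] if t["side"] == "buy" else -t["amount"]
    return net

trades = [
    {"address": "7xKq...a1", "side": "buy",  "amount": 50.0},
    {"address": "9mPd...b2", "side": "buy",  "amount": 30.0},
    {"address": "3fGh...c3", "side": "buy",  "amount": 500.0},  # wash trade, ignored
    {"address": "7xKq...a1", "side": "sell", "amount": 10.0},
]
print(smart_money_netflow(trades, TAGS))  # 70.0
```

The hard part in production is not this join but building and maintaining the tag table itself, which is exactly where platforms like GMGN invest.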
AI-driven executable signals: From information to profit
'The next round of AI will not sell tools, but rather returns.' — Sequoia Capital
This judgment holds in crypto trading as well. Once data speed and dimensionality meet the bar, competition moves to the decision stage: can multi-dimensional, complex data be turned directly into executable trading signals? The evaluation criteria can be summarized in three points: fast enough, automated, and able to generate excess returns.
Fast enough: As AI capability improves, the advantages of natural-language and multi-modal LLMs come into play here. They can not only integrate and understand vast amounts of data but also build semantic links across it, automatically extracting decision conclusions. In the high-intensity, shallow-liquidity on-chain environment, every signal has a short shelf life and limited capital capacity, so speed directly determines the yield a signal can deliver.
Automation: Humans cannot monitor markets 24 hours a day, but AI can. On the Senpi platform, for example, users can place Copy Trading orders with take-profit and stop-loss conditions attached. This requires the AI to poll or monitor data in real time in the background and execute automatically the moment a qualifying signal is detected.
Yield: Ultimately, the effectiveness of any trading signal depends on its ability to continuously generate excess returns. AI needs not only to understand on-chain signals sufficiently but also to combine risk control to maximize risk-reward ratios in a highly volatile environment. For example, it should consider unique yield-influencing factors on-chain such as slippage losses and execution delays.
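The 'automated' and 'yield' criteria above can be sketched together: a monitor that attaches take-profit/stop-loss conditions to a position and nets out on-chain-specific costs such as slippage. All numbers, the `check_exit` logic, and the cost model are illustrative assumptions, not any platform's actual API:

```python
# Sketch: 24/7 exit monitoring plus a slippage-aware return calculation.
from dataclasses import dataclass

@dataclass
class Position:
    entry: float
    take_profit: float   # e.g. entry * 2.0
    stop_loss: float     # e.g. entry * 0.7

def check_exit(pos, price):
    """Called on every price tick; a bot does this around the clock."""
    if price >= pos.take_profit:
        return "take_profit"
    if price <= pos.stop_loss:
        return "stop_loss"
    return None

def net_return(entry, exit_price, slippage=0.02, fee=0.01):
    """Gross return minus slippage (paid on entry and exit) and fees —
    the on-chain factors separating a paper signal from realized yield."""
    gross = exit_price / entry - 1
    return gross - 2 * slippage - fee

pos = Position(entry=1.0, take_profit=2.0, stop_loss=0.7)
for tick in [1.1, 1.5, 2.1]:
    reason = check_exit(pos, tick)
    if reason:
        print(reason, round(net_return(pos.entry, tick), 2))  # take_profit 1.05
        break
```

Note how a 110% gross move shrinks to 105% net once round-trip slippage and fees are charged; on thin meme pools those costs are often far larger and can flip a signal's sign entirely.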
This capability is reshaping the business logic of data platforms: from selling 'data access rights' to selling 'revenue-driven signals.' The competition focus of the next generation of tools is no longer data coverage, but the executability of signals—whether they can truly complete the last mile from 'insight' to 'execution.'
Some emerging projects have begun to explore this direction. For example, Truenorth, as an AI-driven discovery engine, incorporates 'decision execution rates' into the assessment of information effectiveness, continuously optimizing output through reinforcement learning, minimizing ineffective noise, and helping users build executable information flows directly targeting order placement.
▲ Truenorth
Despite the huge potential of AI in generating executable signals, it also faces multiple challenges.
Hallucinations: On-chain data is highly heterogeneous and noisy. When an LLM parses natural-language queries or multi-modal signals, it is prone to hallucination or overfitting, hurting signal accuracy and yield. For instance, with multiple tokens sharing a name, AI often cannot resolve a CT ticker to the right contract address; likewise, many AI signal products misread generic discussion of 'AI' on CT as referring to the token Sleepless AI.
Signal lifespan: The trading environment shifts constantly, and any delay erodes profit; AI must complete data extraction, reasoning, and execution within a very short window. Even the simplest Copy Trading strategy turns negative if its execution lags the smart money it follows.
Risk control: In high-volatility scenarios, if the AI's transactions repeatedly fail to land on-chain or slippage runs too high, it may not only fail to generate excess returns but burn through the entire principal within minutes.
Therefore, finding a balance between speed and accuracy, and using mechanisms like reinforcement learning, transfer learning, and simulation backtesting to reduce error rates, is the competitive point for AI in this field.
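Simulation backtesting, mentioned above as an error-rate control, can be sketched as replaying historical signals against a price series before anything goes live. The signals and prices below are synthetic, and the hit-rate metric is a deliberately simple stand-in for fuller backtest statistics:

```python
# Minimal signal backtest: what fraction of fired signals were profitable?

def backtest(signals, prices, horizon=3):
    """Each signal is an index into `prices`; count it as a hit if the
    price is higher `horizon` ticks later. A live system would gate
    deployment on a threshold over metrics like this."""
    hits = evaluated = 0
    for i in signals:
        if i + horizon < len(prices):
            evaluated += 1
            hits += prices[i + horizon] > prices[i]
    return hits / evaluated if evaluated else 0.0

prices = [1.0, 1.1, 1.3, 1.2, 1.25, 1.5, 1.4, 1.35, 1.3, 1.2]
signals = [0, 2, 5]  # ticks where a hypothetical model fired "buy"
print(round(backtest(signals, prices), 2))  # 0.67 (2 of 3 signals profitable)
```

Production backtests would also model slippage, execution delay, and capital capacity rather than raw hit rate, but even this shape catches signals whose edge has already decayed.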
Upward or downward? The survival choice of data dashboards
As AI becomes able to generate executable signals directly and even assist with order placement, 'light middle-layer applications' that rely solely on data aggregation face a survival crisis. Whether dashboards stitching together on-chain data or trading bots layering execution logic on top of aggregation, they lack a sustainable moat. In the past such tools could stand on convenience or user habit (users habitually check a token's CTO status on Dexscreener, for example); now, as the same data becomes available in many places, execution engines commoditize, and AI can generate decision signals and trigger execution on that same data, their competitiveness is being diluted rapidly.
In the future, efficient on-chain execution engines will continue to mature, further lowering trading barriers. In this trend, data providers must make choices: either deepen into faster data acquisition and processing infrastructure or extend to the application layer, directly controlling user scenarios and consumption flow. Those caught in the middle, merely aggregating data and providing lightweight packaging, will see their survival space continuously squeezed.
Downward means building infrastructure moats. Hubble AI, while developing trading products, realized that solely relying on TG Bots could not form a long-term advantage and thus shifted to upstream data processing, aiming to create 'Crypto Databricks'. After maximizing the data processing speed of Solana, Hubble AI is moving from data processing to an integrated data research platform, occupying a position upstream in the value chain, providing underlying support for the U.S. 'financial blockchain' narrative and the data needs of on-chain AI Agent applications.
Upward means extending into application scenarios and locking in end users. Space & Time initially focused on sub-second SQL indexing and oracle pushes, but recently began exploring C-end consumer scenarios, launching Dream.Space on Ethereum, a 'vibe coding' product where users write smart contracts or generate data-analysis dashboards in natural language. This shift not only raises the call frequency of its own data services but also builds direct user stickiness through the terminal experience.
Thus, it can be seen that roles caught in the middle, solely relying on selling data interfaces, are losing their survival space. The future B2B2C data track will be dominated by two types of players: one that controls the underlying pipeline, becoming the 'water, electricity, and coal' of on-chain infrastructure; and another that is close to user decision scenarios, transforming data into application experiences.
Summary
Amid the meme craze, the explosion of high-performance public chains, and the commercialization of AI, the on-chain data track is undergoing a structural transformation. The iteration of trading speed, data dimensions, and executable signals has made 'visible charts' no longer the core competitiveness; the true moat is shifting to 'executable signals that help users make money' and 'the underlying data capabilities that support all of this.'
In the next 2-3 years, the most attractive entrepreneurial opportunities in the crypto data field will emerge at the intersection of the maturity level of Web2 infrastructure and the on-chain native execution model of Web3. Data from large cryptocurrencies like BTC/ETH, due to their high standardization, closely resembles traditional financial futures products, and has gradually been included in the data coverage scope of traditional financial institutions and some Web2 fintech platforms.
Conversely, the data of meme coins and long-tail on-chain assets exhibit extremely high non-standardization and fragmentation—ranging from community narratives, on-chain public opinion to cross-chain liquidity, this information needs to be interpreted in conjunction with on-chain address profiling, off-chain social signals, and even second-level transaction execution. It is under this difference that the processing and trading loops of long-tail assets and meme data create unique opportunity windows for crypto-native entrepreneurs.
We are optimistic about projects that will deeply cultivate the following two directions:
Upstream infrastructure - on-chain data companies with streaming data pipelines, ultra-low latency indexing, and cross-chain unified parsing frameworks that rival the processing capabilities of Web2 giants. Such projects are expected to become the Web3 version of Databricks/AWS, as users gradually migrate on-chain, transaction volumes are expected to grow exponentially, and the B2B2C model holds long-term compounding value.
Downstream execution platform - an application that integrates multi-dimensional data, AI agents, and seamless trading execution. By converting fragmented signals from on-chain/off-chain into directly executable trades, such products have the potential to become the crypto-native Bloomberg Terminal, with a business model that no longer relies on data access fees, but monetizes through excess returns and signal delivery.
We believe that these two types of players will dominate the next generation of the crypto data track and build sustainable competitive advantages.