Author: Story @IOSG

Original link: https://mp.weixin.qq.com/s/zJaPl8autXtYEgbHgGQzLA

Statement: This article is a reprint. Readers can obtain more information through the original link. If the author has any objections to the form of reprint, please contact us, and we will modify it according to the author's requirements. Reprinting is for information sharing only and does not constitute any investment advice, nor does it represent Wu's views and positions.

TL;DR

  1. Data challenge: Block-time competition among high-performance public chains has entered the sub-second era. High concurrency, sharp traffic fluctuations, and heterogeneous multi-chain demand on the consumer side add complexity on the data side, requiring data infrastructure to shift toward real-time incremental processing plus dynamic scaling. Traditional batch ETL carries delays of minutes to hours, making it hard to serve real-time trading needs. Emerging solutions such as The Graph, Nansen, and Pangea introduce stream computing, compressing latency to near-real-time levels.

  2. The paradigm shift in data competition: the last cycle met the need for 'understandable' data; this cycle emphasizes 'profitable' data. Under the Bonding Curve model, a one-minute delay can mean a multiple-fold difference in entry cost. Tool iteration has run from manually set slippage → sniper bots → the GMGN integrated terminal. On-chain execution capability is gradually being commoditized, shifting the competitive frontier to the data itself: whoever captures signals faster can help users profit.

  3. The dimensional expansion of trading data: Meme is essentially the financialization of attention, keyed to narratives, attention, and their subsequent propagation. The closed loop of off-chain sentiment × on-chain data: narrative-tracking summaries and sentiment quantification have become central to trading. 'Underwater data' (capital-flow direction, role profiling, and labels for smart-money/KOL addresses) reveals the hidden games behind anonymous on-chain addresses. The new generation of trading terminals integrates on-chain and off-chain multidimensional signals at second-level granularity, sharpening entry and risk-avoidance judgments.

  4. AI-driven executable signals: from information to revenue. The competitive goals at this stage are speed, automation, and the ability to deliver excess returns. LLMs plus multimodal AI can automatically extract decision signals and combine them with Copy Trading and automated take-profit/stop-loss execution. Risk challenges: hallucinations, short signal lifetimes, execution delays, and risk control. Balancing speed against accuracy, via reinforcement learning and simulation backtesting, is key.

  5. The survival choices of data dashboards: lightweight data-aggregation/dashboard applications lack moats, and their room to survive is shrinking. Downward: deepen high-performance underlying pipelines and integrated data research. Upward: extend to the application layer and own user scenarios directly to raise data-call activity. The future track pattern: become either the infrastructure of 'on-chain utilities' or the user platform of a 'Crypto Bloomberg.'

The moat is shifting towards 'executable signals' and 'underlying data capabilities.' The closed loop of long-tail assets and trading data is a unique opportunity for crypto-native entrepreneurs. The opportunity window in the next 2-3 years:

Upstream infrastructure: Web2-level processing power + Web3 native demand → Web3 Databricks/AWS.

Downstream execution platforms: AI Agent + multidimensional data + seamless execution → Crypto Bloomberg Terminal.

Thanks to projects like Hubble AI, Space & Time, OKX DEX for their support of this research report!

Introduction: The triple resonance of Meme, high-performance public chains, and AI

In the previous cycle, the growth of on-chain trading relied primarily on infrastructure iteration. Entering the new cycle, with infrastructure gradually maturing, super applications represented by Pump.fun are becoming the crypto industry's new growth engine. This asset-issuance model, with a unified issuance mechanism and carefully designed liquidity, has shaped a fair, primordial trading arena in the trenches that frequently produces wealth myths. The replicability of this high-multiple wealth effect is profoundly changing users' return expectations and trading habits: users need not only faster entry opportunities but also the ability to acquire, analyze, and act on multidimensional data in an extremely short time, while existing data infrastructure struggles to support such density and real-time demands.

The resulting demand on trading environments is an order higher: lower friction, faster confirmation, deeper liquidity. Trading venues are accelerating their migration to high-performance public chains and Layer 2 Rollups, represented by Solana and Base. These chains' trading data volume has grown more than tenfold compared with Ethereum in the previous cycle, posing far harsher data-performance challenges for existing data providers. With the upcoming launch of next-generation high-performance chains like Monad and MegaETH, demand for on-chain data processing and storage will grow exponentially.

Meanwhile, the rapid maturation of AI is accelerating the equalization of intelligence. GPT-5's intelligence has reached doctoral levels, and multimodal large models like Gemini can readily read K-line charts... With AI tools, trading signals that were once complex can now be understood and executed by ordinary users. Under this trend, traders are beginning to lean on AI for trading decisions, and those decisions depend on multidimensional, highly efficient data. AI is evolving from an 'auxiliary analysis tool' into a 'trading decision center', and its adoption further amplifies demands on data real-timeness, interpretability, and scalable processing.

Under the triple resonance of the Meme trading frenzy, the expansion of high-performance public chains, and the commercialization of AI, the on-chain ecosystem's demand for new data infrastructures is becoming increasingly urgent.

Meeting the data challenges of 100,000 TPS and millisecond block times

With the rise of high-performance public chains and high-performance Rollups, the scale and speed of on-chain data have entered a new stage.

As high-concurrency, low-latency architectures become mainstream, daily transaction counts easily run into the tens of millions, and raw data accumulates at hundreds of GB per day. Take Solana as an example: its 30-day average TPS has exceeded 1,200, with daily transactions topping 100 million; on August 17 it even set an all-time high of 107,664 TPS. Statistics show Solana's ledger grows rapidly at a pace of 80-95 TB per year, which works out to roughly 220-260 GB per day.

▲ Chainspect, 30-day average TPS


▲ Chainspect, 30-day trading volume
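
The daily figure quoted above follows directly from the annual growth rate; a quick arithmetic check (a sketch assuming decimal units, 1 TB = 1,000 GB):

```python
# Sanity check of the ledger-growth figures quoted above
# (assumes decimal units: 1 TB = 1,000 GB).
for tb_per_year in (80, 95):
    gb_per_day = tb_per_year * 1_000 / 365
    print(f"{tb_per_year} TB/year ≈ {gb_per_day:.0f} GB/day")
# Output: 80 TB/year ≈ 219 GB/day, 95 TB/year ≈ 260 GB/day
```
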
Not only has throughput increased; block times on emerging public chains have also entered the millisecond range. BNB Chain's Maxwell upgrade cut block time to 0.8s, while Base's Flashblocks technology has compressed it to 200ms. In the second half of this year, Solana plans to replace PoH with Alpenglow, cutting block confirmation to 150ms, and MegaETH's mainnet targets real-time block production at 10ms. These consensus and engineering breakthroughs greatly improve transaction real-timeness, but they also place unprecedented demands on block data synchronization and decoding.

However, most downstream data infrastructure still relies on batch ETL pipelines, which inevitably introduce delay. For example, Dune's contract-interaction event data on Solana usually lags by about 5 minutes, while protocol-layer aggregated data can take up to 1 hour to become available. An on-chain transaction confirmed in 400ms thus waits several hundred times that long before it is visible in analytics tools, which is close to unacceptable for real-time trading applications.
▲ Dune, Blockchain Freshness
To address these supply-side challenges, some platforms have turned to streaming, real-time architectures. The Graph, with Substreams and Firehose, has compressed data latency to near real time. Nansen achieved performance gains of dozens of times on Smart Alerts and real-time dashboards by introducing ClickHouse and stream-processing technology. Pangea, aggregating compute, storage, and bandwidth contributed by community nodes, serves B-side users such as market makers, quant funds, and central limit order books (CLOBs) with streaming data at under 100ms latency.
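
To make the contrast with batch ETL concrete, here is a minimal sketch of the push-based pattern these systems share, subscribing to Solana's documented `logsSubscribe` JSON-RPC method over a public websocket. The endpoint, watched program, and handling logic are illustrative assumptions, not any vendor's actual pipeline:

```python
# Minimal sketch: push-based consumption of Solana events instead of batch ETL.
# Requires: pip install websockets
import asyncio
import json

import websockets

RPC_WS = "wss://api.mainnet-beta.solana.com"  # public endpoint; illustrative
WATCHED_PROGRAM = "675kPX9MHTjS2zt1qfr1NYHuzeLXfQM9H24wFSUt1Mp8"  # e.g. an AMM program

async def stream_logs() -> None:
    async with websockets.connect(RPC_WS) as ws:
        # Subscribe to every transaction mentioning the watched program,
        # at 'processed' commitment for the lowest possible latency.
        await ws.send(json.dumps({
            "jsonrpc": "2.0", "id": 1, "method": "logsSubscribe",
            "params": [{"mentions": [WATCHED_PROGRAM]},
                       {"commitment": "processed"}],
        }))
        async for raw in ws:
            msg = json.loads(raw)
            if msg.get("method") == "logsNotification":
                value = msg["params"]["result"]["value"]
                # Incremental processing happens here (decode, enrich, alert)
                # rather than in an hourly batch job.
                print(value["signature"], len(value.get("logs", [])))

if __name__ == "__main__":
    asyncio.run(stream_logs())
```
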
▲ Chainspect

Beyond sheer volume, on-chain transactions also show highly uneven traffic distribution. Over the past year, Pump.fun's weekly trading volume varied by nearly 30x between its trough and peak. In 2024, the Meme trading platform GMGN suffered 6 server overload incidents within 4 days, forcing it to migrate its underlying database from AWS Aurora to TiDB, an open-source distributed SQL database. After the migration, horizontal scalability and elastic compute improved markedly, business agility rose by about 30%, and pressure during peak trading periods eased significantly.
▲ Dune, Pumpfun Weekly Volume

▲ Odaily, TiDB's Web3 service case
The multi-chain ecosystem further compounds this complexity. Differences in log formats, event structures, and transaction fields across public chains mean every new chain requires bespoke parsing logic, putting heavy pressure on the flexibility and scalability of data infrastructure. As a result, some data providers adopt a 'customer-first' strategy: wherever trading activity is hot, integration of that chain is prioritized, balancing flexibility against scalability.
If, in the context of high-performance chains, data processing stays in the fixed-interval mode of batch ETL, it will face latency backlogs, decoding bottlenecks, and query lag, failing to meet demands for real-time, fine-grained, dynamically interactive data consumption. On-chain data infrastructure must therefore evolve toward streaming incremental processing and real-time computing architectures, with load-balancing mechanisms to absorb the concurrency spikes of crypto's periodic trading peaks. This is not just the natural extension of the technical path; it is the key to stable real-time queries, and it will be the true watershed in the competition among next-generation on-chain data platforms.

Speed is wealth: on-chain data competition shifts from 'visualization' to 'executability'

In the previous cycle, Dune was the standard tool for on-chain analysis. It met researchers' and investors' need for 'understandable' data, as people stitched together on-chain narratives from SQL charts.

  • GameFi and DeFi players rely on Dune to track capital inflows and outflows, calculate mining return rates, and withdraw in a timely manner before market turning points.

  • NFT players analyze transaction volume trends, whale holdings, and distribution characteristics through Dune to predict market heat.

However, in this cycle, Meme players are the most active consumer group. They have driven the phenomenally successful application Pump.fun to $700 million in cumulative revenue, nearly double the total revenue of OpenSea, the leading consumer application of the previous cycle.

In the Meme track, the market's time sensitivity is amplified to the extreme. Speed is no longer merely an advantage; it is the core variable that determines profit and loss. In a primary market priced by a Bonding Curve, speed is cost: the token price rises exponentially with buying demand, and even a one-minute delay can mean a multiple-fold difference in entry cost. According to Multicoin research, the most profitable players in this game often pay 10% slippage just to land on-chain three blocks ahead of competitors. The wealth effect and get-rich-quick myths drive players to chase second-resolution K-lines, execute within the same block, and build one-stop decision panels, competing on the speed of information gathering and order placement.

▲ Binance
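
To see why 'speed is cost' on a bonding curve, consider a toy constant-product curve with virtual reserves, a common launchpad design. The reserve numbers and the 200 SOL of intervening buys below are illustrative assumptions, not any platform's actual parameters:

```python
# Why a one-minute delay is expensive on a bonding curve: a toy
# constant-product curve with virtual reserves (hypothetical numbers).
V_SOL, V_TOKEN = 30.0, 1_073_000_000.0  # initial virtual reserves (assumed)
K = V_SOL * V_TOKEN                      # invariant preserved by every trade

def buy(sol_reserve: float, token_reserve: float, sol_in: float):
    """Return (tokens_out, new_sol_reserve, new_token_reserve)."""
    new_sol = sol_reserve + sol_in
    new_token = K / new_sol
    return token_reserve - new_token, new_sol, new_token

# Early buyer spends 1 SOL right at launch.
early_tokens, s, t = buy(V_SOL, V_TOKEN, 1.0)

# One minute later, 200 SOL of other buys have already hit the curve.
_, s, t = buy(V_SOL, V_TOKEN, 200.0)
late_tokens, _, _ = buy(s, t, 1.0)

print(f"early buyer: {early_tokens:,.0f} tokens per SOL")
print(f"late buyer:  {late_tokens:,.0f} tokens per SOL")
print(f"effective entry price multiple: {early_tokens / late_tokens:.1f}x")
```

Under these assumptions the buyer who lands one minute later pays an effective price dozens of times higher for the same 1 SOL; the exact multiple depends entirely on how much buying hits the curve in between.
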

In the era of manual trading on Uniswap, users set slippage and gas themselves and the front end offered no price preview, so trading felt like a lottery. In the BananaGun sniper-bot era, automatic sniping and automatic slippage settings put retail players on the same starting line as the 'scientists' (on-chain sniper devs). In the PepeBoost era, bots pushed pool-opening information first while also syncing front-row holder data. This has culminated in today's GMGN era: an integrated terminal combining K-line data, multidimensional analysis, and trade execution, the 'Bloomberg Terminal' of Meme trading.

As trading tools iterate, execution thresholds dissolve, and the competitive frontier inevitably shifts to the data itself: whoever captures signals faster and more accurately can build a trading edge in a fast-moving market and help users make money.

Dimension is an advantage: the truth beyond K lines

The essence of Memecoin is the financialization of attention. A strong narrative keeps breaking out of its niche, aggregating attention and driving up price and market cap. For Meme traders, real-time responsiveness matters, but outsized results hinge on answering three questions: what is this token's narrative, who is paying attention, and how will that attention keep amplifying? K-lines carry only the shadows of these forces; the real drivers live in multidimensional data: off-chain sentiment, on-chain addresses and holding structures, and the precise mapping between the two.

On-chain × Off-chain: from attention to a closed trading loop

Users' attention is won off-chain and their transactions settle on-chain; closing the data loop between the two is becoming the core advantage in Meme trading.

#Narrative Tracking and Propagation Chain Identification
On social platforms like Twitter, tools such as XHunt help Meme players analyze which KOLs follow a project's account, identifying the people connected to the project and the likely chains of attention propagation. 6551 DEX aggregates Twitter activity, official websites, tweet comments, issuance records, KOL follows, and more to generate comprehensive AI reports for traders that update in real time with public opinion, helping them capture narratives precisely.


#Sentiment Indicator Quantification
Tools like Kaito and Cookie.fun aggregate Crypto Twitter content and run sentiment analysis to produce quantifiable indicators such as Mindshare, Sentiment, and Influence. Cookie.fun, for example, overlays these indicators directly on price charts, turning off-chain sentiment into readable 'technical indicators'.

▲ Cookie.fun
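
As an illustration of how such indicators can be constructed, the sketch below computes a reach-weighted mindshare and sentiment score over a window of posts. This is a toy definition with made-up numbers, not Kaito's or Cookie.fun's actual methodology:

```python
# Illustrative mindshare/sentiment computation over a window of posts.
# Toy definition only; real products use far richer weighting and features.
from dataclasses import dataclass

@dataclass
class Tweet:
    ticker: str          # token the tweet is about
    followers: int       # author reach
    sentiment: float     # -1.0 (bearish) .. +1.0 (bullish), from any classifier

def mindshare(tweets: list[Tweet], ticker: str) -> float:
    """Share of attention: reach-weighted mentions of one ticker vs. all."""
    total = sum(t.followers for t in tweets) or 1
    return sum(t.followers for t in tweets if t.ticker == ticker) / total

def avg_sentiment(tweets: list[Tweet], ticker: str) -> float:
    """Reach-weighted average sentiment for one ticker."""
    sel = [t for t in tweets if t.ticker == ticker]
    reach = sum(t.followers for t in sel) or 1
    return sum(t.sentiment * t.followers for t in sel) / reach

window = [  # hypothetical posts in the current time window
    Tweet("WIF", 120_000, 0.8), Tweet("WIF", 3_000, 0.4),
    Tweet("PEPE", 45_000, -0.2), Tweet("BONK", 10_000, 0.6),
]
print(f"WIF mindshare: {mindshare(window, 'WIF'):.0%}")    # ~69%
print(f"WIF sentiment: {avg_sentiment(window, 'WIF'):+.2f}")  # +0.79
```
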

#On-chain and Off-chain are Equally Important
OKX DEX displays Vibes analysis alongside market data, aggregating KOL call timing, the leading associated KOLs, a Narrative Summary, and a composite score to shorten off-chain information retrieval time. Narrative Summary has become the AI product feature that resonates most with users.

Surfacing underwater data: turning 'visible ledgers' into 'usable Alpha'

In traditional finance, order-flow data is held by large brokerages, and quantitative firms pay hundreds of millions of dollars a year for it to optimize trading strategies. Crypto's trading ledger, by contrast, is completely open and transparent: effectively open-sourced high-priced information, an open-pit gold mine waiting to be worked.

The value of underwater data lies in extracting invisible intent from visible transactions. This includes capital-flow direction and role profiling: clues to whether market makers are accumulating or distributing, KOL sub-account addresses, concentrated versus dispersed holdings, bundled trades, and unusual capital flows. It also includes linked address profiling: labeling addresses as smart money, KOL/VC, developer, phishing, or insider 'rat' accounts, and binding them to off-chain identities, connecting on-chain and off-chain data.

These signals are hard for ordinary users to detect, yet they can materially move short-term price action. By parsing address labels, holding characteristics, and bundled trades in real time, trading-assistance tools surface the games playing out beneath the surface, helping traders dodge risk and hunt alpha in second-resolution markets.
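
A sketch of the shape of this approach: derive a reusable label from an address's basic trade statistics. The thresholds and features here are illustrative assumptions; production systems use far richer signals:

```python
# Illustrative heuristic for address labeling. Real products use far more
# features; this only shows the pattern: raw trade history -> reusable label.
from dataclasses import dataclass

@dataclass
class AddressStats:
    address: str
    trades_30d: int
    win_rate: float        # fraction of closed trades in profit
    realized_pnl_sol: float
    funded_by_cex: bool    # first funding came from a known exchange wallet

def label(stats: AddressStats) -> str:
    if stats.trades_30d >= 20 and stats.win_rate >= 0.6 and stats.realized_pnl_sol > 100:
        return "smart_money"
    if stats.trades_30d >= 200 and abs(stats.realized_pnl_sol) < 10:
        return "possible_market_maker"   # high churn, flat PnL
    if not stats.funded_by_cex and stats.trades_30d <= 3:
        return "fresh_wallet"            # often a bundle / insider sub-account
    return "retail"

# Hypothetical address and numbers, purely for demonstration.
print(label(AddressStats("ExampleAddr1111", 35, 0.71, 412.5, True)))  # smart_money
```
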
For example, GMGN combines real-time on-chain trades and token contract data with deeper analysis of smart money, KOL/VC addresses, developer wallets, rat-trading accounts, phishing addresses, and bundled trades, maps on-chain addresses to social media accounts, and aligns capital flows, risk signals, and price action at second-level granularity, helping users make faster entry and risk-avoidance decisions.
▲ GMGN

AI-driven executable signals: from information to revenue

'In the next round of AI, what we sell is not tools, but revenue.' — Sequoia Capital

This judgment holds in crypto trading as well. Once data speed and dimensionality are up to standard, the next battleground is the decision-making stage: whether multidimensional, complex data can be converted directly into executable trading signals. The evaluation criteria come down to three things: speed, automation, and excess return.

  • Speed: As AI capability advances, natural-language and multimodal LLMs earn their keep here. They can integrate and understand massive amounts of data, build semantic connections across it, and automatically extract decision-ready conclusions. In an on-chain environment of high intensity and shallow trading depth, every signal has a very short lifespan and limited capital capacity, so speed directly determines the returns a signal can capture.

  • Automation: Humans cannot watch the market 24 hours a day; AI can. On the Senpi platform, for example, users can place Copy Trading orders with take-profit and stop-loss conditions through agents. This requires the AI to poll or monitor data in real time in the background and to place orders automatically when a recommendation signal fires (a minimal sketch of such a loop follows this list).

  • Yield: Ultimately, any trading signal is judged by whether it continuously generates excess returns. The AI must not only understand on-chain signals well enough but also build in risk control to improve risk-reward in a highly volatile environment, accounting for slippage, execution delay, and other return-eroding factors peculiar to on-chain markets.
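
Here is the minimal sketch referenced above: a mechanical take-profit/stop-loss loop wrapped around a mirrored entry. The price feed is simulated; a real agent would wire this to a streaming wallet monitor and a DEX router, and all names and numbers are hypothetical:

```python
# Minimal sketch of an automated copy-trade loop with take-profit/stop-loss.
# The price feed is simulated; a real agent would use a streaming wallet
# monitor and a DEX router. All parameters here are hypothetical.
import random

TAKE_PROFIT, STOP_LOSS = 1.5, 0.7  # exit at +50% or at -30%

def simulated_price_path(entry: float, steps: int = 500):
    """Random walk standing in for a live price feed."""
    price = entry
    for _ in range(steps):
        price *= random.uniform(0.97, 1.035)
        yield price

def copy_trade(entry_price: float) -> float:
    """Mirror a smart-money buy, then exit mechanically on TP/SL."""
    for price in simulated_price_path(entry_price):
        ratio = price / entry_price
        if ratio >= TAKE_PROFIT:
            return ratio  # take profit hit; no human watching required
        if ratio <= STOP_LOSS:
            return ratio  # stop loss hit; losses capped automatically
    return ratio          # feed ended; mark-to-market

random.seed(7)
print([f"{copy_trade(1.0):.2f}x" for _ in range(5)])
```
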

This capability is reshaping the business logic of data platforms: from selling 'data access' to selling 'revenue-driven signals'. Competition among next-generation tools is no longer about data coverage but about signal executability: whether they can truly close the last mile from 'insight' to 'execution'.

Some emerging projects are already exploring this direction. Truenorth, for example, is an AI-driven discovery engine that factors 'decision execution rate' into its assessment of information usefulness, continuously optimizing its output through reinforcement learning to minimize noise and help users build actionable information flows aimed directly at order placement.

▲ Truenorth

Although AI has enormous potential in generating executable signals, it also faces multiple challenges.

  • Hallucination: On-chain data is highly heterogeneous and noisy, and LLMs easily hallucinate or overfit when parsing natural-language queries or multimodal signals, hurting signal accuracy and returns. With multiple tokens sharing one name, AI often struggles to resolve a Crypto Twitter ticker to the right contract address; likewise, many AI signal products map CT discussion of 'AI' the narrative to Sleepless AI, the token carrying that ticker.

  • Signal Lifespan: The trading environment changes in seconds, and any delay erodes returns; AI must complete data extraction, reasoning, and execution in a very short window. Even the simplest Copy Trading strategy sees returns turn negative if it follows smart money with too much lag.

  • Risk Control: In high-volatility regimes, an AI that repeatedly fails to land transactions on-chain or takes heavy slippage may not only fail to generate excess returns but can burn through its entire capital within minutes.

Finding the balance between speed and accuracy, and driving error rates down with mechanisms such as reinforcement learning, transfer learning, and simulation backtesting, is therefore where AI will actually win or lose in this field.
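
A toy simulation backtest shows why signal lifespan and execution delay dominate: entering the same pump-and-fade path a few ticks late flips the trade from profitable to deeply negative. The path, slippage, and delays below are synthetic assumptions, purely illustrative:

```python
# Toy backtest showing how execution delay erodes a copy-trading signal.
# Prices and the decay profile are synthetic assumptions.
def pnl_with_delay(path: list[float], delay_ticks: int, slippage: float = 0.02) -> float:
    """Buy `delay_ticks` after the signal (tick 0), sell at the end."""
    entry = path[min(delay_ticks, len(path) - 1)] * (1 + slippage)
    exit_ = path[-1] * (1 - slippage)
    return exit_ / entry - 1

# Synthetic pump-and-fade path: sharp rise after the signal, then retrace.
path = [1.00, 1.25, 1.45, 1.60, 1.55, 1.45, 1.38, 1.30]

for delay in (0, 1, 2, 3):
    print(f"delay {delay} ticks -> return {pnl_with_delay(path, delay):+.1%}")
# Output: +24.9%, -0.1%, -13.9%, -21.9% as the entry slips later
```
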

Upward or downward? The survival choices of data dashboards

As AI comes to generate executable signals directly and even assist order placement, 'thin middle-layer applications' that merely aggregate data face a survival crisis. Whether they stitch on-chain data into dashboards or bolt an execution layer onto aggregation via trading bots, they fundamentally lack a durable moat. These tools once stood on convenience or user habit (users were accustomed to checking a token's CTO (community takeover) status on Dexscreener, for example); now, with the same data available in many places, execution engines increasingly commoditized, and AI able to generate decision signals and trigger execution on that same data, their competitiveness is rapidly diluting.

In the future, efficient on-chain execution engines will keep maturing, lowering trading thresholds further. Under this trend, data providers must choose: either dig down into faster data acquisition and processing infrastructure, or extend up into the application layer and own user scenarios and consumption flow directly. Middle-layer roles that only aggregate data and add a light wrapper will keep losing ground.

Downward means building an infrastructure moat. Hubble AI realized while building trading products that a TG Bot alone could not sustain a long-term edge, so it moved upstream into data processing, aiming to build a 'Crypto Databricks'. Having pushed Solana data-processing speed to the limit, Hubble AI is expanding from data processing into an integrated data-research platform, taking an upstream position in the value chain and supplying foundational support for the 'on-chain financialization' narrative and the data needs of on-chain AI Agent applications.

Upward means extending into application scenarios and locking in end users. Space and Time initially focused on sub-second SQL indexing and oracle pushes, but has recently begun exploring consumer scenarios, launching Dream.Space on Ethereum, a 'vibe coding' product with which users can write smart contracts or generate data-analysis dashboards in natural language. The move not only raises the call frequency of its data services but also deepens user engagement through the terminal experience.

It is evident that roles that rely solely on selling data interfaces are losing their space for survival. The future B2B2C data track will be dominated by two types of players: one type controls the underlying pipelines, becoming the 'on-chain utility companies'; the other type is close to user decision scenarios, transforming data into application experiences.

Summary

In the triple resonance of the Meme frenzy, the explosion of high-performance public chains, and the commercialization of AI, the on-chain data track is undergoing structural change. The iteration of trading speed, data dimensionality, and execution signals means 'visible charts' are no longer the core competitive advantage; the real moat is shifting toward 'executable signals that help users make money' and 'the underlying data capability that supports all of this.'

In the next 2-3 years, the most attractive entrepreneurial opportunities in crypto data will emerge at the intersection of Web2-grade infrastructure maturity and Web3-native on-chain execution models. Data on major assets like BTC/ETH, being highly standardized, closely resembles traditional financial futures products and has gradually been brought within the data coverage of traditional financial institutions and some Web2 fintech platforms.

By contrast, data on Meme coins and long-tail on-chain assets is extremely non-standard and fragmented, spanning community narratives, on-chain sentiment, and cross-chain liquidity, and must be interpreted together with on-chain address profiling, off-chain social signals, and even second-level trade execution. Precisely this difference opens an opportunity window in the processing and trading loops of long-tail assets and Meme data, a distinctive opening for crypto-native entrepreneurs.

We are optimistic about projects digging deep in the following two directions:

  • Upstream infrastructure: on-chain data companies with streaming data pipelines, ultra-low-latency indexing, and unified cross-chain parsing frameworks whose processing power rivals Web2 giants. Such projects could become Web3's Databricks/AWS; as users migrate on-chain and transaction volumes grow exponentially, the B2B2C model carries long-term compounding value.

  • Downstream execution platforms: applications integrating multidimensional data, AI Agents, and seamless trade execution. By turning fragmented on-chain/off-chain signals into directly executable trades, these products can become the crypto-native Bloomberg Terminal, monetizing not through data access fees but through excess returns and signal delivery.

We believe that these two types of players will dominate the next generation of crypto data tracks and build sustainable competitive advantages.

Reference

https://dune.com/dune/meta-monitor
https://www.coingecko.com/research/publications/fastest-blockchains
https://learnblockchain.cn/article/13999
https://dune.com/queries/4741262/7873099
https://epjdatascience.springeropen.com/articles/10.1140/epjds/s13688-025-00561-x
https://trino.io/assets/blog/trino-fest-2023/TrinoFest2023Dune.pdf
https://www.odaily.news/post/5203441
https://xie.infoq.cn/article/c99c78a6bacc1e13e9dd92d91
https://www.theblockbeats.info/news/21676
https://mirror.xyz/fscglobal.eth/BXw7EXkbw00HhagSKq3-xHYa7co2Tkd0rKLJPzuVw_Y
https://web3caff.com/archives/126103
https://multicoin.capital/2025/06/26/new-modalities-for-issuance-and-trading/
https://rpcfast.com/blog/solana-rpc-node-full-guide