When Trump invests a trillion dollars in AI, who provides the trustworthy data?
When Trump invests a trillion dollars in AI, it looks like a race between models, chips, and data centers. But it also raises deeper questions: how is the data that AI models rely on validated? Is it traceable? Is training a black box? Can inference be audited? Can models collaborate, or must each fight its own battle?
Put more simply: when we get information from an AI, who guarantees that the information is correct? Data pollution is no longer a joke. One AI application that once billed itself as a ChatGPT killer has been deeply mired in polluted data, and when the data sources are all wrong, how can the answers be right?
Is today's AI intelligent? Perhaps. But even the smartest AI has to be trained, and today we cannot know which data went into that training, we cannot verify whether a GPU actually performed a claimed inference, and we cannot establish a trust logic among multiple models.
To truly advance AI to the next generation, we may need to solve three problems at once (a sketch of how the first could work in practice follows the list):
1. Training data must be trustworthy and verifiable.
2. The inference process must be auditable by third-party models.
3. Models must be able to coordinate computing power, exchange tasks, and share results without needing a platform to mediate.
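To make the first requirement concrete, here is a minimal, chain-agnostic sketch of verifiable training data: every dataset shard is content-addressed by its SHA-256 digest, and the manifest's root digest is what a trainer would publish (for example, anchor on-chain) before training. All names here (`build_manifest`, `verify`) are illustrative, not any particular project's API.

```python
import hashlib
import json

def digest(data: bytes) -> str:
    """Content address: the SHA-256 hex digest of a dataset shard."""
    return hashlib.sha256(data).hexdigest()

def build_manifest(shards: dict[str, bytes]) -> dict:
    """Map each shard name to its digest; the manifest itself gets a
    root digest that could be anchored on-chain as the training-data
    commitment."""
    entries = {name: digest(blob) for name, blob in sorted(shards.items())}
    canonical = json.dumps(entries, sort_keys=True).encode()
    return {"shards": entries, "root": digest(canonical)}

def verify(shards: dict[str, bytes], claimed_root: str) -> bool:
    """Anyone holding the same shards can recompute the root and
    compare; a substituted or polluted shard changes the root."""
    return build_manifest(shards)["root"] == claimed_root

# The trainer publishes the root before training; an auditor checks it later.
shards = {"wiki.txt": b"...corpus slice...", "code.txt": b"...corpus slice..."}
manifest = build_manifest(shards)
assert verify(shards, manifest["root"])
print(manifest["root"])
```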
None of this can be solved by a single model, a single API, or a single GPU platform; it requires a system built for AI from the ground up. Such a system should store data permanently and at low cost, give the data itself the right both to be audited and to audit, allow models to cross-validate each other's inferences, and support models in autonomously discovering computing power, coordinating tasks, and having every step of execution audited under defined conditions.
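As one hedged illustration of models validating inference against each other, the sketch below has several independent models answer the same query and accepts a result only when a quorum agrees on a normalized answer hash. The normalization rule and the quorum size are assumptions chosen for the example, not a fixed protocol.

```python
import hashlib
from collections import Counter

def answer_hash(text: str) -> str:
    """Normalize and hash an answer so independent models can be
    compared without exchanging raw outputs."""
    return hashlib.sha256(text.strip().lower().encode()).hexdigest()

def cross_validate(answers: list[str], quorum: int = 2) -> str | None:
    """Accept an answer only if at least `quorum` models agree;
    otherwise flag the query as unaudited for further arbitration."""
    tally = Counter(answer_hash(a) for a in answers)
    best, votes = tally.most_common(1)[0]
    if votes >= quorum:
        return next(a for a in answers if answer_hash(a) == best)
    return None  # no consensus: treat the result as untrusted

# Hypothetical: three independent models answer the same prompt.
print(cross_validate(["Paris", "paris", "Lyon"]))  # -> "Paris"
```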
This is hard to achieve on centralized platforms. So can it be done on decentralized ones, and why would a decentralized approach be the right one?
I believe only a blockchain can truly integrate data storage, data execution, and data verification into one underlying network; immutability and transparency are among blockchain's greatest attractions. The problem is that not every chain is suited to be the underlying layer for AI.
If storage were the only requirement, the IPFS protocol already exists. But storage alone is not enough: we also need smart contracts that can directly invoke data, audit inference results, and even coordinate GPU resources to complete computational tasks. These capabilities are beyond IPFS, and for now beyond most L1s and AI applications as well.
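To show the shape of "coordinating GPU resources and auditing every step," here is a toy, in-memory coordinator: providers register, tasks are assigned round-robin, and each completed task appends a log entry that hashes the previous one, making the log tamper-evident. A real design would put this logic in a smart contract and verify the computation itself; this only sketches the bookkeeping.

```python
import hashlib
from dataclasses import dataclass, field

def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

@dataclass
class Coordinator:
    """Toy stand-in for an on-chain coordinator: providers register,
    tasks are assigned, and every completed step is appended to a
    hash-chained, tamper-evident log."""
    providers: list[str] = field(default_factory=list)
    log: list[dict] = field(default_factory=list)

    def register(self, provider: str) -> None:
        self.providers.append(provider)

    def run(self, task_input: bytes, compute) -> bytes:
        # Naive round-robin assignment; a real scheduler would match
        # tasks to capacity, price, and reputation.
        provider = self.providers[len(self.log) % len(self.providers)]
        output = compute(task_input)
        prev = self.log[-1]["entry_hash"] if self.log else "0" * 64
        entry = {"provider": provider, "in": h(task_input),
                 "out": h(output), "prev": prev}
        entry["entry_hash"] = h(repr(sorted(entry.items())).encode())
        self.log.append(entry)
        return output

coord = Coordinator()
coord.register("gpu-node-a")
coord.register("gpu-node-b")
coord.run(b"prompt bytes", lambda x: x.upper())  # stub "inference"
print(coord.log[-1]["entry_hash"])
```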
If anything comes close today, @irys_xyz may have an opportunity. Irys is not a traditional storage chain; it is being built as a data-execution network for AI that treats data as programmable assets: models can read data on-chain, validate inferences, invoke computing power, and handle pricing, authorization, revenue sharing, and verification through smart contracts.
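As a purely hypothetical model of "data as a programmable asset" (an illustration of the concept, not Irys's actual API), the sketch below gates each read behind a payment and splits the revenue among contributors by fixed shares:

```python
from dataclasses import dataclass, field

@dataclass
class DataAsset:
    """Hypothetical programmable-data asset: reads are gated by
    payment, and revenue is split among contributors by fixed
    shares. Names and fields here are illustrative only."""
    cid: str                   # content address of the stored data
    price: int                 # cost per read, in base token units
    shares: dict[str, float]   # contributor -> revenue share (sums to 1.0)
    balances: dict[str, int] = field(default_factory=dict)
    authorized: set[str] = field(default_factory=set)

    def pay_and_read(self, caller: str, payment: int) -> str:
        if payment < self.price:
            raise PermissionError("insufficient payment")
        for who, share in self.shares.items():
            self.balances[who] = self.balances.get(who, 0) + int(payment * share)
        self.authorized.add(caller)
        return self.cid  # a real system would serve the data itself

asset = DataAsset(cid="bafy-example", price=100,
                  shares={"curator": 0.7, "labeler": 0.3})
print(asset.pay_and_read("model-agent-1", payment=100))
print(asset.balances)  # {'curator': 70, 'labeler': 30}
```

The same pattern extends naturally to per-model pricing and authorization lists; the point is that the rules travel with the data rather than living on a platform.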
Of course, Irys still has immature aspects today, but the direction looks right. And whether the AI is centralized or decentralized, if its data sources cannot be trusted, then all the computing power in the world is a tower built on sand: however strong the model, its output is a flower in the mirror, a moon in the water.