Grok 4 is really impressive. It is the product of a 200,000-GPU cluster; pre-training really does work wonders, and the scaling law still holds.
Soon xAI will use 100,000 GB200 GPUs to train multimodal agents, with a release expected around October this year and future plans for a million-GPU cluster.
DeepSeek R1 and the most advanced North American models are no longer in the same generation. Remember, when DeepSeek first came out earlier this year, many people were still claiming that computing power didn't matter, which is plain anti-intellectual talk. Now some are also saying that CSPs like Alibaba are right to focus on delivery and put AI on hold, which is really frustrating~