GPT-4o's image generation is natively multimodal: not a simple handoff to DALL·E, but images produced through chains of reasoning combined with world knowledge.
What does this mean? A controversial claim:
The AI derivative tools currently favored by capital (RAG, AI IDEs, workflow tools, agents, scenario-specific products, and so on) may lose all of their value in the face of multimodal AI.
In 2023, face-swapping meant training a LoRA; now GPT-4o can do it with a single sentence.
Blind + deaf ≠ normal person: multimodal resonance is not 1 + 1 = 2, but exponential evolution. In front of truly multidimensional AI, every roundabout workaround will eventually be eliminated.
At this moment, I can only think of Thomas Wade's line from 'The Three-Body Problem':
Advance! Advance! Advance by any means necessary!