According to Odaily, DeepSeek has introduced NSA, a hardware-aligned and natively trainable sparse attention mechanism. Designed for ultra-fast long-context training and inference, NSA is optimized for modern hardware, accelerating inference and reducing pre-training costs without compromising performance. It performs comparably to or better than full attention models on general benchmarks, long-context tasks, and instruction-based reasoning.
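Setting NSA's specific design aside, the general idea behind sparse attention is that each query scores only a subset of keys instead of the full sequence, shrinking the attention computation from quadratic to roughly linear in sequence length. A toy sliding-window variant (an illustrative sketch only, not DeepSeek's NSA algorithm; the function name and window size are made up for this example) can be written in a few lines of NumPy:

```python
import numpy as np

def sliding_window_attention(q, k, v, window=4):
    """Toy sparse attention: each query attends only to the `window`
    most recent positions, so the score computation is O(n * window)
    instead of the O(n^2) of full attention.
    NOTE: illustrative sketch only, not DeepSeek's NSA design."""
    n, d = q.shape
    out = np.zeros_like(v)
    for i in range(n):
        lo = max(0, i - window + 1)          # start of the local window
        scores = q[i] @ k[lo:i + 1].T / np.sqrt(d)
        w = np.exp(scores - scores.max())    # numerically stable softmax
        w /= w.sum()
        out[i] = w @ v[lo:i + 1]             # weighted sum over the window
    return out

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 16)) for _ in range(3))
out = sliding_window_attention(q, k, v, window=4)
```

Real systems such as NSA add further machinery (e.g. trainable selection of which blocks to attend to, and kernels laid out for GPU memory access patterns), but the compute saving comes from the same principle: most query-key pairs are never scored.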