According to Foresight News, decentralized AI protocol Prime Intellect has released a preview of its inference stack. The stack targets core challenges in autoregressive decoding: computational efficiency, KV cache memory bottlenecks, and latency over public networks.
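
To see why the KV cache is a bottleneck, a back-of-the-envelope estimate helps; the model dimensions below are illustrative figures for a Llama-2-7B-class network, not measurements from Prime Intellect:

```python
# Rough KV cache sizing; illustrative figures, not Prime Intellect's.
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int,
                   bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for a decoded sequence."""
    # 2x for the separate key and value tensors at each layer.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)

# Llama-2-7B-style shape: 32 layers, 32 KV heads, head_dim 128, fp16.
per_seq = kv_cache_bytes(32, 32, 128, seq_len=4096, batch_size=1)
print(f"{per_seq / 2**30:.1f} GiB per 4096-token sequence")  # 2.0 GiB
```

At roughly 2 GiB per long sequence, a handful of concurrent requests can consume as much memory as the model weights themselves.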

The inference stack employs a pipeline-parallel design, enabling high computational density and asynchronous execution. Alongside this release, Prime Intellect has introduced three open-source libraries: PRIME-IROH, a peer-to-peer communication backend; PRIME-VLLM, which integrates vLLM with pipeline parallelism over public networks; and PRIME-PIPELINE, a research sandbox.
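
The announcement does not document the libraries' APIs, but the mechanism itself can be sketched: each peer holds a contiguous slice of the model's layers and forwards activations to the next peer, and keeping several micro-batches in flight lets computation overlap with the network hops between stages. The minimal Python sketch below assumes nothing about the actual PRIME-PIPELINE or PRIME-VLLM interfaces; the stage logic and queue wiring are illustrative stand-ins:

```python
from queue import Queue
from threading import Thread

NUM_STAGES = 4       # e.g. four consumer GPUs reachable over the internet
MICRO_BATCHES = 8    # in-flight requests that hide per-hop latency

def stage_worker(inbox: Queue, outbox: Queue) -> None:
    """One pipeline stage: pull an activation, run its layer slice, pass it on."""
    while True:
        item = inbox.get()
        if item is None:              # shutdown signal
            outbox.put(None)
            return
        mb_id, activation = item
        activation = activation + 1   # stand-in for this stage's layers
        outbox.put((mb_id, activation))

# Chain the stages with queues standing in for p2p connections.
queues = [Queue() for _ in range(NUM_STAGES + 1)]
workers = [Thread(target=stage_worker, args=(queues[i], queues[i + 1]))
           for i in range(NUM_STAGES)]
for w in workers:
    w.start()

# Feed all micro-batches without waiting, so every stage stays busy.
for mb in range(MICRO_BATCHES):
    queues[0].put((mb, 0))
queues[0].put(None)

for _ in range(MICRO_BATCHES):
    mb_id, out = queues[-1].get()
    print(f"micro-batch {mb_id} finished with value {out}")
for w in workers:
    w.join()
```

Because the queues decouple the stages, the first peer can start on a new micro-batch while later peers are still working on earlier ones; that overlap is what makes the design tolerant of public-network latency.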

These tools allow users to run large models on consumer GPUs such as the NVIDIA RTX 3090 and 4090, expanding the hardware base available to decentralized AI protocols.
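
As a rough illustration of how such cards stack up (assuming fp16 weights, 24 GiB of VRAM per card, and an assumed 80% headroom factor to leave room for the KV cache; none of these figures come from Prime Intellect):

```python
import math

def stages_needed(params_billion: float, vram_gib: float = 24.0,
                  bytes_per_param: int = 2, headroom: float = 0.8) -> int:
    """Pipeline stages needed to hold the weights across 24 GiB cards."""
    weight_gib = params_billion * 1e9 * bytes_per_param / 2**30
    return math.ceil(weight_gib / (vram_gib * headroom))

print(stages_needed(7))    # 1: a 7B model fits on a single card
print(stages_needed(70))   # 7: a 70B model needs ~7 cards in fp16
```

The exact split depends on quantization and batch size; the point is that pipeline stages let cards pool memory that no single consumer GPU has.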