We present PICO (Prompt Isolation and Cybersecurity Oversight), a novel transformer architecture designed to prevent prompt injection attacks and to ensure secure, reliable response generation in modern large language models (LLMs).

Prompt injection attacks, in which malicious instructions smuggled into user input override intended model behavior, succeed largely because conventional LLM pipelines concatenate trusted system prompts with untrusted user inputs into a single token stream; defenses layered on top of this mixing are easily bypassed.

PICO addresses this at the architectural level by separating system instructions and user inputs into distinct processing channels. The system-prompt pathway remains frozen and immutable, while a gated fusion mechanism dynamically weights the two channels using signals from a Security Expert Agent and a Cybersecurity Knowledge Graph, as sketched below.
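To make the fusion step concrete, the following PyTorch sketch shows one way such a gate could be realized. It is a minimal illustration under stated assumptions, not the actual PICO implementation: the class name GatedFusion, the inputs h_sys, h_user, and security_score, and the lower bound g_min are all illustrative, and the security score is assumed to arrive as a per-example scalar in [0, 1] from the Security Expert Agent / knowledge-graph lookup.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Hypothetical PICO-style gated fusion: a frozen system-prompt
    representation is convexly combined with an untrusted user-input
    representation, with the gate informed by a security score."""

    def __init__(self, d_model: int, g_min: float = 0.6):
        super().__init__()
        # +1 input feature for the scalar security score.
        self.gate_proj = nn.Linear(2 * d_model + 1, 1)
        # Lower bound on the gate; keeps the trusted channel dominant.
        self.g_min = g_min

    def forward(self, h_sys, h_user, security_score):
        # h_sys, h_user: (batch, d_model); security_score: (batch,) in [0, 1],
        # assumed to come from the Security Expert Agent / knowledge graph.
        feats = torch.cat([h_sys, h_user, security_score.unsqueeze(-1)], dim=-1)
        g = torch.sigmoid(self.gate_proj(feats))     # (batch, 1), in (0, 1)
        g = self.g_min + (1.0 - self.g_min) * g      # clamp into (g_min, 1)
        return g * h_sys + (1.0 - g) * h_user

# Usage sketch: the system encoder would be frozen upstream, e.g. via
# system_encoder.requires_grad_(False); shown here with random tensors.
fusion = GatedFusion(d_model=512)
h_sys = torch.randn(2, 512)          # trusted, frozen-channel states
h_user = torch.randn(2, 512)         # untrusted, user-channel states
risk = torch.tensor([0.1, 0.9])      # per-example risk scores (assumed)
fused = fusion(h_sys, h_user, risk)  # (2, 512); system channel dominates
```

Clamping the gate above g_min guarantees by construction that the frozen system channel always carries the majority of the fused representation, which is the property the dominance argument below relies on.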

We analyze PICO’s effectiveness against sophisticated attacks, including "Policy Puppetry," where malicious instructions are disguised as configuration files. Our mathematical formulation ensures that under adversarial conditions, the trusted system instructions remain dominant in the final output.
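As a minimal sketch of how such a guarantee can be stated, assume the fused hidden state is a convex combination of the two channels with the gate clamped strictly above one half (a simplification; the symbols match the code sketch above, and the full formulation in the paper is richer):

```latex
% Illustrative dominance condition, assuming a convex gated combination
% with the gate clamped to [g_min, 1] and g_min > 1/2 (our simplification).
\[
  h_{\mathrm{fused}} \;=\; g\, h_{\mathrm{sys}} + (1 - g)\, h_{\mathrm{user}},
  \qquad g \in [g_{\min}, 1].
\]
% Because g >= g_min > 1/2 > 1 - g, the trusted system channel carries
% strictly more weight than the untrusted user channel for every input,
% adversarial or otherwise:
\[
  g \;\ge\; g_{\min} \;>\; \tfrac{1}{2} \;>\; 1 - g .
\]
```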

This work contributes to building safer, more robust AI systems as language models become more widely deployed.

We invite the research community to collaborate on the empirical evaluation of the PICO framework and on the refinement of adversarial training techniques to validate and enhance the model’s robustness.