Originally reported by Schneier on Security
TL;DR
Security researchers propose a structured framework mapping how AI prompt injection attacks evolve into sophisticated malware campaigns across seven distinct stages.
Academic research from Schneier et al. mapping a structured kill chain for LLM-based malware. Significant emerging threat research but no confirmed active exploitation of the described attack chain.
The cybersecurity community's focus on "prompt injection" as a singular vulnerability fundamentally misrepresents the threat landscape surrounding large language models (LLMs). Security researchers Schneier, Brodt, Feldman, and Nassi have published new research demonstrating that attacks against AI systems have evolved into a distinct class of malware execution mechanisms they term "promptware."
The research proposes a structured seven-step kill chain model that mirrors traditional malware campaigns like Stuxnet and NotPetya, providing security practitioners with a framework to understand and defend against increasingly sophisticated AI-based attacks.
Malicious payloads enter AI systems through direct prompt input or "indirect prompt injection," where adversaries embed instructions in content the LLM retrieves during inference, web pages, emails, shared documents, or even images and audio files in multimodal systems. The fundamental architectural flaw lies in LLMs processing all input as undifferentiated token sequences, eliminating the traditional boundary between trusted instructions and untrusted data.
Attackers circumvent safety training and policy guardrails through social engineering techniques, convincing models to adopt personas that ignore safety rules, or through sophisticated adversarial suffixes. This phase unlocks the model's full capabilities for malicious use.
The compromised LLM reveals information about connected services, assets, and capabilities, enabling autonomous progression through the kill chain. Unlike traditional malware reconnaissance, this occurs post-compromise and leverages the victim model's reasoning capabilities against itself.
Promptware embeds itself into AI agents' long-term memory or poisons databases the agent relies upon. For example, malicious code can infect email archives, re-executing every time the AI summarizes past communications.
Established persistence enables dynamic command fetching during inference time, transforming static threats into controllable trojans whose behavior attackers can modify remotely.
Infected AI agents spread malware across connected systems, leveraging their access to emails, calendars, and enterprise platforms. Self-replicating attacks can trick email assistants into forwarding malicious payloads to all contacts, creating viral propagation patterns.
The final stage achieves tangible malicious outcomes including data exfiltration, financial fraud, or physical world impact. Documented examples include AI agents manipulated into selling cars for one dollar or transferring cryptocurrency to attacker wallets.
The "Invitation Is All You Need" research demonstrated the kill chain by embedding malicious prompts in Google Calendar invitation titles. The attack achieved persistence through workspace memory, lateral movement by launching Zoom, and concluded with covert livestreaming of victims.
Similarly, "Here Comes the AI Worm" research showed end-to-end kill chain execution through email-based prompt injection, resulting in data exfiltration and viral propagation to new recipients.
The research argues that prompt injection cannot be fixed in current LLM technology, requiring defense-in-depth strategies that assume initial access will occur. Effective defenses must focus on breaking the chain at subsequent stages through:
By reframing prompt injection as the initial stage of sophisticated malware campaigns rather than isolated vulnerabilities, security teams can shift from reactive patching to systematic risk management for AI-integrated systems.
Originally reported by Schneier on Security