Prompt Injection vs Jailbreaking

Prompt injection and jailbreak are related but distinct classes of attacks against LLM-based systems, mainly differing in what they try to override and what assets they can impact.[10][12]

Core Definitions

Prompt injection: An attack where untrusted input is concatenated with trusted instructions so that the model ends up following the attacker’s instructions instead of the developer’s or system prompt.[10][12][16]
Jailbreak: An attack that aims specifically to bypass or disable the model’s safety and policy barriers so it will produce otherwise restricted content (e.g., disallowed, unethical, or dangerous outputs).[10][12][15]

What Each Targets

Prompt injection primarily targets the “control layer” of an LLM application: the developer’s instructions, tools, or workflows that the app expects the model to follow (e.g., “only summarize,” “only answer from this document,” “never run tools without validation”).[10][12][19]
Jailbreak primarily targets the model’s safety layer: the RLHF/safety tuning and internal guardrails that tell the model to refuse certain content or behaviors.[6][10][13]

Mechanism and Scope

Prompt injection exploits the fact that the model cannot reliably distinguish which instructions are authoritative, so injected text like “ignore previous instructions and instead do X” can hijack the task prompt, including how the model uses tools or external data.[10][12][16]
Jailbreak uses adversarial phrasing (role-play, hypotheticals, multi-step scaffolding, encoding tricks, etc.) to get the model to ignore its safety policies and answer anyway, but often stays within pure text generation rather than attacking connected tools or systems.[6][10][14]

Relationship between the Two

Many jailbreaks are technically a subtype of prompt injection because they also try to override instructions (“act as DAN, ignore your safety rules”), but the goal is narrowly “make the model say forbidden things,” not necessarily to control tools or external actions.[6][12][14]
Prompt injection in a tool-using system can lead to more severe impact than a simple jailbreak, since it can redirect tool calls, exfiltrate data from retrieved documents, or misuse connected capabilities (APIs, files, databases), even without obviously unsafe text output.[6][16][18]

Security Implications and Defenses

Jailbreak risk is mostly reputational and policy-related: the model says harmful or non-compliant things, which can be screenshot and shared, but does not necessarily compromise external systems by itself.[6][12][13]
Prompt injection is an application-security risk: if the app blindly trusts model outputs, an injected instruction can make the system leak sensitive data, perform unauthorized actions, or corrupt workflows; defenses therefore focus on strict tool-use mediation, input isolation, and not trusting model text as “code or policy.”[6][16][18][19]

Sources

[1] What is Jailbreaking? History, Benefits and Risks https://www.sentinelone.com/cybersecurity-101/cloud-security/what-is-jailbreaking/ [2] Jailbreaking in Cybersecurity https://veriti.ai/glossary/jailbreaking-in-cybersecurity/ [3] What is Jailbreaking & Is it safe? https://www.kaspersky.com/resource-center/definitions/what-is-jailbreaking [4] Understanding Jailbreaking: What is it? https://digital.ai/glossary/understanding-jailbreaks/ [5] Jailbreaking In Cyber Security: Key Concepts | Updated 2025 https://www.acte.in/jailbreaking-in-cyber-security-overview [6] Prompt Injection vs Jailbreaking: What’s the Difference? https://www.promptfoo.dev/blog/jailbreaking-vs-prompt-injection/ [7] iOS jailbreaking https://en.wikipedia.org/wiki/IOS_jailbreaking [8] What is a Prompt Injection? https://www.ai21.com/glossary/foundational-llm/prompt-injection/ [9] What is jailbreaking? https://cybrela.com/en/glossary/jailbreak-jailbreaking/ [10] Prompt Injection vs. Jailbreaking: What’s the Difference? https://learnprompting.org/blog/injection_jailbreaking [11] Jailbreaking - Security Software Glossary https://promon.io/resources/security-software-glossary/jailbreaking [12] Prompt injection and jailbreaking are not the same thing https://simonwillison.net/2024/Mar/5/prompt-injection-jailbreaking/ [13] AI Jailbreak https://www.ibm.com/think/insights/ai-jailbreak [14] LLM01:2025 Prompt Injection - OWASP Gen AI Security Project https://genai.owasp.org/llmrisk/llm01-prompt-injection/ [15] jailbreak - Glossary - NIST Computer Security Resource Center https://csrc.nist.gov/glossary/term/jailbreak [16] What Is a Prompt Injection Attack? https://www.ibm.com/think/topics/prompt-injection [17] Prompt Injection vs Prompt Jailbreak: A Comparison https://codoid.com/ai/prompt-injection-vs-prompt-jailbreak-a-detailed-comparison/ [18] LLM Vulnerability Series: Direct Prompt Injections and … https://www.lakera.ai/blog/direct-prompt-injections [19] Understanding the Differences Between Jailbreaking and … https://www.knostic.ai/blog/understanding-the-differences-between-jailbreaking-and-prompt-injection

Quartz 🪬

Explorer

Prompt Injection vs Jailbreaking

Core Definitions

What Each Targets

Mechanism and Scope

Relationship between the Two

Security Implications and Defenses

Sources

Table of Contents