Prompt Injection Attacks, a growing threat in AI and LLM usage, can cause data breaches, disrupt services and manipulate outputs, leading to serious data privacy issues. These covert attacks, sometimes hidden in social media prompts, can be mitigated through layered defence, input validation, model auditing, user training and real-time monitoring to protect data integrity and reliability.
Prompt Injection Attacks are a growing threat to any organisation that uses AI or LLMs. They present myriad risks to businesses ranging from remote control of the model to denial of service and even data exfiltration.
There are three key areas of exposure businesses face:
In this article, we’ll briefly cover what Prompt Engineering is, what Indirect Prompt Injection Attacks are and why they are a risk to your business. We’ll discuss how social media could play a role in compromising your AI models and the ways you can mitigate the risks of Prompt Injection Attacks.
Understanding the risk of Prompt Injection Attacks requires some knowledge of Prompt Engineering - a field that has grown exponentially in the last 18 months.
Since the public release of ChatGPT, thousands of people are now positioning themselves on social media as AI experts and master prompt engineers. But, should we trust them and should we blindly copy and paste their prompts into the LLMs we use within our organisations?
Put simply, prompt engineering is how we communicate with LLMs and give them instructions, or prompts, to get the information or results we need. The better the instructions, the better the outcome. It’s like making a sauce, the higher the quality of your ingredients, the better the sauce will taste.
For example, if you prompt ChatGPT to explain data mapping it will provide you with a detailed overview, like this:
But by adding an additional element to your prompt, the output will change:
There are dozens of ways you can use prompt engineering techniques to alter and enhance the outputs of an AI model. However, as with any technology, there is a downside - Prompt Injection Attacks.
This term might sound like something out of a sci-fi movie, but it's a real and emerging threat in the realm of AI. Prompt Injection Attacks involve manipulating the LLM using carefully crafted prompts that make the model ignore previous instructions or perform unintended actions.
The impact of Prompt Injections stretches far beyond a simple glitch or error. They pose serious concerns for data privacy and the reliability of AI systems. If an AI is misled by a Prompt Injection, it could lead to incorrect or harmful actions, data breaches, or even the spreading of misinformation.
In essence, Prompt Injections can undermine the trust we place in AI systems, turning a powerful tool into a potential liability.
The idea behind Prompt Injection Attacks stems from SQL Injection Attacks, which have been around for years.
An Indirect Prompt Injection Attack is a more covert version of this threat. Unlike direct attacks where the harmful command is contained in the user's input, indirect attacks hide their commands or code and the AI processes them behind the scenes.
This approach is more concerning because it bypasses some usual checks against direct prompt injections, demanding more advanced strategies for detection and prevention.
Here’s one example:
Imagine discovering a prompt on LinkedIn shared by an “AI Expert” promising to analyse customer data and identify your Ideal Customer Profile for an effective marketing strategy. Eager to leverage this expertise, you paste this prompt into your company's AI Model and wait for insightful marketing recommendations.
However, this prompt includes hidden Unicode characters, invisible to you but fully interpretable by your LLM. As the AI processes the prompt, these hidden instructions access customer data (some of it sensitive), skew the analysis and exfiltrate the data, all while presenting seemingly valid marketing strategies.
The challenge lies in the AI’s inability to detect these indirect prompt injection attacks. Trusting the user, the AI overlooks the prompt's hidden elements, leaving your data and decisions vulnerable to unseen influence.
Prompt Injection Attacks are about misleading the AI through its immediate interaction, whereas Indirect Prompt Injection Attacks do so through a backdoor, introducing harmful instructions that the AI could act on in the background.
Both forms present unique challenges in maintaining the security and trustworthiness of AI systems.
To understand how AI can be tricked by these hidden messages, we need to know a bit about Unicode. Unicode is a universal character encoding standard that allows computers and other devices to represent and manipulate text from any writing system.
Within the Unicode library, there is a category known as "tag" characters. Under normal conditions, they are invisible—they don't show up on the screen. They're typically used to tell a computer that certain letters are part of a country's flag emoji. For example, in Unicode, the tag characters for the United States flag would be the letters 'U' and 'S.
But here's the catch: these tag characters only be seen if you start with the flag emoji character.
However, they can still be read by the AI. This is crucial for Indirect Prompt Injection Attacks because attackers can use these invisible tag characters to embed commands within a prompt that the AI will read and act upon, but humans will overlook because they can't see them.
Let’s look at an example.
In a series of recent tweets, Scale AI Prompt Engineer, Riley Goodside, demonstrated how Indirect Prompt Injection works by developing a Proof of Concept (POC).
Goodside explains that this injection attack was caused by invisible instructions from pasted text. As you can see from the image below, the prompt begins with a question from the user about the visible, pasted Zalgo text.
What the user can’t see, is the invisible Unicode tag characters at the end of the prompt that provide it with invisible instructions to return the drawing of the robot.
He goes on to explain that before encoding, the invisible portion of the text says:
Now, in this instance, there is no real damage done - however, with a basic knowledge of Unicode, it is relatively easy to introduce additional, invisible instructions into prompts that could have serious consequences.
These attacks can have various negative impacts on businesses including affecting their availability and integrity as well as repurposing the system for malicious objectives and causing privacy issues.
Availability Violations:
Attackers can disrupt service by prompting models with inputs that increase computation or overwhelm the system, causing denial of service.
Integrity Violations:
These attacks make GenAI systems untrustworthy, often by manipulating LLMs to produce factually incorrect outputs.
Privacy Compromises:
Indirect prompt injections can lead to privacy issues, such as leaking sensitive information or chat histories.
Abuse Violations:
These attacks involve repurposing a system’s intended use for malicious objectives.
The trend of sharing AI prompts on social platforms like LinkedIn has seen a massive uptick in recent months. While this practice democratises AI knowledge, it is a risk to organisations and users.
These prompts might appear safe, but can you trust the person at the other end of the profile? As we’ve discussed, it’s relatively easy to encode a prompt with hidden instructions that could easily manipulate the behaviour of an AI model and leak sensitive information or skew the analysis.
For users, verifying the credibility of the source of your prompts is vital. Use prompts from known, reputable sources, or those that have been vetted by reliable community members or experts. Users should be aware of the potential risks involved in using prompts shared by unknown or unverified entities on social media.
Educating yourself about the basics of prompt engineering and the nature of prompt injection attacks is also beneficial, as it empowers you to understand and question the results you’re given.
There are several data privacy risks associated with Prompt Injection Attacks. Because these attacks leverage the sophisticated processing capabilities of LLMs, there is a significant chance they could be used to extract sensitive information.
Indirect Prompt Injection Attacks can influence the AI’s output and cause models to reveal confidential information or private user data without explicit authorisation. In this article on Embrace the Red, the author details how Google Bard can be tricked into executing commands without the user’s knowledge.
One command involved creating a markdown image tag that, when processed by Bard, would send a request to an attacker-controlled server. The request included chat history data appended to the URL, effectively exfiltrating data. It would be easy for an attacker to use the same method to obtain email content or personal data using a similar approach.
Another data privacy risk is inherent in the integration of LLMs with broader system infrastructures or databases - for example if you’ve implemented a RAG model. This could dramatically increase the risk of data exfiltration and could also provide the attackers with backdoor access to your other systems, potentially compromising the entire organisation.
Prompt Injection Attacks can also result in training data exposure. AI models can unintentionally expose sensitive information that’s embedded in their training data when subject to carefully crafted prompts. If the model is trained on confidential data, an unintentional disclosure could have severe legal and financial consequences for your business.
One of the biggest AI risks is that they can exhibit bias. If there are biases within the training data, then an Injection Attack could be used to exploit or magnify those biases and lead to reputational damage for your organisation.
Finally, there are third-party risks to consider. If an AI model or LLM is integrated with external services then a Prompt Injection could cause it to share sensitive data with third parties. This raises a compliance concern because unintended data sharing could violate data privacy laws and resulting in non-compliance fines, legal penalties and reputational damage.
There are several things you can do to reduce the risk of Prompt Injection Attacks.
These streamlined strategies could help bolster AI system security against prompt injection attacks, maintaining integrity and reliability.
AI isn’t going away and the risks will continue to grow. These risks, particularly from prompts shared on social media, require careful attention. Both developers and users should prioritise validating and sanitising AI prompts to protect against hidden malicious codes that could compromise data security and AI integrity.
Ultimately, the goal is to balance the benefits of AI and shared knowledge with the necessary precautions to maintain trust and safety in AI systems. By being vigilant and informed, we can effectively navigate these challenges and leverage AI's potential responsibly.
Exploring Prompt Injection Attacks - NCC Group
Recommendations to help mitigate prompt injection: limit the blast radius - Simon Willison
NIST -Adversarial Machine Learning
Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Prompt Injection Attacks, a growing threat in AI and LLM usage, can cause data breaches, disrupt services and manipulate outputs, leading to serious data privacy issues. These covert attacks, sometimes hidden in social media prompts, can be mitigated through layered defence, input validation, model auditing, user training and real-time monitoring to protect data integrity and reliability.
Prompt Injection Attacks are a growing threat to any organisation that uses AI or LLMs. They present myriad risks to businesses ranging from remote control of the model to denial of service and even data exfiltration.
There are three key areas of exposure businesses face:
In this article, we’ll briefly cover what Prompt Engineering is, what Indirect Prompt Injection Attacks are and why they are a risk to your business. We’ll discuss how social media could play a role in compromising your AI models and the ways you can mitigate the risks of Prompt Injection Attacks.
Understanding the risk of Prompt Injection Attacks requires some knowledge of Prompt Engineering - a field that has grown exponentially in the last 18 months.
Since the public release of ChatGPT, thousands of people are now positioning themselves on social media as AI experts and master prompt engineers. But, should we trust them and should we blindly copy and paste their prompts into the LLMs we use within our organisations?
Put simply, prompt engineering is how we communicate with LLMs and give them instructions, or prompts, to get the information or results we need. The better the instructions, the better the outcome. It’s like making a sauce, the higher the quality of your ingredients, the better the sauce will taste.
For example, if you prompt ChatGPT to explain data mapping it will provide you with a detailed overview, like this:
But by adding an additional element to your prompt, the output will change:
There are dozens of ways you can use prompt engineering techniques to alter and enhance the outputs of an AI model. However, as with any technology, there is a downside - Prompt Injection Attacks.
This term might sound like something out of a sci-fi movie, but it's a real and emerging threat in the realm of AI. Prompt Injection Attacks involve manipulating the LLM using carefully crafted prompts that make the model ignore previous instructions or perform unintended actions.
The impact of Prompt Injections stretches far beyond a simple glitch or error. They pose serious concerns for data privacy and the reliability of AI systems. If an AI is misled by a Prompt Injection, it could lead to incorrect or harmful actions, data breaches, or even the spreading of misinformation.
In essence, Prompt Injections can undermine the trust we place in AI systems, turning a powerful tool into a potential liability.
The idea behind Prompt Injection Attacks stems from SQL Injection Attacks, which have been around for years.
An Indirect Prompt Injection Attack is a more covert version of this threat. Unlike direct attacks where the harmful command is contained in the user's input, indirect attacks hide their commands or code and the AI processes them behind the scenes.
This approach is more concerning because it bypasses some usual checks against direct prompt injections, demanding more advanced strategies for detection and prevention.
Here’s one example:
Imagine discovering a prompt on LinkedIn shared by an “AI Expert” promising to analyse customer data and identify your Ideal Customer Profile for an effective marketing strategy. Eager to leverage this expertise, you paste this prompt into your company's AI Model and wait for insightful marketing recommendations.
However, this prompt includes hidden Unicode characters, invisible to you but fully interpretable by your LLM. As the AI processes the prompt, these hidden instructions access customer data (some of it sensitive), skew the analysis and exfiltrate the data, all while presenting seemingly valid marketing strategies.
The challenge lies in the AI’s inability to detect these indirect prompt injection attacks. Trusting the user, the AI overlooks the prompt's hidden elements, leaving your data and decisions vulnerable to unseen influence.
Prompt Injection Attacks are about misleading the AI through its immediate interaction, whereas Indirect Prompt Injection Attacks do so through a backdoor, introducing harmful instructions that the AI could act on in the background.
Both forms present unique challenges in maintaining the security and trustworthiness of AI systems.
To understand how AI can be tricked by these hidden messages, we need to know a bit about Unicode. Unicode is a universal character encoding standard that allows computers and other devices to represent and manipulate text from any writing system.
Within the Unicode library, there is a category known as "tag" characters. Under normal conditions, they are invisible—they don't show up on the screen. They're typically used to tell a computer that certain letters are part of a country's flag emoji. For example, in Unicode, the tag characters for the United States flag would be the letters 'U' and 'S.
But here's the catch: these tag characters only be seen if you start with the flag emoji character.
However, they can still be read by the AI. This is crucial for Indirect Prompt Injection Attacks because attackers can use these invisible tag characters to embed commands within a prompt that the AI will read and act upon, but humans will overlook because they can't see them.
Let’s look at an example.
In a series of recent tweets, Scale AI Prompt Engineer, Riley Goodside, demonstrated how Indirect Prompt Injection works by developing a Proof of Concept (POC).
Goodside explains that this injection attack was caused by invisible instructions from pasted text. As you can see from the image below, the prompt begins with a question from the user about the visible, pasted Zalgo text.
What the user can’t see, is the invisible Unicode tag characters at the end of the prompt that provide it with invisible instructions to return the drawing of the robot.
He goes on to explain that before encoding, the invisible portion of the text says:
Now, in this instance, there is no real damage done - however, with a basic knowledge of Unicode, it is relatively easy to introduce additional, invisible instructions into prompts that could have serious consequences.
These attacks can have various negative impacts on businesses including affecting their availability and integrity as well as repurposing the system for malicious objectives and causing privacy issues.
Availability Violations:
Attackers can disrupt service by prompting models with inputs that increase computation or overwhelm the system, causing denial of service.
Integrity Violations:
These attacks make GenAI systems untrustworthy, often by manipulating LLMs to produce factually incorrect outputs.
Privacy Compromises:
Indirect prompt injections can lead to privacy issues, such as leaking sensitive information or chat histories.
Abuse Violations:
These attacks involve repurposing a system’s intended use for malicious objectives.
The trend of sharing AI prompts on social platforms like LinkedIn has seen a massive uptick in recent months. While this practice democratises AI knowledge, it is a risk to organisations and users.
These prompts might appear safe, but can you trust the person at the other end of the profile? As we’ve discussed, it’s relatively easy to encode a prompt with hidden instructions that could easily manipulate the behaviour of an AI model and leak sensitive information or skew the analysis.
For users, verifying the credibility of the source of your prompts is vital. Use prompts from known, reputable sources, or those that have been vetted by reliable community members or experts. Users should be aware of the potential risks involved in using prompts shared by unknown or unverified entities on social media.
Educating yourself about the basics of prompt engineering and the nature of prompt injection attacks is also beneficial, as it empowers you to understand and question the results you’re given.
There are several data privacy risks associated with Prompt Injection Attacks. Because these attacks leverage the sophisticated processing capabilities of LLMs, there is a significant chance they could be used to extract sensitive information.
Indirect Prompt Injection Attacks can influence the AI’s output and cause models to reveal confidential information or private user data without explicit authorisation. In this article on Embrace the Red, the author details how Google Bard can be tricked into executing commands without the user’s knowledge.
One command involved creating a markdown image tag that, when processed by Bard, would send a request to an attacker-controlled server. The request included chat history data appended to the URL, effectively exfiltrating data. It would be easy for an attacker to use the same method to obtain email content or personal data using a similar approach.
Another data privacy risk is inherent in the integration of LLMs with broader system infrastructures or databases - for example if you’ve implemented a RAG model. This could dramatically increase the risk of data exfiltration and could also provide the attackers with backdoor access to your other systems, potentially compromising the entire organisation.
Prompt Injection Attacks can also result in training data exposure. AI models can unintentionally expose sensitive information that’s embedded in their training data when subject to carefully crafted prompts. If the model is trained on confidential data, an unintentional disclosure could have severe legal and financial consequences for your business.
One of the biggest AI risks is that they can exhibit bias. If there are biases within the training data, then an Injection Attack could be used to exploit or magnify those biases and lead to reputational damage for your organisation.
Finally, there are third-party risks to consider. If an AI model or LLM is integrated with external services then a Prompt Injection could cause it to share sensitive data with third parties. This raises a compliance concern because unintended data sharing could violate data privacy laws and resulting in non-compliance fines, legal penalties and reputational damage.
There are several things you can do to reduce the risk of Prompt Injection Attacks.
These streamlined strategies could help bolster AI system security against prompt injection attacks, maintaining integrity and reliability.
AI isn’t going away and the risks will continue to grow. These risks, particularly from prompts shared on social media, require careful attention. Both developers and users should prioritise validating and sanitising AI prompts to protect against hidden malicious codes that could compromise data security and AI integrity.
Ultimately, the goal is to balance the benefits of AI and shared knowledge with the necessary precautions to maintain trust and safety in AI systems. By being vigilant and informed, we can effectively navigate these challenges and leverage AI's potential responsibly.
Exploring Prompt Injection Attacks - NCC Group
Recommendations to help mitigate prompt injection: limit the blast radius - Simon Willison
NIST -Adversarial Machine Learning
Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection