Exposing AI's Inherent Risks

By Shaked Reiner, Principal Cyber Researcher, CyberArk Labs.

Large Language Models (LLMs) are rapidly spreading across industries, with many businesses implementing them for various applications. A Gartner, Inc. survey found that 55% of companies are currently testing or using LLM projects, a number expected to increase quickly. However, organisations should be cautious and thoroughly assess the risks before rushing to adopt this technology.

 

While this technological advancement is exciting, it's essential to tackle identity security issues and create new guidelines. I suggest one crucial principle: Always treat your LLM as a potential security risk.

 

And here’s why:

 

The Built-in Risks of Large Language Models

 

Despite the surge in LLM research, with over 3,000 papers published in the last year alone, a consensus on secure development and seamless integration of LLMs into existing systems remains elusive.

 

LLMs can be easily manipulated to produce inaccurate outputs with minor prompt alterations. Beyond unreliability, they can introduce significant security vulnerabilities to the systems they're integrated with.

 

Primarily, in their current form, LLMs are susceptible to "jailbreaking", in which attackers manipulate them into behaving in unintended or harmful ways. A recent study by EPFL researchers demonstrated a near-100% success rate in jailbreaking leading models using a combination of known techniques. And this is just the beginning: new attack methods and jailbreaking strategies continue to emerge in research papers every month.

 

The consequences of LLM jailbreaking vary in severity based on context. In milder cases, a compromised LLM might provide instructions for illicit activities in violation of its intended policies. While undesirable, Simon Willison characterises this as a "screenshot attack": the model's misbehaviour has limited impact, whether publicised or misused, because the information it reveals is already available online.

 

The stakes increase dramatically with more capable LLMs that can execute database queries, make external API calls, or access networked machines. In such scenarios, manipulating LLM behaviour could allow attackers to use the model as a springboard for malicious activities.
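
As a rough illustration of that springboard risk, the sketch below (in Python, with an illustrative ask_llm placeholder and a local SQLite database) shows the dangerous pattern of letting a model compose and run database queries with nothing constraining what its output can do.

```python
# Illustrative only: an LLM allowed to compose and run arbitrary SQL.
# ask_llm is a placeholder for a real chat-completion call.
import sqlite3

db = sqlite3.connect("crm.db")  # hypothetical customer database

def ask_llm(prompt: str) -> str:
    """Placeholder for a call to a hosted model that returns raw SQL."""
    raise NotImplementedError

def answer_question(user_question: str) -> list:
    sql = ask_llm(f"Write one SQLite query to answer: {user_question}")
    # Nothing sits between the model and the database: a prompt-injected
    # request can just as easily yield 'DELETE FROM customers' or a query
    # that exfiltrates every row the connection can reach.
    return db.execute(sql).fetchall()
```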

 

A paper presented at Black Hat Asia this year highlights this risk: 31% of the examined code bases contained remote code execution (RCE) vulnerabilities introduced by LLMs. This means attackers could potentially execute arbitrary code using natural language inputs alone.
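
The underlying pattern is usually some variant of handing model-generated code straight to an interpreter. A minimal sketch of that anti-pattern, again with an illustrative ask_llm placeholder:

```python
# Illustrative only: the classic RCE anti-pattern of exec()-ing model output.
def ask_llm(prompt: str) -> str:
    """Placeholder for a model call that returns Python source code."""
    raise NotImplementedError

def solve(user_request: str):
    code = ask_llm(
        f"Write Python that computes: {user_request} "
        "and stores the answer in a variable named result"
    )
    namespace: dict = {}
    exec(code, namespace)  # arbitrary code execution if the model is manipulated
    return namespace.get("result")
```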

 

Considering LLMs' vulnerability to manipulation and their potential to compromise their operating environment, it's crucial to adopt an "assume breach" approach when designing system architecture. This mindset involves treating the LLM as if it's already compromised by an attacker and implementing protective identity security measures accordingly.

 

Tackling Associated Safety Concerns

 

It is crucial to cultivate an understanding that the LLMs integrated into our systems cannot be inherently trusted. From there, we can apply our traditional identity security expertise, together with our experience of integrating LLMs into organisations, to a set of general guidelines that minimise the risks associated with LLM deployments.

 

Firstly, never use an LLM as a security boundary. Only provide the LLM with capabilities you intend it to use, and do not rely on alignment or system prompts to enforce security measures. Adhere to the principle of least privilege: grant the LLM only the minimum access required to perform its designated task, as any additional access could be exploited by attackers to infiltrate a company's technological infrastructure.
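
For instance, permission checks belong in the tool layer, enforced against the identity of the end user the LLM is acting for, never in the system prompt. A minimal sketch of this idea, with purely illustrative tool and role names:

```python
# Minimal sketch: authorisation is enforced in code, outside the model.
# Tool names and roles are illustrative, not a real API.
READ_ONLY_TOOLS = {"get_invoice", "list_tickets"}  # the only capabilities exposed

def call_tool(tool_name: str, args: dict, end_user_roles: set) -> dict:
    if tool_name not in READ_ONLY_TOOLS:
        # The model simply cannot reach anything else, no matter what it is
        # prompted (or jailbroken) into requesting.
        raise PermissionError(f"tool not exposed to the model: {tool_name}")
    if "support_agent" not in end_user_roles:
        # Least privilege is scoped to the human on whose behalf the LLM acts.
        raise PermissionError("end user lacks the role required for this tool")
    return run_tool(tool_name, args)

def run_tool(tool_name: str, args: dict) -> dict:
    """Placeholder for the real tool implementations."""
    raise NotImplementedError
```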

 

Approach the LLM as you would any employee or end user: restrict its actions to only those essential for completing the assigned job. It is also important to implement thorough output sanitisation. Validate or sanitise any LLM-generated output before using it, which includes removing potential XSS payloads in the form of HTML tags or markdown syntax. Make sure you also sanitise training data, so that attackers cannot coax the model into leaking sensitive information it has memorised.
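
As a minimal sketch of output sanitisation, the snippet below uses only the Python standard library to neutralise HTML and strip markdown link syntax before LLM text is rendered in a web page; a vetted, allow-list HTML sanitiser is the better choice in production.

```python
# Minimal sketch of sanitising LLM output before rendering it in a web UI.
import html
import re

def sanitise_llm_output(text: str) -> str:
    # Neutralise HTML so tags like <script> or <img onerror=...> render as text.
    text = html.escape(text)
    # Strip markdown image/link syntax, which some renderers turn back into
    # HTML (and which can be abused to smuggle crafted URLs).
    text = re.sub(r"!?\[([^\]]*)\]\([^)]*\)", r"\1", text)
    return text

print(sanitise_llm_output(
    "Click [here](https://evil.example/steal?d=secrets) <img src=x onerror=alert(1)>"
))
# -> 'Click here &lt;img src=x onerror=alert(1)&gt;'
```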

 

If code execution by the LLM is necessary, employ sandboxing techniques. This limits the LLM's access to specific system resources, thereby mitigating the risk of errors or malware affecting the broader system in the event of a cyberattack. By following these guidelines, organisations can better protect themselves while harnessing the power of LLMs in their operations.
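
As one minimal sketch of that direction (assuming a Unix-like host), LLM-generated code can be pushed into a separate, resource-limited process with a hard timeout; production systems typically go further with containers, seccomp profiles or microVM sandboxes.

```python
# Minimal sketch: run untrusted, LLM-generated Python in a separate process
# with a wall-clock timeout plus hard CPU and memory limits (POSIX only).
import resource
import subprocess
import sys

def _limit_resources() -> None:
    resource.setrlimit(resource.RLIMIT_CPU, (2, 2))                     # 2 CPU-seconds
    resource.setrlimit(resource.RLIMIT_AS, (256 * 2**20, 256 * 2**20))  # 256 MiB

def run_untrusted(code: str) -> subprocess.CompletedProcess:
    return subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode, no user site paths
        preexec_fn=_limit_resources,
        capture_output=True,
        text=True,
        timeout=5,
    )

result = run_untrusted("print(sum(range(10)))")
print(result.stdout.strip())  # '45'
```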

 

Countering Risks from Within

 

Large Language Models (LLMs) offer impressive abilities and potential, but their susceptibility to manipulation shouldn't be overlooked. It's crucial to design systems assuming LLMs might be compromised, treating them as possible identity security risks. The main point is to approach LLMs with the same wariness and strategic planning you'd use for a potential cyber threat.

 

By embracing this perspective, you can more securely incorporate LLMs into your systems, sidestepping many related security issues. This mindset is critical for leveraging LLMs' power while preserving strong identity protection.