← All Posts | AI | May 26, 2026

Bypassing Cursor’s Command Allowlist with GTFOBins-Style Execution

Damian Rusinek

Damian Rusinek

Managing Partner & Smart Contract Security Auditor

While evaluating Cursor IDE, a behavior was found that looked like a security control but did not behave like one under realistic command-execution patterns.

Cursor allowlist mechanism issue

Cursor provides an allowlist-based mechanism for automatic command execution. The idea is simple: if a command is allowlisted, the agent can run it without asking every time. If a command is not allowlisted, Cursor should stop and ask the user for approval.

Predefining expected ways to handle particular tasks is a good practice,  especially in an AI-assisted IDE where the model can run shell commands on behalf of the user.

The reported issue (“allowlist bypass and potential files leak”) was that the allowlist appeared to be enforced primarily against the top-level command, not against what that command could indirectly execute. This meant that allowlisted Unix utilities could be used as execution trampolines.To demonstrate the problem the  find was used as the example because it is common, frequently trusted, and has well-known execution capabilities through flags such as -exec.


Cursor expected behavior

Cursor behaved correctly in simple cases.

For example, when the agent attempted to start a Python HTTP server directly:

python3 -m http.server 8080

Cursor asked for permission to execute python3.

This case is handled correctly. Starting a local HTTP server might expose files from the working directory, and in an IDE workspace that may contain source code, configuration files, tokens, .env files, credentials, or other sensitive material.

Cursor also handled obvious shell chaining correctly. When commands were chained in a visible way, the approval UI recognized that more than one operation was being attempted and asked for confirmation.

So far, so good.


Bypassing Cursor’s Command Allowlist

The bypass appeared when execution was hidden behind a capability of an allowlisted binary.

Instead of asking Cursor to run Python directly, the command could be wrapped through find:

find README.md -exec python3 -m http.server 8080

From a Unix perspective, this is not exotic. It is exactly the type of behavior documented in the manual. The attacker can use a legitimate binary, often one that is already trusted or allowlisted, to execute another command.

The important detail is that Cursor’s approval flow treated the operation as find, not as the effective execution of python3.

So if find was allowlisted, the nested execution could proceed without the same approval prompt that appeared when python3 was invoked directly.

That is the core vulnerability: the decision was made on the wrapper command, while the security-relevant action happened inside the wrapper.


Why this matters

This is not merely a theoretical bypass.

In a development workspace, running:

python3 -m http.server 8080

can expose the contents of the current directory over HTTP. Depending on the environment, that may include:

  • source code,
  • private repositories,
  • .env files,
  • API keys,
  • build artifacts,
  • internal documentation,
  • credentials accidentally stored in the project tree.

The screenshot shows the server serving a directory listing from the workspace after being launched through the allowlisted find path. This style of bypass could affect Cursor’s “Fetch Domain Allowlist”, because an allowlisted command could indirectly invoke network-capable behavior.

The issue becomes more serious in the context of an AI coding agent. A human user may not inspect every shell nuance in a long prompt or generated command. If the product provides an allowlist, users may reasonably assume it is enforcing a meaningful policy boundary.

That assumption is dangerous if the boundary only checks the first executable.


“But the model blocked the obvious malicious case”

One interesting part of this finding was that trivial malicious usage was sometimes blocked by the LLM itself.

For example, a direct attempt to fetch remote content and pipe it into a shell could be refused by the model as unsafe. Here is an example: Cursor recognized the classic curl | bash pattern and refused to execute it.

This is some kind of additional layer of security, but as we know from How to Minimize the Risk of Prompt Injection it also has its limitations.

An LLM refusal is probabilistic and prompt-dependent. It can recognize some dangerous patterns, miss others, or be persuaded by context. Security controls should not depend on whether the model “feels” that a command is malicious.

The allowlist should be deterministic.If the policy says “python3 requires approval,” then python3 should require approval whether it is run directly, through find -exec, through xargs, through a shell, through env, through make, or through any other execution-capable wrapper.


GTFOBins and the allowlist problem

This class of issue is well known outside AI IDEs.

GTFOBins documents legitimate Unix binaries that can be abused to bypass restricted shells, spawn interpreters, read files, write files, or execute commands. These are not vulnerabilities in the binaries themselves. They are features of powerful Unix tools.

That is exactly why binary-based allowlists are hard to get right.

An allowlist entry such as: find does not mean “safe file search only.” It may also mean:

find . -exec sh -c '...' or find . -exec python3 ... or many other variants depending on platform and available tooling. Similarly, allowing tools such as ftp, less, awk, sed, tar, git, can quickly become equivalent to allowing arbitrary execution unless arguments and behavior are constrained.


Why “out of scope” is unsatisfying

After the initial report, the email was forwarded to a managed private bug bounty program on a well-known platform, where this type of scenario was reportedly listed as out of scope.

When designing an AI-assisted IDE for developers, there is an inherent trade-off between security and usability: the assistant often needs permissions comparable to the developer’s in order to be useful, while those same permissions may introduce risk to the project. 

In this case, the chosen balance appears to be a pragmatic compromise, although unfortunately it gives false sense of security and control.

Why expose a feature as an allowlist if it is not intended to be a security boundary?

If the allowlist is only a convenience feature to reduce prompts, the UI should make that clear. But if users are expected to rely on it to control what the agent can execute automatically, then bypassing it through common GTFOBins-style patterns should be treated as a security issue.The distinction matters because users will build trust around the feature. A button labeled “Allowlist find” does not communicate “also permit any command that find can execute through its flags.”


Root cause

The root cause appears to be policy evaluation at the wrong abstraction layer.

The system checked something like:

Is the top-level command allowlisted?

But the security-relevant question is:

What actions will this command cause?

Those are very different.

A safer implementation would need to account for at least:

  1. the executable being launched,
  2. arguments that introduce command execution,
  3. shell expansion and command chaining,
  4. subprocesses created by allowed tools,
  5. network access,
  6. file-read and file-write behavior,
  7. whether the command exposes workspace contents.

For some commands, argument-aware policy may be enough. For others, the only safe option may be to treat them as dangerous unless run in a sandbox with explicit filesystem and network restrictions.


Recommended fixes

A robust fix should not rely on the LLM recognizing suspicious intent.

Possible mitigations include:

1. Argument-aware allowlisting

Do not allow a binary purely by name. Policies should distinguish between:

find . -name "*.md"

and:

find . -exec python3 -m http.server 8080 \;

The first searches files. The second executes another program.

2. Dangerous flag detection

Commands with known execution flags should trigger approval when those flags are used.

For find, examples include:

-exec

-execdir

For other tools, similar execution-enabling options should be modeled.

3. Effective command approval

If a command invokes another command, the nested command should go through the same approval path as if it were executed directly.

If python3 requires approval, then find -exec python3 … should also require “command execution” approval.

4. Clearer UX language

If the allowlist is not a security boundary, the product should avoid presenting it like one.

If it is a security boundary, then bypasses through standard Unix execution features should be in scope.


Final thoughts

This vulnerability was not about find specifically. find was just the cleanest example.

An allowlist that permits find but ignores find -exec is not really an execution policy. It is a prompt-reduction mechanism with a security-looking UI.

The uncomfortable part is that users may trust it as a security feature, while the system treats it as a convenience feature.


Secure your LLM integration

If your organization is building an AI agent or integrates with LLM’s, Composable Security can help identify realistic risks before they become production incidents.

Our team can support you with:

  • LLM application threat modeling;
  • AI agent security review;
  • secure architecture recommendations;

Contact us to assess and reduce security risk in your AI systems.


Join the newsletter now

Please wait...

Thank you for sign up!