GuardFall Exposes Open-Source AI Coding Agents to Decades-Old Shell Injection Risks

The safety check that is supposed to stop an AI coding agent from running a dangerous command can be walked straight past using a shell trick that has been public for decades.

New research from Adversa AI, which named the bypass GuardFall, found that it works against ten of the eleven popular open-source coding and computer-use agents the firm tested. Only one, “Continue,” was built to defend against it.

Why does it matter? These agents run shell commands with your full account access. Point one at a booby-trapped repository or software package, and a hidden instruction can quietly run a command that wipes files or steals the secrets your account can reach, from SSH keys and cloud credentials to anything sitting in your home folder.

How does it get past the guard?

Most of these agents try to stay safe by checking each command against a blocklist of dangerous patterns before running it. The flaw is that they check the command as plain text, while bash rewrites that text before it actually runs. The shell strips quotes and expands shortcuts, so the filter and the shell end up looking at two different things.

The simplest example: a filter watching for rm sees nothing wrong with r""m, because to a text matcher those are different strings. Bash removes the empty pair of quotes and runs rm anyway.
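The mismatch is easy to reproduce without running anything dangerous. The sketch below is illustrative only: the one-entry blocklist and the function names are hypothetical and not taken from any of the tested agents, and Python's shlex module stands in for bash's quote removal (it approximates POSIX quoting rules, not full bash behavior).

```python
import re
import shlex

# Hypothetical one-entry blocklist, for illustration only.
BLOCKLIST = [re.compile(r"\brm\b")]


def naive_filter_allows(command: str) -> bool:
    """Return True when no blocklist pattern matches the raw command text."""
    return not any(pattern.search(command) for pattern in BLOCKLIST)


def shell_view(command: str) -> list[str]:
    """Approximate the words the shell sees after quote removal."""
    return shlex.split(command)


payload = 'r""m -rf ./build'  # the command is only inspected, never executed

print(naive_filter_allows(payload))  # the text filter finds no "rm" in the raw string
print(shell_view(payload))           # after quote removal the first word is "rm"
```

The filter and the tokenizer disagree about the same string, which is the whole gap: one inspects the text, the other inspects what that text becomes.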

The same idea works in other forms: a command hidden in base64 and piped into a shell, or ordinary tools like find and dd turned destructive with the right flag.
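The base64 variant can be shown the same way. This is a minimal sketch under stated assumptions: the pattern is a hypothetical blocklist entry, the target path is a placeholder, and the wrapper command is only built as a string and inspected, never run.

```python
import base64
import re

# Hypothetical blocklist entry, for illustration only.
RM_PATTERN = re.compile(r"\brm\b")

hidden = "rm -rf /tmp/demo"  # placeholder payload
encoded = base64.b64encode(hidden.encode()).decode()

# What the agent would be asked to run; here it is only inspected.
wrapper = f"echo {encoded} | base64 -d | sh"

print(wrapper)
print(RM_PATTERN.search(wrapper) is None)   # the filter sees only encoded text
print(base64.b64decode(encoded).decode())   # what the shell at the end of the pipe receives
```

The find and dd cases need no encoding at all: neither command contains a blocked name, so a name-based blocklist has nothing to match.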

The researchers call this not a bug but “a dangerous convention and a class of problems,” which is why adding more blocklist patterns fixes none of it. There is no single CVE to track or patch.

Two things have to line up for an attack to land, and neither is exotic.

  • First, the AI has to produce the malicious command. A blunt “run rm -rf” is usually refused, but the same command tucked inside normal-looking work, such as a build file or a tool’s “documentation” reply, gets emitted as a routine step.
  • Second, the agent has to be running on its own, with an auto-execute flag turned on or its container sandbox switched off, both of which are routine in automated pipelines. The live tests used Claude Sonnet 4.6.

Apart from Continue, the other ten tools all left the gap open: opencode, Goose, Cline, Roo-Code, Aider, Plandex, Open Interpreter, OpenHands, SWE-agent, and the Hermes project, where the bug first surfaced and is documented in Hermes’s own issue tracker.

The tools in Adversa’s survey together carried roughly 548,000 GitHub stars as of May 2026. Adversa demonstrated the full attack end-to-end against the production Plandex binary, and the same shape worked against eight others. It describes the work as lab research; no public exploitation has been reported.

Continue, the one agent that held up, defends by reading the command the way bash will before deciding: it breaks the command into the same pieces the shell would, checks what actually runs, and keeps a hard list of destructive commands that are blocked outright.
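The general shape of that approach (tokenize first, then decide) can be sketched in a few lines. This is not Continue's code: the lists, the three verdicts, and the function names are assumptions made for illustration, and Python's shlex only approximates bash. The sketch deliberately punts on expansions and newlines, and it does not handle brace groups, variable-assignment prefixes, or builtins such as eval, which a real guard would need a full shell parser to cover.

```python
import shlex

# Illustrative lists only; a real guard would need far broader coverage.
HARD_DENY = {"rm", "dd", "mkfs", "shred"}
SHELLS = {"sh", "bash", "zsh", "dash"}
RISKY_FLAGS = {"find": {"-delete", "-exec"}}
OPERATORS = {"|", "|&", "||", "&&", ";", "&", "(", ")"}


def split_segments(command: str) -> list[list[str]]:
    """Tokenize with POSIX quoting rules, then split on control operators."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    lexer.commenters = ""  # keep '#' as text so nothing is silently dropped
    segments, current = [], []
    for token in lexer:
        if token in OPERATORS:
            if current:
                segments.append(current)
            current = []
        else:
            current.append(token)
    if current:
        segments.append(current)
    return segments


def verdict(command: str) -> str:
    """Return 'deny', 'ask' (a human must confirm), or 'allow'."""
    if any(ch in command for ch in "$`\n"):
        return "ask"  # expansions and newlines this sketch cannot resolve
    try:
        segments = split_segments(command)
    except ValueError:
        return "deny"  # unbalanced quotes: refuse rather than guess
    result = "allow"
    for words in segments:
        name = words[0].rsplit("/", 1)[-1]  # /bin/rm is treated as rm
        if name in HARD_DENY:
            return "deny"
        if name in SHELLS or RISKY_FLAGS.get(name, set()) & set(words[1:]):
            result = "ask"
    return result


for sample in ['r""m -rf ./build', "echo cm0g | base64 -d | sh", "ls -la"]:
    print(sample, "->", verdict(sample))
```

The design choice that matters is the order of operations: the decision is made on the words left after quote removal, so the quote trick no longer changes what the guard sees.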

That protection held against every payload in Continue’s default editor mode. Its command-line auto-run mode is weaker: a few payloads slipped through, though the most destructive ones still hit the hard block. Adversa calls the design portable and says re-implementing it is roughly a two-day job for an experienced engineer.

What to do now

None of the quick fixes is a complete answer, but they cut your exposure until a proper guard is in place:

  • Run agents with $HOME pointed at a throwaway folder, so secrets like ~/.ssh and ~/.aws are out of reach.
  • Turn off auto-execute flags such as --auto-exec, --auto-run, --auto-test, and --dangerously-skip-permissions unless the job genuinely cannot pause for a human.
  • Do not let agents run on pull requests from forks, which are the easiest path from an attacker’s file to your secrets.
  • Treat config files shipped inside a repository, like .aider.conf.yml, as untrusted code; a malicious one can trigger the attack on the first accepted edit.
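The first item on that list, a throwaway $HOME, can be sketched as a small launcher. The helper name is hypothetical and the child process here is a harmless stand-in for an agent. Note the limits: this only changes where ~ resolves. Absolute paths to the real home directory and credentials passed through environment variables stay reachable, so it reduces exposure rather than acting as a sandbox.

```python
import os
import subprocess
import sys
import tempfile


def run_with_throwaway_home(argv: list[str]) -> subprocess.CompletedProcess:
    """Run argv with HOME pointing at a fresh, empty temporary directory."""
    with tempfile.TemporaryDirectory(prefix="agent-home-") as fake_home:
        env = dict(os.environ, HOME=fake_home)  # copy the environment, override HOME
        return subprocess.run(
            argv, env=env, capture_output=True, text=True, check=False
        )


# Stand-in for an agent: a child process that reports the HOME it was given.
result = run_with_throwaway_home(
    [sys.executable, "-c", "import os; print(os.environ['HOME'])"]
)
print(result.stdout.strip())
```

The temporary directory is deleted when the child exits, so anything the process wrote under its fake home is discarded with it.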

GuardFall lands in the middle of a run of similar findings this year. Adversa’s own TrustFall hit Claude Code, Cursor, Gemini CLI, and Copilot CLI, and a separate deny-rule bypass hit Claude Code.

Attacks like AutoJack and Agentjacking turned poisoned content into commands that an agent runs with its owner’s privileges. The common thread is simple: untrusted text keeps reaching a real shell before the guard understands what bash will actually run.

📰 Original Source: The Hacker News
✍️ Author: info@thehackernews.com (The Hacker News)
