NPM Supply Chain Attacks Expose Hidden Risks for AI Agents Like OpenAI Codex

On September 8, 2025, a phishing campaign came to light in which attackers used a fake domain (npmjs.help) to compromise a maintainer account. Once inside, they injected malicious code into widely used packages, including chalk, debug, and ansi-styles. The injected code targeted client-side environments, monitoring for cryptocurrency transactions in browsers and redirecting funds to attacker wallets.

While the September 8th incident appeared to result in only modest confirmed losses (~$1,000), just a week later another campaign surfaced that underscored the broader risks.

On September 15, reports surfaced of “Shai-Hulud,” a large-scale supply-chain attack that trojanized more than 40 npm packages, including the widely used @ctrl/tinycolor. The malicious update injected a bundle.js controller that downloads and runs TruffleHog to hunt for tokens and cloud credentials, validates and abuses any found npm/GitHub credentials, writes GitHub Actions workflows into repositories, and exfiltrates results to a hard-coded webhook. The analysis shows the malware automatically repackages and republishes downstream packages, enabling widespread, automated trojanization and persistent CI-level exfiltration.

Taken together, these incidents show how quickly malicious packages can ripple through the ecosystem — whether by targeting crypto users in the browser or stealing credentials from CI/CD systems. But the risks don’t stop there. Modern AI agents, which automatically install and execute dependencies to perform coding tasks, can unknowingly pull in poisoned packages the moment they’re published.

In our own research, we observed this in real time: AI agents like OpenAI’s Codex resolving dependencies that led directly to the installation of compromised releases.

Detecting Malicious Packages

Malicious packages can be incredibly harmful in the software supply chain, slipping past CI/CD checks and only revealing themselves once they run in production. This is exactly why we built malicious-package detection into Oligo’s Cloud Application Detection and Response (CADR): it continuously monitors installed, loaded, and executed libraries to surface threats that static tools miss. When the poisoned releases appeared, customers could already see them in real time and respond before any damage was done. Below, we show this capability in action.

Example We Found in the Wild: OpenAI’s Codex

Codex is OpenAI’s AI-powered coding assistant that automates tasks such as generating, completing, and testing code inside VSCode. Enterprises use Codex to accelerate software development by generating code, resolving dependencies, and automating routine engineering tasks, which also means that any compromise in its supply chain can rapidly propagate into production workflows.

With CADR, we detected Codex CLI AI agents in customer environments that had already installed these vulnerable dependencies shortly after they were published. Before we show you exactly what our platform uncovered, let’s take a look at why Codex was impacted by the npm supply chain attack:

Codex uses a VSCode extension called ripgrep.
Ripgrep uses a package called https-proxy-agent.
https-proxy-agent depends on debug with the version range "4", which resulted in installation of the backdoored 4.2.2 release containing the malicious code.
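
The chain above can be pictured as a small graph search. This is only an illustrative sketch, not how CADR works: the graph is hand-written from the three steps listed above, and find_path is a hypothetical helper that returns the first dependency path from a root package to a target package.

```python
from collections import deque
from typing import Dict, List, Optional

# Hand-written from the chain described in this article; a real graph
# would be derived from a lockfile or an installed node_modules tree.
DEPENDENCY_GRAPH: Dict[str, List[str]] = {
    "codex": ["ripgrep"],
    "ripgrep": ["https-proxy-agent"],
    "https-proxy-agent": ["debug"],
    "debug": [],
}


def find_path(graph: Dict[str, List[str]], root: str, target: str) -> Optional[List[str]]:
    """Breadth-first search for the shortest dependency path from root to target."""
    queue = deque([[root]])
    seen = {root}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for dep in graph.get(path[-1], []):
            if dep not in seen:
                seen.add(dep)
                queue.append(path + [dep])
    return None  # target is not reachable from root


path = find_path(DEPENDENCY_GRAPH, "codex", "debug")
print(" -> ".join(path) if path else "not reachable")
```

A path like this explains why a package nobody chose directly still ends up installed: each hop is a legitimate dependency, and only the last one was poisoned.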

The impacted customers were promptly alerted to the agent’s actions and were able to remove the compromised version from their environments. This example illustrates how AI agents amplify supply chain risks by inheriting compromised dependencies that skip manual review.

How we uncovered this

From simply reading the code of Codex’s dependencies on GitHub, "debug": "4" seems innocent enough. However, when we looked at what was actually running in production in our customer environments, a different story unfolded.

Oligo CADR provides visibility into the runtime reality of AI agents. We detected the compromised debug@4.2.2 package being installed due to our ability to analyze dependencies at runtime.

As demonstrated in the screenshots below, CADR:

Detects and alerts on installation, loading, and execution of any compromised libraries on server-side workloads.
Provides deep runtime visibility into compromised libraries and their execution in an environment.

Impacted projects: who is impacted?

Any organization that built during the attack window likely ingested poisoned versions, including AI agents like Codex that automatically resolve and install dependencies.

In this example, projects using "debug": "^4" or "debug": "4" were at risk, because both ranges accept any 4.x release and therefore resolved to the compromised 4.2.2 release once it was published. Any enterprise or organization that built or published software during that time also pulled in malicious versions. Only a small minority of projects had pinned debug to a safe version, which limited their exposure.
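
To make the resolution behavior concrete, here is a minimal sketch of how a range selects the highest matching release. It is a deliberately simplified stand-in for npm's real semver logic: it handles only bare majors such as "4", caret ranges with a major of 1 or higher, and exact pins. The list of published versions is illustrative, not the actual registry history of debug.

```python
from typing import List, Optional, Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Parse a full 'major.minor.patch' string into a comparable tuple."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def satisfies(version: Version, range_spec: str) -> bool:
    """Simplified check: supports "N", "^N", "^N.x.y" (N >= 1), and exact pins."""
    if range_spec.startswith("^"):
        parts = [int(p) for p in range_spec[1:].split(".")]
        floor = tuple(parts + [0] * (3 - len(parts)))
        if floor[0] == 0:
            raise ValueError("caret ranges on 0.x are not handled in this sketch")
        # Same major version, and not older than the stated floor.
        return version[0] == floor[0] and version >= floor
    if "." not in range_spec:
        return version[0] == int(range_spec)  # bare major, e.g. "4"
    return version == parse_version(range_spec)  # exact pin, e.g. "4.2.1"


def resolve(range_spec: str, published: List[str]) -> Optional[str]:
    """Return the highest published version that satisfies the range, or None."""
    matches = [v for v in published if satisfies(parse_version(v), range_spec)]
    return max(matches, key=parse_version, default=None)


# Illustrative version list; 4.2.2 is the compromised release named in this article.
published = ["3.2.7", "4.2.0", "4.2.1", "4.2.2"]
for spec in ("^4", "4", "4.2.1"):
    print(spec, "->", resolve(spec, published))
```

In this sketch both loose ranges select 4.2.2 as soon as it appears in the published list, while the exact pin stays on 4.2.1.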

Mapping the Attack: Techniques Used

When we map the initial npm compromise to the Application Attack Matrix, the techniques line up across multiple phases of the kill chain. This shows how attackers combined social engineering, supply-chain compromise, and malicious code execution to achieve their goal (even if the money stolen was far from a big payday).

Reconnaissance (Public Repository Discovery): Attackers identified a high-value maintainer with outsized influence in the npm ecosystem. By targeting someone who co-maintained packages alongside trusted developer Sindre Sorhus, they maximized downstream reach.
Resource Development (Backdoored Open-Source Libraries): Malicious versions of popular packages were created, with obfuscated JavaScript designed to hijack cryptocurrency transactions.
Malware Development (Browser Payload Engineering): The payload was tailored to hook browser APIs like fetch and window.ethereum, enabling real-time manipulation of crypto transactions.
Initial Access (Valid Accounts: Cloud Accounts): A phishing campaign impersonating npm support tricked maintainer Junon into resetting two-factor authentication. With legitimate credentials, attackers gained full publishing access.
Supply Chain Compromise (Poisoned Package Updates): Using the compromised account, attackers published new versions of debug and other packages, seeding the ecosystem with poisoned code.
Execution (Dynamic Code Evaluation & Standard Application Flow): The malware executed automatically upon import, using obfuscation and dynamic evaluation to blend into normal JavaScript execution flows. No vulnerability exploitation was needed.
Persistence / Defense Evasion (Masquerading & C2 over Web Protocols): The malicious updates masqueraded as legitimate new versions, encouraging automated installs via npm. The malware communicated over ordinary web traffic, avoiding suspicion.
Collection (Internal Data Harvesting): Once active, the payload monitored browser APIs to detect wallet activity and crypto transactions.
Impact (Financial Theft & Transmitted Data Manipulation): The final stage redirected crypto transfers to attacker wallets by silently rewriting transaction payloads before they reached the blockchain.

This mapping shows how the attackers combined multiple techniques across the kill chain, from reconnaissance to impact, to maximize their reach. By viewing the incident through the matrix lens, we can better understand both the sophistication of the campaign and the specific gaps enterprises need to close.

Gaps in Existing Security Tools

The fundamental gap for many enterprises today is that most rely on SBOMs, SCA, or static scanners to track open-source dependencies. These tools identify what’s declared in manifests, but they don’t always reveal what’s actually running. The incident with debug@4.2.2 illustrates the gap:

Projects listed "debug": "4" or "debug": "^4" in their package.json. On paper, that looks safe.
In reality, npm resolved that range to the malicious debug@4.2.2.
Static tools reading the manifest couldn’t distinguish between a safe 4.x release and the backdoored one.

Because most projects specify dependencies with loose ranges (for example, "debug": "^4") rather than pinning to a specific version or commit, they automatically pulled in the poisoned 4.2.2 release when it was published. The only way to know that a workload was actually running the compromised release was to observe it at runtime.
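
The difference between a declared range and a resolved version can be seen by inspecting a lockfile instead of the manifest. The sketch below assumes the "packages" layout used by package-lock.json lockfileVersion 2 and 3, where each key is an install path such as node_modules/debug. The KNOWN_BAD table, the find_compromised helper, and the sample lockfile contents are illustrative assumptions. It reports what was resolved at install time, which is still a weaker signal than observing what is loaded and executed.

```python
from typing import Dict, List, Set, Tuple

# Assumption: a hand-maintained table of compromised releases.
# 4.2.2 is the debug version named in this article.
KNOWN_BAD: Dict[str, Set[str]] = {"debug": {"4.2.2"}}


def package_name(install_path: str) -> str:
    """Extract the package name from a key like 'node_modules/a/node_modules/debug'."""
    return install_path.split("node_modules/")[-1]


def find_compromised(lock: dict, known_bad: Dict[str, Set[str]]) -> List[Tuple[str, str, str]]:
    """Return (install_path, name, version) for every known-bad resolved package."""
    hits = []
    for install_path, entry in lock.get("packages", {}).items():
        if not install_path:  # the "" key describes the root project itself
            continue
        name = package_name(install_path)
        version = entry.get("version")
        if version in known_bad.get(name, set()):
            hits.append((install_path, name, version))
    return sorted(hits)


# Illustrative lockfile fragment: the manifest only says "^4",
# but the resolved entry records the exact version that was installed.
sample_lock = {
    "lockfileVersion": 3,
    "packages": {
        "": {"dependencies": {"debug": "^4"}},
        "node_modules/debug": {"version": "4.2.2"},
        "node_modules/https-proxy-agent": {"version": "1.0.0"},
        "node_modules/https-proxy-agent/node_modules/debug": {"version": "4.2.1"},
    },
}

for install_path, name, version in find_compromised(sample_lock, KNOWN_BAD):
    print(f"{name}@{version} resolved at {install_path}")
```

To check a real project, the same function can be given the parsed contents of its package-lock.json (for example via json.load). Nested copies are reported separately, because the same package can be resolved to different versions at different install paths.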

For AI agents like Codex, this blind spot is even more critical. These agents install and execute dependencies automatically, without human review. That means they may pull in poisoned packages the instant they’re published, and traditional tools won’t flag the risk until long after execution.

Closing thoughts

These incidents are a reminder that securing the software supply chain isn’t just about what a manifest declares or what a static scan reports. It’s about what actually runs. Dependency ranges that look harmless on paper can resolve to poisoned versions the moment they’re published, and AI agents install and execute them automatically.

The only way to uncover that reality is through runtime visibility. By monitoring which packages are installed, loaded, and executed in live environments, enterprises can see the difference between theoretical risk and actual risk. In an era where AI agents and automated pipelines are writing, testing, and shipping code at scale, runtime is not just the final mile of supply-chain security: it’s the source of truth for what you need to protect.
