

Holiday Season Breach Attempt. No Impact. No Downtime.
Company Overview
A leading platform for return management, this company serves top-tier online retailers and is scaling rapidly. The first quarter is its most critical period, with intense transaction volumes following the holiday season. Customers expect speed and reliability; merchants demand ironclad service-level agreements for uptime, performance, and data integrity, especially during the peak post-holiday return season.
The stakes are high. Even seconds of downtime during peak season can damage customer trust and disrupt revenue.
The company's Kubernetes-based architecture enables agility but also amplifies risk when visibility gaps occur. For the senior vice president (SVP) of technology, visibility and control are top priorities during high-stakes periods. That became urgent when a serious security incident unfolded at the worst possible time.
Business Challenges
- Visibility gaps from misconfigurations and offline agents exposed critical workloads
- Operational shortcuts increased risk during peak revenue periods
- Traditional tools failed to detect advanced in-memory and kernel-level threats
- High pressure to protect customer trust with no tolerance for downtime
The Gap: Misconfigurations and Missing Agents
A Perfect Storm of Vulnerability
It began with what seemed like a routine change. During a maintenance window, a Kubernetes service was taken offline and the Sysdig agent responsible for runtime security was disabled. Perimeter protections were intact, but with the environment now blind to runtime activity, attackers found their window.
Unbeknownst to the team, malicious reconnaissance tools were continuously sweeping the internet for just this kind of oversight. The exposed workload, running PHP-FPM, was a known target for remote code execution. In today’s high-speed threat landscape, even minor misconfigurations can become a siren call for opportunistic adversaries scanning billions of endpoints for vulnerable openings.
Initial cryptomining attempts silently failed, likely blocked by default container constraints such as restricted write access or limited privileges. But those failures didn’t deter the attackers. They escalated, impersonating trusted agents and executing lateral movement. Ultimately, they deployed Perfctl, a stealthy rootkit designed to siphon computing resources for cryptomining while evading detection across scanners, logs, and traditional monitoring tools.
The Incident: Advanced Lateral Movement and Perfctl
A Midnight Alert from Sysdig Threat Research
The company’s engineers missed the initial after-hours alert, but Sysdig’s Threat Research team did not. Around midnight, they flagged it as high severity: attackers were masquerading as Datadog agents.
The threat escalated quickly. The attackers exploited the impersonation to pivot into a private internal namespace that should never have been exposed. Containers in that namespace were running with root privileges, and no runtime policies were in place to detect or block the intrusion. Without visibility or enforcement controls, the team initially couldn’t assess the blast radius or contain the spread.
The only viable path forward was a full rebuild. Within approximately 20 minutes, the team wiped and redeployed every pod and container, quickly removing the attackers' immediate foothold.
Post-Restoration Discovery – A Stealth Rootkit
Once Sysdig agents were restored, the full scope of the attack became clear. Telemetry revealed that the attacker had returned with Perfctl, a cryptomining rootkit engineered to hide its presence. Build-time scanners and cloud security posture management tools never saw it. Network-based intrusion detection and intrusion prevention systems missed it. And standard host and application logs – especially with containers running as root in shared namespaces – offered no insight into in-memory or kernel-level exploits. Only real-time, in-container telemetry could expose and stop this threat.
As they traced the attacker’s movements, the SVP of technology was left confronting difficult questions: Was any customer data accessed? Were multiple services compromised? Could they contain the threat without taking the platform offline?
These were high-stakes questions during the most critical revenue period of the year, when even a small misstep could carry major consequences.
The Response: Real-Time Forensics and Rule-Based Defense
Runtime Signals – Visibility in Minutes
Within seconds of the agents being restored, alerts began streaming in that were precise, context-rich, and immediately actionable. The rules surfaced Perfctl’s behavior patterns, including specific process spawns, system calls, and lateral movement attempts. From there, the investigation unfolded with surgical precision:
- System call telemetry revealed the attacker’s techniques, ranging from pivot attempts to binary execution.
- Drift analysis identified containers that no longer matched their original baseline images.
- Forensic snapshots helped the team retrace the attacker’s path in detail.
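To make the drift analysis step concrete, here is a minimal sketch in Python. It assumes a baseline manifest that maps file paths to SHA-256 digests taken from the original image, and a matching snapshot taken from the running container. The names DriftReport and detect_drift and the sample paths are hypothetical, chosen for illustration only; this is not a description of how Sysdig implements drift detection.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class DriftReport:
    """Files that differ between a baseline image and a running container."""
    added: tuple[str, ...]
    modified: tuple[str, ...]
    removed: tuple[str, ...]

    @property
    def drifted(self) -> bool:
        return bool(self.added or self.modified or self.removed)


def digest(content: bytes) -> str:
    """SHA-256 hex digest of a file's content."""
    return hashlib.sha256(content).hexdigest()


def detect_drift(baseline: dict[str, str], running: dict[str, str]) -> DriftReport:
    """Compare path -> digest maps and report anything that changed."""
    added = tuple(sorted(p for p in running if p not in baseline))
    removed = tuple(sorted(p for p in baseline if p not in running))
    modified = tuple(
        sorted(p for p in baseline if p in running and baseline[p] != running[p])
    )
    return DriftReport(added=added, modified=modified, removed=removed)


# Hypothetical data: the running container holds one file the image never shipped.
baseline = {
    "/usr/sbin/php-fpm": digest(b"php-fpm build"),
    "/bin/sh": digest(b"shell"),
}
running = dict(baseline)
running["/tmp/.hidden/miner"] = digest(b"dropped binary")

report = detect_drift(baseline, running)
print(report.drifted, report.added)
```

A container whose report shows added or modified executables no longer matches its image, which is the signal the bullet above refers to.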
With guidance from Sysdig’s incident response experts, the engineering team acted swiftly, wiping and redeploying pods and containers to halt the attack.
The Outcome: A Win for Secure Velocity
Operational Control. Strategic Maturity. No Customer Impact.
Despite contending with a stealthy, evasive adversary during the most critical revenue window of the year, the company emerged unscathed. There was no downtime. No impact to customers. No evidence of data exfiltration.
But the real victory wasn’t just operational; it was strategic. The incident became a vivid proof point that a small, fast-moving engineering team could detect, contain, and recover from a complex cloud-native attack without sacrificing agility or business continuity.
More importantly, it sparked lasting change.
In the days that followed, the SVP of technology and his team translated the lessons of the incident into these systemic improvements:
- Full agent coverage was restored across all critical workloads, eliminating the blind spots that had enabled the intrusion.
- Policy‑as‑code guardrails were embedded into continuous integration/continuous delivery pipelines, ensuring that Kubernetes misconfigurations and web application firewall gaps were caught before reaching production.
- Runtime protection was reinforced by tuning Falco rules to detect Perfctl‑style behaviors early.
- Drift and malware response policies were activated with automated “kill container” actions to prevent future threats.
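As an illustration of the policy-as-code guardrail listed above, the following Python sketch checks a Kubernetes pod manifest, already parsed into a dictionary, for containers that could run as root or with elevated privileges. The function name find_violations and the sample manifests are hypothetical, and the rule set is an assumption about what such a guardrail might enforce rather than the company's actual policy. The securityContext fields it reads (privileged, runAsUser, runAsNonRoot, allowPrivilegeEscalation) are standard Kubernetes settings.

```python
def find_violations(pod: dict) -> list[str]:
    """Return human-readable policy violations for one pod manifest."""
    spec = pod.get("spec", {})
    pod_ctx = spec.get("securityContext") or {}
    violations = []

    for container in spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        ctx = container.get("securityContext") or {}

        # Container-level settings take precedence over pod-level ones.
        run_as_user = ctx.get("runAsUser", pod_ctx.get("runAsUser"))
        run_as_non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False))

        if ctx.get("privileged", False):
            violations.append(f"{name}: privileged container")

        if run_as_user == 0:
            violations.append(f"{name}: runAsUser is 0 (root)")
        elif run_as_user is None and not run_as_non_root:
            violations.append(f"{name}: may run as root (runAsNonRoot not enforced)")

        # Assumption: an unset value is treated as "not disabled".
        if ctx.get("allowPrivilegeEscalation", True):
            violations.append(f"{name}: allowPrivilegeEscalation is not disabled")

    return violations


# Hypothetical manifests for illustration.
risky_pod = {
    "spec": {
        "containers": [{"name": "php", "securityContext": {"privileged": True}}]
    }
}
hardened_pod = {
    "spec": {
        "securityContext": {"runAsNonRoot": True},
        "containers": [
            {"name": "php", "securityContext": {"allowPrivilegeEscalation": False}}
        ],
    }
}

for violation in find_violations(risky_pod):
    print(violation)
```

A pipeline step could fail the build whenever the returned list is non-empty, which would stop a root-privileged container from reaching production.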
“This incident could have undermined customer trust and our peak-season performance,” said the SVP of technology. “Instead, with Sysdig, we contained the threat and improved our security posture without missing a beat.”
This wasn’t just a lucky escape. It was cloud security executed the right way.