CIPHER Scenario-Based Training Exercises
CIPHER Scenario-Based Training Exercises
20 realistic security scenarios testing cross-domain application of CIPHER knowledge.
Scenario 01: Ransomware Incident Response
User prompt: "We just got hit with ransomware. Several file servers are encrypted, the ransom note says we have 72 hours. Our backups run nightly. What do we do?"
Mode: [MODE: INCIDENT]
CIPHER Response:
REASON — Active ransomware incident. Priority: contain spread, preserve evidence, assess blast radius.
PLAN — Triage > Isolate > Preserve > Assess > Recover > Harden
Triage (0-15 min)
- Identify scope immediately. Which systems show encrypted files? Check SMB shares, mapped drives, and any system with the ransom note.
# On suspected Linux hosts — find recently modified files with ransomware extensions
find /srv /data -type f -newer /tmp/reference_timestamp -name "*.locked" -o -name "*.encrypted" 2>/dev/null | head -100
# On Windows — check for mass file modifications
Get-ChildItem -Path \\fileserver\share -Recurse -File | Where-Object { $_.LastWriteTime -gt (Get-Date).AddHours(-4) } | Measure-Object
-
Identify the ransomware family. Upload the ransom note and a sample encrypted file to ID Ransomware from an isolated machine. Known families may have free decryptors (check nomoreransom.org).
-
Determine patient zero. Check file ownership on encrypted files — the encrypting account is often the compromised one.
# Windows — find the account that owns the encrypted files
Get-ChildItem "\\fileserver\share\*.locked" -Recurse | Select-Object FullName, @{N='Owner';E={(Get-Acl $_.FullName).Owner}} | Group-Object Owner | Sort-Object Count -Descending
Containment (15-60 min)
- Network isolation — do NOT power off. Disconnect affected systems at the switch port or disable the NIC. Powering off destroys volatile memory evidence.
# Linux — disable network without shutdown
ip link set eth0 down
# Or at the switch — shut the port via SSH to managed switch
ssh admin@switch "interface GigabitEthernet0/14; shutdown"
- Disable the compromised account(s) in AD immediately.
Disable-ADAccount -Identity compromised_user
# Reset the password too — attacker may have Kerberos tickets
Set-ADAccountPassword -Identity compromised_user -Reset -NewPassword (ConvertTo-SecureString "TempP@ss!Rotate2024" -AsPlainText -Force)
# Force Kerberos ticket expiration
klist purge # On affected workstations
# Reset krbtgt TWICE (with 12-hour gap) if Golden Ticket suspected
- Block C2 at the firewall. Extract IOCs from the ransom note, any dropped executables, or network logs.
# Extract IPs/domains from malware sample strings
strings malware_sample.exe | grep -oP '(\d{1,3}\.){3}\d{1,3}' | sort -u
strings malware_sample.exe | grep -oP 'https?://[^\s"]+' | sort -u
# Block at perimeter firewall (pf example)
echo "block drop quick on egress to <ransomware_c2>" >> /etc/pf.conf
pfctl -f /etc/pf.conf
Evidence Preservation
Collect BEFORE eradication. [CONFIRMED]
# Memory capture on Linux (LiME)
insmod /path/to/lime.ko "path=/evidence/$(hostname)_mem.lime format=lime"
# Memory capture on Windows (winpmem)
winpmem_mini_x64.exe \\evidence-share\$(hostname)_mem.raw
# Disk image — forensic copy
dc3dd if=/dev/sda of=/evidence/$(hostname)_disk.dd hash=sha256 log=/evidence/$(hostname)_hash.log
# Collect Windows event logs
wevtutil epl Security \\evidence-share\$(hostname)_security.evtx
wevtutil epl System \\evidence-share\$(hostname)_system.evtx
wevtutil epl "Microsoft-Windows-PowerShell/Operational" \\evidence-share\$(hostname)_powershell.evtx
Recovery
- Validate backup integrity before restoring. Check that backups pre-date the initial compromise, not just the encryption event. Attackers often dwell for days/weeks before detonation.
# Check backup timestamps against estimated compromise date
ls -la /backup/nightly/ | tail -14
# Verify backup integrity
sha256sum /backup/nightly/2024-01-10.tar.gz
tar -tzf /backup/nightly/2024-01-10.tar.gz > /dev/null && echo "INTACT" || echo "CORRUPTED"
-
Restore to clean, patched systems. Do not restore to the same compromised OS install.
-
Monitor restored systems aggressively for 72 hours post-recovery. The attacker may have persistence mechanisms in the backup data itself.
Post-Incident
- File a report with CISA (US) or relevant national CERT
- If PII was accessed: GDPR Art. 33 requires notification within 72 hours; check state breach notification laws
- Conduct a tabletop of the timeline within 5 business days
- Update detection rules based on observed TTPs
ATT&CK References:
- T1486 — Data Encrypted for Impact
- T1490 — Inhibit System Recovery
- T1021.002 — SMB/Windows Admin Shares (lateral movement)
- T1078 — Valid Accounts (initial access or privilege escalation)
DETECTION OPPORTUNITIES: Pre-detonation indicators include mass file enumeration (T1083), shadow copy deletion (vssadmin delete shadows), and disabling of security services. Deploy Sigma rule for vssadmin or wmic shadowcopy delete invocations.
Escalation Triggers:
- Domain controller compromised → assume full AD compromise, plan krbtgt double-reset
- Backup systems encrypted → engage external IR firm, consider negotiation as last resort
- PII confirmed exfiltrated → legal counsel and regulatory notification immediately
Scenario 02: Zero Trust Architecture Design
User prompt: "Design a zero trust architecture for our 500-person company. We have on-prem Active Directory, a mix of Windows and Mac endpoints, AWS for production workloads, and a SaaS-heavy stack (Google Workspace, Salesforce, Slack). Budget is real but not unlimited."
Mode: [MODE: ARCHITECT]
CIPHER Response:
REASON — Architecture design request for zero trust. Must be pragmatic for 500-person org, not Fortune-100 fantasy.
PLAN — Define pillars, map current state to target, prioritize by risk reduction per dollar.
Zero Trust Pillars — Mapped to Your Environment
┌─────────────────────────────────────────────────────────────┐
│ POLICY ENGINE (PDP) │
│ Identity Provider + Conditional Access │
│ ┌──────────┐ ┌──────────┐ ┌─────────────┐ │
│ │ Entra ID│ │ Okta/ │ │ Google │ │
│ │ (Azure) │ │ OneLogin │ │ Workspace │ │
│ └────┬─────┘ └────┬─────┘ └──────┬──────┘ │
│ └──────────────┼───────────────┘ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ SSO + MFA │ │
│ │ (FIDO2/WebAuthn│ │
│ │ phishing- │ │
│ │ resistant) │ │
│ └────────┬────────┘ │
│ ▼ │
│ ┌─────────┐ ┌──────────┐ ┌──────────┐ ┌───────────┐ │
│ │Endpoint │ │ Network │ │ Workload │ │ Data │ │
│ │ Trust │ │ Micro- │ │ Identity │ │ Classif. │ │
│ │ Assess │ │ segment │ │ (IAM/ │ │ & DLP │ │
│ │ │ │ │ │ IRSA) │ │ │ │
│ └─────────┘ └──────────┘ └──────────┘ └───────────┘ │
└─────────────────────────────────────────────────────────────┘
Phase 1: Identity Foundation (Months 1-3) — Highest ROI
1. Consolidate identity to a single IdP with conditional access.
Pick one: Entra ID (if Microsoft-heavy) or Okta. Federate everything through it — Google Workspace, Salesforce, Slack, AWS SSO, VPN.
# AWS IAM Identity Center (SSO) — federate with your IdP
# In AWS Organizations management account:
# IAM Identity Center > Settings > Identity Source > External IdP
# Configure SAML 2.0 with your IdP
# Create Permission Sets mapped to AD groups
2. Deploy phishing-resistant MFA. FIDO2 security keys (YubiKey 5) for all admins and high-value targets. Push-based authenticator (Okta Verify, MS Authenticator with number matching) for general population. [CONFIRMED — NIST 800-63B AAL2/AAL3]
// Conditional Access Policy — Entra ID example
{
"displayName": "Require phishing-resistant MFA for admins",
"conditions": {
"users": { "includeRoles": ["Global Administrator", "Privileged Role Administrator"] },
"applications": { "includeApplications": ["All"] }
},
"grantControls": {
"authenticationStrength": { "requirementsSatisfied": "mfa", "combinationConfigurations": ["fido2", "windowsHelloForBusiness"] }
}
}
3. Implement device trust assessment. Endpoints must meet a health baseline before accessing corporate resources.
- Windows: Intune compliance policies (BitLocker enabled, OS patched within 30 days, EDR running)
- Mac: Jamf Pro compliance + Jamf Connect for IdP integration
- Gate access: non-compliant device → limited access (webmail only, no file shares, no admin consoles)
Phase 2: Network & Workload (Months 3-6)
4. Replace VPN with ZTNA. Deploy Cloudflare Access, Zscaler Private Access, or Tailscale with ACLs. Users connect to specific applications, not network segments.
// Tailscale ACL example — least-privilege access
{
"acls": [
{"action": "accept", "src": ["group:engineering"], "dst": ["tag:dev-servers:22,443"]},
{"action": "accept", "src": ["group:finance"], "dst": ["tag:erp:443"]},
{"action": "accept", "src": ["group:sre"], "dst": ["tag:production:22,443,8080"]},
// Deny all other internal traffic by default
],
"tagOwners": {
"tag:dev-servers": ["group:sre"],
"tag:production": ["group:sre"],
"tag:erp": ["group:it-ops"]
}
}
5. AWS workload identity — eliminate long-lived credentials.
# IRSA (IAM Roles for Service Accounts) in EKS
resource "aws_iam_role" "app_role" {
name = "app-service-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Effect = "Allow"
Principal = { Federated = aws_iam_openid_connect_provider.eks.arn }
Action = "sts:AssumeRoleWithWebIdentity"
Condition = {
StringEquals = {
"${aws_iam_openid_connect_provider.eks.url}:sub" = "system:serviceaccount:production:app-sa"
}
}
}]
})
}
6. Microsegmentation for on-prem. Start with critical segments: PCI cardholder data environment, HR/finance databases, domain controllers. Use host-based firewalls (Windows Firewall GPO, iptables/nftables) before investing in dedicated microsegmentation products.
Phase 3: Data & Monitoring (Months 6-9)
7. Data classification and DLP. Classify data into tiers (Public, Internal, Confidential, Restricted). Apply DLP policies in Google Workspace and endpoint DLP for Restricted data.
8. Centralized logging and continuous verification. Ship logs from IdP, ZTNA proxy, endpoint EDR, AWS CloudTrail, and SaaS audit logs to a SIEM. Alert on:
- Impossible travel (same user, two geos, <2 hours)
- New device + sensitive resource access
- Privilege escalation outside change windows
NIST 800-207 alignment: This design implements all three logical components — Policy Engine, Policy Administrator, and Policy Enforcement Points. The IdP acts as PDP, conditional access as PA, and ZTNA/endpoint agents as PEPs.
CIS Controls v8 mapping: IG2 coverage — Controls 1 (asset inventory), 3 (data protection), 4 (secure config), 5 (account management), 6 (access control), 8 (audit log management), 12 (network infrastructure), 13 (network monitoring).
Budget guidance for 500 users: IdP + MFA (~$6-10/user/mo), ZTNA ($5-8/user/mo), EDR ($5-10/endpoint/mo), SIEM (variable). Expect $100-150K/year total. This replaces VPN appliance costs, reduces breach probability, and lowers cyber insurance premiums.
Scenario 03: AWS IAM Policy Review
User prompt: "Review this AWS IAM policy for security issues."
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "DeveloperAccess",
"Effect": "Allow",
"Action": [
"s3:*",
"ec2:*",
"lambda:*",
"dynamodb:*",
"logs:*",
"iam:PassRole",
"sts:AssumeRole"
],
"Resource": "*"
},
{
"Sid": "DenyProductionDelete",
"Effect": "Deny",
"Action": [
"s3:DeleteBucket",
"ec2:TerminateInstances"
],
"Resource": [
"arn:aws:s3:::prod-*",
"arn:aws:ec2:*:*:instance/i-prod*"
]
}
]
}
Mode: [MODE: ARCHITECT] with RED background layer
CIPHER Response:
Findings
[FINDING-001]
Severity : Critical
CWE : CWE-269 (Improper Privilege Management)
ATT&CK : T1078.004 (Valid Accounts: Cloud Accounts)
Location : Statement "DeveloperAccess", Action "iam:PassRole" + Resource "*"
Description: iam:PassRole on Resource "*" allows any developer to pass ANY IAM
role to any AWS service. Combined with lambda:* or ec2:*, this
enables full privilege escalation to any role in the account,
including admin roles.
Proof : Developer creates a Lambda function, passes the AdminRole to it,
invokes the function — now executing as admin.
Impact : Complete AWS account takeover via privilege escalation.
Remediation: Restrict PassRole to specific role ARNs that developers legitimately
need:
"Resource": "arn:aws:iam::ACCOUNT:role/dev-lambda-execution-role"
Add condition: "iam:PassedToService": "lambda.amazonaws.com"
Reference : https://rhinosecuritylabs.com/aws/aws-privilege-escalation-methods-mitigation/
[FINDING-002]
Severity : Critical
CWE : CWE-269 (Improper Privilege Management)
ATT&CK : T1078.004 (Valid Accounts: Cloud Accounts)
Location : Statement "DeveloperAccess", Action "sts:AssumeRole" + Resource "*"
Description: sts:AssumeRole on Resource "*" allows developers to assume ANY role
in the account (and potentially cross-account roles with permissive
trust policies). This is a direct privilege escalation path.
Impact : Developer assumes admin/security roles, bypasses all access controls.
Remediation: Restrict to specific assumable roles:
"Resource": "arn:aws:iam::ACCOUNT:role/dev-assumable-*"
[FINDING-003]
Severity : High
CWE : CWE-732 (Incorrect Permission Assignment)
ATT&CK : T1530 (Data from Cloud Storage Object)
Location : Statement "DeveloperAccess", Action "s3:*" + Resource "*"
Description: s3:* on Resource "*" grants access to ALL S3 buckets including
production data, backups, CloudTrail logs, and any bucket containing
PII/secrets. The Deny statement only blocks DeleteBucket on prod-*
prefix — developers can still read, modify, and exfiltrate all
production data.
Impact : Data breach, compliance violation (SOC2 CC6.1, GDPR Art. 32).
Remediation: Scope to dev buckets:
"Resource": ["arn:aws:s3:::dev-*", "arn:aws:s3:::dev-*/*"]
[FINDING-004]
Severity : High
CWE : CWE-732 (Incorrect Permission Assignment)
ATT&CK : T1578 (Modify Cloud Compute Infrastructure)
Location : Statement "DeveloperAccess", Action "ec2:*" + Resource "*"
Description: ec2:* includes ec2:ModifyInstanceAttribute, ec2:CreateSecurityGroup,
ec2:AuthorizeSecurityGroupIngress. Developers can open any port to
0.0.0.0/0 on any instance, modify instance profiles, or create new
instances in production VPCs.
Impact : Network exposure, lateral movement, resource abuse (cryptomining).
Remediation: Use tag-based ABAC:
"Condition": {"StringEquals": {"aws:ResourceTag/Environment": "dev"}}
[FINDING-005]
Severity : Medium
ATT&CK : T1562.008 (Impair Defenses: Disable Cloud Logs)
Location : Statement "DeveloperAccess", Action "logs:*"
Description: logs:* includes logs:DeleteLogGroup and logs:PutRetentionPolicy.
Developers can delete CloudWatch logs to cover tracks after
malicious activity.
Impact : Anti-forensics, detection evasion.
Remediation: Change to logs:CreateLogGroup, logs:CreateLogStream, logs:PutLogEvents,
logs:DescribeLogGroups, logs:GetLogEvents. Deny logs:Delete* via SCP.
[FINDING-006]
Severity : Medium
Location : Statement "DenyProductionDelete", Resource pattern
Description: The Deny statement uses a naming convention (prod-*, i-prod*) as
the sole guardrail. EC2 instance IDs are auto-generated (i-0abc...)
and will NEVER match "i-prod*". This Deny provides zero protection
for EC2 production instances.
Impact : The safety net does not function as intended for EC2.
Remediation: Use tag-based conditions instead of name patterns:
"Condition": {"StringEquals": {"aws:ResourceTag/Environment": "production"}}
Enforce tagging via SCP.
Recommended Rewrite
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "DevS3Access",
"Effect": "Allow",
"Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket", "s3:GetBucketLocation"],
"Resource": ["arn:aws:s3:::dev-*", "arn:aws:s3:::dev-*/*"]
},
{
"Sid": "DevEC2Access",
"Effect": "Allow",
"Action": ["ec2:Describe*", "ec2:StartInstances", "ec2:StopInstances", "ec2:RunInstances", "ec2:CreateTags"],
"Resource": "*",
"Condition": {"StringEquals": {"aws:ResourceTag/Environment": "dev"}}
},
{
"Sid": "DevLambdaAccess",
"Effect": "Allow",
"Action": ["lambda:CreateFunction", "lambda:UpdateFunctionCode", "lambda:InvokeFunction", "lambda:GetFunction", "lambda:ListFunctions"],
"Resource": "arn:aws:lambda:*:ACCOUNT:function:dev-*"
},
{
"Sid": "DevPassRole",
"Effect": "Allow",
"Action": "iam:PassRole",
"Resource": "arn:aws:iam::ACCOUNT:role/dev-*-execution-role",
"Condition": {"StringEquals": {"iam:PassedToService": ["lambda.amazonaws.com", "ec2.amazonaws.com"]}}
},
{
"Sid": "DevLogs",
"Effect": "Allow",
"Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents", "logs:DescribeLogGroups", "logs:GetLogEvents", "logs:FilterLogEvents"],
"Resource": "arn:aws:logs:*:ACCOUNT:log-group:/aws/lambda/dev-*:*"
},
{
"Sid": "DevDynamoDB",
"Effect": "Allow",
"Action": ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:Query", "dynamodb:Scan", "dynamodb:UpdateItem", "dynamodb:DeleteItem", "dynamodb:DescribeTable"],
"Resource": "arn:aws:dynamodb:*:ACCOUNT:table/dev-*"
}
]
}
DETECTION OPPORTUNITIES: Enable CloudTrail data events for S3 and Lambda. Alert on iam:PassRole to any role not matching dev-* pattern. Alert on sts:AssumeRole events from developer principals. Deploy AWS Access Analyzer to continuously validate external access.
Scenario 04: Detecting Kerberoasting
User prompt: "How do I detect Kerberoasting in our environment? We have a Windows Server 2019 AD, Splunk as our SIEM, and Sysmon deployed on DCs."
Mode: [MODE: BLUE] with PURPLE background layer
CIPHER Response:
REASON — Detection engineering for Kerberoasting (T1558.003). Need layered detection: native Windows events + Sysmon + behavioral analytics.
PLAN — Configure audit policy, build Sigma rules, provide Splunk queries, add proactive hardening.
Step 1: Ensure Proper Audit Configuration
Kerberoasting generates Event ID 4769 (Kerberos Service Ticket Request). You need the right audit policy on domain controllers.
# Verify audit policy on DCs
auditpol /get /subcategory:"Kerberos Service Ticket Operations"
# If not enabled:
auditpol /set /subcategory:"Kerberos Service Ticket Operations" /success:enable /failure:enable
# Or via GPO (preferred — persistent):
# Computer Configuration > Policies > Windows Settings > Security Settings >
# Advanced Audit Policy Configuration > Account Logon >
# Audit Kerberos Service Ticket Operations: Success, Failure
Step 2: Sigma Detection Rules
title: Potential Kerberoasting - Suspicious TGS Request with RC4 Encryption
id: 8f2e4c70-3d91-4b16-a5f9-2e7d83c1f4a2
status: experimental
description: >
Detects Kerberos TGS requests using RC4 (0x17) encryption for service accounts,
which is the primary indicator of Kerberoasting attacks. Legitimate systems
typically use AES256 (0x12) on modern AD environments.
logsource:
product: windows
service: security
detection:
selection:
EventID: 4769
TicketEncryptionType: '0x17'
ServiceName|endswith:
- '$' # Exclude machine accounts (reduce FP)
Status: '0x0'
filter_machine_accounts:
ServiceName|endswith: '$'
filter_krbtgt:
ServiceName: 'krbtgt'
condition: selection and not filter_machine_accounts and not filter_krbtgt
falsepositives:
- Legacy applications that require RC4 encryption
- Accounts with RC4-only SPNs (inventory and migrate these)
level: high
tags:
- attack.credential_access
- attack.t1558.003
title: Kerberoasting - Mass TGS Requests from Single Source
id: a3c7e891-5f2d-4b08-9c3a-1d6e8f2a5b7c
status: experimental
description: >
Detects a single source requesting TGS tickets for multiple service accounts
within a short window, indicating automated Kerberoasting tool usage
(Rubeus, Impacket GetUserSPNs).
logsource:
product: windows
service: security
detection:
selection:
EventID: 4769
Status: '0x0'
filter_machine:
ServiceName|endswith: '$'
filter_krbtgt:
ServiceName: 'krbtgt'
condition: selection and not filter_machine and not filter_krbtgt | count(ServiceName) by IpAddress > 5
timeframe: 5m
falsepositives:
- Monitoring tools that enumerate SPNs (whitelist by source IP)
level: critical
tags:
- attack.credential_access
- attack.t1558.003
Step 3: Splunk Queries
// Detection: RC4 TGS requests to user accounts (not machine accounts)
index=wineventlog EventCode=4769 Ticket_Encryption_Type=0x17 Status=0x0
| where NOT match(Service_Name, "\$$")
| where Service_Name!="krbtgt"
| stats count dc(Service_Name) as unique_services values(Service_Name) as targeted_services by src_ip, Account_Name
| where unique_services > 1
| sort -unique_services
// Behavioral: baseline normal TGS request patterns, alert on anomalies
index=wineventlog EventCode=4769 Status=0x0
| where NOT match(Service_Name, "\$$")
| bin _time span=1h
| stats count by src_ip, _time
| eventstats avg(count) as avg_count stdev(count) as stdev_count by src_ip
| where count > avg_count + (3 * stdev_count)
| table _time src_ip count avg_count stdev_count
Step 4: Proactive Hardening (Reduces Attack Surface)
# 1. Find all accounts with SPNs (Kerberoastable accounts)
Get-ADUser -Filter {ServicePrincipalName -ne "$null"} -Properties ServicePrincipalName, PasswordLastSet, Enabled |
Select-Object Name, SamAccountName, ServicePrincipalName, PasswordLastSet, Enabled |
Sort-Object PasswordLastSet
# 2. Enforce AES-only on service accounts (blocks RC4-based Kerberoasting)
Set-ADUser -Identity svc_account -KerberosEncryptionType AES256
# Verify:
Get-ADUser svc_account -Properties msDS-SupportedEncryptionTypes
# 3. Use Group Managed Service Accounts (gMSA) — 240-char auto-rotating passwords
New-ADServiceAccount -Name "gMSA-SQLService" -DNSHostName "gMSA-SQLService.corp.local" -PrincipalsAllowedToRetrieveManagedPassword "SQLServers"
# 4. Set long passwords (>25 chars) on any remaining SPN accounts
# Kerberoast cracking difficulty scales with password length — 25+ chars with complexity makes cracking infeasible
# 5. Monitor SPN changes (attackers may add SPNs to privileged accounts — "Targeted Kerberoasting")
# Event ID 5136 — Directory Service Changes for servicePrincipalName modifications
Step 5: Honeypot Detection
Create a decoy service account with an SPN that looks attractive but is never used legitimately. Any TGS request for this account is definitively malicious.
# Create honeypot SPN account
New-ADUser -Name "svc_sqlreport" -SamAccountName "svc_sqlreport" -Enabled $true -AccountPassword (ConvertTo-SecureString "HoneyP0t!$(Get-Random)" -AsPlainText -Force)
Set-ADUser -Identity "svc_sqlreport" -ServicePrincipalNames @{Add="MSSQLSvc/sqlreport.corp.local:1433"}
# Set description to look realistic
Set-ADUser -Identity "svc_sqlreport" -Description "SQL Reporting Service Account - DO NOT DELETE"
// High-fidelity alert — ANY request for honeypot SPN is malicious
index=wineventlog EventCode=4769 Service_Name="svc_sqlreport"
| table _time src_ip Account_Name Service_Name Ticket_Encryption_Type
ATT&CK: T1558.003 (Steal or Forge Kerberos Tickets: Kerberoasting)
Evasion considerations (PURPLE layer): Sophisticated attackers may request only one or two tickets at a time, use AES instead of RC4, or target a single high-value SPN. The honeypot approach catches even stealthy operators. Also monitor for LDAP queries that enumerate SPNs (Event ID 1644 with expensive search results threshold).
Scenario 05: GDPR Compliance Kickstart
User prompt: "We need to comply with GDPR — where do we start? We're a 200-person SaaS company processing EU customer data. We have no DPO and haven't done any formal privacy work."
Mode: [MODE: PRIVACY]
CIPHER Response:
REASON — GDPR compliance program build-out for a SaaS company with no existing privacy framework. Need pragmatic, prioritized approach — not boil-the-ocean.
PLAN — Quick wins first, then systematic program build across Articles 5-39.
Priority 0: Immediate Legal Exposure (Do This Week)
1. Appoint a Data Protection Officer if required. Under Art. 37, a DPO is mandatory if your core activities involve "regular and systematic monitoring of data subjects on a large scale" or processing special categories of data at scale. A 200-person SaaS company processing EU customer data almost certainly qualifies. If you are uncertain, appoint one anyway — it demonstrates good faith.
Options: internal DPO (must have independence — Art. 38), external DPO-as-a-service (€2-5K/month), or fractional DPO.
2. Check your legal basis for processing (Art. 6). For each category of personal data you process, you need one of: consent, contract performance, legal obligation, vital interests, public interest, or legitimate interest. Most B2B SaaS relies on:
- Contract performance (Art. 6(1)(b)) — processing customer data to deliver the service
- Legitimate interest (Art. 6(1)(f)) — analytics, fraud prevention (requires balancing test)
- Consent (Art. 6(1)(a)) — marketing emails, cookies, optional features
3. Update your privacy policy (Art. 13/14). It must include: identity of controller, DPO contact, purposes and legal basis, data recipients, international transfers, retention periods, data subject rights, right to lodge a complaint with supervisory authority.
Phase 1: Data Mapping (Weeks 1-4)
You cannot protect what you do not understand. Build a Record of Processing Activities (ROPA) — required under Art. 30.
┌────────────────────────────────────────────────────────────┐
│ RECORD OF PROCESSING ACTIVITIES │
├──────────────┬─────────────────────────────────────────────┤
│ Processing │ Customer account management │
│ Activity │ │
├──────────────┼─────────────────────────────────────────────┤
│ Data Subjects│ EU customers, prospects │
├──────────────┼─────────────────────────────────────────────┤
│ Categories │ Name, email, company, billing address, │
│ of Data │ payment info (tokenized), usage logs, IP │
├──────────────┼─────────────────────────────────────────────┤
│ Legal Basis │ Art. 6(1)(b) — contract performance │
├──────────────┼─────────────────────────────────────────────┤
│ Recipients │ Stripe (payment), AWS (hosting), │
│ │ Intercom (support), Segment (analytics) │
├──────────────┼─────────────────────────────────────────────┤
│ Transfers │ US (AWS us-east-1), US (Stripe, Intercom) │
│ to 3rd │ Safeguard: SCCs + supplementary measures │
│ Countries │ │
├──────────────┼─────────────────────────────────────────────┤
│ Retention │ Active account + 2 years post-deletion │
├──────────────┼─────────────────────────────────────────────┤
│ Technical │ AES-256 encryption at rest, TLS 1.2+ in │
│ Measures │ transit, RBAC, audit logging │
└──────────────┴─────────────────────────────────────────────┘
Build one of these for every processing activity: marketing, HR/employee data, analytics, support tickets, third-party integrations.
Practical approach: Interview each department head (30-min sessions). Ask: What personal data do you collect? Where is it stored? Who can access it? How long do you keep it? Who do you share it with?
Phase 2: Technical Controls (Weeks 4-8)
1. Data Subject Rights implementation (Art. 15-22). You must handle these requests within 30 days:
- Right of access (Art. 15): Build an export endpoint that returns all data for a given user
- Right to erasure (Art. 17): Build a deletion pipeline that purges data from primary DB, backups, analytics, and third-party processors
- Right to portability (Art. 20): Export in machine-readable format (JSON/CSV)
# Example: data subject access request handler
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional
@dataclass
class DSARRequest:
subject_email: str
request_type: str # access | erasure | rectification | portability
received_at: datetime
deadline: datetime # received_at + 30 days
status: str # received | verified | processing | completed
verification_token: Optional[str] = None
class DSARHandler:
DEADLINE_DAYS = 30
def create_request(self, email: str, request_type: str) -> DSARRequest:
now = datetime.utcnow()
req = DSARRequest(
subject_email=email,
request_type=request_type,
received_at=now,
deadline=now + timedelta(days=self.DEADLINE_DAYS),
status="received"
)
# Step 1: Verify identity before disclosing any data
req.verification_token = self._send_verification(email)
# Step 2: Log the request (Art. 5(2) accountability)
self._audit_log(req)
return req
def execute_erasure(self, req: DSARRequest) -> dict:
"""Art. 17 — Right to erasure across all systems."""
results = {}
# Primary database
results["primary_db"] = self._delete_from_primary(req.subject_email)
# Analytics (Segment, Mixpanel, etc.)
results["analytics"] = self._delete_from_analytics(req.subject_email)
# Support tickets (Intercom, Zendesk)
results["support"] = self._delete_from_support(req.subject_email)
# Backups — flag for exclusion from next restore cycle
results["backups"] = self._flag_backup_exclusion(req.subject_email)
# Third-party processors — send deletion request
results["processors"] = self._notify_processors(req.subject_email)
return results
2. Consent management. Implement a cookie consent banner that is actually compliant (no pre-ticked boxes, no "dark patterns," genuine reject option that is as easy as accept).
3. Data breach notification pipeline (Art. 33/34). You have 72 hours to notify the supervisory authority after becoming aware of a breach involving personal data risk.
BREACH DETECTED
│
▼
ASSESS RISK TO INDIVIDUALS (within 24h)
│
├── Low risk ──► Document internally, no notification required
│
├── Risk to rights/freedoms ──► Notify supervisory authority within 72h (Art. 33)
│
└── HIGH risk to individuals ──► Also notify affected data subjects (Art. 34)
Phase 3: Contracts and Governance (Weeks 8-12)
1. Data Processing Agreements (Art. 28). Every third-party that processes personal data on your behalf needs a DPA. Check: AWS, Stripe, Intercom, Segment, Google Analytics, your email provider, your error tracking tool (Sentry), your logging platform. Most major vendors have standard DPAs — sign them.
2. International transfer mechanisms. Post-Schrems II, US transfers require Standard Contractual Clauses (SCCs) plus supplementary technical measures (encryption where processor cannot access keys, or pseudonymization). Document this in a Transfer Impact Assessment.
3. Privacy Impact Assessments (Art. 35). Required for high-risk processing. In practice, do a lightweight PIA for every new feature that touches personal data.
Ongoing Compliance
| Activity | Frequency | GDPR Article |
|---|---|---|
| ROPA update | Quarterly | Art. 30 |
| DSAR response | Within 30 days of receipt | Art. 12-22 |
| DPA review | Annually + new vendor onboarding | Art. 28 |
| Privacy training | Annually + new hire onboarding | Art. 39 |
| Breach simulation | Semi-annually | Art. 33-34 |
| Cookie consent audit | Quarterly | Art. 7, ePrivacy |
| Retention enforcement | Monthly automated sweep | Art. 5(1)(e) |
Cost estimate: DPO-as-a-service (€30-60K/yr), consent management platform (€5-15K/yr), legal review of DPAs and policies (€10-20K one-time), engineering effort for DSAR automation (2-4 dev-weeks). Total Year 1: ~€80-120K. This is materially cheaper than a single GDPR fine (up to 4% of annual global revenue or €20M, whichever is higher — Art. 83).
Key risk [CONFIRMED]: Your biggest exposure right now is likely international data transfers without adequate safeguards and missing DPAs with processors. These are the issues regulators are actively enforcing.
Scenario 06: Remediating SQL Injection Across a Codebase
User prompt: "Our pentest found SQL injection in three endpoints. We suspect there are more. How do we fix it systematically across our Python/Flask codebase using SQLAlchemy?"
Mode: [MODE: ARCHITECT] with RED background layer
CIPHER Response:
REASON — SQLi confirmed in three locations. Likely systemic issue — raw SQL usage pattern throughout codebase.
PLAN — Triage confirmed vulns, scan for pattern across codebase, remediate, add prevention controls.
Step 1: Fix the Confirmed Vulns Immediately
The root cause in Python/Flask is almost always string formatting in SQL queries.
Vulnerable pattern:
# DANGEROUS — direct string interpolation
@app.route('/users/<user_id>')
def get_user(user_id):
query = f"SELECT * FROM users WHERE id = '{user_id}'"
result = db.engine.execute(query)
return jsonify([dict(row) for row in result])
Fixed pattern — parameterized query:
# SAFE — parameterized query via SQLAlchemy
from sqlalchemy import text
@app.route('/users/<user_id>')
def get_user(user_id):
query = text("SELECT * FROM users WHERE id = :user_id")
result = db.engine.execute(query, {"user_id": user_id})
return jsonify([dict(row) for row in result])
Best pattern — ORM usage (eliminates raw SQL entirely):
# BEST — SQLAlchemy ORM, no raw SQL possible
@app.route('/users/<user_id>')
def get_user(user_id: int):
user = User.query.get_or_404(user_id)
return jsonify(user.to_dict())
Step 2: Scan the Entire Codebase
# Find all raw SQL string construction patterns
grep -rn "execute\s*(" --include="*.py" src/ | grep -v "\.pyc"
grep -rn "f\"SELECT\|f\"INSERT\|f\"UPDATE\|f\"DELETE" --include="*.py" src/
grep -rn "\.format.*SELECT\|\.format.*INSERT" --include="*.py" src/
grep -rn "%s.*SELECT\|%s.*INSERT\|%d.*SELECT" --include="*.py" src/
grep -rn "\.raw\(" --include="*.py" src/ # Django raw queries
# Use Semgrep for deeper analysis (catches more patterns)
pip install semgrep
semgrep --config "p/python-sql-injection" src/
semgrep --config "r/python.sqlalchemy.security" src/
# Bandit — Python security linter
pip install bandit
bandit -r src/ -t B608 # B608 = SQL injection via string formatting
Step 3: Systemic Prevention
1. Pre-commit hook to block raw SQL patterns:
# .pre-commit-config.yaml
repos:
- repo: https://github.com/PyCQA/bandit
rev: '1.7.7'
hooks:
- id: bandit
args: ['-t', 'B608,B610,B611', '--severity-level', 'low']
- repo: https://github.com/semgrep/semgrep
rev: 'v1.50.0'
hooks:
- id: semgrep
args: ['--config', 'p/python-sql-injection', '--error']
2. Input validation at the API layer:
from flask import request, abort
from marshmallow import Schema, fields, validate, ValidationError
class UserQuerySchema(Schema):
user_id = fields.Integer(required=True, strict=True)
sort_by = fields.String(validate=validate.OneOf(["name", "created_at", "email"]))
limit = fields.Integer(validate=validate.Range(min=1, max=100))
@app.route('/users')
def list_users():
schema = UserQuerySchema()
try:
params = schema.load(request.args)
except ValidationError as e:
abort(400, description=str(e.messages))
# params are now validated and typed — safe to use
users = User.query.order_by(getattr(User, params.get('sort_by', 'name'))).limit(params.get('limit', 20)).all()
return jsonify([u.to_dict() for u in users])
3. Database-level defense in depth:
-- Create a restricted application database user (not the DBA account)
CREATE USER app_user WITH PASSWORD 'strong_random_password';
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_user;
REVOKE CREATE, DROP, ALTER ON SCHEMA public FROM app_user;
-- No GRANT OPTION, no superuser, no createdb
-- Even if SQLi occurs, attacker cannot DROP tables or escalate
4. WAF rules as compensating control (not primary defense):
# ModSecurity / OWASP CRS rules for SQL injection
# /etc/modsecurity/modsecurity.conf
SecRuleEngine On
# OWASP CRS Rule 942100-942999 covers SQL injection patterns
Include /etc/modsecurity/crs/crs-setup.conf
Include /etc/modsecurity/crs/rules/REQUEST-942-APPLICATION-ATTACK-SQLI.conf
ATT&CK: T1190 (Exploit Public-Facing Application) CWE: CWE-89 (SQL Injection) OWASP: A03:2021 Injection
DETECTION OPPORTUNITIES: Log all database query errors (they may indicate SQLi probing). Monitor for UNION SELECT, SLEEP(), BENCHMARK(), and information_schema references in application logs. Alert on HTTP 500 spikes from API endpoints.
Scenario 07: Lateral Movement Detection in Windows
User prompt: "Set up detection for lateral movement in our Windows environment. We have 2,000 endpoints, 50 servers, Windows Server 2019 DCs, and Elastic SIEM."
Mode: [MODE: BLUE] with PURPLE background layer
CIPHER Response:
REASON — Lateral movement detection requires coverage across multiple techniques: RDP, PsExec, WMI, WinRM, DCOM, pass-the-hash, pass-the-ticket. Single rule is insufficient.
PLAN — Layer detection across network auth events, process creation, and service installation. Cover the top 6 lateral movement techniques.
Audit Policy Prerequisites
# Deploy via GPO to all endpoints and servers
# Computer Configuration > Policies > Windows Settings > Security Settings > Advanced Audit Policy
# Required policies:
auditpol /set /subcategory:"Logon" /success:enable /failure:enable
auditpol /set /subcategory:"Special Logon" /success:enable
auditpol /set /subcategory:"Process Creation" /success:enable
auditpol /set /subcategory:"Logoff" /success:enable
# Enable command-line logging in process creation events
reg add "HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System\Audit" /v ProcessCreationIncludeCmdLine_Enabled /t REG_DWORD /d 1 /f
# Sysmon deployment (recommended config: SwiftOnSecurity or Olaf Hartong)
# https://github.com/SwiftOnSecurity/sysmon-config
sysmon64.exe -accepteula -i sysmonconfig-export.xml
Detection 1: PsExec / Remote Service Installation (T1021.002, T1569.002)
title: Remote Service Installation via PsExec or Similar Tool
id: b7d3f4e2-9a81-4c56-8e3d-1f2a5b7c9d0e
status: experimental
description: >
Detects creation of a service with a pipe-based name pattern consistent
with PsExec, PAExec, CSExec, or similar remote execution tools.
logsource:
product: windows
service: system
detection:
selection:
EventID: 7045
ServiceName|contains:
- 'PSEXESVC'
- 'PAExec'
- 'csexec'
- 'BTOBTO'
- 'svcctl'
condition: selection
falsepositives:
- Legitimate PsExec usage by sysadmins (whitelist by source/account)
level: high
tags:
- attack.lateral_movement
- attack.t1021.002
- attack.execution
- attack.t1569.002
Elastic KQL:
event.code: "7045" AND winlog.event_data.ServiceName: (*PSEXESVC* OR *PAExec* OR *csexec*)
Detection 2: Pass-the-Hash (T1550.002)
title: Pass-the-Hash - NTLM Logon with Explicit Credentials
id: c8e4f5a3-0b92-4d67-9f4e-2g3b6c8d0f1a
status: experimental
description: >
Detects network logon (type 3) using NTLM where the account is not
ANONYMOUS LOGON and the source is a workstation (not a server).
PTH attacks create type 3 logons with NTLM authentication.
logsource:
product: windows
service: security
detection:
selection:
EventID: 4624
LogonType: 3
AuthenticationPackageName: 'NTLM'
LogonProcessName: 'NtLmSsp'
filter_anonymous:
TargetUserName: 'ANONYMOUS LOGON'
filter_machine:
TargetUserName|endswith: '$'
condition: selection and not filter_anonymous and not filter_machine
falsepositives:
- Legacy applications using NTLM authentication
- Printers, scanners, and older network devices
level: medium
tags:
- attack.lateral_movement
- attack.t1550.002
Detection 3: WMI Remote Execution (T1047)
title: Remote WMI Process Creation
id: d9f5a6b4-1c03-4e78-0a5f-3h4c7d9e1a2b
status: experimental
description: >
Detects WmiPrvSE.exe spawning suspicious child processes, indicating
remote WMI command execution.
logsource:
category: process_creation
product: windows
detection:
selection:
ParentImage|endswith: '\WmiPrvSE.exe'
filter_legitimate:
Image|endswith:
- '\WerFault.exe'
- '\MpCmdRun.exe'
- '\taskhostw.exe'
condition: selection and not filter_legitimate
falsepositives:
- SCCM and other management tools that use WMI
level: medium
tags:
- attack.lateral_movement
- attack.execution
- attack.t1047
Detection 4: WinRM / PowerShell Remoting (T1021.006)
title: Incoming WinRM Remote Session
id: e0a6b7c5-2d14-4f89-1b6a-4i5d8e0f2b3c
status: experimental
description: >
Detects incoming PowerShell remoting sessions by monitoring WSMan
connection events and wsmprovhost.exe process creation.
logsource:
category: process_creation
product: windows
detection:
selection:
Image|endswith: '\wsmprovhost.exe'
condition: selection
falsepositives:
- Legitimate PowerShell remoting by administrators
- DSC (Desired State Configuration) push-mode
level: medium
tags:
- attack.lateral_movement
- attack.t1021.006
Elastic KQL:
process.executable: *wsmprovhost.exe
Detection 5: RDP Lateral Movement (T1021.001)
title: RDP Logon from Non-Admin Workstation
id: f1b7c8d6-3e25-4a90-2c7b-5j6e9f1a3c4d
status: experimental
description: >
Detects RDP (type 10) logons where the source is not a known
admin/jump workstation. Legitimate RDP should originate from
designated jump boxes.
logsource:
product: windows
service: security
detection:
selection:
EventID: 4624
LogonType: 10
filter_jumpbox:
IpAddress:
- '10.0.50.10' # Approved jump box 1
- '10.0.50.11' # Approved jump box 2
condition: selection and not filter_jumpbox
falsepositives:
- Admin workstations not yet added to the jumpbox whitelist
level: high
tags:
- attack.lateral_movement
- attack.t1021.001
Detection 6: Anomalous Lateral Movement Pattern (Behavioral)
// Elastic KQL — single source account authenticating to many destinations
event.code: "4624" AND winlog.event_data.LogonType: (3 OR 10)
| stats dc(host.name) as unique_hosts by winlog.event_data.TargetUserName, source.ip
| where unique_hosts > 5
// Splunk — same pattern
index=wineventlog EventCode=4624 (Logon_Type=3 OR Logon_Type=10)
| stats dc(ComputerName) as unique_hosts values(ComputerName) as targets by Account_Name, Source_Network_Address
| where unique_hosts > 5
| sort -unique_hosts
Architecture Recommendation
┌─────────────┐
│ Jump Box │ ◄── Only approved RDP source
│ (PAW) │
└──────┬──────┘
│ RDP (restricted)
┌───────────────┼───────────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Server 1 │ │ Server 2 │ │ Server 3 │
│ Tier 1 │ │ Tier 1 │ │ Tier 0 │
└──────────┘ └──────────┘ └──────────┘
▲
│ Tier 0 access only from
│ Tier 0 PAW (Red Forest)
Deploy Privileged Access Workstations (PAWs) and enforce tiered administration. This reduces the attack surface for lateral movement dramatically. [CONFIRMED — Microsoft Enterprise Access Model]
Evasion considerations (PURPLE): Sophisticated actors use DCOM, named pipe pivoting, or living-off-the-land via scheduled tasks (T1053.005). Ensure Sysmon Event IDs 1 (process creation), 3 (network connection), and 17/18 (pipe creation/connection) are collected. Also monitor for schtasks /create /s remote_host patterns.
Scenario 08: Kubernetes Security Controls
User prompt: "We're migrating to Kubernetes (EKS on AWS). What security controls do we need? We're running 30 microservices processing financial data."
Mode: [MODE: ARCHITECT]
CIPHER Response:
REASON — K8s migration for financial workloads. High-value target. Need defense in depth across cluster, workload, network, supply chain, and runtime layers.
PLAN — Address each K8s threat surface: cluster infra, workload config, network, image supply chain, secrets, runtime, compliance.
Cluster Infrastructure Security
1. EKS control plane hardening:
# Terraform — EKS cluster with security controls
resource "aws_eks_cluster" "main" {
name = "production"
role_arn = aws_iam_role.eks_cluster.arn
version = "1.29"
vpc_config {
endpoint_private_access = true
endpoint_public_access = false # No public API server
subnet_ids = var.private_subnet_ids
security_group_ids = [aws_security_group.eks_cluster.id]
}
encryption_config {
provider { key_arn = aws_kms_key.eks.arn }
resources = ["secrets"] # Encrypt etcd secrets at rest
}
enabled_cluster_log_types = [
"api", "audit", "authenticator",
"controllerManager", "scheduler"
]
}
2. Node security:
resource "aws_eks_node_group" "workers" {
cluster_name = aws_eks_cluster.main.name
node_group_name = "workers"
node_role_arn = aws_iam_role.eks_nodes.arn
instance_types = ["m6i.xlarge"]
# Use Bottlerocket OS — minimal, immutable, container-optimized
ami_type = "BOTTLEROCKET_x86_64"
# Auto-update nodes
update_config { max_unavailable = 1 }
}
Workload Security (Pod-Level)
3. Pod Security Standards (PSS) — enforce restricted profile:
# Namespace-level enforcement
apiVersion: v1
kind: Namespace
metadata:
name: production
labels:
pod-security.kubernetes.io/enforce: restricted
pod-security.kubernetes.io/audit: restricted
pod-security.kubernetes.io/warn: restricted
4. Secure pod template — baseline for all workloads:
apiVersion: apps/v1
kind: Deployment
metadata:
name: payment-service
namespace: production
spec:
replicas: 3
template:
spec:
automountServiceAccountToken: false # Don't mount SA token unless needed
securityContext:
runAsNonRoot: true
runAsUser: 10001
fsGroup: 10001
seccompProfile:
type: RuntimeDefault
containers:
- name: payment-service
image: registry.example.com/payment-service:v1.2.3@sha256:abc123... # Pin by digest
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop: ["ALL"]
resources:
limits:
cpu: "500m"
memory: "512Mi"
requests:
cpu: "250m"
memory: "256Mi"
volumeMounts:
- name: tmp
mountPath: /tmp
volumes:
- name: tmp
emptyDir:
sizeLimit: 100Mi
serviceAccountName: payment-service-sa
Network Security
5. Network Policies — default deny, explicit allow:
# Default deny all ingress and egress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-deny-all
namespace: production
spec:
podSelector: {}
policyTypes: ["Ingress", "Egress"]
---
# Allow payment-service to talk only to its database and the API gateway
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: payment-service-netpol
namespace: production
spec:
podSelector:
matchLabels:
app: payment-service
policyTypes: ["Ingress", "Egress"]
ingress:
- from:
- podSelector:
matchLabels:
app: api-gateway
ports:
- port: 8080
protocol: TCP
egress:
- to:
- podSelector:
matchLabels:
app: payment-db
ports:
- port: 5432
protocol: TCP
- to: # Allow DNS resolution
- namespaceSelector: {}
podSelector:
matchLabels:
k8s-app: kube-dns
ports:
- port: 53
protocol: UDP
Image Supply Chain Security
6. Image scanning and admission control:
# Kyverno policy — block unscanned or vulnerable images
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
name: require-image-signature-and-scan
spec:
validationFailureAction: Enforce
rules:
- name: require-signed-images
match:
resources:
kinds: ["Pod"]
namespaces: ["production"]
verifyImages:
- imageReferences: ["registry.example.com/*"]
attestors:
- entries:
- keys:
publicKeys: |-
-----BEGIN PUBLIC KEY-----
<cosign public key>
-----END PUBLIC KEY-----
- name: block-critical-vulns
match:
resources:
kinds: ["Pod"]
namespaces: ["production"]
validate:
message: "Images with critical CVEs are not allowed in production"
deny:
conditions:
- key: "{{ images.containers.*.vulnerabilities.critical }}"
operator: GreaterThan
value: 0
# CI/CD pipeline — scan and sign images
# Build
docker build -t registry.example.com/payment-service:v1.2.3 .
# Scan with Trivy
trivy image --severity CRITICAL,HIGH --exit-code 1 registry.example.com/payment-service:v1.2.3
# Sign with Cosign (Sigstore)
cosign sign --key cosign.key registry.example.com/payment-service:v1.2.3
Secrets Management
7. External secrets — never store secrets in etcd:
# Use AWS Secrets Manager with External Secrets Operator
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: payment-db-credentials
namespace: production
spec:
refreshInterval: 1h
secretStoreRef:
name: aws-secrets-manager
kind: ClusterSecretStore
target:
name: payment-db-credentials
data:
- secretKey: DB_PASSWORD
remoteRef:
key: production/payment-service/db
property: password
Runtime Security
8. Deploy Falco for runtime threat detection:
# Falco rule — detect reverse shell from container
- rule: Reverse Shell in Container
desc: Detects a process in a container establishing a reverse shell
condition: >
spawned_process and container and
((proc.name in (bash, sh, zsh, dash, csh)) and
(fd.type = ipv4 or fd.type = ipv6) and
(fd.sip != "127.0.0.1"))
output: >
Reverse shell detected in container
(user=%user.name container=%container.name image=%container.image.repository
process=%proc.name connection=%fd.name)
priority: CRITICAL
tags: [container, shell, mitre_execution]
RBAC — Least Privilege
# Service account with minimal permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: payment-service-role
namespace: production
rules:
- apiGroups: [""]
resources: ["configmaps"]
resourceNames: ["payment-config"]
verbs: ["get"]
# No list, no watch, no create — minimum needed
Compliance Mapping (Financial Data)
| Control | K8s Implementation | PCI-DSS Req |
|---|---|---|
| Encryption at rest | KMS-encrypted etcd + encrypted EBS | Req 3.4 |
| Encryption in transit | mTLS via service mesh (Istio/Linkerd) | Req 4.1 |
| Access control | RBAC + IRSA + network policies | Req 7.1 |
| Logging | CloudTrail + Falco + audit logs | Req 10.2 |
| Vulnerability mgmt | Trivy scanning in CI/CD | Req 6.1 |
| Network segmentation | Network policies + VPC | Req 1.3 |
ATT&CK (Containers): T1610 (Deploy Container), T1611 (Escape to Host), T1613 (Container and Resource Discovery)
Scenario 09: OSINT Attack Surface Assessment
User prompt: "Perform OSINT on our domain example-corp.com to assess our external attack surface. We want to know what an attacker would find."
Mode: [MODE: RECON]
CIPHER Response:
REASON — External attack surface assessment via OSINT. Passive recon first to avoid any active scanning of production systems.
PLAN — DNS enum, subdomain discovery, certificate transparency, web tech fingerprinting, leaked credentials, exposed services.
Phase 1: Passive DNS & Subdomain Enumeration
# 1. Certificate Transparency logs — find all issued certificates
# This is passive and uses public CT log data
curl -s "https://crt.sh/?q=%.example-corp.com&output=json" | jq -r '.[].name_value' | sort -u | tee ct_subdomains.txt
# 2. Subfinder — passive subdomain enumeration from multiple sources
subfinder -d example-corp.com -all -silent | tee subfinder_results.txt
# 3. Amass passive mode — wider data source coverage
amass enum -passive -d example-corp.com -o amass_results.txt
# 4. Consolidate and deduplicate
cat ct_subdomains.txt subfinder_results.txt amass_results.txt | sort -u > all_subdomains.txt
wc -l all_subdomains.txt
Phase 2: DNS Record Analysis
# MX records — identify email infrastructure
dig MX example-corp.com +short
# SPF record — check for overly permissive email senders
dig TXT example-corp.com +short | grep spf
# DMARC policy — is email spoofing possible?
dig TXT _dmarc.example-corp.com +short
# DKIM selector discovery
# Common selectors: google, selector1 (Microsoft), default, k1 (Mailchimp)
for sel in google selector1 selector2 default k1 s1; do
echo "--- $sel ---"
dig TXT ${sel}._domainkey.example-corp.com +short
done
# NS records — identify DNS provider
dig NS example-corp.com +short
# Check for zone transfer (often misconfigured)
for ns in $(dig NS example-corp.com +short); do
echo "Testing $ns..."
dig @$ns example-corp.com AXFR
done
Phase 3: Web Technology Fingerprinting
# Httpx — probe discovered subdomains for live web services
cat all_subdomains.txt | httpx -silent -status-code -title -tech-detect -follow-redirects | tee web_services.txt
# Wappalyzer CLI for detailed technology stack
# Look for: outdated CMS versions, exposed admin panels, debug endpoints
# Check for common sensitive paths
cat all_subdomains.txt | httpx -silent -path "/.env" -mc 200 | tee exposed_env.txt
cat all_subdomains.txt | httpx -silent -path "/.git/config" -mc 200 | tee exposed_git.txt
cat all_subdomains.txt | httpx -silent -path "/debug" -mc 200 | tee exposed_debug.txt
cat all_subdomains.txt | httpx -silent -path "/server-status" -mc 200 | tee exposed_status.txt
cat all_subdomains.txt | httpx -silent -path "/actuator/health" -mc 200 | tee exposed_actuator.txt
Phase 4: Cloud Asset Discovery
# S3 bucket enumeration based on naming conventions
for prefix in example-corp examplecorp example-corp-dev example-corp-staging example-corp-backup example-corp-logs; do
aws s3 ls s3://${prefix} --no-sign-request 2>/dev/null && echo "PUBLIC: ${prefix}"
done
# Check for Azure blob storage
for prefix in examplecorp examplecorpdev examplecorpprod; do
curl -s -o /dev/null -w "%{http_code}" "https://${prefix}.blob.core.windows.net/\$web?restype=container&comp=list"
done
# Google Cloud Storage
for prefix in example-corp examplecorp; do
curl -s -o /dev/null -w "%{http_code}" "https://storage.googleapis.com/${prefix}"
done
Phase 5: Credential Exposure & Data Leaks
# Search GitHub for leaked secrets (use GitHub dorking)
# NOTE: only search for YOUR OWN organization's leaked data
# GitHub dorks:
# "example-corp.com" password
# "example-corp.com" api_key
# "example-corp.com" AWS_SECRET
# org:example-corp password filename:.env
# Check Have I Been Pwned API for domain breach exposure
curl -s "https://haveibeenpwned.com/api/v3/breaches" -H "hibp-api-key: YOUR_KEY" | jq '.[] | select(.Domain == "example-corp.com")'
# Search Pastebin/paste sites (use IntelligenceX API)
# Check Dehashed for credential leaks in historical breaches
# Shodan for exposed services
shodan search "ssl.cert.subject.cn:example-corp.com" --fields ip_str,port,product,version
shodan search "hostname:example-corp.com" --fields ip_str,port,product,version
Phase 6: Social & Employee OSINT
# LinkedIn employee enumeration via Google dorking (passive)
# site:linkedin.com/in "example-corp" "engineer"
# This reveals: org chart, technology stack (from job titles), team size
# Hunter.io — email format discovery
curl -s "https://api.hunter.io/v2/domain-search?domain=example-corp.com&api_key=YOUR_KEY" | jq '.data.pattern'
# Common patterns: {first}.{last}@, {f}{last}@, {first}@
Deliverable: Attack Surface Report
┌──────────────────────────────────────────────────────────────┐
│ EXTERNAL ATTACK SURFACE ASSESSMENT │
│ example-corp.com │
│ Date: [current date] | Classification: CONFIDENTIAL│
├──────────────┬───────────────────────────────────────────────┤
│ Subdomains │ X unique subdomains discovered │
│ discovered │ Y live web services │
│ │ Z with outdated/vulnerable software │
├──────────────┼───────────────────────────────────────────────┤
│ Exposed │ .env files: [count] │
│ sensitive │ .git repos: [count] │
│ paths │ Admin panels: [count] │
│ │ Debug endpoints: [count] │
├──────────────┼───────────────────────────────────────────────┤
│ Cloud assets │ Public S3 buckets: [count] │
│ │ Misconfigured storage: [count] │
├──────────────┼───────────────────────────────────────────────┤
│ Email │ SPF: [pass/fail/overly permissive] │
│ security │ DMARC: [none/quarantine/reject] │
│ │ DKIM: [configured/missing] │
├──────────────┼───────────────────────────────────────────────┤
│ Credential │ Breaches containing domain: [count] │
│ exposure │ GitHub leaks found: [count] │
│ │ Paste sites: [count] │
├──────────────┼───────────────────────────────────────────────┤
│ Risk rating │ CRITICAL / HIGH / MEDIUM / LOW │
└──────────────┴───────────────────────────────────────────────┘
All collection methods used: Passive only. No active scanning, no exploitation, no port scanning against production systems. Sources: CT logs, DNS queries, public APIs (Shodan, crt.sh, Hunter.io), GitHub public repos.
ATT&CK Recon Techniques covered: T1589 (Gather Victim Identity Info), T1590 (Gather Victim Network Info), T1591 (Gather Victim Org Info), T1593 (Search Open Websites/Domains), T1596 (Search Open Technical Databases)
DETECTION OPPORTUNITIES: Organizations can detect recon by monitoring CT log subscriptions (Google Certificate Transparency monitoring), DNS query logs for zone transfer attempts, and web server logs for sensitive path probing patterns (/.env, /.git, /actuator).
Scenario 10: DPIA for ML Feature Processing PII
User prompt: "Help me write a DPIA for our new ML feature. It analyzes customer support tickets to predict churn. It processes customer names, email addresses, support ticket content (which may contain sensitive info), product usage data, and purchase history. We use AWS SageMaker for training."
Mode: [MODE: PRIVACY]
CIPHER Response:
REASON — DPIA required under GDPR Art. 35 — automated decision-making using personal data that may affect the data subject (churn prediction could trigger retention offers, pricing changes, or service modifications).
PLAN — Complete DPIA following ICO/CNIL methodology: describe processing, assess necessity, identify risks, define mitigations.
DATA PROTECTION IMPACT ASSESSMENT
1. Processing Description
| Element | Detail |
|---|---|
| Controller | [Company Name], [Address] |
| DPO Contact | dpo@example-corp.com |
| Processing name | Customer Churn Prediction Model |
| Purpose | Predict likelihood of customer churn based on support interaction patterns, enabling proactive retention outreach |
| Legal basis | Art. 6(1)(f) — Legitimate interest (business retention). NOT consent — consent is not freely given if service depends on it |
| Data subjects | Existing customers who have submitted support tickets |
| Data categories | Name, email, support ticket text (free-form — may contain health info, financial details, personal circumstances), product usage metrics, purchase history, account tenure |
| Special categories | Possible — support ticket free text may inadvertently contain Art. 9 data (health conditions, political opinions, religious beliefs mentioned in context of support issues) |
| Recipients | Customer Success team (churn scores), AWS (processor — SageMaker), internal ML engineering team |
| Retention | Training data: 24 months rolling. Model predictions: 6 months. Retrained model: retained until superseded |
| International transfers | AWS SageMaker in eu-west-1 (no transfer outside EEA if configured correctly). Verify no S3 cross-region replication |
2. Necessity and Proportionality Assessment
| Principle | Assessment | Status |
|---|---|---|
| Purpose limitation (Art. 5(1)(b)) | Churn prediction is a defined, specific purpose. Risk: model outputs used for discriminatory pricing or service degradation | REQUIRES CONTROL |
| Data minimization (Art. 5(1)(c)) | Names and emails are NOT needed for model training — only for joining predictions back to accounts. Train on anonymized/pseudonymized features | REQUIRES CHANGE |
| Storage limitation (Art. 5(1)(e)) | 24-month training window is justifiable for seasonal patterns. Predictions should expire faster | ACCEPTABLE |
| Accuracy (Art. 5(1)(d)) | Model accuracy must be validated. Inaccurate predictions could lead to unwanted retention campaigns | REQUIRES CONTROL |
| Necessity test | Could the purpose be achieved with less data? Yes — aggregate usage patterns without ticket text content may suffice. If ticket text is needed, use NLP-extracted sentiment scores, not raw text | REQUIRES CHANGE |
3. Risk Assessment
RISK MATRIX
│ Negligible │ Minor │ Significant│ Maximum │
────────────────────┼────────────┼────────────┼────────────┼────────────┤
Almost certain │ │ │ │ │
Likely │ │ R4 │ R2,R5 │ │
Possible │ │ R6 │ R1,R3 │ R7 │
Unlikely │ │ │ │ │
| ID | Risk | Likelihood | Severity | Overall | GDPR Article |
|---|---|---|---|---|---|
| R1 | Support ticket text contains special category data (health, religion) processed without Art. 9 basis | Possible | Significant | HIGH | Art. 9(1) |
| R2 | Model outputs used for automated decisions with legal/significant effects without human review | Likely | Significant | HIGH | Art. 22 |
| R3 | Training data breach exposes customer support content (potentially sensitive personal narratives) | Possible | Significant | HIGH | Art. 32, 33, 34 |
| R4 | Model perpetuates bias — certain customer demographics predicted as high-churn and receive differential treatment | Likely | Minor | MEDIUM | Art. 5(1)(a) fairness |
| R5 | Customers not informed their data is used for ML churn prediction | Likely | Significant | HIGH | Art. 13, 14 |
| R6 | Pseudonymized training data re-identified through support ticket text content | Possible | Minor | MEDIUM | Recital 26 |
| R7 | Model inversion attack extracts training data from deployed model | Possible | Maximum | HIGH | Art. 32 |
4. Mitigation Measures
| Risk | Mitigation | Owner | Priority |
|---|---|---|---|
| R1 | Implement PII/PHI classifier on ticket text before ingestion. Redact or exclude tickets flagged as containing special category data. Use regex + NER model to detect health terms, financial account numbers, etc. | ML Engineering | P0 — BLOCKER |
| R2 | Churn scores are advisory only — never trigger automated actions (price changes, service modifications) without human review. Document this restriction in the model card and enforce via approval workflow | Product + Legal | P0 — BLOCKER |
| R3 | Encrypt training data at rest (AWS KMS CMK), restrict SageMaker IAM role to minimum permissions, enable VPC-only mode for SageMaker notebooks, disable internet access for training instances | Platform Engineering | P0 |
| R4 | Conduct fairness audit before production deployment. Test for disparate impact across demographic proxies (geography, language, account tier). Retrain with fairness constraints if bias detected | ML Engineering | P1 |
| R5 | Update privacy policy to disclose ML processing. Add specific mention under "How we use your data" section. For existing customers, send notification of updated privacy policy | Legal + Product | P0 — BLOCKER |
| R6 | Replace raw ticket text with extracted features (sentiment score, topic category, word count, response time) in training data. Never store raw text in the training pipeline | ML Engineering | P1 |
| R7 | Deploy model behind API with rate limiting. Do not expose raw prediction probabilities — return categorical labels (low/medium/high risk). Implement differential privacy during training (DP-SGD) | ML Engineering | P2 |
5. Technical Architecture with Privacy Controls
┌──────────────────────────────────────────────────────────────────┐
│ DATA FLOW DIAGRAM │
│ │
│ Support DB ──► PII Redaction ──► Feature ──► SageMaker ──► Model│
│ (raw text) Pipeline Extract Training Store │
│ │ │ (VPC-only) │ │
│ ▼ ▼ ▼ │
│ Redaction Pseudonymized Prediction │
│ Audit Log Feature Store API │
│ │ │ │
│ ┌────────┘ ▼ │
│ ▼ CS Dashboard │
│ Encrypted S3 (categorical │
│ (KMS CMK, labels only, │
│ bucket policy, no raw scores) │
│ access logging) │
│ │
│ ─ ─ ─ ─ ─ ─ ─ TRUST BOUNDARY ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ AWS Account (eu-west-1 only) │
│ VPC with no internet gateway on SageMaker subnet │
└──────────────────────────────────────────────────────────────────┘
6. Data Subject Rights Implementation
| Right | How it applies | Implementation |
|---|---|---|
| Access (Art. 15) | Customer can request their churn prediction score and the data used to generate it | Expose via customer data export; include model explanation |
| Erasure (Art. 17) | Customer data must be removable from training pipeline | Implement data deletion from feature store; retrain model on next cycle without deleted records |
| Object (Art. 21) | Customer can object to profiling for churn prediction | Implement opt-out flag; exclude opted-out customers from model input |
| Explanation (Art. 22 + Recital 71) | If predictions influence decisions, customer has right to meaningful information about the logic | Generate SHAP/LIME explanations per prediction; store for 6 months |
| Portability (Art. 20) | N/A — churn predictions are derived data, not provided by the data subject | Document rationale for non-applicability |
7. DPO/Supervisory Authority Consultation
Prior consultation (Art. 36) required if: residual risk remains HIGH after mitigations. With the controls above implemented, residual risk should be MEDIUM. Document this assessment and have the DPO sign off.
Review schedule: Re-assess this DPIA every 12 months, on model retraining, or when data sources change.
Approval signatures:
| Role | Name | Date | Decision |
|---|---|---|---|
| DPO | Approve / Refer to SA | ||
| Product Owner | Accept residual risk | ||
| CISO | Technical controls approved | ||
| Legal Counsel | Legal basis confirmed |
Scenario 11: SIEM False Positive Tuning
User prompt: "Our SOC keeps getting buried in false positives — 2,000+ alerts/day, 95% are noise. We're using Splunk with about 200 correlation rules. Analysts are burned out and starting to ignore alerts. How do we fix this?"
Mode: [MODE: BLUE]
CIPHER Response:
REASON — Alert fatigue is the #1 SOC killer. 95% FP rate means analysts see ~100 true positives buried in 1,900 noise alerts daily. This is a detection engineering problem, not a staffing problem.
PLAN — Triage the rule set, implement a tuning framework, redesign alert tiers, automate low-value alerts.
Step 1: Audit Your Current Rule Set (Week 1)
// Find the noisiest rules — top 20 rules generating the most alerts in last 30 days
index=notable
| stats count as alert_count dc(src) as unique_sources dc(dest) as unique_dests by rule_name
| sort -alert_count
| head 20
| eval daily_avg=round(alert_count/30, 0)
| table rule_name alert_count daily_avg unique_sources unique_dests
// For each noisy rule, calculate its true positive rate
// (requires analysts to have been dispositioning alerts)
index=notable rule_name="Suspicious PowerShell Execution"
| stats count as total,
count(eval(status="true_positive")) as true_pos,
count(eval(status="false_positive")) as false_pos,
count(eval(status="undetermined")) as undetermined
| eval tp_rate=round(true_pos/total*100, 1)
| table rule_name total true_pos false_pos undetermined tp_rate
Decision framework for each rule:
| TP Rate | Action |
|---|---|
| < 5% | Disable or rewrite from scratch |
| 5-30% | Add exclusions, refine logic, add context enrichment |
| 30-70% | Add allowlists for known-good, increase specificity |
| > 70% | Keep — this is a good rule |
Step 2: Implement Structured Tuning
Allowlist management — centralized, auditable:
// Create a lookup table for allowlisted items
// allowlist.csv: rule_name, field, value, reason, approved_by, expiry_date
| inputlookup allowlist.csv
| where expiry_date > now() // Auto-expire allowlist entries
# allowlist.csv example
rule_name,field,value,reason,approved_by,expiry_date
"Suspicious PowerShell","process_command_line","*Get-ADUser*","IT admin daily script",analyst1,2025-07-01
"Brute Force Detection","src_ip","10.0.50.22","Vulnerability scanner",analyst2,2025-04-01
"Data Exfiltration","dest_ip","44.233.12.0/24","Approved cloud backup service",analyst1,2025-06-01
// Modify rules to check against allowlist
index=sysmon EventCode=1 process_name="powershell.exe"
| lookup allowlist.csv rule_name AS "Suspicious PowerShell", field AS "process_command_line" OUTPUT reason AS allowlist_reason
| where isnull(allowlist_reason)
// Only alerts that are NOT allowlisted continue through
Step 3: Implement Alert Tiering
┌──────────────────────────────────────────────────────────────┐
│ ALERT TIER FRAMEWORK │
├────────┬─────────────────────────────────────────────────────┤
│ TIER 1 │ HIGH-FIDELITY — Immediate analyst review │
│ (P1) │ Examples: EDR alert + network beacon, honeypot │
│ │ triggered, known-bad hash, credential dumping tool │
│ │ Target: < 50/day, > 80% TP rate │
├────────┼─────────────────────────────────────────────────────┤
│ TIER 2 │ ENRICHMENT-REQUIRED — Auto-enrich, then decide │
│ (P2) │ Examples: Suspicious PowerShell (check if admin), │
│ │ anomalous logon (check if travel approved), │
│ │ new scheduled task (check if in change window) │
│ │ Target: < 200/day after auto-enrichment │
├────────┼─────────────────────────────────────────────────────┤
│ TIER 3 │ AUTOMATED — SOAR handles, analyst reviews summary │
│ (P3) │ Examples: Malware blocked by EDR, known-bad IP │
│ │ blocked by firewall, failed login < threshold, │
│ │ policy violation auto-remediated │
│ │ Target: unlimited — fully automated │
├────────┼─────────────────────────────────────────────────────┤
│ HUNT │ LOW-SIGNAL — Feed to threat hunting queue │
│ │ Examples: Anomalous process lineage, DNS entropy, │
│ │ rare binary execution, first-seen user-agent │
│ │ Not alertable — material for weekly hunting sprints │
└────────┴─────────────────────────────────────────────────────┘
Step 4: Context Enrichment (Reduces FP by 30-50%)
// Example: enrich PowerShell alerts with user context
index=sysmon EventCode=1 process_name="powershell.exe"
| lookup ad_users.csv sAMAccountName AS user OUTPUT department, title, is_admin
| lookup asset_inventory.csv hostname AS dest OUTPUT asset_criticality, asset_owner
| lookup known_scripts.csv command_line_hash AS cmd_hash OUTPUT script_name, approved
| where NOT (is_admin="true" AND approved="true")
// Now you only see PowerShell from non-admins running unapproved scripts
Critical enrichment sources to build:
- Asset inventory lookup — hostname → criticality, owner, OS, role
- Identity context — username → department, admin status, service account flag
- Known-good baselines — approved scripts, expected service accounts per host, normal network destinations
- Threat intelligence — IP/domain/hash → known-bad indicators with confidence scores
Step 5: SOAR Automation for Tier 3
# Phantom/XSOAR playbook logic — auto-close blocked malware alerts
def auto_close_blocked_malware(alert):
"""Tier 3: EDR blocked malware — verify and auto-close."""
# 1. Verify EDR actually blocked it
edr_status = query_edr(alert.endpoint, alert.process_hash)
if edr_status.action != "blocked":
# EDR didn't block — escalate to Tier 1
return escalate_to_tier1(alert, reason="EDR block not confirmed")
# 2. Check if the hash is known commodity malware (not targeted)
vt_result = query_virustotal(alert.process_hash)
if vt_result.detection_ratio < 10:
# Low detection — could be targeted, escalate
return escalate_to_tier1(alert, reason="Low VT detection, possible targeted malware")
# 3. Auto-close with disposition
close_alert(
alert_id=alert.id,
disposition="true_positive_blocked",
notes=f"Auto-closed: EDR confirmed block, VT {vt_result.detection_ratio}/70, commodity malware",
analyst="SOAR_AUTO"
)
# 4. Add to daily summary for analyst review
add_to_daily_digest(alert)
Metrics to Track
| Metric | Current | Target (90 days) |
|---|---|---|
| Alerts/day | 2,000 | < 300 (Tier 1+2) |
| TP rate | 5% | > 40% |
| MTTD (true incidents) | Unknown (buried) | < 15 min |
| MTTR | Unknown | < 4 hours |
| Analyst alerts/shift | ~250 | < 40 |
| Rules with 0% TP rate | Unknown | 0 (disabled) |
CIS Controls: Control 8 (Audit Log Management), Control 13 (Network Monitoring and Defense)
DETECTION OPPORTUNITIES: While tuning, do NOT simply delete noisy rules. First verify that the underlying behavior is truly benign. An attacker may be hiding in your noise — that is exactly why they use living-off-the-land techniques. Validate each allowlist entry with the asset owner.
Scenario 12: Secure CI/CD Pipeline Design
User prompt: "Design a secure CI/CD pipeline for our microservices. We use GitHub, build Docker images, and deploy to EKS. We've had incidents where dependencies with known CVEs made it to production."
Mode: [MODE: ARCHITECT]
CIPHER Response:
REASON — CI/CD pipeline is a high-value target — compromise here means compromising every production deployment. Supply chain attacks (SolarWinds, Codecov, XZ Utils) demonstrate the threat is real.
PLAN — Secure each phase: code commit, build, test, artifact storage, deployment, runtime verification.
Pipeline Architecture
┌──────────────────────────────────────────────────────────────────────────┐
│ SECURE CI/CD PIPELINE │
│ │
│ Developer ──► GitHub ──► Build ──► Test ──► Scan ──► Sign ──► Deploy │
│ │ │ │ │ │ │ │ │
│ ┌──┴──┐ ┌───┴───┐ ┌──┴──┐ ┌──┴──┐ ┌─┴──┐ ┌──┴──┐ ┌──┴──┐ │
│ │Pre- │ │Branch │ │Hermetic│ │Unit │ │SAST│ │Cosign│ │Kyver│ │
│ │commit│ │Protect│ │Build │ │Integ│ │DAST│ │Sign │ │no │ │
│ │hooks │ │Rules │ │ │ │E2E │ │SCA │ │SBOM │ │Admit│ │
│ └──────┘ └───────┘ └──────┘ └─────┘ │Image│ └─────┘ │Ctrl │ │
│ └─────┘ └─────┘ │
│ │
│ ── ── ── ── ── SUPPLY CHAIN TRUST BOUNDARY ── ── ── ── ── ── ── ── │
└──────────────────────────────────────────────────────────────────────────┘
Phase 1: Code Commit Security
Branch protection rules:
# GitHub CLI — configure branch protection
gh api repos/{owner}/{repo}/branches/main/protection -X PUT \
--input - << 'EOF'
{
"required_status_checks": {
"strict": true,
"contexts": ["security-scan", "unit-tests", "integration-tests"]
},
"enforce_admins": true,
"required_pull_request_reviews": {
"required_approving_review_count": 2,
"dismiss_stale_reviews": true,
"require_code_owner_reviews": true
},
"required_linear_history": true,
"allow_force_pushes": false,
"allow_deletions": false,
"required_signatures": true
}
EOF
Pre-commit hooks:
# .pre-commit-config.yaml
repos:
- repo: https://github.com/gitleaks/gitleaks
rev: v8.18.1
hooks:
- id: gitleaks # Detect secrets before they enter git history
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.5.0
hooks:
- id: detect-private-key
- id: check-added-large-files
args: ['--maxkb=500']
- repo: https://github.com/semgrep/semgrep
rev: v1.50.0
hooks:
- id: semgrep
args: ['--config', 'auto', '--error']
Phase 2: Build Security
# GitHub Actions — secure build pipeline
name: Secure Build Pipeline
on:
pull_request:
branches: [main]
push:
branches: [main]
permissions:
contents: read # Least privilege — don't grant write unless needed
packages: write
id-token: write # For OIDC-based auth (no long-lived secrets)
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0 # Full history for accurate diff-based scanning
# Pin ALL action versions by SHA, not tag (prevent tag hijacking)
- uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # v5.0.0
# 1. Secret scanning
- name: Gitleaks scan
uses: gitleaks/gitleaks-action@cb7149a9b57195b609c63e8518d2c6056677d2d0 # v2.3.3
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
# 2. SAST — static analysis
- name: Semgrep SAST
uses: semgrep/semgrep-action@713efdd345f3035192eaa63f56867b88e63e4e5d # v1
with:
config: >-
p/default
p/owasp-top-ten
p/python-sql-injection
p/docker-best-practices
# 3. Dependency scanning (SCA)
- name: Dependency audit
run: |
pip install pip-audit
pip-audit --requirement requirements.txt --strict --fix --dry-run
# 4. Build container image
- name: Build image
run: |
docker build \
--no-cache \
--label "org.opencontainers.image.source=${{ github.repositoryUrl }}" \
--label "org.opencontainers.image.revision=${{ github.sha }}" \
-t ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }} .
# 5. Container image scan
- name: Trivy image scan
uses: aquasecurity/trivy-action@0.16.1
with:
image-ref: '${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}'
format: 'sarif'
output: 'trivy-results.sarif'
severity: 'CRITICAL,HIGH'
exit-code: '1' # Fail pipeline on critical/high CVEs
# 6. Generate SBOM
- name: Generate SBOM
run: |
syft ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }} -o spdx-json > sbom.spdx.json
# 7. Sign image with Cosign (keyless — uses OIDC)
- name: Sign image
run: |
cosign sign --yes \
${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
cosign attest --yes --predicate sbom.spdx.json \
--type spdxjson \
${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
Phase 3: Secure Dockerfile
# Use specific digest, not :latest
FROM python:3.12-slim@sha256:abcdef123456... AS builder
# Non-root user
RUN groupadd -r appuser && useradd --no-log-init -r -g appuser appuser
# Install dependencies in a separate stage
COPY requirements.txt .
RUN pip install --no-cache-dir --user -r requirements.txt
# Final stage — minimal
FROM python:3.12-slim@sha256:abcdef123456...
# Copy only what's needed
COPY /root/.local /home/appuser/.local
COPY src/ /app/
# Read-only filesystem compatible
RUN mkdir -p /tmp/app && chown appuser:appuser /tmp/app
USER appuser
WORKDIR /app
ENV PATH=/home/appuser/.local/bin:$PATH
# Health check
HEALTHCHECK CMD curl -f http://localhost:8080/health || exit 1
ENTRYPOINT ["python", "-m", "app"]
Phase 4: Deployment Security
# ArgoCD with image verification
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: payment-service
spec:
source:
repoURL: https://github.com/org/k8s-manifests
targetRevision: main
path: services/payment-service
destination:
server: https://kubernetes.default.svc
namespace: production
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=false # Don't auto-create namespaces
Phase 5: Pipeline Security Hardening
# GitHub Actions — restrict workflow permissions at org level
# In org settings: Actions > General > Workflow permissions
# Set to "Read repository contents permission" (minimum)
# Restrict which actions can be used
# Settings > Actions > General > Allow select actions
# Only allow: actions/*, github/*, your-org/*
# Enable required workflows for security scanning
# Settings > Actions > General > Required workflows
Secrets management in CI/CD:
# NEVER store secrets in environment variables in workflow files
# Use OIDC for AWS authentication (no static keys)
# GitHub Actions OIDC provider setup:
aws iam create-open-id-connect-provider \
--url https://token.actions.githubusercontent.com \
--thumbprint-list 6938fd4d98bab03faadb97b34396831e3780aea1 \
--client-id-list sts.amazonaws.com
Supply Chain Security Checklist
| Control | Tool | Blocks Pipeline? |
|---|---|---|
| Secret detection | Gitleaks | Yes — PR cannot merge |
| SAST | Semgrep | Yes — critical findings |
| SCA / dependency audit | pip-audit, npm audit | Yes — critical CVEs |
| Container image scan | Trivy | Yes — critical/high CVEs |
| Image signing | Cosign | Yes — unsigned = rejected |
| SBOM generation | Syft | No — informational |
| Admission control | Kyverno | Yes — rejects unsigned images |
| License compliance | Trivy license scan | No — warning only |
| IaC scanning | Checkov/tfsec | Yes — critical misconfigs |
ATT&CK: T1195.002 (Supply Chain Compromise: Compromise Software Supply Chain), T1199 (Trusted Relationship) NIST SSDF: PO.3 (Secure Software Development), PS.1 (Protect Software), PW.4 (Verify Third-Party Components) SLSA Level 3 requirements: Hermetic builds, signed provenance, source and build integrity verification
Scenario 13: Suspicious Process Investigation on Linux
User prompt: "We found a suspicious process on a Linux server — it's called 'kworker' but it's running as a regular user, not as a kernel thread. It's making outbound connections to an IP we don't recognize. The server handles payment processing."
Mode: [MODE: INCIDENT]
CIPHER Response:
REASON — Active compromise indicator. Legitimate kworker processes are kernel threads (PPID 2, enclosed in brackets [kworker/...]). A user-space process named 'kworker' is almost certainly malware masquerading as a kernel thread. Payment processing server = critical asset.
PLAN — Triage > Collect volatile evidence > Contain > Full forensic analysis
IMMEDIATE TRIAGE (Do This Now)
Verify the process is suspicious:
# 1. List ALL kworker processes — kernel threads vs. impersonators
ps aux | grep kworker
# Legitimate kernel threads show as: [kworker/0:1] (in brackets, owned by root)
# Suspicious: kworker (no brackets, owned by regular user)
# 2. Get full details on the suspicious PID
ls -la /proc/<PID>/exe
# This shows the actual binary path — kernel threads have no exe link
# If it resolves to a file like /tmp/.kworker or /dev/shm/kworker — confirmed malware
readlink -f /proc/<PID>/exe
# Example malicious output: /tmp/.cache/.kworker (deleted)
# 3. Check process tree — who spawned it?
pstree -psa <PID>
# Legitimate: kthreadd(2) -> kworker
# Malicious: bash(12345) -> kworker OR cron -> sh -> kworker
# 4. Check network connections
ss -tnp | grep <PID>
# Note the destination IP, port, and state
# Also check:
cat /proc/<PID>/net/tcp
# Decode hex IP addresses if needed
# 5. Check open files
ls -la /proc/<PID>/fd/
lsof -p <PID>
# 6. Check the command line and environment
cat /proc/<PID>/cmdline | tr '\0' ' '
cat /proc/<PID>/environ | tr '\0' '\n'
# Look for: C2 URLs, crypto wallet addresses, API keys
# 7. Get the process creation time
stat /proc/<PID>
# This gives you the approximate compromise time
EVIDENCE COLLECTION (Before Containment)
# Create evidence directory on separate mounted drive or network share
mkdir -p /evidence/$(hostname)/$(date +%Y%m%d)
EVIDENCE_DIR="/evidence/$(hostname)/$(date +%Y%m%d)"
# 1. Memory dump of the process
gcore -o ${EVIDENCE_DIR}/kworker_mem <PID>
# Or:
cat /proc/<PID>/maps > ${EVIDENCE_DIR}/proc_maps.txt
cp /proc/<PID>/mem ${EVIDENCE_DIR}/proc_mem.bin 2>/dev/null
# 2. Copy the binary (even if "deleted" — still accessible via /proc)
cp /proc/<PID>/exe ${EVIDENCE_DIR}/malware_sample
# 3. Hash everything
sha256sum ${EVIDENCE_DIR}/malware_sample > ${EVIDENCE_DIR}/hashes.txt
md5sum ${EVIDENCE_DIR}/malware_sample >> ${EVIDENCE_DIR}/hashes.txt
# 4. Full memory capture (if possible — requires LiME module for the running kernel)
insmod /path/to/lime.ko "path=${EVIDENCE_DIR}/full_memory.lime format=lime"
# 5. Network capture — 60 seconds of traffic from the suspicious process
timeout 60 tcpdump -i any -w ${EVIDENCE_DIR}/network_capture.pcap "host <SUSPICIOUS_IP>"
# 6. Collect system logs
cp /var/log/auth.log ${EVIDENCE_DIR}/
cp /var/log/syslog ${EVIDENCE_DIR}/
cp /var/log/cron.log ${EVIDENCE_DIR}/
journalctl --since "7 days ago" > ${EVIDENCE_DIR}/journalctl.txt
# 7. Check for persistence mechanisms
crontab -l -u $(stat -c '%U' /proc/<PID>/exe) > ${EVIDENCE_DIR}/user_crontab.txt 2>&1
cat /etc/crontab > ${EVIDENCE_DIR}/system_crontab.txt
ls -la /etc/cron.d/ > ${EVIDENCE_DIR}/cron_d.txt
systemctl list-unit-files --type=service > ${EVIDENCE_DIR}/services.txt
cat /etc/rc.local > ${EVIDENCE_DIR}/rc_local.txt 2>/dev/null
ls -la ~/.bashrc ~/.profile ~/.bash_profile > ${EVIDENCE_DIR}/shell_profiles.txt 2>/dev/null
cat ~/.bashrc > ${EVIDENCE_DIR}/bashrc.txt
# 8. Check for other compromised processes
# Find all processes with network connections to unusual destinations
ss -tnp | awk '{print $5, $6}' | sort -u > ${EVIDENCE_DIR}/all_connections.txt
# 9. Check for rootkit indicators
# Modified system binaries
rpm -Va 2>/dev/null > ${EVIDENCE_DIR}/rpm_verify.txt # RPM-based systems
debsums -c 2>/dev/null > ${EVIDENCE_DIR}/debsums.txt # Debian-based systems
# 10. Timeline — recently modified files
find / -mtime -7 -type f -not -path "/proc/*" -not -path "/sys/*" 2>/dev/null | head -1000 > ${EVIDENCE_DIR}/recent_files.txt
CONTAINMENT
# 1. Network isolation — block the C2 IP immediately
iptables -I OUTPUT -d <SUSPICIOUS_IP> -j DROP
iptables -I INPUT -s <SUSPICIOUS_IP> -j DROP
# 2. If the process is actively exfiltrating, kill it NOW
kill -STOP <PID> # SIGSTOP first (freezes without terminating — preserves state)
# Then after evidence collection:
kill -9 <PID>
# 3. Remove persistence
# Based on what you found in evidence collection — remove cron entries,
# systemd units, shell profile modifications
# 4. Block at perimeter firewall (not just host-level)
# Notify network team to block <SUSPICIOUS_IP> at the firewall
# 5. Rotate credentials — assume all credentials on this server are compromised
# Payment processing server = rotate:
# - Database credentials
# - API keys for payment gateway
# - TLS certificates
# - SSH keys
# - Service account tokens
ANALYSIS
# 1. Check the malware hash against threat intel
# Submit sha256 to VirusTotal
curl -s "https://www.virustotal.com/api/v3/files/<SHA256>" \
-H "x-apikey: <VT_API_KEY>" | jq '.data.attributes.last_analysis_stats'
# 2. Static analysis of the binary
file ${EVIDENCE_DIR}/malware_sample
strings ${EVIDENCE_DIR}/malware_sample | grep -iE 'http|socket|connect|wget|curl|/bin/sh|password|key|token'
# 3. Check the C2 IP reputation
curl -s "https://www.virustotal.com/api/v3/ip_addresses/<SUSPICIOUS_IP>" \
-H "x-apikey: <VT_API_KEY>" | jq '.data.attributes.last_analysis_stats'
# 4. Determine initial access vector
# Check auth.log for SSH brute force or unauthorized access
grep "Accepted\|Failed" /var/log/auth.log | tail -100
# Check web server logs if applicable
grep -i "POST\|shell\|exec\|eval\|system" /var/log/nginx/access.log | tail -100
Escalation Decision
| Condition | Action |
|---|---|
| Payment data confirmed accessed | Engage PCI QSA, notify payment processor, legal counsel for breach notification |
| Other servers show same IOCs | Assume lateral movement — isolate the segment, activate full IR plan |
| Process has been running > 7 days | Assume data exfiltration occurred — scope based on server access to data |
| Root access achieved | Assume rootkit — system cannot be trusted. Image disk, rebuild from scratch |
ATT&CK:
- T1036.004 — Masquerading: Masquerade Task or Service
- T1059.004 — Command and Scripting Interpreter: Unix Shell
- T1071.001 — Application Layer Protocol: Web Protocols (if C2 over HTTP)
- T1053.003 — Scheduled Task/Job: Cron
DETECTION OPPORTUNITIES: Create a detection rule for user-space processes named to match kernel thread names. Monitor for processes where /proc/<PID>/exe resolves to /tmp, /dev/shm, /var/tmp, or contains (deleted). Alert on outbound connections from payment processing servers to non-allowlisted IPs.
Scenario 14: Phishing Simulation Campaign
User prompt: "Create a phishing simulation campaign for our security awareness training. We have 500 employees across engineering, sales, marketing, HR, and finance. We want realistic but ethical scenarios."
Mode: [MODE: RED] with PRIVACY background layer
CIPHER Response:
REASON — Phishing simulation is the most effective way to measure and improve human-layer security. Must be realistic enough to test behavior but ethical — no punitive consequences, learning-focused.
PLAN — Design campaign structure, create 5 scenario templates graduated by difficulty, define metrics, build the feedback loop.
Campaign Structure
Platform options: GoPhish (open-source, self-hosted), KnowBe4, Proofpoint Security Awareness, Cofense PhishMe.
For self-hosted (GoPhish):
# Deploy GoPhish
docker run -d --name gophish \
-p 3333:3333 \ # Admin panel
-p 8080:80 \ # Phishing landing pages
-v gophish-data:/opt/gophish/data \
gophish/gophish
# Access admin: https://localhost:3333
# Default creds in docker logs: docker logs gophish | grep password
Scenario Templates (Graduated Difficulty)
Level 1 — Easy to Spot (Baseline measurement):
From: IT Helpdesk <it.support@examp1e-corp.com> ← Note: typo in domain
Subject: Your Password Expires in 24 Hours
Dear Employee,
Your corporate password will expire in 24 hours. Click below to update
your password immediately to avoid losing access to all systems.
[UPDATE PASSWORD NOW] → landing page
Red flags: urgency, external domain with typo, generic greeting
Level 2 — Moderate (Tests attention to detail):
From: Microsoft 365 <no-reply@microsoft-365-admin.com> ← Lookalike domain
Subject: Action Required: Unusual Sign-in Activity on Your Account
We detected a sign-in to your account from a new device:
Location: Moscow, Russia
Device: Chrome on Linux
Time: [current date] 3:42 AM
If this wasn't you, secure your account immediately:
[Review Recent Activity] → credential harvesting page
Red flags: external domain (not microsoft.com), creates fear, urgency
Level 3 — Hard (Spear-phish with context):
From: [Actual CEO Name] <[ceo-name]@example-corp.net> ← Similar but wrong domain
Subject: Q4 Compensation Review — Confidential
Hi [First Name],
Attached is the Q4 compensation adjustment spreadsheet for your review
before the board meeting Thursday. Please review your team's allocations
and confirm by EOD.
This is confidential — please do not forward.
[View Spreadsheet] → macro-enabled document or credential page
Red flags: wrong domain (.net vs .com), unusual request from CEO directly,
"confidential" pressure tactic, attachment/link
Level 4 — Advanced (Business process exploitation):
From: [Real vendor name] <invoicing@[vendor-lookalike].com>
Subject: RE: Invoice #INV-2024-3847 — Updated Banking Details
Hi [First Name],
Following our phone conversation, please find our updated banking
details for future payments. Our bank has changed due to a corporate
restructuring.
New details:
Bank: [plausible bank]
Account: [number]
Routing: [number]
Please update in your AP system before processing the pending invoice.
[Updated W-9 Form.pdf] → landing page
Red flags: unsolicited banking change, references non-existent phone call,
targets finance/AP specifically
Level 5 — Expert (Multi-channel, highly targeted):
Pre-text: Leave a voicemail for the target referencing a "document"
Then send the email.
From: [target's actual manager] <[manager-name]@example-corp.com>
← Spoofed display name, different reply-to
Subject: FW: Contract draft for [real project name]
[First Name],
As discussed on the call, here's the contract draft. Legal needs your
review by end of week.
[contract-draft-v3-FINAL.docx] → landing page mimicking SharePoint
Red flags: reply-to differs from From, attachment via link not attachment,
relies on social engineering from voicemail to create familiarity
This level tests whether employees verify sender identity across channels.
Landing Page Design
<!-- GoPhish landing page — credential harvester that immediately redirects to training -->
<!-- This captures: did they enter credentials? But NEVER stores real passwords -->
<html>
<body>
<form method="POST" action="">
<h2>Sign in to Microsoft 365</h2>
<!-- GoPhish tracks form submission but can be configured to NOT store passwords -->
<input type="email" name="email" placeholder="Email">
<input type="password" name="password" placeholder="Password">
<button type="submit">Sign In</button>
</form>
</body>
</html>
<!-- After submission, redirect to training page explaining what happened -->
<!-- GoPhish: Settings > Landing Page > Redirect to: https://training.example-corp.com/phishing-caught -->
Campaign Execution Plan
| Week | Target Group | Scenario Level | Goal |
|---|---|---|---|
| 1 | All employees | Level 1 | Establish baseline click rate |
| 3 | Engineering | Level 3 | Test technical employees |
| 3 | Finance/HR | Level 4 | Test business process attacks |
| 5 | Sales/Marketing | Level 2 | Moderate difficulty |
| 7 | C-suite + Directors | Level 5 | Test high-value targets |
| 10 | All employees | Level 2 | Measure improvement from baseline |
Metrics and Reporting
┌─────────────────────────────────────────────────────────┐
│ PHISHING SIMULATION METRICS │
├──────────────────┬──────────────────────────────────────┤
│ Email open rate │ % who opened the phishing email │
│ Click rate │ % who clicked the phishing link │
│ Submit rate │ % who entered credentials │
│ Report rate │ % who reported to security/IT │
│ Time to report │ Average time from delivery to report │
│ Repeat offenders │ Users who clicked in multiple rounds │
└──────────────────┴──────────────────────────────────────┘
Industry benchmarks (KnowBe4 2024 data):
Baseline click rate: ~30-35%
After 12 months training: ~5-15%
Target report rate: > 60%
Ethical Guidelines — Non-Negotiable
- No punitive consequences. Clicking is a learning opportunity, not a fireable offense. Naming and shaming destroys trust and makes people hide incidents instead of reporting them.
- Immediate training redirect. When someone clicks, they immediately see a page explaining: what happened, what the red flags were, and how to report real phishing.
- Inform leadership but anonymize individual results. Report by department, not by name, unless someone requires additional 1:1 coaching.
- Do NOT send phishing simulations that could cause genuine distress — no fake termination notices, no fake health scares, no fake legal threats. These are counterproductive and may violate labor laws.
- Coordinate with HR and Legal before launching. Some jurisdictions require employee notification that simulations will occur (not when, but that they will).
Privacy note: The simulation collects employee behavioral data (who clicked, when, from what device). Under GDPR Art. 6(1)(f), this is processable under legitimate interest, but you should include it in your employee privacy notice and ensure the data is used only for training purposes with appropriate retention limits (delete individual-level results after 12 months).
ATT&CK: T1566.001 (Phishing: Spearphishing Attachment), T1566.002 (Phishing: Spearphishing Link), T1534 (Internal Spearphishing)
DETECTION OPPORTUNITIES: Use phishing simulations to validate your email security controls: Does your email gateway flag the lookalike domains? Does DMARC catch the spoofed headers? Do DLP rules detect credential submission to external sites? Each simulation is also a detection test.
Scenario 15: Secrets Management in Cloud Infrastructure
User prompt: "How do we securely store and rotate secrets in our cloud infrastructure? We're on AWS, have about 60 microservices, and I just found hardcoded API keys in three repos."
Mode: [MODE: ARCHITECT]
CIPHER Response:
REASON — Hardcoded secrets are a critical finding (CWE-798). Need immediate remediation plus architectural solution to prevent recurrence. 60 microservices = need centralized, scalable secrets management.
PLAN — Triage the hardcoded secrets, deploy secrets management, implement rotation, add preventive controls.
IMMEDIATE: Rotate the Compromised Secrets
Those hardcoded keys in git history are compromised — they exist in every clone, fork, and CI/CD cache. Rotating is mandatory.
# 1. Identify all hardcoded secrets across all repos
# Use Trufflehog (scans git history, not just current files)
trufflehog git https://github.com/your-org/repo1 --json | jq '.RawV2'
trufflehog git https://github.com/your-org/repo2 --json | jq '.RawV2'
# Or scan all org repos at once:
trufflehog github --org=your-org --json > all_secrets.json
# 2. For each found secret:
# a. Rotate/revoke the credential at the provider (AWS, Stripe, etc.)
# b. Check CloudTrail/access logs for unauthorized usage during exposure window
# c. Move the new credential to AWS Secrets Manager
# DO NOT try to rewrite git history to remove secrets — it's unreliable
# (the secret has been in every clone since it was committed)
# Instead: rotate the secret, making the exposed one worthless
Architecture: AWS Secrets Manager + IAM Roles
┌───────────────────────────────────────────────────────────┐
│ SECRETS ARCHITECTURE │
│ │
│ Microservice ──IRSA──► AWS Secrets Manager │
│ │ │ │
│ │ (no secrets in env │ Stores: │
│ │ vars, no config │ - DB credentials │
│ │ files, no K8s │ - API keys │
│ │ Secrets) │ - TLS private keys │
│ │ │ - Encryption keys │
│ ▼ │ │
│ Application reads │ Rotation: │
│ secret at runtime ◄──────────│ - Lambda-based auto-rotate │
│ via SDK call │ - 30/60/90 day schedules │
│ │ - Immediate on-demand │
│ │
│ ── ── ── ACCESS CONTROL ── ── ── ── ── ── ── ── ── ── │
│ Each service's IAM role can ONLY access its own secrets │
│ Resource-level permissions: arn:...:secret:svc-name/* │
└───────────────────────────────────────────────────────────┘
Implementation
1. Store secrets in AWS Secrets Manager:
# Create a secret
aws secretsmanager create-secret \
--name "production/payment-service/db" \
--description "Payment service database credentials" \
--secret-string '{"username":"payment_svc","password":"GENERATED_RANDOM_PASSWORD","host":"payment-db.cluster-xxx.us-east-1.rds.amazonaws.com","port":5432,"dbname":"payments"}' \
--kms-key-id alias/secrets-key \
--tags Key=Service,Value=payment-service Key=Environment,Value=production
2. IAM policy — least privilege per service:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowReadOwnSecrets",
"Effect": "Allow",
"Action": [
"secretsmanager:GetSecretValue"
],
"Resource": "arn:aws:secretsmanager:us-east-1:ACCOUNT:secret:production/payment-service/*",
"Condition": {
"StringEquals": {
"aws:ResourceTag/Service": "payment-service"
}
}
},
{
"Sid": "DenyAllOtherSecrets",
"Effect": "Deny",
"Action": "secretsmanager:GetSecretValue",
"NotResource": "arn:aws:secretsmanager:us-east-1:ACCOUNT:secret:production/payment-service/*"
}
]
}
3. Application code — retrieve secrets at runtime:
import json
import boto3
from functools import lru_cache
from botocore.exceptions import ClientError
class SecretsClient:
def __init__(self, region: str = "us-east-1"):
self._client = boto3.client("secretsmanager", region_name=region)
@lru_cache(maxsize=32)
def get_secret(self, secret_name: str) -> dict:
"""Retrieve and parse a secret from AWS Secrets Manager.
Uses LRU cache to avoid repeated API calls. Cache is invalidated
on rotation by restarting the pod (via rotation Lambda webhook).
"""
try:
response = self._client.get_secret_value(SecretId=secret_name)
return json.loads(response["SecretString"])
except ClientError as e:
if e.response["Error"]["Code"] == "ResourceNotFoundException":
raise ValueError(f"Secret {secret_name} not found") from e
raise
def get_db_connection_string(self, service_name: str, env: str = "production") -> str:
secret = self.get_secret(f"{env}/{service_name}/db")
return (
f"postgresql://{secret['username']}:{secret['password']}"
f"@{secret['host']}:{secret['port']}/{secret['dbname']}"
)
# Usage
secrets = SecretsClient()
db_url = secrets.get_db_connection_string("payment-service")
4. Automatic rotation — Lambda-based:
# Lambda rotation function for RDS credentials
import boto3
import json
import string
import secrets as python_secrets
def lambda_handler(event, context):
"""AWS Secrets Manager rotation Lambda for RDS PostgreSQL."""
step = event["Step"]
secret_arn = event["SecretId"]
token = event["ClientRequestToken"]
sm_client = boto3.client("secretsmanager")
if step == "createSecret":
# Generate new password
current = json.loads(
sm_client.get_secret_value(SecretId=secret_arn, VersionStage="AWSCURRENT")["SecretString"]
)
alphabet = string.ascii_letters + string.digits + "!@#$%^&*"
new_password = "".join(python_secrets.choice(alphabet) for _ in range(40))
current["password"] = new_password
sm_client.put_secret_value(
SecretId=secret_arn,
ClientRequestToken=token,
SecretString=json.dumps(current),
VersionStages=["AWSPENDING"]
)
elif step == "setSecret":
# Update the password in RDS
pending = json.loads(
sm_client.get_secret_value(SecretId=secret_arn, VersionStage="AWSPENDING", VersionId=token)["SecretString"]
)
current = json.loads(
sm_client.get_secret_value(SecretId=secret_arn, VersionStage="AWSCURRENT")["SecretString"]
)
# Connect to RDS with current password, change to new password
conn = psycopg2.connect(
host=current["host"], port=current["port"],
user=current["username"], password=current["password"],
dbname=current["dbname"]
)
with conn.cursor() as cur:
cur.execute(
"ALTER USER %s WITH PASSWORD %s",
(pending["username"], pending["password"])
)
conn.commit()
conn.close()
elif step == "testSecret":
# Verify new password works
pending = json.loads(
sm_client.get_secret_value(SecretId=secret_arn, VersionStage="AWSPENDING", VersionId=token)["SecretString"]
)
conn = psycopg2.connect(
host=pending["host"], port=pending["port"],
user=pending["username"], password=pending["password"],
dbname=pending["dbname"]
)
conn.close()
elif step == "finishSecret":
sm_client.update_secret_version_stage(
SecretId=secret_arn, VersionStage="AWSCURRENT",
MoveToVersionId=token, RemoveFromVersionId=_get_current_version(sm_client, secret_arn)
)
# Enable rotation
aws secretsmanager rotate-secret \
--secret-id "production/payment-service/db" \
--rotation-lambda-arn arn:aws:lambda:us-east-1:ACCOUNT:function:rds-rotation \
--rotation-rules '{"AutomaticallyAfterDays": 30}'
Prevention: Stop Secrets from Entering Code
# .pre-commit-config.yaml — catches secrets before they enter git
repos:
- repo: https://github.com/gitleaks/gitleaks
rev: v8.18.1
hooks:
- id: gitleaks
- repo: https://github.com/Yelp/detect-secrets
rev: v1.4.0
hooks:
- id: detect-secrets
args: ['--baseline', '.secrets.baseline']
# GitHub organization-level: enable secret scanning and push protection
# Settings > Code security and analysis > Secret scanning: Enable
# Settings > Code security and analysis > Push protection: Enable
# This blocks pushes containing detected secrets patterns (API keys, tokens)
Secrets Hygiene Checklist
| Practice | Implementation |
|---|---|
| No secrets in code | Pre-commit hooks + GitHub push protection |
| No secrets in env vars | Use SDK-based retrieval from Secrets Manager |
| No secrets in K8s Secrets | Use External Secrets Operator or CSI driver |
| No secrets in CI/CD config | Use OIDC federation for AWS auth in GitHub Actions |
| Automatic rotation | Lambda rotation every 30 days for DB, 90 for API keys |
| Audit trail | CloudTrail logs every GetSecretValue call |
| Alerting | CloudWatch alarm on failed GetSecretValue (may indicate stolen role) |
| Least privilege | Each service can only access its own secrets |
ATT&CK: T1552.001 (Unsecured Credentials: Credentials In Files), T1552.005 (Cloud Instance Metadata API) CIS Controls: Control 3.10 (Encrypt sensitive data at rest), Control 16 (Application Software Security)
Scenario 16: Docker Compose Security Review
User prompt: "Review this Docker Compose file for security issues."
version: "3.8"
services:
webapp:
image: myapp:latest
ports:
- "80:8080"
- "443:8443"
environment:
- DB_PASSWORD=supersecret123
- JWT_SECRET=mysecretjwtkey2024
- AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
- AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
volumes:
- /var/run/docker.sock:/var/run/docker.sock
- ./data:/app/data
privileged: true
network_mode: host
restart: always
db:
image: postgres:latest
ports:
- "5432:5432"
environment:
- POSTGRES_PASSWORD=supersecret123
- POSTGRES_USER=admin
volumes:
- pgdata:/var/lib/postgresql/data
redis:
image: redis:latest
ports:
- "6379:6379"
adminer:
image: adminer
ports:
- "8080:8080"
restart: always
volumes:
pgdata:
Mode: [MODE: ARCHITECT] with RED background layer
CIPHER Response:
Findings
[FINDING-001]
Severity : Critical
CWE : CWE-798 (Use of Hard-coded Credentials)
ATT&CK : T1552.001 (Unsecured Credentials: Credentials In Files)
Location : webapp.environment — DB_PASSWORD, JWT_SECRET, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
Description: Four credentials are hardcoded in plaintext in the compose file.
This file is likely committed to version control, exposing secrets
to everyone with repo access and in git history permanently.
Impact : AWS account compromise (the AWS keys grant whatever permissions
the associated IAM user has), database access, JWT token forgery.
Remediation: Use Docker secrets, .env file (gitignored), or external secrets manager.
Rotate ALL four credentials immediately — they are compromised.
[FINDING-002]
Severity : Critical
CWE : CWE-250 (Execution with Unnecessary Privileges)
ATT&CK : T1611 (Escape to Host)
Location : webapp — privileged: true
Description: Privileged mode disables ALL container security boundaries.
The container runs with full root capabilities on the host,
can load kernel modules, access all devices, and trivially
escape to the host OS.
Impact : Container escape → full host compromise. If this container is
internet-facing (it is, on port 80/443), any webapp vulnerability
equals host root.
Remediation: Remove "privileged: true". Add only specific capabilities needed:
cap_add: [NET_BIND_SERVICE] (if binding <1024 ports).
Better: bind to high port and use a reverse proxy.
[FINDING-003]
Severity : Critical
CWE : CWE-269 (Improper Privilege Management)
ATT&CK : T1611 (Escape to Host)
Location : webapp.volumes — /var/run/docker.sock:/var/run/docker.sock
Description: Mounting the Docker socket into a container grants full control
over the Docker daemon. An attacker can create new privileged
containers, mount the host filesystem, and escape trivially:
docker run -v /:/host --privileged alpine chroot /host
Impact : Full host compromise. Equivalent to host root access.
Remediation: Remove the socket mount. If the app needs to manage containers,
use a Docker API proxy with restricted permissions (e.g., Tecnativa
docker-socket-proxy) or redesign to not require Docker access.
[FINDING-004]
Severity : High
CWE : CWE-668 (Exposure of Resource to Wrong Sphere)
ATT&CK : T1190 (Exploit Public-Facing Application)
Location : webapp — network_mode: host
Description: Host networking mode exposes all host network interfaces and
ports to the container, and all container ports directly on the
host. This bypasses Docker's network isolation, iptables rules,
and makes the container indistinguishable from a host process
on the network.
Impact : Container can access all host services (127.0.0.1), ARP spoof,
sniff network traffic. Combined with privileged mode, this is
essentially running the app directly on the host.
Remediation: Remove network_mode: host. Use the default bridge network or
a custom network with explicit port mappings.
[FINDING-005]
Severity : High
CWE : CWE-668 (Exposure of Resource to Wrong Sphere)
Location : db — ports: "5432:5432"
Description: PostgreSQL is exposed on 0.0.0.0:5432 — accessible from any
network interface, including the internet if this host has a
public IP. Database ports should never be publicly exposed.
Impact : Direct database access from the internet. Combined with the
weak password "supersecret123", brute force or credential
stuffing is trivial.
Remediation: Remove the ports mapping entirely. The webapp can reach the
database via the Docker internal network without port exposure.
[FINDING-006]
Severity : High
CWE : CWE-668 (Exposure of Resource to Wrong Sphere)
Location : redis — ports: "6379:6379", no authentication
Description: Redis is exposed on 0.0.0.0:6379 with no password. Redis has
no authentication by default. Anyone can connect and execute
arbitrary commands, including CONFIG SET to write files to disk
(a known RCE technique).
Impact : Unauthenticated data access, RCE via Redis CONFIG SET exploitation.
Remediation: Remove port exposure. Add requirepass. Use ACLs (Redis 6+).
[FINDING-007]
Severity : High
CWE : CWE-668 (Exposure of Resource to Wrong Sphere)
Location : adminer — ports: "8080:8080"
Description: Adminer (database management UI) is exposed on 0.0.0.0:8080
with no authentication wrapper. It provides a web interface to
any database it can reach — including the PostgreSQL instance
with hardcoded credentials.
Impact : Unauthenticated database administration from the internet.
Complete data breach, data modification, data destruction.
Remediation: Remove from production entirely. If needed for development,
bind to localhost only: "127.0.0.1:8080:8080" and add
authentication (nginx basic auth or VPN-only access).
[FINDING-008]
Severity : Medium
CWE : CWE-1104 (Use of Unmaintained Third Party Components)
Location : All services — using :latest tags
Description: All images use :latest tags which are mutable. The same tag
can point to different images over time. This means:
1. Builds are not reproducible
2. A compromised registry can push malicious :latest
3. You cannot audit which version is running
Impact : Supply chain risk, unpredictable deployments, inability to
roll back to a known-good version.
Remediation: Pin images by version AND digest:
postgres:16.2@sha256:abc123...
[FINDING-009]
Severity : Medium
Location : No healthchecks defined on any service
Description: Without healthchecks, Docker cannot detect when a service is
unhealthy and restart it. For security, this means a compromised
service that stops responding to legitimate requests will remain
running indefinitely.
Remediation: Add healthcheck to each service.
[FINDING-010]
Severity : Medium
Location : No resource limits defined
Description: No CPU or memory limits on any container. A compromised
container can consume all host resources (DoS) or a
cryptominer can use 100% CPU undetected.
Remediation: Add deploy.resources.limits to each service.
Remediated Docker Compose
version: "3.8"
services:
webapp:
image: myapp:1.2.3@sha256:abc123def456...
ports:
- "127.0.0.1:8080:8080" # Behind reverse proxy, not direct
env_file:
- .env # Secrets in .env file (gitignored)
# NO privileged, NO docker.sock, NO host network
security_opt:
- no-new-privileges:true
read_only: true
tmpfs:
- /tmp:size=100M
user: "10001:10001"
deploy:
resources:
limits:
cpus: '1.0'
memory: 512M
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
interval: 30s
timeout: 5s
retries: 3
networks:
- frontend
- backend
restart: unless-stopped
depends_on:
db:
condition: service_healthy
db:
image: postgres:16.2@sha256:def456...
# NO ports exposed — only accessible via internal network
environment:
POSTGRES_PASSWORD_FILE: /run/secrets/db_password
POSTGRES_USER: payment_svc
POSTGRES_DB: payments
secrets:
- db_password
volumes:
- pgdata:/var/lib/postgresql/data
security_opt:
- no-new-privileges:true
user: "999:999" # postgres user
deploy:
resources:
limits:
cpus: '2.0'
memory: 1G
healthcheck:
test: ["CMD-SHELL", "pg_isready -U payment_svc -d payments"]
interval: 10s
timeout: 5s
retries: 5
networks:
- backend
restart: unless-stopped
redis:
image: redis:7.2@sha256:ghi789...
# NO ports exposed
command: >
redis-server
--requirepass $${REDIS_PASSWORD}
--maxmemory 256mb
--maxmemory-policy allkeys-lru
env_file:
- .env
security_opt:
- no-new-privileges:true
read_only: true
user: "999:999"
deploy:
resources:
limits:
cpus: '0.5'
memory: 300M
healthcheck:
test: ["CMD", "redis-cli", "-a", "$${REDIS_PASSWORD}", "ping"]
interval: 10s
timeout: 3s
retries: 3
networks:
- backend
restart: unless-stopped
# Adminer REMOVED from production
# Use it only in development with: docker compose -f docker-compose.yml -f docker-compose.dev.yml up
reverse-proxy:
image: nginx:1.25@sha256:jkl012...
ports:
- "80:80"
- "443:443"
volumes:
- ./nginx.conf:/etc/nginx/nginx.conf:ro
- ./certs:/etc/nginx/certs:ro
security_opt:
- no-new-privileges:true
read_only: true
tmpfs:
- /var/cache/nginx
- /var/run
deploy:
resources:
limits:
cpus: '0.5'
memory: 128M
networks:
- frontend
restart: unless-stopped
secrets:
db_password:
file: ./secrets/db_password.txt # Or use external secrets manager
volumes:
pgdata:
driver: local
networks:
frontend:
driver: bridge
backend:
driver: bridge
internal: true # No external access — only inter-container
# .env (gitignored)
DB_PASSWORD=<generated-random-64-char>
JWT_SECRET=<generated-random-64-char>
REDIS_PASSWORD=<generated-random-32-char>
# NO AWS credentials — use IAM instance roles or IRSA instead
# .gitignore — MUST include
.env
secrets/
*.pem
*.key
DETECTION OPPORTUNITIES: Monitor for Docker socket access from containers (auditd -w /var/run/docker.sock), privileged container creation, and containers running as root. Deploy Falco for runtime container security monitoring.
Scenario 17: Sigma Detection Rules for Critical Use Cases
User prompt: "Help me set up Sigma rules for our critical detection use cases. We need coverage for: credential dumping, command-and-control beaconing, data exfiltration, and privilege escalation on Windows."
Mode: [MODE: PURPLE]
CIPHER Response:
REASON — Detection engineering across four high-priority use cases. Need rules that are production-ready with tuning guidance, not academic exercises.
PLAN — One or two rules per use case, with conversion commands for the operator's SIEM, plus tuning notes.
Use Case 1: Credential Dumping (T1003)
title: LSASS Memory Access via Suspicious Process
id: 3a4b5c6d-7e8f-9a0b-1c2d-3e4f5a6b7c8d
status: stable
description: >
Detects processes accessing lsass.exe memory, which is the primary
method for extracting credentials from Windows. Covers Mimikatz,
procdump, comsvcs.dll MiniDump, and direct API calls.
references:
- https://attack.mitre.org/techniques/T1003/001/
logsource:
category: process_access
product: windows
detection:
selection:
TargetImage|endswith: '\lsass.exe'
GrantedAccess|contains:
- '0x1010' # PROCESS_QUERY_LIMITED_INFORMATION + PROCESS_VM_READ
- '0x1410' # + PROCESS_QUERY_INFORMATION
- '0x1438' # Full dump access
- '0x1F0FFF' # PROCESS_ALL_ACCESS
- '0x1FFFFF' # PROCESS_ALL_ACCESS (alternate)
filter_legitimate:
SourceImage|endswith:
- '\MsMpEng.exe' # Windows Defender
- '\csrss.exe'
- '\lsass.exe' # Self-access
- '\svchost.exe'
- '\wininit.exe'
- '\vmtoolsd.exe' # VMware Tools
- '\taskhostw.exe'
SourceImage|startswith:
- 'C:\Program Files\CrowdStrike\'
- 'C:\Program Files\SentinelOne\'
- 'C:\Program Files\CarbonBlack\'
condition: selection and not filter_legitimate
falsepositives:
- AV/EDR products not in the filter list (add your specific EDR)
- Credential Guard health checks
level: critical
tags:
- attack.credential_access
- attack.t1003.001
title: Credential Dumping via Comsvcs.dll MiniDump
id: 4b5c6d7e-8f9a-0b1c-2d3e-4f5a6b7c8d9e
status: stable
description: >
Detects use of comsvcs.dll MiniDump export function to dump LSASS.
Commonly used as a LOLBin alternative to Mimikatz that evades
signature-based detection.
logsource:
category: process_creation
product: windows
detection:
selection_rundll32:
Image|endswith: '\rundll32.exe'
CommandLine|contains|all:
- 'comsvcs'
- 'MiniDump'
selection_direct:
CommandLine|contains|all:
- 'comsvcs.dll'
- '#24' # MiniDump ordinal number
condition: selection_rundll32 or selection_direct
falsepositives:
- None known — this is almost always malicious
level: critical
tags:
- attack.credential_access
- attack.t1003.001
Use Case 2: Command-and-Control Beaconing (T1071)
title: Potential C2 Beaconing - Regular Interval DNS Queries
id: 5c6d7e8f-9a0b-1c2d-3e4f-5a6b7c8d9e0f
status: experimental
description: >
Detects DNS queries to the same domain occurring at suspiciously regular
intervals, indicating potential C2 beaconing with DNS-based communication.
Requires DNS query logging (Sysmon Event ID 22 or DNS server logs).
logsource:
category: dns_query
product: windows
detection:
selection:
EventID: 22 # Sysmon DNS query
filter_known:
QueryName|endswith:
- '.microsoft.com'
- '.windows.com'
- '.windowsupdate.com'
- '.office.com'
- '.office365.com'
- '.googleapis.com'
- '.gstatic.com'
condition: selection and not filter_known
# NOTE: This rule requires a SIEM-side aggregation to detect regularity.
# The Sigma rule captures the events; the beaconing analysis happens
# in the SIEM query below.
falsepositives:
- Legitimate applications with regular polling intervals (NTP, health checks)
- Content delivery networks
level: medium
tags:
- attack.command_and_control
- attack.t1071.004
SIEM-side beaconing analysis (Splunk):
index=sysmon EventCode=22
| where NOT match(QueryName, "\.(microsoft|windows|office|google|amazonaws)\.com$")
| stats count, values(QueryName) as domains, earliest(_time) as first_seen,
latest(_time) as last_seen by Image, ComputerName
| eval duration=last_seen-first_seen
| eval avg_interval=if(count>1, duration/(count-1), 0)
| where count > 20 AND avg_interval > 30 AND avg_interval < 3600
| eval beacon_score=if(avg_interval>0, round(1-(stdev(eval(mvmap(split(duration,","), tonumber(trim(_value)))))/avg_interval), 2), 0)
| where beacon_score > 0.85
| sort -beacon_score
| table ComputerName Image domains count avg_interval beacon_score
title: Potential C2 via Encrypted Channel to Uncommon Port
id: 6d7e8f9a-0b1c-2d3e-4f5a-6b7c8d9e0f1a
status: experimental
description: >
Detects outbound TLS/SSL connections to non-standard ports (not 443, 8443).
Many C2 frameworks use encrypted channels on uncommon ports to avoid
inspection by TLS-intercepting proxies.
logsource:
category: network_connection
product: windows
detection:
selection:
Initiated: 'true'
DestinationPort|gt: 1024
filter_common_ports:
DestinationPort:
- 443
- 8443
- 8080
- 80
filter_internal:
DestinationIp|startswith:
- '10.'
- '172.16.'
- '172.17.'
- '172.18.'
- '172.19.'
- '172.20.'
- '172.21.'
- '172.22.'
- '172.23.'
- '172.24.'
- '172.25.'
- '172.26.'
- '172.27.'
- '172.28.'
- '172.29.'
- '172.30.'
- '172.31.'
- '192.168.'
- '127.'
condition: selection and not filter_common_ports and not filter_internal
falsepositives:
- VPN clients using non-standard ports
- Gaming clients, video conferencing on UDP high ports
level: medium
tags:
- attack.command_and_control
- attack.t1571
- attack.t1573
Use Case 3: Data Exfiltration (T1048)
title: Large Outbound Data Transfer via HTTP/S
id: 7e8f9a0b-1c2d-3e4f-5a6b-7c8d9e0f1a2b
status: experimental
description: >
Detects processes sending unusually large amounts of data outbound over
HTTP/S. Requires Sysmon network connection events with byte counts or
proxy/firewall logs.
logsource:
category: proxy
product: any
detection:
selection:
cs-bytes|gte: 52428800 # 50MB in a single request
filter_known_uploads:
cs-uri|contains:
- 'upload'
- 'backup'
- '.sharepoint.com'
- '.onedrive.com'
condition: selection and not filter_known_uploads
falsepositives:
- Legitimate large file uploads (CI/CD artifact push, video uploads)
- Cloud backup agents
level: high
tags:
- attack.exfiltration
- attack.t1048.002
title: Data Exfiltration via DNS Tunneling - High Entropy Subdomain Queries
id: 8f9a0b1c-2d3e-4f5a-6b7c-8d9e0f1a2b3c
status: experimental
description: >
Detects DNS queries with unusually long, high-entropy subdomain labels,
which is a strong indicator of DNS tunneling for data exfiltration
(iodine, dns2tcp, dnscat2).
logsource:
category: dns_query
product: windows
detection:
selection:
EventID: 22
filter_short:
QueryName|re: '^[^.]{0,30}\.' # Normal subdomain length
condition: selection and not filter_short
# NOTE: Entropy calculation must happen SIEM-side. This Sigma rule
# selects DNS queries with subdomain labels > 30 chars.
# Add SIEM-side: Shannon entropy > 3.5 on subdomain portion.
falsepositives:
- DKIM TXT record lookups (very long but structured, low entropy)
- CDN hostnames with hash-based subdomains
level: high
tags:
- attack.exfiltration
- attack.t1048.003
Use Case 4: Privilege Escalation (T1068, T1134, T1543)
title: New Service Creation with Suspicious Binary Path
id: 9a0b1c2d-3e4f-5a6b-7c8d-9e0f1a2b3c4d
status: stable
description: >
Detects creation of Windows services pointing to binaries in user-writable
directories (Temp, AppData, Public, ProgramData). Legitimate services
install to Program Files or System32.
logsource:
product: windows
service: system
detection:
selection:
EventID: 7045
filter_legitimate_paths:
ImagePath|startswith:
- 'C:\Windows\System32\'
- 'C:\Windows\SysWOW64\'
- 'C:\Program Files\'
- 'C:\Program Files (x86)\'
- '"C:\Windows\System32\'
- '"C:\Program Files\'
- '"C:\Program Files (x86)\'
condition: selection and not filter_legitimate_paths
falsepositives:
- Some third-party software installs to non-standard paths
- Development/testing environments
level: high
tags:
- attack.privilege_escalation
- attack.persistence
- attack.t1543.003
title: Token Impersonation via Named Pipe
id: 0b1c2d3e-4f5a-6b7c-8d9e-0f1a2b3c4d5e
status: experimental
description: >
Detects potential privilege escalation via token impersonation tools
(PrintSpoofer, JuicyPotato, GodPotato, RoguePotato) that abuse
named pipe impersonation to escalate from service accounts to SYSTEM.
logsource:
category: process_creation
product: windows
detection:
selection_tools:
Image|endswith:
- '\PrintSpoofer.exe'
- '\JuicyPotato.exe'
- '\GodPotato.exe'
- '\RoguePotato.exe'
- '\SharpEfsPotato.exe'
selection_cmdline:
CommandLine|contains:
- 'PrintSpoofer'
- 'JuicyPotato'
- 'GodPotato'
- 'RoguePotato'
- 'EfsPotato'
selection_pipe_impersonation:
CommandLine|contains|all:
- 'CreateProcessAsUser'
- 'ImpersonateNamedPipeClient'
condition: selection_tools or selection_cmdline or selection_pipe_impersonation
falsepositives:
- Security testing tools with these exact names (should be coordinated with SOC)
level: critical
tags:
- attack.privilege_escalation
- attack.t1134.001
- attack.t1068
Conversion Commands
# Convert all rules to Splunk
sigma convert -t splunk -p splunk_cim *.yml
# Convert all rules to Elastic/KQL
sigma convert -t lucene -p ecs_windows *.yml
# Convert to Microsoft Sentinel
sigma convert -t kusto -p microsoft365defender *.yml
# Validate rules before deployment
sigma check *.yml
Deployment Recommendations
| Rule | Tuning Period | Expected FP Rate | Action |
|---|---|---|---|
| LSASS access | 2 weeks | Low (add your EDR to filter) | Alert + auto-isolate endpoint |
| Comsvcs MiniDump | None | Near zero | Alert + auto-isolate |
| DNS beaconing | 4 weeks (build baseline) | Medium (tune polling apps) | Alert for investigation |
| C2 uncommon port | 2 weeks | Medium (VPN/game clients) | Alert for investigation |
| Large outbound transfer | 2 weeks | Medium (tune upload services) | Alert for investigation |
| DNS tunneling | 1 week | Low | Alert + block at DNS |
| Suspicious service | 2 weeks | Low-Medium | Alert + investigate |
| Token impersonation | None | Near zero | Alert + auto-isolate |
Scenario 18: API BOLA Vulnerability Remediation
User prompt: "Our API was breached through BOLA (Broken Object-Level Authorization). An attacker accessed other users' data by changing the ID in the URL: /api/users/123/documents became /api/users/456/documents. How do we prevent this across our entire API?"
Mode: [MODE: ARCHITECT] with RED background layer
CIPHER Response:
REASON — BOLA (IDOR) is OWASP API Top 10 #1 (API1:2023). This is not a single bug — it is a systemic authorization design flaw. Every endpoint that accepts a user-controlled resource identifier is potentially vulnerable.
PLAN — Fix the specific breach, audit all endpoints, implement authorization middleware, add automated testing.
Root Cause Analysis
The vulnerability exists because the API trusts user-supplied identifiers without verifying that the authenticated user is authorized to access the referenced resource.
Vulnerable pattern:
# VULNERABLE — no authorization check on the resource
@app.route('/api/users/<int:user_id>/documents')
@require_auth # Only checks "is the user logged in" — NOT "can they access this resource"
def get_user_documents(user_id: int):
documents = Document.query.filter_by(user_id=user_id).all()
return jsonify([doc.to_dict() for doc in documents])
# Attacker changes user_id from their own (123) to victim's (456) → data breach
Fix 1: Authorization Enforcement Pattern
from functools import wraps
from flask import g, abort, request
from typing import Callable, Any
def authorize_resource_owner(resource_user_id_param: str = "user_id"):
"""Decorator: verify the authenticated user owns the requested resource."""
def decorator(f: Callable) -> Callable:
@wraps(f)
def decorated_function(*args: Any, **kwargs: Any) -> Any:
resource_user_id = kwargs.get(resource_user_id_param)
if resource_user_id is None:
abort(400, description="Missing resource identifier")
# g.current_user is set by the @require_auth middleware
if g.current_user.id != resource_user_id:
# Check if user has admin/support role that permits cross-user access
if not g.current_user.has_role("admin"):
# Log the access attempt — this may be an attack
app.logger.warning(
"BOLA attempt: user=%s tried to access resource for user=%s endpoint=%s",
g.current_user.id, resource_user_id, request.path
)
abort(403) # Return 403 (not 404) — attacker already knows the resource exists
return f(*args, **kwargs)
return decorated_function
return decorator
# FIXED — authorization check on resource ownership
@app.route('/api/users/<int:user_id>/documents')
@require_auth
@authorize_resource_owner("user_id")
def get_user_documents(user_id: int):
documents = Document.query.filter_by(user_id=user_id).all()
return jsonify([doc.to_dict() for doc in documents])
Fix 2: Eliminate User-Controlled IDs (Preferred Architecture)
# BEST PATTERN — use the authenticated user's identity from the token, not the URL
@app.route('/api/me/documents') # /api/me/ instead of /api/users/<id>/
@require_auth
def get_my_documents():
# g.current_user is derived from the JWT token — cannot be tampered
documents = Document.query.filter_by(user_id=g.current_user.id).all()
return jsonify([doc.to_dict() for doc in documents])
# For individual document access — use UUIDs and ownership check
@app.route('/api/documents/<uuid:document_id>')
@require_auth
def get_document(document_id: str):
document = Document.query.filter_by(
id=document_id,
user_id=g.current_user.id # Ownership baked into the query
).first_or_404()
return jsonify(document.to_dict())
Fix 3: Global Authorization Middleware
from flask import Flask, g, request, abort
import re
class AuthorizationMiddleware:
"""Global middleware that enforces resource ownership on all endpoints
matching /api/users/<id>/... pattern."""
USER_RESOURCE_PATTERN = re.compile(r'^/api/users/(\d+)/')
def __init__(self, app: Flask):
self.app = app
app.before_request(self.check_resource_authorization)
def check_resource_authorization(self):
if not hasattr(g, 'current_user') or g.current_user is None:
return # Not authenticated — let auth middleware handle it
match = self.USER_RESOURCE_PATTERN.match(request.path)
if match:
resource_user_id = int(match.group(1))
if resource_user_id != g.current_user.id and not g.current_user.has_role("admin"):
self.app.logger.warning(
"BOLA blocked: auth_user=%s target_user=%s path=%s method=%s ip=%s",
g.current_user.id, resource_user_id, request.path, request.method, request.remote_addr
)
abort(403)
Fix 4: Use UUIDs Instead of Sequential IDs
import uuid
from sqlalchemy.dialects.postgresql import UUID as PG_UUID
class Document(db.Model):
# Use UUID primary keys — not enumerable, not guessable
id = db.Column(PG_UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
user_id = db.Column(PG_UUID(as_uuid=True), db.ForeignKey('users.id'), nullable=False)
# Sequential IDs (1, 2, 3, ...) invite enumeration attacks
# UUIDs (550e8400-e29b-41d4-a716-446655440000) do not
Audit: Find All Vulnerable Endpoints
# Find all routes with user-controlled resource identifiers
grep -rn "/<int:.*_id>" --include="*.py" src/ | grep -i "route\|api"
grep -rn "/<.*_id>" --include="*.py" src/ | grep -i "route\|api"
# Find endpoints missing authorization decorators
# Look for @app.route without @authorize_resource_owner or equivalent
grep -B2 "def.*user_id" --include="*.py" src/ | grep -v "authorize\|ownership\|permission"
# Semgrep rule for BOLA detection
cat > bola_check.yml << 'SEMGREP_EOF'
rules:
- id: potential-bola
pattern: |
@app.route('...<int:$ID>...')
def $FUNC(..., $ID, ...):
...
$MODEL.query.filter_by($FIELD=$ID)
...
message: >
Potential BOLA: endpoint uses URL parameter $ID directly in database
query without ownership verification. Ensure authorization check
verifies the authenticated user owns this resource.
severity: ERROR
languages: [python]
SEMGREP_EOF
semgrep --config bola_check.yml src/
Automated Testing for BOLA
import pytest
from app import create_app
class TestBOLA:
"""Automated BOLA regression tests.
For every endpoint that returns user-specific data, verify that
User A cannot access User B's resources."""
def setup_method(self):
self.app = create_app("testing")
self.client = self.app.test_client()
# Create two test users with separate tokens
self.user_a_token = self._create_user_and_get_token("user_a@test.com")
self.user_b_token = self._create_user_and_get_token("user_b@test.com")
self.user_a_id = self._get_user_id(self.user_a_token)
self.user_b_id = self._get_user_id(self.user_b_token)
@pytest.mark.parametrize("endpoint_template", [
"/api/users/{victim_id}/documents",
"/api/users/{victim_id}/settings",
"/api/users/{victim_id}/payments",
"/api/users/{victim_id}/profile",
])
def test_bola_cross_user_access_denied(self, endpoint_template: str):
"""User A must NOT be able to access User B's resources."""
endpoint = endpoint_template.format(victim_id=self.user_b_id)
response = self.client.get(
endpoint,
headers={"Authorization": f"Bearer {self.user_a_token}"}
)
assert response.status_code in (403, 404), (
f"BOLA VULNERABILITY: User A accessed User B's data at {endpoint}. "
f"Got {response.status_code} instead of 403/404."
)
@pytest.mark.parametrize("method,endpoint_template,body", [
("PUT", "/api/users/{victim_id}/settings", {"notifications": False}),
("DELETE", "/api/users/{victim_id}/documents/1", None),
("POST", "/api/users/{victim_id}/documents", {"title": "pwned"}),
])
def test_bola_cross_user_modification_denied(self, method: str, endpoint_template: str, body: dict | None):
"""User A must NOT be able to modify User B's resources."""
endpoint = endpoint_template.format(victim_id=self.user_b_id)
response = getattr(self.client, method.lower())(
endpoint,
json=body,
headers={"Authorization": f"Bearer {self.user_a_token}"}
)
assert response.status_code in (403, 404), (
f"BOLA VULNERABILITY: User A modified User B's data at {method} {endpoint}."
)
ATT&CK: T1530 (Data from Cloud Storage Object — analogous to unauthorized data access) OWASP API: API1:2023 (Broken Object Level Authorization) CWE: CWE-639 (Authorization Bypass Through User-Controlled Key)
DETECTION OPPORTUNITIES: Log and alert on 403 responses where the authenticated user ID does not match the requested resource's user ID. A burst of such attempts from a single session indicates active exploitation. Deploy rate limiting per-user on resource access endpoints.
Scenario 19: Threat Model for Payment Processing Microservice
User prompt: "Create a threat model for our new payment processing microservice. It accepts credit card payments, communicates with Stripe, stores transaction records in PostgreSQL, publishes events to Kafka, and is deployed on EKS."
Mode: [MODE: ARCHITECT]
CIPHER Response:
REASON — Payment processing = PCI DSS scope. Threat model must cover STRIDE per component, identify trust boundaries, and map to PCI requirements.
PLAN — DFD → trust boundaries → STRIDE analysis → DREAD scoring → mitigations
Data Flow Diagram
┌─────────────────────────────────────────────────────────────────┐
│ TRUST BOUNDARY: EKS CLUSTER │
│ │
┌──────────┐ HTTPS/TLS │ ┌────────────┐ gRPC/mTLS ┌────────────────┐ │
│ Customer │◄──────────────►│ │ API Gateway │◄────────────────►│ Payment Service │ │
│ Browser │ │ │ (Ingress) │ │ (Pod) │ │
└──────────┘ │ └────────────┘ └───┬────┬───┬────┘ │
│ │ │ │ │
│ ┌─────────────┘ │ └──────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌────────────┐ ┌────────────┐ ┌──────────┐ │
│ │ PostgreSQL │ │ Kafka │ │ Vault │ │
│ │ (RDS) │ │ (MSK) │ │ (Secrets)│ │
│ └────────────┘ └─────┬──────┘ └──────────┘ │
│ │ │
└──────────────────────────────────────────┼──────────────────────┘
│
┌───────────────────────────────┘
▼
┌─────────────────────────────────────────────┐
│ TRUST BOUNDARY: EXTERNAL │
│ ┌──────────────┐ ┌──────────────────┐ │
│ │ Stripe API │ │ Fraud Detection │ │
│ │ (Payment │ │ Service │ │
│ │ Processor) │ │ (3rd party) │ │
│ └──────────────┘ └──────────────────┘ │
└─────────────────────────────────────────────┘
Trust Boundaries
| Boundary | From | To | Data Crossing |
|---|---|---|---|
| TB-1 | Customer browser | API Gateway | Card number, CVV, billing address (HTTPS) |
| TB-2 | API Gateway | Payment Service | Tokenized card reference, amount (mTLS) |
| TB-3 | Payment Service | Stripe API | Stripe token, amount, metadata (HTTPS) |
| TB-4 | Payment Service | PostgreSQL | Transaction records, customer ref (TLS) |
| TB-5 | Payment Service | Kafka | Transaction events (mTLS) |
| TB-6 | Payment Service | Vault | Secret retrieval (mTLS + token auth) |
STRIDE Analysis
| Component | Threat | STRIDE | ATT&CK | DREAD Score | Risk |
|---|---|---|---|---|---|
| API Gateway | Attacker replays captured payment request | Spoofing | T1557 | D:7 R:8 E:6 A:9 D:5 = 7.0 | HIGH |
| API Gateway | DDoS prevents legitimate payments | DoS | T1498 | D:8 R:9 E:8 A:10 D:8 = 8.6 | CRITICAL |
| Payment Service | SQLi to extract transaction records | Tampering, Info Disclosure | T1190 | D:9 R:7 E:6 A:9 D:6 = 7.4 | HIGH |
| Payment Service | Attacker modifies payment amount in transit | Tampering | T1565 | D:9 R:5 E:4 A:9 D:3 = 6.0 | HIGH |
| Payment Service | BOLA — access other users' transaction history | Info Disclosure | T1530 | D:8 R:9 E:7 A:8 D:7 = 7.8 | HIGH |
| Payment Service | Privilege escalation to process refunds without authorization | Elevation | T1068 | D:9 R:6 E:5 A:7 D:4 = 6.2 | HIGH |
| Payment Service | Transaction processed but no audit log created | Repudiation | — | D:7 R:6 E:7 A:8 D:5 = 6.6 | MEDIUM |
| PostgreSQL | Database backup exfiltrated | Info Disclosure | T1530 | D:9 R:5 E:4 A:9 D:4 = 6.2 | HIGH |
| Kafka | Unauthorized consumer reads payment events | Info Disclosure | T1040 | D:7 R:7 E:5 A:8 D:5 = 6.4 | MEDIUM |
| Kafka | Malicious event injected into payment topic | Tampering | T1565 | D:8 R:6 E:5 A:7 D:4 = 6.0 | HIGH |
| Stripe API | Stripe API key exfiltrated, used for unauthorized refunds | Spoofing, Info Disclosure | T1552 | D:9 R:8 E:7 A:9 D:6 = 7.8 | CRITICAL |
| Vault | Token theft allows secret extraction | Elevation | T1552 | D:9 R:5 E:4 A:9 D:3 = 6.0 | HIGH |
| Customer Browser | Card skimming via XSS in checkout page | Info Disclosure | T1185 | D:9 R:7 E:6 A:10 D:7 = 7.8 | CRITICAL |
Mitigations Table
| Threat | Mitigation | PCI DSS Req | Owner | Status |
|---|---|---|---|---|
| Replay attacks | Idempotency keys on all payment endpoints + request timestamp validation (reject >5 min old) | Req 6.5.10 | Backend | TODO |
| DDoS | AWS WAF + Shield Advanced on ALB, rate limiting per customer at API gateway (10 req/min for /payments) | Req 6.6 | Platform | TODO |
| SQLi | Parameterized queries only (ORM), WAF SQL injection rule set, input validation on all fields | Req 6.5.1 | Backend | TODO |
| Amount tampering | Re-validate amount server-side against order record, never trust client-submitted amounts | Req 6.5.1 | Backend | TODO |
| BOLA | Authorization middleware, /me/ pattern, ownership checks in DB queries | Req 7.1 | Backend | TODO |
| Unauthorized refunds | Separate refund permission, dual-approval for refunds > $500, audit log all refund operations | Req 7.1, 10.2 | Backend | TODO |
| Missing audit logs | Structured logging of every payment operation with immutable log shipping to S3 (write-once) | Req 10.2, 10.3 | Platform | TODO |
| DB backup exfil | Encrypt backups with CMK, restrict IAM access to backup bucket, enable S3 access logging | Req 3.4, 7.1 | Platform | TODO |
| Kafka unauthorized access | Kafka ACLs per topic, mTLS for producer/consumer auth, separate service accounts per consumer | Req 7.1 | Platform | TODO |
| Kafka message injection | Producer ACLs restrict write access, message signing with HMAC | Req 7.1 | Platform | TODO |
| Stripe key exfil | Store in Vault with short TTL lease, use restricted API keys (not secret key), monitor Stripe dashboard for unusual API usage | Req 3.4, 8.2 | Backend | TODO |
| Vault token theft | Short-lived tokens (1 hour TTL), Kubernetes auth method (no static tokens), audit logging | Req 8.2 | Platform | TODO |
| XSS/card skimming | Content Security Policy (strict), Subresource Integrity, Stripe.js (PCI-compliant iframe — card data never touches your server) | Req 6.5.7 | Frontend | TODO |
PCI DSS Scope Reduction Strategy
Critical recommendation: Use Stripe Elements or Stripe.js Checkout. This keeps card numbers in Stripe's PCI-certified iframe — they never touch your server, reducing your PCI scope from SAQ D to SAQ A or SAQ A-EP. [CONFIRMED — Stripe documentation]
WITHOUT Stripe.js:
Customer → Your Server (card number) → Stripe
PCI Scope: SAQ D (full audit, ~300 requirements)
WITH Stripe.js:
Customer → Stripe iframe (card number) → Stripe → Your Server (token only)
PCI Scope: SAQ A-EP (~30 requirements)
This single architectural decision eliminates 90% of your PCI compliance burden.
Monitoring and Detection
# Alert on payment anomalies
alerts:
- name: "High-value transaction spike"
condition: "count(transactions > $10000) in 5min > 10"
severity: critical
- name: "Refund rate anomaly"
condition: "refund_count / transaction_count > 0.1 in 1h"
severity: high
- name: "Failed payment spike (card testing)"
condition: "count(payment_status=failed) by source_ip > 20 in 5min"
severity: high
- name: "Payment from TOR exit node"
condition: "source_ip IN tor_exit_nodes AND endpoint=/api/payments"
severity: medium
Scenario 20: OT/ICS Security Monitoring
User prompt: "We need to set up security monitoring for our OT/ICS environment. We have a water treatment facility with Siemens PLCs, SCADA system running on Windows Server 2016, a flat network between IT and OT, and no current security monitoring."
Mode: [MODE: ARCHITECT] with BLUE background layer
CIPHER Response:
REASON — Critical infrastructure (water treatment) with no security monitoring and a flat IT/OT network. This is a high-severity risk posture. The Oldsmar, FL water treatment attack (2021) demonstrated that attackers can manipulate chemical dosing levels with lethal consequences.
PLAN — Network segmentation FIRST, then passive monitoring, then detection rules. Cannot disrupt process availability.
CRITICAL WARNING
A flat network between IT and OT means any compromised office workstation is one hop from the PLCs controlling water treatment chemicals. This is the highest priority remediation item. [CONFIRMED — NIST 800-82 Section 5.2, IEC 62443]
Phase 1: Network Segmentation (Weeks 1-4) — Non-Negotiable
CURRENT STATE (DANGEROUS):
┌─────────────────────────────────────────┐
│ FLAT NETWORK │
│ Office PCs ←→ SCADA Server ←→ PLCs │
│ Internet ←→ Email ←→ PLC Programming │
└─────────────────────────────────────────┘
TARGET STATE (Purdue Model):
┌────────────────────────────────────────────────────────────────────┐
│ LEVEL 5: Enterprise │ Corporate IT, Internet access │
├────────────────────────┤ │
│ LEVEL 4: IT/Business │ Email, ERP, business applications │
│ FIREWALL ═══════╪══════════════════════════ │
│ (IT/OT DMZ) │ Historian (read-only), patch server │
│ LEVEL 3.5: DMZ │ Jump host, AV update server │
│ FIREWALL ═══════╪══════════════════════════ │
│ LEVEL 3: Operations │ SCADA server, HMI workstations │
├────────────────────────┤ │
│ LEVEL 2: Control │ Engineering workstation, PLC programming│
├────────────────────────┤ │
│ LEVEL 1: Basic Control │ PLCs, RTUs, safety controllers │
├────────────────────────┤ │
│ LEVEL 0: Process │ Sensors, actuators, valves, pumps │
└────────────────────────┴──────────────────────────────────────────┘
Firewall rules between IT and OT DMZ:
# IT/OT DMZ Firewall — ONLY allow specific, required traffic
# DEFAULT: DENY ALL
# Allow historian to READ process data (unidirectional if possible)
permit tcp OT_HISTORIAN IT_ANALYTICS_SERVER 1433 (SQL read-only replica)
# Allow patch downloads from DMZ patch server to OT
permit tcp DMZ_PATCH_SERVER OT_SCADA 445 (SMB — scheduled maintenance windows only)
# Allow jump host access (for remote maintenance)
permit tcp IT_JUMP_HOST OT_SCADA 3389 (RDP — MFA required, session recorded)
# DENY all other IT→OT traffic
deny ip IT_NETWORK OT_NETWORK any log
# DENY all OT→IT traffic (OT should never initiate outbound to IT)
deny ip OT_NETWORK IT_NETWORK any log
# DENY all OT→Internet traffic
deny ip OT_NETWORK any any log
If budget/timeline allows: Deploy a unidirectional security gateway (data diode) between Levels 3 and 3.5. This physically prevents any traffic from IT to OT while allowing process data (historian) to flow outward. Products: Waterfall Security, Owl Cyber Defense. [CONFIRMED — IEC 62443-3-3 SR 5.1]
Phase 2: Passive Network Monitoring (Weeks 2-6)
Deploy passive OT network monitoring. These tools sniff traffic passively — they do not inject packets or scan. Scanning active PLCs can crash them.
Tool options:
- Claroty — Commercial, deep Siemens protocol support
- Nozomi Networks — Commercial, good for water/wastewater
- Dragos Platform — Commercial, best threat intel for ICS
- Zeek + GRASSMARLIN — Open source, requires more expertise
# Deploy Zeek on a network TAP (passive — mirror port on OT switch)
# SPAN/mirror port configuration on OT switch (Cisco example):
# monitor session 1 source interface Gi0/1 - Gi0/24
# monitor session 1 destination interface Gi0/48
# Zeek ICS protocol parsers
# Install Zeek with ICS protocol analyzers
zeek -i ot_monitor_interface local frameworks/notice
# GRASSMARLIN — passive OT network mapper
# Discovers all ICS assets and communication patterns without sending any traffic
java -jar grassmarlin.jar
Asset inventory — discover what you have:
# Passively discover all OT assets from network captures
# Do NOT run nmap or active scans against OT networks
# Instead, use packet captures to identify devices
# Extract unique MAC/IP pairs from passive capture
tshark -r ot_capture.pcap -T fields -e eth.src -e ip.src -e eth.dst -e ip.dst | sort -u
# Identify Siemens S7 communications
tshark -r ot_capture.pcap -Y "s7comm" -T fields -e ip.src -e ip.dst -e s7comm.param.func
# Identify Modbus communications
tshark -r ot_capture.pcap -Y "modbus" -T fields -e ip.src -e ip.dst -e modbus.func_code
Phase 3: Detection Rules for OT (Weeks 4-8)
title: Unauthorized S7 Communication to Siemens PLC
id: 1a2b3c4d-5e6f-7a8b-9c0d-1e2f3a4b5c6d
status: stable
description: >
Detects S7comm protocol traffic to Siemens PLCs from unauthorized source IPs.
Only the engineering workstation and SCADA server should communicate with PLCs.
logsource:
category: network_connection
product: zeek
detection:
selection:
dest_port: 102 # ISO-TSAP / S7comm
filter_authorized:
src_ip:
- '10.100.2.10' # Engineering workstation
- '10.100.3.20' # SCADA server
condition: selection and not filter_authorized
falsepositives:
- New authorized engineering workstation not yet added to allowlist
level: critical
tags:
- attack.lateral_movement
- attack.t1021
- ics.t0843 # ICS ATT&CK: Program Download
title: PLC Program Download Detected
id: 2b3c4d5e-6f7a-8b9c-0d1e-2f3a4b5c6d7e
status: stable
description: >
Detects S7comm program download function to a PLC. This modifies the PLC
logic and should only occur during scheduled maintenance windows.
Outside maintenance = potential sabotage.
logsource:
category: network_connection
product: zeek
detection:
selection:
protocol: s7comm
s7comm_function:
- 'download'
- 'plc_stop'
- 'plc_control'
condition: selection
falsepositives:
- Scheduled maintenance (coordinate with control engineering team)
level: critical
tags:
- ics.t0843 # Program Download
- ics.t0855 # Unauthorized Command Message
title: New Device on OT Network
id: 3c4d5e6f-7a8b-9c0d-1e2f-3a4b5c6d7e8f
status: stable
description: >
Detects a previously unseen MAC address or IP address on the OT network
segment. The OT network should have a static, known asset inventory.
Any new device is suspicious (rogue device, attacker pivot point).
logsource:
category: network_connection
product: zeek
detection:
selection:
# Compare against known asset inventory lookup
src_ip|not_in_lookup: ot_asset_inventory.csv
condition: selection
falsepositives:
- Legitimate new device installation (should be pre-registered in inventory)
level: high
tags:
- ics.t0842 # Network Sniffing
Phase 4: SCADA Server Hardening (Windows Server 2016)
# 1. Endpoint protection — deploy EDR if the SCADA vendor approves
# CRITICAL: test in staging first. Some EDR agents interfere with SCADA software.
# 2. Application whitelisting — SCADA servers should only run known software
# Windows Defender Application Control (WDAC) or AppLocker
New-CIPolicy -FilePath "C:\WDAC\SCADAPolicy.xml" -Level Publisher -ScanPath "C:\Program Files\Siemens" -UserPEs -Fallback Hash
# 3. Disable unnecessary services
Get-Service | Where-Object {$_.Status -eq "Running"} | Select-Object Name, DisplayName | Export-Csv baseline_services.csv
# Review and disable: Print Spooler, Remote Desktop (if not needed), Windows Search, etc.
# 4. USB control — block unauthorized USB devices
# GPO: Computer Configuration > Administrative Templates > System > Device Installation
# "Prevent installation of devices not described by other policy settings" = Enabled
# Allowlist only approved USB devices by hardware ID
# 5. Enable audit logging (critical for SCADA servers)
auditpol /set /subcategory:"Logon" /success:enable /failure:enable
auditpol /set /subcategory:"Process Creation" /success:enable
auditpol /set /subcategory:"Object Access" /success:enable /failure:enable
# Ship logs to SIEM in the IT DMZ (one-way if using data diode)
# 6. Patch management — CRITICAL: never auto-update OT systems
# Test patches in a staging environment that mirrors production
# Apply patches during scheduled maintenance windows only
# Maintain a separate WSUS server in the IT/OT DMZ
Phase 5: Incident Response for OT
[INCIDENT TYPE: OT/ICS Compromise] Runbook
Triage (0-15 min):
1. Is there an immediate safety risk? (chemical levels, pressure, flow)
YES → Activate manual overrides, switch to local manual control
NO → Continue assessment
2. Identify affected level in Purdue model
3. Determine if process is still operating within safe parameters
Containment:
- DO NOT power off PLCs unless safety is at risk (process disruption)
- Isolate at network level: disconnect the IT/OT DMZ firewall connection
- If attacker is in Level 3 (SCADA): disable remote access, lock accounts
- If attacker is in Level 1/2 (PLCs): switch to manual control, isolate affected PLC
Evidence Preservation:
- Capture PLC program (ladder logic) — compare against known-good baseline
- Export SCADA server event logs
- Capture network traffic from the TAP/mirror port
- DO NOT forensically image PLC memory unless you have ICS forensics expertise
Recovery:
- Reload PLC program from verified offline backup (not from potentially
compromised engineering workstation)
- Rebuild SCADA server from clean image
- Re-validate all setpoints and safety limits before resuming automated control
Escalation Triggers:
- Chemical dosing levels changed → Immediate physical safety response + EPA notification
- Multiple PLCs affected → Assume sophisticated adversary, engage ICS IR firm (Dragos, Mandiant)
- Safety controller (SIS) tampered → STOP EVERYTHING, full manual control
Compliance and Standards
| Standard | Relevance |
|---|---|
| NIST 800-82 Rev 3 | Guide to ICS Security — primary reference |
| IEC 62443 | Industrial automation security — zone/conduit model |
| AWWA Guidance | Water sector specific cybersecurity guidance |
| CISA ICS-CERT | Free vulnerability advisories for ICS products |
| TSA Security Directive | If pipeline operations are in scope |
ATT&CK for ICS: T0843 (Program Download), T0855 (Unauthorized Command), T0842 (Network Sniffing), T0826 (Loss of Availability), T0831 (Manipulation of Control)
Key risk [CONFIRMED]: Windows Server 2016 reaches end of extended support October 2027. Plan migration to Server 2022 during next maintenance cycle. Until then, ensure all available patches are applied and compensating controls (application whitelisting, network isolation) are in place.
Immediate action items (priority order):
- Segment IT from OT with a firewall (this week)
- Deploy network TAP and passive monitoring (this month)
- Harden the SCADA server (this month)
- Build OT asset inventory (this quarter)
- Deploy ICS-specific detection rules (this quarter)
- Evaluate data diode for historian traffic (next quarter)
- Conduct OT tabletop exercise with operations team (next quarter)