Security Operations Runbooks — Comprehensive Operational Procedures
Classification: INTERNAL — SOC Operations
Version: 2.0
Last Updated: 2026-03-14
Owner: Security Operations Center Lead
Review Cadence: Quarterly or after any major incident
References: NIST SP 800-61r2, NCSC Incident Management Collection, CERT SG IRM-2022, CISA BOD 22-01, CIS Controls v8, MITRE ATT&CK v14
Table of Contents
- Daily SOC Operations
- Vulnerability Management
- Threat Intelligence Operations
- Security Monitoring Health
- Access Review
- Penetration Testing Program
- Security Awareness Program
- Change Management Security Review
- Third-Party Risk Assessment
- Business Continuity / DR Testing
1. Daily SOC Operations
1.1 Shift Structure and Coverage
Shift Model (24x7 coverage):
| Shift | Hours (Local) | Minimum Staffing | Roles Required |
|---|---|---|---|
| Day (Alpha) | 0600–1400 | 3 analysts + 1 lead | L1 x2, L2 x1, Shift Lead x1 |
| Swing (Bravo) | 1400–2200 | 3 analysts + 1 lead | L1 x2, L2 x1, Shift Lead x1 |
| Night (Charlie) | 2200–0600 | 2 analysts + 1 lead | L1 x1, L2 x1, Shift Lead x1 |
On-Call Escalation Chain (outside staffed hours or for escalation):
- L3 Analyst (on-call rotation, 15-minute SLA to acknowledge page)
- SOC Manager (30-minute SLA)
- CISO / Deputy CISO (critical/emergency severity only)
1.2 Shift Handoff Procedure
Handoff Duration: 15 minutes minimum, 30 minutes for active incidents.
Outgoing Shift Responsibilities:
- Complete the Shift Handoff Report in the SOC wiki/ticketing system covering:
- Active incidents — current status, last action taken, next action required
- Escalated tickets — who owns them, expected resolution timeline
- Open investigations — hypothesis, evidence collected, blockers
- SIEM/tool health — any degraded systems, ingestion delays, known gaps
- Intelligence updates — new IOCs ingested, threat advisories received
- Pending tasks — anything time-sensitive the incoming shift must handle
- Environmental notes — scheduled maintenance windows, expected noise
- Verbally brief the incoming Shift Lead on top 3 priorities
- Do not leave until incoming Shift Lead explicitly acknowledges handoff
Incoming Shift Responsibilities:
- Review handoff report before assuming watch
- Verify access to all SOC tools (SIEM, SOAR, ticketing, threat intel platform)
- Check SIEM dashboard for current alert volume and any anomalies
- Review overnight/off-hours automated alerts that may need human review
- Confirm on-call roster is accurate for current shift
- Acknowledge handoff in the ticketing system (timestamp recorded)
Handoff Failure Protocol: If outgoing shift departs without proper handoff, the incoming Shift Lead documents the gap and notifies the SOC Manager within 1 hour. This is a reportable process failure.
1.3 Alert Triage Workflow
┌─────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Alert Fires │────▶│ L1 Triage │────▶│ L2 Analyze │────▶│ L3 / Hunt │
│ (SIEM/SOAR) │ │ (≤15 min) │ │ (≤60 min) │ │ (≤4 hours) │
└─────────────┘ └──────────────┘ └──────────────┘ └──────────────┘
│ │ │
┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐
│ Close as │ │ Escalate │ │ Incident │
│ False Pos │ │ to L3 │ │ Declared │
└───────────┘ └───────────┘ └───────────┘
L1 Triage (Target: 15 minutes per alert)
- Receive alert — open ticket automatically created by SOAR or manually
- Contextualize:
- What asset is affected? (lookup in CMDB — crown jewel? internet-facing?)
- What user is associated? (lookup in IAM — privileged? service account?)
- What is the detection logic? (read the rule description, understand what triggered)
- Enrich:
- Check source/destination IPs against threat intel feeds
- Check file hashes against VirusTotal / internal sandbox results
- Check domain/URL reputation
- Review related alerts for the same source/user/asset in the last 24–72 hours
- Classify:
- True Positive — confirmed malicious activity → escalate to L2
- Benign True Positive — detection fired correctly but activity is authorized → document and close, consider tuning
- False Positive — detection fired incorrectly → document, close, submit tuning request
- Insufficient Data — cannot determine → escalate to L2 with documented analysis
- Document: Every triage decision includes:
- Analyst name and timestamp
- Evidence reviewed (screenshots, log excerpts, enrichment results)
- Classification rationale
- Action taken
L2 Analysis (Target: 60 minutes for initial assessment)
- Validate L1 triage — confirm or adjust classification
- Deep investigation:
- Full timeline reconstruction (process trees, network connections, file modifications)
- Lateral movement indicators — check authentication logs for the source host/user
- Data access patterns — did the entity access sensitive data stores?
- Persistence mechanisms — scheduled tasks, services, registry, crontabs, startup items
- C2 indicators — beaconing patterns, DNS tunneling, unusual protocol usage
- Scope assessment: Is this an isolated event or part of a broader campaign?
- Determine response:
- Containable at L2 → execute containment (host isolation, account disable) per playbook
- Requires L3 / IR team → escalate with full investigation package
- Requires immediate escalation → invoke incident declaration process
- Update ticket with full investigation narrative, IOCs extracted, scope assessment
L3 / Threat Hunting (Target: 4-hour initial scope, ongoing as needed)
- Hypothesis-driven hunting based on L2 findings
- Retroactive search — query historical data for IOCs and TTPs identified
- Malware analysis — static and dynamic analysis if samples recovered
- Forensic acquisition — memory and disk imaging if required for legal/HR
- Incident declaration if criteria met (see 1.5)
1.4 Alert Priority and SLA Matrix
| Priority | Criteria | Initial Triage | Investigation Complete | Escalation |
|---|---|---|---|---|
| P1 — Critical | Active breach, data exfiltration, ransomware execution, crown jewel compromise | 5 min | 30 min | Immediate to SOC Manager + CISO |
| P2 — High | Confirmed malicious activity, C2 communication, privilege escalation on production | 15 min | 2 hours | Within 30 min to SOC Manager |
| P3 — Medium | Suspicious activity requiring investigation, policy violation, anomalous behavior | 30 min | 8 hours | Within 4 hours to Shift Lead |
| P4 — Low | Informational alerts, minor policy violations, known risk acceptances | 4 hours | 24 hours | Standard queue |
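The SLA matrix above can be turned into concrete deadlines per ticket. A minimal sketch: the minute values come straight from the table, while the function shape and field names are illustrative assumptions.

```python
from datetime import datetime, timedelta

# (initial triage, investigation complete) per priority, in minutes,
# per the Alert Priority and SLA Matrix
SLA_MINUTES = {
    "P1": (5, 30),
    "P2": (15, 120),
    "P3": (30, 480),
    "P4": (240, 1440),
}

def sla_deadlines(priority: str, alert_time: datetime) -> dict[str, datetime]:
    """Return the triage and investigation deadlines for an alert."""
    triage, investigate = SLA_MINUTES[priority]
    return {
        "triage_due": alert_time + timedelta(minutes=triage),
        "investigation_due": alert_time + timedelta(minutes=investigate),
    }
```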
1.5 Incident Declaration Criteria
An incident is declared when ANY of the following conditions are met:
- Confirmed unauthorized access to systems containing PII, PHI, financial data, or intellectual property
- Active command-and-control communication from an internal host
- Ransomware execution or attempted execution on any production system
- Confirmed compromise of privileged credentials (domain admin, root, cloud admin)
- Exfiltration of sensitive data (confirmed or high-confidence indicators)
- Compromise of security infrastructure (SIEM, EDR, PAM, CA, DNS)
- Third-party notification of breach involving organizational data
- Law enforcement notification of organizational compromise
- Over 80% of staff unable to work due to cyber event (NCSC Critical threshold)
Declaration authority: Shift Lead (L2+), SOC Manager, CISO, or any member of the Incident Response Team.
1.6 Escalation Procedures
Level 0: L1 Analyst → Shift Lead (same shift)
Level 1: Shift Lead → SOC Manager
Level 2: SOC Manager → CISO + IR Team Lead
Level 3: CISO → Executive Leadership + Legal + PR + External IR Retainer
Level 4: Executive Leadership → Regulatory Notification + Law Enforcement + Board
Escalation Rules:
- Never skip levels except for P1/Critical events (direct to Level 2 minimum)
- If you cannot reach the next escalation level within 15 minutes, proceed to the level above
- All escalations are documented with timestamp, recipient, and information conveyed
- Escalation is NEVER wrong — analysts are empowered to escalate on instinct even when they cannot articulate the specific criteria. The cost of a false escalation is far lower than a missed incident.
1.7 Communication Protocols
| Audience | Channel | Frequency | Content |
|---|---|---|---|
| SOC Internal | Secure chat (Mattermost/Slack private channel) | Real-time | Alert discussion, triage coordination |
| SOC → IT Ops | Ticketing system + phone for P1/P2 | As needed | Containment requests, log requests |
| SOC → Management | Email + scheduled briefing | Daily summary, immediate for P1 | Alert metrics, incident status, risk posture |
| SOC → CISO | Secure email + phone for P1 | Weekly metrics, immediate for incidents | Operational metrics, active incidents, emerging threats |
| Incident Comms | Dedicated bridge line + war room chat | During active incidents | Status updates per ICS cadence (every 30-60 min) |
| External (regulators) | Legal-approved channels only | Per regulatory requirements | CISO/Legal approval required before ANY external communication |
Communication Security During Incidents:
- Assume the attacker can read internal email — use out-of-band communication for incident response
- Dedicated incident chat channel created per incident (not the daily operations channel)
- Phone calls for sensitive tactical decisions
- Never discuss incident details in public channels, social media, or unsecured platforms
- Ref: NCSC guidance — "evaluate the possibility that the attacker might react to your actions"
1.8 Daily SOC Metrics Dashboard
Track and report daily:
| Metric | Target | Red Threshold |
|---|---|---|
| Mean Time to Triage (MTTT) | ≤15 min | >30 min |
| Mean Time to Respond (MTTR) | ≤4 hours (P2) | >8 hours |
| Alert volume (total) | Baseline ±20% | >50% above baseline |
| False positive rate | <40% | >60% |
| Escalation rate (L1→L2) | 15–25% | >40% or <5% |
| Open tickets >24 hours | <10 | >25 |
| SIEM ingestion lag | <5 min | >15 min |
| Detection coverage (ATT&CK) | >60% of priority techniques | <40% |
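The red thresholds in the table above lend themselves to an automated daily check. A sketch under assumed metric names; the threshold values are taken from this runbook.

```python
def red_flags(metrics: dict[str, float], baseline_volume: float) -> list[str]:
    """Return the names of metrics breaching their red thresholds."""
    flags = []
    if metrics["mttt_min"] > 30:                     # MTTT red: >30 min
        flags.append("MTTT")
    if metrics["mttr_hours"] > 8:                    # MTTR red: >8 hours (P2)
        flags.append("MTTR")
    if metrics["alert_volume"] > 1.5 * baseline_volume:  # >50% above baseline
        flags.append("Alert volume")
    if metrics["fp_rate"] > 0.60:                    # FP rate red: >60%
        flags.append("False positive rate")
    if not 0.05 <= metrics["escalation_rate"] <= 0.40:   # outside 5-40%
        flags.append("Escalation rate")
    if metrics["open_over_24h"] > 25:                # open tickets red: >25
        flags.append("Open tickets >24h")
    if metrics["siem_lag_min"] > 15:                 # ingestion lag red: >15 min
        flags.append("SIEM ingestion lag")
    if metrics["attack_coverage"] < 0.40:            # ATT&CK coverage red: <40%
        flags.append("Detection coverage")
    return flags
```

A non-empty return list would feed the end-of-day shift report and the SOC Manager briefing.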
1.9 End-of-Day Shift Report Template
SHIFT REPORT — [Date] [Shift Alpha/Bravo/Charlie]
Shift Lead: [Name]
Analysts on Duty: [Names]
ALERT SUMMARY:
Total alerts received: ___
True positives: ___
False positives: ___
Benign true positives: ___
Escalated to L2: ___
Escalated to L3/IR: ___
Incidents declared: ___
ACTIVE INCIDENTS:
[INC-XXXX] — Status: [Contain/Analyze/Remediate] — Next Action: ___
NOTABLE EVENTS:
- [Brief description of anything unusual, even if not an incident]
SIEM/TOOL HEALTH:
- [Any degraded services, ingestion gaps, tool outages]
INTELLIGENCE UPDATES:
- [New threat advisories, IOCs ingested, relevant news]
TUNING REQUESTS SUBMITTED:
- [Rule ID] — [Reason for tuning request]
PENDING FOR NEXT SHIFT:
- [Prioritized list of items requiring attention]
2. Vulnerability Management
2.1 Program Governance
| Role | Responsibility |
|---|---|
| Vulnerability Management Lead | Program ownership, metrics reporting, exception approvals |
| Scan Engineers | Scanner configuration, scan execution, result validation |
| Patch Engineers | Patch testing, deployment, rollback procedures |
| Asset Owners | Remediation within SLA, exception requests, risk acceptance |
| CISO | Program oversight, risk acceptance authority for Critical/High |
| Change Advisory Board | Patch deployment approval for production systems |
2.2 Scanning Schedule
| Scan Type | Scope | Frequency | Window |
|---|---|---|---|
| External perimeter (authenticated) | All internet-facing assets | Weekly | Saturday 0200–0600 |
| External perimeter (unauthenticated) | All public IPs/domains | Weekly | Wednesday 0200–0600 |
| Internal network (unauthenticated) | All internal subnets | Bi-weekly | Sunday 0100–0700 |
| Internal network (credentialed) | Servers, workstations | Monthly (full) | First Sunday of month |
| Container image scanning | All container registries | On every image build (CI/CD) | Continuous |
| Cloud configuration (CSPM) | All cloud accounts | Daily | Continuous |
| Web application (DAST) | All web applications | Monthly | Coordinated with app owners |
| Database scanning | All database instances | Monthly | Second Sunday of month |
| Code scanning (SAST) | All repositories | On every PR/merge | Continuous (CI/CD integrated) |
| Ad-hoc / emergency | As defined per advisory | Within 24 hours of advisory | ASAP |
Scanner Configuration Standards:
- Credentialed scans use dedicated service accounts with least-privilege read access
- Service account credentials stored in PAM, rotated every 90 days
- Scan exclusions require documented approval from VM Lead (tracked in exception register)
- Scanner signatures/plugins updated before every scheduled scan
- Network-based and agent-based scanning used in combination for maximum coverage
2.3 Vulnerability Triage and Prioritization
Do not use CVSS alone. Use a composite scoring model:
PRIORITY SCORE = f(CVSS Base, EPSS Probability, KEV Status, Asset Criticality, Exploitability, Exposure)
Prioritization Matrix
| Factor | Weight | Data Source |
|---|---|---|
| CVSS Base Score | 20% | NVD / Scanner |
| EPSS Score (probability of exploitation in 30 days) | 25% | FIRST EPSS API |
| CISA KEV Listed | 20% (binary: +20 if yes) | CISA KEV Catalog JSON feed |
| Asset Criticality | 20% | CMDB tier classification |
| Network Exposure | 15% | Internet-facing = highest; air-gapped = lowest |
Asset Criticality Tiers
| Tier | Description | Examples |
|---|---|---|
| Tier 1 — Crown Jewels | Revenue-generating, contains regulated data, security infrastructure | Payment systems, customer DB, AD/PKI, SIEM, PAM |
| Tier 2 — Business Critical | Significant business impact if compromised | ERP, email, file servers, CI/CD |
| Tier 3 — Business Support | Moderate impact | Development systems, internal tools, printers |
| Tier 4 — Low Impact | Minimal business impact | Lab systems, test environments |
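The weighted model and tier scale above can be sketched as a single scoring function. The weights come from the prioritization matrix; the normalization choices (CVSS divided by 10, tier and exposure scales) are illustrative assumptions.

```python
def priority_score(cvss: float, epss: float, kev: bool,
                   asset_tier: int, exposure: float) -> float:
    """Return a 0-100 composite priority score.

    asset_tier: 1 (crown jewel) .. 4 (low impact)
    exposure:   1.0 internet-facing .. 0.0 air-gapped
    """
    tier_weight = {1: 1.0, 2: 0.75, 3: 0.5, 4: 0.25}[asset_tier]
    return round(
        20 * (cvss / 10.0)              # CVSS Base, 20%
        + 25 * epss                     # EPSS probability, 25%
        + 20 * (1.0 if kev else 0.0)    # CISA KEV listed, binary 20%
        + 20 * tier_weight              # Asset criticality, 20%
        + 15 * exposure,                # Network exposure, 15%
        1,
    )
```

For example, a KEV-listed CVSS 9.8 on an internet-facing Tier 1 asset scores near the top of the range, while a CVSS 4.0 on an isolated Tier 4 lab system scores near the bottom, matching the P1/P5 bands below.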
Composite Priority Assignment
| Priority | Criteria | Remediation SLA |
|---|---|---|
| P1 — Emergency | CISA KEV + internet-facing, OR CVSS 9.0+ with active exploit + Tier 1 asset | 24–48 hours |
| P2 — Critical | CISA KEV (any asset), OR CVSS 9.0+ + EPSS >0.5 + Tier 1/2, OR CVSS 7.0+ with weaponized exploit | 7 days |
| P3 — High | CVSS 7.0–8.9 + EPSS >0.1, OR CVSS 9.0+ + Tier 3/4 | 30 days |
| P4 — Medium | CVSS 4.0–6.9, OR CVSS 7.0+ with low EPSS + Tier 3/4 | 90 days |
| P5 — Low | CVSS <4.0, informational findings | 180 days or accept risk |
CISA KEV Integration (ref: BOD 22-01):
- Ingest CISA KEV catalog daily (JSON feed: https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json)
- Auto-flag any vulnerability matching a KEV entry
- KEV vulnerabilities on federal systems: remediate by CISA-specified due date (typically 14–21 days)
- KEV vulnerabilities on non-federal systems: treat as minimum P2 regardless of CVSS
- If patch unavailable: apply vendor mitigation, or discontinue product use, or document compensating control with CISO approval
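The auto-flag step can be sketched as a simple match against the daily-ingested catalog. The `vulnerabilities`, `cveID`, and `dueDate` fields follow the published KEV JSON feed referenced above; the function shape and the assumption that the catalog is already loaded by a fetch job are illustrative.

```python
def kev_matches(catalog: dict, open_cves: set[str]) -> dict[str, str]:
    """Return {cveID: dueDate} for open scanner findings listed in the KEV catalog."""
    return {
        v["cveID"]: v["dueDate"]
        for v in catalog.get("vulnerabilities", [])
        if v["cveID"] in open_cves
    }
```

Any CVE in the returned dict would be raised to at least P2 (non-federal) or tracked against the CISA due date (federal).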
2.4 Patching Workflow
┌────────────┐ ┌──────────────┐ ┌──────────────┐ ┌────────────┐ ┌───────────┐
│ Vuln │ │ Triage & │ │ Patch Test │ │ CAB │ │ Deploy │
│ Identified │───▶│ Prioritize │───▶│ (Non-Prod) │───▶│ Approval │───▶│ to Prod │
└────────────┘ └──────────────┘ └──────────────┘ └────────────┘ └───────────┘
│ │
┌─────▼─────┐ ┌─────▼─────┐
│ Test Fail │ │ Verify & │
│ → Vendor │ │ Rescan │
│ Escalate │ └───────────┘
└───────────┘
Step-by-Step:
- Identification: Scanner results ingested into vulnerability management platform
- Deduplication: Consolidate findings across scanners; normalize to CVE identifiers
- Prioritization: Apply composite scoring (Section 2.3)
- Assignment: Auto-assign to asset owner via CMDB integration; ticket created
- Patch Testing:
- Deploy patch to non-production environment matching production configuration
- Execute smoke tests (application functionality, service availability, integration points)
- Regression testing for Tier 1 systems (minimum 24 hours soak time)
- Document test results in patch ticket
- Change Request: Submit RFC through change management process
- P1/Emergency: Emergency change process (CAB chair + asset owner verbal approval)
- P2–P5: Standard change process (next scheduled CAB)
- Deployment:
- Deploy in maintenance window (or immediately for P1)
- Phased rollout: canary (5%) → pilot (25%) → full deployment
- Rollback plan documented and tested before deployment
- Verification:
- Rescan within 72 hours of deployment
- Confirm vulnerability no longer detected
- Close ticket with evidence of remediation
- Reporting: Update vulnerability dashboard; age-out resolved findings
2.5 Exception and Risk Acceptance Process
When a vulnerability cannot be remediated within SLA:
- Asset owner submits Exception Request containing:
- CVE identifier(s) and current CVSS/EPSS/KEV status
- Business justification for delay (not "we're busy")
- Compensating controls currently in place
- Proposed remediation date (maximum extension: 90 days for P1/P2, 180 days for P3/P4)
- Risk owner signature (must be Director level or above)
- VM Lead reviews compensating controls for adequacy:
- Network segmentation isolating the vulnerable system?
- WAF/IPS rules blocking known exploit vectors?
- Enhanced monitoring/detection rules deployed?
- Access restrictions tightened?
- Approval authority:
- P3/P4/P5: VM Lead can approve
- P2: CISO approval required
- P1: CISO + CTO/CIO approval required
- CISA KEV: No exceptions for federal systems; non-federal requires CISO + CTO
- Approved exceptions reviewed monthly; auto-expire at proposed date
- Expired exceptions without remediation escalate to CISO automatically
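The monthly review and auto-expiry rules above can be sketched as a sweep that buckets exceptions by due date. Field names and the 30-day "approaching expiration" window are illustrative assumptions.

```python
from datetime import date, timedelta

def sweep_exceptions(exceptions: list[dict], today: date) -> dict[str, list[str]]:
    """Bucket exception IDs: 'expiring' (due within 30 days), 'escalate' (past due)."""
    out: dict[str, list[str]] = {"expiring": [], "escalate": []}
    for e in exceptions:
        due = e["proposed_date"]
        if due < today:
            out["escalate"].append(e["id"])   # expired -> auto-escalate to CISO
        elif due <= today + timedelta(days=30):
            out["expiring"].append(e["id"])   # flag in the monthly review
    return out
```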
Risk Acceptance (distinct from exception):
- Permanent acceptance of a vulnerability with no intent to remediate
- Requires formal risk assessment documenting residual risk
- CISO approval required for all risk acceptances
- Re-evaluated annually or when threat landscape changes (new exploit, KEV listing)
2.6 Vulnerability Disclosure Handling
Ref: OWASP Vulnerability Disclosure Cheat Sheet, RFC 9116
Inbound Vulnerability Reports:
- Receiving mechanism:
- security@[org].com monitored by security team
- /.well-known/security.txt published on all web properties (RFC 9116)
- Bug bounty platform (if applicable) — HackerOne/Bugcrowd
- PGP key available for encrypted submissions
- Acknowledgment: Within 1 business day; confirm receipt, provide ticket number
- Triage: Within 5 business days; validate the finding, assign severity
- Communication cadence: Update reporter every 14 days minimum until resolution
- Remediation timeline: 90 days for coordinated disclosure (aligned with industry standard)
- Publication:
- Security advisory published with: summary, impact, affected versions, patched versions, CVE ID, workarounds, timeline
- Credit researcher (with permission) — ref: OWASP guidance on researcher incentives
- Coordinate publication date with reporter
- Legal considerations:
- Safe harbor language in security.txt and disclosure policy
- No legal action against good-faith security researchers
- Engage legal counsel if disclosure involves regulated data
2.7 Reporting and Metrics
Monthly Vulnerability Report (to CISO / Risk Committee):
| Metric | Description |
|---|---|
| Total open vulnerabilities | By severity: Critical, High, Medium, Low |
| New vulnerabilities discovered | This month vs. last month, trend |
| Vulnerabilities remediated | Count + median days to remediate by severity |
| SLA compliance rate | % remediated within SLA, by priority |
| Overdue vulnerabilities | Count + aging breakdown (30/60/90/180+ days) |
| Exception count | Active exceptions by priority, approaching expiration |
| CISA KEV coverage | % of applicable KEVs remediated within deadline |
| Scanner coverage | % of assets scanned in last 30 days vs. total CMDB |
| EPSS trending | Top 10 vulnerabilities by EPSS score across the estate |
| Risk reduction trend | Aggregate risk score trend over 6 months |
3. Threat Intelligence Operations
3.1 Intelligence Requirements
Priority Intelligence Requirements (PIRs) — reviewed quarterly with CISO and business leadership:
| PIR | Description | Collection Sources |
|---|---|---|
| PIR-1 | What threat actors are targeting our industry vertical? | ISAC feeds, vendor reports, government advisories |
| PIR-2 | What TTPs are being used against organizations with our technology stack? | ATT&CK updates, vendor advisories, peer sharing |
| PIR-3 | Are any of our assets/credentials exposed on dark web/paste sites? | Dark web monitoring, credential breach services |
| PIR-4 | What vulnerabilities are being actively exploited in the wild? | CISA KEV, EPSS, vendor advisories, exploit-db |
| PIR-5 | What geopolitical events may impact our threat landscape? | OSINT, government advisories, news monitoring |
| PIR-6 | Are there supply chain threats affecting our third-party ecosystem? | Vendor notifications, SBOM monitoring, dependency alerts |
3.2 Intelligence Feed Management
Feed Inventory
| Feed | Type | Format | Ingestion | Confidence | Cost |
|---|---|---|---|---|---|
| CISA AIS / KEV | Government | STIX/JSON | Automated (TIP) | High | Free |
| MITRE ATT&CK | Framework | STIX | Monthly manual review | High | Free |
| AlienVault OTX | Community | STIX/CSV | Automated (TIP) | Medium | Free |
| Abuse.ch (URLhaus, MalwareBazaar, ThreatFox) | Community | CSV/API | Automated (TIP) | Medium-High | Free |
| Industry ISAC | Sector-specific | STIX/Email | Automated + manual | High | Membership |
| Commercial TI (vendor) | Commercial | API | Automated (TIP) | High | Licensed |
| Internal (IR-derived) | Internal | STIX | Manual from IR findings | Highest | N/A |
| Dark web monitoring | Commercial | API/Alert | Automated | Medium | Licensed |
Feed Health Checks (Weekly)
- All automated feeds ingesting successfully (check TIP dashboard)
- Feed freshness — no feed older than its expected update interval
- False positive rate per feed — track and adjust confidence weighting
- Deduplication functioning — no redundant IOCs across feeds
- Expiration policies enforced — IOCs aged out per type (IPs: 30 days, domains: 90 days, hashes: 1 year)
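The expiration policy above can be enforced with a type-keyed TTL sweep. The TTL values come from this runbook; the IOC record shape is an assumption.

```python
from datetime import date, timedelta

# TTLs per the expiration policy: IPs 30 days, domains 90 days, hashes 1 year
TTL_DAYS = {"ip": 30, "domain": 90, "hash": 365}

def expired_iocs(iocs: list[dict], today: date) -> list[str]:
    """Return values of IOCs whose last-seen date is past the TTL for their type."""
    return [
        i["value"] for i in iocs
        if today - i["last_seen"] > timedelta(days=TTL_DAYS[i["type"]])
    ]
```

Per Section 3.3, expiring IOCs should be reviewed before removal rather than deleted blindly, since a corroborating hit can justify a TTL extension.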
3.3 IOC Processing Workflow
┌──────────┐ ┌─────────────┐ ┌──────────────┐ ┌────────────┐ ┌───────────┐
│ IOC │ │ Validate & │ │ Enrich & │ │ Deploy to │ │ Monitor │
│ Received│───▶│ Deduplicate│───▶│ Contextualize│───▶│ Controls │───▶│ & Assess │
└──────────┘ └─────────────┘ └──────────────┘ └────────────┘ └───────────┘
Step-by-Step:
- Receive: IOC arrives via feed, manual submission, IR finding, or peer sharing
- Validate:
- Is this a legitimate IOC? (not a sinkhole, CDN IP, Google DNS, etc.)
- Is it already in our TIP? Check for duplicates
- Is it within our scope? (relevant to our technology stack and threat profile)
- Enrich:
- IP/Domain: WHOIS, passive DNS, geolocation, ASN, hosting provider, historical resolution
- Hash: AV detection ratio, sandbox detonation results, first/last seen dates
- URL: URL reputation, redirect chain, hosted content analysis
- Context: Associated threat actor, campaign, malware family, ATT&CK TTPs
- Score: Assign confidence level:
- High (80–100): Confirmed by multiple sources, seen in active IR, corroborated by vendor
- Medium (40–79): Single reliable source, community-validated
- Low (1–39): Unvalidated, single source, no corroboration
- Deploy:
- High confidence → auto-deploy to blocking controls (firewall, proxy, EDR, email gateway)
- Medium confidence → deploy to detection/alerting only (SIEM watchlist)
- Low confidence → TIP storage only, monitor for corroboration
- Monitor: Track hits against deployed IOCs; a hit upgrades investigation priority
- Expire: IOCs have defined TTLs based on type and confidence; review before expiration
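The confidence-to-control routing in the Deploy step can be sketched as a lookup. The score bands come from this runbook; the destination names are illustrative, not a specific product integration.

```python
def deploy_targets(confidence: int) -> list[str]:
    """Route an IOC to controls based on its confidence score (1-100)."""
    if confidence >= 80:
        # High confidence: auto-deploy to blocking controls
        return ["firewall", "proxy", "edr", "email_gateway"]
    if confidence >= 40:
        # Medium confidence: detection/alerting only
        return ["siem_watchlist"]
    # Low confidence: TIP storage only, await corroboration
    return []
```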
3.4 Threat Briefings
| Briefing Type | Audience | Frequency | Content |
|---|---|---|---|
| Daily Intel Summary | SOC analysts | Daily (0800) | New IOCs, advisories, active campaigns relevant to org |
| Weekly Threat Brief | SOC + IT Security | Weekly (Monday) | Threat landscape summary, new TTPs, detection gaps, action items |
| Monthly Strategic Brief | CISO + Security Leadership | Monthly | Threat trends, actor profiles, risk posture changes, PIR updates |
| Quarterly Board Brief | Executive / Board | Quarterly | Strategic threat landscape, industry benchmarking, investment recommendations |
| Flash Alert | All security staff | As needed (within 2 hours) | Critical zero-day, active exploitation, immediate action required |
Flash Alert Criteria:
- Zero-day affecting technology in our environment with active exploitation
- Credible threat intelligence indicating imminent targeting of our organization
- Major supply chain compromise affecting our vendors/dependencies
- Credential breach containing organizational accounts
- CISA emergency directive
3.5 Threat Actor Tracking
Maintain profiles for threat actors relevant to the organization:
ACTOR PROFILE: [Name / Alias]
Also Known As: [Alternative names across vendor reporting]
Attribution: [Nation-state / Criminal / Hacktivist / Unknown]
Motivation: [Financial / Espionage / Disruption / Ideological]
Target Verticals: [Industries targeted]
Known TTPs: [ATT&CK technique IDs with descriptions]
Known IOCs: [Reference to TIP collection]
Relevance: [Why this actor matters to our organization]
Last Activity: [Date and brief description]
Detection Status: [Which TTPs we can detect vs. gaps]
Source: [Intelligence sources for this profile]
3.6 Intelligence Sharing
- Inbound: ISAC membership, government partnerships (CISA AIS), peer organizations, commercial feeds
- Outbound: Contribute sanitized IOCs and TTPs back to ISACs and peer groups
- All outbound sharing reviewed for sensitive organizational data before release
- Use TLP (Traffic Light Protocol) markings on all shared intelligence:
- TLP:RED — Named recipients only
- TLP:AMBER+STRICT — Organization only
- TLP:AMBER — Organization + clients as needed
- TLP:GREEN — Community sharing
- TLP:CLEAR — Unrestricted
- Legal review: Annual review of sharing agreements and liability protections
4. Security Monitoring Health
4.1 SIEM Health Checks
Daily Checks (automated where possible, manual verification)
| Check | Method | Pass Criteria | Failure Action |
|---|---|---|---|
| Ingestion pipeline running | SIEM dashboard / API | All collectors showing active | Page on-call SIEM engineer |
| Event volume within baseline | Compare to 7-day rolling average | Within ±30% of baseline | Investigate: source offline? blocked? changed? |
| Ingestion latency | Timestamp comparison (event time vs. index time) | <5 minutes for 95th percentile | Investigate pipeline bottleneck |
| Parser/normalization errors | Error log review | <0.1% parse failure rate | Fix parser, re-ingest affected data |
| Storage capacity | Disk/license usage | <80% capacity; >90 days retention available | Capacity planning escalation |
| Correlation engine running | Rule execution logs | All rules executing on schedule | Restart correlation engine; escalate if persistent |
| Alert delivery functioning | Test alert generation | Alerts reaching SOAR/ticketing within 2 minutes | Check integration, escalate |
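The ingestion-latency check (event time vs. index time, 95th percentile under 5 minutes) can be sketched as below. The nearest-rank percentile method and function shape are assumptions; the 300-second limit comes from the table.

```python
def p95_latency_ok(latencies_sec: list[float], limit_sec: float = 300.0) -> bool:
    """True if the 95th-percentile ingestion latency is under the limit.

    latencies_sec: per-event (index_time - event_time) samples in seconds.
    """
    ordered = sorted(latencies_sec)
    # Nearest-rank 95th percentile, computed with integer math
    idx = (95 * len(ordered) + 99) // 100 - 1
    return ordered[idx] < limit_sec
```

A failing check would trigger the "investigate pipeline bottleneck" action from the table above.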
Weekly Checks
| Check | Method | Pass Criteria |
|---|---|---|
| Log source inventory vs. CMDB | Compare active sources to expected sources | All Tier 1/2 assets sending logs |
| Failed authentication log sources | Query for sources with auth failures | Service account credentials valid |
| New log sources detected | Review auto-discovered sources | All new sources classified and parsed |
| Retention policy compliance | Verify data age across indexes | Meeting retention requirements (regulatory + internal) |
| Search performance | Run standardized query set, measure response time | Queries complete within defined thresholds |
| Backup verification | Confirm SIEM config backup completed | Backup successful, restorable |
Monthly Checks
| Check | Method | Pass Criteria |
|---|---|---|
| Full log source gap analysis | CMDB audit vs. active log sources | >95% of required sources actively logging |
| Detection rule review | Review all rules for relevance, performance, tuning | No rules disabled without documentation |
| License utilization | EPS/GB usage vs. license | <90% license utilization |
| Disaster recovery test | Failover to DR SIEM instance | DR instance functional, data replicating |
4.2 Log Source Validation
Required Log Sources by Asset Type
| Asset Type | Required Logs | Minimum Fields |
|---|---|---|
| Windows Servers | Security (4624/4625/4648/4672/4688/4720/4732/7045), Sysmon, PowerShell (4104) | Timestamp, hostname, user, event ID, source/dest IP |
| Linux Servers | auditd (execve, connect, open), auth.log, syslog | Timestamp, hostname, user, command, source IP |
| Firewalls | Traffic logs (allow + deny), VPN auth | Timestamp, source/dest IP:port, action, bytes, protocol |
| Web Proxies | HTTP/HTTPS access logs | Timestamp, source IP, URL, user-agent, response code, bytes |
| DNS Servers | Query logs | Timestamp, source IP, queried domain, response, record type |
| Email Gateway | Inbound/outbound, spam/malware verdicts | Timestamp, sender, recipient, subject, verdict, attachment hashes |
| EDR | Process creation, network connections, file writes, registry | Full telemetry per agent capability |
| Cloud (AWS/Azure/GCP) | CloudTrail/Activity Log/Audit Log, VPC Flow Logs, GuardDuty/Defender/SCC | Timestamp, principal, action, resource, source IP, result |
| Identity Provider | Authentication, MFA events, token issuance | Timestamp, user, result, source IP, MFA method |
| Database | Authentication, privilege escalation, schema changes, query logs (sensitive DBs) | Timestamp, user, query type, source, result |
| PAM | Session recordings, credential checkout, privilege escalation | Timestamp, user, target, action, session ID |
Log Source Validation Test (Monthly)
For each critical log source:
- Generate a known event (e.g., failed login, file creation, process execution)
- Confirm the event appears in SIEM within expected latency
- Verify all required fields are parsed and normalized correctly
- Document test result in validation register
4.3 Detection Rule Testing
Detection Rule Lifecycle
┌──────────┐ ┌──────────────┐ ┌──────────────┐ ┌────────────┐ ┌───────────┐
│ Develop │ │ Test in │ │ Deploy to │ │ Monitor │ │ Tune / │
│ Rule │───▶│ Lab/Staging │───▶│ Production │───▶│ Perf/FP │───▶│ Retire │
└──────────┘ └──────────────┘ └──────────────┘ └────────────┘ └───────────┘
Development Standards (ref: Sigma rule format):
- Every rule has: unique ID, description, log source, detection logic, false positive notes, severity, ATT&CK tags
- Rules stored in version control (Git) with change history
- Peer review required before production deployment
Testing Protocol:
- Unit test: Execute the specific attack/behavior in a lab environment; confirm detection fires
- False positive assessment: Run rule against 30 days of production data in audit mode; review all matches
- Performance test: Measure query execution time; ensure no SIEM performance degradation
- Integration test: Confirm alert flows through SOAR to ticketing correctly
- Documentation: Update detection catalog with test results, known FP scenarios, tuning guidance
Automated Detection Testing (Purple Team Continuous):
- Deploy atomic red team tests or MITRE Caldera scenarios on a schedule (weekly/monthly)
- Map test results to detection rules
- Track detection rate over time:
Detection Rate = (Attacks Detected / Attacks Executed) × 100
4.4 Detection Coverage Assessment
ATT&CK Coverage Mapping
Maintain a living ATT&CK Navigator layer showing:
- Green: Technique covered by validated detection rule with low FP rate
- Yellow: Technique partially covered (e.g., only some sub-techniques, high FP rate, specific OS only)
- Red: Technique not covered, log source available
- Gray: Technique not covered, log source NOT available (requires infrastructure investment)
Quarterly Coverage Review:
- Export current detection rule inventory with ATT&CK mappings
- Map against ATT&CK Navigator
- Identify top 10 coverage gaps weighted by:
- Threat actor relevance (do our tracked actors use this technique?)
- MITRE ATT&CK frequency data (how commonly is this technique used?)
- Log source availability (can we detect this today?)
- Create detection engineering backlog from gap analysis
- Report coverage percentage to CISO:
Coverage = (Covered Techniques / Total Relevant Techniques) × 100
Target Coverage by Tactic:
| Tactic | Minimum Coverage Target | Priority |
|---|---|---|
| Initial Access | 70% | High |
| Execution | 80% | Critical |
| Persistence | 75% | Critical |
| Privilege Escalation | 75% | Critical |
| Defense Evasion | 50% | High |
| Credential Access | 80% | Critical |
| Discovery | 40% | Medium |
| Lateral Movement | 75% | Critical |
| Collection | 50% | High |
| Command and Control | 70% | High |
| Exfiltration | 60% | High |
| Impact | 70% | High |
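Coverage-by-tactic reporting against these targets can be automated. A sketch of the computation — the `TARGETS` subset and the input shape are illustrative:

```python
from collections import defaultdict

TARGETS = {  # subset of the table above, in percent
    "Initial Access": 70, "Execution": 80, "Persistence": 75,
    "Credential Access": 80, "Lateral Movement": 75,
}

def coverage_by_tactic(techniques):
    """techniques: iterable of (tactic, covered) pairs.
    Returns tactic -> (coverage %, meets target?)."""
    totals, covered = defaultdict(int), defaultdict(int)
    for tactic, is_covered in techniques:
        totals[tactic] += 1
        covered[tactic] += bool(is_covered)
    return {
        tactic: (round(100.0 * covered[tactic] / total, 1),
                 100.0 * covered[tactic] / total >= TARGETS.get(tactic, 0))
        for tactic, total in totals.items()
    }

sample = [("Execution", True)] * 4 + [("Execution", False)] \
       + [("Persistence", True)] * 3 + [("Persistence", False)]
print(coverage_by_tactic(sample))
```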
4.5 Log Retention Requirements
| Log Type | Minimum Retention (Hot/Searchable) | Archive (Cold) | Regulatory Driver |
|---|---|---|---|
| Security events | 90 days | 1 year | SOX, PCI DSS, internal policy |
| Authentication logs | 180 days | 2 years | GDPR (accountability), HIPAA |
| Network flow data | 30 days | 1 year | Internal policy |
| Firewall logs | 90 days | 1 year | PCI DSS 10.7 |
| Email logs | 90 days | 1 year | Internal policy |
| Cloud audit logs | 90 days | 2 years | CSA CCM, internal policy |
| PAM session recordings | 180 days | 3 years | SOX, internal policy |
| DNS query logs | 30 days | 180 days | Internal policy |
| Incident-related logs | Full incident duration | 7 years | Legal hold, regulatory |
5. Access Review
5.1 Review Schedule
| Review Type | Scope | Frequency | Owner | Deadline |
|---|---|---|---|---|
| Privileged access review | All admin/root/elevated accounts | Quarterly | IAM Team + Asset Owners | 30 days from initiation |
| Standard user access review | All user accounts | Semi-annually | HR + Managers | 45 days from initiation |
| Service account review | All non-human accounts | Quarterly | IAM Team + App Owners | 30 days from initiation |
| Third-party/vendor access | All external accounts | Quarterly | Vendor Management + IAM | 30 days from initiation |
| Emergency/break-glass review | All emergency access usage | Monthly | IAM Team + SOC | 7 days from initiation |
| Cloud IAM review | All cloud roles/policies | Quarterly | Cloud Security + IAM | 30 days from initiation |
| SaaS application access | All SaaS licenses/permissions | Semi-annually | IT + App Owners | 45 days from initiation |
5.2 Privileged Account Audit Procedure
Pre-Review Preparation (IAM Team, Week 1):
- Export current privileged account inventory from PAM, AD, cloud IAM:
- Domain Admins, Enterprise Admins, Schema Admins, Account Operators
- Local administrator accounts on servers
- Root/sudo accounts on Linux/Unix
- Cloud admin roles (AWS IAM admin, Azure Global Admin, GCP Organization Admin)
- Database DBA accounts
- Security tool admin accounts (SIEM, EDR, firewall management)
- Network device admin accounts
- Cross-reference against HR system:
- Identify accounts belonging to terminated employees → immediate disable
- Identify accounts belonging to transferred employees → verify role-appropriate access
- Identify accounts with no HR match → investigate (orphaned accounts)
- Generate review packages per manager/asset owner
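The HR cross-reference step lends itself to automation. A minimal sketch of the triage logic — record shapes and status values are assumptions, not a specific HR system's schema:

```python
def triage_accounts(privileged_accounts, hr_records):
    """Cross-reference privileged accounts against HR status.
    privileged_accounts: list of {"account": ..., "employee_id": ...}
    hr_records: employee_id -> "active" | "terminated" | "transferred"
    """
    actions = {"disable_now": [], "verify_role": [], "investigate_orphan": [], "ok": []}
    for acct in privileged_accounts:
        status = hr_records.get(acct["employee_id"])
        if status is None:                  # no HR match -> orphaned account
            actions["investigate_orphan"].append(acct["account"])
        elif status == "terminated":        # immediate disable
            actions["disable_now"].append(acct["account"])
        elif status == "transferred":       # verify role-appropriate access
            actions["verify_role"].append(acct["account"])
        else:
            actions["ok"].append(acct["account"])
    return actions

accounts = [
    {"account": "adm-jlee", "employee_id": "E100"},
    {"account": "adm-zhou", "employee_id": "E200"},
    {"account": "svc-legacy", "employee_id": "E999"},
]
hr = {"E100": "active", "E200": "terminated"}
print(triage_accounts(accounts, hr))
```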
Review Execution (Managers/Asset Owners, Weeks 2–3):
For each account under review, certify:
- The account is still needed (business justification documented)
- The privilege level is appropriate for current role (least privilege)
- The account has been used within the last 90 days (detect dormant accounts)
- MFA is enforced (no exceptions for privileged accounts)
- Password/credential rotation is compliant with policy
- Shared account usage is documented and has a named owner
Decision options:
- Certify: Access confirmed appropriate → no action
- Modify: Access level needs adjustment → IAM team implements within 7 days
- Revoke: Access no longer needed → IAM team disables within 24 hours
- Flag for investigation: Suspicious or unexplained access → SOC reviews
Post-Review (IAM Team, Week 4):
- Compile review results and non-response tracking
- Non-respondent managers: escalate to their Director after 7 days, VP after 14 days
- Auto-disable unreviewed accounts after 30 days (with 7-day warning to manager)
- Generate compliance report for auditors
- Update access baseline for next review cycle
- Track metrics:
- % of accounts reviewed on time
- % of accounts modified or revoked
- % of orphaned accounts discovered
- % of non-compliant privileged accounts (no MFA, shared, dormant)
5.3 Service Account Inventory and Audit
Service Account Register (mandatory fields):
| Field | Description |
|---|---|
| Account name | Unique identifier |
| Owner | Named human owner (not a team) |
| Application/service | What system uses this account |
| Privilege level | Permissions granted (specific, not "admin") |
| Authentication method | Password, certificate, API key, managed identity |
| Credential rotation | Last rotated, rotation schedule, rotation method |
| Network access | What systems/networks can this account access |
| Monitoring | Is usage monitored? Alert on anomalous behavior? |
| Last used | Date of last authenticated activity |
| Interactive logon | Allowed? (should be No for most service accounts) |
| Expiration | Does the account expire? Date? |
| Documentation | Link to runbook/architecture diagram showing account purpose |
Service Account Audit Checks:
- No service accounts with interactive logon capability (unless documented exception)
- No service accounts with domain admin or equivalent privileges (unless documented exception with compensating controls)
- All service accounts have a named human owner (not "IT Team")
- Credentials rotated per policy (90 days for passwords, 1 year for certificates)
- Usage logs reviewed for anomalous patterns (off-hours, unusual targets, volume spikes)
- Service accounts not used as shared accounts by humans
- Dormant service accounts (no activity >90 days) disabled pending owner confirmation
- All service accounts excluded from password resets during general reset events (to prevent service outages) but included in scheduled rotation
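Several of these checks can be run mechanically against the service account register. An illustrative Python sketch — field names are hypothetical, so adapt to the register's actual schema:

```python
from datetime import date

def audit_service_account(acct, today, dormant_days=90, rotation_days=90):
    """Flag violations of the audit checks above; field names are illustrative."""
    findings = []
    if acct["interactive_logon"] and not acct.get("documented_exception"):
        findings.append("interactive logon enabled without documented exception")
    if not acct["owner"] or acct["owner"].lower().endswith("team"):
        findings.append("owner is a team, not a named human")
    if (today - acct["last_used"]).days > dormant_days:
        findings.append(f"dormant >{dormant_days} days; disable pending owner confirmation")
    if acct["auth_method"] == "password" and (today - acct["last_rotated"]).days > rotation_days:
        findings.append(f"password not rotated within {rotation_days} days")
    return findings

acct = {
    "interactive_logon": True, "owner": "IT Team", "auth_method": "password",
    "last_used": date(2025, 11, 1), "last_rotated": date(2025, 10, 1),
}
print(audit_service_account(acct, today=date(2026, 3, 14)))
```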
5.4 Segregation of Duties (SoD) Checks
Review the following SoD conflicts during access reviews:
| Conflict | Risk | Control |
|---|---|---|
| Same person: approve + execute financial transactions | Fraud | Require dual approval |
| Same person: develop code + deploy to production | Unauthorized changes | Enforce CI/CD pipeline separation |
| Same person: create user accounts + assign admin privileges | Unauthorized privilege escalation | Require separate approvers |
| Same person: administer security tools + clear audit logs | Evidence tampering | Log forwarding to immutable store |
| Same person: manage backups + authorize restores | Data manipulation | Dual control on restores |
6. Penetration Testing Program
6.1 Program Structure
| Test Type | Scope | Frequency | Performed By |
|---|---|---|---|
| External network pentest | Internet-facing infrastructure | Annually + after major changes | External vendor |
| Internal network pentest | Internal network, assume breach | Annually | External vendor |
| Web application pentest | All production web apps | Annually per app + after major releases | External vendor or internal red team |
| Mobile application pentest | All published mobile apps | Annually per app + after major releases | External vendor |
| Cloud configuration review | All cloud accounts | Annually | External vendor or internal |
| Social engineering (phishing) | All employees | Semi-annually (coordinate with awareness program) | External vendor or internal red team |
| Physical security assessment | Office locations, data centers | Annually | External vendor |
| Red team exercise | Full-scope, objective-based | Annually | External vendor (different from pentest vendor) |
| Purple team exercise | Collaborative, detection-focused | Quarterly | Internal red + blue team |
| Wireless assessment | All office wireless networks | Annually | External vendor or internal |
6.2 Scoping Process
Scoping Meeting Agenda (4–6 weeks before test start):
- Objectives: What are we trying to prove/disprove? (not just "find vulnerabilities")
- Example: "Can an external attacker reach the payment database from the internet?"
- Example: "Can a compromised employee workstation lead to domain admin?"
- Scope boundaries:
- In-scope IP ranges, domains, applications, cloud accounts
- Explicitly out-of-scope systems (partner systems, shared infrastructure, production databases with real customer data)
- Authorized attack vectors (remote, physical, social engineering)
- Authorized techniques (exploit development? DoS? data exfiltration to tester infrastructure?)
- Rules of engagement:
- Testing windows (business hours only? 24x7?)
- Notification requirements (SOC informed? blind test?)
- Escalation contacts (tester → organization contact for emergencies)
- Data handling (how tester handles any sensitive data encountered)
- Stop conditions (what triggers immediate test halt)
- Environment details:
- Technology stack documentation
- Network diagrams (provided or tester discovers)
- Test accounts/credentials (for authenticated testing)
- Known compensating controls
- Deliverables:
- Report format and template
- Interim notifications for critical findings (within 24 hours of discovery)
- Executive summary + technical findings + remediation recommendations
- Retest included? (should be)
- Legal:
- Signed authorization letter (scope, dates, authorized testers by name)
- NDA in place
- Liability and indemnification clauses
- Data destruction requirements post-engagement
6.3 Vendor Management
Vendor Selection Criteria:
| Criteria | Minimum Requirement |
|---|---|
| Certifications | Lead tester holds at least one of OSCP/OSCE/OSEP/GPEN/GXPN |
| Experience | 5+ years for lead, 2+ years for team members; industry-relevant experience |
| Insurance | Professional liability and cyber liability insurance |
| Methodology | Published methodology (OWASP, PTES, OSSTMM, or equivalent) |
| Reporting | Sample report review before engagement |
| References | 3+ references from similar-size organizations |
| Background checks | All testers background-checked |
| Conflict of interest | Not the same firm that does your compliance audit |
| Rotation | Rotate primary vendor every 2–3 years; maintain 2+ qualified vendors |
Vendor Rotation Rationale: Fresh eyes find different things. Rotating vendors prevents blind spots from familiarity while maintaining relationship continuity through an approved vendor panel.
6.4 Finding Remediation Tracking
Finding Severity → Remediation SLA:
| Severity | SLA | Retest |
|---|---|---|
| Critical | 15 days | Within 30 days of remediation |
| High | 30 days | Within 45 days of remediation |
| Medium | 90 days | Next annual pentest |
| Low | 180 days | Next annual pentest |
| Informational | Best effort | N/A |
Tracking Process:
- Findings imported into vulnerability management platform from pentest report
- Each finding assigned to asset/application owner with SLA deadline
- Owner develops remediation plan within 7 days of assignment
- Remediation executed per plan; evidence of fix documented
- Retest performed by pentest vendor (for Critical/High) to confirm fix
- Finding closed with evidence of remediation and retest results
- Overdue findings escalated per vulnerability management exception process (Section 2.5)
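SLA deadlines and overdue status follow directly from the severity table above. A small sketch:

```python
from datetime import date, timedelta

SLA_DAYS = {"Critical": 15, "High": 30, "Medium": 90, "Low": 180}

def sla_status(severity, reported, today):
    """Return (deadline, days_remaining); negative days_remaining means overdue."""
    deadline = reported + timedelta(days=SLA_DAYS[severity])
    return deadline, (deadline - today).days

deadline, remaining = sla_status("Critical", reported=date(2026, 3, 1), today=date(2026, 3, 20))
print(deadline, remaining)  # 2026-03-16 -4  (overdue by 4 days)
```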
Metrics:
- % of findings remediated within SLA (by severity)
- Average days to remediate (by severity)
- Year-over-year finding comparison (are we improving?)
- Repeat findings (same vulnerability found in consecutive tests — indicates systemic issue)
6.5 Pentest Report Review Checklist
- All in-scope systems were tested (coverage confirmation)
- Methodology documented (not just results)
- Findings include proof-of-concept / evidence (not theoretical)
- Business impact clearly articulated (not just technical severity)
- Remediation recommendations are specific and actionable
- False positives identified and excluded
- Positive findings documented (what's working well)
- Executive summary suitable for non-technical audience
- All test data/access removed from systems
- Testers confirm data destruction of any sensitive data obtained
7. Security Awareness Program
7.1 Training Schedule
| Training Type | Audience | Frequency | Delivery | Duration |
|---|---|---|---|---|
| New hire security orientation | All new employees | Within 5 days of start | In-person or live virtual | 60 min |
| Annual security awareness | All employees | Annually (completion required) | LMS (self-paced) | 30–45 min |
| Role-based training: Developers | Software engineers | Annually | LMS + hands-on labs | 4 hours |
| Role-based training: IT Admins | System/network admins | Annually | LMS + hands-on labs | 4 hours |
| Role-based training: Executives | C-suite, Directors | Annually | In-person briefing | 30 min |
| Role-based training: Finance | Finance/accounting staff | Annually | LMS | 30 min (BEC focus) |
| Phishing simulation | All employees | Monthly | Automated platform | N/A |
| Targeted re-training | Phishing simulation failures | Within 7 days of failure | LMS (mandatory) | 15 min |
| Security champion deep-dive | Designated security champions per department | Quarterly | Workshop | 2 hours |
| Incident response tabletop | IT + Security + Leadership | Semi-annually | Facilitated exercise | 2–4 hours |
7.2 Phishing Simulation Program
Simulation Design:
| Difficulty | Description | Examples | Expected Click Rate |
|---|---|---|---|
| Level 1 — Easy | Obvious phishing indicators | Foreign prince, poor grammar, suspicious attachment | <5% |
| Level 2 — Medium | Moderate sophistication | Fake password reset, package delivery, IT notification | <10% |
| Level 3 — Hard | Realistic, targeted | Spoofed internal sender, contextually relevant, brand-perfect | <15% |
| Level 4 — Expert | Spear phishing quality | Personalized, references real projects/events, bypasses quick checks | <25% |
Simulation Cadence:
- Month 1: Level 2 (baseline measurement)
- Month 2: Level 1 (confidence builder)
- Month 3: Level 3 (challenge)
- Month 4: Level 2 (different template)
- Repeat cycle, gradually increasing difficulty as organization matures
Simulation Rules:
- Never simulate during organizational crises, layoffs, or sensitive periods
- Never use real threats as simulation content (doing so desensitizes users to actual attacks)
- Never publicly shame or discipline based on simulation results (counterproductive)
- Always provide immediate teachable moment when user clicks (landing page with education)
- Track, but never penalize, users who report simulations as suspicious (reporting is the desired behavior)
Response to Simulation Failure:
- First failure: Automated educational redirect + brief training module (15 min)
- Second failure (within 12 months): Manager notification + additional training
- Third failure (within 12 months): Manager + HR notification + 1:1 coaching session
- Chronic failure: Restrict email access to supervised mode; additional controls on endpoint
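Encoding the escalation ladder as a lookup keeps responses consistent across the program. A sketch, paraphrasing the steps above:

```python
def failure_response(failures_in_12_months):
    """Map a user's rolling-12-month failure count to the escalation ladder."""
    ladder = {
        1: "educational redirect + mandatory 15-min training module",
        2: "manager notification + additional training",
        3: "manager + HR notification + 1:1 coaching session",
    }
    if failures_in_12_months <= 0:
        return "no action"
    return ladder.get(failures_in_12_months,
                      "chronic: supervised email mode + additional endpoint controls")

print(failure_response(2))  # manager notification + additional training
```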
7.3 Metrics and Reporting
| Metric | Target | Measurement |
|---|---|---|
| Phishing click rate | <10% overall | Monthly simulation results |
| Phishing report rate | >60% | Users who report via phish button vs. total simulations sent |
| Training completion rate | >95% | LMS tracking, within 30 days of assignment |
| Time to report (real phish) | <10 minutes | Time from delivery to user report |
| Repeat clicker rate | <5% | Users who click in 2+ consecutive simulations |
| Security incident from user error | Decreasing trend | Incident root cause analysis |
| Security champion engagement | >80% attendance | Workshop attendance tracking |
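Click rate and report rate are simple per-campaign ratios. A sketch of the computation (the sample counts are invented):

```python
def simulation_metrics(sent, clicked, reported):
    """Per-campaign click rate and report rate, as percentages."""
    return {
        "click_rate_pct": round(100.0 * clicked / sent, 1),
        "report_rate_pct": round(100.0 * reported / sent, 1),
    }

m = simulation_metrics(sent=1200, clicked=84, reported=790)
print(m)  # click rate 7.0 (target <10), report rate 65.8 (target >60)
```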
Monthly Report (to CISO):
- Phishing simulation results with trend (6-month rolling)
- Training completion rates by department
- Top 5 departments by click rate (for targeted intervention)
- Repeat clicker count and trend
- Real phishing reports vs. simulations reported (measures reporting culture health)
- Program recommendations and planned changes
7.4 Continuous Improvement Cycle
Quarter 1: Assess — Baseline measurements, survey employee security knowledge
Quarter 2: Develop — Create/update content based on assessment gaps and current threats
Quarter 3: Deploy — Roll out updated training, increase simulation difficulty
Quarter 4: Measure — Analyze full-year metrics, benchmark against industry, plan next year
Content Update Triggers (outside regular cycle):
- New threat vector gaining traction (e.g., QR code phishing, deepfake voice)
- Organizational incident caused by human factor
- Regulatory change requiring new training content
- Significant change in metrics (click rate spike)
8. Change Management Security Review
8.1 Security Review Integration with Change Management
┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ RFC │ │ Security │ │ CAB │ │ Post-Change │
│ Submitted │────▶│ Review Gate │────▶│ Approval │────▶│ Validation │
└──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘
│
┌─────▼─────┐
│ Risk │
│ Accepted │
│ / Denied │
└───────────┘
8.2 Change Classification and Review Requirements
| Change Type | Security Review Required | Review Depth | Approver |
|---|---|---|---|
| Standard (pre-approved, low risk) | No (pre-assessed template) | N/A — template reviewed annually | Auto-approved |
| Normal — Low Risk | Checklist review (15 min) | Security checklist completion | Security analyst |
| Normal — Medium Risk | Security assessment (1–3 days) | Architecture review + threat assessment | Security architect |
| Normal — High Risk | Full security review (1–2 weeks) | Threat model + pentest + config review | Security architect + CISO |
| Emergency | Post-implementation review (within 48 hours) | Retrospective security assessment | Security on-call → Security architect |
8.3 Security Review Triggers
A security review is mandatory when the change involves ANY of the following:
- New internet-facing service or endpoint
- Changes to authentication or authorization mechanisms
- New data flows involving PII, PHI, financial, or classified data
- Changes to encryption (at rest or in transit)
- New third-party integration or API connection
- Changes to network segmentation or firewall rules
- New cloud service or account provisioning
- Changes to privileged access or admin interfaces
- Infrastructure changes to security tooling (SIEM, EDR, PAM, PKI)
- Changes to backup or disaster recovery infrastructure
- Operating system or middleware upgrades on production systems
- New software deployment to production (first-time deployment)
- Changes to CI/CD pipeline security controls
- Database schema changes affecting sensitive data tables
- DNS changes for production domains
- Certificate changes or PKI modifications
8.4 Security Review Checklist
For Infrastructure Changes:
- Network diagram updated showing new/modified data flows
- Firewall rules follow least privilege (specific source → specific destination, no ANY/ANY)
- Encryption in transit enforced (TLS 1.2+ minimum, TLS 1.3 preferred)
- Encryption at rest enabled for sensitive data stores
- Hardening baseline applied (CIS Benchmark or organizational standard)
- Vulnerability scan completed on new infrastructure before production deployment
- Logging configured and validated in SIEM (ref: Section 4.2)
- Monitoring alerts configured for security-relevant events
- Backup and recovery tested
- Access control configured per least privilege
- Patch management integrated (new system enrolled in vulnerability scanning)
- DNS configuration reviewed (no dangling records, DNSSEC where applicable)
- Certificate management integrated (expiration monitoring)
For Application Changes:
- SAST scan completed, no Critical/High findings unresolved
- DAST scan completed (for web applications), no Critical/High findings unresolved
- SCA scan completed, no known-exploited dependencies (KEV check)
- OWASP Top 10 assessment performed
- Authentication/authorization changes reviewed by security architect
- Input validation implemented for all user-controlled data
- Secrets management verified (no hardcoded credentials, API keys, or tokens)
- Logging implemented for security events (auth, authz, data access, errors)
- Error handling does not leak sensitive information
- API rate limiting and abuse controls implemented
- CORS, CSP, and security headers configured appropriately
- Privacy impact assessment completed if new PII processing
For Cloud Changes:
- IAM roles follow least privilege (no wildcard permissions)
- No resources publicly accessible unless explicitly required and documented
- Cloud security posture management (CSPM) alerts resolved
- Service control policies / organization policies enforced
- Network security groups / security groups restrict traffic appropriately
- Cloud-native logging enabled (CloudTrail, Activity Log, Audit Log)
- Tagging applied for cost allocation, ownership, and data classification
- Terraform/IaC security scanned (tfsec, checkov, or equivalent)
8.5 Emergency Change Security Process
When a change must bypass normal security review:
- Change implemented under emergency change process with verbal approval from CAB chair
- Security team notified immediately (email + chat) with change details
- Security on-call performs rapid risk assessment (30-minute target):
- What changed?
- Does it expose new attack surface?
- Were security controls maintained?
- Are logs being generated?
- Full security review completed within 48 hours post-implementation
- Any security findings from post-review tracked as P2 vulnerabilities
- If critical security gap identified: rollback or immediate mitigation required
8.6 Post-Change Security Validation
Within 24 hours of production deployment:
- Verify new/modified systems appear in vulnerability scanner scope
- Verify logs are flowing to SIEM with correct parsing
- Verify monitoring alerts are functional
- Run targeted vulnerability scan against changed systems
- Verify no unintended changes to security group/firewall rules
- Document security review completion in change ticket
9. Third-Party Risk Assessment
9.1 Vendor Risk Classification
| Tier | Criteria | Assessment Depth | Review Frequency |
|---|---|---|---|
| Tier 1 — Critical | Processes PII/PHI/financial data, OR has network access, OR single point of failure for business operations | Full security assessment + on-site/virtual audit | Annually |
| Tier 2 — High | Accesses internal systems (non-sensitive), OR provides business-critical SaaS, OR handles confidential (non-regulated) data | Detailed questionnaire + evidence review | Annually |
| Tier 3 — Medium | Limited data access, replaceable service, no direct system access | Standard questionnaire | Every 2 years |
| Tier 4 — Low | No data access, no system access, commodity service | Self-attestation | Every 3 years |
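The tiering criteria apply top-down with first match winning, which makes them easy to encode so assessors classify consistently. A hedged sketch — attribute names are illustrative:

```python
def classify_vendor(v):
    """v: dict of boolean risk attributes (names illustrative).
    Criteria apply top-down; first match wins."""
    if v.get("regulated_data") or v.get("network_access") or v.get("single_point_of_failure"):
        return 1  # Critical
    if v.get("internal_system_access") or v.get("business_critical_saas") or v.get("confidential_data"):
        return 2  # High
    if v.get("limited_data_access"):
        return 3  # Medium
    return 4      # Low

print(classify_vendor({"network_access": True}))      # 1
print(classify_vendor({"limited_data_access": True})) # 3
```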
9.2 Assessment Process
Pre-Engagement (Before Contract)
- Vendor classification: Apply tiering criteria above
- Initial assessment:
- Tier 1–2: Issue security questionnaire (SIG Lite or SIG Core, or organizational questionnaire)
- Request evidence: SOC 2 Type II report, ISO 27001 certificate, pentest executive summary, insurance certificate
- For cloud/SaaS: Review shared responsibility model, data processing agreement, SLA
- Risk evaluation:
- Review questionnaire responses against organizational security baseline
- Identify gaps and compensating controls
- Assess residual risk
- Risk decision:
- Accept: Vendor meets security requirements
- Conditional accept: Vendor has gaps but commits to remediation timeline (tracked)
- Reject: Unacceptable risk, cannot be mitigated
Ongoing Monitoring
| Activity | Frequency | Tier Applicability |
|---|---|---|
| Automated security rating monitoring (BitSight/SecurityScorecard) | Continuous | Tier 1–2 |
| Review vendor security advisories/breach notifications | As published | All tiers |
| SOC 2 / ISO 27001 report review | Annually (on renewal) | Tier 1–2 |
| Questionnaire re-assessment | Per tier schedule | All tiers |
| Dark web monitoring for vendor breach indicators | Continuous | Tier 1 |
| SBOM review (for software vendors) | On major releases | Tier 1 (software) |
| Verify cyber insurance currency | Annually | Tier 1–2 |
Incident Handling for Third-Party Breaches
When a vendor reports a security incident:
- Receive notification: Document date, time, nature of incident, data involved
- Scope assessment (within 4 hours):
- What data of ours was affected?
- What systems of ours are connected to the vendor?
- Is the vendor still actively compromised?
- Containment (within 8 hours):
- Disable/restrict vendor access to our systems if active compromise
- Rotate credentials shared with vendor
- Block vendor IP ranges if warranted
- Increase monitoring on vendor-connected systems
- Communication:
- Internal: Brief CISO, Legal, affected business units
- External: Coordinate with vendor IR team; establish communication cadence
- Regulatory: Assess notification obligations (GDPR Art. 33: 72-hour window)
- Investigation:
- Review vendor access logs for unauthorized activity during breach window
- Assess whether our data was exfiltrated
- Determine if breach constitutes reportable incident for our organization
- Remediation tracking:
- Require vendor to provide root cause analysis and remediation plan
- Track vendor remediation to completion
- Re-assess vendor risk tier; update risk register
- Post-incident:
- Lessons learned: Do we need to reduce vendor access, add monitoring, or change tier?
- Contract review: Do we need to invoke breach notification clauses?
- Consider vendor replacement if vendor response was inadequate
9.3 Contract Security Requirements
Minimum contractual security clauses (for Tier 1–2 vendors):
| Clause | Requirement |
|---|---|
| Data protection | Compliance with applicable data protection laws (GDPR, CCPA, HIPAA) |
| Data processing agreement | Required for any vendor processing personal data |
| Breach notification | Notify within 24 hours of becoming aware of a breach affecting our data |
| Right to audit | Organization reserves right to audit vendor security controls |
| Subprocessor management | Vendor must notify of subprocessor changes; organization retains right to object |
| Data location | Data processing locations specified; restrictions on cross-border transfers |
| Encryption | Data encrypted in transit (TLS 1.2+) and at rest (AES-256+) |
| Access control | Vendor personnel access on least-privilege basis; MFA enforced |
| Background checks | Vendor personnel with data access must be background-checked |
| Insurance | Cyber liability insurance with minimum coverage of $X million |
| Termination data handling | Data return/deletion within 30 days of contract termination, with certification |
| Security certifications | Maintain SOC 2 Type II or ISO 27001 (or equivalent) throughout contract |
| Incident response | Vendor maintains documented IR plan; cooperates with organization during incidents |
| SLA for security patches | Critical/High vulnerabilities patched within 7/30 days respectively |
| Business continuity | Vendor maintains BC/DR plan with defined RTO/RPO |
9.4 Vendor Offboarding Security Checklist
When a vendor relationship ends:
- All vendor access credentials disabled/revoked (VPN, API keys, accounts)
- All vendor-specific firewall rules removed
- Vendor certificates revoked (if applicable)
- Data return confirmed (received backup/export of organizational data)
- Data deletion confirmed (vendor provides written certification of data destruction)
- CMDB updated to reflect vendor removal
- SIEM log sources from vendor infrastructure decommissioned or redirected
- Vendor removed from third-party risk register (or marked as "offboarded")
- Final SOC 2 / security assessment reviewed before contract end
- Any open risk items from vendor transferred to new vendor or closed
- DNS records pointing to vendor infrastructure updated/removed (prevent dangling records)
10. Business Continuity / DR Testing
10.1 Testing Schedule
| Test Type | Frequency | Participants | Duration | Success Criteria |
|---|---|---|---|---|
| Tabletop exercise — cyber scenario | Semi-annually | Leadership + IT + Security + Legal + PR + HR | 2–4 hours | All roles demonstrate understanding of procedures; gaps identified and tracked |
| Tabletop exercise — non-cyber (natural disaster, pandemic) | Annually | Leadership + Facilities + HR + IT | 2–4 hours | Business continuity plans validated; communication chains tested |
| Technical DR test — data restoration | Quarterly | IT Operations + DBA team | 4–8 hours | Data restored from backup within RTO; integrity verified |
| Technical DR test — full failover | Annually | IT Operations + App Owners + Security | 1–2 days (planned weekend) | Production workloads running on DR site within RTO; RPO met |
| Communication cascade test | Semi-annually | All staff (notification) + Leadership (response) | 30 minutes | 90% of staff acknowledge notification within 1 hour |
| Backup validation | Monthly | IT Operations | 2–4 hours | Random sample of backups restored and validated |
| Alternate site activation | Annually | IT + Business units | Full business day | Staff can perform critical functions from alternate location |
| Vendor BC/DR test coordination | Annually | Vendor management + critical vendors | Coordinated with vendor | Vendor demonstrates their DR capability for services we consume |
10.2 Tabletop Exercise Design
Pre-Exercise Preparation (4–6 weeks before)
- Define objectives: What do we want to test/validate?
- Decision-making processes
- Communication protocols
- Escalation procedures
- Regulatory notification workflows
- Technical recovery procedures
- Coordination between teams
- Design scenario: Create realistic, plausible scenario with injects (escalating complications)
- Prepare materials: Scenario document, inject cards, evaluation criteria, participant guide
- Invite participants: Confirmed attendance from all required roles
- Brief facilitator: External facilitator preferred for objectivity
Cyber Scenario Template
SCENARIO: [Name]
CATEGORY: [Ransomware / Data Breach / Supply Chain / Insider Threat / DDoS / Nation-State]
DIFFICULTY: [Introductory / Intermediate / Advanced]
BACKGROUND:
[Set the scene: day of week, time, what's happening in the organization]
INJECT 1 (T+0): Initial Detection
[First indicator of compromise. Who detects it? How?]
DISCUSSION: What is your initial response? Who do you contact?
INJECT 2 (T+2 hours): Scope Expands
[Additional systems affected, or new information reveals broader compromise]
DISCUSSION: How does this change your response? What containment actions?
INJECT 3 (T+6 hours): Business Impact
[Customer-facing systems affected, or data exfiltration confirmed]
DISCUSSION: Who needs to be informed? External communications?
INJECT 4 (T+12 hours): Escalation
[Media inquiry, regulatory notification deadline approaching, ransom demand]
DISCUSSION: Decision points for leadership. Legal obligations?
INJECT 5 (T+24 hours): Recovery Decisions
[Recovery options with trade-offs: speed vs. forensic preservation]
DISCUSSION: How do we prioritize recovery? What's our communication to customers?
INJECT 6 (T+72 hours): Post-Incident
[Incident contained. What now?]
DISCUSSION: Post-incident review, regulatory reporting, customer notification,
security improvements, lessons learned.
Sample Scenarios (rotate across exercises)
| Scenario | Focus Areas |
|---|---|
| Ransomware hitting production servers on Friday night | IR procedures, backup integrity, payment decision, comms |
| Customer database exfiltration discovered by third party | Data breach notification, regulatory compliance, forensics |
| Critical vendor compromise (SolarWinds-style) | Supply chain response, vendor coordination, scope assessment |
| Insider threat — privileged admin exfiltrating data | HR/Legal coordination, evidence preservation, access revocation |
| DDoS during peak business period | Business continuity, cloud scaling, ISP coordination |
| Business email compromise — CEO impersonation for wire transfer | Financial controls, verification procedures, employee awareness |
| Zero-day in widely deployed software (Log4Shell-style) | Vulnerability response, asset inventory, patching at scale |
| Cloud account compromise via stolen API keys | Cloud IR, IAM review, secret rotation, blast radius |
10.3 Technical DR Test Procedure
Pre-Test
- Scope: Define which systems are being tested (full environment or subset)
- RTO/RPO targets: Document the recovery time and point objectives being tested
- Rollback plan: How to revert if DR test causes issues
- Notification: Inform all stakeholders of test window
- Monitoring: Enhanced monitoring during test (SOC aware, extra logging)
Test Execution
- Simulate failure: Disconnect primary site / shut down primary systems (for full failover) OR initiate recovery from backup (for restoration test)
- Record timeline:
- T+0: Failure initiated
- T+X: DR process triggered (record whether trigger was manual or automatic)
- T+X: First system available on DR site
- T+X: All critical systems available
- T+X: Users able to access systems
- T+X: Full service restored
- Validate functionality:
- All critical applications accessible
- Data integrity verified (compare checksums, record counts, transaction integrity)
- Authentication and authorization functioning
- Network connectivity between DR systems correct
- External access (VPN, web) functioning
- Security controls operational on DR site (EDR, SIEM logging, firewall rules)
- Monitoring and alerting functional
- Measure against RTO/RPO:
- Did recovery complete within RTO? If not, where were the delays?
- What was the actual data loss (RPO)? Compare to target.
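The RTO/RPO measurement above is simple arithmetic on the recorded timeline, and scripting it removes transcription errors from the test report. A minimal sketch with illustrative timestamps (the timeline keys and values are hypothetical, not a required schema):

```python
from datetime import datetime, timedelta

# Hypothetical timeline captured during a DR test (values are illustrative).
timeline = {
    "failure_initiated":      datetime(2026, 3, 14, 2, 0),
    "dr_triggered":           datetime(2026, 3, 14, 2, 10),
    "first_system_available": datetime(2026, 3, 14, 3, 5),
    "all_critical_available": datetime(2026, 3, 14, 4, 40),
    "full_service_restored":  datetime(2026, 3, 14, 5, 30),
}
# Most recent point the data could be recovered to (last replication/backup).
last_good_backup = datetime(2026, 3, 14, 1, 15)

rto_target = timedelta(hours=4)   # Tier 1 targets from section 10.5
rpo_target = timedelta(hours=1)

# RTO actual: failure initiation to full service restoration.
rto_actual = timeline["full_service_restored"] - timeline["failure_initiated"]
# RPO actual: data loss window between last good copy and the failure.
rpo_actual = timeline["failure_initiated"] - last_good_backup

print(f"RTO: {rto_actual} vs target {rto_target} -> "
      f"{'MET' if rto_actual <= rto_target else 'MISSED'}")
print(f"RPO: {rpo_actual} vs target {rpo_target} -> "
      f"{'MET' if rpo_actual <= rpo_target else 'MISSED'}")
```

The computed values feed directly into the RTO/RPO lines of the report template in section 10.4.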
Post-Test
- Failback: Return operations to the primary site (failback is itself part of the test — time and validate it the same way)
- Verify: All systems back on primary, data synchronized, no data loss from test
- Document: Test report with timeline, pass/fail criteria, findings
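The data-integrity verification called out above (comparing checksums between primary and DR copies) can be scripted rather than spot-checked by hand. A minimal sketch hashing file trees with SHA-256; the function names and directory layout are assumptions for illustration:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large backup files don't exhaust memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def compare_trees(primary: Path, dr: Path) -> list[str]:
    """Return relative paths that are missing or differ on the DR side."""
    mismatches = []
    for src in primary.rglob("*"):
        if not src.is_file():
            continue
        rel = src.relative_to(primary)
        dst = dr / rel
        if not dst.is_file() or sha256_of(src) != sha256_of(dst):
            mismatches.append(str(rel))
    return mismatches
```

An empty result supports the "data synchronized, no data loss" verification step; any mismatch becomes a finding in the test report. For databases, record counts and transaction-log positions are the analogous checks.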
10.4 Lessons Learned Framework
After every exercise or test (within 7 days):
EXERCISE/TEST REPORT
Date: [Date]
Type: [Tabletop / Technical DR / Communication / Backup Validation]
Scenario: [Brief description]
Participants: [Roles, not just names]
Facilitator: [Name]
OBJECTIVES:
[What we set out to test]
RESULTS:
Objective 1: [PASS/PARTIAL/FAIL] — [Details]
Objective 2: [PASS/PARTIAL/FAIL] — [Details]
...
TIMELINE (for technical tests):
RTO Target: [X hours] RTO Actual: [X hours] [MET/MISSED]
RPO Target: [X hours] RPO Actual: [X hours] [MET/MISSED]
FINDINGS:
[FINDING-001] [Severity]
Description: [What went wrong or could be improved]
Impact: [Business impact if this occurred in a real event]
Recommendation: [Specific remediation]
Owner: [Who will fix this]
Due Date: [When]
WHAT WORKED WELL:
- [Positive observations — reinforce these behaviors]
GAPS IDENTIFIED:
- [Process gaps, documentation gaps, tool gaps, skill gaps]
ACTION ITEMS:
[ID] [Action] [Owner] [Due Date] [Status]
001 [Action] [Name] [Date] [Open]
...
Action Item Tracking:
- All action items entered into a tracking system (not just the report)
- Monthly review of open action items by BC/DR program owner
- Overdue items escalated to CISO
- Action items verified as complete before next exercise (prevents recurring gaps)
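The monthly overdue-item review above is easy to automate against an export from the tracking system. A minimal sketch, assuming a simple list-of-dicts export format (the field names and records are hypothetical):

```python
from datetime import date

# Hypothetical action-item records as exported from the tracking system.
action_items = [
    {"id": "001", "action": "Fix DNS failover on DR site", "owner": "NetOps",
     "due": date(2026, 3, 1), "status": "Open"},
    {"id": "002", "action": "Update failback runbook", "owner": "SOC",
     "due": date(2026, 4, 1), "status": "Open"},
    {"id": "003", "action": "Rotate DR service credentials", "owner": "IAM",
     "due": date(2026, 2, 1), "status": "Closed"},
]

def overdue_items(items, today):
    """Open items past their due date — candidates for CISO escalation."""
    return [i for i in items if i["status"] == "Open" and i["due"] < today]

for item in overdue_items(action_items, date(2026, 3, 14)):
    print(f"ESCALATE: {item['id']} {item['action']} "
          f"(owner {item['owner']}, due {item['due']})")
```

Running this as part of the monthly review keeps the escalation rule mechanical rather than dependent on someone remembering to check.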
10.5 RTO/RPO Reference Table
| System Category | RTO Target | RPO Target | Backup Method | DR Strategy |
|---|---|---|---|---|
| Tier 1 — Revenue/Customer-facing | 4 hours | 1 hour | Synchronous replication | Active-active or hot standby |
| Tier 2 — Business Critical | 8 hours | 4 hours | Asynchronous replication | Warm standby |
| Tier 3 — Business Support | 24 hours | 24 hours | Daily backup | Cold standby or rebuild from IaC |
| Tier 4 — Low Impact | 72 hours | 48 hours | Daily backup | Rebuild from IaC/backup |
| Security Infrastructure (SIEM, EDR, PAM) | 4 hours | 1 hour | Synchronous replication | Hot standby (security tooling cannot be offline during an incident) |
10.6 Crisis Communication Template
Internal Notification (initial):
SUBJECT: [SEVERITY] Security Incident — [Brief Description]
A security incident has been declared at [TIME] on [DATE].
CURRENT STATUS: [Active / Contained / Resolved]
IMPACT: [Brief description of business impact]
AFFECTED SYSTEMS: [List]
IMMEDIATE ACTIONS:
- [What employees should do / not do]
- [Who to contact with questions]
- [Where to find updates]
DO NOT discuss this incident on social media, personal email, or with
external parties. All external communications will go through [PR/Legal].
Next update will be provided at [TIME].
— [CISO / Incident Commander Name]
External Communication (customer-facing, Legal-approved):
SUBJECT: Security Notice — [Organization Name]
We are writing to inform you of a security incident that [organization]
identified on [DATE].
WHAT HAPPENED: [Brief, factual description]
WHAT INFORMATION WAS INVOLVED: [Specific data types affected]
WHAT WE ARE DOING: [Steps taken to address the incident]
WHAT YOU CAN DO: [Specific protective actions for customers]
For questions, contact [dedicated email/phone] or visit [dedicated webpage].
We take the security of your information seriously and sincerely apologize
for any inconvenience.
Appendix A: Regulatory Notification Requirements Quick Reference
| Regulation | Notification Deadline | Notify Whom | Trigger |
|---|---|---|---|
| GDPR Art. 33 | 72 hours from awareness | Supervisory authority | Personal data breach likely to result in risk to individuals |
| GDPR Art. 34 | Without undue delay | Affected individuals | High risk to rights and freedoms |
| HIPAA | 60 days from discovery | HHS OCR + affected individuals + media (if >500) | Unsecured PHI breach |
| PCI DSS | Immediately | Acquiring bank + card brands | Cardholder data compromise |
| SEC (public companies) | 4 business days | SEC (8-K filing) | Material cybersecurity incident |
| State breach laws (US) | Varies (30–90 days) | State AG + affected individuals | PII breach (state-specific definitions) |
| NIS2 (EU) | 24 hours (early warning), 72 hours (full) | CSIRT / competent authority | Significant incident |
| DORA (EU financial) | 4 hours (initial), 72 hours (intermediate) | Competent authority | Major ICT-related incident |
| CCPA/CPRA | Without unreasonable delay | Affected CA residents | Personal information breach |
Note: Legal counsel must be consulted before any external notification. This table is a reference, not legal advice.
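During an incident, computing the hard deadlines from the table above removes one source of error under pressure. A minimal sketch; the windows are simplified from the table (actual trigger conditions and clock-start rules vary by regulation, and the SEC window is business days, approximated here as calendar days), so treat the output as a planning aid, never a substitute for legal counsel:

```python
from datetime import datetime, timedelta

# Simplified notification windows from the quick-reference table, measured
# from awareness/discovery. Illustrative only — confirm each with Legal.
NOTIFICATION_WINDOWS = {
    "GDPR Art. 33 (supervisory authority)":  timedelta(hours=72),
    "NIS2 early warning":                    timedelta(hours=24),
    "NIS2 full notification":                timedelta(hours=72),
    "HIPAA (HHS OCR)":                       timedelta(days=60),
    "SEC 8-K (4 business days, approx.)":    timedelta(days=4),
}

def notification_deadlines(awareness: datetime) -> dict:
    """Map each regulation to its latest notification time."""
    return {reg: awareness + window for reg, window in NOTIFICATION_WINDOWS.items()}

for reg, deadline in notification_deadlines(datetime(2026, 3, 14, 9, 0)).items():
    print(f"{reg}: notify by {deadline:%Y-%m-%d %H:%M}")
```

The earliest deadline (typically NIS2's 24-hour early warning, where applicable) should drive the incident commander's communication timeline.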
Appendix B: RACI Matrix for Security Operations
| Activity | SOC Analyst | SOC Lead | SOC Manager | CISO | IT Ops | Legal | HR | PR |
|---|---|---|---|---|---|---|---|---|
| Alert triage | R | A | I | - | - | - | - | - |
| Incident declaration | C | R | A | I | I | - | - | - |
| Incident containment | R | A | I | I | C | - | - | - |
| Regulatory notification | I | I | C | A | - | R | - | C |
| External communication | - | - | C | A | - | C | - | R |
| Employee communication | - | - | C | A | - | C | R | C |
| Vulnerability scanning | R | A | I | - | C | - | - | - |
| Patch deployment | C | I | I | - | R/A | - | - | - |
| Access review | C | R | A | I | C | - | C | - |
| Pentest coordination | C | R | A | I | C | C | - | - |
| Vendor risk assessment | C | C | R | A | C | C | - | - |
| DR test execution | C | C | I | I | R/A | - | - | - |
| Security awareness | C | C | R | A | C | - | C | C |
| Change security review | R | A | I | C | C | - | - | - |
| Threat intel processing | R | A | I | I | - | - | - | - |
R = Responsible, A = Accountable, C = Consulted, I = Informed
Appendix C: Tool Stack Reference
| Function | Primary Tool | Backup/Alternative | Owner |
|---|---|---|---|
| SIEM | [Tool Name] | [Alternative] | SOC |
| SOAR | [Tool Name] | Manual playbooks | SOC |
| EDR | [Tool Name] | [Alternative] | Endpoint Security |
| Vulnerability Scanner | [Tool Name] | [Alternative] | VM Team |
| Threat Intelligence Platform | [Tool Name] | Manual feed processing | TI Team |
| Ticketing | [Tool Name] | Email (degraded mode) | IT |
| PAM | [Tool Name] | Manual credential vault | IAM |
| CSPM | [Tool Name] | Manual cloud audit | Cloud Security |
| Email Security Gateway | [Tool Name] | [Alternative] | Email Admin |
| WAF | [Tool Name] | Cloud provider WAF | Network Security |
| Network Firewall | [Tool Name] | [Alternative] | Network Security |
| Backup Solution | [Tool Name] | [Alternative] | IT Ops |
| Communication (incident) | [Tool Name] — out of band | Phone bridge | SOC |
| Phishing Simulation | [Tool Name] | Manual campaigns | Security Awareness |
| Code Scanning (SAST/SCA) | [Tool Name] | [Alternative] | AppSec |
Fill in tool names per organizational deployment. This table serves as a quick reference for operators and for BC/DR planning (knowing dependencies).
Appendix D: Key Contacts Template
| Role | Primary | Backup | Phone | Email | Escalation Trigger |
|---|---|---|---|---|---|
| SOC Shift Lead (current) | [Name] | [Name] | [Phone] | [Email] | First line for all security events |
| SOC Manager | [Name] | [Name] | [Phone] | [Email] | P1/P2 incidents, staffing issues |
| CISO | [Name] | [Name] | [Phone] | [Email] | Declared incidents, regulatory notifications |
| CTO / CIO | [Name] | [Name] | [Phone] | [Email] | Business-critical system decisions |
| General Counsel | [Name] | [Name] | [Phone] | [Email] | Breach notification, law enforcement, legal holds |
| Head of PR/Communications | [Name] | [Name] | [Phone] | [Email] | External communications, media inquiries |
| Head of HR | [Name] | [Name] | [Phone] | [Email] | Insider threat, employee-related incidents |
| External IR Retainer | [Firm] | [Firm] | [Phone] | [Email] | When internal capacity exceeded |
| External Legal Counsel | [Firm] | [Firm] | [Phone] | [Email] | Regulatory, litigation, law enforcement |
| Cyber Insurance Broker | [Firm] | [Firm] | [Phone] | [Email] | Incident notification per policy |
| Law Enforcement (FBI/CISA) | [Field office] | [CISA region] | [Phone] | [Email] | Nation-state, critical infrastructure, ransomware |
| Regulatory Body | [Agency] | [Agency] | [Phone] | [Email] | Per notification requirements |
Document generated from operational security frameworks including NIST SP 800-61r2, NCSC Incident Management Collection, CERT Societe Generale IRM-2022, CISA BOD 22-01, OWASP Vulnerability Disclosure Cheat Sheet, CIS Controls v8, and MITRE ATT&CK v14. Customize all bracketed fields and tool references for organizational deployment.