Security Metrics, KPIs & Measurement — Deep Dive
CIPHER Training Module — Security Program Measurement & Executive Reporting
Generated: 2026-03-14
Table of Contents
- Foundations: Why Measure Security
- Vulnerability Management Metrics
- SOC & Detection Metrics
- Application Security Metrics
- Risk Metrics & Quantification
- Compliance Metrics
- Detection Engineering Metrics
- Security Awareness Metrics
- OWASP SAMM Maturity Scoring
- Executive Dashboard Design
- Scoring Systems Deep Dive: CVSS v4.0 & EPSS
- Anti-Patterns & Pitfalls
1. Foundations: Why Measure Security
Security measurement serves three purposes: operational improvement, risk communication, and resource justification. Metrics without these anchors are vanity metrics.
The Measurement Hierarchy
Level 4: Business Risk Outcomes (Board / C-Suite)
Level 3: Program Effectiveness (CISO / VP)
Level 2: Operational Efficiency (Directors / Managers)
Level 1: Activity & Volume (Team Leads / Analysts)
Cardinal rule: every Level 1 metric must roll up into a Level 3 or 4 narrative, or it should not exist. Activity metrics (scans run, tickets opened) are inputs, not outcomes.
Metric Design Principles
- Actionable: if the number moves, someone knows what to do
- Comparable: consistent measurement over time enables trend analysis
- Contextual: raw numbers without baselines are meaningless
- Owned: every metric has a single accountable owner
- Lagging vs. Leading: track both — lagging confirms reality, leading predicts it
NIST CSF 2.0 as Measurement Backbone
NIST CSF 2.0 (February 2024) provides the structural framework for organizing security metrics across six core functions:
| Function | Measurement Focus |
|---|---|
| Govern (new in 2.0) | Policy coverage, risk appetite adherence, program maturity, board reporting cadence |
| Identify | Asset inventory completeness, risk assessment currency, data classification coverage |
| Protect | Control implementation rate, access review completion, training coverage |
| Detect | MTTD, detection coverage by ATT&CK, alert fidelity, log source coverage |
| Respond | MTTR, containment time, playbook execution rate, communication SLAs |
| Recover | RTO/RPO achievement, backup test success rate, service restoration time |
CSF Implementation Tiers (1-4: Partial, Risk Informed, Repeatable, Adaptive) provide maturity scoring across each function. Organizations create Current and Target Profiles, and the gap between them becomes the measurement target.
[CONFIRMED] — CSF 2.0's addition of the Govern function reflects the industry shift toward treating cybersecurity as a governance concern, not purely a technical one. Source: NIST CSF 2.0, February 2024.
2. Vulnerability Management Metrics
Core KPIs
Mean Time to Detect (MTTD)
Time from vulnerability introduction (or public disclosure) to organizational awareness.
MTTD = Avg(Detection_Timestamp - Disclosure_Timestamp)
Segment by:
- Discovery method (scanner, pentest, bug bounty, vendor advisory, OSINT)
- Asset criticality tier
- Vulnerability severity
Target benchmarks:
- Critical CVEs: < 24 hours from NVD publication
- Scanner-detectable: < scan interval + 4 hours processing
- Zero-days: measured against threat intel feed latency
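The MTTD formula and segmentation above can be sketched in a few lines. This is a minimal illustration with made-up records; the field layout and segment key are assumptions, not a schema from the source.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Illustrative records: (discovery_method, disclosure_ts, detection_ts)
vulns = [
    ("scanner", datetime(2025, 6, 1, 8, 0), datetime(2025, 6, 1, 20, 0)),
    ("scanner", datetime(2025, 6, 2, 9, 0), datetime(2025, 6, 3, 9, 0)),
    ("pentest", datetime(2025, 5, 20, 0, 0), datetime(2025, 6, 5, 0, 0)),
]

def mttd_hours(records):
    """MTTD = Avg(Detection_Timestamp - Disclosure_Timestamp), in hours."""
    return mean((det - disc).total_seconds() / 3600 for _, disc, det in records)

def mttd_by_segment(records):
    """Segment MTTD by discovery method, as recommended above."""
    segments = defaultdict(list)
    for rec in records:
        segments[rec[0]].append(rec)
    return {method: mttd_hours(recs) for method, recs in segments.items()}

print(mttd_by_segment(vulns))  # {'scanner': 18.0, 'pentest': 384.0}
```

The same grouping works for any of the other segment keys (asset tier, severity) by swapping the dictionary key.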
Mean Time to Remediate (MTTR)
Time from detection to verified remediation.
MTTR = Avg(Remediation_Verified_Timestamp - Detection_Timestamp)
SLA tiers (industry-standard starting points, adjust per risk appetite):
| Severity | CVSS-BT Range | SLA Target | Stretch Goal |
|---|---|---|---|
| Critical | 9.0-10.0 | 15 days | 7 days |
| High | 7.0-8.9 | 30 days | 15 days |
| Medium | 4.0-6.9 | 90 days | 45 days |
| Low | 0.1-3.9 | 180 days | 90 days |
SLA Adherence Rate:
SLA_Adherence = (Vulns_Remediated_Within_SLA / Total_Vulns_Due) * 100
Track this as a trend line, not a point-in-time number. A declining SLA adherence rate is the leading indicator that the vulnerability program is drowning.
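A sketch of the adherence formula tracked as a trend, with illustrative monthly counts:

```python
def sla_adherence(remediated_within_sla, total_due):
    """SLA_Adherence = (Vulns_Remediated_Within_SLA / Total_Vulns_Due) * 100"""
    return 0.0 if total_due == 0 else remediated_within_sla / total_due * 100

# Trend line, not a point-in-time number (monthly counts are illustrative)
monthly = [(92, 100), (88, 100), (81, 100), (73, 100)]
trend = [sla_adherence(done, due) for done, due in monthly]
print(trend)  # [92.0, 88.0, 81.0, 73.0] — a sustained decline is the warning
```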
Patch Compliance Rate
Patch_Compliance = (Systems_Patched_Within_SLA / Total_Systems_Requiring_Patch) * 100
Segment by:
- OS type (Windows, Linux, macOS, firmware)
- Environment (production, staging, development, OT/ICS)
- Business unit / asset owner
- Patch type (security, feature, emergency)
Risk-Adjusted Vulnerability Backlog
Raw backlog counts are misleading. Weight by exploitability and asset criticality:
Risk_Score = CVSS-BT_Score * Asset_Criticality_Weight * EPSS_Probability
Where Asset_Criticality_Weight:
- Tier 1 (crown jewels): 3.0x
- Tier 2 (business critical): 2.0x
- Tier 3 (standard): 1.0x
- Tier 4 (low impact): 0.5x
Track risk-weighted backlog as a single number over time. The goal is risk reduction, not ticket count reduction.
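A minimal sketch of the risk-weighted backlog calculation, using the tier weights from the list above; the backlog records themselves are invented for illustration:

```python
# Asset_Criticality_Weight tiers from the text above
TIER_WEIGHT = {1: 3.0, 2: 2.0, 3: 1.0, 4: 0.5}

def risk_score(cvss_bt, asset_tier, epss):
    """Risk_Score = CVSS-BT_Score * Asset_Criticality_Weight * EPSS_Probability"""
    return cvss_bt * TIER_WEIGHT[asset_tier] * epss

# Illustrative open backlog: (cvss_bt, asset_tier, epss_probability)
backlog = [(9.8, 1, 0.92), (7.5, 3, 0.02), (5.4, 2, 0.40)]
weighted_backlog = sum(risk_score(c, t, e) for c, t, e in backlog)
print(round(weighted_backlog, 2))  # single number to track over time
```

Note how the Tier 1 item with high EPSS dominates the total even though the raw count treats all three items equally.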
Vulnerability Recurrence Rate
Recurrence_Rate = (Vulns_Reopened_or_Reintroduced / Total_Vulns_Closed) * 100
A high recurrence rate indicates systemic issues: missing root cause analysis, incomplete patching, or configuration drift. This metric separates teams that fix symptoms from teams that fix problems.
Vulnerability Disclosure Metrics
Per OWASP Vulnerability Disclosure Cheat Sheet guidance:
| Metric | Description | Target |
|---|---|---|
| Acknowledgment Time | Time to acknowledge researcher report | < 1 business day |
| Triage Time | Time to confirm/deny vulnerability | < 5 business days |
| Fix Timeline Communication | Time to provide researcher with fix ETA | < 10 business days |
| Disclosure Window | Time from report to public disclosure | 90 days (Project Zero standard) |
| Researcher Satisfaction | NPS or survey score from reporters | > 70 NPS |
[CONFIRMED] — Google Project Zero's 90-day disclosure standard has become the de facto industry benchmark. Organizations without a defined disclosure policy face uncoordinated disclosure risk. Source: OWASP Vulnerability Disclosure Cheat Sheet.
Advanced: EPSS-Informed Prioritization
Replace severity-only triage with probability-weighted prioritization:
| Strategy | Effort (% of vulns actioned) | Coverage (exploited vulns caught) | Efficiency (precision) |
|---|---|---|---|
| CVSS >= 7 | 57.4% | 82.2% | 3.96% |
| EPSS >= 10% | 2.7% | 63.2% | 65.2% |
| EPSS >= 1% + CVSS >= 7 | ~15% | ~85% | ~15% |
[CONFIRMED] — EPSS data from October 2023 demonstrates that CVSS-only prioritization forces teams to action 57% of all vulnerabilities while achieving only 4% efficiency. EPSS at 10% threshold reduces effort to 2.7% with 65% efficiency. Source: FIRST EPSS model documentation.
Practical guidance: EPSS explicitly rejects universal thresholds. Organizations must select thresholds matching their risk tolerance:
- Resource-constrained teams: higher thresholds (e.g., EPSS >= 50%) for maximum efficiency per remediation dollar
- Mission-critical environments: lower thresholds (e.g., EPSS >= 1-5%) accepting higher effort for broader coverage
- Optimal: combine EPSS probability with CVSS severity and asset criticality for a composite risk score
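The threshold strategies from the table can be sketched as a single triage function. The cut points below are the table's examples only; as the text notes, EPSS defines no universal threshold.

```python
def should_action(cvss_bt, epss, strategy="combined",
                  cvss_min=7.0, epss_min=0.01):
    """Threshold-based triage. Cut points are illustrative, chosen per
    the organization's risk tolerance, not universal values."""
    if strategy == "cvss_only":
        return cvss_bt >= cvss_min
    if strategy == "epss_only":
        return epss >= 0.10
    # combined: EPSS >= 1% AND CVSS >= 7 (one of the table's strategies)
    return epss >= epss_min and cvss_bt >= cvss_min

print(should_action(8.8, 0.002))               # False: severe but unlikely
print(should_action(8.8, 0.002, "cvss_only"))  # True: CVSS-only actions it anyway
```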
3. SOC & Detection Metrics
Operational KPIs
Alert Volume & Triage Rate
Daily_Alert_Volume = Total alerts generated per 24-hour period
Triage_Rate = Alerts_Triaged / Total_Alerts * 100
Track by:
- Source (SIEM, EDR, NDR, cloud, identity)
- Severity
- Shift/analyst
- Disposition (TP, FP, benign-TP, inconclusive)
Healthy range: analysts should triage 15-25 alerts per shift (8 hours) with adequate investigation depth. If volume exceeds this, detection tuning is the fix — not more analysts.
False Positive Rate
FP_Rate = False_Positives / (True_Positives + False_Positives) * 100
Target: < 30% across the detection stack. Rules with > 80% FP rate should be disabled, tuned, or replaced.
Track FP rate per rule/use case. Aggregate FP rate masks individual rule problems. A SOC with 25% aggregate FP rate might have 5 rules at 95% FP generating 40% of all alerts.
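Per-rule tracking can be sketched as follows; the rule names and disposition counts are invented for illustration:

```python
def fp_rate(tp, fp):
    """FP_Rate = FP / (TP + FP) * 100; returns 0 when a rule has no dispositions."""
    total = tp + fp
    return 0.0 if total == 0 else fp / total * 100

# Per-rule (TP, FP) counts — aggregate FP rate masks individual rule problems
rules = {"rule_a": (190, 10), "rule_b": (5, 95), "rule_c": (60, 40)}
per_rule = {name: fp_rate(tp, fp) for name, (tp, fp) in rules.items()}
flagged = [name for name, rate in per_rule.items() if rate > 80]
print(per_rule)  # {'rule_a': 5.0, 'rule_b': 95.0, 'rule_c': 40.0}
print(flagged)   # ['rule_b'] — candidate to disable, tune, or replace
```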
Mean Time to Detect (MTTD) — SOC Context
MTTD = Avg(Alert_Timestamp - Compromise_Timestamp)
This is the hardest SOC metric to measure honestly because compromise timestamp is often unknown until post-incident forensics. Proxies:
- Time from red team action to detection (during exercises)
- Time from threat intel IOC publication to detection rule deployment
- Dwell time from incident investigations (retrospective)
Industry benchmark: median dwell time is ~10 days (Mandiant M-Trends 2025), down from 16 days in 2023. Organizations with mature detection programs target < 24 hours for priority TTPs.
Mean Time to Respond (MTTR) — SOC Context
MTTR = Avg(Containment_Timestamp - Alert_Timestamp)
Segment into sub-metrics:
- Time to Acknowledge: alert generated to analyst pickup
- Time to Investigate: pickup to determination (TP/FP/escalation)
- Time to Contain: determination to containment action executed
- Time to Resolve: containment to full remediation
| Sub-metric | P1 Target | P2 Target | P3 Target |
|---|---|---|---|
| Acknowledge | 5 min | 15 min | 1 hour |
| Investigate | 30 min | 2 hours | 8 hours |
| Contain | 1 hour | 4 hours | 24 hours |
| Resolve | 24 hours | 72 hours | 2 weeks |
Analyst Productivity
Cases_Per_Analyst_Per_Month = Total_Cases_Closed / FTE_Analysts
Escalation_Rate = Cases_Escalated_to_Tier2_or_IR / Total_Cases
Automation_Rate = Cases_Auto_Resolved / Total_Cases
Warning: do not incentivize cases-closed velocity. This drives premature closure and shallow investigation. Balance with quality metrics (reopened cases, missed detections found in retrospective analysis).
Log Source Coverage
Log_Coverage = Active_Log_Sources / Total_Expected_Log_Sources * 100
Map against your asset inventory. A SIEM receiving logs from 60% of production systems has a 40% blind spot. Track:
- Coverage by asset tier (crown jewels must be 100%)
- Coverage by log type (authentication, process execution, network, file, cloud API)
- Log latency (time from event to SIEM indexing)
- Log completeness (are you getting ALL event types, or just a subset?)
4. Application Security Metrics
Vulnerability Density
Vuln_Density = Vulnerabilities / KLOC (thousands of lines of code)
Or per application:
App_Vuln_Density = Open_Vulnerabilities / Application_Count
Segment by:
- Severity (critical/high vs. medium/low)
- Vulnerability class (injection, auth, crypto, config)
- Age (< 30 days, 30-90, 90-180, > 180)
- Source (SAST, DAST, SCA, pentest, bug bounty)
Benchmark: mature programs target < 1 critical/high per 10 KLOC for new code.
Fix Rate & Velocity
Fix_Rate = Vulns_Fixed_This_Period / Vulns_Open_Start_of_Period * 100
Net_New_Rate = New_Vulns_Introduced / Vulns_Fixed
Net New Rate is the critical metric. If > 1.0, the backlog is growing. Track this weekly for active development teams. A sustained Net New Rate > 1.0 means the AppSec program is losing ground regardless of how many vulns it fixes.
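A weekly tracking sketch for Net New Rate, with illustrative counts:

```python
def net_new_rate(new_vulns, fixed_vulns):
    """Net_New_Rate = New_Vulns_Introduced / Vulns_Fixed"""
    return float("inf") if fixed_vulns == 0 else new_vulns / fixed_vulns

# Weekly (introduced, fixed) counts for an active development team
weeks = [(30, 25), (28, 30), (35, 26)]
rates = [round(net_new_rate(n, f), 2) for n, f in weeks]
print(rates)  # [1.2, 0.93, 1.35] — any week above 1.0 grows the backlog
```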
SAST/DAST Coverage
SAST_Coverage = Repos_With_SAST_Enabled / Total_Active_Repos * 100
DAST_Coverage = Apps_With_DAST_Scans / Total_Deployed_Apps * 100
Pipeline_Integration = Pipelines_With_Security_Gates / Total_CI_CD_Pipelines * 100
Target: 100% SAST on all active repos, DAST on all deployed web applications. Track "scan success rate" separately — a pipeline with SAST enabled but consistently failing/skipped is not covered.
Dependency Health
Dependency_Age = Avg(Current_Date - Latest_Dependency_Release_Date)
Vulnerable_Dependencies = Deps_With_Known_CVEs / Total_Dependencies * 100
Direct_vs_Transitive = Vulnerable_Transitive_Deps / Total_Vulnerable_Deps * 100
Track dependency age as a leading indicator. Dependencies > 2 years behind latest release are significantly more likely to have unpatched vulnerabilities and harder to upgrade (breaking changes accumulate).
Security Debt
Security_Debt_Days = Sum(Estimated_Remediation_Hours_Per_Vuln) for all open vulns
Debt_Ratio = Security_Debt_Days / Total_Development_Capacity_Days
Express security debt in developer-days. This translates to language leadership understands: "We have 340 developer-days of security debt. At current allocation (2 devs), that's 170 business days — roughly 8 months of dedicated work."
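A small sketch of the debt calculation; the 8-hour developer-day and the per-vuln estimates are assumptions for illustration:

```python
def security_debt_days(remediation_hours, hours_per_dev_day=8):
    """Security_Debt_Days = Sum(Estimated_Remediation_Hours_Per_Vuln),
    expressed in developer-days (8h/day is an assumption)."""
    return sum(remediation_hours) / hours_per_dev_day

estimates = [16, 40, 8, 24, 4]          # hours per open vuln, illustrative
debt = security_debt_days(estimates)    # 11.5 developer-days
devs_allocated = 2
print(f"{debt} developer-days of debt; "
      f"~{debt / devs_allocated} business days at {devs_allocated} dedicated devs")
```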
5. Risk Metrics & Quantification
Quantitative Risk Measurement
Per the risk-measurement framework (magoo/risk-measurement), effective security risk measurement replaces subjective heat maps with calibrated probability estimates.
Core principle: "Risk Measurement is written to help you measure complicated risks using a process that's simple enough to work out on the back of a napkin and powerful enough to organize a rocket launch."
[CONFIRMED] — Quantitative risk analysis using calibrated estimation, probability distributions, and Monte Carlo simulation produces more defensible risk assessments than qualitative red/yellow/green matrices. Source: magoo/risk-measurement.
Key Quantitative Approaches
FAIR (Factor Analysis of Information Risk):
- Decomposes risk into Loss Event Frequency (LEF) and Loss Magnitude (LM)
- LEF = Threat Event Frequency * Vulnerability (probability of successful attack)
- LM = Primary Loss + Secondary Loss (regulatory fines, reputation damage)
- Uses Monte Carlo simulation to produce loss distribution curves
- Output: "There is a 90% probability that annual losses from this risk scenario will be between $500K and $12M"
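The FAIR-style Monte Carlo output above can be sketched with the standard library. Every distribution and parameter here is an illustrative assumption (not a FAIR default); the point is the shape of the technique — simulate LEF times Loss Magnitude many times, then read off a 90% interval.

```python
import random

def annual_loss_interval(n_trials=50_000, seed=7):
    """FAIR-style Monte Carlo sketch: LEF = Threat Event Frequency *
    Vulnerability; annual loss = LEF * Loss Magnitude."""
    rng = random.Random(seed)
    losses = []
    for _ in range(n_trials):
        tef = rng.triangular(0.5, 6.0, 2.0)   # threat events per year
        vuln = rng.betavariate(2, 5)          # P(event becomes a loss event)
        lm = rng.lognormvariate(12, 1)        # loss per event, ~$163K median
        losses.append(tef * vuln * lm)        # simulated annual loss
    losses.sort()
    return losses[int(0.05 * n_trials)], losses[int(0.95 * n_trials)]

low, high = annual_loss_interval()
print(f"90% probability annual losses fall between ${low:,.0f} and ${high:,.0f}")
```

Real FAIR implementations would model event counts and primary/secondary losses separately; this sketch collapses them for brevity.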
Calibrated Estimation:
- Experts provide 90% confidence intervals instead of point estimates
- Training improves calibration (most untrained estimators are overconfident)
- Track estimation accuracy over time: Brier scores, calibration curves
- Key KPI: % of actual outcomes falling within stated confidence intervals
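The Brier score mentioned above is just the mean squared error between stated probabilities and binary outcomes. A minimal sketch, with an invented calibration log:

```python
def brier_score(forecasts):
    """Mean squared error between predicted probability and outcome (0/1).
    Lower is better; 0.25 is the score of always predicting 50%."""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)

# (stated probability, actual outcome) — illustrative estimation log
log = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1), (0.2, 0)]
print(round(brier_score(log), 3))  # 0.148
```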
Risk Exposure Metrics
Aggregate Risk Exposure
Total_Risk_Exposure = Sum(Probability_i * Impact_i) for all identified risks
Track monthly. The trend matters more than the absolute number.
Risk Reduction Rate
Risk_Reduction = (Risk_Exposure_Previous - Risk_Exposure_Current) / Risk_Exposure_Previous * 100
Attribute risk reduction to specific controls/investments. This directly answers "what did we get for the $2M we spent on security this year?"
Residual Risk
Residual_Risk = Inherent_Risk - Risk_Mitigated_by_Controls
Every risk register entry should carry:
- Inherent risk score (before controls)
- Control effectiveness rating (0-100%)
- Residual risk score (after controls)
- Risk appetite threshold
Risk Appetite Adherence
Risks_Within_Appetite = Risks_Below_Threshold / Total_Identified_Risks * 100
Risks exceeding appetite require documented acceptance with named executive owner and review date. Track:
- Number of accepted risks exceeding appetite
- Age of risk acceptances (stale acceptances are unmanaged risks)
- Acceptance owner distribution (concentration = governance problem)
Risk Treatment Metrics
| Metric | Formula | Target |
|---|---|---|
| Treatment Plan Completion | Plans_On_Track / Total_Plans * 100 | > 85% |
| Risk Exception Age | Avg days since exception granted | < 180 days |
| Risk Assessment Currency | Assessments_Current / Total_Required * 100 | > 90% |
| Third-Party Risk Coverage | Vendors_Assessed / Critical_Vendors * 100 | 100% for Tier 1 |
6. Compliance Metrics
Control Coverage
Control_Coverage = Controls_Implemented / Controls_Required * 100
Map against your applicable frameworks (NIST 800-53, CIS Controls, ISO 27001, SOC 2, PCI DSS). Track per framework and per control family.
Control Effectiveness
Implementation is not effectiveness. A firewall rule that exists but permits all traffic has 100% implementation and 0% effectiveness.
Control_Effectiveness = Controls_Verified_Effective / Controls_Implemented * 100
Verification methods:
- Automated testing (configuration validation, policy checks)
- Internal audit findings
- Pentest/red team results
- Incident post-mortems (did the control work when tested by a real attacker?)
Audit Metrics
| Metric | Description | Target |
|---|---|---|
| Audit Finding Count | Open findings by severity | Trending down |
| Finding Remediation Rate | Findings closed within SLA | > 90% |
| Repeat Findings | Same finding across consecutive audits | 0 |
| Days to Remediate | Avg time from finding to closure | < 90 days for high |
| Evidence Collection Time | Time to produce audit evidence | < 2 days per request |
| Audit Readiness Score | Pre-audit self-assessment | > 85% |
Repeat findings are the most important audit metric. A repeat finding means the organization knew about a problem, committed to fixing it, and failed. This is a governance failure, not a technical one.
Regulatory Compliance Posture
For regulated industries, track:
Regulatory_Readiness = (Controls_Meeting_Requirement / Total_Regulatory_Requirements) * 100
Per regulation (GDPR, HIPAA, PCI DSS, SOX, etc.):
- Requirements mapped to controls
- Control evidence freshness
- Gap count and severity
- Remediation timeline for gaps
- Regulatory examination findings (if applicable)
7. Detection Engineering Metrics
ATT&CK Coverage
ATT&CK_Coverage = Techniques_With_Detection / Total_Applicable_Techniques * 100
Do not aim for 100%. Not all techniques are equally relevant to your environment. Weight by:
- Threat intelligence (what TTPs do your likely adversaries use?)
- Environment applicability (T1546.015 COM hijacking is irrelevant in a Linux-only shop)
- Detection feasibility (some techniques are inherently difficult to detect)
Coverage Depth Score
For each covered technique, assess detection quality:
| Level | Description | Score |
|---|---|---|
| 0 | No detection | 0 |
| 1 | Log visibility exists but no rule | 1 |
| 2 | Detection rule exists, not validated | 2 |
| 3 | Rule validated against simulated attack | 3 |
| 4 | Rule tuned with known FP patterns documented | 4 |
| 5 | Rule integrated into automated response | 5 |
Coverage_Depth = Sum(Technique_Scores) / (Total_Applicable_Techniques * 5) * 100
Rule Performance Metrics
Per detection rule:
| Metric | Formula | Healthy Range |
|---|---|---|
| True Positive Rate | TP / (TP + FN) | > 70% |
| Precision | TP / (TP + FP) | > 50% |
| Alert Volume | Alerts per day/week | Manageable by team |
| Time to Triage | Avg investigation time | < 30 min for P1 |
| Last Validated | Date of last purple team test | < 90 days |
| Evasion Resistance | Variants detected / variants tested | > 60% |
Detection Gap Trend
Track monthly:
New_Detections_Added - Detections_Retired = Net_Detection_Change
Gap_Closure_Rate = Gaps_Closed / Gaps_Identified * 100
Map gaps against threat intelligence. A gap for a TTP your adversaries actively use is critical. A gap for a theoretical technique nobody targets your industry with is informational.
Detection-as-Code Metrics
| Metric | Description | Target |
|---|---|---|
| Rules in Version Control | % of rules managed in Git | 100% |
| Rules with Tests | Rules with automated validation | > 80% |
| Deployment Automation | Rules auto-deployed via CI/CD | > 90% |
| Rule Review Cadence | Rules reviewed/updated per quarter | 100% per year |
| Mean Time to Deploy | Rule creation to production | < 4 hours for priority |
8. Security Awareness Metrics
Phishing Simulation Metrics
| Metric | Formula | Target |
|---|---|---|
| Click Rate | Users_Clicked / Users_Targeted * 100 | < 5% |
| Report Rate | Users_Reporting / Users_Targeted * 100 | > 70% |
| Report-to-Click Ratio | Reports / Clicks | > 3:1 |
| Repeat Clickers | Users_Clicking_Multiple_Campaigns / Total_Clickers | < 10% |
| Credential Submission Rate | Users_Submitting_Creds / Users_Clicked * 100 | < 20% of clickers |
| Time to First Report | Fastest report after send | < 2 min |
Click rate alone is a terrible metric. A 3% click rate with a 5% report rate is worse than an 8% click rate with a 75% report rate. The second organization has a human detection layer; the first does not.
Training Metrics
| Metric | Description | Target |
|---|---|---|
| Completion Rate | Users completing required training | > 95% |
| On-Time Completion | Users completing before deadline | > 90% |
| Knowledge Assessment Score | Post-training quiz scores | > 80% avg |
| Knowledge Retention | Score on re-test after 6 months | > 70% |
| Behavior Change | Reduction in risky behaviors post-training | Measurable improvement |
Awareness Program Effectiveness
The true measure is behavior change, not training completion:
- Reduction in security incidents caused by human error
- Increase in suspicious activity reports from employees
- Decrease in policy violations (USB usage, shadow IT, data handling)
- Improvement in secure development practices (for technical staff)
9. OWASP SAMM Maturity Scoring
Model Structure
OWASP SAMM (Software Assurance Maturity Model) organizes application security into 5 business functions with 15 security practices:
[CONFIRMED] — SAMM 2.0 structure. Source: owaspsamm.org/model.
| Business Function | Security Practices |
|---|---|
| Governance | Strategy & Metrics, Policy & Compliance, Education & Guidance |
| Design | Threat Assessment, Security Requirements, Secure Architecture |
| Implementation | Secure Build, Secure Deployment, Defect Management |
| Verification | Architecture Assessment, Requirements-driven Testing, Security Testing |
| Operations | Incident Management, Environment Management, Operational Management |
Maturity Levels
Each practice has 3 maturity levels:
| Level | Characterization | Typical State |
|---|---|---|
| 1 | Initial / Ad-hoc | Basic practices exist, inconsistently applied. Individuals doing security work without organizational support. |
| 2 | Managed / Defined | Practices are documented, consistent, and organization-wide. Processes exist and are followed. |
| 3 | Optimized / Measured | Continuous improvement based on metrics. Practices are automated, measured, and feed back into program improvement. |
Scoring Methodology
Each practice is scored 0-3 based on maturity level achieved. The organization's overall SAMM score is the average across all 15 practices:
SAMM_Score = Sum(Practice_Scores) / 15
Individual function scores:
Function_Score = Sum(Function_Practice_Scores) / 3
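The two scoring formulas can be sketched together; the practice scores below are invented for illustration:

```python
# SAMM 2.0: 5 business functions, 3 practices each, scored 0-3
practice_scores = {
    "Governance":     [1.5, 2.0, 1.0],
    "Design":         [1.0, 1.5, 1.0],
    "Implementation": [2.0, 1.5, 1.5],
    "Verification":   [1.0, 1.0, 1.5],
    "Operations":     [1.5, 2.0, 1.0],
}

# Function_Score = Sum(Function_Practice_Scores) / 3
function_scores = {f: sum(s) / 3 for f, s in practice_scores.items()}

# SAMM_Score = Sum(Practice_Scores) / 15
samm_score = sum(sum(s) for s in practice_scores.values()) / 15
print({f: round(v, 2) for f, v in function_scores.items()}, round(samm_score, 2))
```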
Assessment Approach
- Scope: define which applications/business units to assess
- Evaluate: for each practice, determine current maturity level using SAMM's activity questions
- Score: assign 0-3 per practice based on evidence
- Target: set target maturity per practice (not everything needs to be level 3)
- Roadmap: prioritize improvement activities based on gap between current and target
- Reassess: periodic reassessment (annually recommended) to measure progress
SAMM Metrics Integration
Each practice at Level 3 implies metrics-driven management. Key metrics per function:
| Function | Key Metrics at Maturity |
|---|---|
| Governance | Security budget as % of IT spend, training coverage, policy review cadence |
| Design | % of projects with threat models, security requirements coverage |
| Implementation | Secure build pipeline adoption, dependency vulnerability rate, defect fix rate |
| Verification | % of apps with security testing, finding density trends, architecture review coverage |
| Operations | Incident response time, environment hardening score, patching compliance |
Using SAMM for Program Measurement
Track SAMM scores over time as a program maturity indicator:
| Quarter | Governance | Design | Implementation | Verification | Operations | Overall |
|---|---|---|---|---|---|---|
| Q1 2025 | 1.3 | 0.7 | 1.0 | 0.7 | 1.0 | 0.9 |
| Q2 2025 | 1.7 | 1.0 | 1.3 | 1.0 | 1.0 | 1.2 |
| Q3 2025 | 2.0 | 1.3 | 1.7 | 1.3 | 1.3 | 1.5 |
| Q4 2025 | 2.0 | 1.7 | 2.0 | 1.7 | 1.7 | 1.8 |
This provides a defensible, industry-standard maturity narrative for executive reporting.
10. Executive Dashboard Design
What to Show
The Executive Security Dashboard (1 page)
Section 1: Risk Posture (top of page, most prominent)
- Overall risk exposure trend (last 12 months)
- Risk appetite adherence: % of risks within appetite
- Top 5 risks with owner and status
- Risk reduction attributed to security investments
Section 2: Threat Landscape (contextualizes the risk)
- Active threats relevant to the organization (from threat intel)
- Incidents this period: count, severity, business impact
- Near misses / blocked attacks (demonstrates value)
Section 3: Program Health (3-5 key operational metrics)
- Vulnerability MTTR trend (are we getting faster?)
- Detection coverage score (ATT&CK-based)
- SLA adherence for critical/high vulnerabilities
- Compliance posture across applicable frameworks
- SAMM maturity score trend
Section 4: Investment & Capacity
- Security spend vs. industry benchmark
- Key initiative status (on track / at risk / blocked)
- Staffing: current FTEs, open positions, attrition
Design Principles
- Trend lines over point-in-time numbers: a single number is meaningless without context. Show 6-12 months of trend.
- Red/yellow/green with thresholds: define what red means BEFORE building the dashboard. If everything is always green, thresholds are wrong.
- Comparison baselines: compare to last quarter, last year, and industry benchmarks where available.
- Narrative, not just numbers: each metric needs a one-sentence "so what?" annotation.
- Drill-down capability: executives see the top level; managers can drill into operational detail.
What NOT to Show
| Avoid | Why | Show Instead |
|---|---|---|
| Raw alert volume | Meaningless without context; bigger number != more secure | Alert-to-incident ratio, FP rate trend |
| Total vulnerabilities found | Penalizes organizations that scan more | Vulnerability density, MTTR, net new rate |
| Scan count / tool inventory | Activity, not outcome | Coverage percentage, gap analysis |
| Compliance checklist completion | Checkbox security; presence != effectiveness | Control effectiveness rate, audit finding trend |
| Vanity metrics (phishing emails blocked) | Inflated numbers that mean nothing | Phishing click rate trend, report rate |
| Technical jargon | Executives don't care about Sigma rules | "We can detect X% of techniques used by threat actors targeting our industry" |
| Too many metrics | Dilutes attention | Maximum 8-10 KPIs per dashboard |
Reporting Cadence
| Audience | Cadence | Format | Depth |
|---|---|---|---|
| Board of Directors | Quarterly | 3-5 slides | Strategic risk, major incidents, program maturity |
| C-Suite / ELT | Monthly | 1-page dashboard + narrative | Risk posture, program health, investment ROI |
| VP / Directors | Weekly | Operational dashboard | Team metrics, SLA adherence, capacity |
| Team Leads | Daily | Automated dashboards | Alert queues, backlog, sprint progress |
11. Scoring Systems Deep Dive: CVSS v4.0 & EPSS
CVSS v4.0 — Key Changes from v3.1
[CONFIRMED] — Source: FIRST CVSS v4.0 Specification Document.
Metric Groups
| Group | Purpose | Affects Score? |
|---|---|---|
| Base | Intrinsic vulnerability characteristics, constant over time | Yes |
| Threat | Current exploit status; replaces v3.1 Temporal | Yes |
| Environmental | Organization-specific context, compensating controls | Yes |
| Supplemental | Additional context for prioritization | No |
New Nomenclature (Mandatory)
| Label | Metrics Included | Use Case |
|---|---|---|
| CVSS-B | Base only | NVD-published scores |
| CVSS-BT | Base + Threat | Score adjusted for exploit availability |
| CVSS-BE | Base + Environmental | Score adjusted for org-specific context |
| CVSS-BTE | Base + Threat + Environmental | Fully contextualized score |
Critical implication: when someone says "CVSS 9.1" you must ask "CVSS-B, BT, BE, or BTE?" A CVSS-B 9.1 is very different from a CVSS-BTE 9.1.
New Metrics
| Metric | Group | Values | Purpose |
|---|---|---|---|
| Attack Requirements (AT) | Base | None / Present | Deployment conditions beyond security hardening (race conditions, network positioning) |
| Automatable (AU) | Supplemental | No / Yes | Can attacker automate all kill chain steps? |
| Provider Urgency (U) | Supplemental | Red / Amber / Green / Clear | Vendor's severity assessment |
| Recovery (R) | Supplemental | Automatic / User / Irrecoverable | System resilience post-exploitation |
| Value Density (V) | Supplemental | Diffuse / Concentrated | Resource concentration per exploit |
| Vulnerability Response Effort (RE) | Supplemental | Low / Moderate / High | Remediation difficulty for consumers |
| Safety (S) | Supplemental + Environmental | Negligible / Present | Human injury risk (IEC 61508) |
Scoring Methodology Change
v4.0 replaces v3.1's linear formula with a MacroVector equivalence class system:
- Vectors cluster into MacroVectors (equivalence sets of comparable qualitative severity)
- Six equivalence groups (EQ1-EQ6) determined through expert evaluation
- Score = MacroVector lookup score, interpolated by "severity distance" within the class
- Produces scores rounded to one decimal place
Practical impact: v4.0 scores are NOT directly comparable to v3.1 scores. Organizations transitioning must re-baseline their SLA thresholds.
EPSS — Exploit Prediction Scoring System
[CONFIRMED] — Source: FIRST EPSS Model Documentation.
What EPSS Measures
Daily probability estimate that a published CVE will see exploitation activity in the next 30 days.
Model Inputs
| Category | Sources |
|---|---|
| Vulnerability metadata | CPE, CWE, CVSS vectors (via NVD) |
| Temporal signals | Days since CVE publication |
| Known exploitation | CISA KEV, Google Project Zero, Zero Day Initiative |
| Public exploits | Exploit-DB, GitHub, Metasploit |
| Security tools | Nuclei, Intrigue, sn1per, jaeles templates |
| Exploitation evidence | Honeypots, IDS/IPS sensors, host-based detection (from data partners) |
Model Methodology
- Trains on 12 months of historical data
- Validates against 2 months of unseen "future" data
- Daily refresh of all probability estimates
- Measures attempted exploitation (not successful exploitation)
- Recognizes exploitation is "bursty, sporadic, sometimes isolated, localized and ephemeral"
Using CVSS + EPSS Together
The optimal vulnerability prioritization strategy combines both:
Priority_Score = f(CVSS-BTE_Score, EPSS_Probability, Asset_Criticality)
Decision matrix:
| EPSS | CVSS High (>=7) | CVSS Low (<7) |
|---|---|---|
| High (>=10%) | Immediate action | Investigate — likely exploited despite low severity |
| Low (<10%) | Standard SLA — severe but unlikely to be exploited | Backlog — lowest priority |
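The decision matrix above reduces to a four-way branch. The 10% EPSS and 7.0 CVSS cut points are the matrix's illustrative thresholds, not universal values:

```python
def triage(cvss, epss):
    """Four-quadrant triage per the CVSS x EPSS decision matrix above."""
    if epss >= 0.10:
        return "immediate" if cvss >= 7.0 else "investigate"
    return "standard_sla" if cvss >= 7.0 else "backlog"

print(triage(9.8, 0.95))  # immediate
print(triage(4.3, 0.40))  # investigate — likely exploited despite low severity
print(triage(9.1, 0.01))  # standard_sla — severe but unlikely to be exploited
print(triage(3.2, 0.01))  # backlog
```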
12. Anti-Patterns & Pitfalls
Metric Anti-Patterns
| Anti-Pattern | Why It's Harmful | Alternative |
|---|---|---|
| Measuring activity, not outcomes | "We ran 500 scans" says nothing about security posture | Measure coverage, finding trends, MTTR |
| Vanity metrics | Inflated numbers (blocked attacks, threats stopped) create false confidence | Measure what got through, not what was blocked |
| Gaming incentives | Analysts close tickets prematurely to hit KPIs | Balance volume metrics with quality metrics |
| Measuring everything | 200 metrics = no metrics. Attention is finite | Maximum 8-10 KPIs per audience level |
| Point-in-time snapshots | A single number without trend is meaningless | Always show trend (minimum 6 months) |
| Watermelon metrics | Green on outside, red inside — aggregate masks problems | Segment by severity, team, asset tier |
| Comparing unlike things | Comparing MTTR across orgs with different definitions | Standardize definitions before benchmarking |
| Severity inflation | Everything is critical = nothing is critical | Enforce severity criteria, audit regularly |
| Denominator blindness | "We fixed 1000 vulns!" but 5000 new ones appeared | Always show rates and ratios, not raw counts |
Common Measurement Failures
- No baseline: measuring improvement requires knowing where you started
- Undefined thresholds: red/yellow/green without defined criteria is opinion, not measurement
- Lagging-only programs: if you only measure what already happened, you cannot predict or prevent
- Metric rot: dashboards built once and never updated become decoration, not instrumentation
- Disconnected metrics: operational metrics that don't roll up to risk metrics that don't connect to business outcomes
The Goodhart's Law Warning
"When a measure becomes a target, it ceases to be a good measure."
Every metric you publish will be optimized for. If you measure MTTR, teams will game MTTR. Counterbalances:
- Use metric pairs (MTTR + recurrence rate; fix rate + net new rate)
- Rotate emphasis metrics periodically
- Validate metrics against ground truth (red team results, real incidents)
- Separate metrics used for improvement from metrics used for performance evaluation
Quick Reference: Starter Metric Set
For organizations building a security metrics program from scratch, start with these 10:
| # | Metric | Level | Owner |
|---|---|---|---|
| 1 | Risk Exposure Trend | Strategic | CISO |
| 2 | Vulnerability MTTR by Severity | Operational | Vuln Mgmt Lead |
| 3 | SLA Adherence Rate | Operational | Vuln Mgmt Lead |
| 4 | Detection Coverage (ATT&CK) | Strategic | Detection Eng Lead |
| 5 | SOC False Positive Rate | Operational | SOC Manager |
| 6 | MTTD (from red team exercises) | Strategic | CISO |
| 7 | Patch Compliance Rate | Operational | IT Ops / Infra Lead |
| 8 | AppSec Net New Vuln Rate | Operational | AppSec Lead |
| 9 | Compliance Control Effectiveness | Strategic | GRC Lead |
| 10 | Phishing Report Rate | Operational | Awareness Lead |
Add SAMM maturity scoring when the program is ready for formal maturity assessment (typically 12-18 months after program establishment).
Sources
- FIRST CVSS v4.0 Specification Document — https://www.first.org/cvss/v4.0/specification-document
- FIRST EPSS Model Documentation — https://www.first.org/epss/ and https://www.first.org/epss/model
- OWASP Vulnerability Disclosure Cheat Sheet — https://cheatsheetseries.owasp.org/cheatsheets/Vulnerability_Disclosure_Cheat_Sheet.html
- OWASP SAMM v2.0 — https://owaspsamm.org/model/
- NIST Cybersecurity Framework 2.0 — https://www.nist.gov/cyberframework
- Risk Measurement (magoo) — https://magoo.github.io/risk-measurement