blacktemple.net | Cybersecurity News & Analysis | © 2026
Security Metrics, KPIs & Measurement — Deep Dive


CIPHER Training Module — Security Program Measurement & Executive Reporting. Generated: 2026-03-14


Table of Contents

  1. Foundations: Why Measure Security
  2. Vulnerability Management Metrics
  3. SOC & Detection Metrics
  4. Application Security Metrics
  5. Risk Metrics & Quantification
  6. Compliance Metrics
  7. Detection Engineering Metrics
  8. Security Awareness Metrics
  9. OWASP SAMM Maturity Scoring
  10. Executive Dashboard Design
  11. Scoring Systems Deep Dive: CVSS v4.0 & EPSS
  12. Anti-Patterns & Pitfalls

1. Foundations: Why Measure Security

Security measurement serves three purposes: operational improvement, risk communication, and resource justification. Metrics without these anchors are vanity metrics.

The Measurement Hierarchy

Level 4: Business Risk Outcomes     (Board / C-Suite)
Level 3: Program Effectiveness      (CISO / VP)
Level 2: Operational Efficiency      (Directors / Managers)
Level 1: Activity & Volume           (Team Leads / Analysts)

Cardinal rule: every Level 1 metric must roll up into a Level 3 or 4 narrative, or it should not exist. Activity metrics (scans run, tickets opened) are inputs, not outcomes.

Metric Design Principles

  • Actionable: if the number moves, someone knows what to do
  • Comparable: consistent measurement over time enables trend analysis
  • Contextual: raw numbers without baselines are meaningless
  • Owned: every metric has a single accountable owner
  • Lagging vs. Leading: track both — lagging confirms reality, leading predicts it

NIST CSF 2.0 as Measurement Backbone

NIST CSF 2.0 (February 2024) provides the structural framework for organizing security metrics across six core functions:

Function | Measurement Focus
Govern (new in 2.0) | Policy coverage, risk appetite adherence, program maturity, board reporting cadence
Identify | Asset inventory completeness, risk assessment currency, data classification coverage
Protect | Control implementation rate, access review completion, training coverage
Detect | MTTD, detection coverage by ATT&CK, alert fidelity, log source coverage
Respond | MTTR, containment time, playbook execution rate, communication SLAs
Recover | RTO/RPO achievement, backup test success rate, service restoration time

CSF Implementation Tiers (1-4: Partial, Risk Informed, Repeatable, Adaptive) provide maturity scoring across each function. Organizations create Current and Target Profiles, and the gap between them becomes the measurement target.

[CONFIRMED] — CSF 2.0's addition of the Govern function reflects the industry shift toward treating cybersecurity as a governance concern, not purely a technical one. Source: NIST CSF 2.0, February 2024.


2. Vulnerability Management Metrics

Core KPIs

Mean Time to Detect (MTTD)

Time from vulnerability introduction (or public disclosure) to organizational awareness.

MTTD = Avg(Detection_Timestamp - Disclosure_Timestamp)

Segment by:

  • Discovery method (scanner, pentest, bug bounty, vendor advisory, OSINT)
  • Asset criticality tier
  • Vulnerability severity

Target benchmarks:

  • Critical CVEs: < 24 hours from NVD publication
  • Scanner-detectable: < scan interval + 4 hours processing
  • Zero-days: measured against threat intel feed latency

Mean Time to Remediate (MTTR)

Time from detection to verified remediation.

MTTR = Avg(Remediation_Verified_Timestamp - Detection_Timestamp)

SLA tiers (industry-standard starting points, adjust per risk appetite):

Severity | CVSS-BT Range | SLA Target | Stretch Goal
Critical | 9.0-10.0 | 15 days | 7 days
High | 7.0-8.9 | 30 days | 15 days
Medium | 4.0-6.9 | 90 days | 45 days
Low | 0.1-3.9 | 180 days | 90 days

SLA Adherence Rate:

SLA_Adherence = (Vulns_Remediated_Within_SLA / Total_Vulns_Due) * 100

Track this as a trend line, not a point-in-time number. A declining SLA adherence rate is the leading indicator that the vulnerability program is drowning.
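As a minimal sketch of the SLA adherence calculation (Python; the `sla_adherence` helper and the three-item sample are illustrative, with SLA day counts taken from the tier table above):

```python
from datetime import datetime, timedelta

# SLA targets in days by severity, from the SLA tier table above
SLA_DAYS = {"critical": 15, "high": 30, "medium": 90, "low": 180}

def sla_adherence(vulns):
    """Percent of due vulnerabilities verified remediated within their severity SLA."""
    within = 0
    for v in vulns:
        deadline = v["detected"] + timedelta(days=SLA_DAYS[v["severity"]])
        if v["remediated"] is not None and v["remediated"] <= deadline:
            within += 1
    return 100.0 * within / len(vulns)

vulns = [
    {"severity": "critical", "detected": datetime(2026, 1, 1),
     "remediated": datetime(2026, 1, 10)},  # within the 15-day SLA
    {"severity": "high", "detected": datetime(2026, 1, 1),
     "remediated": datetime(2026, 2, 15)},  # missed the 30-day SLA
    {"severity": "medium", "detected": datetime(2025, 9, 1),
     "remediated": None},                   # still open, past its due date
]
print(round(sla_adherence(vulns), 1))  # 33.3
```

Running this weekly and charting the result gives the trend line the text calls for.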

Patch Compliance Rate

Patch_Compliance = (Systems_Patched_Within_SLA / Total_Systems_Requiring_Patch) * 100

Segment by:

  • OS type (Windows, Linux, macOS, firmware)
  • Environment (production, staging, development, OT/ICS)
  • Business unit / asset owner
  • Patch type (security, feature, emergency)

Risk-Adjusted Vulnerability Backlog

Raw backlog counts are misleading. Weight by exploitability and asset criticality:

Risk_Score = CVSS-BT_Score * Asset_Criticality_Weight * EPSS_Probability

Where Asset_Criticality_Weight:

  • Tier 1 (crown jewels): 3.0x
  • Tier 2 (business critical): 2.0x
  • Tier 3 (standard): 1.0x
  • Tier 4 (low impact): 0.5x

Track risk-weighted backlog as a single number over time. The goal is risk reduction, not ticket count reduction.
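A sketch of the risk-weighted backlog roll-up (Python; the three-item backlog is invented, the tier weights come from the list above):

```python
# Asset criticality weights from the tiers above
TIER_WEIGHT = {1: 3.0, 2: 2.0, 3: 1.0, 4: 0.5}

def risk_weighted_backlog(vulns):
    """Sum of CVSS-BT * asset criticality weight * EPSS over all open vulns."""
    return sum(v["cvss_bt"] * TIER_WEIGHT[v["tier"]] * v["epss"] for v in vulns)

backlog = [
    {"cvss_bt": 9.8, "tier": 1, "epss": 0.92},  # likely exploited, crown jewel
    {"cvss_bt": 9.8, "tier": 3, "epss": 0.01},  # same CVE, standard asset: tiny contribution
    {"cvss_bt": 5.3, "tier": 2, "epss": 0.40},
]
print(round(risk_weighted_backlog(backlog), 2))  # 31.39
```

Note how the identical CVE contributes 27.05 on a crown jewel but only 0.10 on a standard asset with negligible exploit probability.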

Vulnerability Recurrence Rate

Recurrence_Rate = (Vulns_Reopened_or_Reintroduced / Total_Vulns_Closed) * 100

A high recurrence rate indicates systemic issues: missing root cause analysis, incomplete patching, or configuration drift. This metric separates teams that fix symptoms from teams that fix problems.

Vulnerability Disclosure Metrics

Per OWASP Vulnerability Disclosure Cheat Sheet guidance:

Metric | Description | Target
Acknowledgment Time | Time to acknowledge researcher report | < 1 business day
Triage Time | Time to confirm/deny vulnerability | < 5 business days
Fix Timeline Communication | Time to provide researcher with fix ETA | < 10 business days
Disclosure Window | Time from report to public disclosure | 90 days (Project Zero standard)
Researcher Satisfaction | NPS or survey score from reporters | > 70 NPS

[CONFIRMED] — Google Project Zero's 90-day disclosure standard has become the de facto industry benchmark. Organizations without a defined disclosure policy face uncoordinated disclosure risk. Source: OWASP Vulnerability Disclosure Cheat Sheet.

Advanced: EPSS-Informed Prioritization

Replace severity-only triage with probability-weighted prioritization:

Strategy | Effort (% of vulns actioned) | Coverage (exploited vulns caught) | Efficiency (precision)
CVSS >= 7 | 57.4% | 82.2% | 3.96%
EPSS >= 10% | 2.7% | 63.2% | 65.2%
EPSS >= 1% + CVSS >= 7 | ~15% | ~85% | ~15%

[CONFIRMED] — EPSS data from October 2023 demonstrates that CVSS-only prioritization forces teams to action 57% of all vulnerabilities while achieving only 4% efficiency. EPSS at 10% threshold reduces effort to 2.7% with 65% efficiency. Source: FIRST EPSS model documentation.

Practical guidance: EPSS explicitly rejects universal thresholds. Organizations must select thresholds matching their risk tolerance:

  • Resource-constrained teams: higher thresholds (e.g., EPSS >= 50%) for maximum efficiency per remediation dollar
  • Mission-critical environments: lower thresholds (e.g., EPSS >= 1-5%) accepting higher effort for broader coverage
  • Optimal: combine EPSS probability with CVSS severity and asset criticality for a composite risk score
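The effort/coverage/efficiency trade-off can be reproduced on toy data (Python sketch; the six-vuln population, its ground-truth exploitation labels, and the `strategy_stats` helper are invented for illustration):

```python
def strategy_stats(vulns, selector):
    """Effort, coverage, and efficiency of a triage strategy.

    vulns: dicts with 'cvss', 'epss', and 'exploited' (ground truth, e.g.
    later KEV listing). selector: predicate choosing which vulns to action.
    """
    actioned = [v for v in vulns if selector(v)]
    exploited = [v for v in vulns if v["exploited"]]
    caught = [v for v in actioned if v["exploited"]]
    effort = len(actioned) / len(vulns)
    coverage = len(caught) / len(exploited) if exploited else 0.0
    efficiency = len(caught) / len(actioned) if actioned else 0.0
    return effort, coverage, efficiency

# Toy population: 2 of 6 vulns end up exploited
vulns = [
    {"cvss": 9.8, "epss": 0.90, "exploited": True},
    {"cvss": 7.5, "epss": 0.02, "exploited": False},
    {"cvss": 8.1, "epss": 0.005, "exploited": False},
    {"cvss": 6.5, "epss": 0.30, "exploited": True},   # exploited despite CVSS < 7
    {"cvss": 5.0, "epss": 0.001, "exploited": False},
    {"cvss": 7.2, "epss": 0.003, "exploited": False},
]

for name, sel in [("CVSS >= 7", lambda v: v["cvss"] >= 7),
                  ("EPSS >= 10%", lambda v: v["epss"] >= 0.10)]:
    effort, coverage, efficiency = strategy_stats(vulns, sel)
    print(name, round(effort, 2), round(coverage, 2), round(efficiency, 2))
```

On this toy data the CVSS-only strategy actions two thirds of the backlog yet misses the exploited CVSS-6.5 vuln, while the EPSS threshold catches both exploited vulns at a third of the effort.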

3. SOC & Detection Metrics

Operational KPIs

Alert Volume & Triage Rate

Daily_Alert_Volume = Total alerts generated per 24-hour period
Triage_Rate = Alerts_Triaged / Total_Alerts * 100

Track by:

  • Source (SIEM, EDR, NDR, cloud, identity)
  • Severity
  • Shift/analyst
  • Disposition (TP, FP, benign-TP, inconclusive)

Healthy range: analysts should triage 15-25 alerts per shift (8 hours) with adequate investigation depth. If volume exceeds this, detection tuning is the fix — not more analysts.

False Positive Rate

FP_Rate = False_Positives / (True_Positives + False_Positives) * 100

Target: < 30% across the detection stack. Rules with > 80% FP rate should be disabled, tuned, or replaced.

Track FP rate per rule/use case. Aggregate FP rate masks individual rule problems. A SOC with 25% aggregate FP rate might have 5 rules at 95% FP generating 40% of all alerts.
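A sketch of how the aggregate masks per-rule problems (Python; rule names and counts are invented):

```python
def fp_rate(tp, fp):
    """False positive rate as a percentage."""
    return 100.0 * fp / (tp + fp) if (tp + fp) else 0.0

# Per-rule disposition counts: (true positives, false positives)
rules = {
    "lateral_movement": (190, 10),
    "cred_dumping": (180, 20),
    "noisy_powershell": (5, 95),   # 95% FP: disable, tune, or replace
}

# Aggregate looks tolerable...
tp = sum(t for t, f in rules.values())
fp = sum(f for t, f in rules.values())
print(round(fp_rate(tp, fp), 1))   # 25.0

# ...but the per-rule view exposes the problem rule
for name, (t, f) in rules.items():
    print(name, round(fp_rate(t, f), 1))
```

The aggregate sits at 25% while one rule runs at 95% FP, which is exactly the masking effect described above.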

Mean Time to Detect (MTTD) — SOC Context

MTTD = Avg(Alert_Timestamp - Compromise_Timestamp)

This is the hardest SOC metric to measure honestly because compromise timestamp is often unknown until post-incident forensics. Proxies:

  • Time from red team action to detection (during exercises)
  • Time from threat intel IOC publication to detection rule deployment
  • Dwell time from incident investigations (retrospective)

Industry benchmark: median dwell time is ~10 days (Mandiant M-Trends 2025), down from 16 days in 2023. Organizations with mature detection programs target < 24 hours for priority TTPs.

Mean Time to Respond (MTTR) — SOC Context

MTTR = Avg(Containment_Timestamp - Alert_Timestamp)

Segment into sub-metrics:

  • Time to Acknowledge: alert generated to analyst pickup
  • Time to Investigate: pickup to determination (TP/FP/escalation)
  • Time to Contain: determination to containment action executed
  • Time to Resolve: containment to full remediation

Sub-metric | P1 Target | P2 Target | P3 Target
Acknowledge | 5 min | 15 min | 1 hour
Investigate | 30 min | 2 hours | 8 hours
Contain | 1 hour | 4 hours | 24 hours
Resolve | 24 hours | 72 hours | 2 weeks

Analyst Productivity

Cases_Per_Analyst_Per_Month = Total_Cases_Closed / FTE_Analysts
Escalation_Rate = Cases_Escalated_to_Tier2_or_IR / Total_Cases
Automation_Rate = Cases_Auto_Resolved / Total_Cases

Warning: do not incentivize cases-closed velocity. This drives premature closure and shallow investigation. Balance with quality metrics (reopened cases, missed detections found in retrospective analysis).

Log Source Coverage

Log_Coverage = Active_Log_Sources / Total_Expected_Log_Sources * 100

Map against your asset inventory. A SIEM receiving logs from 60% of production systems has a 40% blind spot. Track:

  • Coverage by asset tier (crown jewels must be 100%)
  • Coverage by log type (authentication, process execution, network, file, cloud API)
  • Log latency (time from event to SIEM indexing)
  • Log completeness (are you getting ALL event types, or just a subset?)

4. Application Security Metrics

Vulnerability Density

Vuln_Density = Vulnerabilities / KLOC (thousands of lines of code)

Or per application:

App_Vuln_Density = Open_Vulnerabilities / Application_Count

Segment by:

  • Severity (critical/high vs. medium/low)
  • Vulnerability class (injection, auth, crypto, config)
  • Age (< 30 days, 30-90, 90-180, > 180)
  • Source (SAST, DAST, SCA, pentest, bug bounty)

Benchmark: mature programs target < 1 critical/high per 10 KLOC for new code.

Fix Rate & Velocity

Fix_Rate = Vulns_Fixed_This_Period / Vulns_Open_Start_of_Period * 100
Net_New_Rate = New_Vulns_Introduced / Vulns_Fixed

Net New Rate is the critical metric. If > 1.0, the backlog is growing. Track this weekly for active development teams. A sustained Net New Rate > 1.0 means the AppSec program is losing ground regardless of how many vulns it fixes.
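A minimal sketch of the two rates (Python; the weekly numbers are invented):

```python
def fix_rate(fixed, open_at_start):
    """Percent of the starting backlog fixed this period."""
    return 100.0 * fixed / open_at_start

def net_new_rate(introduced, fixed):
    """Vulns introduced per vuln fixed; > 1.0 means the backlog is growing."""
    return introduced / fixed

# Weekly snapshots for one team
weeks = [
    {"open_start": 200, "fixed": 40, "introduced": 30},  # net new 0.75: gaining ground
    {"open_start": 190, "fixed": 35, "introduced": 42},  # net new 1.2: losing ground
]
for w in weeks:
    print(round(fix_rate(w["fixed"], w["open_start"]), 1),
          round(net_new_rate(w["introduced"], w["fixed"]), 2))
```

Week two shows the failure mode: the fix rate barely moved, yet the net new rate crossing 1.0 is what signals the program is losing ground.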

SAST/DAST Coverage

SAST_Coverage = Repos_With_SAST_Enabled / Total_Active_Repos * 100
DAST_Coverage = Apps_With_DAST_Scans / Total_Deployed_Apps * 100
Pipeline_Integration = Pipelines_With_Security_Gates / Total_CI_CD_Pipelines * 100

Target: 100% SAST on all active repos, DAST on all deployed web applications. Track "scan success rate" separately — a pipeline with SAST enabled but consistently failing/skipped is not covered.

Dependency Health

Dependency_Age = Avg(Current_Date - Latest_Dependency_Release_Date)
Vulnerable_Dependencies = Deps_With_Known_CVEs / Total_Dependencies * 100
Direct_vs_Transitive = Vulnerable_Transitive_Deps / Total_Vulnerable_Deps * 100

Track dependency age as a leading indicator. Dependencies > 2 years behind latest release are significantly more likely to have unpatched vulnerabilities and harder to upgrade (breaking changes accumulate).

Security Debt

Security_Debt_Days = Sum(Estimated_Remediation_Hours_Per_Vuln) for all open vulns
Debt_Ratio = Security_Debt_Days / Total_Development_Capacity_Days

Express security debt in developer-days. This translates to language leadership understands: "We have 340 developer-days of security debt. At current allocation (2 devs), that's 170 business days — roughly 8 months of dedicated work."
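The debt arithmetic, sketched (Python; the hour estimates and the 8-hour-day conversion are illustrative assumptions):

```python
def security_debt_days(remediation_hours, hours_per_day=8):
    """Total open-vuln remediation effort expressed in developer-days."""
    return sum(remediation_hours) / hours_per_day

hours = [16, 40, 8, 120, 24]          # estimated remediation hours per open vuln
debt = security_debt_days(hours)
weekly_capacity = 2 * 5                # 2 devs allocated * 5 days/week
print(debt)                            # 26.0 developer-days
print(round(debt / weekly_capacity, 1))  # 2.6 weeks of dedicated work
```

The second number is the one leadership understands: debt divided by allocated capacity gives a calendar estimate.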


5. Risk Metrics & Quantification

Quantitative Risk Measurement

Per the risk-measurement framework (magoo/risk-measurement), effective security risk measurement replaces subjective heat maps with calibrated probability estimates.

Core principle: "Risk Measurement is written to help you measure complicated risks using a process that's simple enough to work out on the back of a napkin and powerful enough to organize a rocket launch."

[CONFIRMED] — Quantitative risk analysis using calibrated estimation, probability distributions, and Monte Carlo simulation produces more defensible risk assessments than qualitative red/yellow/green matrices. Source: magoo/risk-measurement.

Key Quantitative Approaches

FAIR (Factor Analysis of Information Risk):

  • Decomposes risk into Loss Event Frequency (LEF) and Loss Magnitude (LM)
  • LEF = Threat Event Frequency * Vulnerability (probability of successful attack)
  • LM = Primary Loss + Secondary Loss (regulatory fines, reputation damage)
  • Uses Monte Carlo simulation to produce loss distribution curves
  • Output: "There is a 90% probability that annual losses from this risk scenario will be between $500K and $12M"
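The loss-distribution idea can be sketched as a back-of-napkin Monte Carlo (Python; the frequency and magnitude parameters here are invented for illustration, not calibrated estimates, and a real FAIR analysis would elicit them from experts):

```python
import random

def fair_annual_loss(n_sims=20_000, seed=7):
    """Monte Carlo sketch of one risk scenario's annual loss distribution.

    Returns the 5th and 95th percentiles, i.e. a 90% interval.
    """
    rng = random.Random(seed)
    losses = []
    for _ in range(n_sims):
        # Loss Event Frequency: ~0.8 expected events/year
        # (10 independent opportunities at 8% each, approximating a Poisson draw)
        events = sum(1 for _ in range(10) if rng.random() < 0.08)
        # Loss Magnitude per event: lognormal, median ~$240K, heavy tail
        total = sum(rng.lognormvariate(12.4, 1.2) for _ in range(events))
        losses.append(total)
    losses.sort()
    return losses[int(0.05 * n_sims)], losses[int(0.95 * n_sims)]

p5, p95 = fair_annual_loss()
print(f"90% of simulated annual losses fall between ${p5:,.0f} and ${p95:,.0f}")
```

With these parameters most simulated years have zero loss events, so the lower bound is $0: the distribution, not a single expected value, is the output.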

Calibrated Estimation:

  • Experts provide 90% confidence intervals instead of point estimates
  • Training improves calibration (most untrained estimators are overconfident)
  • Track estimation accuracy over time: Brier scores, calibration curves
  • Key KPI: % of actual outcomes falling within stated confidence intervals
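Both calibration KPIs can be computed directly (Python sketch; the interval and forecast samples are invented):

```python
def interval_coverage(estimates):
    """Percent of actual outcomes that fell inside the stated 90% intervals."""
    hits = sum(1 for lo, hi, actual in estimates if lo <= actual <= hi)
    return 100.0 * hits / len(estimates)

def brier_score(forecasts):
    """Mean squared error of probability forecasts against binary outcomes (lower is better)."""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)

# (low, high, actual) triples from 90%-confidence estimation exercises
intervals = [(10, 50, 32), (100, 400, 520), (1, 5, 3), (20, 80, 75)]
print(interval_coverage(intervals))  # 75.0 — overconfident relative to the 90% target

# (forecast probability, binary outcome) pairs
forecasts = [(0.9, 1), (0.8, 1), (0.3, 0), (0.7, 0)]
print(round(brier_score(forecasts), 3))
```

A well-calibrated estimator lands near 90% interval coverage; the 75% here is the typical overconfidence that calibration training corrects.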

Risk Exposure Metrics

Aggregate Risk Exposure

Total_Risk_Exposure = Sum(Probability_i * Impact_i) for all identified risks

Track monthly. The trend matters more than the absolute number.

Risk Reduction Rate

Risk_Reduction = (Risk_Exposure_Previous - Risk_Exposure_Current) / Risk_Exposure_Previous * 100

Attribute risk reduction to specific controls/investments. This directly answers "what did we get for the $2M we spent on security this year?"

Residual Risk

Residual_Risk = Inherent_Risk - Risk_Mitigated_by_Controls

Every risk register entry should carry:

  • Inherent risk score (before controls)
  • Control effectiveness rating (0-100%)
  • Residual risk score (after controls)
  • Risk appetite threshold

Risk Appetite Adherence

Risks_Within_Appetite = Risks_Below_Threshold / Total_Identified_Risks * 100

Risks exceeding appetite require documented acceptance with named executive owner and review date. Track:

  • Number of accepted risks exceeding appetite
  • Age of risk acceptances (stale acceptances are unmanaged risks)
  • Acceptance owner distribution (concentration = governance problem)

Risk Treatment Metrics

Metric | Formula | Target
Treatment Plan Completion | Plans_On_Track / Total_Plans * 100 | > 85%
Risk Exception Age | Avg days since exception granted | < 180 days
Risk Assessment Currency | Assessments_Current / Total_Required * 100 | > 90%
Third-Party Risk Coverage | Vendors_Assessed / Critical_Vendors * 100 | 100% for Tier 1

6. Compliance Metrics

Control Coverage

Control_Coverage = Controls_Implemented / Controls_Required * 100

Map against your applicable frameworks (NIST 800-53, CIS Controls, ISO 27001, SOC 2, PCI DSS). Track per framework and per control family.

Control Effectiveness

Implementation is not effectiveness. A firewall rule that exists but permits all traffic has 100% implementation and 0% effectiveness.

Control_Effectiveness = Controls_Verified_Effective / Controls_Implemented * 100

Verification methods:

  • Automated testing (configuration validation, policy checks)
  • Internal audit findings
  • Pentest/red team results
  • Incident post-mortems (did the control work when tested by a real attacker?)

Audit Metrics

Metric | Description | Target
Audit Finding Count | Open findings by severity | Trending down
Finding Remediation Rate | Findings closed within SLA | > 90%
Repeat Findings | Same finding across consecutive audits | 0
Days to Remediate | Avg time from finding to closure | < 90 days for high
Evidence Collection Time | Time to produce audit evidence | < 2 days per request
Audit Readiness Score | Pre-audit self-assessment | > 85%

Repeat findings are the most important audit metric. A repeat finding means the organization knew about a problem, committed to fixing it, and failed. This is a governance failure, not a technical one.

Regulatory Compliance Posture

For regulated industries, track:

Regulatory_Readiness = (Controls_Meeting_Requirement / Total_Regulatory_Requirements) * 100

Per regulation (GDPR, HIPAA, PCI DSS, SOX, etc.):

  • Requirements mapped to controls
  • Control evidence freshness
  • Gap count and severity
  • Remediation timeline for gaps
  • Regulatory examination findings (if applicable)

7. Detection Engineering Metrics

ATT&CK Coverage

ATT&CK_Coverage = Techniques_With_Detection / Total_Applicable_Techniques * 100

Do not aim for 100%. Not all techniques are equally relevant to your environment. Weight by:

  • Threat intelligence (what TTPs do your likely adversaries use?)
  • Environment applicability (T1546.015 COM hijacking is irrelevant in a Linux-only shop)
  • Detection feasibility (some techniques are inherently difficult to detect)

Coverage Depth Score

For each covered technique, assess detection quality:

Level | Description | Score
0 | No detection | 0
1 | Log visibility exists but no rule | 1
2 | Detection rule exists, not validated | 2
3 | Rule validated against simulated attack | 3
4 | Rule tuned with known FP patterns documented | 4
5 | Rule integrated into automated response | 5

Coverage_Depth = Sum(Technique_Scores) / (Total_Applicable_Techniques * 5) * 100
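The depth score, sketched (Python; the technique IDs and their 0-5 scores are illustrative):

```python
def coverage_depth(technique_scores):
    """Coverage depth: achieved score over the maximum (5 per technique), as a percentage."""
    return 100.0 * sum(technique_scores.values()) / (len(technique_scores) * 5)

# Score 0-5 per applicable ATT&CK technique, per the levels above
scores = {"T1059.001": 4, "T1021.001": 3, "T1003.001": 5, "T1547.001": 1, "T1046": 0}
print(coverage_depth(scores))  # 52.0
```

A binary "covered / not covered" view of the same data would report 80% coverage; the depth score of 52% is the more honest number.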

Rule Performance Metrics

Per detection rule:

Metric | Formula | Healthy Range
True Positive Rate | TP / (TP + FN) | > 70%
Precision | TP / (TP + FP) | > 50%
Alert Volume | Alerts per day/week | Manageable by team
Time to Triage | Avg investigation time | < 30 min for P1
Last Validated | Date of last purple team test | < 90 days
Evasion Resistance | Variants detected / variants tested | > 60%

Detection Gap Trend

Track monthly:

New_Detections_Added - Detections_Retired = Net_Detection_Change
Gap_Closure_Rate = Gaps_Closed / Gaps_Identified * 100

Map gaps against threat intelligence. A gap for a TTP your adversaries actively use is critical. A gap for a theoretical technique nobody targets your industry with is informational.

Detection-as-Code Metrics

Metric | Description | Target
Rules in Version Control | % of rules managed in Git | 100%
Rules with Tests | Rules with automated validation | > 80%
Deployment Automation | Rules auto-deployed via CI/CD | > 90%
Rule Review Cadence | Rules reviewed/updated per quarter | 100% per year
Mean Time to Deploy | Rule creation to production | < 4 hours for priority

8. Security Awareness Metrics

Phishing Simulation Metrics

Metric | Formula | Target
Click Rate | Users_Clicked / Users_Targeted * 100 | < 5%
Report Rate | Users_Reporting / Users_Targeted * 100 | > 70%
Click-to-Report Ratio | Reports / Clicks | > 3:1
Repeat Clickers | Users_Clicking_Multiple_Campaigns / Total_Clickers | < 10%
Credential Submission Rate | Users_Submitting_Creds / Users_Clicked * 100 | < 20% of clickers
Time to First Report | Fastest report after send | < 2 min

Click rate alone is a terrible metric. A 3% click rate with a 5% report rate is worse than an 8% click rate with a 75% report rate. The second organization has a human detection layer; the first does not.

Training Metrics

Metric | Description | Target
Completion Rate | Users completing required training | > 95%
On-Time Completion | Users completing before deadline | > 90%
Knowledge Assessment Score | Post-training quiz scores | > 80% avg
Knowledge Retention | Score on re-test after 6 months | > 70%
Behavior Change | Reduction in risky behaviors post-training | Measurable improvement

Awareness Program Effectiveness

The true measure is behavior change, not training completion:

  • Reduction in security incidents caused by human error
  • Increase in suspicious activity reports from employees
  • Decrease in policy violations (USB usage, shadow IT, data handling)
  • Improvement in secure development practices (for technical staff)

9. OWASP SAMM Maturity Scoring

Model Structure

OWASP SAMM (Software Assurance Maturity Model) organizes application security into 5 business functions with 15 security practices:

[CONFIRMED] — SAMM 2.0 structure. Source: owaspsamm.org/model.

Business Function | Security Practices
Governance | Strategy & Metrics, Policy & Compliance, Education & Guidance
Design | Threat Assessment, Security Requirements, Secure Architecture
Implementation | Secure Build, Secure Deployment, Defect Management
Verification | Architecture Assessment, Requirements-driven Testing, Security Testing
Operations | Incident Management, Environment Management, Operational Management

Maturity Levels

Each practice has 3 maturity levels:

Level | Characterization | Typical State
1 | Initial / Ad-hoc | Basic practices exist, inconsistently applied. Individuals doing security work without organizational support.
2 | Managed / Defined | Practices are documented, consistent, and organization-wide. Processes exist and are followed.
3 | Optimized / Measured | Continuous improvement based on metrics. Practices are automated, measured, and feed back into program improvement.

Scoring Methodology

Each practice is scored 0-3 based on maturity level achieved. The organization's overall SAMM score is the average across all 15 practices:

SAMM_Score = Sum(Practice_Scores) / 15

Individual function scores:

Function_Score = Sum(Function_Practice_Scores) / 3
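The scoring arithmetic, sketched (Python; the practice-level scores are invented, chosen to land near the mature end of the quarterly example later in this section):

```python
# Practice scores (0-3) grouped by business function, per the SAMM model above
practices = {
    "Governance":     {"Strategy & Metrics": 2, "Policy & Compliance": 2, "Education & Guidance": 2},
    "Design":         {"Threat Assessment": 1, "Security Requirements": 2, "Secure Architecture": 2},
    "Implementation": {"Secure Build": 2, "Secure Deployment": 2, "Defect Management": 2},
    "Verification":   {"Architecture Assessment": 1, "Requirements-driven Testing": 2, "Security Testing": 2},
    "Operations":     {"Incident Management": 2, "Environment Management": 1, "Operational Management": 2},
}

# Per-function average and overall average across all 15 practices
function_scores = {f: sum(p.values()) / len(p) for f, p in practices.items()}
overall = sum(s for p in practices.values() for s in p.values()) / 15

for f, s in function_scores.items():
    print(f, round(s, 2))
print("Overall:", round(overall, 2))  # 1.8
```

The per-function breakdown matters as much as the overall: two functions at 2.0 and three at 1.67 point the roadmap at Design, Verification, and Operations.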

Assessment Approach

  1. Scope: define which applications/business units to assess
  2. Evaluate: for each practice, determine current maturity level using SAMM's activity questions
  3. Score: assign 0-3 per practice based on evidence
  4. Target: set target maturity per practice (not everything needs to be level 3)
  5. Roadmap: prioritize improvement activities based on gap between current and target
  6. Reassess: periodic reassessment (annually recommended) to measure progress

SAMM Metrics Integration

Each practice at Level 3 implies metrics-driven management. Key metrics per function:

Function | Key Metrics at Maturity
Governance | Security budget as % of IT spend, training coverage, policy review cadence
Design | % of projects with threat models, security requirements coverage
Implementation | Secure build pipeline adoption, dependency vulnerability rate, defect fix rate
Verification | % of apps with security testing, finding density trends, architecture review coverage
Operations | Incident response time, environment hardening score, patching compliance

Using SAMM for Program Measurement

Track SAMM scores over time as a program maturity indicator:

Quarter    Governance  Design  Implementation  Verification  Operations  Overall
Q1 2025    1.3         0.7     1.0             0.7           1.0         0.9
Q2 2025    1.7         1.0     1.3             1.0           1.0         1.2
Q3 2025    2.0         1.3     1.7             1.3           1.3         1.5
Q4 2025    2.0         1.7     2.0             1.7           1.7         1.8

This provides a defensible, industry-standard maturity narrative for executive reporting.


10. Executive Dashboard Design

What to Show

The Executive Security Dashboard (1 page)

Section 1: Risk Posture (top of page, most prominent)

  • Overall risk exposure trend (last 12 months)
  • Risk appetite adherence: % of risks within appetite
  • Top 5 risks with owner and status
  • Risk reduction attributed to security investments

Section 2: Threat Landscape (contextualizes the risk)

  • Active threats relevant to the organization (from threat intel)
  • Incidents this period: count, severity, business impact
  • Near misses / blocked attacks (demonstrates value)

Section 3: Program Health (3-5 key operational metrics)

  • Vulnerability MTTR trend (are we getting faster?)
  • Detection coverage score (ATT&CK-based)
  • SLA adherence for critical/high vulnerabilities
  • Compliance posture across applicable frameworks
  • SAMM maturity score trend

Section 4: Investment & Capacity

  • Security spend vs. industry benchmark
  • Key initiative status (on track / at risk / blocked)
  • Staffing: current FTEs, open positions, attrition

Design Principles

  1. Trend lines over point-in-time numbers: a single number is meaningless without context. Show 6-12 months of trend.
  2. Red/yellow/green with thresholds: define what red means BEFORE building the dashboard. If everything is always green, thresholds are wrong.
  3. Comparison baselines: compare to last quarter, last year, and industry benchmarks where available.
  4. Narrative, not just numbers: each metric needs a one-sentence "so what?" annotation.
  5. Drill-down capability: executives see the top level; managers can drill into operational detail.

What NOT to Show

Avoid | Why | Show Instead
Raw alert volume | Meaningless without context; bigger number != more secure | Alert-to-incident ratio, FP rate trend
Total vulnerabilities found | Penalizes organizations that scan more | Vulnerability density, MTTR, net new rate
Scan count / tool inventory | Activity, not outcome | Coverage percentage, gap analysis
Compliance checklist completion | Checkbox security; presence != effectiveness | Control effectiveness rate, audit finding trend
Vanity metrics (phishing emails blocked) | Inflated numbers that mean nothing | Phishing click rate trend, report rate
Technical jargon | Executives don't care about Sigma rules | "We can detect X% of techniques used by threat actors targeting our industry"
Too many metrics | Dilutes attention | Maximum 8-10 KPIs per dashboard

Reporting Cadence

Audience | Cadence | Format | Depth
Board of Directors | Quarterly | 3-5 slides | Strategic risk, major incidents, program maturity
C-Suite / ELT | Monthly | 1-page dashboard + narrative | Risk posture, program health, investment ROI
VP / Directors | Weekly | Operational dashboard | Team metrics, SLA adherence, capacity
Team Leads | Daily | Automated dashboards | Alert queues, backlog, sprint progress

11. Scoring Systems Deep Dive: CVSS v4.0 & EPSS

CVSS v4.0 — Key Changes from v3.1

[CONFIRMED] — Source: FIRST CVSS v4.0 Specification Document.

Metric Groups

Group | Purpose | Affects Score?
Base | Intrinsic vulnerability characteristics, constant over time | Yes
Threat | Current exploit status; replaces v3.1 Temporal | Yes
Environmental | Organization-specific context, compensating controls | Yes
Supplemental | Additional context for prioritization | No

New Nomenclature (Mandatory)

Label | Metrics Included | Use Case
CVSS-B | Base only | NVD-published scores
CVSS-BT | Base + Threat | Score adjusted for exploit availability
CVSS-BE | Base + Environmental | Score adjusted for org-specific context
CVSS-BTE | Base + Threat + Environmental | Fully contextualized score
Critical implication: when someone says "CVSS 9.1" you must ask "CVSS-B, BT, BE, or BTE?" A CVSS-B 9.1 is very different from a CVSS-BTE 9.1.

New Metrics

Metric | Group | Values | Purpose
Attack Requirements (AT) | Base | None / Present | Deployment conditions beyond security hardening (race conditions, network positioning)
Automatable (AU) | Supplemental | No / Yes | Can attacker automate all kill chain steps?
Provider Urgency (U) | Supplemental | Red / Amber / Green / Clear | Vendor's severity assessment
Recovery (R) | Supplemental | Automatic / User / Irrecoverable | System resilience post-exploitation
Value Density (V) | Supplemental | Diffuse / Concentrated | Resource concentration per exploit
Vulnerability Response Effort (RE) | Supplemental | Low / Moderate / High | Remediation difficulty for consumers
Safety (S) | Supplemental + Environmental | Negligible / Present | Human injury risk (IEC 61508)

Scoring Methodology Change

v4.0 replaces v3.1's linear formula with a MacroVector equivalence class system:

  • Vectors cluster into MacroVectors (equivalence sets of comparable qualitative severity)
  • Six equivalence groups (EQ1-EQ6) determined through expert evaluation
  • Score = MacroVector lookup score, interpolated by "severity distance" within the class
  • Produces scores rounded to one decimal place

Practical impact: v4.0 scores are NOT directly comparable to v3.1 scores. Organizations transitioning must re-baseline their SLA thresholds.

EPSS — Exploit Prediction Scoring System

[CONFIRMED] — Source: FIRST EPSS Model Documentation.

What EPSS Measures

Daily probability estimate that a published CVE will see exploitation activity in the next 30 days.

Model Inputs

Category | Sources
Vulnerability metadata | CPE, CWE, CVSS vectors (via NVD)
Temporal signals | Days since CVE publication
Known exploitation | CISA KEV, Google Project Zero, Zero Day Initiative
Public exploits | Exploit-DB, GitHub, Metasploit
Security tools | Nuclei, Intrigue, sn1per, jaeles templates
Exploitation evidence | Honeypots, IDS/IPS sensors, host-based detection (from data partners)

Model Methodology

  • Trains on 12 months of historical data
  • Validates against 2 months of unseen "future" data
  • Daily refresh of all probability estimates
  • Measures attempted exploitation (not successful exploitation)
  • Recognizes exploitation is "bursty, sporadic, sometimes isolated, localized and ephemeral"

Using CVSS + EPSS Together

The optimal vulnerability prioritization strategy combines both:

Priority_Score = f(CVSS-BTE_Score, EPSS_Probability, Asset_Criticality)

Decision matrix:

EPSS | CVSS High (>=7) | CVSS Low (<7)
High (>=10%) | Immediate action | Investigate — likely exploited despite low severity
Low (<10%) | Standard SLA — severe but unlikely to be exploited | Backlog — lowest priority
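The decision matrix reduces to a small function (Python sketch; `triage_bucket` is a hypothetical helper, with thresholds taken from the matrix above):

```python
def triage_bucket(cvss, epss):
    """Map a vulnerability onto the CVSS/EPSS decision matrix above."""
    if epss >= 0.10:
        return "immediate" if cvss >= 7 else "investigate"
    return "standard SLA" if cvss >= 7 else "backlog"

print(triage_bucket(9.8, 0.45))   # immediate
print(triage_bucket(5.3, 0.30))   # investigate: likely exploited despite low severity
print(triage_bucket(8.1, 0.004))  # standard SLA
print(triage_bucket(4.0, 0.001))  # backlog
```

A production version would add the asset-criticality dimension from Priority_Score, but the two-axis bucketing is the core of the strategy.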

12. Anti-Patterns & Pitfalls

Metric Anti-Patterns

Anti-Pattern | Why It's Harmful | Alternative
Measuring activity, not outcomes | "We ran 500 scans" says nothing about security posture | Measure coverage, finding trends, MTTR
Vanity metrics | Inflated numbers (blocked attacks, threats stopped) create false confidence | Measure what got through, not what was blocked
Gaming incentives | Analysts close tickets prematurely to hit KPIs | Balance volume metrics with quality metrics
Measuring everything | 200 metrics = no metrics. Attention is finite | Maximum 8-10 KPIs per audience level
Point-in-time snapshots | A single number without trend is meaningless | Always show trend (minimum 6 months)
Watermelon metrics | Green on outside, red inside — aggregate masks problems | Segment by severity, team, asset tier
Comparing unlike things | Comparing MTTR across orgs with different definitions | Standardize definitions before benchmarking
Severity inflation | Everything is critical = nothing is critical | Enforce severity criteria, audit regularly
Denominator blindness | "We fixed 1000 vulns!" but 5000 new ones appeared | Always show rates and ratios, not raw counts

Common Measurement Failures

  1. No baseline: measuring improvement requires knowing where you started
  2. Undefined thresholds: red/yellow/green without defined criteria is opinion, not measurement
  3. Lagging-only programs: if you only measure what already happened, you cannot predict or prevent
  4. Metric rot: dashboards built once and never updated become decoration, not instrumentation
  5. Disconnected metrics: operational metrics that don't roll up to risk metrics that don't connect to business outcomes

The Goodhart's Law Warning

"When a measure becomes a target, it ceases to be a good measure."

Every metric you publish will be optimized for. If you measure MTTR, teams will game MTTR. Counterbalances:

  • Use metric pairs (MTTR + recurrence rate; fix rate + net new rate)
  • Rotate emphasis metrics periodically
  • Validate metrics against ground truth (red team results, real incidents)
  • Separate metrics used for improvement from metrics used for performance evaluation

Quick Reference: Starter Metric Set

For organizations building a security metrics program from scratch, start with these 10:

# | Metric | Level | Owner
1 | Risk Exposure Trend | Strategic | CISO
2 | Vulnerability MTTR by Severity | Operational | Vuln Mgmt Lead
3 | SLA Adherence Rate | Operational | Vuln Mgmt Lead
4 | Detection Coverage (ATT&CK) | Strategic | Detection Eng Lead
5 | SOC False Positive Rate | Operational | SOC Manager
6 | MTTD (from red team exercises) | Strategic | CISO
7 | Patch Compliance Rate | Operational | IT Ops / Infra Lead
8 | AppSec Net New Vuln Rate | Operational | AppSec Lead
9 | Compliance Control Effectiveness | Strategic | GRC Lead
10 | Phishing Report Rate | Operational | Awareness Lead

Add SAMM maturity scoring when the program is ready for formal maturity assessment (typically 12-18 months after program establishment).


Sources

  • FIRST CVSS v4.0 Specification Document — https://www.first.org/cvss/v4.0/specification-document
  • FIRST EPSS Model Documentation — https://www.first.org/epss/ and https://www.first.org/epss/model
  • OWASP Vulnerability Disclosure Cheat Sheet — https://cheatsheetseries.owasp.org/cheatsheets/Vulnerability_Disclosure_Cheat_Sheet.html
  • OWASP SAMM v2.0 — https://owaspsamm.org/model/
  • NIST Cybersecurity Framework 2.0 — https://www.nist.gov/cyberframework
  • Risk Measurement (magoo) — https://magoo.github.io/risk-measurement
  • Netflix Sketchy (deprecated) — Visual URL threat detection for SOC automation context