SOC Metrics & KPIs Standard
This document defines the Key Performance Indicators (KPIs) and operational metrics used to measure SOC effectiveness, efficiency, and team health. Metrics drive continuous improvement and enable data-driven resource allocation.
Overview
graph LR
Data["📊 Raw Data"] --> Metrics["📈 Metrics"]
Metrics --> KPIs["🎯 KPIs"]
KPIs --> Decisions["💡 Decisions"]
Decisions --> Actions["🔧 Improvements"]
Actions --> Data
Metric Categories
| Category |
Focus |
Example Metrics |
| ⏱️ Efficiency |
How fast we respond |
MTTD, MTTR, MTTA |
| 🎯 Effectiveness |
How well we detect |
FPR, detection coverage, dwell time |
| 👥 Capacity |
Team workload and health |
Alerts per analyst, burnout, utilization |
| 💰 Business |
Value delivered |
Cost per incident, breach prevention |
| 📊 Compliance |
Regulatory adherence |
SLA met %, audit findings |
1. Efficiency Metrics (How Fast)
1.1 Mean Time To Detect (MTTD)
| Attribute |
Detail |
| Definition |
Average time from threat intrusion to SOC detection |
| Formula |
Σ(Detection Time − Intrusion Time) / Total Incidents |
| Target |
< 30 minutes |
| Measurement |
Monthly average |
| Data Source |
SIEM timestamps, EDR first-seen |
💡 Improvement levers: Better log coverage, behavioral analytics, automated correlation, threat intel feeds
1.2 Mean Time To Acknowledge (MTTA)
| Attribute |
Detail |
| Definition |
Average time from alert firing to analyst pickup |
| Formula |
Σ(Acknowledge Time − Alert Time) / Total Alerts |
| Target |
< 10 minutes |
| Measurement |
Daily average |
| Data Source |
Ticketing system timestamps |
💡 Improvement levers: SOAR auto-enrichment, queue prioritization, staffing optimization
1.3 Mean Time To Respond (MTTR)
| Attribute |
Detail |
| Definition |
Average time from detection to containment + remediation |
| Formula |
Σ(Resolution Time − Detection Time) / Total Incidents |
| Target |
< 60 minutes (Critical/High) · < 4 hours (Medium) |
| Measurement |
Monthly average, segmented by severity |
| Data Source |
Ticketing system, IR logs |
💡 Improvement levers: Pre-approved containment actions, SOAR playbooks, clear escalation paths
1.4 Mean Time To Close (MTTC)
| Attribute |
Detail |
| Definition |
Average time from incident opened to fully closed (including PIR) |
| Formula |
Σ(Close Time − Open Time) / Total Incidents |
| Target |
< 24 hours (Critical) · < 72 hours (High) |
| Measurement |
Monthly average |
2. Effectiveness Metrics (How Well)
2.1 False Positive Rate (FPR)
| Attribute |
Detail |
| Definition |
Percentage of alerts that are benign after investigation |
| Formula |
(False Positive Alerts / Total Alerts) × 100% |
| Target |
< 10% |
| Measurement |
Weekly trend |
FPR Improvement Actions
| FPR Level |
Action Required |
| < 5% |
✅ Excellent — maintain current tuning |
| 5–10% |
⚠️ Acceptable — review top 5 noisiest rules |
| 10–25% |
🟠 Needs attention — dedicated tuning sprint |
| > 25% |
🔴 Critical — pause new detections, focus on tuning |
📚 Tuning process: Alert Tuning SOP
2.2 Detection Coverage
| Attribute |
Detail |
| Definition |
Percentage of MITRE ATT&CK techniques with at least one detection |
| Formula |
(Techniques with Detection / Total Relevant Techniques) × 100% |
| Target |
≥ 80% of top 50 techniques |
| Measurement |
Quarterly assessment |
2.3 Dwell Time
| Attribute |
Detail |
| Definition |
Duration a threat actor remains undetected in the environment |
| Formula |
Compromise Time − Detection Time |
| Target |
< 24 hours (industry median: 16 days) |
| Impact |
Longer dwell time = higher data breach risk and cost |
2.4 Escalation Accuracy
| Attribute |
Detail |
| Definition |
Percentage of escalations from T1→T2 that are legitimate |
| Formula |
(Valid Escalations / Total Escalations) × 100% |
| Target |
≥ 85% |
| Impact |
Poor accuracy wastes Tier 2 capacity |
3. Capacity Metrics (Team Health)
3.1 Alert Volume & Distribution
| Metric |
Target |
Action if Exceeded |
| Alerts per analyst per shift |
15–25 |
Add staff or automate triage |
| Queue depth at shift end |
< 10 unassigned |
Review staffing model |
| Alert backlog (> 24h old) |
0 |
Immediate triage sprint |
3.2 Analyst Utilization
| Metric |
Target |
Notes |
| Utilization rate |
60–80% |
> 80% = risk of burnout |
| Overtime hours |
< 10% of regular hours |
Track monthly |
| Training time |
≥ 10% of work time |
Per individual per month |
3.3 Team Health
| Metric |
Target |
Why It Matters |
| Annual turnover rate |
< 15% |
SOC talent is expensive to replace |
| Average tenure |
> 2 years |
Institutional knowledge retention |
| Certification rate |
≥ 70% |
Team capability baseline |
| Job satisfaction score |
≥ 4/5 |
Quarterly anonymous survey |
4. Business Metrics (Value Delivered)
4.1 Cost Metrics
| Metric |
Formula |
Use |
| Cost per incident |
Total SOC Cost / Total Incidents |
Budget planning |
| Cost per alert |
Total SOC Cost / Total Alerts |
Efficiency comparison |
| Automation savings |
(Manual Time − Automated Time) × Hourly Rate |
ROI justification |
4.2 Business Impact Metrics
| Metric |
Definition |
| Prevented breach value |
Estimated cost of breaches prevented by SOC detection |
| Downtime prevented |
Hours of system downtime avoided through rapid response |
| Compliance adherence |
% of regulatory requirements met (PDPA, ISO 27001) |
5. Reporting & Dashboards
5.1 Reporting Cadence
| Report |
Frequency |
Audience |
Key Metrics |
| Shift Report |
Per shift |
SOC Lead |
Queue depth, active incidents, system health |
| Daily Brief |
Daily |
SOC Manager |
MTTA, alerts processed, escalations |
| Weekly Summary |
Weekly |
SOC Manager, CISO |
MTTD, MTTR, FPR, trends |
| Monthly SOC Report |
Monthly |
CISO, Management |
All KPIs, trends, improvements |
| Quarterly Business Review |
Quarterly |
C-Suite, Board |
Business metrics, ROI, strategic |
📚 Templates: Monthly SOC Report · QBR · KPI Dashboard
5.2 Dashboard Panels
Recommended real-time dashboard layout:
| Panel |
Visualization |
Refresh |
| 🚨 Active Incidents by Severity |
Pie/donut chart |
Real-time |
| 📈 Alert Volume Trend |
Line chart (7-day) |
5 min |
| ⏱️ MTTA / MTTR Real-time |
Gauge |
5 min |
| 📊 Queue Depth |
Bar chart by shift |
5 min |
| 🎯 FPR Weekly Trend |
Line chart |
Daily |
| 👥 Analyst Workload |
Heatmap |
15 min |
| 🌍 Top Source Countries |
Geo map |
Hourly |
| 🛡️ Detection Coverage |
MITRE heatmap |
Weekly |
6. Targets Summary
Quick reference for all metric targets:
| Metric |
Target |
Severity Segmentation |
| MTTD |
< 30 min |
All severities |
| MTTA |
< 10 min |
All severities |
| MTTR |
< 60 min |
Critical/High |
| MTTC |
< 24 hours |
Critical |
| FPR |
< 10% |
Overall |
| Detection Coverage |
≥ 80% |
Top 50 MITRE techniques |
| Dwell Time |
< 24 hours |
All incidents |
| Escalation Accuracy |
≥ 85% |
T1→T2 |
| Alerts per Analyst |
15–25 |
Per shift |
| Utilization |
60–80% |
Per analyst |
| Turnover |
< 15% |
Annual |
| SLA Adherence |
≥ 95% |
All incidents |
References