Change Management & Deployment Standard

This document outlines the standard process for managing changes and deployments within the SOC environment.

1. Change Management Process

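All changes to the production SOC environment (Alert Rules, Parsers, Infrastructure) must follow the same structured process: an RFC is submitted, its risk is reviewed, and approval is granted at the level that risk requires before anything reaches Production.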

sequenceDiagram
    participant Eng as Engineer
    participant Mgr as Manager
    participant CAB as Change Advisory Board
    participant Prod as Production

    Eng->>Mgr: Submit RFC
    Mgr->>Mgr: Review Risk
    alt Low Risk
        Mgr->>Prod: Approve & Schedule
    else High Risk
        Mgr->>CAB: Request Approval
        CAB->>Prod: Approve Deployment
    end
    Prod-->>Eng: Deployment Complete

1.1 Request (RFC)

Submit a Request for Change (RFC) documenting:
- Description of change.
- Justification/Impact.
- Risk assessment.
- Rollback plan.
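
The four required items lend themselves to an automated completeness check before an RFC reaches review. The following is a minimal sketch, assuming RFCs are kept as plain-text files with one `Label: value` line per item. The labels and the `rfc_missing_fields` helper are hypothetical and not part of any existing tooling.

```shell
#!/bin/bash
# Hypothetical helper: print each required RFC field that is absent or empty.
# Assumes one "Label: value" line per field (the labels are illustrative).
rfc_missing_fields() {
    local rfc_file="$1" field
    for field in "Description" "Justification" "Risk" "Rollback"; do
        # A field counts as present only if non-blank text follows the colon
        grep -Eq "^${field}[^:]*:[[:space:]]*[^[:space:]]" "$rfc_file" || echo "$field"
    done
}
```

Calling `rfc_missing_fields my-rfc.txt` prints nothing when all four items are filled in, so an empty result could serve as the gate for submission.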

1.2 Review & Approval (CAB)

- The Change Advisory Board (CAB) reviews High- and Critical-risk changes.
- Peer review by Detection Engineering is required for all Alert Rule modifications.

2. Deployment Procedures

2.1 Environment Strategy

- Development/Lab: Sandbox environment for testing new rules and integrations.
- Staging: Mirror of production for final verification.
- Production: Live environment.

2.2 Deployment Steps

1. Test: Validate functionality in the Lab environment, then confirm it in Staging.
2. Snapshot: Take a backup/snapshot of the current production configuration.
3. Deploy: Apply changes to Production during the approved window.
4. Verify: Confirm operational status and check for errors.
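
The snapshot, deploy, and verify steps above can be chained so that a failed verification triggers the rollback automatically. This is only a sketch: `take_snapshot`, `apply_change`, `verify_change`, and `restore_snapshot` are hypothetical hooks that would need to be defined for the platform in use.

```shell
#!/bin/bash
# Sketch of snapshot -> deploy -> verify with automatic rollback.
# take_snapshot, apply_change, verify_change and restore_snapshot are
# hypothetical hooks; define them for your platform before calling this.
deploy_with_rollback() {
    # Never deploy without a restore point
    take_snapshot || { echo "snapshot failed, aborting"; return 1; }
    if apply_change && verify_change; then
        echo "deployment verified"
        return 0
    fi
    echo "deploy or verification failed, rolling back"
    restore_snapshot
    return 2
}
```

The distinct return codes (1 for "nothing was changed", 2 for "changed and reverted") are a design choice that lets a calling pipeline tell the two failure modes apart.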

2.3 CI/CD for Detection Rules

- Manage detection rules as code (Detection-as-Code).
- Use version control (Git) for all rule logic.
- Automate testing (syntax check, unit test) in the CI pipeline before merging to main.
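
As an illustration of the syntax-check stage, the sketch below assumes the rules are Sigma-style YAML files (an assumption; this standard does not name a rule format) and verifies that each file defines the top-level keys `title`, `logsource`, and `detection`. The function names are hypothetical, and a real pipeline would likely add a proper YAML parser on top of this.

```shell
#!/bin/bash
# Hypothetical CI syntax gate: every rule file must define the top-level
# keys title, logsource and detection (assumes Sigma-style YAML rules).
lint_rule() {
    local rule_file="$1" key status=0
    for key in title logsource detection; do
        if ! grep -q "^${key}:" "$rule_file"; then
            echo "$rule_file: missing '${key}'"
            status=1
        fi
    done
    return "$status"
}

# Lint every file given as an argument; fail if any single rule fails.
lint_rules() {
    local failed=0 f
    for f in "$@"; do
        lint_rule "$f" || failed=1
    done
    return "$failed"
}
```

In a CI job this might be invoked as `lint_rules rules/*.yml` (the `rules/` path is illustrative), with a non-zero return blocking the merge.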

3. Rollback Plan

- Every deployment must have a predefined rollback strategy.
- If verification fails, immediately revert to the pre-deployment snapshot.
- Conduct a Root Cause Analysis (RCA) for failed changes.
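
Reverting to a snapshot presupposes that the snapshot can be restored cleanly. One simple way to get that for file-based configuration is a tar archive, sketched below. The helper names are hypothetical, the archive must be stored outside the configuration directory, and appliance or SaaS platforms will need their own backup mechanism instead.

```shell
#!/bin/bash
# Snapshot a configuration directory to a tar archive and restore it later.
# Paths are supplied by the caller; store the archive OUTSIDE config_dir.
take_config_snapshot() {
    local config_dir="$1" archive="$2"
    [ -d "$config_dir" ] || return 1
    tar -czf "$archive" -C "$config_dir" .
}

restore_config_snapshot() {
    local config_dir="$1" archive="$2"
    [ -d "$config_dir" ] && [ -f "$archive" ] || return 1
    # Clear current contents first so files added by the failed change go away
    find "$config_dir" -mindepth 1 -delete
    tar -xzf "$archive" -C "$config_dir"
}
```

Emptying the directory before extracting is deliberate: extracting over the top would restore modified files but leave behind any new files the failed change introduced.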

4. Change Risk Assessment

Risk Level	Criteria	Approval	Maintenance Window
Low	Cosmetic, documentation, non-impacting	SOC Lead	Anytime
Medium	New detection rule, parser update	SOC Manager	Business hours
High	SIEM config, integration change	SOC Manager + CAB	Maintenance window
Critical	Infrastructure, network, auth changes	CISO + CAB	Scheduled downtime
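
The table above can be encoded as a lookup so that tooling (for example an RFC intake script) returns the required approver and window for a given risk level. This is a minimal sketch; `required_approval` is an illustrative helper name, and the `|` separator is an arbitrary choice.

```shell
#!/bin/bash
# Look up "approver|window" for a risk level, mirroring the risk table.
# required_approval is an illustrative helper name.
required_approval() {
    case "$(echo "$1" | tr '[:upper:]' '[:lower:]')" in
        low)      echo "SOC Lead|Anytime" ;;
        medium)   echo "SOC Manager|Business hours" ;;
        high)     echo "SOC Manager + CAB|Maintenance window" ;;
        critical) echo "CISO + CAB|Scheduled downtime" ;;
        *)        echo "unknown risk level: $1" >&2; return 1 ;;
    esac
}
```

Rejecting unknown levels with a non-zero return, rather than defaulting to Low, keeps a mistyped risk level from silently bypassing approval.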

5. Maintenance Windows

Window	Schedule	Duration	Use For
Standard	Tuesday & Thursday 02:00–06:00	4 hours	Medium/High changes
Emergency	As needed (CAB approval)	2 hours	Critical hotfixes
Extended	Last Saturday of month 00:00–08:00	8 hours	Infrastructure upgrades
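
A deployment script can refuse to run outside the Standard window. The sketch below checks a weekday and hour against the Tuesday and Thursday 02:00–06:00 schedule. It assumes the window is expressed in the local time zone of the host running the check, and `in_standard_window` is a hypothetical helper name.

```shell
#!/bin/bash
# Return 0 if the given ISO weekday (1=Mon..7=Sun) and hour (00-23) fall
# inside the Standard window: Tuesday and Thursday, 02:00-06:00.
in_standard_window() {
    local dow="$1" hour=$((10#$2))   # 10# avoids octal parsing of "08"/"09"
    case "$dow" in
        2|4) [ "$hour" -ge 2 ] && [ "$hour" -lt 6 ] ;;
        *)   return 1 ;;
    esac
}

# Typical call using the local clock:
# in_standard_window "$(date +%u)" "$(date +%H)" && echo "inside window"
```

Passing the weekday and hour as arguments, instead of reading the clock inside the function, keeps the check easy to test against arbitrary times.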

6. Deployment Checklist

#	Step	Owner	Done
1	RFC submitted and approved	Engineer	☐
2	Peer review completed (detection rules)	Detection Eng	☐
3	Pre-deployment snapshot/backup taken	Engineer	☐
4	Change tested in staging environment	Engineer	☐
5	Rollback plan documented and tested	Engineer	☐
6	Deployment window confirmed	SOC Manager	☐
7	Stakeholders notified of change	SOC Lead	☐
8	Change deployed to production	Engineer	☐
9	Post-deployment verification completed	Engineer	☐
10	Monitoring for 30 min post-change (no errors)	SOC Lead	☐
11	RFC closed with results documented	Engineer	☐

7. Detection-as-Code Pipeline

graph LR
    Author["Author Rule"] --> PR["Pull Request"]
    PR --> Review["Peer Review"]
    Review --> CI["CI Pipeline"]
    CI --> Syntax["Syntax Check"]
    Syntax --> Unit["Unit Test"]
    Unit --> Staging["Deploy Staging"]
    Staging --> Validate["Validate 24h"]
    Validate --> Prod["Deploy Production"]
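
The "Validate 24h" stage implies a gate that blocks promotion until a rule has spent a full day in staging. A minimal sketch of that gate follows, assuming the pipeline records the staging deployment time as Unix epoch seconds. How that timestamp is stored is left open, and `ready_for_production` is a hypothetical name.

```shell
#!/bin/bash
# Gate for the "Validate 24h" stage: allow promotion only once the rule has
# soaked in staging for the full period. Times are Unix epoch seconds.
SOAK_SECONDS=$((24 * 60 * 60))

ready_for_production() {
    # Second argument defaults to the current time; pass it explicitly in tests
    local staged_at="$1" now="${2:-$(date +%s)}"
    [ $((now - staged_at)) -ge "$SOAK_SECONDS" ]
}
```

A pipeline step could then run `ready_for_production "$STAGED_AT" || exit 1` before the production deploy, where `STAGED_AT` is whatever variable the pipeline uses to carry the recorded timestamp.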

Rollback Procedures

When to Roll Back

Indicator	Action
SIEM stops receiving logs	Roll back immediately
Alert volume drops to 0	Investigate first; roll back if not resolved within 15 min
False positive rate exceeds 50%	Roll back the rule change, then investigate
Dashboard/query errors	Roll back the config change
Agent crash after update	Roll back the agent version
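
For teams that want to wire these indicators into monitoring, the decision table can be expressed as a small function. The indicator labels below are illustrative and not SIEM field names, and the "no action" result for a false positive rate at or below 50% is an assumption, since the table only covers the rollback case.

```shell
#!/bin/bash
# Map an observed indicator to the action in the rollback table.
# Indicator labels are illustrative, not SIEM field names.
rollback_action() {
    local indicator="$1" value="${2:-0}"
    case "$indicator" in
        no_logs|query_errors|agent_crash)
            echo "rollback" ;;
        zero_alerts)
            # value = minutes since alert volume dropped to 0
            if [ "$value" -ge 15 ]; then echo "rollback"; else echo "investigate"; fi ;;
        fp_rate)
            # value = false positive rate as a whole-number percentage
            if [ "$value" -gt 50 ]; then echo "rollback"; else echo "no action"; fi ;;
        *)
            echo "unknown indicator: $indicator" >&2; return 1 ;;
    esac
}
```

For example, `rollback_action zero_alerts 5` reports that investigation is still within its 15-minute allowance, while the same call with `15` recommends rolling back.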

Rollback Checklist

□ Identify the change that caused the issue
□ Notify SOC Manager that rollback is in progress
□ Apply rollback from backup/git
□ Verify system returns to normal operation
□ Document the failed change and root cause
□ Schedule post-mortem within 48 hours

Change Window Schedule

Change Type	Allowed Window	Approval Required	Rollback Time
Detection rule (new)	Anytime (test mode)	SOC Lead	< 5 min
Detection rule (production)	Business hours	SOC Lead + peer review	< 5 min
SIEM configuration	Maintenance window (Sun 02:00-06:00)	SOC Manager	< 30 min
Agent update (fleet)	Staged: 10% → 50% → 100% over 3 days	SOC Manager + IT	< 1 hour
Major platform upgrade	Maintenance window + CAB approval	CISO	< 4 hours
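
The staged agent rollout (10% → 50% → 100%) translates into a cumulative host count per stage. The sketch below computes those targets, rounding up so that even a small fleet gets at least one canary host. The fleet size of 240 is an arbitrary example, and `stage_target` is a hypothetical helper.

```shell
#!/bin/bash
# Number of agents that should be on the new version at each stage of a
# 10% -> 50% -> 100% rollout, rounded up (integer ceiling division).
stage_target() {
    local fleet_size="$1" percent="$2"
    echo $(( (fleet_size * percent + 99) / 100 ))
}

# Example with an arbitrary fleet of 240 agents
for pct in 10 50 100; do
    echo "stage ${pct}%: $(stage_target 240 "$pct") of 240 agents"
done
```

The targets are cumulative, so the number of hosts to update in a given stage is that stage's target minus the previous one.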

Post-Deployment Smoke Test

After any deployment, run these verification steps:

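#!/bin/bash
# smoke_test.sh: post-deployment verification
# Exits non-zero if a connectivity or ingestion check fails.

FAILED=0

echo "=== Post-Deployment Smoke Test ==="

# 1. SIEM connectivity (-f makes curl fail on HTTP 4xx/5xx responses;
#    without it any completed request would be reported as healthy)
echo -n "SIEM API: "
if curl -sf -o /dev/null -w "%{http_code}" --max-time 10 https://siem.internal/api/health; then
    echo " ✅"
else
    echo " ❌"
    FAILED=1
fi

# 2. Log ingestion (check last 5 min)
echo -n "Log ingestion: "
# -G with --data-urlencode sends the query URL-encoded, so '@' and '>' survive
RECENT=$(curl -s -G 'localhost:9200/_count' --data-urlencode 'q=@timestamp:>now-5m' \
    | grep -o '"count":[0-9]*' | cut -d: -f2)
echo "${RECENT:-0} events in last 5 min"
[ "${RECENT:-0}" -gt 0 ] || FAILED=1

# 3. Detection rules
echo -n "Active rules: "
# Adjust for your SIEM; this counts documents in the sigma-* indices
RULES=$(curl -s 'localhost:9200/_cat/count/sigma-*' | awk '{print $3}')
echo "${RULES:-0} rules loaded"

# 4. Alert routing
echo "Sending test alert..."
# Add your test alert mechanism here

exit "$FAILED"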