Security Architecture Design Patterns — Deep Dive

Principal-level reference for defense-in-depth, zero trust, microservices security, database hardening, browser security policies, protocol-specific controls, rate limiting, circuit breakers, and multi-tenant isolation.

Sources: OWASP Cheat Sheet Series (Microservices, Database, Docker, Kubernetes, SSRF, DoS, CSP, GraphQL, WebSocket, XML, CSS, Race Conditions), Google Cloud Security Foundations Blueprint.

Defense-in-Depth Patterns
Zero Trust Network Architecture
Microservices Security Architecture
Service Mesh and mTLS
API Gateway Security
Database Security Architecture
Container Security (Docker)
Kubernetes Security Architecture
CSP, CORS, and Same-Origin Policy
WebSocket Security
GraphQL Security Controls
XML and Serialization Security
SSRF Prevention Architecture
Rate Limiting Architecture
Circuit Breaker and Resilience Patterns
Race Condition Defense
Secure Multi-Tenant Design
CSS Security
Architectural Decision Framework

1. Defense-in-Depth Patterns

Defense-in-depth layers independent security controls so that failure of any single layer does not compromise the system. Each layer operates under the assumption that all other layers have already been breached.

The Layered Model

┌─────────────────────────────────────────────┐
│  LAYER 7: DATA        encryption at rest,   │
│                       field-level encryption,│
│                       tokenization, masking  │
├─────────────────────────────────────────────┤
│  LAYER 6: APPLICATION input validation, CSP,│
│                       auth/authz, WAF       │
├─────────────────────────────────────────────┤
│  LAYER 5: RUNTIME     containers, seccomp,  │
│                       AppArmor, sandboxing   │
├─────────────────────────────────────────────┤
│  LAYER 4: HOST        OS hardening, patching,│
│                       EDR, auditd           │
├─────────────────────────────────────────────┤
│  LAYER 3: NETWORK     segmentation, NACLs,  │
│                       IDS/IPS, mTLS         │
├─────────────────────────────────────────────┤
│  LAYER 2: IDENTITY    MFA, SSO, RBAC,       │
│                       least privilege        │
├─────────────────────────────────────────────┤
│  LAYER 1: PHYSICAL    datacenter security,   │
│                       HSMs, secure boot      │
└─────────────────────────────────────────────┘

Three Control Types (Google Cloud Blueprint Model)

Policy Controls — Programmatic constraints that enforce acceptable resource configurations. Prevent risky setups through infrastructure-as-code validation and organization policy constraints before deployment.
Architecture Controls — Resource configuration based on security best practices: network topology, resource hierarchy, blast radius containment.
Detective Controls — Anomaly detection, log aggregation, threat detection services, SIEM integration, custom enforcement.

Principles

Assume breach: design every layer as if the attacker already has a foothold in the adjacent layer.
Independent failure domains: a control at layer N must not depend on layer N-1 being intact.
Validation ordering: perform cheap validations (format, size, type) before expensive ones (database lookups, crypto operations).
No security theater: every control must measurably reduce risk. Call out controls that create illusion without substance.

2. Zero Trust Network Architecture

Zero trust eliminates implicit trust based on network location. Every request is authenticated, authorized, and encrypted regardless of origin.

Core Tenets

Principle	Implementation
Never trust, always verify	Every service-to-service call carries verifiable identity
Least privilege access	RBAC/ABAC with just-in-time elevation, time-bounded tokens
Assume breach	Microsegmentation limits blast radius; east-west traffic encrypted
Verify explicitly	Context-aware access: identity + device + location + behavior
Continuous validation	Session re-validation at intervals; token refresh with short TTL

Network Architecture Pattern

                        ┌─────────────┐
  User ──── Identity ───┤  Policy     │
  Device    Provider    │  Decision   │
  Context               │  Point      │
                        └──────┬──────┘
                               │ allow/deny
                        ┌──────▼──────┐
                        │  Policy     │
                        │  Enforcement│
                        │  Point      │
                        └──────┬──────┘
                               │ mTLS
                    ┌──────────┼──────────┐
                    ▼          ▼          ▼
               ┌────────┐ ┌────────┐ ┌────────┐
               │Service │ │Service │ │Service │
               │   A    │ │   B    │ │   C    │
               └────────┘ └────────┘ └────────┘

Google Cloud Blueprint Implementation

No public internet access by default: no outbound or inbound traffic to/from public internet permitted unless explicitly allowed.
Shared VPC: centralized network resource management across regions/zones with environment separation by network topology.
Private paths enforced: all on-premises and cloud resource communication over private interconnects.
GitOps model: all infrastructure changes through version-controlled, reviewed Terraform with policy-as-code validation in CI/CD pipeline before deployment.

Microsegmentation

Segment by workload, not by network subnet. Each workload gets its own identity.
Network policies (K8s) or security groups (cloud) restrict east-west traffic to explicit allow rules.
Default-deny posture: all traffic blocked unless a policy explicitly permits it.

3. Microservices Security Architecture

Authorization Layers

Authorization enforcement must occur at three independent layers:

Gateway/Proxy — Coarse-grained, cross-cutting decisions (authentication, basic role checks, rate limiting).
Microservice Layer — Shared libraries or sidecar proxies for fine-grained policy enforcement. Centralized policy with embedded Policy Decision Point (PDP) is recommended.
Business Logic — Service-specific authorization that understands domain context.

Authorization Patterns

Pattern	Description	Trade-offs
Decentralized	Policy embedded in service code	Independent but inconsistent; requires code changes for policy updates
Centralized Single PDP	Remote policy service evaluates all requests	Consistent but introduces latency and availability risk
Centralized Embedded PDP	Policy defined centrally, deployed as library/sidecar	Best of both: consistent policy, low latency, no external dependency at runtime

Netflix pattern: Policy Portal (authoring) -> Repository (storage) -> Aggregator (compilation) -> Distributor (deployment to sidecars).

Identity Propagation

Recommended: Trusted Issuer-Signed Structures

Edge services authenticate external tokens (OAuth2, OIDC), then mint internally-signed identity structures (e.g., Netflix "Passport"). This approach:

Decouples external tokens from internal representations
Uses single, extensible data structures
Never exposes internal structures externally
Is external access token agnostic

Anti-pattern: Passing raw external tokens between internal services. This creates tight coupling and risks privilege escalation through token manipulation.
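The edge-minting step described above can be sketched roughly as follows. This is a minimal illustration, not Netflix's Passport format: the names `INTERNAL_SIGNING_KEY`, `mint_internal_identity`, and `verify_internal_identity` are hypothetical, the payload fields are assumed, and HMAC is used only because it is available in the standard library. With HMAC every verifier holds the signing key, so an asymmetric signature may be a better fit where downstream services should not be able to mint identities.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical key for illustration; a real deployment would load it from a secrets manager.
INTERNAL_SIGNING_KEY = b"replace-with-key-from-secrets-manager"


def mint_internal_identity(external_claims: dict, ttl_seconds: int = 300,
                           now: Optional[float] = None) -> str:
    """Build a signed internal identity from already-validated external claims."""
    issued_at = int(time.time() if now is None else now)
    payload = {
        "sub": external_claims["sub"],
        "roles": sorted(external_claims.get("roles", [])),
        "iat": issued_at,
        "exp": issued_at + ttl_seconds,
    }
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode())
    tag = hmac.new(INTERNAL_SIGNING_KEY, body, hashlib.sha256).hexdigest()
    return f"{body.decode()}.{tag}"


def verify_internal_identity(token: str, now: Optional[float] = None) -> Optional[dict]:
    """Return the payload if the signature and expiry check out, else None."""
    try:
        body, tag = token.rsplit(".", 1)
    except ValueError:
        return None
    expected = hmac.new(INTERNAL_SIGNING_KEY, body.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(tag.encode(), expected.encode()):
        return None
    payload = json.loads(base64.urlsafe_b64decode(body))
    current = time.time() if now is None else now
    return payload if payload["exp"] > current else None
```

Internal services would call `verify_internal_identity` on the propagated structure and never see the original OAuth2/OIDC token.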

Security Architecture Documentation

For each microservice, document:

Unique service name/ID, business process, API definitions with security schemes
Service-to-storage access types (read/write)
Synchronous service-to-service calls (protocol, data exchanged)
Asynchronous communications (publisher/subscriber via message queues)
Data asset classification (PII, confidential, public)
Trust boundary justifications

Logging Architecture

Service stdout/stderr ──► Local File ──► Logging Agent ──► Message Broker ──► Central Logging
                              │                │                  │
                              ├─ Prevents      ├─ Data            ├─ Mutual TLS
                              │  data loss     │  sanitization    ├─ Least-privilege
                              │  on failure    │  (strip PII,     │  access policies
                                               │  passwords,
                                               │  API keys)
                                               ├─ Asynchronous
                                               │  (prevents DoS
                                               │  of log system)

Requirements:

Correlation IDs for cross-service call tracing
Structured format (JSON) with contextual metadata (hostname, container, class)
Sanitization: never send PII, passwords, or API keys to central logging
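A minimal sketch of how these requirements might combine in application code, using only the standard library. The key list in `SENSITIVE_KEYS`, the bearer-token pattern, and the helper names `sanitize` and `log_event` are assumptions for illustration; a real deployment would tune the redaction rules to its own data classification.

```python
import json
import logging
import re
import socket

# Assumed list of field names that must never reach central logging.
SENSITIVE_KEYS = {"password", "api_key", "authorization", "ssn", "email"}
# Catches bearer tokens embedded in free-text values.
BEARER_RE = re.compile(r"(?i)bearer\s+[a-z0-9._~+/=-]+")


def sanitize(fields: dict) -> dict:
    """Redact sensitive keys and bearer tokens before the record leaves the service."""
    clean = {}
    for key, value in fields.items():
        if key.lower() in SENSITIVE_KEYS:
            clean[key] = "[REDACTED]"
        elif isinstance(value, str):
            clean[key] = BEARER_RE.sub("Bearer [REDACTED]", value)
        else:
            clean[key] = value
    return clean


def log_event(logger: logging.Logger, message: str, correlation_id: str, **fields) -> str:
    """Emit one structured JSON line carrying the correlation ID and host metadata."""
    record = {
        "message": message,
        "correlation_id": correlation_id,
        "hostname": socket.gethostname(),
        **sanitize(fields),
    }
    line = json.dumps(record, sort_keys=True)
    logger.info(line)
    return line
```

The correlation ID is passed in rather than generated here, on the assumption that the gateway injects it and each service forwards it unchanged.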

4. Service Mesh and mTLS

Mutual TLS (mTLS)

Each microservice uses public/private key pairs for bidirectional authentication, providing:

Confidentiality: encrypted channel between services
Integrity: tamper detection
Authentication: cryptographic identity verification

Operational challenges:

Key provisioning and trust bootstrap (initial certificate distribution)
Certificate revocation (CRL/OCSP infrastructure)
Key rotation (automated renewal before expiry)
Certificate authority management (dedicated internal CA)
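For services that terminate TLS themselves rather than delegating to a mesh sidecar, the server side of mutual authentication might look like the sketch below, built on Python's standard `ssl` module. The function name `build_mtls_server_context` and its parameters are hypothetical, and the file paths are optional only so the sketch can be exercised without real certificates.

```python
import ssl
from typing import Optional


def build_mtls_server_context(
    certfile: Optional[str] = None,
    keyfile: Optional[str] = None,
    client_ca_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Create a server-side TLS context that rejects clients without a valid certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # CERT_REQUIRED is what makes the handshake mutual: the client must present
    # a certificate that chains to a CA loaded below.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if certfile:
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    if client_ca_file:
        # Trust only the dedicated internal CA, not the system trust store.
        ctx.load_verify_locations(cafile=client_ca_file)
    return ctx
```

Rotation and revocation are not handled here; they remain the operational problems listed above.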

Service Mesh Benefits

Capability	Security Value
Automatic mTLS	Encryption and authentication without application code changes
Telemetry/tracing	Generates security-relevant metrics and distributed traces
Ingress/egress control	Traffic monitoring and policy enforcement at mesh boundary
Fine-grained RBAC	Service-level access control via mesh policies
Traffic shaping	Rate limiting, circuit breaking, retries with backoff

Service Mesh Trade-offs

Increases architectural complexity
Requires expertise in both K8s and mesh technology (Istio, Linkerd, Consul Connect)
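Performance impact from sidecar proxy overhead (added latency per hop depends on the mesh, proxy configuration, and workload; benchmark in your own environment rather than assuming a fixed figure such as 1-3ms)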
Debugging becomes harder with proxy-mediated traffic

Token-Based Service Authentication (Alternative to mTLS)

Mode	Use Case	Trade-off
Online validation	Centralized token service validates each request	Detects revoked tokens immediately; higher latency
Offline validation	Services validate using downloaded public keys (JWKS)	Lower latency; cannot detect revoked tokens in real-time

5. API Gateway Security

Gateway as Security Perimeter

The API gateway centralizes:

Authentication (OAuth2/OIDC token validation)
Coarse-grained authorization (role/scope checks)
Rate limiting and throttling
Request/response transformation and validation
TLS termination
Logging and correlation ID injection

Gateway Limitations

Single point of decision: violates defense-in-depth if relied upon exclusively.
Scalability constraint: complex ecosystems with numerous roles become difficult to manage at the edge alone.
Operational bottleneck: development teams cannot independently modify authorization rules.

Mitigation Pattern

Implement mutual authentication to prevent gateway bypass and direct internal service access.
Layer authorization at gateway AND service AND business logic levels.
Use the gateway for cross-cutting concerns only; push domain-specific authorization to services.

6. Database Security Architecture

Network Isolation

┌──────────────────────────────────┐
│  DMZ / Application Tier          │
│  ┌──────────┐  ┌──────────┐     │
│  │  App A   │  │  App B   │     │
│  └────┬─────┘  └────┬─────┘     │
│       │              │           │
├───────┼──────────────┼───────────┤  ◄── Firewall
│  Database Tier                   │
│  ┌──────────┐  ┌──────────┐     │
│  │  DB A    │  │  DB B    │     │
│  └──────────┘  └──────────┘     │
└──────────────────────────────────┘

Disable TCP access where possible; require local socket or named pipe.
If TCP needed, bind to localhost or restrict via firewall to specific application hosts only.
Database servers in separate network segments from application tier.
Web-based management tools (phpMyAdmin, pgAdmin) require authentication, HTTPS, and network-level access controls.

Authentication and Access Control

Mandatory authentication for all connections, including local access.
Strong, unique passwords per database account.
Single-application or service-specific accounts (never shared credentials).
Never use default accounts (root, sa, SYS, SYSTEM) for application access.
No administrative rights for application accounts.
Host-based connection restrictions (connect only from designated app servers).
Environment-specific databases and accounts (dev, staging, prod never share credentials).

Least Privilege in Practice

Permission Level	Pattern
Minimal	Only the DML the application needs, typically SELECT, INSERT, UPDATE, DELETE (no DDL)
Table-level	Grant access to specific tables only
Column-level	Restrict sensitive columns (SSN, credit card)
Row-level	Row-level security policies filter by tenant/role
View-based	Access through restricted views rather than base tables
No DB links	Avoid database links unless absolutely necessary

Credential Management

Credentials stored outside web root in configuration files with restricted file permissions.
Excluded from source code repositories.
Encrypted using platform features (ASP.NET protected configuration, Vault, AWS Secrets Manager).
Regular credential rotation; immediate rotation on staff changes.

Transport Security

Enforce encrypted connections exclusively (reject plaintext).
Deploy trusted certificates on database servers.
Require TLSv1.2+ with modern ciphers (AES-GCM, ChaCha20).
Client-side certificate validation.

Hardening Checklist

Apply security patches promptly.
Run database service under low-privileged OS account.
Remove default accounts and sample databases.
Transaction logs on separate storage from data files.
Regular encrypted backups with restricted access permissions.
SQL Server: disable xp_cmdshell, xp_dirtree, CLR execution, SQL Browser, Mixed Mode Auth.
MySQL/MariaDB: run mysql_secure_installation; disable FILE privilege.

7. Container Security (Docker)

Defense-in-Depth Stack for Containers

Layer 1: Image Security
  ├── Pin specific versions (no floating tags)
  ├── Minimal base images (distroless, scratch)
  ├── CI/CD image scanning (Trivy, Snyk, Docker Scout)
  ├── SBOM generation
  ├── Image signing (Notary/Cosign)
  └── Private registries with access controls

Layer 2: Runtime Isolation
  ├── Non-root user (USER directive, runAsUser)
  ├── no-new-privileges flag
  ├── Drop all capabilities, add only needed
  ├── Never use --privileged
  └── Read-only root filesystem + tmpfs for temp

Layer 3: Kernel Security
  ├── Seccomp profiles (start from Docker default, customize)
  ├── AppArmor/SELinux mandatory access control
  └── Behavioral monitoring (Falco, Tetragon, Cilium eBPF)

Layer 4: Network Security
  ├── Custom Docker networks (explicit connectivity)
  ├── K8s NetworkPolicies for east-west traffic
  └── No exposed daemon sockets

Layer 5: Resource Limits
  ├── Memory limits (-m 512m)
  ├── CPU limits (--cpus="0.5")
  ├── File descriptor limits (--ulimit nofile=1024)
  ├── Process limits (--ulimit nproc=256)
  └── Restart policy (--restart=on-failure:3)

Layer 6: Secrets Management
  ├── Docker Secrets (Swarm) or external vault
  ├── Never bake secrets into images
  └── K8s: enable etcd encryption or use external KMS

Critical Anti-Patterns

Never expose /var/run/docker.sock to containers (container escape vector).
Never use TCP daemon socket without TLS mutual authentication.
Never use --privileged (grants all kernel capabilities).
Never use floating image tags in production (supply chain risk).
Never store secrets in environment variables in K8s (visible via API, logged in crash dumps).

Rootless Mode

Docker daemon and containers run as unprivileged user. If container escape occurs, attacker lands as unprivileged host user. Different from userns-remap (which remaps UIDs while daemon runs as root).

Alternative: Podman

Daemonless architecture using fork-exec model eliminates central daemon as single point of compromise. Native rootless support and SELinux integration provide OCI-compliant security defaults.

8. Kubernetes Security Architecture

Multi-Layer Security Model

┌─────────────────────────────────────────────┐
│  CLUSTER LEVEL                              │
│  ├── API Server hardening (OIDC, no static  │
│  │   tokens, Node+RBAC authorization)       │
│  ├── etcd encryption + mTLS + isolation     │
│  ├── Admission controllers (PSA, OPA,       │
│  │   Kyverno, ImagePolicyWebhook)           │
│  └── Audit logging (Metadata/Request level) │
├─────────────────────────────────────────────┤
│  NAMESPACE LEVEL                            │
│  ├── RBAC (deny-by-default, minimal verbs)  │
│  ├── Resource quotas (CPU, memory, pods)    │
│  ├── NetworkPolicies (default-deny ingress  │
│  │   and egress per namespace)              │
│  └── Pod Security Standards (restricted)    │
├─────────────────────────────────────────────┤
│  POD LEVEL                                  │
│  ├── SecurityContext (runAsNonRoot,         │
│  │   readOnlyRootFilesystem,               │
│  │   allowPrivilegeEscalation: false)       │
│  ├── Capability dropping (drop ALL)         │
│  ├── Service account with minimal RBAC      │
│  └── Image from signed, scanned registry    │
├─────────────────────────────────────────────┤
│  RUNTIME LEVEL                              │
│  ├── Falco/Tetragon behavioral monitoring   │
│  ├── Container sandboxing (gVisor, Kata)    │
│  └── Continuous vulnerability scanning      │
└─────────────────────────────────────────────┘

Pod Security Standards

Level	Posture	Use Case
Privileged	Unrestricted	System workloads (CNI, storage drivers) only
Baseline	Prevents known privilege escalations	General workloads
Restricted	Maximum hardening	Sensitive workloads, multi-tenant

Applied via namespace labels, for example pod-security.kubernetes.io/enforce: restricted.
Three modes: enforce (rejects violating pods), audit (records violations in the audit log), warn (returns a user-facing warning).

etcd Security

etcd stores all cluster state and secrets. Write access to etcd = root on entire cluster.

mTLS between API servers and etcd (dedicated CA).
Firewall isolation: only API servers can reach etcd.
etcd ACLs to limit keyspace access per component.
Consider separate etcd instances for different components.

API Server Authentication

Recommended: OIDC for short-lived tokens and centralized group management, or managed provider IAM (GKE, EKS, AKS).

Avoid: Static token files (no rotation), X509 client certs (no revocation), service account tokens for user auth (cluster-scoped, no expiry by default).

Container Sandboxing

For untrusted workloads, add isolation beyond Linux namespaces:

Technology	Mechanism	Overhead
gVisor	User-space kernel in Go, ~70% syscall coverage, uses ~20 host syscalls	Low-moderate
Kata Containers	Stripped-down VM per pod	Moderate
Firecracker	Micro-VM with seccomp + cgroup + namespace	Low

Kubelet Security

Kubelets expose HTTPS endpoints with powerful node/container control:

Enable authentication and authorization (disable anonymous access).
Restrict API access to trusted networks.
Monitor port 10250 (Kubelet API) for unauthorized access attempts.

9. CSP, CORS, and Same-Origin Policy

Same-Origin Policy (SOP)

The browser's foundational security boundary. Two URLs have the same origin if protocol, host, and port all match. SOP prevents scripts from one origin reading responses from another origin.

URL A	URL B	Same Origin?
https://a.com/page	https://a.com/other	Yes
https://a.com	http://a.com	No (protocol)
https://a.com	https://a.com:8443	No (port)
https://a.com	https://b.a.com	No (host)

Content Security Policy (CSP)

CSP is a defense-in-depth layer against XSS. It does not replace secure coding; it mitigates exploitation when output encoding fails.

Directive Categories

Category	Directives	Purpose
Fetch	script-src, style-src, img-src, connect-src, font-src, object-src, default-src	Control resource loading origins
Document	base-uri, sandbox	Restrict document properties (plugin-types is deprecated and ignored by current browsers)
Navigation	form-action, frame-ancestors	Restrict navigation and framing
Reporting	report-to, report-uri	Violation reporting

Strict CSP (Recommended)

Nonce-based (preferred for server-rendered):

Content-Security-Policy:
  script-src 'nonce-{RANDOM}' 'strict-dynamic';
  object-src 'none';
  base-uri 'none';

Hash-based (for static pages):

Content-Security-Policy:
  script-src 'sha256-{HASH}' 'strict-dynamic';
  object-src 'none';
  base-uri 'none';

Key rules:

Generate unique nonce per HTTP response (cryptographically random).
Never create middleware that auto-injects nonces into all script tags (attacker-injected scripts would get nonces too).
strict-dynamic allows dynamically-created scripts from trusted scripts, reducing annotation burden.
object-src 'none' blocks plugin-based XSS vectors (Flash, Java).
base-uri 'none' prevents base tag injection for relative URL hijacking.
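The per-response nonce rule can be sketched with the standard library as below. The helper names `new_csp_nonce`, `strict_csp_header`, and `script_tag` are hypothetical; the point is that the nonce is generated once per response and attached only to script tags the template itself emits, never injected into arbitrary markup.

```python
import base64
import html
import secrets


def new_csp_nonce() -> str:
    """Return a fresh 128-bit nonce; call once per HTTP response."""
    return base64.b64encode(secrets.token_bytes(16)).decode("ascii")


def strict_csp_header(nonce: str) -> str:
    """Build the Content-Security-Policy header value for this response."""
    return (
        f"script-src 'nonce-{nonce}' 'strict-dynamic'; "
        "object-src 'none'; "
        "base-uri 'none'"
    )


def script_tag(src: str, nonce: str) -> str:
    """Render a script tag for a source chosen by the template, not by user input."""
    return f'<script nonce="{nonce}" src="{html.escape(src, quote=True)}"></script>'
```

A request handler would call `new_csp_nonce()` once, pass the value to both `strict_csp_header` and the template, and discard it when the response is sent.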

Deployment Strategy

Deploy in Content-Security-Policy-Report-Only mode first.
Monitor violation reports via report-to endpoint.
Refactor inline scripts to external files or add nonces.
Convert inline event handlers to addEventListener.
Switch to enforcing mode.

Additional CSP Protections

frame-ancestors 'none' — prevents clickjacking (supersedes X-Frame-Options).
upgrade-insecure-requests — forces HTTPS for mixed content.
form-action 'self' — prevents form hijacking to external endpoints.

CORS Security

CORS relaxes SOP in a controlled manner. Misconfigurations create same-origin-equivalent access for attackers.

Critical CORS Rules

Never reflect an unvalidated Origin header into Access-Control-Allow-Origin: blind reflection behaves like a wildcard that also allows credentials.
Never use wildcard * with credentials — browsers reject Access-Control-Allow-Origin: * when Access-Control-Allow-Credentials: true.
Validate Origin against a strict allowlist — exact string match, not substring or regex that can be bypassed.
Minimize exposed headers — only expose headers the client genuinely needs.
Set Vary: Origin — prevents cache poisoning when responses differ by origin.

Common CORS Misconfigurations

Misconfiguration	Risk
Reflecting Origin header verbatim	Any site can read authenticated responses
Origin: null in allowlist	Sandboxed iframes and data: URIs get access
Suffix matching without the leading dot (e.g., `endsWith('example.com')`)	`attacker-example.com` bypasses
Regex without anchoring	`example.com.evil.com` bypasses
Wildcard with credentials	Browser blocks but indicates design flaw
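The matching pitfalls in the table can be demonstrated in a few lines. This is an illustrative sketch only; the function names and the single allowlisted origin are assumptions.

```python
import re

ALLOWED_ORIGINS = {"https://app.example.com"}  # assumed allowlist for illustration


def suffix_match(origin: str) -> bool:
    """Flawed: any registrable domain that merely ends in 'example.com' passes."""
    return origin.endswith("example.com")


def unanchored_regex_match(origin: str) -> bool:
    """Flawed: re.search without ^ and $ matches the pattern anywhere in the string."""
    return re.search(r"https://app\.example\.com", origin) is not None


def exact_match(origin: str) -> bool:
    """Safe: exact string membership in the allowlist."""
    return origin in ALLOWED_ORIGINS


# Both attacker-controlled origins slip past the flawed checks.
print(suffix_match("https://attacker-example.com"))                # True
print(unanchored_regex_match("https://app.example.com.evil.com"))  # True
print(exact_match("https://attacker-example.com"))                 # False
```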

Secure CORS Pattern

ALLOWED_ORIGINS = {"https://app.example.com", "https://admin.example.com"}

def cors_middleware(request, response):
    # Responses differ by Origin, so always tell caches to key on it,
    # including when the origin is rejected.
    response.headers["Vary"] = "Origin"
    origin = request.headers.get("Origin")
    # Exact-match allowlist lookup; never substring or regex matching.
    if origin in ALLOWED_ORIGINS:
        response.headers["Access-Control-Allow-Origin"] = origin
        response.headers["Access-Control-Allow-Credentials"] = "true"
        response.headers["Access-Control-Allow-Methods"] = "GET, POST"
        response.headers["Access-Control-Allow-Headers"] = "Content-Type, Authorization"
        response.headers["Access-Control-Max-Age"] = "7200"
    # If origin not in allowlist: no CORS headers = browser blocks
    return response

10. WebSocket Security

Transport Security

Always use wss:// in production. Never use unencrypted ws://.
Support only RFC 6455. Disable legacy protocol versions (Hixie-76, hybi-00) with known vulnerabilities.
Disable permessage-deflate by default to prevent CRIME/BREACH-style compression side-channel attacks.

Cross-Site WebSocket Hijacking (CSWSH) Prevention

Browsers automatically include session cookies in WebSocket handshakes, enabling attackers on malicious sites to hijack authenticated connections.

Defenses:

Validate Origin header on every handshake against explicit allowlist (never blacklist, never wildcard).
Apply SameSite=Lax or SameSite=Strict cookies.
Use token-based authentication instead of relying solely on cookies. Prefer sending the token in a post-connection message; query-string tokens tend to leak into server and proxy logs.
Rotate tokens in long-lived connections to prevent hijacked session persistence.

Message-Level Security

Treat all WebSocket messages as untrusted input.
JSON schema validation with allowlists for message types/fields.
Binary file type verification via magic numbers (not headers).
Message size limits (typically 64KB maximum).
Nonce/timestamp inclusion to prevent replay attacks.
Use JSON.parse(), never eval().

Per-Action Authorization

Connection establishment does not grant blanket access. Validate user roles and permissions before processing each message/action independently.

DoS Mitigation

Control	Recommended Baseline
Per-user connection limit	5-10 concurrent connections
Message rate limit	100 messages/minute
Max payload size	64KB (configurable per use case)
Idle timeout	Close inactive connections
Backpressure	Flow control preventing unbounded buffering
Heartbeat	Ping/pong frames detecting and cleaning dead connections

Logging

Capture: connection/termination events with user identity and origin, auth outcomes, authz failures, protocol violations.
Exclude: tokens, session IDs, message payloads containing sensitive data.

11. GraphQL Security Controls

GraphQL's flexibility creates unique attack surface compared to REST.

Query Abuse Prevention

Control	Purpose	Tools
Depth limiting	Prevent deeply nested queries causing recursive resolution	graphql-depth-limit (JS), MaxQueryDepthInstrumentation (Java)
Complexity analysis	Assign cost to field resolution, reject expensive queries	graphql-cost-analysis (JS), Apollo complexity plugins
Timeout per resolver	Prevent individual resolvers from hanging	10-second default
Pagination enforcement	Prevent unbounded list queries	Require `first`/`last` arguments
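To make depth limiting concrete, here is a small sketch that measures nesting on an already-parsed selection set. The nested-dict representation (`{field: sub_selection}`, with leaves mapping to `{}`), the `MAX_DEPTH` value, and the function names are assumptions for illustration; production servers would normally use the depth-limiting tools named in the table, which operate on the real GraphQL AST and account for fragments.

```python
MAX_DEPTH = 5  # assumed limit; tune per schema


def selection_depth(selection: dict) -> int:
    """Return the deepest field nesting in a {field: sub_selection} tree.

    Iterative on purpose: a recursive walk could itself be exhausted by a
    maliciously deep query.
    """
    deepest = 0
    stack = [(selection, 0)]
    while stack:
        node, depth = stack.pop()
        deepest = max(deepest, depth)
        for child in node.values():
            stack.append((child, depth + 1))
    return deepest


def enforce_depth(selection: dict, max_depth: int = MAX_DEPTH) -> int:
    """Reject the query before any resolver runs if it nests too deeply."""
    depth = selection_depth(selection)
    if depth > max_depth:
        raise ValueError(f"query depth {depth} exceeds limit {max_depth}")
    return depth
```

The check runs before execution, so a rejected query costs only the parse and the walk.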

Schema Exposure Controls

Disable introspection in production to prevent schema reconnaissance.
Disable "Did you mean?" suggestions (leaks field names even with introspection off).
Field visibility middleware for role-based schema exposure (different roles see different schema subsets).

Authorization Architecture

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Gateway    │────▶│   Resolver   │────▶│  Data Layer  │
│  Auth Check  │     │  RBAC Check  │     │  Row-Level   │
│  (identity)  │     │  (field-lvl) │     │  Security    │
└──────────────┘     └──────────────┘     └──────────────┘

Validate authorization on both graph edges AND nodes.
Implement checks within Query/Mutation resolvers using RBAC middleware.
Prevent IDOR by verifying caller permissions before data access (especially for direct ID-based lookups).
Use GraphQL Interfaces and Unions to return different object shapes based on requester privileges.

Batching Attack Mitigation

GraphQL allows multiple queries in a single HTTP request, bypassing per-request rate limits:

Object-level rate limiting: track per-caller object requests across batches.
Sensitive field protection: prevent batching for usernames, emails, OTPs, session tokens.
Operation throttling: limit concurrent queries per request (e.g., max 5 operations per batch).

Persisted Queries

Pre-approve query strings at deployment time. Clients send query hash instead of arbitrary query text. Eliminates arbitrary query execution, batching abuse, and query injection risks.
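A minimal sketch of the lookup side of persisted queries, assuming SHA-256 as the hash and an in-memory registry built at deployment time. `PersistedQueryRegistry` and `query_hash` are hypothetical names, not the API of any particular GraphQL server.

```python
import hashlib
from typing import Iterable


def query_hash(query: str) -> str:
    """Hash the exact query text; clients send this value instead of the query."""
    return hashlib.sha256(query.encode("utf-8")).hexdigest()


class PersistedQueryRegistry:
    """Maps approved query hashes to their query text."""

    def __init__(self, approved_queries: Iterable[str]):
        self._by_hash = {query_hash(q): q for q in approved_queries}

    def resolve(self, sha256_hash: str) -> str:
        """Return the approved query for a hash, or refuse unknown hashes."""
        try:
            return self._by_hash[sha256_hash]
        except KeyError:
            raise PermissionError("unknown persisted query hash") from None
```

Because the server only ever executes text it stored itself, arbitrary client-supplied queries never reach the executor.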

Input Validation

Enforce allowlisting via GraphQL scalars, enums, and custom validators.
Define input schemas for all mutations.
Use parameterized queries/ORMs in resolvers (never string concatenation).
Disable dynamic resolver targeting to prevent SSRF/command injection.

12. XML and Serialization Security

XXE Prevention

XXE (XML External Entity) attacks exploit parser features to read files, perform SSRF, or cause DoS.

Universal defense: disable external entity processing entirely in parser configuration.

# Python (defusedxml)
import defusedxml.ElementTree as ET
tree = ET.parse(source)  # XXE-safe by default

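// Java (DocumentBuilderFactory)
import javax.xml.parsers.DocumentBuilderFactory;
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// Rejecting DOCTYPE declarations blocks XXE and entity-expansion attacks outright.
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);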

XML Bomb Prevention

Attack	Mechanism	Defense
Billion Laughs	Exponential expansion of nested entity references	Entity expansion limits, depth restrictions
Quadratic Blowup	Large entity referenced repeatedly (O(n^2) expansion)	Entity size limits
Recursive References	Circular entity definitions	Recursion depth limits

Schema Hardening

Use XML Schema (XSD), not DTD, for validation.
Set maxOccurs boundaries (never unbounded without testing).
Use precise types: positiveInteger not integer, decimal not float/double (prevents Infinity/NaN).
Apply maxLength, minLength, pattern restrictions on strings.
Enumerate allowed values where possible.

Schema Poisoning Defense

Embed schemas with integrity verification (don't fetch remotely at runtime).
Restrict file permissions on local schema/DTD files.
If remote schemas needed, use HTTPS only, maintain local copies, verify integrity.

General Serialization Security

Reject DTDs entirely (SOAP specification forbids them).
Validate document well-formedness before processing.
Set resource limits: document size, element count, nesting depth.
Avoid disclosing internal paths in error messages.

13. SSRF Prevention Architecture

Defense-in-Depth: Application + Network Layers

┌─────────────────────────────────────┐
│  APPLICATION LAYER                  │
│  ├── Input validation (reject URLs) │
│  ├── IP/domain allowlisting         │
│  ├── DNS rebinding prevention       │
│  └── URL scheme restriction         │
├─────────────────────────────────────┤
│  NETWORK LAYER                      │
│  ├── Firewall egress filtering      │
│  ├── Network segmentation           │
│  └── Cloud metadata protection      │
└─────────────────────────────────────┘

Application Layer Controls

Rule 1: Never accept complete URLs from users. URLs are difficult to validate and parsers can be abused. Accept only validated IP addresses or domain names.

IP validation:

Validate format using language-specific libraries (not regex).
Cross-reference against allowlist of trusted IPs (both IPv4 and IPv6).
Use validated library output as comparison baseline to prevent encoding bypasses.

Domain validation:

Validate format without performing DNS resolution.
Maintain allowlist of trusted domains.
Monitor DNS records to detect resolution to non-public IP ranges.

Deny-list minimums (when allowlisting not possible):

AWS IMDS: 169.254.169.254, fd00:ec2::254
Localhost: 127.0.0.0/8, ::1/128
RFC1918: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
Link-local: 169.254.0.0/16
Multicast: 224.0.0.0/4
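The deny-list above could be checked along these lines with Python's `ipaddress` module, which also normalizes the input so encoded variants are compared in canonical form. The names `DENIED_NETWORKS` and `is_denied` are hypothetical, and the sketch fails closed on anything that does not parse as an IP address.

```python
import ipaddress

# The minimum deny-list from the text; extend it for your environment.
DENIED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "127.0.0.0/8", "::1/128",                          # localhost
        "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",   # RFC1918
        "169.254.0.0/16", "fd00:ec2::254/128",             # link-local, AWS IMDS
        "224.0.0.0/4",                                     # multicast
    )
]


def is_denied(raw: str) -> bool:
    """Return True if the address is unparseable or falls in a denied range."""
    try:
        addr = ipaddress.ip_address(raw)
    except ValueError:
        return True  # fail closed on hex, octal, or other odd encodings
    # Unwrap IPv4-mapped IPv6 (e.g. ::ffff:127.0.0.1) so it cannot bypass IPv4 rules.
    mapped = getattr(addr, "ipv4_mapped", None)
    if mapped is not None:
        addr = mapped
    return any(addr.version == net.version and addr in net for net in DENIED_NETWORKS)
```

An allowlist remains the stronger control; this sketch only covers the fallback case the text describes.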

Network Layer Controls

Restrict outbound application access via host-based or network firewalls to only legitimate routes.
Network compartmentalization to block illegitimate calls at infrastructure level.
Disable HTTP redirect following to prevent validation bypass.

Cloud Metadata Protection

Migrate from IMDSv1 to IMDSv2 (AWS) as defense-in-depth. IMDSv2 requires a session token obtained via PUT request, which SSRF attacks cannot easily replicate.

DNS Rebinding Prevention

Resolve domains against internal DNS resolvers only.
Retrieve all A and AAAA records and reject the domain if any resolved IP falls in a private, loopback, or link-local range.
Monitor allowlisted domains for resolution changes to non-public addresses.
Pin DNS resolution results (use the resolved IP, not the hostname, for the actual request).
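The resolve, validate, and pin sequence might be sketched as follows. The function names and the injectable `resolver` parameter are assumptions for illustration; `is_global` from the standard `ipaddress` module is used as a broad stand-in for "not private, loopback, or link-local".

```python
import ipaddress
import socket
from typing import Callable, List


def resolve_all(hostname: str) -> List[str]:
    """Return every A/AAAA address the system resolver reports for hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def pin_public_address(
    hostname: str, resolver: Callable[[str], List[str]] = resolve_all
) -> str:
    """Resolve once, reject if ANY record is non-public, return one IP to connect to."""
    addresses = resolver(hostname)
    if not addresses:
        raise ValueError(f"{hostname} did not resolve")
    for raw in addresses:
        addr = ipaddress.ip_address(raw)
        mapped = getattr(addr, "ipv4_mapped", None)
        if mapped is not None:
            addr = mapped  # treat ::ffff:a.b.c.d as the IPv4 address it wraps
        if not addr.is_global:
            raise ValueError(f"{hostname} resolves to non-public address {raw}")
    # The caller connects to this IP directly, so a second lookup cannot rebind it.
    return addresses[0]
```

When connecting to the pinned IP over HTTPS, the original hostname is still needed for SNI and certificate validation; that part is outside this sketch.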

14. Rate Limiting Architecture

Multi-Layer Rate Limiting

┌──────────────────────────────────────────┐
│  EDGE (CDN/WAF/Load Balancer)            │
│  ├── Per-IP rate limits                  │
│  ├── Geographic filtering                │
│  ├── Volumetric DDoS mitigation          │
│  └── Connection rate limits              │
├──────────────────────────────────────────┤
│  API GATEWAY                             │
│  ├── Per-user/API-key rate limits        │
│  ├── Per-endpoint rate limits            │
│  ├── Request size limits                 │
│  └── Concurrent connection limits        │
├──────────────────────────────────────────┤
│  APPLICATION                             │
│  ├── Per-operation rate limits           │
│  ├── Resource-specific throttling        │
│  ├── Business logic rate limits          │
│  └── Object-level rate limits (GraphQL)  │
├──────────────────────────────────────────┤
│  DATABASE                                │
│  ├── Connection pool limits              │
│  ├── Query timeout limits                │
│  └── Transaction timeout limits          │
└──────────────────────────────────────────┘

Rate Limiting Algorithms

Algorithm	Behavior	Use Case
Token Bucket	Steady refill rate, burst allowed up to bucket size	API rate limiting (most common)
Leaky Bucket	Fixed drain rate, excess dropped	Smoothing bursty traffic
Fixed Window	Counter resets at interval boundary	Simple per-minute/hour limits
Sliding Window Log	Precise per-request timestamp tracking	High-accuracy rate limiting
Sliding Window Counter	Weighted average of current and previous window	Balance of accuracy and performance
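As an illustration of the first row of the table, a token bucket can be sketched in a few lines. The class name, parameters, and injectable `clock` are choices made for this example; the sketch is single-process and not thread-safe, whereas a production limiter usually keeps its state in a shared store.

```python
import time


class TokenBucket:
    """Refills at `rate` tokens per second, holding at most `capacity` tokens."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)   # start full so an initial burst is allowed
        self.updated = clock()

    def allow(self, cost=1.0):
        """Consume `cost` tokens if available; return whether the request may proceed."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Lazy refill: add tokens for the time elapsed since the last call.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

One bucket per client key (IP, user, or API key) gives the per-client limits described in the layered diagram; `capacity` sets the tolerated burst and `rate` the sustained throughput.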

Slow HTTP Attack Defense

Define minimum ingress data rate limit; drop connections below that rate (counters Slowloris, Slow POST).
Absolute connection timeouts (not just idle timeouts).
Maximum request header size and body size limits.
Total concurrent connection limits per client IP.

DoS Resilience Patterns

Validation ordering: perform cheap checks (format, size) before expensive ones (database, crypto).
Authentication gating: require authentication before allowing access to resource-intensive operations.
Graceful degradation: maintain reduced functionality rather than complete failure.
Static resource separation: host images, scripts, CSS on separate domains/CDN.
Caching: serve cached responses for repeat requests.
Asynchronous processing: use queues for CPU-intensive operations; return 202 Accepted.

15. Circuit Breaker and Resilience Patterns

Circuit Breaker Pattern

Prevents cascading failures when downstream services degrade.

            ┌──────────────┐
  Request ──┤   CLOSED     │──── Forward to service
            │  (normal)    │
            └──────┬───────┘
                   │ failure threshold exceeded
            ┌──────▼───────┐
            │    OPEN      │──── Return fallback/error immediately
            │  (tripped)   │     (no request forwarded)
            └──────┬───────┘
                   │ timeout expires
            ┌──────▼───────┐
            │  HALF-OPEN   │──── Forward limited probe requests
            │  (testing)   │     Success → CLOSED
            └──────────────┘     Failure → OPEN

Configuration Parameters

Parameter	Description	Typical Value
Failure threshold	Errors before opening	5-10 failures in 60s
Timeout	Time in OPEN before probing	30-60 seconds
Success threshold	Successes in HALF-OPEN to close	3-5 consecutive
Monitoring window	Rolling window for failure counting	60 seconds
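The state machine and parameters above could be sketched as follows. `CircuitBreaker`, `CircuitOpenError`, and the constructor arguments are names invented for this example, and the sketch does not cap the number of concurrent probes in HALF-OPEN, which a real implementation would need to do.

```python
import time


class CircuitOpenError(Exception):
    """Raised instead of calling the downstream service while the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, window=60.0, open_timeout=30.0,
                 success_threshold=3, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.window = window
        self.open_timeout = open_timeout
        self.success_threshold = success_threshold
        self.clock = clock
        self.state = "CLOSED"
        self.failures = []      # timestamps of recent failures while CLOSED
        self.successes = 0      # consecutive successes while HALF_OPEN
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        now = self.clock()
        if self.state == "OPEN":
            if now - self.opened_at < self.open_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout expired: let probe requests through.
            self.state = "HALF_OPEN"
            self.successes = 0
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._on_failure(now)
            raise
        self._on_success()
        return result

    def _on_failure(self, now):
        if self.state == "HALF_OPEN":
            self._trip(now)     # any probe failure reopens the circuit
            return
        # Keep only failures inside the rolling monitoring window.
        self.failures = [t for t in self.failures if now - t < self.window]
        self.failures.append(now)
        if len(self.failures) >= self.failure_threshold:
            self._trip(now)

    def _on_success(self):
        if self.state == "HALF_OPEN":
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state = "CLOSED"
                self.failures = []

    def _trip(self, now):
        self.state = "OPEN"
        self.opened_at = now
        self.failures = []
```

Callers catch `CircuitOpenError` to return a fallback response immediately rather than waiting on a dependency that is known to be failing.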

Bulkhead Pattern

Isolate failures to prevent resource exhaustion across the entire system:

Separate thread pools per downstream dependency.
Separate connection pools per service.
Resource quotas per tenant/customer.
Namespace isolation in K8s (CPU/memory quotas per namespace).

Retry with Backoff

Attempt 1: immediate
Attempt 2: wait 1s + jitter
Attempt 3: wait 2s + jitter
Attempt 4: wait 4s + jitter
(cap at max backoff, e.g., 30s)

Always add random jitter to prevent thundering herd.
Set maximum retry count (3-5 typically).
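Retry only transient failures (timeouts, connection errors, 502/503/504, and 408 or 429 while honoring any Retry-After header); never retry other 4xx responses, and retry non-idempotent operations only when they carry an idempotency key.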
Combine with circuit breaker: when circuit opens, stop retrying.
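The schedule above can be sketched as a small helper. `TransientError`, `backoff_delays`, and `call_with_retry` are hypothetical names, and the jitter range (up to one `base` interval) is an assumption; other jitter strategies are equally valid.

```python
import random
import time


class TransientError(Exception):
    """Stand-in for a failure worth retrying (timeout, 503, and so on)."""


def backoff_delays(retries, base=1.0, cap=30.0, rng=random.random):
    """Yield the wait before each retry: base * 2**n plus jitter, capped at `cap`."""
    for n in range(retries):
        jitter = rng() * base
        yield min(cap, base * (2 ** n) + jitter)


def call_with_retry(fn, max_attempts=4, sleep=time.sleep, **backoff):
    """Call fn, retrying on TransientError up to max_attempts total attempts."""
    delays = backoff_delays(max_attempts - 1, **backoff)
    while True:
        try:
            return fn()
        except TransientError:
            delay = next(delays, None)
            if delay is None:
                raise           # attempts exhausted: surface the last error
            sleep(delay)
```

Any exception other than `TransientError` propagates immediately, which matches the rule that only transient failures are retried.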

Timeout Hierarchy

Client timeout > Gateway timeout > Service timeout > DB timeout
     30s              15s              10s             5s

Each layer's timeout must be shorter than its caller's, so a callee never keeps working (and holding connections) for a request its caller has already abandoned.
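One way to keep the hierarchy consistent at runtime is to pass an absolute deadline down the call chain and derive each downstream timeout from what remains. The sketch below assumes hypothetical names (`Deadline`, `child_timeout`) and an arbitrary 0.5 s safety margin.

```python
import time


class Deadline:
    """Absolute point in time by which the caller stops waiting."""

    def __init__(self, budget_s, clock=time.monotonic):
        self._clock = clock
        self._expires = clock() + budget_s

    def remaining(self):
        return max(0.0, self._expires - self._clock())


def child_timeout(deadline, layer_cap_s, margin_s=0.5):
    """Timeout for a downstream call: the layer's own cap or the caller's
    remaining budget minus a margin, whichever is smaller."""
    budget = deadline.remaining() - margin_s
    if budget <= 0:
        raise TimeoutError("caller's deadline already exhausted")
    return min(layer_cap_s, budget)
```

With this approach a service that receives a request late in the caller's budget issues a correspondingly shorter database timeout instead of its static maximum.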

16. Race Condition Defense

TOCTOU (Time-of-Check-to-Time-of-Use)

The classic pattern: a resource is checked for a condition, then used based on that check, but the resource changes between check and use.

Initial state: balance = 100
Thread A: check(balance >= 100)     → true
Thread B: check(balance >= 100)     → true
Thread A: debit(100)                → balance = 0
Thread B: debit(100)                → balance = -100  ← RACE

Defense Patterns

Pattern	Mechanism	Use Case
Pessimistic locking	`SELECT ... FOR UPDATE` acquires row lock before read	Financial transactions, inventory
Optimistic locking	Version column; `UPDATE ... WHERE version = N` fails if concurrent modification	Low-contention scenarios
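Atomic operations	`UPDATE accounts SET balance = balance - 100 WHERE id = ? AND balance >= 100` (check + modify in a single statement; zero rows affected means the check failed; table and column names are illustrative)	Simple counter/balance operations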
Database constraints	`CHECK (balance >= 0)` enforced at DB level	Invariant enforcement
Idempotency keys	Client-generated unique key per operation; server rejects duplicates	Payment processing, API mutations
Serializable isolation	`SET TRANSACTION ISOLATION LEVEL SERIALIZABLE`; the application must retry transactions aborted with a serialization failure	Highest consistency requirement
Mutex/advisory locks	`pg_advisory_lock(key)` or application-level mutex	Cross-table consistency
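The atomic-operation and database-constraint rows can be demonstrated with the standard `sqlite3` module. The `accounts` table, its columns, and the `debit` helper are illustrative assumptions; SQLite is used here only because it needs no server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"   # invariant enforced by the DB
)
conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")
conn.commit()


def debit(conn, account_id, amount):
    """Check and modify in one statement; return True only if the debit applied."""
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "UPDATE accounts SET balance = balance - ? "
            "WHERE id = ? AND balance >= ?",
            (amount, account_id, amount),
        )
    # rowcount is 0 when the balance condition did not hold.
    return cur.rowcount == 1
```

Because the condition and the update are a single statement, two concurrent debits of 100 against a balance of 100 cannot both succeed; the `CHECK` constraint is a second line of defense for any code path that forgets the `WHERE` condition.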

Idempotency Pattern

Client: POST /payments {idempotency_key: "abc-123", amount: 100}

Server:
  1. Check idempotency_key in store
  2. If exists: return cached response (no re-execution)
  3. If not: execute, store result keyed by idempotency_key, return

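Keys should expire after a reasonable window (24-48 hours).
Store both the request hash and the response; reject a reused key whose request hash differs (parameter tampering).
Use a database unique constraint on the idempotency key so that concurrent duplicates cannot both pass the existence check.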
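A minimal sketch of this flow, using `sqlite3` and a primary-key constraint to claim the key atomically, might look like the following. The table layout, `handle`, and `KeyReuseError` are hypothetical; key expiry and recovery from a crash between claim and completion are deliberately left out.

```python
import hashlib
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE idempotency ("
    " idem_key TEXT PRIMARY KEY,"       # unique constraint gives atomic claim
    " request_hash TEXT NOT NULL,"
    " response TEXT)"
)


class KeyReuseError(Exception):
    """Same idempotency key sent with a different request body."""


def request_hash(payload):
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


def handle(conn, key, payload, execute):
    digest = request_hash(payload)
    try:
        with conn:
            # Claim the key first; a concurrent duplicate fails here.
            conn.execute(
                "INSERT INTO idempotency (idem_key, request_hash) VALUES (?, ?)",
                (key, digest),
            )
    except sqlite3.IntegrityError:
        stored_hash, stored_response = conn.execute(
            "SELECT request_hash, response FROM idempotency WHERE idem_key = ?",
            (key,),
        ).fetchone()
        if stored_hash != digest:
            raise KeyReuseError("idempotency key reused with different parameters")
        if stored_response is None:
            raise RuntimeError("original request still in progress")
        return json.loads(stored_response)   # replay, no re-execution
    response = execute(payload)
    with conn:
        conn.execute(
            "UPDATE idempotency SET response = ? WHERE idem_key = ?",
            (json.dumps(response), key),
        )
    return response
```

Claiming the key before executing avoids the check-then-act race that a separate "does the key exist?" lookup would reintroduce.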

Distributed Systems Considerations

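Redis `SET key token NX PX <ttl>` (set-if-not-exists with expiry in one atomic command) for distributed locks with TTL; a separate SETNX followed by EXPIRE is not atomic and can leave a lock with no TTL. Release only if the stored token still matches.
Redlock algorithm for fault-tolerant distributed locking across multiple independent Redis instances; where correctness depends on the lock, also pass a fencing token to the protected resource, because a TTL-based lock can expire while its holder is still running.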
Database-level locking preferred over application-level when possible (closer to the data, harder to bypass).
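Event sourcing: an append-only log removes in-place update races; concurrent appends to the same stream still need ordering, typically an expected-version check on append.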

17. Secure Multi-Tenant Design

Isolation Models

Strongest ────────────────────────────────────────── Weakest
   │                                                  │
   ▼                                                  ▼
Separate         Separate       Shared Infra,    Shared
Infrastructure   Namespaces/    Separate DB/     Everything
(per tenant)     VPCs           Schema           (row-level)

Model	Isolation	Cost	Complexity	Use Case
Separate infrastructure	Highest	Highest	Moderate	Regulated industries, government
Separate namespaces/VPCs	High	High	High	Enterprise SaaS
Shared infra, separate DB/schema	Medium	Medium	Medium	Standard SaaS
Shared everything (row-level)	Lowest	Lowest	Low	Consumer apps, cost-sensitive

Kubernetes Multi-Tenant Pattern

┌─────────────────────────────────────────┐
│  Cluster                                │
│  ┌─────────────────────────────────────┐│
│  │  Namespace: tenant-a               ││
│  │  ├── ResourceQuota (4 pods, 2 CPU) ││
│  │  ├── NetworkPolicy (deny all       ││
│  │  │   cross-namespace)              ││
│  │  ├── Pod Security: restricted      ││
│  │  └── RBAC: tenant-a-role           ││
│  └─────────────────────────────────────┘│
│  ┌─────────────────────────────────────┐│
│  │  Namespace: tenant-b               ││
│  │  ├── ResourceQuota (4 pods, 2 CPU) ││
│  │  ├── NetworkPolicy (deny all       ││
│  │  │   cross-namespace)              ││
│  │  ├── Pod Security: restricted      ││
│  │  └── RBAC: tenant-b-role           ││
│  └─────────────────────────────────────┘│
└─────────────────────────────────────────┘

Cross-Cutting Multi-Tenant Controls

Layer	Control	Purpose
Identity	Tenant context in JWT claims	Every request carries tenant identity
API Gateway	Tenant-aware rate limiting	Per-tenant quotas prevent noisy neighbor
Application	Tenant filter on all queries	Prevent cross-tenant data access
Database	Row-level security policies	DB-enforced tenant isolation
Storage	Tenant-prefixed object keys + IAM	Prevent cross-tenant storage access
Encryption	Per-tenant encryption keys	Cryptographic isolation of data
Logging	Tenant ID in all log entries	Tenant-scoped audit trails
Network	Namespace/VPC isolation	Network-level blast radius containment

Noisy Neighbor Prevention

Per-tenant resource quotas (CPU, memory, storage, API calls).
Per-tenant connection pool limits to shared databases.
Per-tenant queue depth limits for async processing.
Circuit breakers per tenant: if one tenant causes excessive errors, isolate them without affecting others.
Fair scheduling: weighted round-robin or priority queues preventing any single tenant from monopolizing shared resources.
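Per-tenant queue depth limits and round-robin fairness can be combined in one small structure. `FairQueue` and its methods are names made up for this sketch, which is in-memory and single-threaded; weighting and priorities are omitted.

```python
from collections import OrderedDict, deque


class FairQueue:
    """Round-robin across tenants, with a depth limit per tenant."""

    def __init__(self, max_depth_per_tenant):
        self._max = max_depth_per_tenant
        self._queues = OrderedDict()    # tenant -> deque of pending jobs

    def submit(self, tenant, job):
        """Enqueue a job; return False if the tenant is at its depth limit."""
        if len(self._queues.get(tenant, ())) >= self._max:
            return False
        self._queues.setdefault(tenant, deque()).append(job)
        return True

    def next_job(self):
        """Return (tenant, job) from the tenant at the front of the rotation,
        or None when nothing is pending."""
        if not self._queues:
            return None
        tenant, queue = next(iter(self._queues.items()))
        job = queue.popleft()
        # Move the tenant to the back so others get a turn first.
        del self._queues[tenant]
        if queue:
            self._queues[tenant] = queue
        return tenant, job
```

A tenant that floods the queue is bounded by its own depth limit and still receives only one slot per rotation, so other tenants' jobs are not starved.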

Data Isolation Verification

Automated tests that attempt cross-tenant data access (should fail).
SQL query audit: every query touching tenant data must include tenant filter (static analysis or query interceptor).
Penetration testing specifically targeting tenant boundary bypass (IDOR, parameter tampering, JWT manipulation).
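An automated cross-tenant test of the kind described above might look like this sketch. `TenantScopedInvoices`, the `invoices` table, and the tenant names are illustrative assumptions; SQLite has no row-level security, so the filter here lives in the data-access layer only.

```python
import sqlite3


class TenantScopedInvoices:
    """Data-access object in which every query carries the tenant filter."""

    def __init__(self, conn, tenant_id):
        self._conn = conn
        # Assumed to come from verified token claims, never from request parameters.
        self._tenant_id = tenant_id

    def get(self, invoice_id):
        row = self._conn.execute(
            "SELECT id, total FROM invoices WHERE tenant_id = ? AND id = ?",
            (self._tenant_id, invoice_id),
        ).fetchone()
        return None if row is None else {"id": row[0], "total": row[1]}


def make_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoices ("
        " id INTEGER PRIMARY KEY,"
        " tenant_id TEXT NOT NULL,"
        " total INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO invoices VALUES (?, ?, ?)",
        [(1, "tenant-a", 500), (2, "tenant-b", 900)],
    )
    conn.commit()
    return conn


def test_cross_tenant_access_is_denied():
    conn = make_demo_db()
    as_tenant_a = TenantScopedInvoices(conn, "tenant-a")
    assert as_tenant_a.get(1) == {"id": 1, "total": 500}
    # Guessing another tenant's object ID (IDOR) must return nothing.
    assert as_tenant_a.get(2) is None
```

Returning "not found" rather than "forbidden" for another tenant's ID avoids confirming that the object exists.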

18. CSS Security

Attack Surface

Reconnaissance via CSS selectors: descriptive class names (.addUser, .deleteUser, .adminPanel) reveal application features to unauthenticated attackers examining global CSS files.
CSS injection: if attacker-controlled content enters stylesheets, it can enable data exfiltration via background-image URLs, clickjacking via element repositioning, and UI redressing.
Third-party stylesheet risk: externally hosted CSS can be modified to inject malicious styles.

Defenses

Role-based CSS isolation: segregate stylesheets by access level. Server-side access controls on CSS file delivery. Log suspicious CSS file access.
CSS obfuscation: replace descriptive selectors with generated names using CSS Modules, JSS (minify option), or build-time obfuscation. Use framework classes (Bootstrap, Tailwind) to reduce custom selectors. This only raises the cost of reconnaissance; it is not an access control.
CSP for styles: style-src 'self' or style-src 'nonce-{RANDOM}' to prevent inline style injection and restrict stylesheet sources.
Subresource Integrity (SRI): <link rel="stylesheet" href="..." integrity="sha256-..." crossorigin="anonymous"> for third-party stylesheets.
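The integrity value and a per-response style nonce can be generated with the standard library. The helper names are hypothetical, and `sha384` as the default is a choice made for this sketch (SRI accepts sha256, sha384, and sha512).

```python
import base64
import hashlib
import secrets


def sri_hash(content, algorithm="sha384"):
    """Return an SRI integrity value such as 'sha384-<base64 digest>'."""
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def style_nonce_policy():
    """Return (nonce, CSP directive) for one HTTP response.

    A fresh nonce must be generated for every response and placed both in
    the header and on each trusted <style> or <link> element.
    """
    nonce = secrets.token_urlsafe(16)   # 128 bits from the OS CSPRNG
    return nonce, f"style-src 'nonce-{nonce}'"
```

The integrity value is computed over the exact bytes of the stylesheet as served, so it has to be regenerated whenever the third-party file changes version.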

19. Architectural Decision Framework

Security Architecture Review Checklist

When designing or reviewing a system, evaluate each area:

□ Authentication
  ├── How are users/services identified?
  ├── Token lifecycle (issuance, validation, revocation, rotation)?
  └── MFA requirements?

□ Authorization
  ├── Where are access decisions made (gateway, service, DB)?
  ├── RBAC vs ABAC vs ReBAC?
  └── Least privilege verification?

□ Network Security
  ├── Trust boundaries identified?
  ├── East-west encryption (mTLS)?
  ├── Egress filtering?
  └── Microsegmentation?

□ Data Protection
  ├── Encryption at rest and in transit?
  ├── Key management (rotation, access)?
  ├── Data classification applied?
  └── PII handling (GDPR Art. 25: data protection by design and by default)?

□ Input Handling
  ├── Validation at every trust boundary?
  ├── Serialization security (XXE, deserialization)?
  └── File upload controls?

□ Resilience
  ├── Rate limiting at multiple layers?
  ├── Circuit breakers for downstream dependencies?
  ├── Timeout hierarchy (caller > callee)?
  ├── Graceful degradation plan?
  └── Resource limits (CPU, memory, connections)?

□ Observability
  ├── Security-relevant log coverage?
  ├── Correlation IDs across services?
  ├── Alerting on auth failures, policy violations?
  └── Audit trail for sensitive operations?

□ Supply Chain
  ├── Dependency scanning in CI/CD?
  ├── Image signing and verification?
  ├── SBOM generation?
  └── Third-party resource integrity (SRI)?

□ Multi-Tenancy (if applicable)
  ├── Isolation model chosen and justified?
  ├── Tenant context propagation?
  ├── Cross-tenant access testing?
  └── Noisy neighbor prevention?

□ Blast Radius
  ├── What does compromise of component X give the attacker?
  ├── Can lateral movement be contained?
  ├── Are secrets scoped to minimum necessary?
  └── Is there a kill switch for compromised components?

Threat Modeling Integration

Every architecture decision should be validated through STRIDE analysis:

Threat	Question
Spoofing	Can an attacker impersonate a user or service?
Tampering	Can data be modified in transit or at rest?
Repudiation	Can an actor deny having performed an action because audit evidence is missing?
Information Disclosure	What data leaks on compromise of each component?
Denial of Service	What happens under load or resource exhaustion?
Elevation of Privilege	Can a low-privilege actor escalate?

Map each finding to mitigations, owners, and implementation status. Prioritize by blast radius and exploitability.

Summary: The Principal Architect's Mental Model

Security architecture is not a checklist bolted onto a design. It is a set of constraints that shape the design from inception:

Identity is the perimeter — network location grants nothing; every request proves identity.
Every boundary validates — gateway, service, database each enforce independently.
Blast radius drives topology — segment by damage potential, not by convenience.
Resilience is security — DoS, cascading failure, and resource exhaustion are attack vectors.
Observability enables defense — you cannot defend what you cannot see.
Least privilege is not optional — default-deny at every layer, justify every permission.
Assume breach, design for containment — the question is not "if" but "when" and "how far."

Security Architecture Design Patterns — Deep Dive

Principal-level reference for defense-in-depth, zero trust, microservices security, database hardening, browser security policies, protocol-specific controls, rate limiting, circuit breakers, and multi-tenant isolation.

Sources: OWASP Cheat Sheet Series (Microservices, Database, Docker, Kubernetes, SSRF, DoS, CSP, GraphQL, WebSocket, XML, CSS, Race Conditions), Google Cloud Security Foundations Blueprint.

Defense-in-Depth Patterns
Zero Trust Network Architecture
Microservices Security Architecture
Service Mesh and mTLS
API Gateway Security
Database Security Architecture
Container Security (Docker)
Kubernetes Security Architecture
CSP, CORS, and Same-Origin Policy
WebSocket Security
GraphQL Security Controls
XML and Serialization Security
SSRF Prevention Architecture
Rate Limiting Architecture
Circuit Breaker and Resilience Patterns
Race Condition Defense
Secure Multi-Tenant Design
CSS Security
Architectural Decision Framework

1. Defense-in-Depth Patterns

The Layered Model

┌─────────────────────────────────────────────┐
│  LAYER 7: DATA        encryption at rest,   │
│                       field-level encryption,│
│                       tokenization, masking  │
├─────────────────────────────────────────────┤
│  LAYER 6: APPLICATION input validation, CSP,│
│                       auth/authz, WAF       │
├─────────────────────────────────────────────┤
│  LAYER 5: RUNTIME     containers, seccomp,  │
│                       AppArmor, sandboxing   │
├─────────────────────────────────────────────┤
│  LAYER 4: HOST        OS hardening, patching,│
│                       EDR, auditd           │
├─────────────────────────────────────────────┤
│  LAYER 3: NETWORK     segmentation, NACLs,  │
│                       IDS/IPS, mTLS         │
├─────────────────────────────────────────────┤
│  LAYER 2: IDENTITY    MFA, SSO, RBAC,       │
│                       least privilege        │
├─────────────────────────────────────────────┤
│  LAYER 1: PHYSICAL    datacenter security,   │
│                       HSMs, secure boot      │
└─────────────────────────────────────────────┘

Three Control Types (Google Cloud Blueprint Model)

Policy Controls — Programmatic constraints that enforce acceptable resource configurations. Prevent risky setups through infrastructure-as-code validation and organization policy constraints before deployment.
Architecture Controls — Resource configuration based on security best practices: network topology, resource hierarchy, blast radius containment.
Detective Controls — Anomaly detection, log aggregation, threat detection services, SIEM integration, custom enforcement.

Principles

Assume breach: design every layer as if the attacker already has a foothold in the adjacent layer.
Independent failure domains: a control at layer N must not depend on layer N-1 being intact.
Validation ordering: perform cheap validations (format, size, type) before expensive ones (database lookups, crypto operations).
No security theater: every control must measurably reduce risk. Call out controls that create illusion without substance.

2. Zero Trust Network Architecture

Zero trust eliminates implicit trust based on network location. Every request is authenticated, authorized, and encrypted regardless of origin.

Core Tenets

Principle	Implementation
Never trust, always verify	Every service-to-service call carries verifiable identity
Least privilege access	RBAC/ABAC with just-in-time elevation, time-bounded tokens
Assume breach	Microsegmentation limits blast radius; east-west traffic encrypted
Verify explicitly	Context-aware access: identity + device + location + behavior
Continuous validation	Session re-validation at intervals; token refresh with short TTL

Network Architecture Pattern

                        ┌─────────────┐
  User ──── Identity ───┤  Policy     │
  Device    Provider    │  Decision   │
  Context               │  Point      │
                        └──────┬──────┘
                               │ allow/deny
                        ┌──────▼──────┐
                        │  Policy     │
                        │  Enforcement│
                        │  Point      │
                        └──────┬──────┘
                               │ mTLS
                    ┌──────────┼──────────┐
                    ▼          ▼          ▼
               ┌────────┐ ┌────────┐ ┌────────┐
               │Service │ │Service │ │Service │
               │   A    │ │   B    │ │   C    │
               └────────┘ └────────┘ └────────┘

Google Cloud Blueprint Implementation

No public internet access by default: no outbound or inbound traffic to/from public internet permitted unless explicitly allowed.
Shared VPC: centralized network resource management across regions/zones with environment separation by network topology.
Private paths enforced: all on-premises and cloud resource communication over private interconnects.
GitOps model: all infrastructure changes through version-controlled, reviewed Terraform with policy-as-code validation in CI/CD pipeline before deployment.

Microsegmentation

Segment by workload, not by network subnet. Each workload gets its own identity.
Network policies (K8s) or security groups (cloud) restrict east-west traffic to explicit allow rules.
Default-deny posture: all traffic blocked unless a policy explicitly permits it.

3. Microservices Security Architecture

Authorization Layers

Authorization enforcement must occur at three independent layers:

Gateway/Proxy — Coarse-grained, cross-cutting decisions (authentication, basic role checks, rate limiting).
Microservice Layer — Shared libraries or sidecar proxies for fine-grained policy enforcement. Centralized policy with embedded Policy Decision Point (PDP) is recommended.
Business Logic — Service-specific authorization that understands domain context.

Authorization Patterns

Pattern	Description	Trade-offs
Decentralized	Policy embedded in service code	Independent but inconsistent; requires code changes for policy updates
Centralized Single PDP	Remote policy service evaluates all requests	Consistent but introduces latency and availability risk
Centralized Embedded PDP	Policy defined centrally, deployed as library/sidecar	Best of both: consistent policy, low latency, no external dependency at runtime

Netflix pattern: Policy Portal (authoring) -> Repository (storage) -> Aggregator (compilation) -> Distributor (deployment to sidecars).

Identity Propagation

Recommended: Trusted Issuer-Signed Structures

Edge services authenticate external tokens (OAuth2, OIDC), then mint internally-signed identity structures (e.g., Netflix "Passport"). This approach:

Decouples external tokens from internal representations
Uses single, extensible data structures
Never exposes internal structures externally
Is external access token agnostic

Anti-pattern: Passing raw external tokens between internal services. This creates tight coupling and risks privilege escalation through token manipulation.

Security Architecture Documentation

For each microservice, document:

Unique service name/ID, business process, API definitions with security schemes
Service-to-storage access types (read/write)
Synchronous service-to-service calls (protocol, data exchanged)
Asynchronous communications (publisher/subscriber via message queues)
Data asset classification (PII, confidential, public)
Trust boundary justifications

Logging Architecture

Service stdout/stderr ──► Local File ──► Logging Agent ──► Message Broker ──► Central Logging
                              │                │                  │
                              │                │                  ├─ Mutual TLS
                              │                ├─ Data sanitization│
                              │                │  (strip PII,     ├─ Least-privilege
                              │                │   passwords,     │   access policies
                              ├─ Prevents      │   API keys)     │
                              │  data loss     │                  │
                              │  on failure    ├─ Asynchronous    │
                                               │  (prevents DoS   │
                                               │   of log system) │

Requirements:

Correlation IDs for cross-service call tracing
Structured format (JSON) with contextual metadata (hostname, container, class)
Sanitization: never send PII, passwords, or API keys to central logging

4. Service Mesh and mTLS

Mutual TLS (mTLS)

Each microservice uses public/private key pairs for bidirectional authentication, providing:

Confidentiality: encrypted channel between services
Integrity: tamper detection
Authentication: cryptographic identity verification

Operational challenges:

Key provisioning and trust bootstrap (initial certificate distribution)
Certificate revocation (CRL/OCSP infrastructure)
Key rotation (automated renewal before expiry)
Certificate authority management (dedicated internal CA)

Service Mesh Benefits

Capability	Security Value
Automatic mTLS	Encryption and authentication without application code changes
Telemetry/tracing	Generates security-relevant metrics and distributed traces
Ingress/egress control	Traffic monitoring and policy enforcement at mesh boundary
Fine-grained RBAC	Service-level access control via mesh policies
Traffic shaping	Rate limiting, circuit breaking, retries with backoff

Service Mesh Trade-offs

Increases architectural complexity
Requires expertise in both K8s and mesh technology (Istio, Linkerd, Consul Connect)
Performance impact from sidecar proxy overhead (typically 1-3ms latency per hop)
Debugging becomes harder with proxy-mediated traffic

Token-Based Service Authentication (Alternative to mTLS)

Mode	Use Case	Trade-off
Online validation	Centralized token service validates each request	Detects revoked tokens immediately; higher latency
Offline validation	Services validate using downloaded public keys (JWKS)	Lower latency; cannot detect revoked tokens in real-time

5. API Gateway Security

Gateway as Security Perimeter

The API gateway centralizes:

Authentication (OAuth2/OIDC token validation)
Coarse-grained authorization (role/scope checks)
Rate limiting and throttling
Request/response transformation and validation
TLS termination
Logging and correlation ID injection

Gateway Limitations

Single point of decision: violates defense-in-depth if relied upon exclusively.
Scalability constraint: complex ecosystems with numerous roles become difficult to manage at the edge alone.
Operational bottleneck: development teams cannot independently modify authorization rules.

Mitigation Pattern

Implement mutual authentication to prevent gateway bypass and direct internal service access.
Layer authorization at gateway AND service AND business logic levels.
Use the gateway for cross-cutting concerns only; push domain-specific authorization to services.

6. Database Security Architecture

Network Isolation

┌──────────────────────────────────┐
│  DMZ / Application Tier          │
│  ┌──────────┐  ┌──────────┐     │
│  │  App A   │  │  App B   │     │
│  └────┬─────┘  └────┬─────┘     │
│       │              │           │
├───────┼──────────────┼───────────┤  ◄── Firewall
│  Database Tier                   │
│  ┌──────────┐  ┌──────────┐     │
│  │  DB A    │  │  DB B    │     │
│  └──────────┘  └──────────┘     │
└──────────────────────────────────┘

Disable TCP access where possible; require local socket or named pipe.
If TCP needed, bind to localhost or restrict via firewall to specific application hosts only.
Database servers in separate network segments from application tier.
Web-based management tools (phpMyAdmin, pgAdmin) require authentication, HTTPS, and network-level access controls.

Authentication and Access Control

Mandatory authentication for all connections, including local access.
Strong, unique passwords per database account.
Single-application or service-specific accounts (never shared credentials).
Never use default accounts (root, sa, SYS, SYSTEM) for application access.
No administrative rights for application accounts.
Host-based connection restrictions (connect only from designated app servers).
Environment-specific databases and accounts (dev, staging, prod never share credentials).

Least Privilege in Practice

Permission Level	Pattern
Minimal	SELECT, UPDATE, DELETE only (no DDL)
Table-level	Grant access to specific tables only
Column-level	Restrict sensitive columns (SSN, credit card)
Row-level	Row-level security policies filter by tenant/role
View-based	Access through restricted views rather than base tables
No DB links	Avoid database links unless absolutely necessary

Credential Management

Credentials stored outside web root in configuration files with restricted file permissions.
Excluded from source code repositories.
Encrypted using platform features (ASP.NET protected configuration, Vault, AWS Secrets Manager).
Regular credential rotation; immediate rotation on staff changes.

Transport Security

Enforce encrypted connections exclusively (reject plaintext).
Deploy trusted certificates on database servers.
Require TLSv1.2+ with modern ciphers (AES-GCM, ChaCha20).
Client-side certificate validation.

Hardening Checklist

Apply security patches promptly.
Run database service under low-privileged OS account.
Remove default accounts and sample databases.
Transaction logs on separate storage from data files.
Regular encrypted backups with restricted access permissions.
SQL Server: disable xp_cmdshell, xp_dirtree, CLR execution, SQL Browser, Mixed Mode Auth.
MySQL/MariaDB: run mysql_secure_installation; disable FILE privilege.

7. Container Security (Docker)

Defense-in-Depth Stack for Containers

Layer 1: Image Security
  ├── Pin specific versions (no floating tags)
  ├── Minimal base images (distroless, scratch)
  ├── CI/CD image scanning (Trivy, Snyk, Docker Scout)
  ├── SBOM generation
  ├── Image signing (Notary/Cosign)
  └── Private registries with access controls

Layer 2: Runtime Isolation
  ├── Non-root user (USER directive, runAsUser)
  ├── no-new-privileges flag
  ├── Drop all capabilities, add only needed
  ├── Never use --privileged
  └── Read-only root filesystem + tmpfs for temp

Layer 3: Kernel Security
  ├── Seccomp profiles (start from Docker default, customize)
  ├── AppArmor/SELinux mandatory access control
  └── Behavioral monitoring (Falco, Tetragon, Cilium eBPF)

Layer 4: Network Security
  ├── Custom Docker networks (explicit connectivity)
  ├── K8s NetworkPolicies for east-west traffic
  └── No exposed daemon sockets

Layer 5: Resource Limits
  ├── Memory limits (-m 512m)
  ├── CPU limits (--cpus="0.5")
  ├── File descriptor limits (--ulimit nofile=1024)
  ├── Process limits (--ulimit nproc=256)
  └── Restart policy (--restart=on-failure:3)

Layer 6: Secrets Management
  ├── Docker Secrets (Swarm) or external vault
  ├── Never bake secrets into images
  └── K8s: enable etcd encryption or use external KMS

Critical Anti-Patterns

Never expose /var/run/docker.sock to containers (container escape vector).
Never use TCP daemon socket without TLS mutual authentication.
Never use --privileged (grants all kernel capabilities).
Never use floating image tags in production (supply chain risk).
Never store secrets in environment variables in K8s (visible via API, logged in crash dumps).

Rootless Mode

Docker daemon and containers run as unprivileged user. If container escape occurs, attacker lands as unprivileged host user. Different from userns-remap (which remaps UIDs while daemon runs as root).

Alternative: Podman

Daemonless architecture using fork-exec model eliminates central daemon as single point of compromise. Native rootless support and SELinux integration provide OCI-compliant security defaults.

8. Kubernetes Security Architecture

Multi-Layer Security Model

┌─────────────────────────────────────────────┐
│  CLUSTER LEVEL                              │
│  ├── API Server hardening (OIDC, no static  │
│  │   tokens, Node+RBAC authorization)       │
│  ├── etcd encryption + mTLS + isolation     │
│  ├── Admission controllers (PSA, OPA,       │
│  │   Kyverno, ImagePolicyWebhook)           │
│  └── Audit logging (Metadata/Request level) │
├─────────────────────────────────────────────┤
│  NAMESPACE LEVEL                            │
│  ├── RBAC (deny-by-default, minimal verbs)  │
│  ├── Resource quotas (CPU, memory, pods)    │
│  ├── NetworkPolicies (default-deny ingress  │
│  │   and egress per namespace)              │
│  └── Pod Security Standards (restricted)    │
├─────────────────────────────────────────────┤
│  POD LEVEL                                  │
│  ├── SecurityContext (runAsNonRoot,         │
│  │   readOnlyRootFilesystem,               │
│  │   allowPrivilegeEscalation: false)       │
│  ├── Capability dropping (drop ALL)         │
│  ├── Service account with minimal RBAC      │
│  └── Image from signed, scanned registry    │
├─────────────────────────────────────────────┤
│  RUNTIME LEVEL                              │
│  ├── Falco/Tetragon behavioral monitoring   │
│  ├── Container sandboxing (gVisor, Kata)    │
│  └── Continuous vulnerability scanning      │
└─────────────────────────────────────────────┘

Pod Security Standards

Level	Posture	Use Case
Privileged	Unrestricted	System workloads (CNI, storage drivers) only
Baseline	Prevents known privilege escalations	General workloads
Restricted	Maximum hardening	Sensitive workloads, multi-tenant

Applied via namespace labels: pod-security.kubernetes.io/enforce: restricted Three modes: enforce (blocks), audit (logs), warn (alerts).

etcd Security

etcd stores all cluster state and secrets. Write access to etcd = root on entire cluster.

mTLS between API servers and etcd (dedicated CA).
Firewall isolation: only API servers can reach etcd.
etcd ACLs to limit keyspace access per component.
Consider separate etcd instances for different components.

API Server Authentication

Recommended: OIDC for short-lived tokens and centralized group management, or managed provider IAM (GKE, EKS, AKS).

Avoid: Static token files (no rotation), X509 client certs (no revocation), service account tokens for user auth (cluster-scoped, no expiry by default).

Container Sandboxing

For untrusted workloads, add isolation beyond Linux namespaces:

Technology	Mechanism	Overhead
gVisor	User-space kernel in Go, ~70% syscall coverage, uses ~20 host syscalls	Low-moderate
Kata Containers	Stripped-down VM per pod	Moderate
Firecracker	Micro-VM with seccomp + cgroup + namespace	Low

Kubelet Security

Kubelets expose HTTPS endpoints with powerful node/container control:

Enable authentication and authorization (disable anonymous access).
Restrict API access to trusted networks.
Monitor port 10250 (Kubelet API) for unauthorized access attempts.

9. CSP, CORS, and Same-Origin Policy

Same-Origin Policy (SOP)

The browser's foundational security boundary. Two URLs have the same origin if protocol, host, and port all match. SOP prevents scripts from one origin reading responses from another origin.

URL A	URL B	Same Origin?
https://a.com/page	https://a.com/other	Yes
https://a.com	http://a.com	No (protocol)
https://a.com	https://a.com:8443	No (port)
https://a.com	https://b.a.com	No (host)
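
The comparison in the table can be sketched in a few lines. This is a minimal illustration, assuming only the scheme defaults listed in DEFAULT_PORTS; the helper names origin_of and same_origin are hypothetical and not part of any browser API.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Default ports implied by a scheme when the URL does not name one.
DEFAULT_PORTS = {"http": 80, "https": 443, "ws": 80, "wss": 443}


def origin_of(url: str) -> Tuple[str, str, Optional[int]]:
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return scheme, host, port


def same_origin(url_a: str, url_b: str) -> bool:
    """True only when scheme, host, and port all match; the path is ignored."""
    return origin_of(url_a) == origin_of(url_b)
```

Note that an explicit default port (https://a.com:443) compares equal to the implicit one, which matches how browsers treat origins.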

Content Security Policy (CSP)

CSP is a defense-in-depth layer against XSS. It does not replace secure coding; it mitigates exploitation when output encoding fails.

Directive Categories

Category	Directives	Purpose
Fetch	script-src, style-src, img-src, connect-src, font-src, object-src, default-src	Control resource loading origins
Document	base-uri, sandbox, plugin-types (removed from CSP Level 3; use object-src 'none' instead)	Restrict document properties
Navigation	form-action, frame-ancestors	Restrict navigation and framing
Reporting	report-to, report-uri	Violation reporting

Strict CSP (Recommended)

Nonce-based (preferred for server-rendered):

Content-Security-Policy:
  script-src 'nonce-{RANDOM}' 'strict-dynamic';
  object-src 'none';
  base-uri 'none';

Hash-based (for static pages):

Content-Security-Policy:
  script-src 'sha256-{HASH}' 'strict-dynamic';
  object-src 'none';
  base-uri 'none';

Key rules:

Generate unique nonce per HTTP response (cryptographically random).
Never create middleware that auto-injects nonces into all script tags (attacker-injected scripts would get nonces too).
strict-dynamic allows dynamically-created scripts from trusted scripts, reducing annotation burden.
object-src 'none' blocks plugin-based XSS vectors (Flash, Java).
base-uri 'none' prevents base tag injection for relative URL hijacking.
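
A per-response nonce might be generated along these lines. This is a sketch under stated assumptions: render_page, the /static/app.js path, and the 16-byte nonce length are illustrative choices, not taken from the source.

```python
import secrets


def new_csp_nonce() -> str:
    """Return a fresh, unpredictable nonce (16 random bytes, URL-safe base64)."""
    return secrets.token_urlsafe(16)


def strict_csp_header(nonce: str) -> str:
    """Build the nonce-based strict policy for a single response."""
    return (
        f"script-src 'nonce-{nonce}' 'strict-dynamic'; "
        "object-src 'none'; "
        "base-uri 'none'"
    )


def render_page(body_html: str) -> tuple:
    """Return (headers, html) for one response.

    The nonce is attached only to the script tag the server itself writes.
    Markup inside body_html is never rewritten, so an injected <script>
    does not receive a nonce and is blocked by the policy.
    """
    nonce = new_csp_nonce()
    html = (
        "<!doctype html><html><body>"
        f"{body_html}"
        f'<script nonce="{nonce}" src="/static/app.js"></script>'
        "</body></html>"
    )
    return {"Content-Security-Policy": strict_csp_header(nonce)}, html
```

Generating the nonce inside the render call, rather than once at startup, is what keeps it unique per response.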

Deployment Strategy

Deploy in Content-Security-Policy-Report-Only mode first.
Monitor violation reports via report-to endpoint.
Refactor inline scripts to external files or add nonces.
Convert inline event handlers to addEventListener.
Switch to enforcing mode.

Additional CSP Protections

frame-ancestors 'none' — prevents clickjacking (supersedes X-Frame-Options).
upgrade-insecure-requests — forces HTTPS for mixed content.
form-action 'self' — prevents form hijacking to external endpoints.

CORS Security

CORS relaxes SOP in a controlled manner. Misconfigurations create same-origin-equivalent access for attackers.

Critical CORS Rules

Never reflect the Origin header as Access-Control-Allow-Origin — this is equivalent to a wildcard with credentials.
Never use wildcard * with credentials — browsers reject Access-Control-Allow-Origin: * when Access-Control-Allow-Credentials: true.
Validate Origin against a strict allowlist — exact string match, not substring or regex that can be bypassed.
Minimize exposed headers — only expose headers the client genuinely needs.
Set Vary: Origin — prevents cache poisoning when responses differ by origin.

Common CORS Misconfigurations

Misconfiguration	Risk
Reflecting Origin header verbatim	Any site can read authenticated responses
Origin: null in allowlist	Sandboxed iframes and data: URIs get access
Substring matching (e.g., `endsWith('example.com')`)	`attacker-example.com` bypasses
Regex without anchoring	`example.com.evil.com` bypasses
Wildcard with credentials	Browser blocks but indicates design flaw

Secure CORS Pattern

ALLOWED_ORIGINS = {"https://app.example.com", "https://admin.example.com"}

def cors_middleware(request, response):
    # The response differs by Origin, so tell caches even when no CORS headers are sent.
    response.headers["Vary"] = "Origin"
    origin = request.headers.get("Origin")
    # Exact-match set lookup: no substring or regex matching.
    if origin in ALLOWED_ORIGINS:
        response.headers["Access-Control-Allow-Origin"] = origin
        response.headers["Access-Control-Allow-Credentials"] = "true"
        response.headers["Access-Control-Allow-Methods"] = "GET, POST"
        response.headers["Access-Control-Allow-Headers"] = "Content-Type, Authorization"
        response.headers["Access-Control-Max-Age"] = "7200"
    # If origin not in allowlist: no CORS headers = browser blocks

10. WebSocket Security

Transport Security

Always use wss:// in production. Never use unencrypted ws://.
Support only RFC 6455. Disable legacy protocol versions (Hixie-76, hybi-00) with known vulnerabilities.
Disable permessage-deflate by default to prevent CRIME/BREACH-style compression side-channel attacks.

Cross-Site WebSocket Hijacking (CSWSH) Prevention

Browsers automatically include session cookies in WebSocket handshakes, enabling attackers on malicious sites to hijack authenticated connections.

Defenses:

Validate Origin header on every handshake against explicit allowlist (never blacklist, never wildcard).
Apply SameSite=Lax or SameSite=Strict cookies.
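Use token-based authentication instead of relying solely on cookies. Prefer sending the token in the first post-connection message; a query-string token can end up in server and proxy logs, so keep it short-lived and single-use if that route is unavoidable.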
Rotate tokens in long-lived connections to prevent hijacked session persistence.

Message-Level Security

Treat all WebSocket messages as untrusted input.
JSON schema validation with allowlists for message types/fields.
Binary file type verification via magic numbers (not headers).
Message size limits (typically 64KB maximum).
Nonce/timestamp inclusion to prevent replay attacks.
Use JSON.parse(), never eval().
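
The size limit, safe parsing, and allowlist checks above can be combined into one validation step. The following is a minimal sketch; the message types and field names in ALLOWED_MESSAGES are hypothetical, and replay protection and file-type checks are left out.

```python
import json

MAX_MESSAGE_BYTES = 64 * 1024  # size ceiling from the list above

# Hypothetical message types and the exact fields each one may carry.
ALLOWED_MESSAGES = {
    "chat.send": {"type", "room", "text"},
    "presence.ping": {"type"},
}


class InvalidMessage(ValueError):
    """Raised for any message that fails validation."""


def parse_message(raw: bytes) -> dict:
    """Validate one inbound WebSocket frame and return the decoded object."""
    if len(raw) > MAX_MESSAGE_BYTES:
        raise InvalidMessage("message too large")
    try:
        # json.loads only parses data; it never evaluates code.
        msg = json.loads(raw.decode("utf-8"))
    except (ValueError, RecursionError) as exc:
        raise InvalidMessage("not valid UTF-8 JSON") from exc
    if not isinstance(msg, dict):
        raise InvalidMessage("message must be a JSON object")
    msg_type = msg.get("type")
    if not isinstance(msg_type, str) or msg_type not in ALLOWED_MESSAGES:
        raise InvalidMessage("unknown message type")
    unexpected = set(msg) - ALLOWED_MESSAGES[msg_type]
    if unexpected:
        raise InvalidMessage(f"unexpected fields: {sorted(unexpected)}")
    return msg
```

Checking the size before decoding keeps the cheap test ahead of the expensive one.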

Per-Action Authorization

Connection establishment does not grant blanket access. Validate user roles and permissions before processing each message/action independently.

DoS Mitigation

Control	Recommended Baseline
Per-user connection limit	5-10 concurrent connections
Message rate limit	100 messages/minute
Max payload size	64KB (configurable per use case)
Idle timeout	Close inactive connections
Backpressure	Flow control preventing unbounded buffering
Heartbeat	Ping/pong frames detecting and cleaning dead connections

Logging

Capture: connection/termination events with user identity and origin, auth outcomes, authz failures, protocol violations.
Exclude: tokens, session IDs, message payloads containing sensitive data.

11. GraphQL Security Controls

GraphQL's flexibility creates unique attack surface compared to REST.

Query Abuse Prevention

Control	Purpose	Tools
Depth limiting	Prevent deeply nested queries causing recursive resolution	graphql-depth-limit (JS), MaxQueryDepthInstrumentation (Java)
Complexity analysis	Assign cost to field resolution, reject expensive queries	graphql-cost-analysis (JS), Apollo complexity plugins
Timeout per resolver	Prevent individual resolvers from hanging	10-second default
Pagination enforcement	Prevent unbounded list queries	Require `first`/`last` arguments
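
To illustrate what depth limiting measures, here is a deliberately naive sketch that counts nested selection sets by tracking braces. It is not a GraphQL parser: it over-counts input-object literals, does not follow fragment spreads, and can be confused by a lone quote inside a block string, so production systems should rely on AST-based tools such as those named in the table. MAX_DEPTH and the function names are assumptions.

```python
MAX_DEPTH = 5  # hypothetical limit; tune per schema


def selection_depth(query: str) -> int:
    """Return the deepest `{ ... }` nesting, ignoring strings and # comments."""
    depth = deepest = 0
    in_string = False
    i = 0
    while i < len(query):
        ch = query[i]
        if in_string:
            if ch == "\\":
                i += 1  # skip the escaped character
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "#":
            # A comment runs to the end of the line.
            newline = query.find("\n", i)
            i = len(query) if newline == -1 else newline
        elif ch == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == "}":
            depth -= 1
        i += 1
    return deepest


def enforce_depth(query: str) -> None:
    """Reject the query before any resolver runs if it nests too deeply."""
    if selection_depth(query) > MAX_DEPTH:
        raise ValueError("query exceeds maximum depth")
```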

Schema Exposure Controls

Disable introspection in production to prevent schema reconnaissance.
Disable "Did you mean?" suggestions (leaks field names even with introspection off).
Field visibility middleware for role-based schema exposure (different roles see different schema subsets).

Authorization Architecture

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Gateway    │────▶│   Resolver   │────▶│  Data Layer  │
│  Auth Check  │     │  RBAC Check  │     │  Row-Level   │
│  (identity)  │     │  (field-lvl) │     │  Security    │
└──────────────┘     └──────────────┘     └──────────────┘

Validate authorization on both graph edges AND nodes.
Implement checks within Query/Mutation resolvers using RBAC middleware.
Prevent IDOR by verifying caller permissions before data access (especially for direct ID-based lookups).
Use GraphQL Interfaces and Unions to return different object shapes based on requester privileges.

Batching Attack Mitigation

GraphQL allows multiple queries in a single HTTP request, bypassing per-request rate limits:

Object-level rate limiting: track per-caller object requests across batches.
Sensitive field protection: prevent batching for usernames, emails, OTPs, session tokens.
Operation throttling: limit concurrent queries per request (e.g., max 5 operations per batch).

Persisted Queries

Pre-approve query strings at deployment time. Clients send a query hash instead of arbitrary query text. When the server rejects anything not on the allowlist, this removes arbitrary query execution and sharply limits batching abuse and query injection.
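
A persisted-query allowlist can be as simple as a hash-to-text map built at deploy time. The sketch below assumes SHA-256 identifiers and a request body with an "id" field; both are illustrative choices rather than a specific framework's wire format.

```python
import hashlib


def query_id(query: str) -> str:
    """Identifier the client sends in place of the query text."""
    return hashlib.sha256(query.encode("utf-8")).hexdigest()


class PersistedQueryStore:
    """Allowlist of queries approved at deployment time."""

    def __init__(self, approved_queries):
        self._by_id = {query_id(q): q for q in approved_queries}

    def resolve(self, request_body: dict) -> str:
        """Return the approved query text, or refuse the request."""
        if "query" in request_body:
            raise PermissionError("arbitrary query text is not accepted")
        try:
            return self._by_id[request_body.get("id")]
        except (KeyError, TypeError):
            raise PermissionError("unknown persisted query") from None
```

Refusing any body that carries raw query text is what separates a strict allowlist from a cache that merely speeds up known queries.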

Input Validation

Enforce allowlisting via GraphQL scalars, enums, and custom validators.
Define input schemas for all mutations.
Use parameterized queries/ORMs in resolvers (never string concatenation).
Disable dynamic resolver targeting to prevent SSRF/command injection.

12. XML and Serialization Security

XXE Prevention

XXE (XML External Entity) attacks exploit parser features to read files, perform SSRF, or cause DoS.

Universal defense: disable external entity processing entirely in parser configuration.

# Python (defusedxml)
import defusedxml.ElementTree as ET
tree = ET.parse(source)  # XXE-safe by default

// Java (DocumentBuilderFactory)
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);

XML Bomb Prevention

Attack	Mechanism	Defense
Billion Laughs	Exponential entity nesting (entities that expand to other entities)	Entity expansion limits, depth restrictions
Quadratic Blowup	Large entity referenced repeatedly (O(n^2) expansion)	Entity size limits
Recursive References	Circular entity definitions	Recursion depth limits

Schema Hardening

Use XML Schema (XSD), not DTD, for validation.
Set maxOccurs boundaries (never unbounded without testing).
Use precise types: positiveInteger not integer, decimal not float/double (prevents Infinity/NaN).
Apply maxLength, minLength, pattern restrictions on strings.
Enumerate allowed values where possible.

Schema Poisoning Defense

Embed schemas with integrity verification (don't fetch remotely at runtime).
Restrict file permissions on local schema/DTD files.
If remote schemas needed, use HTTPS only, maintain local copies, verify integrity.

General Serialization Security

Reject DTDs entirely (SOAP specification forbids them).
Validate document well-formedness before processing.
Set resource limits: document size, element count, nesting depth.
Avoid disclosing internal paths in error messages.
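
The DTD rejection and resource limits above can be sketched with the standard-library expat bindings. This is a pre-check under assumed limits (the three MAX_ constants are hypothetical), not a replacement for a hardened parser such as defusedxml.

```python
import xml.parsers.expat

MAX_DOCUMENT_BYTES = 1_000_000  # hypothetical limits; tune per use case
MAX_DEPTH = 64
MAX_ELEMENTS = 10_000


class UnsafeXML(ValueError):
    """Raised when a document violates the acceptance policy."""


def check_xml(data: bytes) -> None:
    """Reject DTDs, oversized, overly deep, or malformed documents."""
    if len(data) > MAX_DOCUMENT_BYTES:
        raise UnsafeXML("document too large")
    state = {"depth": 0, "elements": 0}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Fires at the start of <!DOCTYPE ...>, before entities are declared.
        raise UnsafeXML("DTDs are not accepted")

    def on_start(name, attrs):
        state["depth"] += 1
        state["elements"] += 1
        if state["depth"] > MAX_DEPTH:
            raise UnsafeXML("nesting too deep")
        if state["elements"] > MAX_ELEMENTS:
            raise UnsafeXML("too many elements")

    def on_end(name):
        state["depth"] -= 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    try:
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as exc:
        raise UnsafeXML("document is not well-formed") from exc
```

The error message for malformed input is kept generic so that parser details and internal paths are not disclosed to the caller.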

13. SSRF Prevention Architecture

Defense-in-Depth: Application + Network Layers

┌─────────────────────────────────────┐
│  APPLICATION LAYER                  │
│  ├── Input validation (reject URLs) │
│  ├── IP/domain allowlisting         │
│  ├── DNS rebinding prevention       │
│  └── URL scheme restriction         │
├─────────────────────────────────────┤
│  NETWORK LAYER                      │
│  ├── Firewall egress filtering      │
│  ├── Network segmentation           │
│  └── Cloud metadata protection      │
└─────────────────────────────────────┘

Application Layer Controls

Rule 1: Never accept complete URLs from users. URLs are difficult to validate and parsers can be abused. Accept only validated IP addresses or domain names.

IP validation:

Validate format using language-specific libraries (not regex).
Cross-reference against allowlist of trusted IPs (both IPv4 and IPv6).
Use validated library output as comparison baseline to prevent encoding bypasses.

Domain validation:

Validate format without performing DNS resolution.
Maintain allowlist of trusted domains.
Monitor DNS records to detect resolution to non-public IP ranges.

Deny-list minimums (when allowlisting not possible):

AWS IMDS: 169.254.169.254, fd00:ec2::254
Localhost: 127.0.0.0/8, ::1/128
RFC1918: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
Link-local: 169.254.0.0/16
Multicast: 224.0.0.0/4
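
The deny-list can be checked with Python's ipaddress module, which also covers the earlier advice to compare against parsed library output rather than raw strings. A minimal sketch, assuming the ranges listed above are the whole policy; is_blocked is a hypothetical helper name.

```python
import ipaddress

# Ranges from the deny-list above, parsed once at import time.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "127.0.0.0/8", "::1/128",                         # localhost
        "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # RFC1918
        "169.254.0.0/16",                                 # link-local, includes AWS IMDS
        "224.0.0.0/4",                                    # multicast
        "fd00:ec2::254/128",                              # AWS IMDS over IPv6
    )
]


def is_blocked(address: str) -> bool:
    """Return True if the address is unparseable or falls in a blocked range."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return True  # fail closed on anything the library cannot parse
    # Treat IPv4-mapped IPv6 (::ffff:a.b.c.d) as the IPv4 address it wraps.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return any(ip.version == net.version and ip in net for net in BLOCKED_NETWORKS)
```

Failing closed means alternative encodings that the strict parser rejects, such as a bare decimal integer, are refused rather than passed through.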

Network Layer Controls

Restrict outbound application access via host-based or network firewalls to only legitimate routes.
Network compartmentalization to block illegitimate calls at infrastructure level.
Disable HTTP redirect following to prevent validation bypass.

Cloud Metadata Protection

Migrate from IMDSv1 to IMDSv2 (AWS) as defense-in-depth. IMDSv2 requires a session token obtained via PUT request, which SSRF attacks cannot easily replicate.

DNS Rebinding Prevention

Resolve domains against internal DNS resolvers only.
Retrieve all A and AAAA records and reject the domain if any resolved IP falls in a private, loopback, or link-local range.
Monitor allowlisted domains for resolution changes to non-public addresses.
Pin DNS resolution results (use the resolved IP, not the hostname, for the actual request).
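
Resolve-validate-pin can be sketched as a single function that returns the IP to connect to. This is an illustration under assumptions: resolve_and_pin and the injectable resolver parameter are hypothetical, and the check uses ipaddress's is_global property as a stand-in for a full range policy.

```python
import ipaddress
import socket


def default_resolver(hostname: str) -> list:
    """Return every address (A and AAAA) the system resolver reports."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def resolve_and_pin(hostname: str, resolver=default_resolver) -> str:
    """Resolve once, validate every record, and return one IP to connect to.

    The caller should open the connection to the returned IP rather than
    the hostname, so a second DNS lookup cannot swap in an internal address.
    """
    addresses = resolver(hostname)
    if not addresses:
        raise ValueError("hostname did not resolve")
    for address in addresses:
        if not ipaddress.ip_address(address).is_global:
            raise ValueError(f"{hostname} resolves to non-public address {address}")
    return addresses[0]
```

When connecting to the pinned IP over HTTPS, the original hostname still has to be supplied for the Host header and certificate verification; that wiring is omitted here.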

14. Rate Limiting Architecture

Multi-Layer Rate Limiting

┌──────────────────────────────────────────┐
│  EDGE (CDN/WAF/Load Balancer)            │
│  ├── Per-IP rate limits                  │
│  ├── Geographic filtering                │
│  ├── Volumetric DDoS mitigation          │
│  └── Connection rate limits              │
├──────────────────────────────────────────┤
│  API GATEWAY                             │
│  ├── Per-user/API-key rate limits        │
│  ├── Per-endpoint rate limits            │
│  ├── Request size limits                 │
│  └── Concurrent connection limits        │
├──────────────────────────────────────────┤
│  APPLICATION                             │
│  ├── Per-operation rate limits           │
│  ├── Resource-specific throttling        │
│  ├── Business logic rate limits          │
│  └── Object-level rate limits (GraphQL)  │
├──────────────────────────────────────────┤
│  DATABASE                                │
│  ├── Connection pool limits              │
│  ├── Query timeout limits                │
│  └── Transaction timeout limits          │
└──────────────────────────────────────────┘

Rate Limiting Algorithms

Algorithm	Behavior	Use Case
Token Bucket	Steady refill rate, burst allowed up to bucket size	API rate limiting (most common)
Leaky Bucket	Fixed drain rate, excess dropped	Smoothing bursty traffic
Fixed Window	Counter resets at interval boundary	Simple per-minute/hour limits
Sliding Window Log	Precise per-request timestamp tracking	High-accuracy rate limiting
Sliding Window Counter	Weighted average of current and previous window	Balance of accuracy and performance
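
As an illustration of the first row, a token bucket fits in a small class. This is a single-process sketch; the injectable clock parameter is an assumption added for testability, and a shared deployment would need the state in a common store.

```python
import time


class TokenBucket:
    """Token bucket: refills at `rate` tokens/second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start full so an initial burst is allowed
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; otherwise reject the request."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the elapsed time, never beyond the bucket size.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

A typical use keeps one bucket per caller, for example a dictionary keyed by API key, and answers 429 when allow() returns False.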

Slow HTTP Attack Defense

Define minimum ingress data rate limit; drop connections below that rate (counters Slowloris, Slow POST).
Absolute connection timeouts (not just idle timeouts).
Maximum request header size and body size limits.
Total concurrent connection limits per client IP.

DoS Resilience Patterns

Validation ordering: perform cheap checks (format, size) before expensive ones (database, crypto).
Authentication gating: require authentication before allowing access to resource-intensive operations.
Graceful degradation: maintain reduced functionality rather than complete failure.
Static resource separation: host images, scripts, CSS on separate domains/CDN.
Caching: serve cached responses for repeat requests.
Asynchronous processing: use queues for CPU-intensive operations; return 202 Accepted.

15. Circuit Breaker and Resilience Patterns

Circuit Breaker Pattern

Prevents cascading failures when downstream services degrade.

            ┌──────────────┐
  Request ──┤   CLOSED     │──── Forward to service
            │  (normal)    │
            └──────┬───────┘
                   │ failure threshold exceeded
            ┌──────▼───────┐
            │    OPEN      │──── Return fallback/error immediately
            │  (tripped)   │     (no request forwarded)
            └──────┬───────┘
                   │ timeout expires
            ┌──────▼───────┐
            │  HALF-OPEN   │──── Forward limited probe requests
            │  (testing)   │     Success → CLOSED
            └──────────────┘     Failure → OPEN

Configuration Parameters

Parameter	Description	Typical Value
Failure threshold	Errors before opening	5-10 failures in 60s
Timeout	Time in OPEN before probing	30-60 seconds
Success threshold	Successes in HALF-OPEN to close	3-5 consecutive
Monitoring window	Rolling window for failure counting	60 seconds
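
The three states and four parameters map onto a small state machine. The sketch below is single-threaded and lets every call through while HALF-OPEN, whereas a production breaker would cap concurrent probes; the class and parameter names are assumptions.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the downstream service while the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, window=60.0, open_timeout=30.0,
                 success_threshold=3, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.window = window
        self.open_timeout = open_timeout
        self.success_threshold = success_threshold
        self._clock = clock
        self.state = "CLOSED"
        self._failures = []   # timestamps of recent failures
        self._opened_at = 0.0
        self._successes = 0   # consecutive successes while HALF_OPEN

    def call(self, func, *args, **kwargs):
        now = self._clock()
        if self.state == "OPEN":
            if now - self._opened_at < self.open_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            self.state, self._successes = "HALF_OPEN", 0
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure(now)
            raise
        if self.state == "HALF_OPEN":
            self._successes += 1
            if self._successes >= self.success_threshold:
                self.state, self._failures = "CLOSED", []
        return result

    def _on_failure(self, now):
        # Keep only failures inside the rolling window, then add this one.
        self._failures = [t for t in self._failures if now - t < self.window] + [now]
        if self.state == "HALF_OPEN" or len(self._failures) >= self.failure_threshold:
            self.state, self._opened_at, self._failures = "OPEN", now, []
```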

Bulkhead Pattern

Isolate failures to prevent resource exhaustion across the entire system:

Separate thread pools per downstream dependency.
Separate connection pools per service.
Resource quotas per tenant/customer.
Namespace isolation in K8s (CPU/memory quotas per namespace).

Retry with Backoff

Attempt 1: immediate
Attempt 2: wait 1s + jitter
Attempt 3: wait 2s + jitter
Attempt 4: wait 4s + jitter
(cap at max backoff, e.g., 30s)

Always add random jitter to prevent thundering herd.
Set maximum retry count (3-5 typically).
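Only retry on transient failures (5xx, timeouts) and only for idempotent operations. Do not retry 4xx responses, except 408 and 429 (honoring Retry-After when present).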
Combine with circuit breaker: when circuit opens, stop retrying.
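
The schedule above can be expressed as a generator plus a small retry wrapper. This is a sketch: the jitter fraction (up to 50% of the delay), the TransientError class, and the injectable sleep and rng parameters are assumptions made for illustration.

```python
import random
import time


class TransientError(Exception):
    """Stand-in for a retryable failure such as a 5xx response or a timeout."""


def backoff_delays(max_attempts=4, base=1.0, cap=30.0, jitter=0.5, rng=random.random):
    """Yield the wait before each attempt: 0, then base * 2^k plus jitter, capped."""
    for attempt in range(max_attempts):
        if attempt == 0:
            yield 0.0
            continue
        delay = min(cap, base * 2 ** (attempt - 1))
        yield min(cap, delay + rng() * jitter * delay)


def call_with_retry(func, max_attempts=4, sleep=time.sleep, **backoff_kwargs):
    """Call func, retrying only on TransientError, up to max_attempts times."""
    last_error = None
    for delay in backoff_delays(max_attempts, **backoff_kwargs):
        if delay:
            sleep(delay)
        try:
            return func()
        except TransientError as exc:
            last_error = exc
    raise last_error
```

Any exception other than TransientError propagates immediately, which is how non-retryable failures skip the loop.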

Timeout Hierarchy

Client timeout > Gateway timeout > Service timeout > DB timeout
     30s              15s              10s             5s

Each layer's timeout must be shorter than its caller's to prevent zombie connections.

16. Race Condition Defense

TOCTOU (Time-of-Check-to-Time-of-Use)

The classic pattern: a resource is checked for a condition, then used based on that check, but the resource changes between check and use.

Thread A: check(balance >= 100)     → true
Thread B: check(balance >= 100)     → true
Thread A: debit(100)                → balance = 0
Thread B: debit(100)                → balance = -100  ← RACE

Defense Patterns

Pattern	Mechanism	Use Case
Pessimistic locking	`SELECT ... FOR UPDATE` acquires row lock before read	Financial transactions, inventory
Optimistic locking	Version column; `UPDATE ... WHERE version = N` fails if concurrent modification	Low-contention scenarios
Atomic operations	`UPDATE accounts SET balance = balance - 100 WHERE id = :id AND balance >= 100` (check + modify in single statement)	Simple counter/balance operations
Database constraints	`CHECK (balance >= 0)` enforced at DB level	Invariant enforcement
Idempotency keys	Client-generated unique key per operation; server rejects duplicates	Payment processing, API mutations
Serializable isolation	`SET TRANSACTION ISOLATION LEVEL SERIALIZABLE`	Highest consistency requirement
Mutex/advisory locks	`pg_advisory_lock(key)` or application-level mutex	Cross-table consistency
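
The atomic-operation and constraint rows can be demonstrated with the standard-library sqlite3 module. This is a minimal sketch; the accounts table and the helper names are hypothetical, and a real system would run against its production database engine.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one account holding 100 units."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " id INTEGER PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"  # invariant enforced by the DB
    )
    conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")
    conn.commit()
    return conn


def debit(conn: sqlite3.Connection, account_id: int, amount: int) -> bool:
    """Debit atomically: the balance check and the update are one statement."""
    cursor = conn.execute(
        "UPDATE accounts SET balance = balance - ? WHERE id = ? AND balance >= ?",
        (amount, account_id, amount),
    )
    conn.commit()
    # rowcount is 1 when the condition held and the row changed, otherwise 0.
    return cursor.rowcount == 1
```

Because no separate read precedes the write, two concurrent debits cannot both pass the check; the second simply matches zero rows.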

Idempotency Pattern

Client: POST /payments {idempotency_key: "abc-123", amount: 100}

Server:
  1. Check idempotency_key in store
  2. If exists: return cached response (no re-execution)
  3. If not: execute, store result keyed by idempotency_key, return

Keys should expire after reasonable window (24-48 hours).
Store both request hash and response to detect parameter tampering.
Use database unique constraint on idempotency key for atomicity.
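
The server-side steps might look like the following, using a unique key column so that claiming a key is atomic. This is a sketch under assumptions: the table layout, handle_payment, and the execute callback are hypothetical, and key expiry is omitted.

```python
import hashlib
import json
import sqlite3


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE idempotency ("
        " key TEXT PRIMARY KEY,"          # unique constraint makes the claim atomic
        " request_hash TEXT NOT NULL,"
        " response TEXT)"
    )
    return conn


def handle_payment(conn, key: str, payload: dict, execute):
    """Run execute(payload) at most once per idempotency key."""
    request_hash = hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode("utf-8")
    ).hexdigest()
    try:
        conn.execute(
            "INSERT INTO idempotency (key, request_hash) VALUES (?, ?)",
            (key, request_hash),
        )
    except sqlite3.IntegrityError:
        stored_hash, response = conn.execute(
            "SELECT request_hash, response FROM idempotency WHERE key = ?", (key,)
        ).fetchone()
        if stored_hash != request_hash:
            raise ValueError("idempotency key reused with different parameters")
        if response is None:
            raise RuntimeError("original request is still in progress")
        return json.loads(response)  # replay the stored response, no re-execution
    try:
        response = execute(payload)
    except Exception:
        conn.rollback()  # release the key so the client can retry
        raise
    conn.execute(
        "UPDATE idempotency SET response = ? WHERE key = ?",
        (json.dumps(response), key),
    )
    conn.commit()
    return response
```

Comparing the stored request hash on replay is what catches a client that reuses a key with a different amount.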

Distributed Systems Considerations

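Redis SET key value NX PX <ttl> (set only if the key does not exist, with expiry in one atomic command) for distributed locks with TTL. A plain SETNX followed by a separate EXPIRE is not atomic and can leave a lock without a TTL.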
Redlock algorithm for fault-tolerant distributed locking across multiple Redis instances.
Database-level locking preferred over application-level when possible (closer to the data, harder to bypass).
Event sourcing: an append-only log avoids in-place update races; concurrent appends to the same stream still need an expected-version check.

17. Secure Multi-Tenant Design

Isolation Models

Strongest ─────────────────────────────────────── Weakest
   │                                                │
   ▼                                                ▼
Separate         Separate       Shared Infra,    Shared
Infrastructure   Namespaces/    Separate DB/     Everything
(per tenant)     VPCs           Schema           (row-level)

Model	Isolation	Cost	Complexity	Use Case
Separate infrastructure	Highest	Highest	Moderate	Regulated industries, government
Separate namespaces/VPCs	High	High	High	Enterprise SaaS
Shared infra, separate DB/schema	Medium	Medium	Medium	Standard SaaS
Shared everything (row-level)	Lowest	Lowest	Low	Consumer apps, cost-sensitive

Kubernetes Multi-Tenant Pattern

┌─────────────────────────────────────────┐
│  Cluster                                │
│  ┌─────────────────────────────────────┐│
│  │  Namespace: tenant-a               ││
│  │  ├── ResourceQuota (4 pods, 2 CPU) ││
│  │  ├── NetworkPolicy (deny all       ││
│  │  │   cross-namespace)              ││
│  │  ├── Pod Security: restricted      ││
│  │  └── RBAC: tenant-a-role           ││
│  └─────────────────────────────────────┘│
│  ┌─────────────────────────────────────┐│
│  │  Namespace: tenant-b               ││
│  │  ├── ResourceQuota (4 pods, 2 CPU) ││
│  │  ├── NetworkPolicy (deny all       ││
│  │  │   cross-namespace)              ││
│  │  ├── Pod Security: restricted      ││
│  │  └── RBAC: tenant-b-role           ││
│  └─────────────────────────────────────┘│
└─────────────────────────────────────────┘

Cross-Cutting Multi-Tenant Controls

Layer	Control	Purpose
Identity	Tenant context in JWT claims	Every request carries tenant identity
API Gateway	Tenant-aware rate limiting	Per-tenant quotas prevent noisy neighbor
Application	Tenant filter on all queries	Prevent cross-tenant data access
Database	Row-level security policies	DB-enforced tenant isolation
Storage	Tenant-prefixed object keys + IAM	Prevent cross-tenant storage access
Encryption	Per-tenant encryption keys	Cryptographic isolation of data
Logging	Tenant ID in all log entries	Tenant-scoped audit trails
Network	Namespace/VPC isolation	Network-level blast radius containment

Noisy Neighbor Prevention

Per-tenant resource quotas (CPU, memory, storage, API calls).
Per-tenant connection pool limits to shared databases.
Per-tenant queue depth limits for async processing.
Circuit breakers per tenant: if one tenant causes excessive errors, isolate them without affecting others.
Fair scheduling: weighted round-robin or priority queues preventing any single tenant from monopolizing shared resources.

Data Isolation Verification

Automated tests that attempt cross-tenant data access (should fail).
SQL query audit: every query touching tenant data must include tenant filter (static analysis or query interceptor).
Penetration testing specifically targeting tenant boundary bypass (IDOR, parameter tampering, JWT manipulation).

18. CSS Security

Attack Surface

Reconnaissance via CSS selectors: descriptive class names (.addUser, .deleteUser, .adminPanel) reveal application features to unauthenticated attackers examining global CSS files.
CSS injection: if attacker-controlled content enters stylesheets, it can enable data exfiltration via background-image URLs, clickjacking via element repositioning, and UI redressing.
Third-party stylesheet risk: externally hosted CSS can be modified to inject malicious styles.

Defenses

Role-based CSS isolation: segregate stylesheets by access level. Server-side access controls on CSS file delivery. Log suspicious CSS file access.
CSS obfuscation: replace descriptive selectors with generated names using CSS Modules, JSS (minify option), or build-time obfuscation. Use framework classes (Bootstrap, Tailwind) to reduce custom selectors.
CSP for styles: style-src 'self' or style-src 'nonce-{RANDOM}' to prevent inline style injection and restrict stylesheet sources.
Subresource Integrity (SRI): <link rel="stylesheet" href="..." integrity="sha256-..." crossorigin="anonymous"> for third-party stylesheets.
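
The integrity value is the base64-encoded digest of the exact file bytes, prefixed with the algorithm name. A minimal sketch of computing it at build time follows; the helper names are hypothetical and sha384 is chosen here only as a common default.

```python
import base64
import hashlib
import html

SRI_ALGORITHMS = ("sha256", "sha384", "sha512")


def sri_hash(content: bytes, algorithm: str = "sha384") -> str:
    """Return an integrity attribute value for the given file bytes."""
    if algorithm not in SRI_ALGORITHMS:
        raise ValueError(f"unsupported SRI algorithm: {algorithm}")
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def stylesheet_tag(href: str, content: bytes) -> str:
    """Build a <link> tag pinned to the exact bytes of a third-party stylesheet."""
    return (
        f'<link rel="stylesheet" href="{html.escape(href, quote=True)}" '
        f'integrity="{sri_hash(content)}" crossorigin="anonymous">'
    )
```

If the hosted file changes by even one byte, the hash no longer matches and the browser refuses to apply the stylesheet.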

19. Architectural Decision Framework

Security Architecture Review Checklist

When designing or reviewing a system, evaluate each area:

□ Authentication
  ├── How are users/services identified?
  ├── Token lifecycle (issuance, validation, revocation, rotation)?
  └── MFA requirements?

□ Authorization
  ├── Where are access decisions made (gateway, service, DB)?
  ├── RBAC vs ABAC vs ReBAC?
  └── Least privilege verification?

□ Network Security
  ├── Trust boundaries identified?
  ├── East-west encryption (mTLS)?
  ├── Egress filtering?
  └── Microsegmentation?

□ Data Protection
  ├── Encryption at rest and in transit?
  ├── Key management (rotation, access)?
  ├── Data classification applied?
  └── PII handling (GDPR Art. 25 Privacy by Design)?

□ Input Handling
  ├── Validation at every trust boundary?
  ├── Serialization security (XXE, deserialization)?
  └── File upload controls?

□ Resilience
  ├── Rate limiting at multiple layers?
  ├── Circuit breakers for downstream dependencies?
  ├── Timeout hierarchy (caller > callee)?
  ├── Graceful degradation plan?
  └── Resource limits (CPU, memory, connections)?

□ Observability
  ├── Security-relevant log coverage?
  ├── Correlation IDs across services?
  ├── Alerting on auth failures, policy violations?
  └── Audit trail for sensitive operations?

□ Supply Chain
  ├── Dependency scanning in CI/CD?
  ├── Image signing and verification?
  ├── SBOM generation?
  └── Third-party resource integrity (SRI)?

□ Multi-Tenancy (if applicable)
  ├── Isolation model chosen and justified?
  ├── Tenant context propagation?
  ├── Cross-tenant access testing?
  └── Noisy neighbor prevention?

□ Blast Radius
  ├── What does compromise of component X give the attacker?
  ├── Can lateral movement be contained?
  ├── Are secrets scoped to minimum necessary?
  └── Is there a kill switch for compromised components?

Threat Modeling Integration

Every architecture decision should be validated through STRIDE analysis:

Threat	Question
Spoofing	Can an attacker impersonate a user or service?
Tampering	Can data be modified in transit or at rest?
Repudiation	Can actions be denied without audit evidence?
Information Disclosure	What data leaks on compromise of each component?
Denial of Service	What happens under load or resource exhaustion?
Elevation of Privilege	Can a low-privilege actor escalate?

Map each finding to mitigations, owners, and implementation status. Prioritize by blast radius and exploitability.

Summary: The Principal Architect's Mental Model

Security architecture is not a checklist bolted onto a design. It is a set of constraints that shape the design from inception:

Identity is the perimeter — network location grants nothing; every request proves identity.
Every boundary validates — gateway, service, database each enforce independently.
Blast radius drives topology — segment by damage potential, not by convenience.
Resilience is security — DoS, cascading failure, and resource exhaustion are attack vectors.
Observability enables defense — you cannot defend what you cannot see.
Least privilege is not optional — default-deny at every layer, justify every permission.
Assume breach, design for containment — the question is not "if" but "when" and "how far."
