Security Architecture Design Patterns — Deep Dive
Principal-level reference for defense-in-depth, zero trust, microservices security, database hardening, browser security policies, protocol-specific controls, rate limiting, circuit breakers, and multi-tenant isolation.
Sources: OWASP Cheat Sheet Series (Microservices, Database, Docker, Kubernetes, SSRF, DoS, CSP, GraphQL, WebSocket, XML, CSS, Race Conditions), Google Cloud Security Foundations Blueprint.
Table of Contents
- Defense-in-Depth Patterns
- Zero Trust Network Architecture
- Microservices Security Architecture
- Service Mesh and mTLS
- API Gateway Security
- Database Security Architecture
- Container Security (Docker)
- Kubernetes Security Architecture
- CSP, CORS, and Same-Origin Policy
- WebSocket Security
- GraphQL Security Controls
- XML and Serialization Security
- SSRF Prevention Architecture
- Rate Limiting Architecture
- Circuit Breaker and Resilience Patterns
- Race Condition Defense
- Secure Multi-Tenant Design
- CSS Security
- Architectural Decision Framework
1. Defense-in-Depth Patterns
Defense-in-depth layers independent security controls so that failure of any single layer does not compromise the system. Each layer operates under the assumption that all other layers have already been breached.
The Layered Model
┌─────────────────────────────────────────────┐
│ LAYER 7: DATA        encryption at rest,    │
│                      field-level encryption,│
│                      tokenization, masking  │
├─────────────────────────────────────────────┤
│ LAYER 6: APPLICATION input validation, CSP, │
│                      auth/authz, WAF        │
├─────────────────────────────────────────────┤
│ LAYER 5: RUNTIME     containers, seccomp,   │
│                      AppArmor, sandboxing   │
├─────────────────────────────────────────────┤
│ LAYER 4: HOST        OS hardening, patching,│
│                      EDR, auditd            │
├─────────────────────────────────────────────┤
│ LAYER 3: NETWORK     segmentation, NACLs,   │
│                      IDS/IPS, mTLS          │
├─────────────────────────────────────────────┤
│ LAYER 2: IDENTITY    MFA, SSO, RBAC,        │
│                      least privilege        │
├─────────────────────────────────────────────┤
│ LAYER 1: PHYSICAL    datacenter security,   │
│                      HSMs, secure boot      │
└─────────────────────────────────────────────┘
Three Control Types (Google Cloud Blueprint Model)
- Policy Controls — Programmatic constraints that enforce acceptable resource configurations. Prevent risky setups through infrastructure-as-code validation and organization policy constraints before deployment.
- Architecture Controls — Resource configuration based on security best practices: network topology, resource hierarchy, blast radius containment.
- Detective Controls — Anomaly detection, log aggregation, threat detection services, SIEM integration, custom enforcement.
Principles
- Assume breach: design every layer as if the attacker already has a foothold in the adjacent layer.
- Independent failure domains: a control at layer N must not depend on layer N-1 being intact.
- Validation ordering: perform cheap validations (format, size, type) before expensive ones (database lookups, crypto operations).
- No security theater: every control must measurably reduce risk. Call out controls that create illusion without substance.
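The validation-ordering principle can be sketched as below. This is a minimal illustration rather than a prescribed implementation: `MAX_BODY_BYTES`, `validate_request`, the `user_id` field, and the injected `expensive_check` callable are hypothetical names chosen for the example.

```python
import json
from typing import Callable

MAX_BODY_BYTES = 64 * 1024  # assumed limit, chosen only for illustration


def validate_request(body: bytes, expensive_check: Callable[[dict], bool]) -> tuple[bool, str]:
    """Run cheap checks first; call the expensive check only if they all pass."""
    # 1. Size: constant-time check, no parsing required.
    if len(body) > MAX_BODY_BYTES:
        return False, "too_large"
    # 2. Format: parse once and reject malformed input.
    try:
        payload = json.loads(body)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return False, "malformed"
    # 3. Type: structural check on the parsed value.
    if not isinstance(payload, dict) or not isinstance(payload.get("user_id"), str):
        return False, "bad_type"
    # 4. Expensive: database lookup, signature verification, and so on.
    if not expensive_check(payload):
        return False, "rejected"
    return True, "ok"
```

Because the expensive callable runs last, oversized or malformed requests are rejected without touching the database or doing any cryptographic work.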
2. Zero Trust Network Architecture
Zero trust eliminates implicit trust based on network location. Every request is authenticated, authorized, and encrypted regardless of origin.
Core Tenets
| Principle | Implementation |
|---|---|
| Never trust, always verify | Every service-to-service call carries verifiable identity |
| Least privilege access | RBAC/ABAC with just-in-time elevation, time-bounded tokens |
| Assume breach | Microsegmentation limits blast radius; east-west traffic encrypted |
| Verify explicitly | Context-aware access: identity + device + location + behavior |
| Continuous validation | Session re-validation at intervals; token refresh with short TTL |
Network Architecture Pattern
                      ┌─────────────┐
User ──── Identity ───┤   Policy    │
Device    Provider    │  Decision   │
Context               │    Point    │
                      └──────┬──────┘
                             │ allow/deny
                      ┌──────▼──────┐
                      │   Policy    │
                      │ Enforcement │
                      │    Point    │
                      └──────┬──────┘
                             │ mTLS
                  ┌──────────┼──────────┐
                  ▼          ▼          ▼
              ┌────────┐ ┌────────┐ ┌────────┐
              │Service │ │Service │ │Service │
              │   A    │ │   B    │ │   C    │
              └────────┘ └────────┘ └────────┘
Google Cloud Blueprint Implementation
- No public internet access by default: no outbound or inbound traffic to/from public internet permitted unless explicitly allowed.
- Shared VPC: centralized network resource management across regions/zones with environment separation by network topology.
- Private paths enforced: all on-premises and cloud resource communication over private interconnects.
- GitOps model: all infrastructure changes through version-controlled, reviewed Terraform with policy-as-code validation in CI/CD pipeline before deployment.
Microsegmentation
- Segment by workload, not by network subnet. Each workload gets its own identity.
- Network policies (K8s) or security groups (cloud) restrict east-west traffic to explicit allow rules.
- Default-deny posture: all traffic blocked unless a policy explicitly permits it.
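A default-deny posture keyed on workload identity can be modeled in a few lines. The sketch below is a toy in-memory evaluator, not a Kubernetes NetworkPolicy or cloud security-group API; `AllowRule`, `DefaultDenyPolicy`, and the workload names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AllowRule:
    source: str       # workload identity of the caller
    destination: str  # workload identity of the callee
    port: int


class DefaultDenyPolicy:
    """East-west traffic is permitted only when an explicit rule matches."""

    def __init__(self, rules):
        self._rules = frozenset(rules)

    def is_allowed(self, source: str, destination: str, port: int) -> bool:
        # There is no "deny" rule type: anything not listed is denied.
        return AllowRule(source, destination, port) in self._rules


policy = DefaultDenyPolicy([
    AllowRule("checkout", "payments", 8443),
    AllowRule("payments", "ledger-db", 5432),
])
```

Note that rules are expressed between workload identities, not subnets, so moving a workload to another network segment does not change what it may reach.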
3. Microservices Security Architecture
Authorization Layers
Authorization enforcement must occur at three independent layers:
- Gateway/Proxy — Coarse-grained, cross-cutting decisions (authentication, basic role checks, rate limiting).
- Microservice Layer — Shared libraries or sidecar proxies for fine-grained policy enforcement. Centralized policy with embedded Policy Decision Point (PDP) is recommended.
- Business Logic — Service-specific authorization that understands domain context.
Authorization Patterns
| Pattern | Description | Trade-offs |
|---|---|---|
| Decentralized | Policy embedded in service code | Independent but inconsistent; requires code changes for policy updates |
| Centralized Single PDP | Remote policy service evaluates all requests | Consistent but introduces latency and availability risk |
| Centralized Embedded PDP | Policy defined centrally, deployed as library/sidecar | Best of both: consistent policy, low latency, no external dependency at runtime |
Netflix pattern: Policy Portal (authoring) -> Repository (storage) -> Aggregator (compilation) -> Distributor (deployment to sidecars).
Identity Propagation
Recommended: Trusted Issuer-Signed Structures
Edge services authenticate external tokens (OAuth2, OIDC), then mint internally-signed identity structures (e.g., Netflix "Passport"). This approach:
- Decouples external tokens from internal representations
- Uses single, extensible data structures
- Never exposes internal structures externally
- Is external access token agnostic
Anti-pattern: Passing raw external tokens between internal services. This creates tight coupling and risks privilege escalation through token manipulation.
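The edge-minting step can be sketched as follows. This is an assumption-laden illustration, not Netflix's actual Passport format: it signs with HMAC and a shared demo key for brevity (a real deployment would more likely use asymmetric signatures and a managed key), and `mint_internal_identity`, `verify_internal_identity`, and the claim names are hypothetical.

```python
import base64
import hashlib
import hmac
import json

INTERNAL_KEY = b"demo-only-key"  # hypothetical; load from a secrets manager in practice


def mint_internal_identity(verified_claims: dict, key: bytes = INTERNAL_KEY) -> str:
    """Called at the edge AFTER the external token has been validated."""
    # Copy only what internal services need; the external token is not forwarded.
    internal = {
        "sub": verified_claims["sub"],
        "roles": sorted(verified_claims.get("roles", [])),
    }
    body = base64.urlsafe_b64encode(json.dumps(internal, sort_keys=True).encode())
    signature = hmac.new(key, body, hashlib.sha256).hexdigest()
    return f"{body.decode()}.{signature}"


def verify_internal_identity(token: str, key: bytes = INTERNAL_KEY) -> dict:
    """Called by internal services; raises ValueError if the structure was altered."""
    body, _, signature = token.rpartition(".")
    expected = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        raise ValueError("invalid signature")
    return json.loads(base64.urlsafe_b64decode(body))
```

Internal services only ever see the minted structure, so a change of external identity provider or token format stays contained at the edge.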
Security Architecture Documentation
For each microservice, document:
- Unique service name/ID, business process, API definitions with security schemes
- Service-to-storage access types (read/write)
- Synchronous service-to-service calls (protocol, data exchanged)
- Asynchronous communications (publisher/subscriber via message queues)
- Data asset classification (PII, confidential, public)
- Trust boundary justifications
Logging Architecture
Service stdout/stderr ──► Local File ──► Logging Agent ──► Message Broker ──► Central Logging
│ │ │
│ │ ├─ Mutual TLS
│ ├─ Data sanitization│
│ │ (strip PII, ├─ Least-privilege
│ │ passwords, │ access policies
├─ Prevents │ API keys) │
│ data loss │ │
│ on failure ├─ Asynchronous │
│ (prevents DoS │
│ of log system) │
Requirements:
- Correlation IDs for cross-service call tracing
- Structured format (JSON) with contextual metadata (hostname, container, class)
- Sanitization: never send PII, passwords, or API keys to central logging
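These requirements can be combined in a small formatter that runs in the service or logging agent before events leave the host. The sketch below is illustrative only: `SENSITIVE_KEYS` is an assumed key list, and `sanitize` and `build_log_event` are hypothetical helper names. Key-based redaction does not catch secrets embedded in free-text messages.

```python
import json
import uuid

# Assumed list of sensitive field names; extend to match your data classification.
SENSITIVE_KEYS = {"password", "api_key", "authorization", "ssn", "email"}


def sanitize(value):
    """Recursively redact values stored under sensitive keys."""
    if isinstance(value, dict):
        return {
            k: "[REDACTED]" if str(k).lower() in SENSITIVE_KEYS else sanitize(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [sanitize(v) for v in value]
    return value


def build_log_event(message, context, correlation_id=None, hostname="unknown"):
    """Return one structured JSON log line with a correlation ID attached."""
    event = {
        "message": message,
        # Reuse the caller's ID when present so the call chain can be traced.
        "correlation_id": correlation_id or str(uuid.uuid4()),
        "hostname": hostname,
        "context": sanitize(context),
    }
    return json.dumps(event, sort_keys=True)
```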
4. Service Mesh and mTLS
Mutual TLS (mTLS)
Each microservice uses public/private key pairs for bidirectional authentication, providing:
- Confidentiality: encrypted channel between services
- Integrity: tamper detection
- Authentication: cryptographic identity verification
Operational challenges:
- Key provisioning and trust bootstrap (initial certificate distribution)
- Certificate revocation (CRL/OCSP infrastructure)
- Key rotation (automated renewal before expiry)
- Certificate authority management (dedicated internal CA)
Service Mesh Benefits
| Capability | Security Value |
|---|---|
| Automatic mTLS | Encryption and authentication without application code changes |
| Telemetry/tracing | Generates security-relevant metrics and distributed traces |
| Ingress/egress control | Traffic monitoring and policy enforcement at mesh boundary |
| Fine-grained RBAC | Service-level access control via mesh policies |
| Traffic shaping | Rate limiting, circuit breaking, retries with backoff |
Service Mesh Trade-offs
- Increases architectural complexity
- Requires expertise in both K8s and mesh technology (Istio, Linkerd, Consul Connect)
- Performance impact from sidecar proxy overhead (typically 1-3ms latency per hop)
- Debugging becomes harder with proxy-mediated traffic
Token-Based Service Authentication (Alternative to mTLS)
| Mode | Use Case | Trade-off |
|---|---|---|
| Online validation | Centralized token service validates each request | Detects revoked tokens immediately; higher latency |
| Offline validation | Services validate using downloaded public keys (JWKS) | Lower latency; cannot detect revoked tokens in real-time |
5. API Gateway Security
Gateway as Security Perimeter
The API gateway centralizes:
- Authentication (OAuth2/OIDC token validation)
- Coarse-grained authorization (role/scope checks)
- Rate limiting and throttling
- Request/response transformation and validation
- TLS termination
- Logging and correlation ID injection
Gateway Limitations
- Single point of decision: violates defense-in-depth if relied upon exclusively.
- Scalability constraint: complex ecosystems with numerous roles become difficult to manage at the edge alone.
- Operational bottleneck: development teams cannot independently modify authorization rules.
Mitigation Pattern
- Implement mutual authentication to prevent gateway bypass and direct internal service access.
- Layer authorization at gateway AND service AND business logic levels.
- Use the gateway for cross-cutting concerns only; push domain-specific authorization to services.
6. Database Security Architecture
Network Isolation
┌──────────────────────────────────┐
│ DMZ / Application Tier │
│ ┌──────────┐ ┌──────────┐ │
│ │ App A │ │ App B │ │
│ └────┬─────┘ └────┬─────┘ │
│ │ │ │
├───────┼──────────────┼───────────┤ ◄── Firewall
│ Database Tier │
│ ┌──────────┐ ┌──────────┐ │
│ │ DB A │ │ DB B │ │
│ └──────────┘ └──────────┘ │
└──────────────────────────────────┘
- Disable TCP access where possible; require local socket or named pipe.
- If TCP needed, bind to localhost or restrict via firewall to specific application hosts only.
- Database servers in separate network segments from application tier.
- Web-based management tools (phpMyAdmin, pgAdmin) require authentication, HTTPS, and network-level access controls.
Authentication and Access Control
- Mandatory authentication for all connections, including local access.
- Strong, unique passwords per database account.
- Single-application or service-specific accounts (never shared credentials).
- Never use default accounts (root, sa, SYS, SYSTEM) for application access.
- No administrative rights for application accounts.
- Host-based connection restrictions (connect only from designated app servers).
- Environment-specific databases and accounts (dev, staging, prod never share credentials).
Least Privilege in Practice
| Permission Level | Pattern |
|---|---|
| Minimal | SELECT, UPDATE, DELETE only (no DDL) |
| Table-level | Grant access to specific tables only |
| Column-level | Restrict sensitive columns (SSN, credit card) |
| Row-level | Row-level security policies filter by tenant/role |
| View-based | Access through restricted views rather than base tables |
| No DB links | Avoid database links unless absolutely necessary |
Credential Management
- Credentials stored outside web root in configuration files with restricted file permissions.
- Excluded from source code repositories.
- Encrypted using platform features (ASP.NET protected configuration, Vault, AWS Secrets Manager).
- Regular credential rotation; immediate rotation on staff changes.
Transport Security
- Enforce encrypted connections exclusively (reject plaintext).
- Deploy trusted certificates on database servers.
- Require TLSv1.2+ with modern ciphers (AES-GCM, ChaCha20).
- Client-side certificate validation.
Hardening Checklist
- Apply security patches promptly.
- Run database service under low-privileged OS account.
- Remove default accounts and sample databases.
- Transaction logs on separate storage from data files.
- Regular encrypted backups with restricted access permissions.
- SQL Server: disable xp_cmdshell, xp_dirtree, CLR execution, SQL Browser, Mixed Mode Auth.
- MySQL/MariaDB: run mysql_secure_installation; disable FILE privilege.
7. Container Security (Docker)
Defense-in-Depth Stack for Containers
Layer 1: Image Security
├── Pin specific versions (no floating tags)
├── Minimal base images (distroless, scratch)
├── CI/CD image scanning (Trivy, Snyk, Docker Scout)
├── SBOM generation
├── Image signing (Notary/Cosign)
└── Private registries with access controls
Layer 2: Runtime Isolation
├── Non-root user (USER directive, runAsUser)
├── no-new-privileges flag
├── Drop all capabilities, add only needed
├── Never use --privileged
└── Read-only root filesystem + tmpfs for temp
Layer 3: Kernel Security
├── Seccomp profiles (start from Docker default, customize)
├── AppArmor/SELinux mandatory access control
└── Behavioral monitoring (Falco, Tetragon, Cilium eBPF)
Layer 4: Network Security
├── Custom Docker networks (explicit connectivity)
├── K8s NetworkPolicies for east-west traffic
└── No exposed daemon sockets
Layer 5: Resource Limits
├── Memory limits (-m 512m)
├── CPU limits (--cpus="0.5")
├── File descriptor limits (--ulimit nofile=1024)
├── Process limits (--ulimit nproc=256)
└── Restart policy (--restart=on-failure:3)
Layer 6: Secrets Management
├── Docker Secrets (Swarm) or external vault
├── Never bake secrets into images
└── K8s: enable etcd encryption or use external KMS
Critical Anti-Patterns
- Never expose /var/run/docker.sock to containers (container escape vector).
- Never use a TCP daemon socket without TLS mutual authentication.
- Never use --privileged (grants all kernel capabilities).
- Never use floating image tags in production (supply chain risk).
- Never store secrets in environment variables in K8s (visible via API, logged in crash dumps).
Rootless Mode
In rootless mode, both the Docker daemon and its containers run as an unprivileged user. If a container escape occurs, the attacker lands as an unprivileged host user. This differs from userns-remap, which remaps container UIDs while the daemon itself still runs as root.
Alternative: Podman
Podman's daemonless architecture uses a fork-exec model, which eliminates the central daemon as a single point of compromise. Native rootless support and SELinux integration provide OCI-compliant security defaults.
8. Kubernetes Security Architecture
Multi-Layer Security Model
┌─────────────────────────────────────────────┐
│ CLUSTER LEVEL │
│ ├── API Server hardening (OIDC, no static │
│ │ tokens, Node+RBAC authorization) │
│ ├── etcd encryption + mTLS + isolation │
│ ├── Admission controllers (PSA, OPA, │
│ │ Kyverno, ImagePolicyWebhook) │
│ └── Audit logging (Metadata/Request level) │
├─────────────────────────────────────────────┤
│ NAMESPACE LEVEL │
│ ├── RBAC (deny-by-default, minimal verbs) │
│ ├── Resource quotas (CPU, memory, pods) │
│ ├── NetworkPolicies (default-deny ingress │
│ │ and egress per namespace) │
│ └── Pod Security Standards (restricted) │
├─────────────────────────────────────────────┤
│ POD LEVEL │
│ ├── SecurityContext (runAsNonRoot, │
│ │ readOnlyRootFilesystem, │
│ │ allowPrivilegeEscalation: false) │
│ ├── Capability dropping (drop ALL) │
│ ├── Service account with minimal RBAC │
│ └── Image from signed, scanned registry │
├─────────────────────────────────────────────┤
│ RUNTIME LEVEL │
│ ├── Falco/Tetragon behavioral monitoring │
│ ├── Container sandboxing (gVisor, Kata) │
│ └── Continuous vulnerability scanning │
└─────────────────────────────────────────────┘
Pod Security Standards
| Level | Posture | Use Case |
|---|---|---|
| Privileged | Unrestricted | System workloads (CNI, storage drivers) only |
| Baseline | Prevents known privilege escalations | General workloads |
| Restricted | Maximum hardening | Sensitive workloads, multi-tenant |
Applied via namespace labels: pod-security.kubernetes.io/enforce: restricted
Three modes: enforce (blocks), audit (logs), warn (alerts).
etcd Security
etcd stores all cluster state and secrets. Write access to etcd is equivalent to root on the entire cluster.
- mTLS between API servers and etcd (dedicated CA).
- Firewall isolation: only API servers can reach etcd.
- etcd ACLs to limit keyspace access per component.
- Consider separate etcd instances for different components.
API Server Authentication
Recommended: OIDC for short-lived tokens and centralized group management, or managed provider IAM (GKE, EKS, AKS).
Avoid: Static token files (no rotation), X509 client certs (no revocation), service account tokens for user auth (cluster-scoped, no expiry by default).
Container Sandboxing
For untrusted workloads, add isolation beyond Linux namespaces:
| Technology | Mechanism | Overhead |
|---|---|---|
| gVisor | User-space kernel in Go, ~70% syscall coverage, uses ~20 host syscalls | Low-moderate |
| Kata Containers | Stripped-down VM per pod | Moderate |
| Firecracker | Micro-VM with seccomp + cgroup + namespace | Low |
Kubelet Security
Kubelets expose HTTPS endpoints with powerful node/container control:
- Enable authentication and authorization (disable anonymous access).
- Restrict API access to trusted networks.
- Monitor port 10250 (Kubelet API) for unauthorized access attempts.
9. CSP, CORS, and Same-Origin Policy
Same-Origin Policy (SOP)
The browser's foundational security boundary. Two URLs have the same origin if protocol, host, and port all match. SOP prevents scripts from one origin reading responses from another origin.
| URL A | URL B | Same Origin? |
|---|---|---|
| https://a.com/page | https://a.com/other | Yes |
| https://a.com | http://a.com | No (protocol) |
| https://a.com | https://a.com:8443 | No (port) |
| https://a.com | https://b.a.com | No (host) |
Content Security Policy (CSP)
CSP is a defense-in-depth layer against XSS. It does not replace secure coding; it mitigates exploitation when output encoding fails.
Directive Categories
| Category | Directives | Purpose |
|---|---|---|
| Fetch | script-src, style-src, img-src, connect-src, font-src, object-src, default-src | Control resource loading origins |
| Document | base-uri, sandbox, plugin-types | Restrict document properties |
| Navigation | form-action, frame-ancestors | Restrict navigation and framing |
| Reporting | report-to, report-uri | Violation reporting |
Strict CSP (Recommended)
Nonce-based (preferred for server-rendered):
Content-Security-Policy:
script-src 'nonce-{RANDOM}' 'strict-dynamic';
object-src 'none';
base-uri 'none';
Hash-based (for static pages):
Content-Security-Policy:
script-src 'sha256-{HASH}' 'strict-dynamic';
object-src 'none';
base-uri 'none';
Key rules:
- Generate unique nonce per HTTP response (cryptographically random).
- Never create middleware that auto-injects nonces into all script tags (attacker-injected scripts would get nonces too).
- strict-dynamic allows scripts dynamically created by trusted scripts, reducing annotation burden.
- object-src 'none' blocks plugin-based XSS vectors (Flash, Java).
- base-uri 'none' prevents base tag injection for relative URL hijacking.
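Per-response nonce generation can be sketched with the standard library alone. `build_strict_csp` is a hypothetical helper name; the header value mirrors the nonce-based policy shown earlier, and the returned nonce is meant to be placed only on script tags the server itself renders.

```python
import secrets


def build_strict_csp() -> tuple[str, str]:
    """Return (nonce, header_value) for exactly one HTTP response."""
    # 16 random bytes (128 bits) from the OS CSPRNG, base64url-encoded.
    nonce = secrets.token_urlsafe(16)
    header = (
        f"script-src 'nonce-{nonce}' 'strict-dynamic'; "
        "object-src 'none'; "
        "base-uri 'none';"
    )
    return nonce, header
```

Call it once per response and pass the nonce to the template explicitly; a cached or reused nonce gives an attacker a value they can predict.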
Deployment Strategy
- Deploy in Content-Security-Policy-Report-Only mode first.
- Monitor violation reports via the report-to endpoint.
- Refactor inline scripts to external files or add nonces.
- Convert inline event handlers to addEventListener.
- Switch to enforcing mode.
Additional CSP Protections
- frame-ancestors 'none' prevents clickjacking (supersedes X-Frame-Options).
- upgrade-insecure-requests forces HTTPS for mixed content.
- form-action 'self' prevents form hijacking to external endpoints.
CORS Security
CORS relaxes SOP in a controlled manner. Misconfigurations create same-origin-equivalent access for attackers.
Critical CORS Rules
- Never reflect the Origin header as Access-Control-Allow-Origin. Doing so is equivalent to a wildcard with credentials.
- Never use wildcard * with credentials. Browsers reject Access-Control-Allow-Origin: * when Access-Control-Allow-Credentials: true is set.
- Validate Origin against a strict allowlist. Use an exact string match, not a substring or regex that can be bypassed.
- Minimize exposed headers. Only expose headers the client genuinely needs.
- Set Vary: Origin. This prevents cache poisoning when responses differ by origin.
Common CORS Misconfigurations
| Misconfiguration | Risk |
|---|---|
| Reflecting Origin header verbatim | Any site can read authenticated responses |
| Origin: null in allowlist | Sandboxed iframes and data: URIs get access |
| Substring matching (e.g., endsWith('example.com')) | attacker-example.com bypasses |
| Regex without anchoring | example.com.evil.com bypasses |
| Wildcard with credentials | Browser blocks but indicates design flaw |
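The substring and unanchored-regex rows are easy to reproduce. The sketch below contrasts two deliberately flawed checks with an exact-match allowlist; all function names and origins are hypothetical and exist only for the demonstration.

```python
import re

ALLOWED_ORIGINS = frozenset({"https://app.example.com", "https://admin.example.com"})


def naive_suffix_check(origin: str) -> bool:
    # FLAWED on purpose: any host that merely ends in "example.com" passes.
    return origin.endswith("example.com")


def naive_regex_check(origin: str) -> bool:
    # FLAWED on purpose: no end anchor, so trailing attacker-controlled labels pass.
    return re.search(r"^https://app\.example\.com", origin) is not None


def strict_check(origin: str) -> bool:
    # Exact string comparison against a fixed allowlist.
    return origin in ALLOWED_ORIGINS


for candidate in ("https://attacker-example.com", "https://app.example.com.evil.com", "null"):
    print(candidate, naive_suffix_check(candidate), naive_regex_check(candidate), strict_check(candidate))
```

The exact-match check rejects all three candidates, including the literal string "null" sent by sandboxed iframes.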
Secure CORS Pattern
ALLOWED_ORIGINS = {"https://app.example.com", "https://admin.example.com"}

def cors_middleware(request, response):
    origin = request.headers.get("Origin")
    if origin in ALLOWED_ORIGINS:
        response.headers["Access-Control-Allow-Origin"] = origin
        response.headers["Vary"] = "Origin"
        response.headers["Access-Control-Allow-Credentials"] = "true"
        response.headers["Access-Control-Allow-Methods"] = "GET, POST"
        response.headers["Access-Control-Allow-Headers"] = "Content-Type, Authorization"
        response.headers["Access-Control-Max-Age"] = "7200"
    # If origin not in allowlist: no CORS headers = browser blocks
10. WebSocket Security
Transport Security
- Always use wss:// in production. Never use unencrypted ws://.
- Support only RFC 6455. Disable legacy protocol versions (Hixie-76, hybi-00) with known vulnerabilities.
- Disable permessage-deflate by default to prevent CRIME/BREACH-style compression side-channel attacks.
Cross-Site WebSocket Hijacking (CSWSH) Prevention
Browsers automatically include session cookies in WebSocket handshakes, enabling attackers on malicious sites to hijack authenticated connections.
Defenses:
- Validate the Origin header on every handshake against an explicit allowlist (never blacklist, never wildcard).
- Apply SameSite=Lax or SameSite=Strict cookies.
- Use token-based authentication (query string or post-connection message) instead of relying solely on cookies.
- Rotate tokens in long-lived connections to prevent hijacked session persistence.
Message-Level Security
- Treat all WebSocket messages as untrusted input.
- JSON schema validation with allowlists for message types/fields.
- Binary file type verification via magic numbers (not headers).
- Message size limits (typically 64KB maximum).
- Nonce/timestamp inclusion to prevent replay attacks.
- Use JSON.parse(), never eval().
Per-Action Authorization
Connection establishment does not grant blanket access. Validate user roles and permissions before processing each message/action independently.
DoS Mitigation
| Control | Recommended Baseline |
|---|---|
| Per-user connection limit | 5-10 concurrent connections |
| Message rate limit | 100 messages/minute |
| Max payload size | 64KB (configurable per use case) |
| Idle timeout | Close inactive connections |
| Backpressure | Flow control preventing unbounded buffering |
| Heartbeat | Ping/pong frames detecting and cleaning dead connections |
Logging
Capture: connection/termination events with user identity and origin, auth outcomes, authz failures, protocol violations. Exclude: tokens, session IDs, message payloads containing sensitive data.
11. GraphQL Security Controls
GraphQL's flexibility creates unique attack surface compared to REST.
Query Abuse Prevention
| Control | Purpose | Tools |
|---|---|---|
| Depth limiting | Prevent deeply nested queries causing recursive resolution | graphql-depth-limit (JS), MaxQueryDepthInstrumentation (Java) |
| Complexity analysis | Assign cost to field resolution, reject expensive queries | graphql-cost-analysis (JS), Apollo complexity plugins |
| Timeout per resolver | Prevent individual resolvers from hanging | 10-second default |
| Pagination enforcement | Prevent unbounded list queries | Require first/last arguments |
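To make depth limiting concrete, the sketch below estimates nesting depth by counting braces outside string literals. It is a simplification under stated assumptions: it does not parse GraphQL, does not expand fragments (which can hide depth), and counts input-object braces in arguments as nesting, so it errs on the strict side. `query_depth` and `enforce_depth_limit` are hypothetical names; AST-based libraries such as those in the table above are the appropriate choice for production.

```python
def query_depth(query: str) -> int:
    """Return the maximum brace nesting depth, ignoring braces inside strings."""
    depth = max_depth = 0
    in_string = False
    escaped = False
    for ch in query:
        if in_string:
            if escaped:
                escaped = False      # character after a backslash is literal
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}":
            depth -= 1
    return max_depth


def enforce_depth_limit(query: str, max_depth: int = 5) -> int:
    """Raise ValueError before execution if the query nests too deeply."""
    depth = query_depth(query)
    if depth > max_depth:
        raise ValueError(f"query depth {depth} exceeds limit {max_depth}")
    return depth
```

The check runs on the raw query text before any resolver executes, which keeps it in the "cheap validation first" category.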
Schema Exposure Controls
- Disable introspection in production to prevent schema reconnaissance.
- Disable "Did you mean?" suggestions (leaks field names even with introspection off).
- Field visibility middleware for role-based schema exposure (different roles see different schema subsets).
Authorization Architecture
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Gateway │────▶│ Resolver │────▶│ Data Layer │
│ Auth Check │ │ RBAC Check │ │ Row-Level │
│ (identity) │ │ (field-lvl) │ │ Security │
└──────────────┘ └──────────────┘ └──────────────┘
- Validate authorization on both graph edges AND nodes.
- Implement checks within Query/Mutation resolvers using RBAC middleware.
- Prevent IDOR by verifying caller permissions before data access (especially for direct ID-based lookups).
- Use GraphQL Interfaces and Unions to return different object shapes based on requester privileges.
Batching Attack Mitigation
GraphQL allows multiple queries in a single HTTP request, bypassing per-request rate limits:
- Object-level rate limiting: track per-caller object requests across batches.
- Sensitive field protection: prevent batching for usernames, emails, OTPs, session tokens.
- Operation throttling: limit concurrent queries per request (e.g., max 5 operations per batch).
Persisted Queries
Pre-approve query strings at deployment time. Clients send query hash instead of arbitrary query text. Eliminates arbitrary query execution, batching abuse, and query injection risks.
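A persisted-query lookup can be sketched as a hash-to-text registry built at deployment time. This is a minimal illustration: `PersistedQueryRegistry` and `query_hash` are hypothetical names, and SHA-256 is an assumed choice of hash rather than something the source specifies.

```python
import hashlib


def query_hash(query: str) -> str:
    """Hash the exact query text; clients send this value instead of the query."""
    return hashlib.sha256(query.encode("utf-8")).hexdigest()


class PersistedQueryRegistry:
    def __init__(self, approved_queries):
        # Built once at deploy time from the reviewed set of queries.
        self._by_hash = {query_hash(q): q for q in approved_queries}

    def resolve(self, received_hash: str) -> str:
        """Return the approved query text, or refuse anything unknown."""
        try:
            return self._by_hash[received_hash]
        except KeyError:
            raise PermissionError("query hash is not in the approved set") from None


registry = PersistedQueryRegistry(["{ me { id name } }"])
```

The server never executes client-supplied query text, only text it already holds, so an attacker cannot submit a novel query even with a valid session.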
Input Validation
- Enforce allowlisting via GraphQL scalars, enums, and custom validators.
- Define input schemas for all mutations.
- Use parameterized queries/ORMs in resolvers (never string concatenation).
- Disable dynamic resolver targeting to prevent SSRF/command injection.
12. XML and Serialization Security
XXE Prevention
XXE (XML External Entity) attacks exploit parser features to read files, perform SSRF, or cause DoS.
Universal defense: disable external entity processing entirely in parser configuration.
# Python (defusedxml)
import defusedxml.ElementTree as ET
tree = ET.parse(source)  # XXE-safe by default; `source` is a file path or file object

// Java (DocumentBuilderFactory)
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
XML Bomb Prevention
| Attack | Mechanism | Defense |
|---|---|---|
| Billion Laughs | Exponential entity nesting (recursive references) | Entity expansion limits, depth restrictions |
| Quadratic Blowup | Large entity referenced repeatedly (O(n^2) expansion) | Entity size limits |
| Recursive References | Circular entity definitions | Recursion depth limits |
Schema Hardening
- Use XML Schema (XSD), not DTD, for validation.
- Set maxOccurs boundaries (never unbounded without testing).
- Use precise types: positiveInteger not integer, decimal not float/double (prevents Infinity/NaN).
- Apply maxLength, minLength, pattern restrictions on strings.
- Enumerate allowed values where possible.
Schema Poisoning Defense
- Embed schemas with integrity verification (don't fetch remotely at runtime).
- Restrict file permissions on local schema/DTD files.
- If remote schemas needed, use HTTPS only, maintain local copies, verify integrity.
General Serialization Security
- Reject DTDs entirely (SOAP specification forbids them).
- Validate document well-formedness before processing.
- Set resource limits: document size, element count, nesting depth.
- Avoid disclosing internal paths in error messages.
13. SSRF Prevention Architecture
Defense-in-Depth: Application + Network Layers
┌─────────────────────────────────────┐
│ APPLICATION LAYER │
│ ├── Input validation (reject URLs) │
│ ├── IP/domain allowlisting │
│ ├── DNS rebinding prevention │
│ └── URL scheme restriction │
├─────────────────────────────────────┤
│ NETWORK LAYER │
│ ├── Firewall egress filtering │
│ ├── Network segmentation │
│ └── Cloud metadata protection │
└─────────────────────────────────────┘
Application Layer Controls
Rule 1: Never accept complete URLs from users. URLs are difficult to validate and parsers can be abused. Accept only validated IP addresses or domain names.
IP validation:
- Validate format using language-specific libraries (not regex).
- Cross-reference against allowlist of trusted IPs (both IPv4 and IPv6).
- Use validated library output as comparison baseline to prevent encoding bypasses.
Domain validation:
- Validate format without performing DNS resolution.
- Maintain allowlist of trusted domains.
- Monitor DNS records to detect resolution to non-public IP ranges.
Deny-list minimums (when allowlisting not possible):
- AWS IMDS: 169.254.169.254, fd00:ec2::254
- Localhost: 127.0.0.0/8, ::1/128
- RFC1918: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
- Link-local: 169.254.0.0/16
- Multicast: 224.0.0.0/4
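The deny-list above maps directly onto Python's standard `ipaddress` module, which also serves as the "validated library output" used for comparison. The sketch below is a minimal illustration; `BLOCKED_NETWORKS` and `is_blocked` are hypothetical names, and the list covers only the minimums named here, not every reserved range.

```python
import ipaddress

# The deny-list minimums from this section, parsed once at import time.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "127.0.0.0/8", "::1/128",                          # localhost
        "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",   # RFC1918
        "169.254.0.0/16",                                  # link-local, includes AWS IMDS
        "224.0.0.0/4",                                     # multicast
        "fd00:ec2::254/128",                               # AWS IMDS over IPv6
    )
]


def is_blocked(candidate: str) -> bool:
    """Return True if the address must not be contacted. Fails closed on bad input."""
    try:
        ip = ipaddress.ip_address(candidate)
    except ValueError:
        return True  # not a parseable IP literal: reject rather than guess
    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so it cannot sidestep the IPv4 rules.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return any(ip.version == net.version and ip in net for net in BLOCKED_NETWORKS)
```

Comparing the parsed address object, not the raw string, is what defeats alternate encodings of the same address.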
Network Layer Controls
- Restrict outbound application access via host-based or network firewalls to only legitimate routes.
- Network compartmentalization to block illegitimate calls at infrastructure level.
- Disable HTTP redirect following to prevent validation bypass.
Cloud Metadata Protection
Migrate from IMDSv1 to IMDSv2 (AWS) as defense-in-depth. IMDSv2 requires a session token obtained via PUT request, which SSRF attacks cannot easily replicate.
DNS Rebinding Prevention
- Resolve domains against internal DNS resolvers only.
- Retrieve all A and AAAA records and reject the domain if any resolved IP falls in a private or otherwise non-public range.
- Monitor allowlisted domains for resolution changes to non-public addresses.
- Pin DNS resolution results (use the resolved IP, not the hostname, for the actual request).
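Resolve-validate-pin can be sketched as below. The resolver is injectable so the logic can be exercised without network access; `system_resolver` wraps the standard `socket.getaddrinfo`, while `resolve_and_pin` is a hypothetical helper name. The `is_global` property of `ipaddress` is used here as an assumed stand-in for "not a private or reserved range".

```python
import ipaddress
import socket
from typing import Callable, Iterable


def system_resolver(hostname: str) -> list[str]:
    """Return every A/AAAA address the system resolver reports for hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def resolve_and_pin(
    hostname: str,
    resolver: Callable[[str], Iterable[str]] = system_resolver,
) -> str:
    """Resolve once, validate every record, and return the IP to connect to."""
    addresses = list(resolver(hostname))
    if not addresses:
        raise ValueError(f"{hostname} did not resolve")
    for addr in addresses:
        # One non-public record is enough to reject the whole hostname.
        if not ipaddress.ip_address(addr).is_global:
            raise ValueError(f"{hostname} resolves to non-public address {addr}")
    # The caller connects to this IP directly and does not resolve again.
    return addresses[0]
```

The outbound request should then target the returned IP (sending the original hostname for Host and TLS SNI), so a second DNS lookup cannot return a different answer between validation and use.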
14. Rate Limiting Architecture
Multi-Layer Rate Limiting
┌──────────────────────────────────────────┐
│ EDGE (CDN/WAF/Load Balancer) │
│ ├── Per-IP rate limits │
│ ├── Geographic filtering │
│ ├── Volumetric DDoS mitigation │
│ └── Connection rate limits │
├──────────────────────────────────────────┤
│ API GATEWAY │
│ ├── Per-user/API-key rate limits │
│ ├── Per-endpoint rate limits │
│ ├── Request size limits │
│ └── Concurrent connection limits │
├──────────────────────────────────────────┤
│ APPLICATION │
│ ├── Per-operation rate limits │
│ ├── Resource-specific throttling │
│ ├── Business logic rate limits │
│ └── Object-level rate limits (GraphQL) │
├──────────────────────────────────────────┤
│ DATABASE │
│ ├── Connection pool limits │
│ ├── Query timeout limits │
│ └── Transaction timeout limits │
└──────────────────────────────────────────┘
Rate Limiting Algorithms
| Algorithm | Behavior | Use Case |
|---|---|---|
| Token Bucket | Steady refill rate, burst allowed up to bucket size | API rate limiting (most common) |
| Leaky Bucket | Fixed drain rate, excess dropped | Smoothing bursty traffic |
| Fixed Window | Counter resets at interval boundary | Simple per-minute/hour limits |
| Sliding Window Log | Precise per-request timestamp tracking | High-accuracy rate limiting |
| Sliding Window Counter | Weighted average of current and previous window | Balance of accuracy and performance |
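As an illustration of the first row, a token bucket fits in a short class. This is a single-process sketch with an injectable clock for testing; `TokenBucket` and its parameter names are hypothetical, and a distributed deployment would need shared state (for example in a cache or datastore), which is not shown.

```python
import time


class TokenBucket:
    """Refills at `rate` tokens per second, up to `capacity` (the burst size)."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity      # start full so an initial burst is allowed
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Credit tokens for the elapsed time, never exceeding capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

Keeping one bucket per user or API key gives the per-caller limits described at the gateway layer; the `cost` argument lets expensive operations consume more than one token.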
Slow HTTP Attack Defense
- Define minimum ingress data rate limit; drop connections below that rate (counters Slowloris, Slow POST).
- Absolute connection timeouts (not just idle timeouts).
- Maximum request header size and body size limits.
- Total concurrent connection limits per client IP.
DoS Resilience Patterns
- Validation ordering: perform cheap checks (format, size) before expensive ones (database, crypto).
- Authentication gating: require authentication before allowing access to resource-intensive operations.
- Graceful degradation: maintain reduced functionality rather than complete failure.
- Static resource separation: host images, scripts, CSS on separate domains/CDN.
- Caching: serve cached responses for repeat requests.
- Asynchronous processing: use queues for CPU-intensive operations; return 202 Accepted.
15. Circuit Breaker and Resilience Patterns
Circuit Breaker Pattern
Prevents cascading failures when downstream services degrade.
┌──────────────┐
Request ──┤ CLOSED │──── Forward to service
│ (normal) │
└──────┬───────┘
│ failure threshold exceeded
┌──────▼───────┐
│ OPEN │──── Return fallback/error immediately
│ (tripped) │ (no request forwarded)
└──────┬───────┘
│ timeout expires
┌──────▼───────┐
│ HALF-OPEN │──── Forward limited probe requests
│ (testing) │ Success → CLOSED
└──────────────┘ Failure → OPEN
Configuration Parameters
| Parameter | Description | Typical Value |
|---|---|---|
| Failure threshold | Errors before opening | 5-10 failures in 60s |
| Timeout | Time in OPEN before probing | 30-60 seconds |
| Success threshold | Successes in HALF-OPEN to close | 3-5 consecutive |
| Monitoring window | Rolling window for failure counting | 60 seconds |
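The state machine and parameters above can be sketched as a small single-threaded class. This is an illustrative sketch under stated assumptions: the names `CircuitBreaker` and `CircuitOpenError` are hypothetical, every exception counts as a failure, and the number of concurrent HALF-OPEN probes is not limited as a real implementation would need to do.

```python
import time
from collections import deque


class CircuitOpenError(Exception):
    """Raised when the circuit is OPEN and the call is rejected without being forwarded."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, window=60.0, open_timeout=30.0,
                 success_threshold=3, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.window = window
        self.open_timeout = open_timeout
        self.success_threshold = success_threshold
        self.clock = clock
        self.state = "CLOSED"
        self.failures = deque()  # timestamps of recent failures
        self.opened_at = 0.0
        self.successes = 0

    def call(self, func, *args, **kwargs):
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.open_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout expired: let probe requests through.
            self.state = "HALF_OPEN"
            self.successes = 0
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure(self.clock())
            raise
        self._on_success()
        return result

    def _on_failure(self, now):
        if self.state == "HALF_OPEN":
            self._trip(now)  # a failed probe reopens the circuit
            return
        self.failures.append(now)
        # Drop failures that fell out of the rolling monitoring window.
        while self.failures and now - self.failures[0] > self.window:
            self.failures.popleft()
        if len(self.failures) >= self.failure_threshold:
            self._trip(now)

    def _on_success(self):
        if self.state == "HALF_OPEN":
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state = "CLOSED"
                self.failures.clear()

    def _trip(self, now):
        self.state = "OPEN"
        self.opened_at = now
        self.failures.clear()
```

Callers catch `CircuitOpenError` and return the fallback response, which is the "no request forwarded" branch in the diagram.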
Bulkhead Pattern
Isolate failures to prevent resource exhaustion across the entire system:
- Separate thread pools per downstream dependency.
- Separate connection pools per service.
- Resource quotas per tenant/customer.
- Namespace isolation in K8s (CPU/memory quotas per namespace).
Retry with Backoff
Attempt 1: immediate
Attempt 2: wait 1s + jitter
Attempt 3: wait 2s + jitter
Attempt 4: wait 4s + jitter
(cap at max backoff, e.g., 30s)
- Always add random jitter to prevent thundering herd.
- Set maximum retry count (3-5 typically).
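- Only retry on transient failures (5xx, timeouts). Do not retry 4xx client errors; the usual exceptions are 429 Too Many Requests and 408 Request Timeout, retried only after honoring any Retry-After header.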
- Combine with circuit breaker: when circuit opens, stop retrying.
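The schedule and rules above can be sketched as a small helper. `TransientError` is a hypothetical marker exception standing in for whatever your client raises on 5xx responses or timeouts, and the parameter names and defaults are assumptions chosen to mirror the schedule shown.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for retryable failures (5xx responses, timeouts)."""


def retry(func, max_attempts=4, base=1.0, cap=30.0, jitter=1.0,
          sleep=time.sleep, rng=random.random):
    """Call func; on TransientError wait base * 2**n (capped) plus random jitter, then retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts:
                raise  # retry budget exhausted: surface the failure
            # Attempt 1 fails -> wait 1s, attempt 2 fails -> 2s, then 4s, up to the cap.
            delay = min(cap, base * 2 ** (attempt - 1)) + rng() * jitter
            sleep(delay)
```

Any exception other than `TransientError` propagates immediately, which is how non-retryable errors are kept out of the retry loop.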
Timeout Hierarchy
Client timeout > Gateway timeout > Service timeout > DB timeout
30s 15s 10s 5s
Each layer's timeout must be shorter than its caller's, so that no callee keeps working on (and holding connections for) a request its caller has already abandoned.
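One way to keep this ordering from drifting is to check it in configuration tests. The following sketch assumes timeouts are available as an ordered list from outermost caller to innermost callee; the function name is hypothetical.

```python
def validate_timeout_hierarchy(timeouts: list[tuple[str, float]]) -> list[str]:
    """Return one message per adjacent pair where the callee's timeout is not shorter."""
    violations = []
    for (caller, caller_t), (callee, callee_t) in zip(timeouts, timeouts[1:]):
        if callee_t >= caller_t:
            violations.append(
                f"{callee} timeout ({callee_t}s) must be shorter than {caller} timeout ({caller_t}s)"
            )
    return violations


# The values from the hierarchy above, outermost first.
chain = [("client", 30.0), ("gateway", 15.0), ("service", 10.0), ("db", 5.0)]
problems = validate_timeout_hierarchy(chain)
```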
16. Race Condition Defense
TOCTOU (Time-of-Check-to-Time-of-Use)
The classic pattern: a resource is checked for a condition, then used based on that check, but the resource changes between check and use.
Thread A: check(balance >= 100) → true
Thread B: check(balance >= 100) → true
Thread A: debit(100) → balance = 0
Thread B: debit(100) → balance = -100 ← RACE
Defense Patterns
| Pattern | Mechanism | Use Case |
|---|---|---|
| Pessimistic locking | SELECT ... FOR UPDATE acquires a row lock before the read | Financial transactions, inventory |
| Optimistic locking | Version column; UPDATE ... WHERE version = N matches zero rows if the row was modified concurrently | Low-contention scenarios |
| Atomic operations | UPDATE ... SET balance = balance - 100 WHERE balance >= 100 (check + modify in a single statement) | Simple counter/balance operations |
| Database constraints | CHECK (balance >= 0) enforced at DB level | Invariant enforcement |
| Idempotency keys | Client-generated unique key per operation; server rejects duplicates | Payment processing, API mutations |
| Serializable isolation | SET TRANSACTION ISOLATION LEVEL SERIALIZABLE | Highest consistency requirement |
| Mutex/advisory locks | pg_advisory_lock(key) or application-level mutex | Cross-table consistency |
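The atomic-operation and constraint rows can be combined in a short sketch. It uses an in-memory SQLite database purely for illustration; the `accounts` table and the `debit` helper are hypothetical names, and other databases report affected rows through their own driver APIs.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " id INTEGER PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"  # invariant enforced by the DB
    )
    conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")
    conn.commit()
    return conn


def debit(conn: sqlite3.Connection, account_id: int, amount: int) -> bool:
    """Check and modify in one statement; return False when funds are insufficient."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ? AND balance >= ?",
            (amount, account_id, amount),
        )
    # Zero affected rows means the balance check failed; nothing was debited.
    return cur.rowcount == 1
```

Because the balance check lives in the WHERE clause, two concurrent debits cannot both pass it, which removes the check-then-use gap from the TOCTOU example.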
Idempotency Pattern
Client: POST /payments {idempotency_key: "abc-123", amount: 100}
Server:
1. Check idempotency_key in store
2. If exists: return cached response (no re-execution)
3. If not: execute, store result keyed by idempotency_key, return
- Keys should expire after a reasonable window (24-48 hours).
- Store the request hash alongside the response so that a reused key with different parameters can be detected and rejected.
- Use database unique constraint on idempotency key for atomicity.
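The server-side steps can be sketched as follows. SQLite stands in for the idempotency store, and `open_store`, `handle_payment`, and the `execute` callback are hypothetical names; key expiry and recovery from a crash between the insert and the update are left out of this sketch.

```python
import hashlib
import json
import sqlite3


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE idempotency ("
        " idem_key TEXT PRIMARY KEY,"  # unique constraint makes the claim atomic
        " request_hash TEXT NOT NULL,"
        " response TEXT)"
    )
    return conn


def handle_payment(conn, idempotency_key, payload, execute):
    """Run `execute(payload)` at most once per key; replay the stored response otherwise."""
    request_hash = hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode()
    ).hexdigest()
    try:
        with conn:
            conn.execute(
                "INSERT INTO idempotency (idem_key, request_hash) VALUES (?, ?)",
                (idempotency_key, request_hash),
            )
    except sqlite3.IntegrityError:
        # Key already claimed: replay, but only if the parameters match.
        stored_hash, response = conn.execute(
            "SELECT request_hash, response FROM idempotency WHERE idem_key = ?",
            (idempotency_key,),
        ).fetchone()
        if stored_hash != request_hash:
            raise ValueError("idempotency key reused with different parameters")
        if response is None:
            raise RuntimeError("original request is still in progress")
        return json.loads(response)
    response = execute(payload)
    with conn:
        conn.execute(
            "UPDATE idempotency SET response = ? WHERE idem_key = ?",
            (json.dumps(response), idempotency_key),
        )
    return response
```

Claiming the key with an INSERT before executing, rather than checking first and inserting later, is what keeps two concurrent requests with the same key from both running the operation.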
Distributed Systems Considerations
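- Redis SET with the NX option and an expiry (SET key value NX PX ttl) for distributed locks with TTL; the older SETNX (SET if Not eXists) command cannot set the TTL atomically.
- Redlock algorithm for fault-tolerant distributed locking across multiple Redis instances.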
- Database-level locking preferred over application-level when possible (closer to the data, harder to bypass).
- Event sourcing: append-only log eliminates update races entirely.
17. Secure Multi-Tenant Design
Isolation Models
Strongest ──────────────────────────────────── Weakest
│ │
▼ ▼
Separate Separate Shared Infra, Shared
Infrastructure Namespaces/ Separate DB/ Everything
(per tenant) VPCs Schema (row-level)
| Model | Isolation | Cost | Complexity | Use Case |
|---|---|---|---|---|
| Separate infrastructure | Highest | Highest | Medium | Regulated industries, government |
| Separate namespaces/VPCs | High | High | High | Enterprise SaaS |
| Shared infra, separate DB/schema | Medium | Medium | Medium | Standard SaaS |
| Shared everything (row-level) | Lowest | Lowest | Low | Consumer apps, cost-sensitive |
Kubernetes Multi-Tenant Pattern
┌─────────────────────────────────────────┐
│ Cluster │
│ ┌─────────────────────────────────────┐│
│ │ Namespace: tenant-a ││
│ │ ├── ResourceQuota (4 pods, 2 CPU) ││
│ │ ├── NetworkPolicy (deny all ││
│ │ │ cross-namespace) ││
│ │ ├── Pod Security: restricted ││
│ │ └── RBAC: tenant-a-role ││
│ └─────────────────────────────────────┘│
│ ┌─────────────────────────────────────┐│
│ │ Namespace: tenant-b ││
│ │ ├── ResourceQuota (4 pods, 2 CPU) ││
│ │ ├── NetworkPolicy (deny all ││
│ │ │ cross-namespace) ││
│ │ ├── Pod Security: restricted ││
│ │ └── RBAC: tenant-b-role ││
│ └─────────────────────────────────────┘│
└─────────────────────────────────────────┘
Cross-Cutting Multi-Tenant Controls
| Layer | Control | Purpose |
|---|---|---|
| Identity | Tenant context in JWT claims | Every request carries tenant identity |
| API Gateway | Tenant-aware rate limiting | Per-tenant quotas prevent noisy neighbor |
| Application | Tenant filter on all queries | Prevent cross-tenant data access |
| Database | Row-level security policies | DB-enforced tenant isolation |
| Storage | Tenant-prefixed object keys + IAM | Prevent cross-tenant storage access |
| Encryption | Per-tenant encryption keys | Cryptographic isolation of data |
| Logging | Tenant ID in all log entries | Tenant-scoped audit trails |
| Network | Namespace/VPC isolation | Network-level blast radius containment |
Noisy Neighbor Prevention
- Per-tenant resource quotas (CPU, memory, storage, API calls).
- Per-tenant connection pool limits to shared databases.
- Per-tenant queue depth limits for async processing.
- Circuit breakers per tenant: if one tenant causes excessive errors, isolate them without affecting others.
- Fair scheduling: weighted round-robin or priority queues preventing any single tenant from monopolizing shared resources.
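The queue-depth and fair-scheduling points can be illustrated with a small in-memory sketch. It uses plain (unweighted) round-robin across tenants; the class name `FairQueue` and the depth limit are assumptions, and a real system would apply the same idea to its message broker or worker pool.

```python
from collections import OrderedDict, deque


class FairQueue:
    """Round-robin across tenants, with a per-tenant queue depth limit."""

    def __init__(self, max_depth_per_tenant: int = 100):
        self.max_depth = max_depth_per_tenant
        self.queues = OrderedDict()  # tenant id -> deque of pending jobs

    def put(self, tenant: str, job) -> bool:
        """Enqueue a job; return False when the tenant has hit its depth limit."""
        queue = self.queues.setdefault(tenant, deque())
        if len(queue) >= self.max_depth:
            return False
        queue.append(job)
        return True

    def get(self):
        """Return (tenant, job) from the tenant at the front of the rotation, or None."""
        if not self.queues:
            return None
        tenant, queue = self.queues.popitem(last=False)
        job = queue.popleft()
        if queue:
            # Tenant still has work: move it to the back of the rotation.
            self.queues[tenant] = queue
        return tenant, job
```

A tenant that floods the queue only lengthens its own backlog; other tenants still get a turn on every rotation.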
Data Isolation Verification
- Automated tests that attempt cross-tenant data access (should fail).
- SQL query audit: every query touching tenant data must include tenant filter (static analysis or query interceptor).
- Penetration testing specifically targeting tenant boundary bypass (IDOR, parameter tampering, JWT manipulation).
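The first verification item can be sketched as an automated test. The repository class, table, and tenant names are hypothetical; the point is that the test acts as one tenant, requests another tenant's record by ID, and expects nothing back.

```python
import sqlite3


class TenantScopedInvoices:
    """Hypothetical repository that binds every query to a single tenant."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: str):
        self.conn = conn
        self.tenant_id = tenant_id

    def get(self, invoice_id: int):
        # The tenant filter is part of the query itself, never left to the caller.
        return self.conn.execute(
            "SELECT id, amount FROM invoices WHERE id = ? AND tenant_id = ?",
            (invoice_id, self.tenant_id),
        ).fetchone()


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoices (id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, amount INTEGER)"
    )
    conn.executemany(
        "INSERT INTO invoices VALUES (?, ?, ?)",
        [(1, "tenant-a", 500), (2, "tenant-b", 900)],
    )
    conn.commit()
    return conn


def test_cross_tenant_read_is_denied():
    conn = make_demo_db()
    tenant_a = TenantScopedInvoices(conn, "tenant-a")
    assert tenant_a.get(1) == (1, 500)  # own data is visible
    assert tenant_a.get(2) is None      # tenant-b's invoice is not
```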
18. CSS Security
Attack Surface
- Reconnaissance via CSS selectors: descriptive class names (.addUser, .deleteUser, .adminPanel) reveal application features to unauthenticated attackers examining global CSS files.
- CSS injection: if attacker-controlled content enters stylesheets, it can enable data exfiltration via background-image URLs, clickjacking via element repositioning, and UI redressing.
- Third-party stylesheet risk: externally hosted CSS can be modified to inject malicious styles.
Defenses
- Role-based CSS isolation: segregate stylesheets by access level. Server-side access controls on CSS file delivery. Log suspicious CSS file access.
- CSS obfuscation: replace descriptive selectors with generated names using CSS Modules, JSS (minify option), or build-time obfuscation. Use framework classes (Bootstrap, Tailwind) to reduce custom selectors.
- CSP for styles: style-src 'self' or style-src 'nonce-{RANDOM}' to prevent inline style injection and restrict stylesheet sources.
- Subresource Integrity (SRI): <link rel="stylesheet" href="..." integrity="sha256-..." crossorigin="anonymous"> for third-party stylesheets.
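The value of the integrity attribute is the hash algorithm name followed by the base64-encoded digest of the exact file bytes. A minimal sketch for computing it at build time, assuming a hypothetical helper name `sri_hash`:

```python
import base64
import hashlib


def sri_hash(content: bytes, algorithm: str = "sha384") -> str:
    """Return an SRI integrity value such as 'sha384-<base64 digest>' for the given bytes."""
    if algorithm not in ("sha256", "sha384", "sha512"):
        raise ValueError("SRI supports sha256, sha384, and sha512")
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


# Example: hash the stylesheet bytes exactly as they will be served.
integrity = sri_hash(b"body { margin: 0; }\n")
```

The hash must be recomputed whenever the third-party file changes; a mismatch makes the browser refuse to apply the stylesheet.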
19. Architectural Decision Framework
Security Architecture Review Checklist
When designing or reviewing a system, evaluate each area:
□ Authentication
├── How are users/services identified?
├── Token lifecycle (issuance, validation, revocation, rotation)?
└── MFA requirements?
□ Authorization
├── Where are access decisions made (gateway, service, DB)?
├── RBAC vs ABAC vs ReBAC?
└── Least privilege verification?
□ Network Security
├── Trust boundaries identified?
├── East-west encryption (mTLS)?
├── Egress filtering?
└── Microsegmentation?
□ Data Protection
├── Encryption at rest and in transit?
├── Key management (rotation, access)?
├── Data classification applied?
└── PII handling (GDPR Art. 25 Privacy by Design)?
□ Input Handling
├── Validation at every trust boundary?
├── Serialization security (XXE, deserialization)?
└── File upload controls?
□ Resilience
├── Rate limiting at multiple layers?
├── Circuit breakers for downstream dependencies?
├── Timeout hierarchy (caller > callee)?
├── Graceful degradation plan?
└── Resource limits (CPU, memory, connections)?
□ Observability
├── Security-relevant log coverage?
├── Correlation IDs across services?
├── Alerting on auth failures, policy violations?
└── Audit trail for sensitive operations?
□ Supply Chain
├── Dependency scanning in CI/CD?
├── Image signing and verification?
├── SBOM generation?
└── Third-party resource integrity (SRI)?
□ Multi-Tenancy (if applicable)
├── Isolation model chosen and justified?
├── Tenant context propagation?
├── Cross-tenant access testing?
└── Noisy neighbor prevention?
□ Blast Radius
├── What does compromise of component X give the attacker?
├── Can lateral movement be contained?
├── Are secrets scoped to minimum necessary?
└── Is there a kill switch for compromised components?
Threat Modeling Integration
Every architecture decision should be validated through STRIDE analysis:
| Threat | Question |
|---|---|
| Spoofing | Can an attacker impersonate a user or service? |
| Tampering | Can data be modified in transit or at rest? |
| Repudiation | Can actions be denied without audit evidence? |
| Information Disclosure | What data leaks on compromise of each component? |
| Denial of Service | What happens under load or resource exhaustion? |
| Elevation of Privilege | Can a low-privilege actor escalate? |
Map each finding to mitigations, owners, and implementation status. Prioritize by blast radius and exploitability.
Summary: The Principal Architect's Mental Model
Security architecture is not a checklist bolted onto a design. It is a set of constraints that shape the design from inception:
- Identity is the perimeter — network location grants nothing; every request proves identity.
- Every boundary validates — gateway, service, database each enforce independently.
- Blast radius drives topology — segment by damage potential, not by convenience.
- Resilience is security — DoS, cascading failure, and resource exhaustion are attack vectors.
- Observability enables defense — you cannot defend what you cannot see.
- Least privilege is not optional — default-deny at every layer, justify every permission.
- Assume breach, design for containment — the question is not "if" but "when" and "how far."