CIPHER Training: Recon & OSINT Deep Dive
Source Material Analyzed
- reconftw (full workflow engine, config, 8 modules)
- bbot (module system, presets, event pipeline)
- nmap (613 NSE scripts, scripting engine architecture)
- subfinder (55+ passive sources, provider config)
- httpx (HTTP toolkit, 30+ probes)
- dnsx (DNS toolkit, all record types)
- GHunt (Google account OSINT via People API, Maps, Calendar, Drive)
- maigret (username OSINT across 3,000+ sites backed by a ~35k-line site database, recursive search, permutations)
- octosuite (GitHub OSINT via API -- users, orgs, repos, gists, events)
1. RECONFTW: Full Automated Recon Workflow
Architecture
reconftw is a modular bash framework with 8 source files orchestrating 50+ tools:
- modes.sh -- workflow orchestration (recon, passive, osint, all, vulns, multi_recon)
- osint.sh -- 15 OSINT functions (dorks, leaks, metadata, email, cloud, spoof)
- subdomains.sh -- 12 subdomain functions (passive, brute, permute, recursive, takeover)
- web.sh -- 20 web functions (probe, fuzz, JS analysis, CMS, screenshots, params)
- vulns.sh -- 14 vuln functions (XSS, SQLi, SSRF, SSTI, LFI, CRLF, smuggling)
- axiom.sh -- distributed scanning on cloud fleet
- core.sh -- shared infrastructure
- utils.sh -- utility functions
Execution Phases (recon mode, -r)
Phase 1: OSINT (parallel groups)
Group 1: domain_info, ip_info, emails, google_dorks, third_party_misconfigs
Group 2: github_repos, github_leaks, github_actions_audit, metadata, apileaks, zonetransfer
Standalone: cloud_enum_scan
Phase 2: Subdomains
subdomains_full -> subtakeover -> s3buckets
Internal sub-phases:
0. ASN enumeration (asnmap for CIDR discovery)
1. Passive: sub_passive (subfinder), sub_crt (crt.sh)
2. Active: sub_active (DNS resolution via puredns/dnsx)
3. Brute: sub_brute, sub_permut (gotator), sub_regex_permut, sub_ia_permut
4. Enrichment: sub_noerror, sub_dns, sub_srv, sub_ptr_cidrs
5. Post-active: sub_scraping, sub_analytics, recursive_passive, recursive_brute
Phase 3: Web Detection
webprobe_full (httpx on all ports)
Parallel: screenshot, cdnprovider, portscan, favirecon_tech
Sequential: geo_info, tls_ip_pivots, virtualhosts
Phase 4: Web Analysis
waf_checks -> nuclei_check -> graphql_scan -> fuzz -> iishortname
urlchecks -> jschecks -> sub_js_extract -> well_known_pivots
websocket_checks -> param_discovery -> grpc_reflection -> llm_probe
Phase 5: Finalization
cms_scanner -> url_gf -> wordlist_gen -> password_dict -> url_ext
Phase 6 (optional): Vulnerability Scanning (-a flag)
Group 1: crlf_checks, xss, ssrf_checks, lfi
Group 2: ssti, sqli, command_injection, smuggling
Group 3: webcache, spraying, brokenLinks
Sequential: fuzzparams, nuclei_dast, 4xxbypass, test_ssl
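The phase layout above can be sketched in Python. This is only an illustration of the scheduling idea, not reconftw's bash implementation: it assumes that functions inside a group run concurrently and that groups run one after another. `run_group`, `run_phase`, and `make_task` are hypothetical names, and the tasks are placeholders for the real recon functions.

```python
from concurrent.futures import ThreadPoolExecutor

def run_group(tasks):
    """Run every task of one group concurrently; return results in task order."""
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]

def run_phase(groups):
    """Run groups sequentially; tasks inside each group run in parallel."""
    results = []
    for group in groups:
        results.extend(run_group(group))
    return results

def make_task(name):
    # Placeholder standing in for a real recon function such as domain_info.
    return lambda: f"{name}: done"

# Phase 1 (OSINT) expressed as two parallel groups plus one standalone task.
osint_phase = [
    [make_task("domain_info"), make_task("ip_info"), make_task("emails")],
    [make_task("github_repos"), make_task("metadata"), make_task("zonetransfer")],
    [make_task("cloud_enum_scan")],
]

phase_results = run_phase(osint_phase)
print(phase_results)
```

Collecting results in submission order keeps the output stable even though the tasks inside a group may finish in any order.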
Key Command Pipelines
# Full recon, single target
reconftw -d target.com -r
# Full recon + vulns (aggressive)
reconftw -d target.com -a
# OSINT only
reconftw -d target.com -o
# Passive only (no active scanning)
reconftw -d target.com -p
# Subdomain enumeration only
reconftw -d target.com -s
# Web analysis only
reconftw -d target.com -w
# Multi-target recon
reconftw -l targets.txt -r
# Deep mode (no limits, exhaustive)
reconftw -d target.com -r --deep
# Parallel mode with incremental scanning
reconftw -d target.com -r --parallel --incremental
# Monitor mode (continuous)
reconftw -d target.com -r --monitor --monitor-interval 60
# With Axiom distributed fleet
reconftw -d target.com -r -v --vps-count 10
OSINT Module Details
| Function | Tools Used | Output |
|---|---|---|
| google_dorks | dorks_hunter | osint/dorks.txt |
| github_dorks | gitdorks_go | osint/gitdorks.txt |
| github_repos | enumerepo, gitleaks/titus/noseyparker, trufflehog | osint/github_company_secrets.json |
| github_leaks | ghleaks | osint/github_leaks.json |
| github_actions_audit | gato | osint/github_actions_audit.json |
| metadata | metagoofil, exiftool | osint/metadata_results.txt |
| apileaks | porch-pirate, SwaggerSpy, postleaksNg, trufflehog | osint/postman_leaks.txt, swagger_leaks.txt |
| emails | EmailHarvester, LeakSearch | osint/emails.txt, osint/passwords.txt |
| domain_info | whois, msftrecon, Scopify | osint/domain_info_general.txt |
| spoof | Spoofy | osint/spoof.txt |
| mail_hygiene | dig (TXT, DMARC) | osint/mail_hygiene.txt |
| cloud_enum_scan | cloud_enum | osint/cloud_enum.txt |
| ip_info | WhoisXML API | osint/ip_*_relations/whois/location.txt |
| third_party_misconfigs | misconfig-mapper | osint/3rdparts_misconfigurations.txt |
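The mail_hygiene check collects TXT and DMARC records with dig. As a rough sketch of what can be done with such a record afterwards, the snippet below parses a DMARC string and flags weak settings. The function names and the sample record are hypothetical, and the rules shown (policy not enforcing, missing `rua`) are a simplified assumption rather than reconftw's own logic.

```python
def parse_dmarc(record):
    """Split a DMARC TXT record into a lowercase tag -> value dict."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags

def dmarc_findings(record):
    """Return human-readable weaknesses found in one DMARC record."""
    tags = parse_dmarc(record)
    if tags.get("v", "").upper() != "DMARC1":
        return ["not a DMARC1 record"]
    findings = []
    policy = tags.get("p", "").lower()
    if policy not in ("quarantine", "reject"):
        # p=none only monitors; spoofed mail is still delivered.
        findings.append(f"policy p={policy or 'missing'} does not block spoofed mail")
    if "rua" not in tags:
        findings.append("no aggregate report address (rua)")
    return findings

# Hypothetical record as returned by: dig +short TXT _dmarc.example.com
sample = "v=DMARC1; p=none; rua=mailto:dmarc@example.com"
print(dmarc_findings(sample))
```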
Secrets Engine Configuration
reconftw supports 4 engines for secret scanning, configurable via SECRETS_ENGINE:
- gitleaks -- default, Git history scanning
- titus -- with optional API validation of discovered secrets
- noseyparker -- alternative with datastore-based analysis
- hybrid -- runs gitleaks + titus together
- All complemented by trufflehog for additional coverage
Subdomain Module Deep Techniques
Passive sources: subfinder (55+ APIs), crt.sh (CT logs with DNS time fencing), ASN enumeration (asnmap)
Active techniques:
- DNS resolution with puredns or dnsx (auto-detects NAT/CGNAT)
- Deep wildcard filtering (iterative random-probe method, max 5 iterations)
- NOERROR response bruteforcing
- DNS zone transfer attempts
- SRV record enumeration (~25 service types)
- NS delegation checking with AXFR on delegated zones
- PTR sweep over ASN CIDRs
- TLS certificate pivoting (SNI probing discovered IPs for new subdomains)
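One plausible reading of the iterative random-probe wildcard filter is sketched below. It is an assumption about the general technique, not reconftw's actual code: random labels are resolved under each parent zone, the answers are remembered as wildcard IPs, and any host whose answers fall entirely inside that set is dropped. Passes repeat until nothing more is removed or the iteration cap is reached. `resolve` is a caller-supplied function returning a list of IPs, and `fake_resolve` is a stand-in for real DNS.

```python
import random
import string

def random_label(rng, length=12):
    """Return a random lowercase label that almost certainly has no real record."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))

def parent_of(host):
    """Return the zone one level up (a.dev.example.com -> dev.example.com)."""
    return host.split(".", 1)[1]

def filter_wildcards(hosts, resolve, max_iterations=5, probes=3, seed=0):
    """Drop hosts whose answers match what random sibling names resolve to."""
    rng = random.Random(seed)
    kept = {h for h in hosts if "." in h}
    wildcard_ips = {}  # parent zone -> IPs returned for random probes
    for _iteration in range(max_iterations):
        for parent in {parent_of(h) for h in kept}:
            for _probe in range(probes):
                answers = resolve(f"{random_label(rng)}.{parent}")
                wildcard_ips.setdefault(parent, set()).update(answers)
        removed = set()
        for host in kept:
            answers = set(resolve(host))
            if answers and answers <= wildcard_ips[parent_of(host)]:
                removed.add(host)
        if not removed:
            break  # stable: another pass would not change the result
        kept -= removed
    return sorted(kept)

def fake_resolve(name):
    # Stand-in resolver: *.dev.example.com is a wildcard pointing at 203.0.113.9.
    records = {
        "www.example.com": ["198.51.100.7"],
        "api.dev.example.com": ["198.51.100.20"],
    }
    if name in records:
        return records[name]
    if name.endswith(".dev.example.com"):
        return ["203.0.113.9"]
    return []

hosts = ["www.example.com", "a.dev.example.com", "api.dev.example.com"]
print(filter_wildcards(hosts, fake_resolve))
```

Comparing answers instead of discarding everything under a wildcard parent keeps real hosts such as `api.dev.example.com` that resolve to a different address.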
Permutation engines:
- gotator: depth-based permutations with number generation
- Regex-based permutations
- AI-powered permutations
- Auto wordlist sizing: short list if >100 subs, full otherwise
Recursive approaches:
- Recursive passive: top N subdomains fed back into passive sources
- Recursive brute: discovered patterns used as brute-force seeds
Configuration Highlights
# Key rate limits
HTTPX_RATELIMIT=150
NUCLEI_RATELIMIT=150
FFUF_RATELIMIT=0
# Adaptive rate limiting
ADAPTIVE_RATE_LIMIT=false # --adaptive-rate flag
MIN_RATE_LIMIT=10
RATE_LIMIT_BACKOFF_FACTOR=0.5 # halve on 429/503
RATE_LIMIT_INCREASE_FACTOR=1.2 # +20% on success
# DNS resolution strategy
DNS_RESOLVER=auto # auto|puredns|dnsx
# auto: detects NAT/CGNAT -> dnsx for home, puredns for VPS
# Parallel execution
PARALLEL_MODE=true
PERF_PROFILE="balanced" # low|balanced|max
# Deep mode thresholds
DEEP_LIMIT=500 # skip heavy modules unless DEEP
DEEP_LIMIT2=1500 # second limit for very heavy operations
# Portscan options
PORTSCAN_ACTIVE_OPTIONS="--top-ports 200 -sV -n -Pn --open --max-retries 2"
PORTSCAN_DEEP_OPTIONS="--top-ports 1000 -sV -n -Pn --open --max-retries 2 --script vulners"
PORTSCAN_STRATEGY=legacy # legacy|naabu_nmap (naabu pre-discovery then nmap)
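The adaptive rate-limit settings above describe a multiplicative backoff and recovery. A minimal sketch of that arithmetic follows, using the same factors. It assumes that any status other than 429/503 counts as success and that the rate is capped at the base limit of 150; both are assumptions of this example, and `next_rate` is a hypothetical helper, not a reconftw function.

```python
def next_rate(current, status_code, min_rate=10, max_rate=150,
              backoff_factor=0.5, increase_factor=1.2):
    """Return the request rate to use after observing one HTTP status code."""
    if status_code in (429, 503):
        # Throttled or overloaded: halve the rate, never below the floor.
        return max(min_rate, int(current * backoff_factor))
    # Treated as success: ramp up by 20%, capped at the ceiling (assumed).
    return min(max_rate, int(current * increase_factor))

rate = 150
history = []
for status in (200, 429, 429, 200, 503, 503, 503, 503):
    rate = next_rate(rate, status)
    history.append(rate)

print(history)  # [150, 75, 37, 44, 22, 11, 10, 10]
```

The floor (`MIN_RATE_LIMIT=10`) is what stops repeated 429/503 responses from driving the rate to zero.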
2. BBOT: Event-Driven Scanning Framework
Architecture
BBOT is a Python-based scanner built around an event pipeline. Modules watch for specific event types, process them, and emit new events that downstream modules consume.
Module System
Every module inherits from BaseModule and declares:
from bbot.modules.base import BaseModule

class MyModule(BaseModule):
    watched_events = ["DNS_NAME"]  # events this module consumes
    produced_events = ["IP_ADDRESS"]  # events this module emits
    flags = ["subdomain-enum", "safe", "passive"]  # categorization flags
    meta = {"auth_required": False, "description": "..."}
    options = {"api_key": ""}  # configurable options
    deps_pip = ["some-lib"]  # auto-installed dependencies
Module Categories (120+ modules)
Subdomain Enumeration (passive APIs): alienvault, anubisdb, bevigil, bufferoverrun, builtwith, c99, censys_dns, certspotter, chaos, crt, digitalyama, digitorus, dnsdumpster, fullhunt, hackertarget, hunterio, leakix, merklemap, myssl, otx, passivetotal, rapiddns, securitytrails, shodan_dns, sitedossier, subdomaincenter, subdomainradar, trickest, urlscan, virustotal, wayback
DNS & Brute Force: dnsbrute, dnsbrute_mutations, dnscaa, dnscommonsrv, dnstlsrpt, dnsbimi
Web Scanning: httpx, ffuf, ffuf_shortnames, gowitness (screenshots), nuclei, wpscan, smuggler, bypass403, hunt, retirejs, robots, securitytxt
Credential/Secret Discovery: trufflehog, badsecrets, git (exposed .git), gitdumper, git_clone, credshed, dehashed
Cloud Enumeration: bucket_amazon, bucket_google, bucket_microsoft, bucket_digitalocean, bucket_firebase, bucket_file_enum
Code Repository: code_repository, github_codesearch, github_org, github_usersearch, github_workflows, gitlab_com, gitlab_onprem, dockerhub, docker_pull
Service Discovery: portscan, fingerprintx, sslcert, ntlm, ipneighbor, ip2location, ipstack
OSINT: emailformat, social, pgp, newsletters, skymem, postman, postman_download
Web Vulnerability: generic_ssrf, graphql_introspection, host_header, iis_shortnames, medusa, lightfuzz/*, paramminer_*, reflected_parameters, telerik, url_manipulation, vhost
Output Modules (25+): json, csv, txt, sqlite, postgres, mysql, mongo, neo4j, elastic, splunk, kafka, nats, rabbitmq, slack, discord, teams, http (webhook), websocket, web_report, asset_inventory, nmap_xml, subdomains, emails, web_parameters
Presets (Composed Scan Profiles)
# subdomain-enum: Passive + brute-force subdomain discovery
bbot -t target.com -p subdomain-enum
# email-enum: Multi-source email gathering
bbot -t target.com -p email-enum
# cloud-enum: S3/GCS/Azure/DO/Firebase bucket scanning
bbot -t target.com -p cloud-enum
# code-enum: GitHub/GitLab/Docker code repository discovery
bbot -t target.com -p code-enum
# web-basic: Lightweight web vulnerability scan
bbot -t target.com -p web-basic
# web-thorough: Aggressive web assessment
bbot -t target.com -p web-thorough
# spider: Web crawling with data extraction
bbot -t target.com -p spider
# spider-intense: Deep crawling variant
bbot -t target.com -p spider-intense
# kitchen-sink: Everything (subdomain + cloud + code + email + spider + web + screenshots)
bbot -t target.com -p kitchen-sink
# baddns-intense: DNS takeover detection
bbot -t target.com -p baddns-intense
# tech-detect: Technology fingerprinting
bbot -t target.com -p tech-detect
Key Command Pipelines
# Basic subdomain enumeration
bbot -t evilcorp.com -p subdomain-enum -o /path/to/output
# Full offensive scan with deadly modules
bbot -t evilcorp.com -p kitchen-sink --allow-deadly
# Specific modules only (portscan is the port-scanning module listed under Service Discovery)
bbot -t evilcorp.com -m portscan,nuclei,ffuf
# With API keys configured
bbot -t evilcorp.com -p subdomain-enum -c modules.shodan_dns.api_key=KEY
# Multiple targets
bbot -t evilcorp.com evilcorp.net -p subdomain-enum
# Output to specific formats
bbot -t evilcorp.com -p subdomain-enum -om json,csv,neo4j
# Accepting all event types including out-of-scope
bbot -t evilcorp.com -p web-thorough --scope-report-distance 2
# Passive-only mode (-rf requires every enabled module to carry the passive flag)
bbot -t evilcorp.com -p subdomain-enum -rf passive
Event Pipeline Flow
Input Target (domain/IP/URL/email)
-> DNS_NAME events
-> Subdomain modules (passive APIs, brute force)
-> New DNS_NAME events
-> httpx module
-> URL events
-> Web scanning modules (nuclei, ffuf, etc.)
-> VULNERABILITY / FINDING events
-> Output modules (json, neo4j, slack, etc.)
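The flow above can be illustrated with a toy dispatcher. This is not BBOT's API (its real modules are asynchronous and considerably richer). It only shows how `watched_events` decides which module receives an event and how emitted events re-enter the queue. All class and function names here are hypothetical.

```python
from collections import deque

class ToyModule:
    watched_events = []
    produced_events = []

    def handle(self, event_type, data):
        """Return a list of (event_type, data) tuples to emit."""
        return []

class Resolver(ToyModule):
    watched_events = ["DNS_NAME"]
    produced_events = ["IP_ADDRESS"]

    def handle(self, event_type, data):
        fake_dns = {"www.evilcorp.com": "203.0.113.10"}  # stand-in for real DNS
        ip = fake_dns.get(data)
        return [("IP_ADDRESS", ip)] if ip else []

class Prober(ToyModule):
    watched_events = ["DNS_NAME"]
    produced_events = ["URL"]

    def handle(self, event_type, data):
        return [("URL", f"https://{data}/")]

def run_pipeline(seed_events, modules):
    """Dispatch each event to every module watching its type, de-duplicating events."""
    queue = deque(seed_events)
    seen = set()
    emitted = []
    while queue:
        event = queue.popleft()
        if event in seen:
            continue
        seen.add(event)
        emitted.append(event)
        event_type, data = event
        for module in modules:
            if event_type in module.watched_events:
                queue.extend(module.handle(event_type, data))
    return emitted

events = run_pipeline([("DNS_NAME", "www.evilcorp.com")], [Resolver(), Prober()])
for event in events:
    print(event)
```

De-duplicating on the `(type, data)` pair is what keeps a pipeline like this from looping when two modules rediscover the same asset.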
3. NMAP: NSE Scripting Engine Deep Dive
Script Categories (14 categories, 613 scripts)
| Category | Count | Purpose | Risk Level |
|---|---|---|---|
| auth | ~20 | Authentication credential testing/bypass | Medium |
| broadcast | ~30 | Local network discovery via broadcast | Low |
| brute | ~50 | Password guessing across protocols | High |
| default | ~80 | Run with -sC; balanced speed/utility/safety | Low-Med |
| discovery | ~60 | Active network/service discovery | Low |
| dos | ~10 | Denial of service testing | Critical |
| exploit | ~15 | Active vulnerability exploitation | Critical |
| external | ~5 | Third-party service queries | Low |
| fuzzer | ~10 | Protocol fuzzing for unknown vulns | High |
| intrusive | ~100 | High risk of crashes/detection | High |
| malware | ~15 | Backdoor/malware detection | Low |
| safe | ~200 | No crash/resource risk | Low |
| version | ~20 | Version detection extensions (requires -sV) | Low |
| vuln | ~50 | Known vulnerability detection | Medium |
Critical NSE Scripts for Security Assessments
SMB Vulnerability Detection:
# EternalBlue (MS17-010) -- WannaCry/Petya
nmap -p445 --script smb-vuln-ms17-010 <target>
# Conficker detection
nmap -p445 --script smb-vuln-conficker <target>
# SambaCry (CVE-2017-7494)
nmap -p445 --script smb-vuln-cve-2017-7494 <target>
# All SMB vulns at once
nmap -p445 --script "smb-vuln-*" <target>
# SMB enumeration
nmap -p445 --script smb-enum-shares,smb-enum-users,smb-os-discovery <target>
SSL/TLS Analysis:
# Full cipher enumeration with grades (A-F)
nmap -sV --script ssl-enum-ciphers -p 443 <target>
# Heartbleed (CVE-2014-0160)
nmap -p 443 --script ssl-heartbleed <target>
# POODLE (CVE-2014-3566)
nmap -p 443 --script ssl-poodle <target>
# CCS Injection (CVE-2014-0224)
nmap -p 443 --script ssl-ccs-injection <target>
# DH parameter weakness
nmap -p 443 --script ssl-dh-params <target>
# Certificate details
nmap -p 443 --script ssl-cert <target>
# Internal IP disclosure in certificates
nmap -p 443 --script ssl-cert-intaddr <target>
HTTP Vulnerability Scripts:
# Apache Struts RCE (CVE-2017-5638)
nmap -p 80,443 --script http-vuln-cve2017-5638 <target>
# WordPress brute force
nmap -p 80,443 --script http-wordpress-brute <target>
# HTTP enumeration (directories, files, services)
nmap -p 80,443 --script http-enum <target>
# XSS detection
nmap -p 80,443 --script http-dombased-xss,http-stored-xss <target>
# SQL injection detection
nmap -p 80,443 --script http-sql-injection <target>
# HTTP methods testing
nmap -p 80,443 --script http-methods <target>
# Default credential checking
nmap -p 80,443 --script http-default-accounts <target>
# IIS WebDAV vulnerability
nmap -p 80 --script http-iis-webdav-vuln <target>
DNS Scripts:
# Zone transfer
nmap --script dns-zone-transfer -p 53 <nameserver>
# DNS brute force
nmap --script dns-brute <target>
# Cache snooping
nmap --script dns-cache-snoop --script-args 'dns-cache-snoop.domains={popular.com,evil.com}' -p 53 <dns-server>
# NSEC/NSEC3 zone walking
nmap --script dns-nsec-enum,dns-nsec3-enum -p 53 <nameserver>
# DNS recursion testing
nmap --script dns-recursion -p 53 <dns-server>
# SRV record enumeration
nmap --script dns-srv-enum <target>
Service-Specific Scripts:
# FTP anonymous login + backdoors
nmap -p 21 --script ftp-anon,ftp-vsftpd-backdoor,ftp-proftpd-backdoor <target>
# SMTP relay testing + user enumeration
nmap -p 25 --script smtp-open-relay,smtp-enum-users <target>
# SNMP full enumeration
nmap -p 161 --script snmp-info,snmp-interfaces,snmp-processes,snmp-sysdescr,snmp-win32-users <target>
# NFS showmount
nmap -p 2049 --script nfs-showmount,nfs-ls <target>
# RDP encryption check
nmap -p 3389 --script rdp-enum-encryption <target>
# VNC authentication check
nmap -p 5900 --script vnc-info,vnc-brute <target>
Script Selection Syntax
# Boolean expressions
nmap --script "vuln and safe" <target> # safe vuln checks only
nmap --script "not intrusive" <target> # everything except intrusive
nmap --script "(default or safe) and not http-*" <target> # default/safe minus HTTP
# Category combinations
nmap --script default,safe <target>
nmap --script "auth and brute" <target>
# Script arguments
nmap --script http-brute --script-args 'http-brute.path=/admin,userdb=users.txt' <target>
nmap --script snmp-brute --script-args 'snmp-brute.communitiesdb=communities.txt' <target>
Comprehensive Assessment Patterns
# Quick vulnerability assessment
nmap -sV -sC --script vuln -p- <target>
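# Full service enumeration with only the default scripts that are also safe
# (-sC is dropped here: it would re-add every default script and defeat the filter)
nmap -sV --script "default and safe" -p- <target>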
# Stealth scan with version detection
nmap -sS -sV --script "safe and not broadcast" -T2 -p- <target>
# Aggressive full scan
nmap -A --script "default or vuln or discovery" -p- <target>
# Top-ports quick scan with service fingerprinting
nmap --top-ports 1000 -sV -n -Pn --open --max-retries 2 <target>
# Deep scan with vulners CVE matching (reconftw DEEP)
nmap --top-ports 1000 -sV -n -Pn --open --max-retries 2 --script vulners <target>
# UDP top-20 service scan
sudo nmap --top-ports 20 -sU -sV -n -Pn --open <target>
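Results from scans like these are easier to post-process when saved as XML (`-oX`, or `-oA` for all formats). A small sketch using only the standard library is shown below. The sample document is hand-written to mirror the element layout of nmap's XML output (`host`, `address`, `ports`, `port`, `state`, `service`), and `open_services` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written sample mirroring the layout of `nmap -oX` output.
SAMPLE_XML = """<nmaprun>
  <host>
    <address addr="192.0.2.10" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="22">
        <state state="open"/>
        <service name="ssh" product="OpenSSH"/>
      </port>
      <port protocol="tcp" portid="8080">
        <state state="closed"/>
        <service name="http-proxy"/>
      </port>
    </ports>
  </host>
</nmaprun>"""

def open_services(xml_text):
    """Return (address, port, protocol, service, product) for every open port."""
    root = ET.fromstring(xml_text)
    rows = []
    for host in root.iter("host"):
        address = host.find("address")
        addr = address.get("addr") if address is not None else "unknown"
        for port in host.iter("port"):
            state = port.find("state")
            if state is None or state.get("state") != "open":
                continue
            service = port.find("service")
            name = service.get("name", "") if service is not None else ""
            product = service.get("product", "") if service is not None else ""
            rows.append((addr, int(port.get("portid")), port.get("protocol"), name, product))
    return rows

rows = open_services(SAMPLE_XML)
print(rows)
```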
4. ProjectDiscovery Tool Chain
Subfinder: Passive Subdomain Enumeration
55+ Passive Sources: alienvault, anubis, bevigil, bufferover, builtwith, c99, censys, certspotter, chaos, chinaz, commoncrawl, crtsh, digitalyama, digitorus, dnsdb, dnsdumpster, dnsrepo, domainsproject, driftnet, facebook, fofa, fullhunt, github, gitlab, hackertarget, hudsonrock, intelx, leakix, merklemap, netlas, onyphe, profundis, pugrecon, quake, rapiddns, reconcloud, reconeer, redhuntlabs, riddler, robtex, rsecloud, securitytrails, shodan, sitedossier, thc, threatbook, threatcrowd, threatminer, urlscan, virustotal, waybackarchive, whoisxmlapi, windvane, zoomeyeapi
Provider Configuration (~/.config/subfinder/provider-config.yaml):
censys:
- AC_ID:AC_SECRET
chaos:
- API_KEY
github:
- token1
- token2
fofa:
- email:key
shodan:
- API_KEY
securitytrails:
- API_KEY
virustotal:
- API_KEY
intelx:
- HOST:API_KEY
Key Flags:
subfinder -d target.com -silent # quiet output, subs only
subfinder -d target.com -all # use all sources (slow)
subfinder -d target.com -recursive # recursive subdomain enum
subfinder -d target.com -s crtsh,github # specific sources only
subfinder -d target.com -es alienvault # exclude sources
subfinder -d target.com -oJ -o results.json # JSON output
subfinder -d target.com -nW # keep only subdomains that actively resolve
subfinder -d target.com -rL resolvers.txt # custom resolvers
subfinder -d target.com -rl 10 # rate limit
subfinder -d target.com -rls "hackertarget=10/s" # per-provider rate limit
subfinder -d target.com -max-time 10 # timeout in minutes
httpx: HTTP Probing & Fingerprinting
30+ Probes: URL, IP, Title, Status Code, Content Length, TLS Certificate, CSP Header, Line/Word Count, Location Header, Web Server, WebSocket, Response Time, Favicon Hash, Body/Header Hash, Redirect Chain, CNAME, CDN, ASN, JARM, HTTP2, Pipeline, VHost, Tech Detection
Key Pipelines:
# Basic web probing from subdomain list
cat subs.txt | httpx -silent
# Full fingerprinting
cat subs.txt | httpx -title -tech-detect -status-code -follow-redirects
# JSON output with all probes
cat subs.txt | httpx -json -o results.json
# Screenshot capture
cat subs.txt | httpx -screenshot
# Filter by status code
cat subs.txt | httpx -mc 200,301,302
# Probe specific ports
cat subs.txt | httpx -ports 80,443,8080,8443
# Extract specific data
cat subs.txt | httpx -extract-regex 'api[_-]?key["\s:=]+[a-zA-Z0-9]{20,}'
# CDN/WAF detection (-cdn reports both; httpx has no separate -waf flag)
cat subs.txt | httpx -cdn
# Technology detection with favicon hash
cat subs.txt | httpx -td -favicon
# Match by response condition (DSL)
cat subs.txt | httpx -mdc 'status_code == 200 && contains(body, "admin")'
# Full recon probe (reconftw-style)
cat subs.txt | httpx -follow-host-redirects -random-agent \
-status-code -title -tech-detect -web-server -ip -cname \
-cdn -content-length -favicon -json -threads 50 -rate-limit 150
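With `-json`, httpx writes one JSON object per line, so downstream tools that expect plain URLs need an extraction step. The sketch below shows one way to do that in Python. The field names `url`, `status_code`, `tech`, and `cdn` are assumptions about the output schema and should be checked against the httpx version in use. The sample lines and helper names are hypothetical.

```python
import json

# Hand-written sample lines; field names are assumed, not guaranteed.
SAMPLE_LINES = [
    '{"url": "https://app.target.com", "status_code": 200, "tech": ["Nginx", "React"], "cdn": true}',
    '{"url": "https://old.target.com", "status_code": 403, "tech": ["Apache HTTP Server"]}',
    "not json",
]

def load_probe_results(lines):
    """Parse JSON-lines output, skipping blank or malformed lines."""
    results = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict) and "url" in record:
            results.append(record)
    return results

def urls_without_cdn(records):
    """Return URLs of hosts not flagged as CDN-fronted (a missing flag counts as no CDN)."""
    return [record["url"] for record in records if not record.get("cdn", False)]

records = load_probe_results(SAMPLE_LINES)
print(urls_without_cdn(records))
```

Treating a missing `cdn` key as "not behind a CDN" matters because a strict `cdn == false` comparison would silently drop records that omit the field.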
dnsx: DNS Resolution & Analysis
Supported Query Types: A, AAAA, CNAME, NS, TXT, SRV, PTR, MX, SOA, ANY, AXFR, CAA
Key Pipelines:
# Resolve subdomains (filter active)
subfinder -silent -d target.com | dnsx -silent
# Full DNS recon (all record types)
echo target.com | dnsx -recon -json
# Specific record type queries
cat subs.txt | dnsx -a -resp # A records with response
cat subs.txt | dnsx -aaaa -resp-only # IPv6 only
cat subs.txt | dnsx -cname -resp # CNAME records
cat subs.txt | dnsx -txt -resp # TXT records (SPF, DKIM)
cat subs.txt | dnsx -mx -resp # Mail servers
cat subs.txt | dnsx -ns -resp # Nameservers
echo target.com | dnsx -axfr # Zone transfer attempt
echo target.com | dnsx -caa -resp # CAA records
# DNS brute force
dnsx -d target.com -w wordlist.txt -silent
# Wildcard-aware resolution
cat subs.txt | dnsx -wd target.com -silent
# With custom resolvers
cat subs.txt | dnsx -r resolvers.txt -silent
# CDN and ASN detection
cat subs.txt | dnsx -cdn -asn -silent
# Trace DNS resolution path
echo target.com | dnsx -trace
# Filter by response code
cat subs.txt | dnsx -rc noerror -silent
Complete ProjectDiscovery Recon Chain
# Phase 1: Subdomain discovery
subfinder -d target.com -all -silent | tee subs_passive.txt
# Phase 2: DNS resolution (filter active)
cat subs_passive.txt | dnsx -silent | tee subs_active.txt
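# Phase 3: Web probing with full fingerprinting
cat subs_active.txt | httpx -title -tech-detect -status-code \
-web-server -ip -cdn -json -o web_probe.json
# With -json, stdout is JSON lines, so extract plain URLs for the next phase
jq -r '.url' web_probe.json | sort -u > webs_alive.txt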
# Phase 4: Vulnerability scanning
cat webs_alive.txt | nuclei -t ~/nuclei-templates/ \
-severity medium,high,critical -o vulns.txt
# Phase 5: URL extraction from archives
cat subs_active.txt | waybackurls | sort -u | tee urls_archive.txt
# Phase 6: Parameter fuzzing on discovered URLs
cat urls_archive.txt | gf sqli | tee sqli_candidates.txt
cat urls_archive.txt | gf xss | tee xss_candidates.txt
cat urls_archive.txt | gf ssrf | tee ssrf_candidates.txt
# Full one-liner chain
subfinder -d target.com -silent | dnsx -silent | httpx -silent | nuclei -silent
Advanced Chaining Patterns
# Subdomain -> DNS -> HTTP -> Tech Stack -> Targeted Nuclei
# (dnsx without -resp prints bare hostnames; jq turns httpx JSON back into URLs)
subfinder -d target.com -silent \
| dnsx -silent \
| httpx -silent -td -json -o httpx_out.json \
| jq -r '.url' \
| nuclei -t ~/nuclei-templates/technologies/ -silent
# CIDR-based scanning
echo "192.168.1.0/24" | dnsx -ptr -resp-only -silent \
| httpx -silent -title -status-code
# Reverse DNS from IP range
echo "192.168.1.0/24" | dnsx -ptr -resp-only | httpx -silent
# TLS certificate harvesting for new subdomains
echo target.com | httpx -tls-grab -tls-probe -json \
| jq -r '.tls.dns_names[]' | dnsx -silent
# Favicon-based technology fingerprinting
cat webs.txt | httpx -favicon -json | jq -r '.favicon_mmh3'
5. OSINT Tools Deep Dive
GHunt: Google Account Intelligence
Modules:
- email -- Lookup by email address: profile photo, cover photo, last edit date, Gaia ID, user types, Google Chat data (entity type, customer ID), Google Plus data (enterprise user)
- gaia -- Lookup by Gaia ID (Google's internal user identifier): same data as email lookup but via numeric ID
- drive -- Google Drive document OSINT
- geolocate -- WiFi-based geolocation via Google's geolocation API
- spiderdal -- Digital Asset Links spider for verified app/site relationships
APIs Used:
- People PA (People API internal): profile data, photos, user types, organizations
- Calendar: public calendar events
- Drive: shared document metadata
- Play Games: gaming profile, achievements
- Play Gateway: app associations
- Vision: image analysis (face detection in profile photos)
- Geolocation: WiFi AP-based positioning
- Identity Toolkit: authentication flow analysis
- Digital Asset Links: verified domain/app relationships
Data Points Extracted:
- Full name, profile/cover photos (custom vs default)
- Last profile edit timestamp
- Gaia ID (cross-references to other Google services)
- User type classification (consumer, G Suite, enterprise)
- Google Chat entity type and customer ID
- Enterprise user status
- Google Maps reviews and photos
- Play Games profile (achievements, display name)
- Calendar event data
- YouTube channel association
Command Usage:
# Setup authentication
ghunt login
# Email lookup
ghunt email target@gmail.com
# Gaia ID lookup
ghunt gaia 123456789
# Drive document analysis
ghunt drive "https://docs.google.com/document/d/..."
# JSON output
ghunt email target@gmail.com --json results.json
Maigret: Username OSINT Across 3,000+ Sites
Architecture:
- data.json: 35,921-line site database defining URL templates, detection methods, response parsing rules
- checking.py: async HTTP checker with retry, proxy support, cookie management
- permutator.py: username permutation engine (separators: "", "_", "-", ".")
- report.py: multi-format output (CSV, HTML, PDF, XMind, TXT, JSON, graph)
- socid_extractor: extracts user IDs from profile pages for recursive search
Supported ID Types for Recursive Search: username, yandex_public_id, gaia_id, vk_id, ok_id, wikimapia_uid, steam_id, uidme_uguid, yelp_userid
Recursive Search Logic:
- Search username across all sites
- From found profiles, extract additional IDs (socid_extractor)
- Extract linked profile URLs
- Parse extracted IDs against the database for cross-platform mapping
- Generate new username permutations from discovered names
- Repeat with discovered identifiers
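The loop described above is essentially a breadth-first search over identifiers. A minimal sketch follows, assuming a caller-supplied `lookup` function that returns the profiles found for an identifier plus any new identifiers extracted from them. `recursive_search`, `fake_lookup`, and the sample data are hypothetical and do not reflect maigret's internals.

```python
from collections import deque

def recursive_search(seed, lookup, max_depth=2):
    """Breadth-first search over identifiers.

    lookup(identifier) must return (profiles, linked_identifiers).
    """
    seen = {seed}
    queue = deque([(seed, 0)])
    found = {}
    while queue:
        identifier, depth = queue.popleft()
        profiles, linked = lookup(identifier)
        found[identifier] = profiles
        if depth >= max_depth:
            continue  # do not expand beyond the depth limit
        for new_id in linked:
            if new_id not in seen:
                seen.add(new_id)
                queue.append((new_id, depth + 1))
    return found

# Stand-in for real site checks: identifier -> (profiles, extracted identifiers).
FAKE_DB = {
    "jdoe": (["https://example.com/jdoe"], ["john.doe", "jdoe"]),
    "john.doe": (["https://example.org/u/john.doe"], ["jdoe"]),
}

def fake_lookup(identifier):
    return FAKE_DB.get(identifier, ([], []))

result = recursive_search("jdoe", fake_lookup)
print(result)
```

The `seen` set and the depth limit are what keep mutually linked profiles from causing an endless search.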
Permutation Engine:
Given elements like {first: "john", last: "doe"}, generates:
- Single: john, doe, _john, john_
- Combined with separators: johndoe, john_doe, john-doe, john.doe
- All orderings: doejohn, doe_john, doe-john, doe.john
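The rules above can be approximated in a few lines. This sketch assumes pairs of elements only and the four separators listed for permutator.py; maigret's real engine may generate more or different forms. `permute_names` is a hypothetical name.

```python
from itertools import permutations

SEPARATORS = ["", "_", "-", "."]

def permute_names(elements, separators=SEPARATORS):
    """Generate single forms plus every ordered pair joined by each separator."""
    names = set()
    for element in elements:
        # Single forms: bare, underscore-prefixed, underscore-suffixed.
        names.update({element, f"_{element}", f"{element}_"})
    for first, second in permutations(elements, 2):
        for sep in separators:
            names.add(f"{first}{sep}{second}")
    return sorted(names)

candidates = permute_names(["john", "doe"])
print(len(candidates), candidates)
```

For two elements this yields 6 single forms and 8 combined forms; the count grows quickly with more elements, which is why permutation runs are usually limited to a handful of name parts.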
Command Usage:
# Basic username search
maigret username
# Multiple usernames
maigret user1 user2 user3
# With permutations (--permute is a switch and needs at least two name elements)
maigret john doe --permute
# Recursive search over discovered IDs runs by default; add --no-recursion to turn it off
maigret username
# Parse a profile URL for IDs
maigret --parse "https://twitter.com/username"
# Specific sites only
maigret username --site twitter.com --site github.com
# Output formats
maigret username --csv --html --pdf --json simple
# With proxy
maigret username --proxy socks5://127.0.0.1:9050
# Per-request timeout (seconds)
maigret username --timeout 10
# Self-check (verify site database)
maigret --self-check
Octosuite: GitHub OSINT (by Bellingcat)
Capabilities via GitHub API:
- User profiling: bio, location, company, email, blog, followers/following, public repos/gists, creation date, last activity
- Organization analysis: members, repos, teams, billing, description
- Repository analysis: contributors, languages, branches, commits, issues, pull requests, forks, stars, watchers
- Event tracking: user activity timeline (push events, issue events, PR events, etc.)
- Gist analysis: public gists content and metadata
- Cross-referencing: connection mapping between users, orgs, and repos
Architecture:
- api/github.py: GitHub API client with caching and response sanitization (strips API URLs, null values)
- app/cli/: command-line interface
- app/tui/: terminal UI (interactive mode)
- api/cache.py: request caching layer
- api/models.py: data models for GitHub entities
Key Features:
- Response caching to avoid rate limits
- Automatic sanitization of API response data
- Both CLI and TUI (interactive terminal) interfaces
- Export capabilities
Usage Patterns:
# User investigation
octosuite user <username>
# Organization analysis
octosuite org <orgname>
# Repository analysis
octosuite repo <owner>/<repo>
# User event timeline
octosuite events <username>
6. OSINT Methodology Framework
Phase 1: Passive Footprinting (Zero Interaction)
Target Identification
├── Domain WHOIS (registrant, registrar, dates, nameservers)
├── DNS records (A, AAAA, MX, NS, TXT, SOA, SRV, CAA)
├── Certificate Transparency logs (crt.sh, certspotter)
├── ASN/BGP analysis (asnmap, bgp.he.net)
├── Reverse WHOIS (organization, email, phone correlation)
├── Historical DNS (SecurityTrails, DNSHistory)
└── Passive DNS (VirusTotal, PassiveTotal, OTX)
Digital Presence Mapping
├── Subdomain enumeration (subfinder -all, 55+ sources)
├── Web archive analysis (Wayback Machine, CommonCrawl)
├── Google dorking (site:, inurl:, intitle:, filetype:, ext:)
├── Social media profiling (maigret, sherlock)
├── Email harvesting (EmailHarvester, Hunter.io, Phonebook.cz)
├── GitHub/GitLab OSINT (octosuite, gitdorks_go, ghleaks)
├── Document metadata (metagoofil + exiftool)
├── Cloud storage enumeration (cloud_enum, S3Scanner)
└── Job posting analysis (LinkedIn, Indeed -> tech stack hints)
Infrastructure Analysis
├── Shodan/Censys/Fofa queries
├── CDN/WAF fingerprinting
├── IP geolocation and ISP mapping
├── Netblock/ASN ownership
├── Related domains (reverse IP, shared hosting)
└── Technology fingerprinting (Wappalyzer, httpx -td)
Phase 2: Semi-Passive (Minimal Interaction)
Web Probing
├── HTTP probing (httpx with fingerprinting)
├── Screenshot capture (gowitness, httpx -ss)
├── TLS certificate analysis (ssl-cert, tlsx)
├── Favicon hash correlation (httpx -favicon, Shodan favicon:)
├── robots.txt / sitemap.xml collection
├── security.txt discovery
└── .well-known endpoint enumeration
DNS Active Validation
├── Resolution validation (dnsx)
├── Zone transfer attempts (dnsx -axfr, dig axfr)
├── Wildcard detection (deep_wildcard_filter)
├── DNSSEC validation
├── SPF/DMARC/DKIM analysis
└── NS delegation audit
Phase 3: Active Reconnaissance (Direct Interaction)
Port Scanning
├── Fast port discovery (naabu, masscan)
├── Service fingerprinting (nmap -sV)
├── NSE script scanning (nmap --script vuln,safe)
├── UDP service detection (nmap -sU --top-ports 20)
└── Service-specific enumeration (SMB, SNMP, LDAP, etc.)
Web Analysis
├── Directory/file fuzzing (ffuf, feroxbuster)
├── Parameter discovery (arjun, paramspider)
├── JavaScript analysis (LinkFinder, JSA, jsluice)
├── API endpoint discovery (Swagger, GraphQL introspection)
├── CMS detection (CMSeeK, wpscan)
├── WAF detection (wafw00f)
└── Virtual host discovery (vhost fuzzing)
Vulnerability Assessment
├── Nuclei template scanning (info through critical)
├── SSL/TLS assessment (testssl.sh, ssl-enum-ciphers)
├── Subdomain takeover (nuclei, dnstake)
├── CORS misconfiguration
├── Security header analysis
└── Known CVE matching (vulners, nuclei)
Phase 4: People OSINT
Identity Resolution
├── Username enumeration (maigret, sherlock across 2000+ sites)
├── Email-to-person mapping (GHunt, hunter.io)
├── Google account profiling (GHunt: Gaia ID, user type, Maps reviews)
├── Social media deep dive (profile, connections, activity)
├── Breach database lookup (LeakSearch, dehashed)
├── PGP key server search
└── Resume/CV OSINT (LinkedIn, job boards)
Username Permutation Strategy
├── Direct username search
├── Name variations: john.doe, johndoe, john_doe, j.doe, jdoe
├── With numbers: johndoe1, johndoe123, johndoe90
├── Platform-specific patterns (GitHub: first-last, Twitter: @handle)
└── Recursive: found username A -> profile reveals username B -> search B
7. Operational Pipelines
Pipeline 1: Full External Recon (Bug Bounty / Pentest)
# Step 1: Subdomain enumeration (cast wide net)
subfinder -d target.com -all -silent | tee step1_passive.txt
# Parallel: certificate transparency (crt.sh JSON endpoint)
curl -s "https://crt.sh/?q=%25.target.com&output=json" | jq -r '.[].name_value' | sed 's/^\*\.//' | sort -u | tee step1_crt.txt
# Step 2: DNS brute force + permutations
puredns bruteforce wordlist.txt target.com -r resolvers.txt | tee step2_brute.txt
cat step1_passive.txt step1_crt.txt step2_brute.txt | sort -u | tee all_subs.txt
gotator -sub all_subs.txt -perm permutations.txt -depth 1 -numbers 3 | \
puredns resolve -r resolvers.txt | tee step2_permuted.txt
# Step 3: DNS resolution (filter live)
cat all_subs.txt step2_permuted.txt | sort -u | dnsx -silent | tee live_subs.txt
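# Step 4: Web probing + fingerprinting
cat live_subs.txt | httpx -title -tech-detect -status-code -web-server \
-ip -cdn -json -o httpx_results.json
jq -r '.url' httpx_results.json | sort -u > live_webs.txt # plain URLs for later steps
# Step 5: Port scanning on non-CDN hosts
jq -r 'select(.cdn != true) | .host' httpx_results.json | sort -u | \
naabu -top-ports 1000 -silent | tee open_ports.txt # host:port pairs
# nmap -iL expects hosts, not host:port, so split the pairs first
cut -d: -f1 open_ports.txt | sort -u > scan_hosts.txt
nmap -sV -sC --script vuln -iL scan_hosts.txt \
-p "$(cut -d: -f2 open_ports.txt | sort -un | paste -sd, -)" -oA nmap_results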
# Step 6: URL collection
cat live_webs.txt | katana -silent -d 3 | tee crawled_urls.txt
cat live_subs.txt | waybackurls | sort -u | tee archive_urls.txt
cat crawled_urls.txt archive_urls.txt | sort -u | tee all_urls.txt
# Step 7: Vulnerability scanning
cat live_webs.txt | nuclei -t ~/nuclei-templates/ -severity medium,high,critical
cat all_urls.txt | gf xss | dalfox pipe --silence
cat all_urls.txt | gf sqli | sqlmap --batch --level 3
cat all_urls.txt | gf ssrf | qsreplace "COLLAB_URL" | httpx -silent
# Step 8: JavaScript analysis
cat live_webs.txt | katana -silent -jc | grep '\.js$' | sort -u | tee js_files.txt
cat js_files.txt | nuclei -t ~/nuclei-templates/exposures/ -silent
Pipeline 2: OSINT Investigation (Person/Entity)
# Step 1: Email OSINT
ghunt email target@gmail.com --json ghunt_results.json
# Extract Gaia ID from results for cross-reference
ghunt gaia <extracted_gaia_id>
# Step 2: Username enumeration
maigret "targetusername" --json simple -fo maigret_results # recursive search is on by default
# Step 3: GitHub OSINT
octosuite user targetusername
# Search GitHub for leaked secrets
gitdorks_go -gd dorks.txt -target target.com -tf .github_tokens
# Step 4: Domain OSINT
whois target.com
dig any target.com
subfinder -d target.com -all -silent
echo target.com | dnsx -recon -json
# Step 5: Breach/leak check
# Search breach databases for email
# porch-pirate for Postman collections
porch-pirate -s target.com --dump
# Step 6: Document metadata
metagoofil -d target.com -t pdf,docx,xlsx -l 50 -n 50 -w -o docs/  # -w downloads the files; without it only search results are listed
exiftool -r docs/ | grep -i "author\|creator\|email\|producer"
# Step 7: Social media correlation
# Cross-reference discovered usernames across platforms
# Map organizational relationships
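One way to feed Step 7 is to turn the metadata from Step 6 into a deduplicated list of names and usernames. The block below is a rough sketch under the assumption that exiftool's default text output (`Tag Name : value`) is being parsed. The function name `extract_identities` and the chosen tag list are illustrative, and `Creator` values often hold software names that need manual review.

```shell
#!/usr/bin/env bash
# extract_identities (hypothetical helper): read exiftool text output on stdin
# and print the unique values of a few identity-bearing tags.
extract_identities() {
  grep -iE '^(Author|Creator|Last Modified By)[[:space:]]*:' |
    sed -E -e 's/^[^:]*:[[:space:]]*//' -e 's/[[:space:]]+$//' |
    grep -v '^$' |
    LC_ALL=C sort -u
}

# Demo on inline sample lines shaped like exiftool output.
printf '%s\n' \
  'Author                          : Jane Doe' \
  'Producer                        : Example PDF Library' \
  'Creator                         : jdoe' \
  'Last Modified By                : Jane Doe' | extract_identities
```

In the pipeline this would read from the real tool, for example `exiftool -r docs/ | extract_identities > identity_candidates.txt`, and each candidate can then be checked with maigret.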
Pipeline 3: Cloud Infrastructure Recon
# Step 1: Cloud storage enumeration
cloud_enum -k target.com -k targetcompany
# S3 specific
aws s3 ls s3://target-bucket --no-sign-request 2>/dev/null
# Step 2: Azure tenant discovery
python3 msftrecon.py -d target.com
# Step 3: Subdomain-based cloud discovery
cat live_subs.txt | grep -iE "aws|azure|gcp|s3|blob|cloud" | tee cloud_subs.txt
# Step 4: bbot cloud scan
bbot -t target.com -p cloud-enum -o cloud_results
# Step 5: GitHub Actions audit
gato e --enum_wf_artifacts -O orgs.txt -oJ gato_results.json
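The keyword grep in Step 3 also matches names that merely contain a word like "cloud". Matching CNAME targets against provider-owned suffixes is usually more precise. The block below is a sketch of that idea; the function name `classify_cloud` is hypothetical and the suffix list is a small, non-exhaustive assumption.

```shell
#!/usr/bin/env bash
# classify_cloud (hypothetical helper): read one hostname per line on stdin
# and print "<provider><TAB><hostname>" based on well-known service suffixes.
classify_cloud() {
  local name provider
  while IFS= read -r name; do
    # Normalize: lowercase and drop the trailing dot of an FQDN.
    name="$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')"
    name="${name%.}"
    [ -n "$name" ] || continue
    case "$name" in
      *.amazonaws.com|*.cloudfront.net)                       provider=aws ;;
      *.windows.net|*.azurewebsites.net|*.cloudapp.azure.com) provider=azure ;;
      *.googleapis.com|*.appspot.com|*.firebaseio.com)        provider=gcp ;;
      *)                                                      provider=other ;;
    esac
    printf '%s\t%s\n' "$provider" "$name"
  done
}

# Demo on inline sample CNAME targets.
printf '%s\n' 'assets.s3.amazonaws.com.' 'acct.blob.core.windows.net' 'www.example.org' | classify_cloud
```

With dnsx available, the input could come from `dnsx -l live_subs.txt -cname -resp-only -silent | classify_cloud`.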
Pipeline 4: reconftw Automated (Set and Forget)
# Quick passive-only scan
reconftw -d target.com -p
# Standard recon (recommended starting point)
reconftw -d target.com -r --parallel
# Deep comprehensive scan
reconftw -d target.com -a --deep --parallel
# Continuous monitoring
reconftw -d target.com -r --monitor --monitor-interval 60 --incremental
# Multi-target campaign
reconftw -l scope.txt -r --parallel
# With notifications
# Edit reconftw.cfg: NOTIFICATION=true, configure notify provider
reconftw -d target.com -r --parallel
8. Detection Evasion Considerations (Purple Team)
Recon Activity Detection Points
| Activity | Detection Method | MITRE ATT&CK |
|---|---|---|
| Mass DNS queries | DNS query volume anomaly, NXDOMAIN spike | T1596.001 |
| Port scanning | Firewall/IDS connection rate alerts | T1046 |
| Directory bruting | WAF rate limiting, 404 spike | T1595.003 |
| Subdomain bruting | DNS query pattern analysis | T1596.001 |
| Web crawling | User-Agent analysis, request pattern | T1595.002 |
| Certificate scraping | CT log monitoring alerts | T1596.003 |
| API key abuse | Provider rate limit / access logging | T1589 |
| GitHub dork scanning | GitHub API audit logs | T1593.003 |
| Nuclei scanning | WAF signature matching, scan pattern | T1595.002 |
Evasion Techniques Used by Tools
- Rate limiting: reconftw adaptive rate (RATE_LIMIT_BACKOFF_FACTOR=0.5)
- Resolver rotation: reconftw auto-detects NAT, uses resolver lists
- User-Agent rotation: configurable headers, random agents
- Proxy support: most HTTP-facing tools (httpx, nuclei, katana) accept an HTTP or SOCKS5 proxy; support varies for DNS and port-scanning tools
- Distributed scanning: Axiom fleet spreads traffic across IPs
- Passive-first: subfinder/bbot passive modes avoid direct contact
- Smart timing: nmap -T2 for stealth, configurable thread counts
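The adaptive rate limiting in the first bullet amounts to multiplicative backoff: when a target starts throttling, the request rate is multiplied by a factor below 1. The block below sketches the idea only. How reconftw applies `RATE_LIMIT_BACKOFF_FACTOR` internally is not shown in this section, and the function name `next_rate` and the floor parameter are assumptions.

```shell
#!/usr/bin/env bash
# next_rate (hypothetical helper): shrink a request rate after a throttling
# signal such as HTTP 429. Arguments: current rate, backoff factor, floor.
next_rate() {
  awk -v r="$1" -v f="$2" -v m="$3" 'BEGIN {
    n = int(r * f)          # multiplicative decrease, truncated to an integer
    if (n < m) n = m        # never drop below the floor
    print n
  }'
}

# Demo: repeated throttling with factor 0.5 and a floor of 10 requests/second.
rate=150
for _ in 1 2 3 4 5; do
  rate="$(next_rate "$rate" 0.5 10)"
  echo "$rate"
done
```

From a defender's point of view, this is why a scan that trips a rate limit tends to slow down rather than stop.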
Defender Countermeasures
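# Sigma rule: Mass DNS enumeration
# Base rule matches DNS queries; the correlation counts them per source
title: DNS Query Observed
name: dns_query_any
logsource:
    category: dns
    product: network
detection:
    selection:
        query|exists: true
    condition: selection
---
title: High Volume DNS Queries from Single Source
correlation:
    type: event_count
    rules:
        - dns_query_any
    group-by:
        - src_ip
    timespan: 5m
    condition:
        gt: 1000
level: medium
tags:
    - attack.t1596.001
    - attack.reconnaissance
# Sigma rule: Directory brute force
# Base rule matches 404 responses; the correlation counts distinct paths per source
title: HTTP 404 Response Observed
name: http_404_response
logsource:
    category: webserver
detection:
    selection:
        response_code: 404
    condition: selection
---
title: HTTP Directory Enumeration Detected
correlation:
    type: value_count
    rules:
        - http_404_response
    group-by:
        - src_ip
    timespan: 1m
    condition:
        field: request_path
        gt: 100
level: medium
tags:
    - attack.t1595.003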
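The logic behind the directory brute force rule can be tried on a raw access log before any SIEM is involved. The block below is a sketch assuming the Apache/nginx combined log format (client IP in field 1, request path in field 7, status in field 9). It treats the whole input as a single window, so the 1m timeframe is not modeled, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# dir_enum_suspects (hypothetical helper): read combined-format access log
# lines on stdin and print "ip count" for every client whose number of
# distinct 404 paths exceeds the threshold (first argument, default 100).
dir_enum_suspects() {
  awk -v t="${1:-100}" '
    $9 == 404 && !seen[$1, $7]++ { n[$1]++ }   # count each (ip, path) once
    END { for (ip in n) if (n[ip] > t) print ip, n[ip] }
  ' | LC_ALL=C sort
}

# log_line (demo only): emit one synthetic log line for ip, path, status.
log_line() {
  printf '%s - - [10/Oct/2024:13:55:36 +0000] "GET %s HTTP/1.1" %s 153\n' "$1" "$2" "$3"
}

# Demo with a threshold of 2 distinct missing paths.
{
  log_line 192.0.2.5 /admin 404
  log_line 192.0.2.5 /backup 404
  log_line 192.0.2.5 /.git/config 404
  log_line 192.0.2.5 /admin 404
  log_line 192.0.2.9 /index.html 200
  log_line 192.0.2.9 /missing 404
} | dir_enum_suspects 2
```

Against a real log this would be run as `dir_enum_suspects 100 < access.log`, ideally on a slice that covers only the time window of interest.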
9. API Key Requirements Summary
Free Tier Available
| Service | Keys Needed | Free Limit |
|---|---|---|
| VirusTotal | 1 API key | 500 req/day |
| Shodan | 1 API key | 100 queries/month (community) |
| SecurityTrails | 1 API key | 50 req/month |
| Censys | ID + Secret | 250 queries/month |
| GitHub | 1+ PAT tokens | 5000 req/hour |
| Hunter.io | 1 API key | 25 searches/month |
| WhoisXML | 1 API key | 500 credits |
| URLScan | 1 API key | 100 scans/day |
Paid/Premium Sources
| Service | Notes |
|---|---|
| Chaos (PD) | ProjectDiscovery Cloud Platform |
| FOFA | Chinese search engine, paid tiers |
| IntelX | Intelligence X, paid for full access |
| BinaryEdge | Paid API |
| C99 | Paid subdomain API |
| DNSDB (Farsight) | Enterprise DNS intelligence |
| Netlas | Paid internet scanner |
| Onyphe | Paid cyber defense search |
Maximum Coverage Configuration
For the broadest subdomain coverage, configure at least:
- GitHub tokens (multiple for rate limit distribution)
- Shodan API key
- SecurityTrails API key
- Censys API credentials
- VirusTotal API key
- Chaos API key (ProjectDiscovery Cloud)
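For subfinder, these keys live in its provider configuration file, one YAML list per source. The block below is a sketch that writes a template with placeholder values to a path of your choosing. Every key value is a placeholder, the function name `write_provider_template` is hypothetical, and the `ID:SECRET` form for Censys is an assumption that should be checked against the subfinder version in use.

```shell
#!/usr/bin/env bash
# write_provider_template (hypothetical helper): write a subfinder-style
# provider config with placeholder keys to the given path.
write_provider_template() {
  local path="$1"
  mkdir -p "$(dirname "$path")"
  cat > "$path" <<'EOF'
# Placeholder values only; replace with real credentials.
github:
  - GITHUB_TOKEN_1
  - GITHUB_TOKEN_2
shodan:
  - SHODAN_API_KEY
securitytrails:
  - SECURITYTRAILS_API_KEY
censys:
  - CENSYS_API_ID:CENSYS_API_SECRET
virustotal:
  - VIRUSTOTAL_API_KEY
chaos:
  - CHAOS_API_KEY
EOF
  chmod 600 "$path"     # keys are secrets; keep the file private
}

# Demo: write to a temporary directory instead of the real config location.
demo_dir="$(mktemp -d)"
write_provider_template "$demo_dir/provider-config.yaml"
cat "$demo_dir/provider-config.yaml"
```

The resulting file can be passed explicitly with `subfinder -d target.com -all -pc <path>`. Listing several GitHub tokens lets the tool spread requests across them.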
10. Tool Selection Decision Matrix
| Scenario | Primary Tool | Why |
|---|---|---|
| Full automated recon | reconftw | Orchestrates 50+ tools, handles all phases |
| Quick subdomain enum | subfinder | Fast, passive, 55+ sources |
| Active subdomain brute | puredns + gotator | Handles wildcards, permutations |
| Web fingerprinting | httpx | 30+ probes, JSON output, fast |
| Vulnerability scanning | nuclei | Template-based, community-maintained |
| Port scanning | nmap + naabu | naabu for speed, nmap for depth |
| JS analysis | katana + jsluice | Crawl + extract endpoints/secrets |
| Secret scanning | trufflehog + gitleaks | Git history + filesystem scanning |
| Username OSINT | maigret | 35k+ sites, recursive search |
| Google OSINT | GHunt | People API, Maps, Calendar, Drive |
| GitHub OSINT | octosuite + gitdorks_go | User/org/repo analysis + dork search |
| Cloud enum | bbot cloud-enum | S3/GCS/Azure/Firebase buckets |
| Event-driven scanning | bbot | Modular, extensible, graph output |
| DNS analysis | dnsx | All record types, brute force, trace |
| SSL/TLS audit | testssl.sh + nmap | Cipher enum, vuln detection |
| WAF detection | wafw00f + httpx | Identify protection layers |
| API testing | porch-pirate + SwaggerSpy | Postman leaks + Swagger discovery |