Phishing detection guide

A neutral, open reference on how modern phishing kits work, what defenders watch for, and which industry frameworks to map against.

What phishing is

Definition

Phishing is a social-engineering attack where an adversary impersonates a trusted brand, service or individual to trick the target into revealing credentials, deploying malware or transferring funds.

MITRE ATT&CK classifies it as technique T1566 with four sub-techniques.

Sub-techniques

  • T1566.001 Spearphishing Attachment - weaponised file
  • T1566.002 Spearphishing Link - URL to fake login
  • T1566.003 Spearphishing via Service - DM / SaaS
  • T1566.004 Spearphishing Voice - vishing

What phishing is not

Phishing is distinct from malware delivery (the kit may stage malware, but the technique is identity-targeted), from generic spam (which has no identity-impersonation step), and from Business Email Compromise (BEC), which manipulates a real person's mailbox and does not use the spoofed-domain landing page that defines phishing.

How phishing kits work

Modern off-the-shelf phishing kits chain four stages. Detection at each stage feeds different signals into the pipeline that powers phishunt's active feed.

1. Lure

Email, SMS (smishing), instant message or QR code (quishing) carrying a contextual hook - shipping notice, payroll, MFA reset, fake invoice.

Defender signal: SPF / DKIM / DMARC alignment, sender reputation, typosquat domain in the link.
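
As a rough illustration of the alignment signal above, the sketch below applies DMARC's relaxed alignment rule: a message passes if SPF or DKIM passes and the authenticated domain shares an organizational domain with the From: header. The function names are hypothetical, and the organizational-domain helper is a deliberate simplification (real evaluators consult the Public Suffix List).

```python
from typing import Optional


def org_domain(domain: str) -> str:
    """Naive organizational domain: the last two labels.

    Assumption: good enough for illustration only. It is wrong for
    multi-label suffixes such as co.uk, which need the Public Suffix List.
    """
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def dmarc_aligned(
    header_from: str,
    spf_domain: Optional[str] = None,
    spf_pass: bool = False,
    dkim_domain: Optional[str] = None,
    dkim_pass: bool = False,
) -> bool:
    """Relaxed alignment: a passing SPF or DKIM identity must share the
    organizational domain of the From: header domain."""
    target = org_domain(header_from)
    spf_ok = spf_pass and spf_domain is not None and org_domain(spf_domain) == target
    dkim_ok = dkim_pass and dkim_domain is not None and org_domain(dkim_domain) == target
    return spf_ok or dkim_ok
```

A lure that passes SPF and DKIM for the attacker's own sending domain still fails this check, because neither authenticated identity lines up with the brand shown in the From: header.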

2. Landing

The hosted fake-login page. Today's kits are HTML / JS bundles deployed on cheap shared hosting, Telegram-bot exfil included by default.

Defender signal: brand-keyword in domain, fresh TLS cert in CT logs, asset-hash collision with a known kit, JavaScript obfuscation patterns.
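
The brand-keyword signal above can be sketched as a simple lexical check. This is a minimal example, not phishunt's scoring: the BRANDS watchlist and the LOOKALIKES digit-for-letter map are hypothetical and intentionally tiny.

```python
# Hypothetical watchlist: brand keyword -> the brand's legitimate domain.
BRANDS = {"paypal": "paypal.com", "microsoft": "microsoft.com"}

# Assumed digit-for-letter substitutions; real lists are much longer.
LOOKALIKES = str.maketrans({"0": "o", "1": "l", "3": "e", "5": "s"})


def brand_hits(domain: str) -> list:
    """Return brand keywords that appear in a domain the brand does not own."""
    d = domain.lower().rstrip(".")
    # Undo lookalike digits and drop hyphens before matching.
    normalised = d.translate(LOOKALIKES).replace("-", "")
    hits = []
    for brand, legit in BRANDS.items():
        if d == legit or d.endswith("." + legit):
            continue  # the brand's own domain or one of its subdomains
        if brand in normalised:
            hits.append(brand)
    return hits
```

A hit here is only a lexical hint. It would normally be combined with the other landing-stage signals, since many legitimate domains contain brand names.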

3. Credential capture

The form posts username + password (and any second-factor field) to an operator endpoint. Adversary-in-the-middle kits relay live to the real service to harvest the post-MFA session cookie.

Defender signal: action= attribute pointing off-domain, mismatched form-encryption, missing CSRF token.
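
The off-domain action= signal can be checked with the standard library alone. The sketch below assumes the page HTML has already been fetched; off_domain_actions is a hypothetical helper name, and it only compares hostnames, so a sibling subdomain also counts as off-domain.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class FormActionCollector(HTMLParser):
    """Collects the action attribute of every <form> on the page."""

    def __init__(self):
        super().__init__()
        self.actions = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            # A missing or empty action posts back to the page itself.
            self.actions.append(dict(attrs).get("action") or "")


def off_domain_actions(page_url: str, html: str) -> list:
    """Return absolute form targets whose host differs from the page host."""
    collector = FormActionCollector()
    collector.feed(html)
    page_host = urlparse(page_url).hostname
    flagged = []
    for action in collector.actions:
        target = urljoin(page_url, action)  # resolve relative actions
        host = urlparse(target).hostname
        if host and host != page_host:
            flagged.append(target)
    return flagged
```

Forms built or rewritten by JavaScript at runtime are invisible to this static pass, so a rendered-DOM check would be needed for obfuscated kits.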

4. Exfil

Telegram bot, simple email to a Gmail dropbox, or HTTP POST to a staging server. The session cookie or credential pair is then sold or replayed against the victim's enterprise SSO.

Defender signal: outbound traffic from compromised endpoints to known operator staging hosts; phishunt blocklists the staging hosts when the operator pivots through CT-visible certs.
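
When kit source or client-side JavaScript is available, the Telegram exfil channel described above often shows up as a Bot API URL. The sketch below extracts the bot ID and API method from such URLs. The token pattern is an assumption about the usual token shape, and it finds nothing when the exfil call lives in server-side code the defender cannot see.

```python
import re

# Assumed shape: api.telegram.org/bot<numeric id>:<token>/<method>
TELEGRAM_BOT_URL = re.compile(
    r"api\.telegram\.org/bot(\d+):[A-Za-z0-9_-]{20,}/(\w+)"
)


def telegram_exfil_indicators(source: str) -> list:
    """Return sorted, de-duplicated (bot_id, api_method) pairs found in source."""
    return sorted(
        {(m.group(1), m.group(2)) for m in TELEGRAM_BOT_URL.finditer(source)}
    )
```

The bot ID is a handy pivot: the same ID appearing in several kits suggests a shared operator.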

Detection signals

Defenders combine three signal layers. No single layer is sufficient on its own; current phishing kits routinely defeat any one of them in isolation.

Layer | What it watches | Sample signals
Network | Anything visible before the page renders. | Certificate Transparency log streams, DNS reputation, brand-keyword lexical scoring on newly registered domains, JA3 / JA4 TLS fingerprints, MX/SPF on the lure domain.
Content | The rendered page itself. | Brand-asset image-hash collision, form action= off-domain, missing BIMI on lure email sender, HTML smuggling patterns, page-DOM perceptual-hash match against a known kit.
Behavioral | How the page reacts to visitors. | Geo / IP filtering (only victim country sees real page), CAPTCHA / Turnstile gate before kit reveals itself, OAST canary-callback evidence (e.g. interactsh) when probing for SSRF.
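
One way to act on the "no single layer is sufficient" point is to require corroboration across layers before raising a verdict. The sketch below is a hypothetical triage policy, not phishunt's scoring: the min_layers threshold and the verdict labels are assumptions chosen for illustration.

```python
from typing import Dict, List

LAYERS = ("network", "content", "behavioral")


def layers_with_signals(signals: Dict[str, List[str]]) -> List[str]:
    """Return the layers that produced at least one signal."""
    return [layer for layer in LAYERS if signals.get(layer)]


def triage(signals: Dict[str, List[str]], min_layers: int = 2) -> str:
    """Escalate only when enough independent layers agree."""
    hit = layers_with_signals(signals)
    return "suspicious" if len(hit) >= min_layers else "monitor"


# A fresh certificate alone stays on the watchlist; adding a
# content-layer match escalates it.
print(triage({"network": ["fresh_ct_cert"]}))
print(triage({"network": ["fresh_ct_cert"], "content": ["off_domain_form"]}))
```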

Industry frameworks

OWASP Top 10

Phishing maps primarily to A07:2021 Identification and Authentication Failures, since the kit's purpose is to defeat authentication. Anti-phishing controls also touch A05:2021 Security Misconfiguration when the impersonated site fails to enforce HSTS, secure cookies or strict CSP.

owasp.org/www-project-top-ten

NIST SP 800-53 / 800-46

The relevant controls are AC-2 account management, IA-2 identification and authentication of organizational users, AT-2 literacy training and awareness (named security awareness training before Rev. 5), and SI-3 malicious code protection. NIST SP 800-46 (Telework) calls out phishing-resistant MFA explicitly.

SP 800-53 Rev. 5

MITRE ATT&CK technique tree

T1566 (Phishing) sits inside the Initial Access (TA0001) tactic. Once the credential is stolen, follow-on activity typically begins with T1078 Valid Accounts (which ATT&CK lists under Initial Access, Persistence, Privilege Escalation and Defense Evasion) and chains through TA0006 Credential Access and TA0008 Lateral Movement. Defender mappings should include detective controls at each tactic transition, not just at T1566 itself.

attack.mitre.org/techniques/T1566

Phishing-resistant MFA

Modern adversary-in-the-middle (AitM) kits relay credentials and post-MFA session cookies live to the real service. SMS-OTP, time-based OTP and push notifications all fall to AitM. Only origin-bound MFA defeats it.

Factor | Phishing-resistant? | Why / why not
SMS OTP | No | Code is transferable; AitM kit relays it instantly. Also vulnerable to SIM-swap.
TOTP (Google Authenticator, Authy) | No | 30-second window is plenty for an AitM relay.
Push approval (Duo, Microsoft Authenticator) | Partial | Number-matching helps; basic "approve / deny" prompts get fatigue-pushed and fall to AitM.
FIDO2 / WebAuthn passkey | Yes | Cryptographic challenge bound to the origin (RP ID). The attacker's lookalike domain has the wrong RP ID, so the authenticator refuses to sign.
Smartcard / PIV | Yes | Same origin-binding property as WebAuthn.
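
To make the origin-binding row concrete, the sketch below mimics the WebAuthn client-side rule that a page may only use an RP ID equal to its own host or a parent domain of it, and shows the SHA-256 RP ID hash that credentials are scoped to. It is a simplified model under stated assumptions: the function names are hypothetical, the public-suffix check that real browsers apply is omitted, and only https origins are accepted.

```python
import hashlib
from urllib.parse import urlparse


def rp_id_allowed(origin: str, rp_id: str) -> bool:
    """True if a page at origin may request credentials for rp_id.

    Simplification: real clients also reject public suffixes (e.g. "com")
    and treat localhost specially.
    """
    parsed = urlparse(origin)
    host = parsed.hostname or ""
    if parsed.scheme != "https":
        return False
    return host == rp_id or host.endswith("." + rp_id)


def rp_id_hash(rp_id: str) -> bytes:
    """SHA-256 of the RP ID, as carried in WebAuthn authenticator data."""
    return hashlib.sha256(rp_id.encode("utf-8")).digest()


print(rp_id_allowed("https://login.example.com", "example.com"))      # True
print(rp_id_allowed("https://example.com.evil.test", "example.com"))  # False
```

Because the check runs on the host the browser actually connected to, an AitM proxy on a lookalike domain cannot request the real site's credential, however faithfully it relays the page.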

Common evasion techniques

Adversary-in-the-Middle (AitM)

Kits like Evilginx, Tycoon-2FA and Modlishka reverse-proxy the real login site and capture the session cookie post-MFA. This defeats SMS, TOTP and push-approval factors.

CAPTCHA / Turnstile gates

Kits gate the malicious page behind a Cloudflare Turnstile or Google reCAPTCHA. Automated crawlers (and many sandboxes) fail the challenge and never see the kit; the victim does.
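
A crawler can at least notice that it has hit a challenge gate rather than the real page. The sketch below looks for the script URLs and widget class names that Turnstile and reCAPTCHA embeds commonly use. The marker list is an assumption and will miss self-hosted or renamed gates, and a match is not a phishing signal in itself, since many legitimate sites use the same widgets.

```python
# Assumed markers: widget script hosts and the container class names.
CAPTCHA_MARKERS = {
    "cloudflare_turnstile": ("challenges.cloudflare.com/turnstile", "cf-turnstile"),
    "google_recaptcha": ("google.com/recaptcha", "g-recaptcha"),
}


def captcha_gates(html: str) -> list:
    """Return the names of challenge widgets referenced in the HTML."""
    lowered = html.lower()
    return [
        name
        for name, needles in CAPTCHA_MARKERS.items()
        if any(needle in lowered for needle in needles)
    ]
```

A non-empty result is a cue to re-fetch the URL with a full browser or queue it for manual review, rather than recording the gate page as the site's content.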

Geo / IP filtering

Server-side filters return a benign decoy page to any IP outside the victim country, ASN or specific subnet range. Detection requires fetching from the same egress as the target audience.
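
One hedged way to spot this kind of cloaking is to fetch the same URL from several egress points and compare what comes back. The sketch below covers only the comparison step. Fetching is out of scope, the egress names are hypothetical, and the 0.5 similarity threshold is an arbitrary assumption that would need tuning against real pages.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def divergent_egresses(
    bodies: Dict[str, str], reference: str, threshold: float = 0.5
) -> List[str]:
    """Return egress names whose response body differs sharply from the
    reference egress (similarity ratio below threshold)."""
    ref_body = bodies[reference]
    return sorted(
        name
        for name, body in bodies.items()
        if name != reference
        and SequenceMatcher(None, ref_body, body).ratio() < threshold
    )
```

If the residential egress in the target country sees a login form while datacenter egresses see a placeholder, that divergence is itself a behavioral signal.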

HTML smuggling

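The lure email or page carries the malicious payload as encoded data inside JavaScript, which reassembles it client-side (typically into a Blob object) and triggers a download. Because the file never exists as an attachment, it bypasses email-gateway content scanners that inspect attachments at rest.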
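
A static heuristic for the pattern above is to look for the browser APIs that smuggling scripts usually combine: a decode step, a Blob constructor and a download trigger. The sketch below is a minimal example with an assumed marker set. Heavily obfuscated scripts will evade plain regexes, and legitimate web apps use the same APIs, so a match is a lead rather than a verdict.

```python
import re

# Assumed marker set: APIs commonly chained in HTML smuggling scripts.
MARKERS = {
    "base64_decode": re.compile(r"\batob\s*\("),
    "blob_construct": re.compile(r"\bnew\s+Blob\s*\("),
    "object_url": re.compile(r"\bURL\.createObjectURL\s*\("),
    "legacy_save": re.compile(r"\bmsSaveOrOpenBlob\b"),
}


def smuggling_markers(html: str) -> list:
    """Return the names of the markers present in the HTML or JavaScript."""
    return [name for name, pattern in MARKERS.items() if pattern.search(html)]


def looks_like_smuggling(html: str) -> bool:
    """Flag pages that both build a Blob and hand it to a download path."""
    found = set(smuggling_markers(html))
    return "blob_construct" in found and bool(found & {"object_url", "legacy_save"})
```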

Glossary

ACME
Automatic Certificate Management Environment - the protocol Let's Encrypt and Google Trust Services use for free / automated TLS issuance.
AitM
Adversary-in-the-Middle - a kit pattern that reverse-proxies the real login site to harvest the post-MFA session cookie.
BIMI
Brand Indicators for Message Identification - a verified-logo signal that supporting mail clients display next to DMARC-authenticated messages; an absent BIMI logo on a sender claiming to be a brand is a phishing tell.
CT log
Certificate Transparency log - a public append-only log every public CA must submit certificates to. Defenders monitor CT logs for brand-keyword domains as an early-warning signal.
DV / OV / EV
Certificate validation tiers: Domain Validated (proof of domain control), Organization Validated, Extended Validation. DV is the most common and the cheapest to abuse.
HSTS
HTTP Strict Transport Security - forces browsers to use HTTPS only. HSTS preload (the browser-baked-in list) is near-irreversible and should not be set lightly.
JA3 / JA4
TLS-handshake fingerprints. Phishing kits often reuse the same TLS stack across thousands of staged domains, producing recognizable fingerprints.
MTA-STS
SMTP MTA Strict Transport Security - the mail-server analogue of HSTS. It reduces an attacker's ability to downgrade SMTP to plaintext for a man-in-the-middle attack.
OAST
Out-of-band Application Security Testing - canary URLs (e.g. interactsh, Burp Collaborator) that record DNS / HTTP / SMTP requests, used to confirm blind vulnerabilities.
RP ID
Relying Party ID in WebAuthn - the domain a credential is scoped to; the browser only lets a page use the credential when the page's origin matches that domain. This binding is the core property of phishing-resistant MFA.

FAQ

Is phishunt's data CC0?

Yes. The data published through TXT, JSON, CSV feeds and the public REST API is released under Creative Commons CC0 1.0. The website code, branding and trademarks are not covered by CC0.

How fresh is phishunt's data?

The detection pipeline runs hourly and active sites are rechecked every six hours. Domains stay in the active feed only while they keep returning a phishing page; sites that go down or get cleaned up leave the feed.

How does phishunt compare to PhishTank or OpenPhish?

All three publish phishing IOCs but they target different ingest modes: PhishTank is community-curated, OpenPhish is a commercial paid feed, phishunt is detection-driven from CT logs and newly registered domains. The three feeds complement each other rather than replace each other.

How do I report a phishing site (or a false positive)?

Email [email protected] with the URL and any context. False-positive reports go to the same address; they are reviewed, and confirmed false positives are removed, within 24 hours on weekdays.

Can I integrate phishunt into my SOC tooling?

Yes. The /api/ page documents the REST endpoints (free, no auth, CC0 data). For agentic / LLM-driven workflows the MCP server at mcp.phishunt.io exposes the same data as JSON-RPC tools (see /agents/ for setup snippets).

Why "suspicious" rather than "confirmed malicious"?

phishunt favors recall over precision. Sites that pass the brand-impersonation threshold ship to the active feed without third-party confirmation. Pair with Google Safe Browsing, urlscan or VirusTotal when you need confirmed verdicts.

Open educational reference. Last updated 2026-04-29. Suggestions and corrections via [email protected].