
How to Detect Phishing Domains Programmatically: WHOIS, DNS, and SSL Signals

A practical guide for security engineers and threat intelligence teams — with API code examples.

April 7, 2026 · 10 min read · Security · Phishing · WHOIS · DNS · SSL

Introduction

Phishing infrastructure moves fast. A domain is registered, weaponised, and taken down, often within 24 to 72 hours. At that velocity, manual analysis cannot keep up. Security teams that rely on user-reported phishing URLs or reactive blocklist updates are always several hours behind the attack window: the phishing emails have already reached their targets' inboxes by the time the domain is blocklisted.

The better approach is to detect phishing domains before they become active threats. A newly registered phishing domain typically leaves detectable traces in three public data layers: WHOIS registration data, DNS records, and the SSL certificate it provisions. Each layer alone provides a weak signal; combining them produces a risk score strong enough for automated triage at scale.

This guide covers the four technical signals that security engineers can query today, the API calls that return structured data for each, and a composable Python scoring function that ties them together into an actionable risk assessment. All examples use the WhoisJSON API (base URL https://whoisjson.com/api/v1), authenticated via the Authorization: TOKEN=YOUR_API_KEY header.

The Anatomy of a Phishing Domain

Phishing domains share a cluster of observable characteristics that distinguish them from legitimate registrations. Understanding these patterns is the prerequisite for building automated detection.

Freshly registered

The majority of phishing domains used in active campaigns were registered within the previous 30 days. Legitimate domains accumulate age over months and years. A domain under 30 days old is a meaningful baseline signal.

Privacy proxy or redacted contacts

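Threat actors routinely hide registrant details behind privacy proxy services (Domains By Proxy, Withheld for Privacy). Absent or fully redacted WHOIS contacts are a consistent characteristic, but GDPR-driven redaction gives many legitimate registrations the same profile, so this signal only carries weight alongside the others.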

Cheap or abused TLD

Low-cost TLDs (.xyz, .top, .click, .online) are disproportionately represented in phishing feeds because they lower the cost of burning domains quickly. .com remains common for higher-quality brand impersonation campaigns.

Typosquatting the target brand

Common techniques include character substitution (rn for m), extra hyphens, subdomain abuse (paypal.login-secure.xyz), and IDN homoglyph attacks. The domain visually or phonetically impersonates a known brand or institution.
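Of the four characteristics, lookalike names can be screened locally before any API call is made. The sketch below is a minimal heuristic under stated assumptions, not a WhoisJSON feature: the substitution table, the edit-distance threshold of 1, and the function names (levenshtein, normalise, lookalike_matches) are illustrative, and short brand names will generate false positives. Punycode (xn--) labels are only flagged for review, not decoded.

```python
from typing import Iterable, List, Tuple

# Illustrative, not exhaustive: lookalike sequences and the letters they imitate.
SUBSTITUTIONS = [("rn", "m"), ("vv", "w"), ("0", "o"), ("1", "l"), ("3", "e"), ("5", "s")]


def levenshtein(a: str, b: str) -> int:
    """Edit distance where each insertion, deletion, or substitution costs 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def normalise(label: str) -> str:
    """Lower-case a DNS label, drop hyphens, and undo common character substitutions."""
    s = label.lower().replace("-", "")
    for fake, real in SUBSTITUTIONS:
        s = s.replace(fake, real)
    return s


def lookalike_matches(domain: str, brands: Iterable[str],
                      official: Iterable[str] = (),
                      max_distance: int = 1) -> List[Tuple[str, str]]:
    """Return (brand, reason) pairs for brands that `domain` may be impersonating."""
    domain = domain.lower().rstrip(".")
    if any(domain == o or domain.endswith("." + o) for o in (x.lower() for x in official)):
        return []  # the brand's own domain or one of its subdomains
    labels = domain.split(".")[:-1]  # ignore the TLD
    matches: List[Tuple[str, str]] = []
    if any(label.startswith("xn--") for label in labels):
        matches.append(("*", "punycode (IDN) label: decode and inspect for homoglyphs"))
    for brand in (b.lower() for b in brands):
        for label in labels:
            norm = normalise(label)
            if brand in norm:
                matches.append((brand, f"brand keyword in label '{label}'"))
                break
            if levenshtein(norm, brand) <= max_distance:
                matches.append((brand, f"label '{label}' within edit distance {max_distance}"))
                break
    return matches
```

For example, "rnicrosoft-login.xyz" matches the brand "microsoft" once "rn" is folded back to "m", while passing the brand's real domains through official keeps them from matching themselves.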

These characteristics are detectable via structured API calls before the phishing campaign goes live. The key insight: most phishing domains provision SSL and configure DNS in the hours before the first email is sent. That window is the detection opportunity.

Signal #1 — Domain Age via WHOIS

Domain age is the most predictive single signal. The WhoisJSON API returns a pre-computed age object when the domain record is served via RDAP, so there is no need to parse raw date strings and compute deltas manually.

Key fields from the /whois endpoint:
  • age.isNewlyRegistered — boolean, true when the domain is 30 days old or less
  • age.isYoung — boolean, true when the domain is 365 days old or less
  • age.days — integer, exact age in days since creation date
  • created — ISO timestamp of the original registration date
  • source — either whois or rdap. The age object is only present when source is rdap

cURL example:

curl (Shell)
curl -X GET "https://whoisjson.com/api/v1/whois?domain=suspected-phishing.com" \
  -H "Authorization: TOKEN=YOUR_API_KEY"

Python example — extracting domain age signals:

whois_age.py (Python)
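import requests
from datetime import datetime, timezone
from typing import Optional

API_KEY  = "YOUR_API_KEY"
BASE_URL = "https://whoisjson.com/api/v1"


def _days_since(created: Optional[str]) -> Optional[int]:
    """Whole days elapsed since a 'created' timestamp, or None if absent or not ISO-formatted."""
    if not created:
        return None
    try:
        dt = datetime.fromisoformat(str(created).replace("Z", "+00:00"))
    except ValueError:
        return None  # non-ISO WHOIS date formats are left unresolved
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assume UTC when no offset is given
    return (datetime.now(timezone.utc) - dt).days


def get_age_signals(domain: str) -> dict:
    """
    Returns age signals from the WhoisJSON /whois endpoint.
    The 'age' object is present only when source == 'rdap'.
    Falls back to parsing 'created' manually for WHOIS-only TLDs.
    """
    resp = requests.get(
        f"{BASE_URL}/whois",
        params={"domain": domain},
        headers={"Authorization": f"TOKEN={API_KEY}"},
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json()

    age      = data.get("age") or {}
    age_days = age.get("days")
    if age_days is None:
        # WHOIS-only TLD: no pre-computed age object, so derive it from 'created'
        age_days = _days_since(data.get("created"))

    return {
        "domain":              domain,
        "source":              data.get("source"),
        "created":             data.get("created"),
        "age_days":            age_days,
        # Prefer the API's booleans; otherwise apply the same 30/365-day thresholds
        "is_newly_registered": age.get("isNewlyRegistered",
                                       age_days is not None and age_days <= 30),
        "is_young":            age.get("isYoung",
                                       age_days is not None and age_days <= 365),
    }

result = get_age_signals("suspected-phishing.com")
print(result)
# {'domain': 'suspected-phishing.com', 'source': 'rdap',
#  'created': '2026-04-03 08:14:52', 'age_days': 4,
#  'is_newly_registered': True, 'is_young': True}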

When is_newly_registered is True, flag the domain immediately and queue it for further signal enrichment. Do not block on age alone, since legitimate services spin up new domains regularly, but treat it as a mandatory precondition for escalation.

Signal #2 — Registrant Privacy and Missing Fields

When WHOIS contacts are not suppressed by GDPR or registry policy, the registrant object reveals how much the domain owner has disclosed. Phishing operators systematically use privacy proxy services or submit minimal, often fictitious, contact information.

The relevant fields come from the contacts.owner array in the WhoisJSON WHOIS response:

Field          Phishing indicator
organization   Absent, or set to a privacy proxy name (e.g. "Domains By Proxy, LLC", "Withheld for Privacy ehf")
email          Absent, redacted, or a generic privacy-proxy forwarding address
name           "Redacted for Privacy", "REDACTED FOR PRIVACY", or a clearly disposable identity
country        Absent or inconsistent with the registrar's declared jurisdiction
address        Absent or a clearly fake street address

Scoring registrant opacity is straightforward: count the contact fields that are present and not redacted, then divide by the number of fields expected. The resulting score measures disclosure: 0.0 means fully redacted, and 1.0 means all standard contact data is disclosed. Any score below 0.4 on a freshly registered domain is a strong composite signal.

registrant_opacity.py (Python)
PRIVACY_KEYWORDS = {"redacted", "withheld", "privacy", "proxy", "gdpr"}

def registrant_opacity_score(contacts: dict) -> float:
    """
    Returns 0.0 (fully opaque) to 1.0 (fully transparent).
    Pass the 'contacts' object from the WhoisJSON /whois response.
    """
    owner_list = contacts.get("owner") or []
    if not owner_list:
        return 0.0

    owner  = owner_list[0]
    fields = ["name", "organization", "email", "country", "address"]
    filled = 0

    for field in fields:
        value = str(owner.get(field) or "").lower()  # str() guards against list/dict values
        if value and not any(kw in value for kw in PRIVACY_KEYWORDS):
            filled += 1

    return filled / len(fields)

Signal #3 — DNS Patterns

DNS records expose how a domain is configured operationally. Phishing domains tend to follow predictable patterns: no MX records (the campaign uses a third-party sending service or has no email at all), nameservers on cheap shared hosting or known bulletproof providers, and no published DMARC or SPF policy.

The WhoisJSON /nslookup endpoint returns structured DNS records in a single call: A, AAAA, MX, NS, TXT, CNAME, CAA, SOA, DMARC, BIMI, MTA-STS, and TLSRPT. For phishing detection, focus on MX, NS, DMARC, and TXT.

DNS signals to evaluate:
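  • MX absent or pointing to a hosted mail provider — no MX records means the domain is not configured to receive email legitimately. An MX pointing to Google or Microsoft mail hosts only shows that the operator relies on hosted mail rather than dedicated infrastructure; treat it as weak evidence, because Google Workspace and Microsoft 365 customers publish the same MX hosts.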
  • No DMARC record — legitimate sending domains increasingly publish DMARC. Its absence, combined with other signals, adds weight to a phishing classification.
  • No SPF record in TXT — domains with no SPF cannot reliably claim to restrict who sends email on their behalf. Most phishing kits skip this configuration entirely.
  • NS on shared or low-cost infrastructure — providers disproportionately present in phishing domains include parking name servers, free DNS providers, and bulletproof hosters.

Node.js example — DNS signal extraction:

dns_signals.js (JavaScript)
const API_KEY  = process.env.WHOISJSON_API_KEY;
const BASE_URL = 'https://whoisjson.com/api/v1';

const FREE_MX_PROVIDERS = [
    'google.com', 'googlemail.com', 'outlook.com',
    'hotmail.com', 'protonmail.ch', 'yandex.net'
];

async function getDnsSignals(domain) {
    const res = await fetch(`${BASE_URL}/nslookup?domain=${encodeURIComponent(domain)}`, {
        headers: { 'Authorization': `TOKEN=${API_KEY}` },
        signal: AbortSignal.timeout(8000)
    });
    if (!res.ok) throw new Error(`DNS API error ${res.status} for ${domain}`);
    const data = await res.json();

    const mx    = data.MX    ?? [];
    const ns    = data.NS    ?? [];
    const txt   = data.TXT   ?? [];
    const dmarc = data.DMARC ?? [];

    const noMx = mx.length === 0;
    const freeMx = mx.some(r =>
        FREE_MX_PROVIDERS.some(p => (r.exchange ?? '').includes(p))
    );
    const noDmarc = dmarc.length === 0;
    const noSpf   = !txt.some(t => t.startsWith('v=spf1'));

    return { domain, noMx, freeMx, noDmarc, noSpf, ns, mx };
}

getDnsSignals('suspected-phishing.com').then(console.log);
// {
//   domain: 'suspected-phishing.com',
//   noMx: true, freeMx: false,
//   noDmarc: true, noSpf: true,
//   ns: ['ns1.cheaphost.net', 'ns2.cheaphost.net'],
//   mx: []
// }
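The JavaScript check above treats any TXT value starting with v=spf1 as SPF and only tests whether a DMARC record exists. A slightly stricter sketch in Python follows RFC 7208, where the version tag must be followed by a space or the end of the record (so v=spf10 does not count), and reads the DMARC p= tag defined in RFC 7489, since a record with p=none publishes a policy but enforces nothing. It assumes the /nslookup TXT and DMARC fields hold plain or quoted record strings; the helper names are hypothetical.

```python
import re
from typing import Iterable, Optional, Union

Records = Union[str, Iterable, None]


def _as_list(records: Records) -> list:
    """Accept a single record string, a list of records, or None."""
    return [records] if isinstance(records, str) else list(records or [])


def _clean(record) -> str:
    """Join chunked TXT data and strip surrounding whitespace and quotes."""
    if isinstance(record, (list, tuple)):
        record = "".join(str(part) for part in record)
    return str(record).strip().strip('"')


def has_spf(txt_records: Records) -> bool:
    """True if a TXT record starts with 'v=spf1' followed by a space or end of record."""
    return any(re.match(r"v=spf1(\s|$)", _clean(r), re.IGNORECASE)
               for r in _as_list(txt_records))


def dmarc_policy(dmarc_records: Records) -> Optional[str]:
    """Return the p= policy ('none', 'quarantine', 'reject') of the first DMARC1 record."""
    for record in _as_list(dmarc_records):
        tags = {}
        for part in _clean(record).split(";"):
            if "=" in part:
                key, value = part.split("=", 1)
                tags[key.strip().lower()] = value.strip()
        if tags.get("v", "").upper() == "DMARC1" and tags.get("p"):
            return tags["p"].lower()
    return None  # no usable DMARC record
```

A domain for which dmarc_policy returns "none" publishes DMARC but enforces nothing, which is worth logging next to the no-DMARC case.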

Signal #4 — SSL Certificate Anomalies

A valid SSL certificate is not a trust indicator. Let's Encrypt issues certificates for free in seconds, with no manual vetting. Phishing operators routinely provision Let's Encrypt certificates to display the padlock icon in the browser address bar — a detail that still misleads many users into treating the site as legitimate.

The WhoisJSON /ssl-cert-check endpoint returns issuer details, validity window, Subject Alternative Names, and full certificate metadata for any domain.

SSL signals to evaluate:
  • Let's Encrypt on a domain under 30 days old — Let's Encrypt alone is weak evidence. Combined with a freshly registered domain and sparse DNS, it is a meaningful composite signal.
  • Certificate issued within hours or days of domain registration — fast provisioning after registration is a hallmark of automated phishing kit deployment. Check valid_from against the WHOIS created date.
  • Subject Alternative Names containing brand keywords — a SAN such as paypal-secure-login.xyz, which embeds a brand name in a domain the brand does not own, is a strong impersonation signal.

ssl_signals.py (Python)
import requests
from datetime import datetime, timezone

API_KEY  = "YOUR_API_KEY"
BASE_URL = "https://whoisjson.com/api/v1"

def get_ssl_signals(domain: str) -> dict:
    resp = requests.get(
        f"{BASE_URL}/ssl-cert-check",
        params={"domain": domain},
        headers={"Authorization": f"TOKEN={API_KEY}"},
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json()

    issuer_org = (data.get("issuer") or {}).get("O", "")
    valid_from = data.get("valid_from")  # ISO string, e.g. "2026-04-03T10:22:00.000Z"
    valid      = data.get("valid", False)
    san        = (data.get("details") or {}).get("subjectaltname", "")

    cert_age_days = None
    if valid_from:
        issued = datetime.fromisoformat(valid_from.replace("Z", "+00:00"))
        cert_age_days = (datetime.now(timezone.utc) - issued).days

    return {
        "domain":          domain,
        "issuer_org":      issuer_org,
        "is_lets_encrypt": "Let's Encrypt" in issuer_org,
        "cert_age_days":   cert_age_days,
        "valid":           valid,
        "san":             san,
    }
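The get_ssl_signals function above measures certificate age against today's date, whereas the registration-to-issuance signal compares valid_from with the WHOIS created date, and the SAN check needs a brand list. The two hypothetical helpers below sketch both checks. They assume the timestamp formats shown in the earlier examples (naive timestamps are treated as UTC) and that details.subjectaltname uses the Node.js-style "DNS:name, DNS:name" layout; the brand keywords and official domains are inputs you supply.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Optional


def _to_utc(ts: str) -> datetime:
    """Parse an ISO-style timestamp ('Z' suffix or space separator allowed); naive means UTC."""
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def issuance_gap_days(created: Optional[str], valid_from: Optional[str]) -> Optional[float]:
    """Days between domain registration (WHOIS 'created') and certificate issuance."""
    if not created or not valid_from:
        return None
    return (_to_utc(valid_from) - _to_utc(created)).total_seconds() / 86400


def san_brand_hits(san: str, brands: Iterable[str], official: Iterable[str] = ()) -> List[str]:
    """SAN DNS names containing a brand keyword that are not an official domain or its subdomain."""
    official = [o.lower() for o in official]
    brands = [b.lower() for b in brands]
    hits = []
    for entry in san.split(","):
        entry = entry.strip()
        if not entry.lower().startswith("dns:"):
            continue  # skip empty entries and "IP Address:" or "email:" SANs
        name = entry[4:].lower()
        if any(name == o or name.endswith("." + o) for o in official):
            continue  # brand-owned name
        if any(b in name for b in brands):
            hits.append(name)
    return hits
```

With the example dates used earlier (registration at 08:14:52, issuance at 10:22:00 the same day), the gap is about 0.09 days, which fits the automated-deployment pattern.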

Combining Signals: A Risk Scoring Function

No individual signal is sufficient to classify a domain as malicious. A newly registered domain could be a legitimate startup. Let's Encrypt certificates are used by millions of legitimate sites. A domain with no DMARC record may simply belong to an organisation that has not yet configured email authentication. The detection value comes from combining signals: each weak indicator reinforces the others, and together they produce a composite risk score considerably more reliable than any single check.

The following Python function combines all four signals into a score from 0 to 100:

phishing_score.py (Python)
"""
Phishing domain risk scorer — WhoisJSON API
Requires: pip install requests

Score breakdown (max 100):
  Signal #1 — Domain age       : up to 40 points
  Signal #2 — Registrant opac. : up to 20 points
  Signal #3 — DNS anomalies    : up to 25 points
  Signal #4 — SSL anomalies    : up to 15 points
"""

import requests
from datetime import datetime, timezone

API_KEY  = "YOUR_API_KEY"
BASE_URL = "https://whoisjson.com/api/v1"
HEADERS  = {"Authorization": f"TOKEN={API_KEY}"}

FREE_MX    = {"google.com", "googlemail.com", "outlook.com",
              "hotmail.com", "protonmail.ch", "yandex.net"}
PRIVACY_KW = {"redacted", "withheld", "privacy", "proxy", "gdpr"}


def _fetch(endpoint, domain):
    r = requests.get(
        f"{BASE_URL}/{endpoint}",
        params={"domain": domain},
        headers=HEADERS,
        timeout=10,
    )
    r.raise_for_status()
    return r.json()


def score_domain(domain: str) -> dict:
    score   = 0
    reasons = []

    # ── Signal #1: Domain age (up to 40 pts) ─────────────────────────────
    whois    = _fetch("whois", domain)
    age      = whois.get("age") or {}
    age_days = age.get("days")

    if age.get("isNewlyRegistered"):       # <= 30 days
        score += 40
        reasons.append("newly registered (<=30d)")
    elif age_days is not None and age_days <= 90:
        score += 25
        reasons.append(f"young domain ({age_days}d)")
    elif age.get("isYoung"):               # <= 365 days
        score += 10
        reasons.append("domain < 1 year")

    # ── Signal #2: Registrant opacity (up to 20 pts) ─────────────────────
    contacts = whois.get("contacts") or {}
    owners   = contacts.get("owner") or []

    if not owners:
        score += 20
        reasons.append("no registrant data")
    else:
        owner  = owners[0]
        fields = ["name", "organization", "email", "country", "address"]
        filled = sum(
            1 for f in fields
            if (owner.get(f) or "").lower()
            and not any(kw in (owner.get(f) or "").lower()
                        for kw in PRIVACY_KW)
        )
        opacity = 1 - (filled / len(fields))
        pts     = round(opacity * 20)
        score  += pts
        if opacity > 0.5:
            reasons.append(f"registrant {opacity:.0%} opaque")

    # ── Signal #3: DNS anomalies (up to 25 pts) ──────────────────────────
    dns   = _fetch("nslookup", domain)
    mx    = dns.get("MX")    or []
    txt   = dns.get("TXT")   or []
    dmarc = dns.get("DMARC") or []

    if not mx:
        score += 10
        reasons.append("no MX records")
    elif any(p in (r.get("exchange") or "")   # substring match, e.g. aspmx.l.google.com
             for r in mx for p in FREE_MX):
        score += 5
        reasons.append("MX on free provider")

    if not dmarc:
        score += 10
        reasons.append("no DMARC record")

    if not any(t.startswith("v=spf1") for t in txt):
        score += 5
        reasons.append("no SPF record")

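    # ── Signal #4: SSL anomalies (up to 15 pts) ──────────────────────────
    try:
        ssl     = _fetch("ssl-cert-check", domain)
        issuer  = (ssl.get("issuer") or {}).get("O") or ""
        vf      = ssl.get("valid_from")    # e.g. "2026-04-03T10:22:00.000Z"
        created = whois.get("created")     # e.g. "2026-04-03 08:14:52"

        # Test age_days against None explicitly: a domain registered today has age_days == 0
        if "Let's Encrypt" in issuer and age_days is not None and age_days <= 30:
            score += 10
            reasons.append("Let's Encrypt on newly registered domain")
        elif "Let's Encrypt" in issuer:
            score += 3

        # Compare issuance with the WHOIS registration date, not with today:
        # a certificate issued within 3 days of registration suggests kit deployment
        if vf and created:
            def _utc(ts):
                dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
                return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)  # naive -> UTC
            gap_days = (_utc(vf) - _utc(created)).days
            if 0 <= gap_days <= 3:
                score += 5
                reasons.append(f"certificate issued {gap_days}d after registration")

    except (requests.RequestException, ValueError):
        pass  # Not yet serving HTTPS, or unparseable dates: not penalised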

    return {
        "domain":  domain,
        "score":   min(score, 100),
        "risk":    "high"   if score >= 60 else
                   "medium" if score >= 35 else "low",
        "reasons": reasons,
    }


if __name__ == "__main__":
    import json
    print(json.dumps(score_domain("suspected-phishing.com"), indent=2))
Suggested alert thresholds:
  • Score ≥ 60 — High risk: block or quarantine automatically, notify analyst immediately.
  • Score 35–59 — Medium risk: log and watch, add to threat intelligence watchlist for 7 days.
  • Score < 35 — Low risk: no immediate action, continue passive monitoring.

Automating Detection at Scale

Running a single-domain check is a proof of concept. Production threat intelligence pipelines need to process hundreds or thousands of domains per day — from newly registered domain feeds, threat intelligence sharing platforms (MISP, OpenCTI), or internal logs of clicked URLs extracted from email gateways and proxy logs.

Key architecture considerations:

  • Rate limiting: The WhoisJSON API enforces per-minute rate limits. For bulk workloads, add a small delay between requests (3 per second is a safe sustained rate on paid plans) and handle 429 responses with exponential back-off. The Remaining-Requests response header shows your remaining monthly quota in real time.
  • Caching: WHOIS and DNS data do not change minute to minute. Cache responses by domain for at least 6 hours in Redis or a local database to avoid redundant API calls when the same suspicious domain appears multiple times in your pipeline.
  • Prioritise newly registered domains: Process the freshest domains first — they represent the highest-velocity threat. Feed newly registered domain lists directly into your scoring pipeline and deprioritise domains older than 90 days unless a specific trigger (clicked link, blocklist hit) warrants a fresh lookup.
  • Integration targets: Risk scores can be pushed to your SIEM as custom events, written to a threat intelligence feed (STIX/TAXII), or used to trigger firewall rule updates via webhook. Scores above the high-risk threshold should trigger an immediate alert — Slack, PagerDuty, or email — regardless of business hours.
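The rate-limiting and caching points above can be combined into one small wrapper. This is a minimal sketch, not a WhoisJSON client: an in-memory dict stands in for Redis, the send callable (returning status code and JSON body) is a hypothetical interface, and the defaults simply mirror the list above (3 requests per second, a 6-hour cache, exponential back-off on 429 starting at 1 second). Injecting sleep and clock makes the timing logic testable without network access.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

Sender = Callable[[str, str], Tuple[int, Any]]  # (endpoint, domain) -> (HTTP status, JSON body)


class CachedThrottledClient:
    """Caches responses per (endpoint, domain), spaces out calls, and backs off on HTTP 429."""

    def __init__(self, send: Sender, ttl: float = 6 * 3600, min_interval: float = 1 / 3,
                 max_retries: int = 5, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic):
        self.send, self.ttl, self.min_interval = send, ttl, min_interval
        self.max_retries, self.base_delay = max_retries, base_delay
        self.sleep, self.clock = sleep, clock
        self._cache: Dict[Tuple[str, str], Tuple[float, Any]] = {}
        self._last_call: Optional[float] = None

    def _throttle(self) -> None:
        # Keep at least min_interval seconds between consecutive API calls.
        if self._last_call is not None:
            wait = self.min_interval - (self.clock() - self._last_call)
            if wait > 0:
                self.sleep(wait)
        self._last_call = self.clock()

    def get(self, endpoint: str, domain: str) -> Any:
        key = (endpoint, domain.lower())
        cached = self._cache.get(key)
        if cached is not None and self.clock() - cached[0] < self.ttl:
            return cached[1]  # still fresh: no API call
        for attempt in range(self.max_retries + 1):
            self._throttle()
            status, body = self.send(endpoint, domain)
            if status != 429:
                self._cache[key] = (self.clock(), body)
                return body
            if attempt < self.max_retries:
                self.sleep(self.base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
        raise RuntimeError(f"still rate-limited after {self.max_retries} retries: {endpoint} {domain}")


def requests_sender(api_key: str, base_url: str = "https://whoisjson.com/api/v1") -> Sender:
    """Adapter for the endpoints used in this guide (requires `pip install requests`)."""
    import requests  # imported lazily so the client logic has no hard dependency

    def send(endpoint: str, domain: str) -> Tuple[int, Any]:
        r = requests.get(f"{base_url}/{endpoint}", params={"domain": domain},
                         headers={"Authorization": f"TOKEN={api_key}"}, timeout=10)
        if r.status_code == 429:
            return 429, None
        r.raise_for_status()
        return r.status_code, r.json()

    return send
```

Usage would look like client = CachedThrottledClient(requests_sender("YOUR_API_KEY")) followed by client.get("whois", domain). With several workers, the throttle and cache would need to live in shared storage such as Redis rather than process memory.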
For teams that want aggregated, pre-scored domain risk assessments without building the full pipeline, domainrisk.io combines WHOIS age, registration patterns, NS provider reputation, and lookalike proximity into a single scored output — no custom implementation required.

Conclusion

Phishing domains do not hide. They leave a clear fingerprint in public data: freshly registered, WHOIS contacts redacted, DNS sparse, and an SSL certificate provisioned hours after registration. No individual signal is conclusive. The combination is.

Four signals, three endpoints, one API key:

  • Signal #1 — Domain age: query /whois, read age.isNewlyRegistered and age.days.
  • Signal #2 — Registrant opacity: evaluate contacts.owner for redacted or absent fields.
  • Signal #3 — DNS anomalies: query /nslookup, check for absent MX, absent DMARC, and no SPF in TXT.
  • Signal #4 — SSL anomalies: query /ssl-cert-check, correlate issuer.O and valid_from against domain age.

The free tier (1,000 requests per month, no credit card) is sufficient to validate the pipeline against your own threat feeds before scaling up; because score_domain makes three API calls per domain, that covers roughly 330 scored domains a month.

For Developers

WHOIS, DNS, and SSL data via REST API. 1,000 free requests/month — no credit card.

Get API Key

For Security Teams

Pre-scored domain risk assessments without building the pipeline.

Try domainrisk.io