Introduction
Phishing infrastructure moves fast. A domain is registered, weaponised, and taken down — often within 24 to 72 hours. At that velocity, manual analysis cannot scale. Security teams that rely on user-reported phishing URLs or reactive blocklist updates are always several hours behind the attack window. The targets are already in the inbox before the domain is blacklisted.
The better approach is to detect phishing domains before they become active threats. Every newly registered phishing domain leaves detectable traces in three public data layers: WHOIS registration data, DNS records, and the SSL certificate it provisions. Each layer alone provides a weak signal. Combining them produces a risk score strong enough to triage automatically at scale.
This guide covers the four technical signals that security engineers can query today, the API calls that return structured data for each, and a composable Python scoring function that ties them together into an actionable risk assessment. All examples use the WhoisJSON API: the base URL is `https://whoisjson.com/api/v1`, and requests authenticate via the `Authorization: TOKEN=YOUR_API_KEY` header.
The Anatomy of a Phishing Domain
Phishing domains share a cluster of observable characteristics that distinguish them from legitimate registrations. Understanding these patterns is the prerequisite for building automated detection.
Freshly registered
The majority of phishing domains used in active campaigns were registered within the previous 30 days. Legitimate domains accumulate age over months and years. A domain under 30 days old is a meaningful baseline signal.
Privacy proxy or redacted contacts
Threat actors systematically hide registrant details behind privacy proxy services (Domains by Proxy, Withheld for Privacy). Absent or fully redacted WHOIS contacts are a consistent characteristic.
Cheap or abused TLD
Low-cost TLDs (.xyz, .top, .click, .online) are disproportionately represented in phishing feeds because they lower the cost of burning domains quickly. .com remains common for higher-quality brand impersonation campaigns.
Typosquatting the target brand
Common techniques include character substitution (rn for m), extra hyphens, subdomain abuse (paypal.login-secure.xyz), and IDN homoglyph attacks. In each case the domain visually or phonetically impersonates a known brand or institution.
These characteristics are detectable via structured API calls before the phishing campaign goes live. The key insight: most phishing domains provision SSL and configure DNS in the hours before the first email is sent. That window is the detection opportunity.
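The lexical characteristics above (abused TLD, brand look-alikes, IDN labels) can be screened before any API call is made. The following is a minimal sketch, not a complete typosquatting detector: the `BRANDS` watchlist, the `CONFUSABLES` substitution table, and the 0.85 similarity threshold are illustrative assumptions, and only the TLD set comes from the list above.

```python
import difflib

# TLDs named in the section above; extend from your own phishing feeds.
ABUSED_TLDS = {"xyz", "top", "click", "online"}
# Hypothetical brand watchlist and look-alike substitutions (assumptions).
BRANDS = ("paypal", "microsoft", "amazon")
CONFUSABLES = (("rn", "m"), ("vv", "w"), ("0", "o"), ("1", "l"))


def fold_confusables(token: str) -> str:
    """Replace common look-alike sequences with the character they imitate."""
    for fake, real in CONFUSABLES:
        token = token.replace(fake, real)
    return token


def lexical_signals(domain: str, brands=BRANDS, threshold: float = 0.85) -> dict:
    """Cheap string-only checks: abused TLD, punycode labels, brand look-alikes."""
    labels = domain.lower().rstrip(".").split(".")
    hits = set()
    for label in labels[:-1]:               # every label except the TLD
        for token in label.split("-"):      # hyphens often join brand and bait words
            if not token:
                continue
            for candidate in {token, fold_confusables(token)}:
                for brand in brands:
                    ratio = difflib.SequenceMatcher(None, candidate, brand).ratio()
                    if ratio >= threshold:
                        hits.add(brand)
    return {
        "abused_tld": labels[-1] in ABUSED_TLDS,
        "punycode": any(label.startswith("xn--") for label in labels),
        "brand_hits": sorted(hits),
    }


print(lexical_signals("paypal.login-secure.xyz"))
```

A brand's own domain will also match its watchlist entry, so in practice this check would sit behind an allowlist of known-good domains.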
Signal #1 — Domain Age via WHOIS
Domain age is the single most predictive individual signal. The WhoisJSON API returns a pre-computed `age` object when the domain record is served via RDAP, eliminating the need to parse raw date strings and compute deltas manually.
Fields from the `/whois` endpoint:
- `age.isNewlyRegistered`: boolean, true when the domain is 30 days old or less
- `age.isYoung`: boolean, true when the domain is 365 days old or less
- `age.days`: integer, exact age in days since the creation date
- `created`: ISO timestamp of the original registration date
- `source`: either `whois` or `rdap`. The `age` object is only present when source is `rdap`
cURL example:
```shell
curl -X GET "https://whoisjson.com/api/v1/whois?domain=suspected-phishing.com" \
  -H "Authorization: TOKEN=YOUR_API_KEY"
```
Python example (extracting domain age signals):
```python
import requests

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://whoisjson.com/api/v1"


def get_age_signals(domain: str) -> dict:
    """
    Returns age signals from the WhoisJSON /whois endpoint.
    The 'age' object is present only when source == 'rdap'.
    For WHOIS-only TLDs it is absent: 'age_days' is then None and both
    flags default to False, so callers must derive the age from 'created'.
    """
    resp = requests.get(
        f"{BASE_URL}/whois",
        params={"domain": domain},
        headers={"Authorization": f"TOKEN={API_KEY}"},
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json()
    age = data.get("age") or {}  # missing or null on WHOIS-sourced records
    return {
        "domain": domain,
        "source": data.get("source"),
        "created": data.get("created"),
        "age_days": age.get("days"),
        "is_newly_registered": age.get("isNewlyRegistered", False),
        "is_young": age.get("isYoung", False),
    }


result = get_age_signals("suspected-phishing.com")
print(result)
# {'domain': 'suspected-phishing.com', 'source': 'rdap',
#  'created': '2026-04-03 08:14:52', 'age_days': 4,
#  'is_newly_registered': True, 'is_young': True}
```
When `is_newly_registered` is `True`, flag the domain immediately and queue it for further signal enrichment. Do not block on age alone, since legitimate services spin up new domains regularly, but treat it as a mandatory precondition for escalation.
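Because the `age` object is only present for RDAP-sourced records, WHOIS-only TLDs need the age derived from `created`. The helper below is a hedged sketch of that fallback: `age_from_created` is a hypothetical name, it assumes `created` parses with `datetime.fromisoformat` (as the sample value `2026-04-03 08:14:52` does), and it assumes timestamps without an offset are UTC. The 30-day and 365-day thresholds mirror the field definitions above.

```python
from datetime import datetime, timezone
from typing import Optional


def age_from_created(created: str, now: Optional[datetime] = None) -> dict:
    """Approximate the RDAP 'age' object from a 'created' timestamp."""
    # fromisoformat accepts '2026-04-03 08:14:52' and '2026-04-03T08:14:52';
    # a trailing 'Z' is rewritten for interpreters older than Python 3.11.
    registered = datetime.fromisoformat(created.strip().replace("Z", "+00:00"))
    if registered.tzinfo is None:
        registered = registered.replace(tzinfo=timezone.utc)  # assumption: UTC
    now = now or datetime.now(timezone.utc)
    days = (now - registered).days
    return {
        "days": days,
        "isNewlyRegistered": days <= 30,
        "isYoung": days <= 365,
    }


# Fixed 'now' so the example is reproducible.
reference = datetime(2026, 4, 7, 12, 0, tzinfo=timezone.utc)
print(age_from_created("2026-04-03 08:14:52", now=reference))
# {'days': 4, 'isNewlyRegistered': True, 'isYoung': True}
```

Registries that return dates in other formats would raise `ValueError` here, so a production version would need per-format handling.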
Signal #2 — Registrant Privacy and Missing Fields
When WHOIS contacts are not suppressed by GDPR or registry policy, the registrant object reveals how much the domain owner has disclosed. Phishing operators systematically use privacy proxy services or submit minimal, often fictitious, contact information.
The relevant fields come from the `contacts.owner` array in the WhoisJSON WHOIS response:
| Field | Phishing indicator |
|---|---|
| organization | Absent, or set to a privacy proxy name (e.g. "Domains By Proxy, LLC", "Withheld for Privacy ehf") |
| email | Absent, redacted, or a generic proxy address (e.g. "[email protected]") |
| name | "Redacted for Privacy", "REDACTED FOR PRIVACY", or a clearly disposable identity |
| country | Absent or inconsistent with the registrar's declared jurisdiction |
| address | Absent or a clearly fake street address |
Scoring registrant opacity is straightforward: count the number of meaningful contact fields present and divide by the total expected. A score of 0.0 means fully redacted; 1.0 means all standard contact data is disclosed. Any score below 0.4 combined with a freshly registered domain is a strong combined signal.
```python
PRIVACY_KEYWORDS = {"redacted", "withheld", "privacy", "proxy", "gdpr"}


def registrant_opacity_score(contacts: dict) -> float:
    """
    Returns 0.0 (fully opaque) to 1.0 (fully transparent).
    Pass the 'contacts' object from the WhoisJSON /whois response.
    """
    owner_list = contacts.get("owner") or []
    if not owner_list:
        return 0.0
    owner = owner_list[0]
    fields = ["name", "organization", "email", "country", "address"]
    filled = 0
    for field in fields:
        value = (owner.get(field) or "").lower()
        # A field counts only if it is present and not a privacy placeholder
        if value and not any(kw in value for kw in PRIVACY_KEYWORDS):
            filled += 1
    return filled / len(fields)
```
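The combined rule stated above (a score below 0.4 on a freshly registered domain) can be written as a small predicate. This is only a sketch: `opaque_and_new` is a hypothetical helper, `transparency` is assumed to be the value returned by `registrant_opacity_score`, and `age_days` is assumed to be the `age.days` value from Signal #1.

```python
from typing import Optional

OPACITY_THRESHOLD = 0.4   # threshold quoted in the text above
NEW_DOMAIN_DAYS = 30      # "newly registered" window from Signal #1


def opaque_and_new(transparency: float, age_days: Optional[int]) -> bool:
    """True when a mostly redacted registrant coincides with a fresh domain."""
    if age_days is None:          # age unknown: do not infer freshness
        return False
    return transparency < OPACITY_THRESHOLD and age_days <= NEW_DOMAIN_DAYS


print(opaque_and_new(0.2, 4))     # True: 1 of 5 fields disclosed, 4 days old
print(opaque_and_new(0.2, 400))   # False: opaque but long established
```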
Signal #3 — DNS Patterns
DNS records expose how a domain is configured operationally. Phishing domains tend to follow predictable patterns: no MX records (the campaign uses a third-party sending service or has no email at all), nameservers on cheap shared hosting or known bulletproof providers, and no published DMARC or SPF policy.
The WhoisJSON `/nslookup` endpoint returns structured DNS records in a single call: A, AAAA, MX, NS, TXT, CNAME, CAA, SOA, DMARC, BIMI, MTA-STS, and TLSRPT. For phishing detection, focus on MX, NS, DMARC, and TXT.
- MX absent or pointing to a free provider — no MX records means the domain is not configured to receive email legitimately. An MX pointing to Gmail or Outlook means the operator uses a free consumer account, not dedicated mail infrastructure.
- No DMARC record — legitimate sending domains increasingly publish DMARC. Its absence, combined with other signals, adds weight to a phishing classification.
- No SPF record in TXT — domains with no SPF cannot reliably claim to restrict who sends email on their behalf. Most phishing kits skip this configuration entirely.
- NS on shared or low-cost infrastructure — providers disproportionately present in phishing domains include parking name servers, free DNS providers, and bulletproof hosters.
Node.js example — DNS signal extraction:
```javascript
const API_KEY = process.env.WHOISJSON_API_KEY;
const BASE_URL = 'https://whoisjson.com/api/v1';
const FREE_MX_PROVIDERS = [
  'google.com', 'googlemail.com', 'outlook.com',
  'hotmail.com', 'protonmail.ch', 'yandex.net'
];

async function getDnsSignals(domain) {
  // Encode the domain so unexpected characters cannot alter the query string
  const url = `${BASE_URL}/nslookup?domain=${encodeURIComponent(domain)}`;
  const res = await fetch(url, {
    headers: { 'Authorization': `TOKEN=${API_KEY}` },
    signal: AbortSignal.timeout(8000)
  });
  if (!res.ok) throw new Error(`DNS API error ${res.status} for ${domain}`);
  const data = await res.json();

  const mx = data.MX ?? [];
  const ns = data.NS ?? [];
  const txt = data.TXT ?? [];
  const dmarc = data.DMARC ?? [];

  const noMx = mx.length === 0;
  // Substring match: MX hosts are subdomains of the provider's domain
  const freeMx = mx.some(r =>
    FREE_MX_PROVIDERS.some(p => (r.exchange ?? '').includes(p))
  );
  const noDmarc = dmarc.length === 0;
  const noSpf = !txt.some(t => t.startsWith('v=spf1'));

  return { domain, noMx, freeMx, noDmarc, noSpf, ns, mx };
}

getDnsSignals('suspected-phishing.com').then(console.log);
// {
//   domain: 'suspected-phishing.com',
//   noMx: true, freeMx: false,
//   noDmarc: true, noSpf: true,
//   ns: ['ns1.cheaphost.net', 'ns2.cheaphost.net'],
//   mx: []
// }
```
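The extraction above returns the NS records but does not evaluate them. A minimal Python sketch of the nameserver check follows. It assumes NS records arrive as a list of hostnames, as in the sample output, and `SUSPICIOUS_NS_SUFFIXES` is a purely hypothetical watchlist that you would populate from your own phishing feeds.

```python
# Hypothetical watchlist of nameserver domains (assumption, not real data).
SUSPICIOUS_NS_SUFFIXES = ("cheaphost.net", "parking.example")


def ns_signals(ns_records, watchlist=SUSPICIOUS_NS_SUFFIXES) -> dict:
    """Flag nameservers whose hostname falls under a watchlisted domain."""
    hosts = [str(ns).lower().rstrip(".") for ns in ns_records]
    flagged = [
        host for host in hosts
        # Match the domain itself or any subdomain, never a mere substring
        if any(host == suffix or host.endswith("." + suffix) for suffix in watchlist)
    ]
    return {"suspicious_ns": bool(flagged), "flagged": flagged}


print(ns_signals(["ns1.cheaphost.net", "ns2.cheaphost.net"]))
# {'suspicious_ns': True, 'flagged': ['ns1.cheaphost.net', 'ns2.cheaphost.net']}
```

Matching on a dot-delimited suffix avoids false hits on unrelated hosts that merely contain a watchlisted string.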
Signal #4 — SSL Certificate Anomalies
A valid SSL certificate is not a trust indicator. Let's Encrypt issues certificates for free in seconds, with no manual vetting. Phishing operators routinely provision Let's Encrypt certificates to display the padlock icon in the browser address bar — a detail that still misleads many users into treating the site as legitimate.
The WhoisJSON `/ssl-cert-check` endpoint returns issuer details, validity window, Subject Alternative Names, and full certificate metadata for any domain.
- Let's Encrypt on a domain under 30 days old: Let's Encrypt alone is weak evidence. Combined with a freshly registered domain and sparse DNS, it is a meaningful composite signal.
- Certificate issued within hours or days of domain registration: fast provisioning after registration is a hallmark of automated phishing kit deployment. Check `valid_from` against the WHOIS `created` date.
- Subject Alternative Names containing brand keywords: a SAN covering `paypal-secure-login.xyz` alongside brand-related terms is a strong impersonation signal.
```python
import requests
from datetime import datetime, timezone

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://whoisjson.com/api/v1"


def get_ssl_signals(domain: str) -> dict:
    resp = requests.get(
        f"{BASE_URL}/ssl-cert-check",
        params={"domain": domain},
        headers={"Authorization": f"TOKEN={API_KEY}"},
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json()

    issuer_org = (data.get("issuer") or {}).get("O", "")
    valid_from = data.get("valid_from")  # ISO string, e.g. "2026-04-03T10:22:00.000Z"
    valid = data.get("valid", False)
    san = (data.get("details") or {}).get("subjectaltname", "")

    # Days elapsed since the certificate was issued
    cert_age_days = None
    if valid_from:
        issued = datetime.fromisoformat(valid_from.replace("Z", "+00:00"))
        cert_age_days = (datetime.now(timezone.utc) - issued).days

    return {
        "domain": domain,
        "issuer_org": issuer_org,
        "is_lets_encrypt": "Let's Encrypt" in issuer_org,
        "cert_age_days": cert_age_days,
        "valid": valid,
        "san": san,
    }
```
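Two of the checks listed above are not covered by the age-in-days value alone: the gap between registration and issuance, and brand keywords in the SAN list. The sketch below illustrates both under stated assumptions. The helper names are hypothetical, both timestamps are assumed to parse with `datetime.fromisoformat` and to be UTC when no offset is given, and the SAN value is assumed to be a comma-separated string of `DNS:`-prefixed names, which should be verified against a real response.

```python
from datetime import datetime, timezone
from typing import Optional


def _parse_utc(timestamp: str) -> datetime:
    parsed = datetime.fromisoformat(timestamp.strip().replace("Z", "+00:00"))
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def hours_from_registration_to_cert(created: Optional[str],
                                    valid_from: Optional[str]) -> Optional[float]:
    """Hours between the WHOIS 'created' date and the certificate 'valid_from'."""
    if not created or not valid_from:
        return None
    delta = _parse_utc(valid_from) - _parse_utc(created)
    return delta.total_seconds() / 3600


def san_brand_hits(san: str, brands) -> list:
    """Brand keywords that appear in any Subject Alternative Name."""
    names = []
    for part in (san or "").split(","):
        name = part.strip().lower()
        if name.startswith("dns:"):      # assumed 'DNS:' prefix per entry
            name = name[4:]
        if name:
            names.append(name)
    return sorted({brand for brand in brands for name in names if brand in name})


gap = hours_from_registration_to_cert("2026-04-03 08:14:52", "2026-04-03T10:22:00.000Z")
print(round(gap, 2))  # 2.12
print(san_brand_hits("DNS:paypal-secure-login.xyz, DNS:www.paypal-secure-login.xyz",
                     ["paypal", "amazon"]))  # ['paypal']
```

A gap of a few hours is what the second bullet describes; what counts as "fast" is a tuning decision for your own data.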
Combining Signals: A Risk Scoring Function
No individual signal is sufficient to classify a domain as malicious. A newly registered domain could be a legitimate startup. A Let's Encrypt certificate is used by millions of legitimate sites. A domain with no DMARC record may simply belong to an organisation that has not yet configured email authentication. The detection value comes from combining signals: each weak indicator adds to the evidence contributed by the others, producing a composite risk score significantly more reliable than any single check.
The following Python function combines all four signals into a score from 0 to 100:
```python
"""
Phishing domain risk scorer (WhoisJSON API)
Requires: pip install requests

Score breakdown (max 100):
  Signal #1: Domain age          up to 40 points
  Signal #2: Registrant opacity  up to 20 points
  Signal #3: DNS anomalies       up to 25 points
  Signal #4: SSL anomalies       up to 15 points
"""
import requests
from datetime import datetime, timezone

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://whoisjson.com/api/v1"
HEADERS = {"Authorization": f"TOKEN={API_KEY}"}
FREE_MX = {"google.com", "googlemail.com", "outlook.com",
           "hotmail.com", "protonmail.ch", "yandex.net"}
PRIVACY_KW = {"redacted", "withheld", "privacy", "proxy", "gdpr"}


def _fetch(endpoint, domain):
    r = requests.get(
        f"{BASE_URL}/{endpoint}",
        params={"domain": domain},
        headers=HEADERS,
        timeout=10,
    )
    r.raise_for_status()
    return r.json()


def score_domain(domain: str) -> dict:
    score = 0
    reasons = []

    # ── Signal #1: Domain age (up to 40 pts) ─────────────────────────────
    whois = _fetch("whois", domain)
    age = whois.get("age") or {}
    age_days = age.get("days")
    if age.get("isNewlyRegistered"):                 # <= 30 days
        score += 40
        reasons.append("newly registered (<=30d)")
    elif age_days is not None and age_days <= 90:
        score += 25
        reasons.append(f"young domain ({age_days}d)")
    elif age.get("isYoung"):                         # <= 365 days
        score += 10
        reasons.append("domain < 1 year")

    # ── Signal #2: Registrant opacity (up to 20 pts) ─────────────────────
    contacts = whois.get("contacts") or {}
    owners = contacts.get("owner") or []
    if not owners:
        score += 20
        reasons.append("no registrant data")
    else:
        owner = owners[0]
        fields = ["name", "organization", "email", "country", "address"]
        filled = sum(
            1 for f in fields
            if (owner.get(f) or "").lower()
            and not any(kw in (owner.get(f) or "").lower()
                        for kw in PRIVACY_KW)
        )
        opacity = 1 - (filled / len(fields))
        pts = round(opacity * 20)
        score += pts
        if opacity > 0.5:
            reasons.append(f"registrant {opacity:.0%} opaque")

    # ── Signal #3: DNS anomalies (up to 25 pts) ──────────────────────────
    dns = _fetch("nslookup", domain)
    mx = dns.get("MX") or []
    txt = dns.get("TXT") or []
    dmarc = dns.get("DMARC") or []
    if not mx:
        score += 10
        reasons.append("no MX records")
    # Substring match per record: MX hosts are subdomains of the provider
    elif any(p in (r.get("exchange") or "") for r in mx for p in FREE_MX):
        score += 5
        reasons.append("MX on free provider")
    if not dmarc:
        score += 10
        reasons.append("no DMARC record")
    if not any(t.startswith("v=spf1") for t in txt):
        score += 5
        reasons.append("no SPF record")

    # ── Signal #4: SSL anomalies (up to 15 pts) ──────────────────────────
    try:
        ssl = _fetch("ssl-cert-check", domain)
        issuer = (ssl.get("issuer") or {}).get("O", "")
        vf = ssl.get("valid_from")
        cert_days = None
        if vf:
            issued = datetime.fromisoformat(vf.replace("Z", "+00:00"))
            cert_days = (datetime.now(timezone.utc) - issued).days
        # Explicit None check: a domain registered today has age_days == 0
        if "Let's Encrypt" in issuer and age_days is not None and age_days <= 30:
            score += 10
            reasons.append("Let's Encrypt on newly registered domain")
        elif "Let's Encrypt" in issuer:
            score += 3
        if cert_days is not None and cert_days <= 3:
            score += 5
            reasons.append(f"certificate issued {cert_days}d ago")
    except (requests.RequestException, ValueError):
        pass  # Domain not yet serving HTTPS, or unparsable data: not penalised

    return {
        "domain": domain,
        "score": min(score, 100),
        "risk": "high" if score >= 60 else
                "medium" if score >= 35 else "low",
        "reasons": reasons,
    }


if __name__ == "__main__":
    import json
    print(json.dumps(score_domain("suspected-phishing.com"), indent=2))
```
- Score ≥ 60 (high risk): block or quarantine automatically, notify analyst immediately.
- Score 35–59 (medium risk): log and watch, add to threat intelligence watchlist for 7 days.
- Score < 35 (low risk): no immediate action, continue passive monitoring.
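The three tiers above can be turned into a routing decision that downstream automation consumes. The following is an illustrative sketch only: the `triage` function and its action labels are hypothetical names, while the thresholds and the 7-day watch period come from the list above.

```python
def triage(score: int) -> dict:
    """Map a 0-100 risk score to the response tier described above."""
    if score >= 60:
        return {"risk": "high", "action": "block_or_quarantine",
                "notify_analyst": True, "watch_days": 0}
    if score >= 35:
        return {"risk": "medium", "action": "log_and_watch",
                "notify_analyst": False, "watch_days": 7}
    return {"risk": "low", "action": "passive_monitoring",
            "notify_analyst": False, "watch_days": 0}


for example_score in (85, 40, 10):
    print(example_score, triage(example_score)["action"])
# 85 block_or_quarantine
# 40 log_and_watch
# 10 passive_monitoring
```

Keeping the mapping separate from the scorer lets thresholds be tuned without touching the signal logic.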
Automating Detection at Scale
Running a single-domain check is a proof of concept. Production threat intelligence pipelines need to process hundreds or thousands of domains per day — from newly registered domain feeds, threat intelligence sharing platforms (MISP, OpenCTI), or internal logs of clicked URLs extracted from email gateways and proxy logs.
Key architecture considerations:
- Rate limiting: The WhoisJSON API enforces per-minute rate limits. For bulk workloads, add a small delay between requests (3 per second is a safe sustained rate on paid plans) and handle 429 responses with exponential back-off. The `Remaining-Requests` response header shows your remaining monthly quota in real time.
- Caching: WHOIS and DNS data do not change minute to minute. Cache responses by domain for at least 6 hours in Redis or a local database to avoid redundant API calls when the same suspicious domain appears multiple times in your pipeline.
- Prioritise newly registered domains: Process the freshest domains first — they represent the highest-velocity threat. Feed newly registered domain lists directly into your scoring pipeline and deprioritise domains older than 90 days unless a specific trigger (clicked link, blocklist hit) warrants a fresh lookup.
- Integration targets: Risk scores can be pushed to your SIEM as custom events, written to a threat intelligence feed (STIX/TAXII), or used to trigger firewall rule updates via webhook. Scores above the high-risk threshold should trigger an immediate alert — Slack, PagerDuty, or email — regardless of business hours.
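The rate-limiting and caching points above can be sketched without committing to a specific client or store. In the example below, `TTLCache` is a hypothetical in-memory stand-in for Redis and `with_backoff` is a hypothetical wrapper. It assumes only that the wrapped callable returns an object with a `status_code` attribute, as a `requests` response does. The retry count and base delay are illustrative defaults, not values documented by the API.

```python
import time

SIX_HOURS = 6 * 3600  # minimum cache lifetime suggested above


class TTLCache:
    """In-memory stand-in for Redis: entries expire after `ttl` seconds."""

    def __init__(self, ttl: float = SIX_HOURS, clock=time.monotonic):
        self._ttl, self._clock, self._store = ttl, clock, {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:   # stale: drop and report a miss
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self._clock() + self._ttl, value)


def with_backoff(send, max_attempts: int = 5, base_delay: float = 1.0,
                 sleep=time.sleep):
    """Call `send()` and retry on HTTP 429, doubling the wait each time."""
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)   # 1s, 2s, 4s, ...
    return response                             # still 429 after all attempts
```

In a pipeline, the cache would be consulted first and `with_backoff` would wrap the actual `requests.get` call, for example `with_backoff(lambda: requests.get(url, headers=headers, timeout=10))`.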
Conclusion
Phishing domains do not hide. They leave a clear fingerprint in public data: freshly registered, WHOIS contacts redacted, DNS sparse, and an SSL certificate provisioned hours after registration. No individual signal is conclusive. The combination is.
Four signals, three endpoints, one API key:
- Signal #1 (domain age): query `/whois`, read `age.isNewlyRegistered` and `age.days`.
- Signal #2 (registrant opacity): evaluate `contacts.owner` for redacted or absent fields.
- Signal #3 (DNS anomalies): query `/nslookup`, check for absent MX, absent DMARC, and no SPF in TXT.
- Signal #4 (SSL anomalies): query `/ssl-cert-check`, correlate `issuer.O` and `valid_from` against domain age.
All four signals are available from a single API key. The free tier — 1,000 requests per month, no credit card — is sufficient to validate the pipeline against your own threat feeds before scaling up.