25 Jan 2026
When .onion Sites Leak Their Real Hosting
Headers, certs, favicons, and the occasional staging box left wide open.
I keep running into the same mistake on ransomware leak sites and older marketplaces: Tor is doing its job, and the web stack behind it is not. You pull headers through SOCKS, grep Shodan for an ETag or cert serial, and suddenly you are looking at a clearnet IP that was never supposed to exist in public indexes.
None of this needs relay compromise or exploits. It is mostly reading what operators already mirrored onto the open internet.
How Tor Actually Fails Here
Hidden services terminate at localhost. nginx or Apache answers on
127.0.0.1, Tor forwards the circuit, and that is the whole
anonymity boundary for a lot of deployments.
If the same daemon also listens on 0.0.0.0, or if
someone copies the same cert and static files to a VPS, the fingerprint
is shared. Shodan does not care which path you used to generate the
response.
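A quick way to catch the 0.0.0.0 mistake on your own box is to read the listen directives back out of the config. This is a rough sketch, assuming nginx-style `listen` lines; `exposed_listens` is a name I made up, and it only looks at the text you hand it, not at includes or what the kernel is actually bound to.

```python
import ipaddress
import re

LISTEN_RE = re.compile(r"^\s*listen\s+([^;]+);", re.MULTILINE)


def exposed_listens(nginx_conf: str) -> list[str]:
    """Return listen directives that are not bound to a loopback address."""
    exposed = []
    for match in LISTEN_RE.finditer(nginx_conf):
        target = match.group(1).split()[0]  # drop flags like "ssl"
        if target.startswith("unix:"):
            continue  # unix sockets are not reachable from the network
        if target.startswith("["):
            host = target[1:target.index("]")]  # "[::1]:8080" -> "::1"
        elif ":" in target:
            host = target.rsplit(":", 1)[0]  # "127.0.0.1:8080" -> "127.0.0.1"
        elif target.isdigit():
            host = "0.0.0.0"  # a bare port means every interface
        else:
            host = target
        try:
            if ipaddress.ip_address(host).is_loopback:
                continue
        except ValueError:
            if host == "localhost":
                continue
        exposed.append(match.group(1).strip())
    return exposed
```

Anything this returns is a socket that a scanner can reach without going through Tor.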
Matic et al.'s CARONTE paper (CCS 2015) crawled 1,974 onions and tied 101 back to clearnet IPs (87 unique addresses, 163 domains). What surprised me reading it: about half of those mappings were intentional. Operators linked the onion to clearnet on purpose. The rest were leaks they probably did not know about.
HTTP Headers
First step for me is always:
curl -s -I --socks5-hostname 127.0.0.1:9050 \
  http://ed8kd8cslf2349d[.]onion
Every header value is a candidate search string. Most will be useless alone.
Server banners
Default Apache on CentOS still vomits the full build string:
Server: Apache/2.4.6 (CentOS) mpm-itk/2.4.7-04 OpenSSL/1.0.2k-fips PHP/5.4.16
X-Powered-By: PHP/5.4.16
Too common for a one-shot hit. Still shrinks the search space when you stack it with something weirder.
Nginx is tamer:
Server: nginx/1.18.0 (Ubuntu)
product:"nginx" version:"1.18.0" os:"Ubuntu" port:80
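Turning a banner into that filter string is mechanical. A minimal sketch, assuming the usual `product/version (os)` banner shape; `banner_to_query` is a hypothetical helper, and Shodan's product names do not always equal the token in the banner, so treat the output as a starting point.

```python
import re

BANNER_RE = re.compile(
    r"^(?P<product>[A-Za-z][\w.-]*)/(?P<version>[\w.]+)"
    r"(?:\s+\((?P<os>[^)]+)\))?"
)


def banner_to_query(server_header: str, port: int = 80) -> str:
    """Build a Shodan-style filter string from a Server header value."""
    m = BANNER_RE.match(server_header.strip())
    if not m:
        return ""  # banner does not follow product/version, nothing to build
    parts = [f'product:"{m["product"]}"', f'version:"{m["version"]}"']
    if m["os"]:
        parts.append(f'os:"{m["os"]}"')
    parts.append(f"port:{port}")
    return " ".join(parts)


print(banner_to_query("nginx/1.18.0 (Ubuntu)"))
```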
ETag and Ragnar Locker
SttyK's Capcom work from November 2020 is the example I point people at when they ask if headers matter.
Ragnar's file-hosting onion returned a boring Apache
Server line and an ETag that was not boring at all:
HTTP/1.1 304 Not Modified
Date: Wed, 11 Nov 2020 17:09:12 GMT
Server: Apache/2.4.6 (CentOS) mpm-itk/2.4.7-04 OpenSSL/1.0.2k-fips PHP/5.4.16
Connection: Keep-Alive
Keep-Alive: timeout=5, max=100
ETag: "0-5a4a8aa76f2f0"
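Apache 2.4 builds ETags from file size and mtime by default (2.2 also mixed in the inode), so "0-5a4a8aa76f2f0" is a zero-byte file plus its modification time in hex. The same empty index file served on the onion and on clearnet, from one box or from a copy that kept its timestamps, produces the same tag.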
Shodan for ETag: "0-5a4a8aa76f2f0" came back with one
host:
5[.]45[.]65[.]52
Same filenames on both surfaces. FBI Flash IC3-22-030712 later lined up with that IP.
If you host identical static content on Tor and on a public VPS, do not assume the ETag will differ.
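It helps to decode the tag before searching, because the fields tell you what you are matching on. A sketch under the assumption that the server uses Apache's stock formats (hex `size-mtime`, or `inode-size-mtime` on older defaults, with mtime in microseconds); `decode_apache_etag` is my own name, and other servers build ETags differently.

```python
from datetime import datetime, timezone


def decode_apache_etag(etag: str) -> dict:
    """Split an Apache-style ETag into inode, size, and mtime."""
    token = etag.strip()
    if token.startswith("W/"):
        token = token[2:]  # drop the weak-validator prefix
    fields = [int(part, 16) for part in token.strip('"').split("-")]
    if len(fields) == 2:
        inode, size, mtime_us = None, fields[0], fields[1]
    elif len(fields) == 3:
        inode, size, mtime_us = fields
    else:
        raise ValueError(f"not an Apache-style ETag: {etag!r}")
    return {
        "inode": inode,
        "size": size,
        # Assumption: mtime is microseconds since the Unix epoch.
        "mtime": datetime.fromtimestamp(mtime_us / 1_000_000, tz=timezone.utc),
    }


print(decode_apache_etag('"0-5a4a8aa76f2f0"'))
```

For the Ragnar tag this gives size 0 and an mtime in 2020, which fits an empty placeholder file created a few months before the Capcom leak.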
Date and timezone
Date is supposed to be UTC. Sometimes it is not:
Date: Wed, 11 Nov 2020 22:39:12 +0530
That is only a timezone hint. It gets more interesting if you also look at hidden-service descriptor upload timing (descriptors refresh on a schedule and the hour can drift with local clock). I treat it as a filter, not a smoking gun.
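Flagging a non-UTC Date header is a one-liner with the standard library. A small sketch; `date_header_offset_minutes` is a name I picked for illustration.

```python
from email.utils import parsedate_to_datetime


def date_header_offset_minutes(date_value: str):
    """Return the UTC offset of a Date header in minutes, or None if absent."""
    dt = parsedate_to_datetime(date_value)
    offset = dt.utcoffset()
    if offset is None:
        return None  # header carried no usable zone information
    return int(offset.total_seconds() // 60)


print(date_header_offset_minutes("Wed, 11 Nov 2020 22:39:12 +0530"))
```

Anything other than 0 is worth writing down next to the other artifacts, nothing more.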
Silk Road
FBI SA Christopher Tarbell's 2014 declaration in US v. Ulbricht says investigators saw a non-Tor source IP in packet headers from the login page CAPTCHA flow. Typing that IP into a normal browser showed the same CAPTCHA. Box was in Iceland.
Defense disputed the story. I am not re-litigating that. The useful part for infra work is narrower: misbound apps and bad proxy configs can spit clearnet addresses into HTTP metadata. That class of bug shows up in other cases too.
TLS Certs
Shodan and Censys photograph port 443 across the whole IPv4 space. Reuse a self-signed cert on the onion portal and the clearnet admin panel and the link is already indexed.
Helper I reuse (same idea as the WhatsApp phishing post):
import hashlib
import socket
import ssl

from cryptography import x509
from cryptography.hazmat.primitives import serialization


def grab_cert_serial(host, port=443):
    """Fetch the leaf cert from host:port and return its searchable fields."""
    # Verification is off on purpose: self-signed certs are the interesting ones.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    # Direct TCP connection, so this is for clearnet candidates, not .onion names.
    with socket.create_connection((host, port), timeout=8) as s:
        with ctx.wrap_socket(s, server_hostname=host) as tls:
            raw = tls.getpeercert(binary_form=True)
    cert = x509.load_der_x509_certificate(raw)
    serial_hex = format(cert.serial_number, "x")
    spki = cert.public_key().public_bytes(
        serialization.Encoding.DER,
        serialization.PublicFormat.SubjectPublicKeyInfo,
    )
    return {
        "serial": serial_hex,
        "spki_sha256": hashlib.sha256(spki).hexdigest(),
        "subject": cert.subject.rfc4514_string(),
        "issuer": cert.issuer.rfc4514_string(),
    }

Serial search:
ssl.cert.serial:<serial_hex>
SPKI is better if they reissue the cert but keep the key (this one is Censys syntax):
services.tls.certificates.leaf_data.subject_key_info.fingerprint_sha256: "<hash>"
Broad hunt for onions baked into SANs on clearnet:
ssl:".onion" port:443
Yonathan Klijnsma told BleepingComputer in 2018 he kept finding these. Some hosts only had the cert on 443 and nothing on 80.
DarkAngels (Talos, 2022)
Talos pulled the victim portal cert from the Tor site, searched the
serial on Shodan, landed on 89[.]38[.]225[.]166 (M247,
AS9009). Same self-signed material on clearnet. Page loaded the chat UI,
countdown, negotiation flow. Related names included
login.myob[.]link and myob[.]live. There was
also a public .env with APP_SECRET and a DB
string on that host, which is a separate kind of disaster.
Snatch
Snatch put snatch[.]press on the onion page. Umbrella
data showed the A record hopping almost daily. The cert serial on
snatch[.]press still tied back to two stable Swedish
hosting IPs. Rotating domains did not rotate the key material.
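The Snatch pattern is easy to see once sightings are grouped by cert instead of by name. A minimal sketch, assuming you already collected `(hostname, ip, serial)` tuples from scans or passive DNS; `cluster_by_serial` and the sample values are made up for illustration.

```python
from collections import defaultdict


def cluster_by_serial(observations):
    """Group (hostname, ip, cert_serial) sightings by certificate serial."""
    clusters = defaultdict(lambda: {"hostnames": set(), "ips": set()})
    for hostname, ip, serial in observations:
        # Normalise hex so "0A1B" and "a1b" land in the same bucket.
        key = serial.lower().lstrip("0")
        clusters[key]["hostnames"].add(hostname)
        clusters[key]["ips"].add(ip)
    return dict(clusters)


# Hypothetical sightings using documentation address ranges.
sightings = [
    ("a.example", "192.0.2.10", "0A1B"),
    ("b.example", "198.51.100.7", "a1b"),
    ("c.example", "192.0.2.10", "a1b"),
]
for serial, group in cluster_by_serial(sightings).items():
    print(serial, sorted(group["ips"]), sorted(group["hostnames"]))
```

Many names collapsing onto a couple of IPs under one serial is the shape to look for.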
Quantum (favicon, same Talos write-up)
Quantum's leak blog favicon hashed to a single Shodan hit at
185[.]38[.]185[.]32 (AS60781, NL). Soufiane Tahiri
(@S0ufi4n3) reported the same finding independently. Check Point later
automated favicon + cert hunts across a bunch of ransomware
families.
Favicons
Shodan stores mmh3 of favicon.ico (base64 the bytes
first, same as their indexer):
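import codecs

import mmh3
import requests


def favicon_hash(onion_host, proxy="socks5h://127.0.0.1:9050"):
    """Return the Shodan-style favicon hash for an onion host."""
    # socks5h makes Tor resolve the .onion name; needs requests[socks] installed.
    proxies = {"http": proxy, "https": proxy}
    r = requests.get(
        f"http://{onion_host}/favicon.ico",
        proxies=proxies,
        timeout=30,
    )
    # Shodan hashes the newline-wrapped base64 text, not the raw bytes.
    favicon_b64 = codecs.encode(r.content, "base64")
    return mmh3.hash(favicon_b64)

http.favicon.hash:<value>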
Works for any static asset that survives unchanged on both sides:
custom 404 pages, CSS, bundled JS with a fixed string, fonts.
ssdeep helps when they tweak bytes but keep layout.
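For the exact-match case you do not need anything fancier than a digest comparison. A sketch assuming you have already fetched the bodies into two dicts keyed by path; `shared_assets` is a hypothetical helper, and it covers only byte-identical files, not the fuzzy ssdeep case.

```python
import hashlib


def shared_assets(onion_assets: dict, clearnet_assets: dict) -> list:
    """Return (onion_path, clearnet_path) pairs whose bodies are identical."""
    by_digest = {}
    for path, body in clearnet_assets.items():
        digest = hashlib.sha256(body).hexdigest()
        by_digest.setdefault(digest, []).append(path)
    matches = []
    for path, body in onion_assets.items():
        if not body:
            continue  # empty responses match each other and prove nothing
        digest = hashlib.sha256(body).hexdigest()
        for other in by_digest.get(digest, []):
            matches.append((path, other))
    return sorted(matches)


onion = {"/404.html": b"<h1>gone</h1>", "/app.js": b"var build='x1';"}
clear = {"/static/app.js": b"var build='x1';", "/index.html": b"hello"}
print(shared_assets(onion, clear))
```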
Clock Skew
Murdoch's CCS 2006 paper ("Hot or Not") is the weird one. TCP timestamps drift with temperature. Load the onion, CPU warms up, oscillator shifts, TSval on clearnet candidates in the same /24 moves with you if you are hitting the right box.
I have not run this end-to-end myself on a live ransomware host. It needs a candidate range first, which usually means you already got there via cert or banner work. Zander and Murdoch (USENIX 2008) tightened the sampling so high-latency paths are less painful. Cloud VMs are messier than bare metal in a dedicated cage.
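The arithmetic behind the skew estimate is small even if the measurement is not. This sketch fits a line through the offset between remote TCP timestamps and local receive times. It uses plain least squares rather than the estimators from the papers, and the 1000 Hz tick rate is an assumption that varies by OS; `estimate_skew_ppm` is my own name.

```python
def estimate_skew_ppm(samples, hz=1000):
    """Least-squares clock skew in parts per million.

    samples: (local_receive_time_seconds, remote_tsval) pairs.
    hz: assumed TSval tick rate of the remote stack.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    t0, ts0 = samples[0]
    xs = [t - t0 for t, _ in samples]
    # Offset = elapsed remote time minus elapsed local time.
    ys = [(ts - ts0) / hz - x for (_, ts), x in zip(samples, xs)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples must span more than one instant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx * 1e6


# Synthetic data: a remote clock running 50 ppm fast, sampled for an hour.
synthetic = [(float(t), round(t * 1000 * (1 + 50e-6))) for t in range(0, 3600, 10)]
print(estimate_skew_ppm(synthetic))
```

The attack in the paper is about how this slope changes while you load the onion, so in practice you would compute it over sliding windows and compare against your own request schedule.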
Staging and Sloppy Ops
Hansa
Before Bayonet in 2017, someone found a Hansa staging VPS in the Netherlands with no auth on the public internet. Marketplace source, Tor frontend config, DB dumps, deploy scripts. Dutch police used it as the wedge and then ran the market undercover for weeks after AlphaBay went down.
That was not a clever pivot. It was nginx on 443 in a scan result.
Nokoyawa (Talos, 2022)
Victim chat .onion had a traversal bug in file=:
download?id=...&file=../../../../var/log/auth.log&type=download_upload
Process ran as root. auth.log showed SSH from
5[.]230[.]29[.]12 (German VPS hop) and
176[.]119[.]0[.]195 where someone skipped the proxy
once.
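Once you have an auth.log in hand, the interesting part is just the accepted logins grouped by source. A sketch assuming standard OpenSSH "Accepted ..." lines; `ssh_sources` is a hypothetical helper and the sample log uses documentation addresses, not data from the case.

```python
import re
from collections import Counter

ACCEPTED_RE = re.compile(
    r"sshd\[\d+\]: Accepted (\S+) for (\S+) from (\S+) port \d+"
)


def ssh_sources(auth_log_text: str) -> Counter:
    """Count successful SSH logins per (source_ip, user)."""
    hits = Counter()
    for line in auth_log_text.splitlines():
        m = ACCEPTED_RE.search(line)
        if m:
            _method, user, ip = m.groups()
            hits[(ip, user)] += 1
    return hits


# Illustrative log lines only.
sample = """\
Nov 11 17:09:12 box sshd[811]: Accepted publickey for root from 192.0.2.5 port 51234 ssh2
Nov 11 17:30:40 box sshd[902]: Failed password for root from 203.0.113.9 port 40022 ssh2
Nov 12 08:01:03 box sshd[977]: Accepted password for root from 198.51.100.20 port 60111 ssh2
Nov 12 09:15:44 box sshd[990]: Accepted publickey for root from 192.0.2.5 port 51300 ssh2
"""
print(ssh_sources(sample).most_common())
```

The address that shows up once among many logins from the usual hop is the one to look at first.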
How I Chain It
Rough mental model:
.onion
  -> curl -I via Tor (Server, ETag, Date)
  -> cert serial / SPKI / SANs
  -> favicon mmh3
  -> optional clock skew on a /24 you already suspect
  -> staging host, .env, logs if you are unlucky for them
One Shodan hit is interesting. Five unrelated signals on the same ASN is usually the same operator being lazy.
After an IP pops, I hit InternetDB (no API key):
curl -s https://internetdb.shodan.io/89.38.225.166 | python3 -m json.tool
Hostnames from there often give you the next cert to pull.
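Pulling the next pivots out of that JSON is simple enough to script. A sketch that works on the response text, assuming the `ip`, `ports`, and `hostnames` keys InternetDB returns; `next_pivots` is a made-up name and the sample record is invented with documentation values.

```python
import json


def next_pivots(internetdb_json: str) -> dict:
    """Extract hostnames and likely TLS ports from an InternetDB response."""
    record = json.loads(internetdb_json)
    ports = record.get("ports", [])
    return {
        "ip": record.get("ip"),
        "hostnames": sorted(set(record.get("hostnames", []))),
        # Assumption: these are the ports most likely to hold a cert worth pulling.
        "tls_ports": [p for p in ports if p in (443, 8443)],
    }


# Invented sample record, not real scan data.
sample = json.dumps({
    "ip": "192.0.2.10",
    "ports": [22, 80, 443],
    "hostnames": ["login.example.com", "example.com", "login.example.com"],
    "cpes": [], "tags": [], "vulns": [],
})
print(next_pivots(sample))
```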
Script
Dump headers, TLS fields, and favicon hash, print Shodan strings:
#!/usr/bin/env python3
"""onion_fingerprint.py"""
import codecs
import hashlib
import ssl
import sys

import requests
import socks  # PySocks, installed with requests[socks]
from cryptography import x509
from cryptography.hazmat.primitives import serialization

SOCKS = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}


def get_headers(host):
    r = requests.get(f"http://{host}", proxies=SOCKS, timeout=30)
    return dict(r.headers)


def get_tls_artifacts(host, port=443):
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    # A plain socket cannot resolve .onion names, so dial through Tor's SOCKS port.
    s = socks.create_connection(
        (host, port),
        timeout=30,
        proxy_type=socks.SOCKS5,
        proxy_addr="127.0.0.1",
        proxy_port=9050,
        proxy_rdns=True,
    )
    with ctx.wrap_socket(s, server_hostname=host) as tls:
        raw = tls.getpeercert(binary_form=True)
    cert = x509.load_der_x509_certificate(raw)
    pub = cert.public_key().public_bytes(
        serialization.Encoding.DER,
        serialization.PublicFormat.SubjectPublicKeyInfo,
    )
    sans = []
    try:
        san_ext = cert.extensions.get_extension_for_class(x509.SubjectAlternativeName)
        sans = san_ext.value.get_values_for_type(x509.DNSName)
    except x509.ExtensionNotFound:
        pass
    return {
        "serial_hex": format(cert.serial_number, "x"),
        "spki_sha256": hashlib.sha256(pub).hexdigest(),
        "sans": sans,
    }


def get_favicon_hash(host):
    try:
        import mmh3
        r = requests.get(f"http://{host}/favicon.ico", proxies=SOCKS, timeout=30)
        favicon_b64 = codecs.encode(r.content, "base64")
        return mmh3.hash(favicon_b64)
    except Exception:
        return None


def main(host):
    hdrs = get_headers(host)
    # Strip the quotes up front; a backslash inside an f-string expression
    # is a syntax error before Python 3.12.
    etag = hdrs.get("ETag", "").strip('"')
    print("[headers]")
    print(f"  Server: {hdrs.get('Server', '')}")
    print(f"  ETag:   {etag}")
    fav = get_favicon_hash(host)
    print("\n[shodan]")
    if etag:
        # The ETag is a response header, so search the banner text, not http.html.
        print(f'  "{etag}"')
    if fav is not None:
        print(f"  http.favicon.hash:{fav}")
    # Plenty of onions serve nothing on 443; do not let that kill the run.
    try:
        tls = get_tls_artifacts(host)
    except Exception as exc:
        print(f"\n[tls] unavailable: {exc}")
        return
    print("\n[tls]")
    print(f"  serial: {tls['serial_hex']}")
    print(f"  spki:   {tls['spki_sha256']}")
    print(f"  sans:   {tls['sans']}")
    print(f"  ssl.cert.serial:{tls['serial_hex']}")


if __name__ == "__main__":
    main(sys.argv[1])

Paste the printed queries into Shodan. One result is usually the origin. Ten results means keep pivoting hostnames from InternetDB until something converges.
Shared hosting and stale DNS still fool you. I log everything as a lead until two independent artifacts agree.
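The two-artifact rule is simple to enforce in whatever you use for notes. A minimal sketch, assuming leads are logged as `(ip, artifact_kind)` pairs; `corroborated` and the kind labels are my own invention.

```python
from collections import defaultdict


def corroborated(leads, minimum=2):
    """Return IPs backed by at least `minimum` distinct artifact kinds."""
    kinds = defaultdict(set)
    for ip, artifact_kind in leads:
        kinds[ip].add(artifact_kind)
    return {
        ip: sorted(found)
        for ip, found in kinds.items()
        if len(found) >= minimum
    }


# Hypothetical leads with documentation addresses.
leads = [
    ("192.0.2.10", "etag"),
    ("192.0.2.10", "cert_serial"),
    ("192.0.2.10", "etag"),        # repeat of the same kind does not count twice
    ("198.51.100.7", "favicon"),
]
print(corroborated(leads))
```

Counting kinds rather than hits matters: ten sightings of the same favicon hash are still one artifact.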
Closing
CARONTE's split is still how I think about these takedowns: a lot of operators literally published the onion-to-clearnet mapping, and the rest leaked artifacts they never sanitized. Tor was fine. nginx configs were not.
Talos has good public walkthroughs for DarkAngels, Quantum, and Nokoyawa. SttyK's ETag write-up on Ragnar is worth reading if you have not seen it. Murdoch (2006) and Matic et al. (2015) for the older academic side. Tarbell's declaration if you want the Silk Road primary source.
For intel work the output is boring in a useful way: ASN, provider, co-hosted names, cert clusters. That is the graph you hand off before anyone talks about warrants.