What it is
Discover who is really behind your visitors
PyDog4Apache automagically discovers who is really behind your visitors: governments, police, military, intelligence agencies, banks, universities, AI crawlers (such as Anthropic), cloud and hosting providers, datacenters and more. It does this by matching each visitor's WHOIS organization against the words listed in keywords.txt.
You can list several log folders in sources.txt to sniff through all your projects at once (compressed .gz logs are analyzed too). Finally, you can write the results to a report file or send them as an alert to a list of email recipients.
Who it is for
For people who want to know who is watching
Sysadmins & self-hosters
See which organizations hit your boxes, not just anonymous IPs.
Privacy & infosec researchers
Spot government, intelligence or law-enforcement ranges in your traffic.
Bloggers & activists
Detect surveillance, scraping or unwanted institutional attention.
Journalists
See who is probing your site — institutions, state-linked actors or sources.
AI-wary site owners
Catch AI crawlers (Anthropic & friends) harvesting your content.
OSINT & pentesters
Attribute visits to the real owning organization for recon and reporting.
SOC & CERT teams
Triage incidents by attributing source IPs to their owning organization.
Webmasters & SEOs
Tell genuine visitors from crawlers, scrapers and bots hitting your site.
How it works
From a log line to a name
Sends 'dogs' to read every folder listed in sources.txt, compressed .gz logs included.
For every visitor IP, queries the WHOIS record to get the network description: the organization that owns that IP range.
Looks for the keywords listed in keywords.txt inside that network description.
Generates a report with the results, or sends them as an alert to a list of email recipients.
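The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not PyDog4Apache's actual code: the function names (iter_log_lines, whois_description, attribute_visitors) are hypothetical, the in-memory cache is a guess at how repeated lookups might be avoided, and the choice of WHOIS fields to join into a description is an assumption. Only IPWhois(ip).lookup_whois() comes from the ipwhois library itself.

```python
import gzip
from pathlib import Path
from typing import Callable, Dict, Iterator, List


def iter_log_lines(folder: str) -> Iterator[str]:
    """Yield every line of every file in `folder`, opening .gz files transparently."""
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
            for line in handle:
                yield line.rstrip("\n")


def whois_description(ip: str) -> str:
    """Return a network description for `ip` via ipwhois (needs network access)."""
    from ipwhois import IPWhois  # lazy import: the rest of the sketch runs without it

    result = IPWhois(ip).lookup_whois()
    # Assumption: join the ASN description with each net's description and name.
    parts = [result.get("asn_description") or ""]
    for net in result.get("nets") or []:
        parts.append(net.get("description") or "")
        parts.append(net.get("name") or "")
    return " ".join(part for part in parts if part)


def attribute_visitors(
    folders: List[str],
    keywords: List[str],
    lookup: Callable[[str], str] = whois_description,
) -> Dict[str, str]:
    """Map each visitor IP whose description contains a keyword to that description."""
    cache: Dict[str, str] = {}  # hypothetical cache: one lookup per distinct IP
    hits: Dict[str, str] = {}
    for folder in folders:
        for line in iter_log_lines(folder):
            ip = line.split(" ", 1)[0]  # the client IP is the first field
            if not ip:
                continue
            if ip not in cache:
                cache[ip] = lookup(ip)
            description = cache[ip]
            if any(k.lower() in description.lower() for k in keywords):
                hits[ip] = description
    return hits
```

Calling attribute_visitors(["/var/log/apache2"], ["university"]) would perform live WHOIS queries; passing a different lookup function makes the same loop usable offline.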
Install
Up and running in a minute
It requires Python 3 and the ipwhois library.
sudo apt-get install python3-pip
pip3 install ipwhois --user
Or pin it with pip3 install -r docs/requirements.txt --user.
wget https://pydog4apache.03c8.net/pydog4apache/pydog4apache-0.3.tar.gz
tar xzf pydog4apache-0.3.tar.gz
cd pydog4apache-0.3
python3 pydog4apache -v
Configure
sources.txt: one folder per line with the logs to analyze (default: /var/log/apache2).
keywords.txt: one keyword per line to look for in the WHOIS description (case-insensitive, substring).
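As a rough illustration of the two files, the sketch below reads a one-entry-per-line file and applies the case-insensitive substring match described above. The helper names load_list and matched_keywords are hypothetical, and skipping blank lines is an assumption about how the real parser behaves.

```python
from pathlib import Path
from typing import List


def load_list(path: str) -> List[str]:
    """Read one entry per line; blank lines are skipped (an assumption)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def matched_keywords(description: str, keywords: List[str]) -> List[str]:
    """Return the keywords found in `description`, ignoring case."""
    haystack = description.lower()
    return [keyword for keyword in keywords if keyword.lower() in haystack]
```

Because the match is a plain substring test, a short keyword such as bank also matches a description like "Riverbank Hosting", so longer and more specific keywords give fewer false positives.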
Options
-u, --update check for latest stable version
-v, --verbose active verbose output
-f, --flush-cache delete whois cache (re-query all IPs)
-r FILE generate a report file with the 'visitants'
-n EMAILS notify via email (a@x.net,b@y.org,...)
Examples
- Verbose output:
python3 pydog4apache -v
- Write a report:
python3 pydog4apache -r visitants.txt
- Email alert:
python3 pydog4apache -n root@localhost
- Combine:
python3 pydog4apache -v -r visitants.txt -n you@mail.org
- Re-query every IP:
python3 pydog4apache --flush-cache
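The -n option takes a comma-separated list of recipients. As a sketch of how such an alert could be assembled with the Python standard library (the function names, the subject line and the default sender address are all hypothetical, not taken from PyDog4Apache):

```python
from email.message import EmailMessage
from typing import Dict, List


def parse_recipients(value: str) -> List[str]:
    """Split 'a@x.net,b@y.org' into a clean list of addresses."""
    return [addr.strip() for addr in value.split(",") if addr.strip()]


def build_alert(
    recipients: List[str],
    hits: Dict[str, str],
    sender: str = "pydog@localhost",  # hypothetical default sender
) -> EmailMessage:
    """Build (but do not send) an alert listing each IP and its organization."""
    message = EmailMessage()
    message["Subject"] = f"PyDog4Apache: {len(hits)} matching visitor(s)"
    message["From"] = sender
    message["To"] = ", ".join(recipients)
    body = "\n".join(f"{ip}\t{org}" for ip, org in sorted(hits.items()))
    message.set_content(body or "No matching visitors.")
    return message
```

Sending the result would be a separate step, for example smtplib.SMTP("localhost").send_message(message), which assumes a local mail server is listening.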
Web server setup
Point it at your Apache or Nginx logs
PyDog4Apache parses access logs whose lines begin with the client IP, as in the standard combined format on both servers.
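A minimal sketch of that first-field extraction, using the standard ipaddress module to reject lines that do not start with a valid address. The function name client_ip is hypothetical, and whether PyDog4Apache validates the field this way (or handles IPv6) is not stated here.

```python
import ipaddress
from typing import Optional


def client_ip(line: str) -> Optional[str]:
    """Return the leading client IP of an access-log line, or None if absent."""
    fields = line.split(maxsplit=1)
    if not fields:
        return None  # empty or whitespace-only line
    try:
        # ip_address() accepts both IPv4 and IPv6 and raises ValueError otherwise
        return str(ipaddress.ip_address(fields[0]))
    except ValueError:
        return None
```

Lines whose first field is a hostname, which can happen when Apache resolves client names, return None in this sketch rather than being looked up.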
Apache2
In /etc/apache2/apache2.conf:
LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog ${APACHE_LOG_DIR}/access.log combined
Then point sources.txt to:
/var/log/apache2
Nginx
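In the http{} block of /etc/nginx/nginx.conf (nginx already predefines combined with exactly this layout, so the log_format directive is shown for reference and only the access_log line is needed):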
log_format combined '$remote_addr - $remote_user [$time_local] '
'"$request" $status $body_bytes_sent '
'"$http_referer" "$http_user_agent"';
access_log /var/log/nginx/access.log combined;
Then point sources.txt to:
/var/log/nginx
Download
Get PyDog4Apache
Released under the GNU GPLv3. Clone from Git to use the built-in --update.