Overview

Bubblegum proxypot (current version: 20030508) <proxypot>:

   1. Pretends to be a SOCKS 4, SOCKS 5, or HTTP CONNECT proxy, allowing
      connections to SMTP servers or to other proxy servers (forming a
      proxy chain).
   2. Can fake everything internally, or actually proxy traffic up to
      the SMTP DATA command and fake only that.
   3. Uses a configurable allow/deny table to choose "real" or "fake"
      handling for each connection, based on its source and destination.
   4. Rate-limits everything so even when it's really proxying, it can't
      overload your system or anyone else's.
   5. Logs everything in a grep-friendly file.
   6. Optionally delivers attempted SMTP messages to an mbox or maildir.
   7. Requires nothing but perl and the modules that come with perl.
   8. Runs under perl -T (taint mode). Don't you wish everyone did?


  Rationale

On the Internet there are some bad people. Bad people like to do bad
things, and they don't like to get caught. One way they avoid getting
caught is by using open proxy servers.

An *open proxy* is a server that forwards Internet connections from
anywhere to anywhere, no questions asked. If you want to do something
bad, and don't want to get caught, all you have to do is find an open
proxy and tell it to do it. Nobody will know who did it, except the open
proxy, and even there the records are usually short-lived or nonexistent.

An open proxy honeypot (*proxypot*) is a server that pretends to be an
open proxy, taking requests from bad people to do bad things, and
responding with a simulation instead of doing the evil deed. The goal is
to fool the bad people into thinking they've done their bad thing and
got away with it, while actually they didn't do it, and they got caught
anyway!

The proxypot found here is designed primarily to catch one kind of
Internet bad guy: the mail spammer.


  Notes for potential users

The proxypot may also fool white-hat testers and get your IP address
blacklisted. (I know it fools Undernet, and DALnet is apparently fooled
by anything that simply accepts a connection on port 1080.) Running it
on an IP address that needs to maintain good connectivity to outside
SMTP, NNTP, or IRC servers is therefore a bad idea.

ISPs which are concerned about their public image may be displeased with
customers who run convincing proxypots.

Modify your proxypot! Or write your own! The more variation among
proxypots, the harder it will be for abusers to detect them.

The proxypot's development platform is Linux, but because it uses only
core perl features, it should run on any reasonable system. There are
reports that it even runs on some unreasonable ones, with cygwin.

Perl 5.8.0 has a memory leak in socketpair() that makes the proxypot's
memory use grow by about 8K per incoming connection until it has eaten
all your memory. You're better off using perl 5.6.1 instead (or 5.8.1,
if it has been released by the time you read this).
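If you aren't sure which perl you have, a small check like the one below can tell you before you start the proxypot. It is a minimal sketch, not part of the proxypot. It assumes that only 5.8.0 is affected, which is what the suggestion to use 5.8.1 implies. The function name is made up for this example. It compares perl's own |$]| variable, which prints the version as a decimal (5.008000 for 5.8.0, 5.006001 for 5.6.1).

```shell
#!/bin/sh
# perl_has_socketpair_leak VERSION: succeeds if VERSION (in perl's $]
# decimal form, e.g. 5.008000) is the release assumed to carry the
# socketpair() memory leak. Hypothetical helper, not part of proxypot.
perl_has_socketpair_leak() {
    [ "$1" = "5.008000" ]
}

# Ask the installed perl (if any) for its version and warn if it is 5.8.0.
if command -v perl >/dev/null 2>&1; then
    ver=$(perl -e 'print $]')
    if perl_has_socketpair_leak "$ver"; then
        echo "perl $ver leaks in socketpair(); use 5.6.1 or 5.8.1+" >&2
    else
        echo "perl $ver is not 5.8.0"
    fi
fi
```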


  Extras

deliverone
    delivers a message from a proxypot maildir to the intended
    recipient. If a bad guy is testing your proxypot by sending a
    message to himself, he won't be fooled unless he gets the message.
    You can find messages like this by looking for clients that only
    send one or two messages, and for subject lines containing the
    proxypot's IP address (sometimes obfuscated). When you've found one,
    deliver it:
    |deliverone /home/spamdump/Maildir/cur/1053539016.3026.pp:2,CX|
    For maximum effectiveness, run this command on the same host and
    under the same userid as the proxypot.
log2mbox
    reads a proxypot log, and builds an mbox or maildir. Attempts to
    match the output that the proxypot would have created if it had been
    configured to deliver directly to the mbox or maildir. This is only
    intended for use on old log files, from before proxypot had the
    ability to do its own delivery. It has some big restrictions: don't
    give it more log than you can fit into memory, and don't direct its
    output to an mbox or maildir that isn't empty and idle. Sample usage:
    |log2mbox /var/log/OLD/proxypot.1050* > $HOME/Mailbox.pp|
    or
    |mkdir $HOME/Maildir.pp && log2mbox -d $HOME/Maildir.pp /var/log/OLD/proxypot.1050*|
spamstat
    reads a maildir full of messages written by proxypot, and compiles a
    database of source IP addresses, spamvertized web sites, mail drops,
    and phone numbers. From the database you can generate reports about
    where spam is coming from and where it is asking its victims to go.

    Sample usage: |spamstat /home/spamdump/Maildir | more| (assuming
    proxypot is configured with |$maildir='/home/spamdump/Maildir';|).
    Each time you run *spamstat*, it reloads the statistics it generated
    previously, and adds in the new messages that have been received
    since. There are many options; try |--all --verbose| for a very
    detailed report, and |--help| to see what else is available.

    If you need further motivation to try out spamstat, check out this
    report <report.gz>, showing lots of interesting data from my own
    proxypot. It contains the vital statistics of over 800,000 spam
    messages, 2 gigs' worth! The report uncompresses to about 2 megs. It
    was generated with |spamstat --verbose --all --safe|.

    *spamstat* does some pretty heavy content analysis, and it needs
    lots of perl modules. The following modules are needed, and do not
    come with perl 5.6, so you will have to download them if you haven't
    already installed them for some other purpose. It's also a good idea
    to have perl compiled with 64-bit integer support (check with
    |perl -V:use64bitint|).

        * Storable
        * MIME::Parser
        * HTML::Parser version 3.28 or better
        * HTML::LinkExtor
        * HTML::TreeBuilder
        * URI::Heuristic
        * Net::DNS 
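    Before running *spamstat* for the first time, you can check which
    of these modules perl can already load. The sketch below is a
    hypothetical helper and does not ship with spamstat. It relies only
    on perl's |-M| switch and the |VERSION| method that every class
    inherits from |UNIVERSAL|, which dies when the installed module is
    older than the requested version. It also checks the 64-bit integer
    setting mentioned above.

```shell
#!/bin/sh
# check_module NAME [MINVERSION]: succeed if perl can load module NAME
# and, when MINVERSION is given, NAME's version is at least MINVERSION.
check_module() {
    if [ -n "$2" ]; then
        # NAME->VERSION(MIN) dies (non-zero exit) if NAME is too old.
        perl -M"$1" -e "$1->VERSION($2)" 2>/dev/null
    else
        perl -M"$1" -e 1 2>/dev/null
    fi
}

# The modules spamstat needs that perl 5.6 lacks, per the list above.
for m in Storable MIME::Parser HTML::LinkExtor HTML::TreeBuilder \
         URI::Heuristic Net::DNS; do
    check_module "$m" || echo "missing: $m"
done
check_module HTML::Parser 3.28 || echo "missing: HTML::Parser 3.28 or better"

# perl -V:use64bitint prints use64bitint='define'; when 64-bit ints are on.
perl -V:use64bitint 2>/dev/null | grep -q "'define'" ||
    echo "perl lacks 64-bit integer support"
```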


  Coming developments

*spamstat* is just the beginning of statistic-gathering. Eventually
proxypot itself should generate some statistics like:

    * min/avg/max messages per SMTP session
    * min/avg/max proxy chain length 

and *spamstat* should calculate things like:

    * number of unique recipient addresses 

Besides delivery to an mbox or maildir, another useful option would be
delivery to a pipe. This would allow, for example, messages to be fed
automatically to filters that work by comparing unknown messages to
known spam. This feature may not be necessary, though, because it's already
easy to pick up messages periodically from an mbox or maildir and feed
them to whatever program you like.
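Here is a hypothetical example of that approach for a maildir. It is not part of proxypot or its extras. The shell function runs a command of your choice once for each message it hasn't fed before, with the message on standard input. It identifies each message by the unique part of its file name (everything before any |:2,| flags suffix). That way, a message an MUA moves from |new| to |cur| is not fed twice. A message is only marked as fed if the command succeeds. The list file and the function name are assumptions. Because the function rescans that list once per message, it suits a modest maildir better than one holding hundreds of thousands of messages.

```shell
#!/bin/sh
# feed_maildir MAILDIR LISTFILE COMMAND [ARGS...]
# Run COMMAND once per message in MAILDIR/new and MAILDIR/cur that is
# not yet recorded in LISTFILE, with the message on standard input.
# Messages are left in place; LISTFILE holds one unique name per line.
feed_maildir() {
    maildir=$1 seen=$2
    shift 2
    touch "$seen"
    for f in "$maildir"/new/* "$maildir"/cur/*; do
        [ -f "$f" ] || continue          # unmatched glob, or file moved away
        name=$(basename "$f")
        id=${name%%:*}                   # unique part, before any :2,flags
        grep -qxF "$id" "$seen" && continue
        "$@" < "$f" && echo "$id" >> "$seen"
    done
}
```

For example, you could run |feed_maildir /home/spamdump/Maildir $HOME/.fed myfilter| from cron, where |myfilter| stands in for whatever program you want to feed.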

Since the goal of the proxypot is to expose spammers, it would make
sense to provide a public interface to query the known spam messages and
their source IP addresses. This must not be unlimited access to the full
proxypot log, because that would mean additional exposure of innocent
victim addresses, and might provide clues to help spammers find out
which proxies are fake. *spamstat* already has the |--safe| option to
generate a report that would be useful for this purpose.

There should be some way to reread configuration data on SIGHUP. It
makes things so much more complicated though...

Many other ideas for improvements are scattered through the code,
marked by the word |TODO|.


  Tips

If some client is hammering your proxypot with too many connections, it
can help to drop some of its packets with a kernel-based packet filter,
in addition to the simultaneous connection limit imposed by the
proxypot. I use this set of iptables rules to limit incoming connections
on the proxy ports (1080, 3128, and 8080) to 1 per second, with an
initial burst of 5. The first two commands flush the INPUT chain and set
its default policy to ACCEPT, replacing any rules you already have there:

iptables -F INPUT
iptables -P INPUT ACCEPT
iptables -A INPUT -p tcp --syn \
                  -m multiport --dports 1080,3128,8080 \
                  -m limit --limit 1/second --limit-burst 5 \
                  -j ACCEPT
iptables -A INPUT -p tcp --syn \
                  -m multiport --dports 1080,3128,8080 \
                  -j DROP


  News

25 Jul 2003 - minor fixes
22 May 2003 - sample report published, deliverone published
21 May 2003 - spamstat gets MUCH cooler, and also much bigger.
8 May 2003 - spamstat gets cooler, maildir delivery gets safer, log2mbox
gets published


  Download


    Quick start (if you're lucky)

( snarf http://world.std.com/~pacman/proxypot ||
wget http://world.std.com/~pacman/proxypot ||
lynx -source http://world.std.com/~pacman/proxypot > proxypot ) &&
chmod 700 proxypot &&
nohup ./proxypot &


    Detailed instructions

Download http://world.std.com/~pacman/proxypot <proxypot>. Edit it.
Between the line that says "Configure:" and the one that says "Stop
configuring", you will find all the tunable parameters of the proxypot,
and explanations of what they do. Read the explanations and make the
necessary changes, then install the program wherever you normally
install your programs, and run it in the background.
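To review the configuration section before you edit it, you can print just the lines between those two markers. This is a minimal sketch with a made-up function name. It assumes each marker text appears only once in the file. If |Configure:| appeared again later, sed would start printing again from there.

```shell
#!/bin/sh
# show_config FILE: print FILE from the line containing "Configure:"
# through the line containing "Stop configuring", inclusive.
# Hypothetical helper, not part of proxypot.
show_config() {
    sed -n '/Configure:/,/Stop configuring/p' "$1"
}
```

For example: |show_config proxypot | more|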

Watch the log file. If you don't see "Bubblegum proxypot NNNNNNNN
starting" (where NNNNNNNN is the version number) shortly after you run
it, it's not working. Try running it in
the foreground so it has a better chance of getting error messages to
you. Once it has started and the log is working, put it in the
background again. All further errors will be sent to the log.
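One way to do that check from a script, such as a startup file, is to poll the log for the startup banner. This is a minimal sketch, not part of the proxypot, and the function name is made up. It assumes the banner appears on a single log line. The log path in the usage comment is hypothetical; use whatever you configured. The function can't tell a banner from this run apart from one left by an earlier run, so point it at a fresh log.

```shell
#!/bin/sh
# wait_for_start LOGFILE [SECONDS]: poll LOGFILE about once a second for
# the "Bubblegum proxypot NNNNNNNN starting" banner, giving up after
# SECONDS attempts (default 10). Hypothetical helper, not part of proxypot.
wait_for_start() {
    log=$1
    tries=${2:-10}
    while [ "$tries" -gt 0 ]; do
        if grep -q 'Bubblegum proxypot [0-9]* starting' "$log" 2>/dev/null; then
            echo "proxypot started"
            return 0
        fi
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] && sleep 1
    done
    echo "no startup banner in $log; try running it in the foreground" >&2
    return 1
}

# Usage (hypothetical log path):
#   ./proxypot & wait_for_start /var/log/proxypot.log 30
```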

If it starts up, check back on the log file once in a while. If you have
enabled mbox or maildir output, configure an MUA (mailreader) to check
the mbox or maildir. It might take a while for bad guys to discover your
proxypot and start using it, so be patient. You might want to add
proxypot to your system startup files. Pretty soon you'll be getting
such a flood you won't be able to make sense of it. (That's why there's
spamstat <#spamstat>.)
