Saturday, June 18, 2011

Nightly Nmap scans with Ndiff

I was recently playing around with running Nmap scans from a cron job, and I thought I could do better than I had been. Here's what I was doing before:

#m h dom mon dow command
0  3 *   *   *   /usr/local/bin/nmap -v --open -oA /root/nmap/lan-\%y\%m\%d

So this would run every night at 3am, performing a verbose TCP SYN scan of my network, showing only open ports, and creating output files in all 3 formats (Normal, XML, and Greppable) in the /root/nmap/ directory. Just getting this far presented some challenges, since I was unfamiliar with some aspects of the crontab file format:

  • Commands run by cron have their environment stripped down for security reasons. Specifically, the PATH variable is set to /usr/bin:/bin, which is pretty restrictive: nmap lives in /usr/local/bin, so the crontab entry has to use its absolute path. Since cron logs only that it ran a command, not the command's output, I was very confused about why the logs showed it being run but no output was generated.
  • Percent signs (%) are interpreted as newlines by cron. Anything after the first line is passed to the command on STDIN, similar to a here-doc in shell programming. To pass the time format specifiers to Nmap, I needed to escape the percent signs with backslashes.
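To make the percent-sign rule concrete, here is a simplified Python 3 model of how cron splits a command field, following the behavior described in crontab(5): an escaped \% becomes a literal %, the first unescaped % ends the command, and any later ones become newlines in the standard input. It is a sketch for illustration, not cron's actual code.

```python
def split_cron_command(field):
    """Simplified model of cron's handling of '%' in a crontab command field.

    Returns (command, stdin); stdin is None when there is no unescaped '%'.
    """
    command, stdin = [], None
    current = command
    i = 0
    while i < len(field):
        ch = field[i]
        if ch == "\\" and i + 1 < len(field) and field[i + 1] == "%":
            current.append("%")  # escaped percent: keep a literal '%'
            i += 2
            continue
        if ch == "%":
            if stdin is None:
                stdin = []  # first unescaped '%': the rest goes to stdin
                current = stdin
            else:
                current.append("\n")  # later unescaped '%' become newlines
        else:
            current.append(ch)
        i += 1
    return "".join(command), (None if stdin is None else "".join(stdin))
```

With the backslashes, split_cron_command(r"nmap -oA lan-\%y\%m\%d") yields the whole command and no stdin; without them, the command is cut off at "lan-" and "y", "m", "d" end up on separate lines of stdin.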

So this was pretty good, but it left a lot to be desired. To get an idea of what had changed, I needed to manually run an Ndiff on the last two scans. Also, I wasn't taking advantage of Nmap's advanced version detection capabilities. So I decided to automate the diffing process and do a follow-up in-depth scan of new services I detected.

To schedule a complicated job like this, I needed to move the logic out of the crontab and into a shell script. I broke the task down into 3 basic steps:

  1. Scan the network
  2. Perform a diff
  3. Scan new stuff for version information

In order to make it worthwhile to scan things twice, I wanted my first scan to be fast. I decided early to ignore UDP ports, since scanning firewalled hosts for UDP can take hours. I also decided to use a more aggressive timing template. Nmap runs at T3 by default, but since all of my targets are just one hop away, I can easily bump that up to T4. I don't consider T5 to be worth the possible loss in accuracy, but for such a small network, it could have been useful. Finally, since I will only be looking at differences, I don't need all the extra output files, just the XML. Here's the command to do all that:

nmap -v --open -T4 -oX lan-%y%m%d
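A minimal Python 3 sketch of the same fast-scan command, built as an argument list. The targets parameter is hypothetical (the command above doesn't show which network is scanned), and 192.0.2.0/24 in the example is only a documentation placeholder. The date is expanded in Python with the same strftime codes Nmap accepts in -o filenames, so the caller knows the output name in advance.

```python
import datetime


def fast_scan_argv(targets, when, nmap="/usr/local/bin/nmap"):
    """Build the argument list for the fast nightly TCP scan (a sketch).

    'targets' is a hypothetical list of target specifications.
    """
    # Same %y%m%d codes Nmap understands in output filenames, expanded here.
    outfile = when.strftime("lan-%y%m%d.xml")
    # Absolute path to nmap, since cron's PATH won't include /usr/local/bin.
    return [nmap, "-v", "--open", "-T4", "-oX", outfile] + list(targets)


argv = fast_scan_argv(["192.0.2.0/24"], datetime.date(2011, 6, 18))
```

Running it would then be a matter of passing argv to subprocess.run(argv, check=True).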

Next, I needed to do a diff. Nmap ships with a great tool called Ndiff, which is written in Python. It takes two Nmap XML files and generates a text or XML diff. This was a tricky decision: I wanted to be able to review the diff every morning, so text output would be best for that. But I also wanted to have my script scan all the new hosts and services, which meant parsing the output. Luckily, I have done some development work on Ndiff, so I knew that it would have the whole diff in a data structure before printing it. I just needed to run through it and pull out the new stuff.
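Ndiff's internal objects are not a stable interface (as the comments below show, they have since changed), so another option, not the approach used in this post, is to read the Nmap XML directly. A minimal Python 3 sketch with xml.etree.ElementTree, assuming Nmap's usual host, status, address, and port/state elements:

```python
import xml.etree.ElementTree as ET


def open_ports_by_host(xml_text):
    """Map each up host's first IP address to its set of (port, protocol) open ports."""
    result = {}
    root = ET.fromstring(xml_text)
    for host in root.iter("host"):
        status = host.find("status")
        if status is None or status.get("state") != "up":
            continue  # skip hosts that are down or have no status
        # Prefer IP addresses; MAC addresses are not usable scan targets.
        addrs = [a.get("addr") for a in host.findall("address")
                 if a.get("addrtype") in ("ipv4", "ipv6")]
        if not addrs:
            continue
        open_ports = set()
        for port in host.iter("port"):
            state = port.find("state")
            if state is not None and state.get("state") == "open":
                open_ports.add((int(port.get("portid")), port.get("protocol")))
        result[addrs[0]] = open_ports
    return result


SAMPLE = """<nmaprun>
<host><status state="up"/><address addr="192.0.2.10" addrtype="ipv4"/>
<ports><port protocol="tcp" portid="22"><state state="open"/></port>
<port protocol="tcp" portid="23"><state state="closed"/></port></ports></host>
<host><status state="down"/><address addr="192.0.2.11" addrtype="ipv4"/></host>
</nmaprun>"""
```

Comparing the result of this function for two scans gives the raw material for the diff without touching Ndiff's classes, at the cost of losing Ndiff's text report.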

Ndiff, like any well-written Python program, consists of a bunch of class and function definitions, plus a conditional statement that runs the main function only when the file is run as a script, not when it's imported as a module. This ensures there are no side effects if it IS imported, which I planned on doing. I started by making a symlink to the ndiff program in my working directory:

ln -s /usr/local/bin/ndiff ndiff.py

I tried using the PYTHONPATH environment variable set to /usr/local/bin, but Ndiff is not installed with a .py extension, so the interpreter complained that it couldn't find the ndiff module. The symlink ends up being the way to go here.
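On Python 3, importlib can load a source file that has no .py extension, which avoids the symlink. A minimal sketch; it only helps if the copy of Ndiff being loaded is itself Python 3 code (the Ndiff used in this post is Python 2).

```python
import importlib.machinery
import importlib.util


def load_module_from_path(name, path):
    """Import a Python source file regardless of its extension."""
    # SourceFileLoader doesn't require a .py suffix, so it can read a
    # script such as /usr/local/bin/ndiff directly.
    loader = importlib.machinery.SourceFileLoader(name, path)
    spec = importlib.util.spec_from_loader(name, loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)  # runs the file with __name__ == name
    return module
```

Usage would look like ndiff = load_module_from_path("ndiff", "/usr/local/bin/ndiff"). Because the module runs under the name "ndiff" rather than "__main__", its if __name__ == "__main__": block does not fire.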

Next, I fired up vim and began my program:

#!/usr/bin/env python

import getopt
import sys

from ndiff import *

def main():
    pass

if __name__ == "__main__":
    main()

Not a lot of functionality yet. I wanted a similar invocation to the ndiff program itself, so I started by copying the main function from ndiff and stripping out the options I didn't need: help, text, and xml.

def main():
    global verbose
    diffout = "diff.xml"
    cmdout = ""

    try:
        # -h/--help was stripped, so only -v remains as a short option
        opts, input_filenames = getopt.gnu_getopt(sys.argv[1:],
            "v", ["verbose", "diffout=", "cmdout="])
    except getopt.GetoptError, e:
        usage_error(e.msg)
    for o, a in opts:
        if o == "--diffout":
            diffout = a
        elif o == "--cmdout":
            cmdout = a
        elif o == "-v" or o == "--verbose":
            verbose = True

    if len(input_filenames) != 2:
        usage_error(u"need exactly two input filenames.")

    filename_a = input_filenames[0]
    filename_b = input_filenames[1]

    try:
        # parse both Nmap XML files into Ndiff Scan objects
        # (load_from_file as in the Ndiff of this era)
        scan_a = Scan()
        scan_a.load_from_file(filename_a)
        scan_b = Scan()
        scan_b.load_from_file(filename_b)
    except IOError, e:
        print >> sys.stderr, u"Can't open file: %s" % str(e)
        sys.exit(2)

    diff = ScanDiff(scan_a, scan_b)
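For comparison, a Python 3 sketch of the same command-line interface using argparse instead of getopt. The option names and the diff.xml default mirror the function above; the help strings and metavar are illustrative.

```python
import argparse


def parse_args(argv=None):
    """Parse -v/--verbose, --diffout, --cmdout and exactly two scan files."""
    parser = argparse.ArgumentParser(
        description="Diff two Nmap XML scans and write a follow-up scan script.")
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("--diffout", default="diff.xml",
                        help="where to write the diff (default: diff.xml)")
    parser.add_argument("--cmdout", default="",
                        help="where to write the follow-up scan script")
    parser.add_argument("scans", nargs=2, metavar="SCAN.xml",
                        help="older and newer Nmap XML files")
    return parser.parse_args(argv)
```

With nargs=2, argparse prints a usage message and exits when it doesn't get exactly two filenames, which covers the "need exactly two input filenames" check.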

So at this point, the main function doesn't produce any output. It just creates a ScanDiff object from the two scans. The original ndiff.main function just prints the text or XML representation of that object, but I wanted more: a list of new hosts and ports, so that I could generate a shell script to do the details scan. Here's what I wanted the shell script to look like:

OUTFILE=details   # default output basename (placeholder)
test -n "$1" && OUTFILE="$1"
nmap -v -p $PORTS -sV -sC -oA "$OUTFILE" $TARGETS

The first two lines set up a default output filename but let me pass a different one as the first argument ($1). The -sV flag probes the open ports for service versions, and -sC runs Nmap's default scripts. I debated using the -A or -O flags (which would both add operating system fingerprinting), but since I'm only scanning ports that I know are open, OS fingerprinting wouldn't be as accurate. Nmap needs both open and closed ports to get a complete fingerprint.
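A Python 3 sketch of generating that script text, written so the default name is used whenever no argument is given. The default basename "details" is a hypothetical placeholder, and shlex.quote keeps unusual target strings from breaking the generated shell line.

```python
import shlex


def render_details_script(ports, targets, nmap="/usr/local/bin/nmap",
                          default_out="details"):
    """Return the text of the follow-up scan script.

    'default_out' is a hypothetical default output basename; the generated
    script uses its first argument instead when one is given.
    """
    port_list = ",".join(str(p) for p in sorted(ports))
    target_list = " ".join(shlex.quote(t) for t in targets)
    return (
        "OUTFILE=%s\n" % shlex.quote(default_out)
        + 'test -n "$1" && OUTFILE="$1"\n'
        + '%s -v --open -p %s -sV -oA "$OUTFILE" %s\n'
        % (nmap, port_list, target_list)
    )
```

Sorting the ports keeps the -p list identical from run to run, which makes the generated scripts easy to compare.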

Back in the Python script, I needed to build a list of targets and ports. Targets would just be a subset of the first scan's results, which would not include duplicates, so I can use a list to hold them. Ports, on the other hand, could show up on multiple targets. I only want to specify each port once, though, so I stored them as keys to a dictionary, which guarantees no duplicates.

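    targets = []
    ports = {}

    if diff.cost > 0:
        for host,h_diff in diff.host_diffs.iteritems():
            if h_diff.cost > 0 and h_diff.host_b.state == "up":
                scan_host = False
                for port,p_diff in h_diff.port_diffs.iteritems():
                    # a port whose state changed to "open" counts as new
                    if (p_diff.port_a.state != p_diff.port_b.state and
                        p_diff.port_b.state is not None and
                        p_diff.port_b.state == "open"):
                        # assumed: Port.spec is the (number, protocol) pair
                        ports[p_diff.port_b.spec[0]] = 1
                        scan_host = True
                if scan_host:
                    # assumed: the host's first address is a usable target
                    targets.append(h_diff.host_b.addresses[0].s)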

Here's what's happening: ScanDiff and HostDiff objects have a property called cost that tells how many changes it would take to change one object (scan or host) into another. If it's greater than zero, then there is a difference, and I want to scan it, but only if the host is still up in the latest scan, and only if the host has new open ports.
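Since Ndiff's objects have changed since this was written, here is the same selection rule as a self-contained Python 3 sketch over plain dictionaries. The input shape (host address mapped to {port: state}) is an assumption standing in for ScanDiff/HostDiff, hosts present in the newer scan are treated as up, and a set replaces the dictionary-keys trick for deduplicating ports.

```python
def select_followup(old, new):
    """Pick hosts and ports for the follow-up scan.

    'old' and 'new' map host address -> {port number: state string}.
    A host is selected if it has at least one port that is open in the
    newer scan but was not open in the older one.
    """
    targets = []
    ports = set()  # a set deduplicates ports seen on several hosts
    for host, new_ports in sorted(new.items()):
        old_ports = old.get(host, {})
        newly_open = [p for p, state in new_ports.items()
                      if state == "open" and old_ports.get(p) != "open"]
        if newly_open:
            targets.append(host)
            ports.update(newly_open)
    return targets, sorted(ports)


old = {"192.0.2.10": {22: "open"}, "192.0.2.11": {80: "open"}}
new = {"192.0.2.10": {22: "open", 443: "open"},
       "192.0.2.11": {80: "open"},
       "192.0.2.12": {80: "open", 8080: "closed"}}
targets, ports = select_followup(old, new)
```

Here 192.0.2.10 is picked for its new port 443, 192.0.2.12 for port 80 (it didn't exist in the old scan), and 192.0.2.11 is skipped because nothing changed.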

Nearly done! I just needed to write my two output files: the text-format diff, and the shell script for running the follow-up scan.

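        difffile = open(diffout, 'w')
        # text-format diff, as plain ndiff would print it
        # (print_text as in the Ndiff of this era)
        diff.print_text(difffile)
        difffile.close()

        cmdfile = open(cmdout, 'w')
        cmdfile.write('OUTFILE=details\n')  # default output basename (placeholder)
        cmdfile.write('test -n "$1" && OUTFILE="$1"\n')
        cmdfile.write('/usr/local/bin/nmap -v --open -p %s -sV -oA "$OUTFILE" %s\n'
                % ( ",".join(map(str, ports.keys())),
                    " ".join(targets)))
        cmdfile.close()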

Writing the diff out is straightforward, since that's the original purpose of ndiff. The shell script was also fairly easy, once I remembered to use the absolute path to nmap. The one complication was getting a comma-separated list of ports. My first attempt called str.join on the keys directly, but here's how that went:

>>> ports = {80:1,443:1}
>>> ",".join(ports.keys())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sequence item 0: expected string, int found

str.join needs a sequence of strings, not integers. Using map, I converted each of the keys to a string, then joined those. I also considered using reduce, like this:

reduce(lambda x,y: str(x)+","+str(y), ports.keys())

I decided that was too complicated, and probably less efficient, due to all the string concatenations and extra calls to str().
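Beyond readability, reduce has edge cases that map plus join avoids. A small Python 3 check (reduce lives in functools there):

```python
from functools import reduce  # a builtin in Python 2, moved to functools in 3


def join_with_map(ports):
    # Convert every port to a string, then join once.
    return ",".join(map(str, ports))


def join_with_reduce(ports):
    # Concatenate pairwise; the lambda is never called for a single item.
    return reduce(lambda x, y: str(x) + "," + str(y), ports)
```

With one port, reduce hands back the int itself without calling the lambda, and with no ports it raises TypeError, while the join version returns "80" or an empty string.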

So finally, the Python script was complete. My last step was to put it all together into a shell script to be called from cron. Here's how that turned out:

#!/bin/sh
NMAP=/usr/local/bin/nmap
NMAPOUT=lan-cron-$(date +%F)

cd /root/nmap

#Fast port scan
$NMAP -v --open -T4 -oX $NMAPOUT.xml

#Do diff, generate details-scan command
python --diffout $NMAPOUT.diff --cmdout \

#run details scan
sh $NMAPOUT-details

#re-point symlink

And it works like a charm!
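For the "re-point symlink" step, one way to avoid a moment where the link doesn't exist is to create the new link under a temporary name and rename it over the old one. A minimal Python 3 sketch for POSIX systems; the link name lan-cron-latest.xml in the example is hypothetical, since the script above doesn't show it.

```python
import os


def repoint_symlink(target, link_name):
    """Point link_name at target by renaming a fresh link over the old one."""
    tmp_name = link_name + ".tmp"
    if os.path.lexists(tmp_name):
        os.remove(tmp_name)  # leftover from an interrupted run
    os.symlink(target, tmp_name)
    # rename() replaces the old link in one step on POSIX (same directory,
    # so same filesystem), so readers see either the old or the new link.
    os.replace(tmp_name, link_name)
```

Each night the script would call something like repoint_symlink("lan-cron-2011-06-18.xml", "lan-cron-latest.xml") after the diff, so the next run can compare against the latest scan.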


  1. Nice post. I was looking around for something similar. I'd like to do the same thing, but instead of doing a detailed nmap scan the second time, I'm looking to pass the diff to Nessus. That's something I could probably do by tweaking your code a little.

    For now, I thought I'd test your code exactly as it is. I get an error:
    Traceback (most recent call last):
    File "", line 70, in
    File "", line 43, in main
    if diff.cost > 0:
    AttributeError: 'ScanDiff' object has no attribute 'cost'

    Looking at the version I use, it looks like only HostDiff has the cost property.


    1. Since I wrote this, the internal structure of Ndiff has changed a bit. The new version is more memory efficient, but doesn't work with my code. You can get the full source, including the last version of Ndiff that worked properly, from my GitHub repository (

      If you decide to update this for the most recent Ndiff, feel free to make a pull request on GitHub or send me a patch so others can benefit.