Script: Parse Mail Headers

2012-11-07

Here’s a perl script I put together that uses Email::Simple to extract the headers from a message. See link below.

I’m using it to examine spam. It parses all the headers, with a focus on the Received headers. It should be easy to alter it to examine any header you want. As it is currently written, it:

finds all the Received headers
finds the first Received header that was added to the mail (presumably the header added by the first MTA that received it)
extracts the IP from that header
does an rDNS lookup
if there’s a hostname, it looks up the nameservers for the base domain.

By “base domain” I mean that if the rDNS returns a hostname like “1234.my.example.domain.com”, the base domain would be “domain.com”.

The script is easily alterable if you need info for different headers. The base domain extraction from the hostname is less than optimal. I need to rewrite that portion of the code. I didn’t have time to implement checks using Net::Domain::TLD. Hopefully, I’ll get to this soon.

A couple of people have asked, “What’s the purpose?” Well, I read a couple of articles about spammers using lots of different domains but having the same nameservers. So I wanted to check out some of the spam I have been getting to see if I could see any patterns. None so far.

#! /usr/bin/perl

use strict;
use Email::Simple;
use Net::Nslookup;

#To run the script on all the messages in your top level email folder, use the find command:
#find /path/to/top/level/email/folder -type f -exec headers_all.pl {} \; > originating_ip_headers.txt

# declare variables
my $raw_email;
my $mail;
my $received_header;
my $untrusted_header;
my $key;
my $value;
my $domain;
my $nsdomain;

open( MESSAGE, "< $ARGV[0]" ) || die "Couldn't open email $ARGV[0]\n";
undef $/;
$raw_email = <>;
close MESSAGE;

my $mail = Email::Simple->new($raw_email);

# Create array for header pairs
my @headers = $mail->header_pairs;

# create a hash for header pairs
my %pairs = $mail->header_pairs;

# Create array for received headers
my @received_headers = $mail->header("Received");

# Extract the ip from the last entry in @received_headers. Currently only does ipV4
# ipV6 - would it be possible to just get the contents between [ and ]
my $ip = $1
  if $received_headers[-1] =~ /\[(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]/;

# Print Output
print "Sender original IP is:\t$ip\n\n";

# Print the last received header.  Should contain the ip that the e-mail originated from
print "Original received header for this ip is:\n";
print "$received_headers[-1]\n\n";

# Print the array of received headers
print "All received headers for this message are:\n";
foreach my $rec_header (@received_headers) {
    print "$rec_header\n";
}

# Get the rDNS
my $revDNS = gethostbyaddr( $ip, 2 );

# Check and print.  Exit script if there is no rDNS.
if ( $revDNS ne '' ) {
    print "\nReverse DNS is:\t$revDNS\n\n";
}
else {
    print "\nNo reverse DNS for $ip\n\n";
    exit;
}

# Get the base domain name if there was rDNS entry
my @domain = split( /\./, $revDNS );
# See http://search.cpan.org/~alexp/Net-Domain-TLD-1.69/lib/Net/Domain/TLD.pm
# Need to recode to check last array value for valid tld and use that check
# in the if statement
if ( length( $domain[$#domain] ) > 2 ) {
    $nsdomain = $domain[ $#domain - 1 ] . '.' . $domain[$#domain];
}
else {
    $nsdomain =
        $domain[ $#domain - 2 ] . '.'
      . $domain[ $#domain - 1 ] . '.'
      . $domain[$#domain];
}

print "NS domain is $nsdomain\n\n";

# look up nameserver records
my @ns = nslookup( domain => "$nsdomain", type => "NS" );
print "NS records are:\n";
foreach my $ns (@ns) {
    print "$ns\n";
}

Download it from Bitbucket.

perl