Monitoring network traffic with Ruby and Pcap

Ever needed to monitor instant message traffic on your network? Linux.Ars has …

Introduction

Linux.Ars returns with yet another fun-filled edition. It seems like many of our readers are interested in learning how to take advantage of specific Linux technologies. Based on reader input, we have decided to place a stronger emphasis on technical tutorials and code examples. This week, we have some nifty stuff for you. I wrote an introductory tutorial to network filtering with libpcap and Ruby in which you will learn how to make a script that intercepts AIM instant messages sent from or received by any computer on your local network. Ian wrote an excellent tutorial that describes how to use Perl to automate image processing with the GIMP. I also wrote an introduction to Conglomerate, a unique XML editor that runs on Linux.

In the last edition of Linux.Ars, we used syntax highlighting to improve the readability of some of the code examples. Unfortunately, we were only able to specify style information for the default template, so the code examples were difficult to read for those who still use the old template. We apologize for any inconvenience this created for those of you who still prefer the old Ars look. We now have syntax highlighting set up for both style sheets, and we will attempt to use it in all future editions that include code. We would like some feedback about the syntax highlighting. Do you feel that it makes the code easier to read, or is it just a distraction?

Linux.Ars is all about you, so don't be afraid to get involved! Want to do a section for a future edition? Have a suggestion for a topic that you want us to write about? I would love some feedback. We want your comments, complaints, suggestions, requests, free hardware, death threats, or disparaging remarks about my assorted deficiencies. Send me an e-mail or post a comment in the discussion thread!

Developer's Corner

Monitoring network traffic with Ruby and Pcap

There are many situations where the ability to monitor network traffic can save a lot of time and effort. If you want to reverse engineer a network protocol, keep an eye on junior's browsing habits, or blackmail your evil boss, Ruby and libpcap can make it easy! Libpcap is a packet sniffing library originally designed by the Lawrence Berkeley National Laboratory for use with their tcpdump utility. With this excellent Ruby binding for libpcap, you can monitor traffic all over your network with only a few simple lines of code. Let's start with a simple script that will display the URLs of remote files accessed by local network users via web browser.

In order to intercept browser page requests, we have to capture all packets destined for a remote web server, and extract browser GET requests from the packet data. Libpcap allows you to use a simple query language to describe packet filters. In order to describe our filter, we must know the port number to which GET requests are sent. I'm sure most of us know that web servers run on port 80, but for other protocols, a quick glance through /etc/services can generally help us figure out what we need to know. At the command line, grep for 'www' in /etc/services:

$ grep www /etc/services
www             80/tcp          http            # WorldWideWeb HTTP

This tells us that web servers use TCP port 80. With this information and a helpful packet filter query guide, we can determine that the relevant query string is "tcp and dst port 80". Now that we know what our query string should be, we can build a simple packet filter script that will capture the relevant packets and display the packet contents.
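As a brief aside, the same lookup can be sketched in Ruby. The snippet below is only an illustration: filter_for is a hypothetical helper (it is not part of libpcap or the Ruby bindings), and it reads services-format text from a string rather than from /etc/services so that it runs anywhere.

```ruby
# Sample services text; this is the www line shown in the grep output above.
SERVICES = <<~TEXT
  www             80/tcp          http            # WorldWideWeb HTTP
TEXT

# filter_for is a hypothetical helper, not part of libpcap or its bindings.
# It finds a service by name or alias and returns a filter query string
# that matches traffic sent to that service's port.
def filter_for(service, services_text)
  services_text.each_line do |line|
    entry = line.sub(/#.*/, "").split       # drop the comment, split columns
    next if entry.length < 2                # skip blank or comment-only lines
    name, port_proto, *aliases = entry
    next unless ([name] + aliases).include?(service)
    port, proto = port_proto.split("/")
    return "#{proto} and dst port #{port}"
  end
  nil                                       # service not listed
end

puts filter_for("www", SERVICES)
puts filter_for("http", SERVICES)
```

Both calls print the query string used in this article, because http is listed as an alias of www on that line.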

packet_filter.rb

#!/usr/bin/env ruby

# this line imports the libpcap ruby bindings
require 'pcaplet'

# create a sniffer that grabs the first 1500 bytes of each packet
$network = Pcaplet.new('-s 1500')

# create a filter that uses our query string and the sniffer we just made
$filter = Pcap::Filter.new('tcp and dst port 80', $network.capture)

# add the new filter to the sniffer
$network.add_filter($filter)

# iterate over every packet that goes through the sniffer
for p in $network
  # print packet data for each packet that matches the filter
  puts p.tcp_data if $filter =~ p
end

Now run the script as root from the command line, load a page in your web browser, and watch the results:

$ sudo ruby packet_filter.rb

You should see something like this:

GET /index.ars HTTP/1.1
Host: arstechnica.com
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050922 Firefox/1.0.7
   (Ubuntu package 1.0.7)
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

A quick glance at the packet data allows us to see that the relevant information is in the first two lines. We need to extract 'arstechnica.com/index.ars' from those lines and the best way to do it is with a regular expression:

 /GET(.*)HTTP.*Host:([^\r\n]*)/xm

Now let's rewrite the packet sniffer loop and use the regular expression to extract the relevant information:

for p in $network
  # if the packet matches the filter and the regexp...
  if $filter =~ p and p.tcp_data =~ /GET(.*)HTTP.*Host:([^\r\n]*)/xm
    # print the local IP of the requestor and the requested URL
    puts "#{p.src} - http://#{$2.strip}#{$1.strip}"
  end
end

Now, when you run the script and load a page, you should see something like the following:

1.1.2.4 - http://arstechnica.com/index.ars
1.1.2.4 - http://cdn.arstechnica.net/Templates/ArsTechnica/style.css
1.1.2.4 - http://cdn.arstechnica.net/Templates/ArsTechnica/style.css
1.1.2.4 - http://cdn.arstechnica.net/Templates/ArsTechnica/StyleSheets/Layout.css
1.1.2.4 - http://cdn.arstechnica.net/Templates/ArsTechnica/StyleSheets/FrontPage.css
1.1.2.4 is the local network IP address of the computer that I used to load the Ars Technica web site. If you load web sites on other computers on the network, the URLs will also appear in the list with the associated IPs.
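If you want to experiment with the regular expression without capturing live traffic, you can run it against a plain string. The following is a minimal sketch: the request text is abbreviated from the sample output above, and extract_url is a hypothetical helper name that is not part of the pcap bindings.

```ruby
# Standalone check of the URL-extraction regexp; no capture library needed.
# The request text is abbreviated from the sample GET request shown earlier.
request = "GET /index.ars HTTP/1.1\r\n" \
          "Host: arstechnica.com\r\n" \
          "Accept-Encoding: gzip,deflate\r\n\r\n"

# The m flag lets . match line breaks, so one pattern can span the request
# line and the Host header. The x flag ignores whitespace in the pattern.
URL_PATTERN = /GET(.*)HTTP.*Host:([^\r\n]*)/xm

# extract_url is a hypothetical helper: it returns the requested URL,
# or nil if the data does not look like a GET request.
def extract_url(data)
  match = URL_PATTERN.match(data)
  return nil unless match
  # group 1 is the path, group 2 is the host name
  "http://#{match[2].strip}#{match[1].strip}"
end

puts extract_url(request)
```

The first group captures the path between GET and HTTP, and the second captures the rest of the Host line up to the line break, which is why the script prints the host before the path.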

Now for something a bit more ambitious. I used Ruby and libpcap to make a complete AOL Instant Messenger snooper with only 30 lines of code. It intercepts all AIM messages sent and received by computers on the local network. The AIM protocol uses port 5190, so we need to create new filter strings that intercept packets sent to and from port 5190 on remote systems. We also have to create a function that can parse AIM packets and extract the message and the screen name of the user that sent it. Unfortunately, the OSCAR protocol used by AIM is a byzantine mess and it is very difficult to parse consistently. Different AIM clients seem to use slight variations and the parsing mechanism has to be able to account for that.

In order to effectively dissect the packet data, I had to find some reference material to help me out. Unofficial OSCAR documentation provided some critical hints. I figured out which part of the packet string contains the length of the screen name, and I use that to extract the screen name. I was unable to find a position at which the message consistently starts, so I use regular expressions to extract all the contents of the HTML tags within the packet, and then I strip the HTML out of that to leave me with the message in plain text form.
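The two parsing steps can be tried out on a made-up string before pointing them at real traffic. The sketch below rests on assumptions. The payload is synthetic and follows only the offsets used in this article (a length byte at offset 26, the screen name right after it, and the message somewhere past offset 85), so it is not a real OSCAR packet. Both fake_aim_payload and parse_aim_payload are hypothetical helper names.

```ruby
# Build a synthetic payload: a length byte at offset 26, the screen name
# after it, and an HTML-wrapped message starting at offset 85. Everything
# else is zero padding. This is NOT a real OSCAR packet.
# Assumes a short screen name, so that the name ends before offset 85.
def fake_aim_payload(name, html)
  body = "\0" * 26 + [name.length].pack("C") + name
  body += "\0" * (85 - body.length)       # pad up to the message region
  body + html
end

# parse_aim_payload is a hypothetical helper that mirrors the two steps:
# read the length-prefixed screen name, then strip the tags from the message.
def parse_aim_payload(data)
  return nil if data.length < 85          # too short to hold a message
  name_length = data[26, 1].unpack("C").first   # unsigned length byte
  name = data[27, name_length]                  # exactly name_length bytes
  # text between the first tag and the last closing tag
  html = data[85..-1][/<[^>]+>(.*)<\//m, 1]
  return nil unless html
  [name, html.gsub(/<[^>]+>/, "").strip]
end

payload = fake_aim_payload("Sumbuddy",
                           "<HTML><BODY>packet <B>sniffing</B></BODY></HTML>")
name, msg = parse_aim_payload(payload)
puts "#{name}: #{msg}"
```

Reading the name by start position and length, rather than by an inclusive range, avoids picking up the byte that follows the screen name.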

The following is the complete AIM sniffer, written with Ruby and libpcap:

aim_sniffer.rb

#!/usr/bin/env ruby

# this line imports the libpcap ruby bindings
require 'pcaplet'

# create a sniffer that grabs the first 1500 bytes of each packet
$network = Pcaplet.new('-s 1500')

def has_nonprint? n
  # true if the string contains any byte outside printable ASCII (32..126)
  n.each_byte {|x| return true if x < 32 or x > 126}
  false
end

def aim_msg_parse p
  # the byte at offset 26 holds the length of the screen name
  name_length = p.tcp_data[26..26].unpack("C")
  # extract exactly that many bytes, starting right after the length byte
  name = p.tcp_data[27, name_length[0]]
  # grab everything between the first html tag and the last closing tag
  p.tcp_data[85..-1][/<[^>]+>(.*)<\//]
  # strip any remaining tags to leave the message as plain text
  msg = $1.gsub(/<[^>]+>/,"").strip

  # make sure that it is an actual message and then return it
  return [name, msg] if msg and not has_nonprint?(name) and
    name =~ /^[a-zA-Z]/ and not name.include?("/")

  # if it isn't really a text message, return nothing
  nil
rescue
  # packets that are too short or contain no html are not messages
  nil
end

# make a filter to capture all packets sent to port 80 on a remote server
$www_filter = Pcap::Filter.new('tcp and dst port 80', $network.capture)

# make a filter to capture all packets sent from port 5190 on a remote server
$aim_recv_filter = Pcap::Filter.new('tcp and src port 5190', $network.capture)

# make a filter to capture all packets sent to port 5190 on a remote server
$aim_send_filter = Pcap::Filter.new('tcp and dst port 5190', $network.capture)

# add all the filters
$network.add_filter($aim_recv_filter | $aim_send_filter | $www_filter)

for p in $network
  # if the packet matches the www filter and the regexp...
  if $www_filter =~ p and p.tcp_data =~ /GET(.*)HTTP.*Host:([^\r\n]*)/xm
    # print the local IP of the requestor and the requested URL
    puts "#{p.src} - http://#{$2.strip}#{$1.strip}"
  # if the packet matches the incoming AIM filter...
  elsif $aim_recv_filter =~ p
    # parse the packet and extract the sn/message
    name, msg = aim_msg_parse p
    # display the local IP, the screen name of the user and the message
    puts "(<-) <#{p.dst}> from #{name}: #{msg}" if name and msg
  # if the packet matches the outgoing AIM filter...
  elsif $aim_send_filter =~ p
    # parse the packet and extract the sn/message
    name, msg = aim_msg_parse p
    # display the local IP, the screen name of the user and the message
    puts "(->) <#{p.src}> to #{name}: #{msg}" if name and msg
  end
end

The output of the AIM sniffer looks like this:

(->)  to Sumbuddy: i'm working on an article for ars technica
(<-)  from Sumbuddy: Cool. What is it about?
(->)  to Sumbuddy: packet sniffing
(<-)  from Sumbuddy: That sounds like a euphemism for something dirty.
(->)  to Sumbuddy: I don't think that reverse engineering chat protocols is dirty
(->)  to Sumbuddy: well, now that I think about it, the protocol certainly is messy

I have tested these scripts extensively on my personal wireless network. For some reason, packets sent from one of the computers connected directly to the router don't show up in the filter, but aside from that, the script works perfectly and it was capable of intercepting packets from three other computers with a variety of operating systems. With a little practice, you should be able to reverse engineer other protocols and capture virtually any data that gets sent through the network. I hope that this introduction to libpcap has helped to illuminate the potential of packet filtering as well as the risks associated with using applications that send data over the network as plain text.
