This semester I took a course called Network Forensics. It was a very interesting, project-based course that allowed students to design any network forensics-related project they wished. For our project, completed with a classmate of mine, we analysed Cisco NetFlow data for our fraternity house. There were quite a few administrative hoops to jump through, including authorization by the university's IRB (since we collected information about its students). I thought I'd share some of my experiences from the project. This summary will try to guide anyone interested in simple forensics through setting up a collection environment for their home network. Unfortunately it isn't a HOWTO or a drop-in system, though if you try what I describe, you're bound to have some fun!
What data are we collecting? Cisco NetFlow data consists of simple header data about network communication. The network traffic is summarized into 'flows', or communications. So if you start up a TCP connection with your favorite video game and play for 20 minutes, that will be one flow. A flow includes source/destination IP and port, start/end time, flags, protocol, packet count, byte count, and more. What flows do not include is payload information. This makes them convenient for performing simple forensics tasks at home: you won't have a huge amount of data to look through, you won't collect anything harmful (at least nothing too harmful), and you won't overload your router.
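To make that concrete, here's a rough sketch of the kind of record a single flow boils down to. The field names and values are made up for readability; they are not flow-tools' exact schema.

```python
from collections import namedtuple

# Illustrative only: roughly the fields a NetFlow v5 record carries.
# Field names here are invented for readability, not flow-tools' schema.
FlowRecord = namedtuple("FlowRecord", [
    "src_ip", "src_port",   # who started talking
    "dst_ip", "dst_port",   # who they talked to
    "start", "end",         # first/last packet times (epoch seconds)
    "proto", "tcp_flags",   # 6 = TCP; OR of all TCP flags seen
    "packets", "octets",    # totals only -- no payload is kept
])

# A 20-minute game session shows up as one flow, not thousands of packets:
game_session = FlowRecord(
    src_ip="192.168.1.10", src_port=52311,
    dst_ip="203.0.113.5",  dst_port=27015,
    start=1242000000, end=1242001200,
    proto=6, tcp_flags=0x1b, packets=14210, octets=9832144,
)
print((game_session.end - game_session.start) // 60, "minutes")  # prints: 20 minutes
```

Note how little is there: no packet contents at all, just enough metadata to answer "who talked to whom, when, and how much".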
What you'll need: a router running DD-WRT/OpenWRT, and a PC/VM running Linux (I used Ubuntu for my home setup, but Gentoo is much easier to configure).
Step 1 - Generating the NetFlows (network information): No, this does not mean sitting your kid brother down and telling him to generate some traffic. By generator I mean some device, either a gateway or a sniffer, which has information about the network traffic and can report that information, in NetFlow form, to a collector. For this you can use a neat little program called rFlow. For my part of the project I used my personal Buffalo router with DD-WRT installed, although OpenWRT works just fine too. Installing custom firmware on your home router is another topic; I'll just assume you can Google the procedure. The rFlow configuration for DD-WRT (version 24-sp2) can be found under Services->Services->RFlow / MACupd.
Once you've enabled the service, you'll need to give it an IP address as the "Server IP". What this means is, "who is going to receive the NetFlow data once I generate it?" Point this at some PC/laptop/VM running Linux.
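Before moving on to a real collector, it can be reassuring to confirm that datagrams from rFlow are actually arriving at that machine. Here is a minimal sketch of a listener, assuming you pointed rFlow at UDP port 2056 on the collector (adjust to whatever port you chose); it only parses the fixed 24-byte NetFlow v5 header, which is enough to see the version and the number of flow records in each export packet.

```python
import socket
import struct

# NetFlow v5 export packets start with a fixed 24-byte header:
# version, count, sys_uptime, unix_secs, unix_nsecs,
# flow_sequence, engine_type, engine_id, sampling_interval
V5_HEADER = struct.Struct("!HHIIIIBBH")

def parse_v5_header(datagram):
    """Return (version, record_count) from a NetFlow v5 export datagram."""
    version, count = V5_HEADER.unpack_from(datagram)[:2]
    return version, count

def listen(port=2056):  # 2056 is an assumption -- match your rFlow setting
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))
    while True:
        data, (src, _) = sock.recvfrom(4096)
        version, count = parse_v5_header(data)
        print(f"{src}: NetFlow v{version} packet with {count} flow records")

# Call listen() on the collector, then generate some traffic and watch.
```

If nothing shows up, check the Server IP in rFlow and any firewall on the collector before blaming the tools further down the chain.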
Step 2 - Collecting the NetFlows: To collect the flows I recommend flow-tools, which also functions as a NetFlow analyzer. Another great application for collecting and analyzing NetFlows is SiLK, produced by CERT. I recommend flow-tools because it is compatible with the visualization tools I'll recommend below, and because it is packaged for popular Linux distributions. Flow-tools installs a daemon that should be configured to listen on the port you configured in rFlow. Based on some literature, I would recommend configuring flow-tools to capture on a 5-minute interval. Also take a look at http://forums.cacti.net/about12393.html for some configuration recommendations, specifically the line where the author gives their configuration:
-z0 -V5 -n 288 -N0 -w /home/flows -E2G 0/0/2056. As they also highlight, the "-z0 -V5 -n 288 -N0" portion is important for another application which will use the collected data. The "-n 288" provides the 5-minute interval: 288 file rotations per day works out to one file every 5 minutes (24 hours / 288 = 5 minutes). ;) At this point you should be collecting NetFlow data into flow-tools binary files.
Step 3 - Visualize: Now we need a few tools to translate the captured data into visualizations. For this I'll recommend a few tools. The first is FlowScan with CUFlow/CUGrapher, which should be packaged for many popular distributions. CUFlow is a module for FlowScan that generates graphs based on application usage, and CUGrapher is a web-based application for viewing CUFlow's output. Install these, configure them, and you should have a nice web interface for viewing generic network usage statistics. The second tool I'll recommend is FlowViewer. You'll have to download and copy the FlowViewer contents into your cgi-bin and configure one file to match your flow-tools settings. This tells FlowViewer where to find the flow-tools binary files and how they are stored. FlowViewer provides a nice web interface for building selection filters; with it, you have a lot of power to select and bin data. When you're happy with your filter, grab it from the temporary working directory FlowViewer uses, and use a flow-tools utility called flow-export to create a comma-separated value file. Use this to import the data into programs like R, Matlab, or Excel.
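Once flow-export has produced a CSV, a few lines of Python are enough for quick-and-dirty binning before you bother with R or Excel. This is just a sketch: the column positions below are assumptions, since they depend on which fields you asked flow-export for, so check your own CSV and adjust the indices.

```python
import csv
from collections import defaultdict

def bytes_per_source(csv_path, src_col=0, octets_col=6):
    """Sum bytes by source IP from a flow-export style CSV.
    The column indices are assumptions -- they depend on the
    fields in your flow-export output, so adjust to match."""
    totals = defaultdict(int)
    with open(csv_path) as fh:
        for row in csv.reader(fh):
            if not row or row[0].startswith("#"):  # skip comment/header lines
                continue
            totals[row[src_col]] += int(row[octets_col])
    # Biggest talkers first
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Usage:
#   for ip, octets in bytes_per_source("flows.csv"):
#       print(ip, octets)
```

Sorting by total bytes like this is a nice first pass: the top few source addresses usually tell you immediately whose machine (or which service) dominates the house's bandwidth.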
All this is pretty straightforward on a Gentoo distribution, where the configurations come a bit more dummy-proof and manageable. When I was setting up the project environment I ran into a bit of trouble on Ubuntu, but following the tutorial on the Cacti forum should start you on the right track. There were a few interesting behaviors we found during the project. I'll be sure to write some more about them, as well as post the corresponding papers we authored.