Tutorial: Flow name labelling

Introduction

The fnameLabel plugin helps you tag flows which originate from different files or interfaces. It is especially useful for the -R or -D options, discussed in the multifileIO tutorial.

Preparation

First, restore T2 into a pristine state by removing all unnecessary or older plugins from the plugin folder ~/.tranalyzer/plugins and compile the following plugins:

$ t2build -e
Are you sure you want to empty the plugin folder '/home/user/.tranalyzer/plugins' (y/N)? y
Plugin folder emptied
$ t2build -f tranalyzer2 basicFlow tcpStates fnameLabel txtSink
...
BUILD SUCCESSFUL

If you have not yet created separate data and results directories, please do so now, ideally in another bash window, as this eases your workflow:

$ mkdir ~/data ~/results
$

The anonymized sample PCAP used in this tutorial can be downloaded here: faf-exercise.pcap. Please extract it into your data folder.

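Because the multi-file options -R and -D operate on several files, the pcap first needs to be split into chunks. The tcpdump option -C 1 starts a new output file roughly every megabyte. Just invoke the commands below: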

$ cd ~/data
$ mkdir F
$ tcpdump -r faf-exercise.pcap -w F/faf-exercise.pcap -C 1
...
$ ls F
faf-exercise.pcap  faf-exercise.pcap1  faf-exercise.pcap2  faf-exercise.pcap3  faf-exercise.pcap4  faf-exercise.pcap5
$

Now you are all set for the following chapter.

plugin fnameLabel

The fnameLabel plugin not only tags flows with the name of the file they originate from, but can also add a hash of the filename or a label derived from it: either the number contained in the filename or a specific character of it. It is predominantly used to automatically separate flows created with the -R or -D option, e.g., for the training of classifiers. To see the configuration, change to the fnameLabel plugin directory and look into the fnameLabel.h file in src:

$ cd src
$ ls
$ vi fnameLabel.h
...
// user defines

#define FNL_LBL        1 // 1: Output label derived from input
                         //    (Use fileNum for Tranalyzer -D option, otherwise refer to FNL_IDX)
#define FNL_IDX        0 // Use the 'FNL_IDX' letter of the filename as label
                         // (Tranalyzer -R/-i/-r options) [require FNL_LBL=1]
#define FNL_HASH       0 // 1: Output hash of filename
#define FNL_FLNM       1 // 1: Output filename
#define FNL_FREL       1 // Use absolute (0) or relative (1) filenames for fnLabel, fnHash and fname

#define FNL_NAMELEN 1024 // Max length for filename
...

If -D is used, the label denotes the file number in the filename expression. In all other cases, the constant FNL_IDX defines the position of the character in the filename to be taken as the label. Positions are counted from 0. Note that if FNL_FREL=1, the position refers to the relative filename (position 0 is the first character after the last slash). If FNL_FREL=0, the position refers to the absolute path. For example:

Filename                             FNL_IDX  FNL_FREL=1  FNL_FREL=0
/home/user/data/F/faf-exercise.pcap  0        f           /
/home/user/data/F/faf-exercise.pcap  2        f           o
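
The selection rule described above can be mimicked in a few lines of Python. This is only an illustrative sketch, not the plugin's actual C code; the function name label_char and its parameters are made up for this example.

```python
import posixpath

def label_char(path: str, idx: int, relative: bool = True) -> str:
    """Return the character at position idx (counted from 0) of a filename.

    relative=True  mimics FNL_FREL=1: count from the first character
                   after the last slash.
    relative=False mimics FNL_FREL=0: count from the start of the
                   absolute path.
    """
    name = posixpath.basename(path) if relative else path
    if not 0 <= idx < len(name):
        raise IndexError(f"position {idx} is outside '{name}'")
    return name[idx]

path = "/home/user/data/F/faf-exercise.pcap1"
print(label_char(path, 0, relative=True))    # f
print(label_char(path, 0, relative=False))   # /
print(label_char(path, 17, relative=True))   # 1
```

Position 17 of the relative filename is the chunk number, which is the value used for FNL_IDX with the -R option later in this tutorial.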

If you like, you may switch on FNL_HASH as well; it outputs a hash value representing the filename. Here, we leave everything at its default.

$ t2 -D ~/data/F/faf-exercise.pcap1,5 -w ~/results/
================================================================================
Tranalyzer 0.8.8 (Anteater), Tarantula. PID: 26967
================================================================================
[INF] Creating flows for L2, IPv4, IPv6
Active plugins:
    01: basicFlow, 0.8.8
    02: tcpStates, 0.8.8
    03: fnameLabel, 0.8.8
    04: txtSink, 0.8.8
[INF] IPv4 Ver: 5, Rev: 01022020, Range Mode: 0, subnet ranges loaded: 389458 (389.46 K)
[INF] IPv6 Ver: 5, Rev: 01022020, Range Mode: 0, subnet ranges loaded: 49429 (49.43 K)
Processing file: /home/user/data/F/faf-exercise.pcap1
Link layer type: Ethernet [EN10MB/1]
Dump start: 1258594168.120912 sec (Thu 19 Nov 2009 01:29:28 GMT)
Processing file: /home/user/data/F/faf-exercise.pcap2
Processing file: /home/user/data/F/faf-exercise.pcap3
Processing file: /home/user/data/F/faf-exercise.pcap4
Processing file: /home/user/data/F/faf-exercise.pcap5
Dump stop : 1258594491.683288 sec (Thu 19 Nov 2009 01:34:51 GMT)
Total dump duration: 323.562376 sec (5m 23s)
...
Number of processed   flows: 4
Number of processed A flows: 2 [50.00%]
Number of processed B flows: 2 [50.00%]
Number of request     flows: 2 [50.00%]
Number of reply       flows: 2 [50.00%]
...
$

Only 4 flows? Why is that? Running t2 -r on the complete faf-exercise.pcap yields 72 flows. The difference arises because most of the flows are generated in the first chunk, F/faf-exercise.pcap, which carries no number and was therefore skipped: we started with index 1, remember? If you wanted to process all the chunks, you could modify the -D option as follows: -D ~/data/F/faf-exercise.pcap,5
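
To make the skipped chunk explicit, here is a small Python sketch. It is a simplified model of the file range seen in the run above (numbers 1 to 5 appended to the base name), not T2's actual -D implementation; the helper files_in_range is hypothetical.

```python
def files_in_range(base: str, start: int, stop: int) -> list[str]:
    """List the names base<start> .. base<stop>, both ends included."""
    return [f"{base}{n}" for n in range(start, stop + 1)]

# Chunks written by tcpdump -C: the first one carries no number.
chunks = ["faf-exercise.pcap"] + [f"faf-exercise.pcap{n}" for n in range(1, 6)]

# Files covered when the run starts at faf-exercise.pcap1 and stops at 5.
visited = files_in_range("faf-exercise.pcap", 1, 5)
skipped = [c for c in chunks if c not in visited]

print(visited)
print(skipped)   # ['faf-exercise.pcap']
```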

If you now look into the resulting flow file faf-exercise_flows.txt, you will see flows with fnLabel 1 and 5, which match the number in fname (the filename). This means that flows were created only while processing these two files.

$ tcol ~/results/faf-exercise_flows.txt
%dir  flowInd  flowStat            timeFirst          timeLast           duration    numHdrDesc  numHdrs  hdrDesc       srcMac             dstMac             ethType  ethVlanID  srcIP          srcIPCC  srcIPOrg           srcPort  dstIP          dstIPCC  dstIPOrg           dstPort  l4Proto  tcpStates  fnLabel  fname
A     1        0x0400000000004000  1258594168.120912  1258594185.427506  17.306594   1           3        eth:ipv4:tcp  00:19:e3:e7:5d:23  00:08:74:38:01:b4  0x0800              143.166.11.10  us       "Dell"             64334    192.168.1.105  07       "Private network"  49330    6        0x03       1        "faf-exercise.pcap1"
B     1        0x0400000000004001  1258594168.121080  1258594191.015208  22.894128   1           3        eth:ipv4:tcp  00:08:74:38:01:b4  00:19:e3:e7:5d:23  0x0800              192.168.1.105  07       "Private network"  49330    143.166.11.10  us       "Dell"             64334    6        0x43       1        "faf-exercise.pcap1"
A     2        0x0400000000004000  1258594185.618346  1258594185.618346  0.000000    1           3        eth:ipv4:tcp  00:08:74:38:01:b4  00:19:e3:e7:5d:23  0x0800              192.168.1.105  07       "Private network"  49329    143.166.11.10  us       "Dell"             21       6        0x03       5        "faf-exercise.pcap5"
B     2        0x0400000000004001  1258594185.427515  1258594491.683288  306.255773  1           3        eth:ipv4:tcp  00:19:e3:e7:5d:23  00:08:74:38:01:b4  0x0800              143.166.11.10  us       "Dell"             21       192.168.1.105  07       "Private network"  49329    6        0x43       5        "faf-exercise.pcap5"

Note that if you set FNL_FREL to 0, the absolute path, e.g., /home/user/data/F/faf-exercise.pcap1, is printed instead of the relative one, e.g., faf-exercise.pcap1.
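
Since the label is meant to separate flows, e.g., for classifier training, a quick per-label count can be handy. The Python sketch below assumes the flow file is tab-separated with a header line starting with '%'; the function name count_labels is made up for this example, and the sample lines are a shortened stand-in for the real flow file.

```python
from collections import Counter

def count_labels(lines, column="fnLabel"):
    """Count flows per value of the given column in a T2-style flow file.

    Assumes tab-separated fields and a '%'-prefixed header line.
    """
    col = None
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("%"):
            # Header or comment line: look for the wanted column name.
            names = [fields[0].lstrip("% ")] + fields[1:]
            if column in names:
                col = names.index(column)
            continue
        if col is not None and len(fields) > col:
            counts[fields[col]] += 1
    return counts

# Shortened stand-in for faf-exercise_flows.txt (only four columns kept).
sample = [
    "%dir\tflowInd\tfnLabel\tfname\n",
    'A\t1\t1\t"faf-exercise.pcap1"\n',
    'B\t1\t1\t"faf-exercise.pcap1"\n',
    'A\t2\t5\t"faf-exercise.pcap5"\n',
    'B\t2\t5\t"faf-exercise.pcap5"\n',
]
print(dict(count_labels(sample)))   # {'1': 2, '5': 2}
```

For a real file, the same function can be fed an open file object, e.g., count_labels(open(path)), provided the assumptions about the file layout hold.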

In case of the -R option, we first have to create a pcap file list.

$ cd ~/data/F
$ t2caplist *[0-9] > faf-exercise.txt
$ cat faf-exercise.txt
/home/user/data/F/faf-exercise.pcap1
/home/user/data/F/faf-exercise.pcap2
/home/user/data/F/faf-exercise.pcap3
/home/user/data/F/faf-exercise.pcap4
/home/user/data/F/faf-exercise.pcap5
$

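Then FNL_IDX needs to be set to the character position where the number is expected. In the relative filename faf-exercise.pcap1, the digit sits at position 17, counting from 0. Rebuild the plugin and invoke T2 on this very list: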
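
If you prefer not to count characters by hand, the position can be computed. The following Python sketch assumes the first digit in the relative filename is the one you want as label; the helper first_digit_index is hypothetical and not part of T2.

```python
import posixpath

def first_digit_index(path: str) -> int:
    """Return the 0-based position of the first digit in the relative
    filename, or -1 if the filename contains no digit."""
    name = posixpath.basename(path)
    for i, ch in enumerate(name):
        if ch.isdigit():
            return i
    return -1

# The entries of faf-exercise.txt created above.
files = [f"/home/user/data/F/faf-exercise.pcap{n}" for n in range(1, 6)]

# All files should agree on one position, otherwise a single FNL_IDX
# cannot label them consistently.
positions = {first_digit_index(f) for f in files}
print(positions)   # {17}
```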

$ t2conf fnameLabel -D FNL_IDX=17
$ t2build fnameLabel
$ t2 -R ~/data/F/faf-exercise.txt -w ~/results/
================================================================================
Tranalyzer 0.8.8 (Anteater), Tarantula. PID: 27767
================================================================================
[INF] Creating flows for L2, IPv4, IPv6
Checking list file
    checking file '/home/user/data/F/faf-exercise.pcap1'
    checking file '/home/user/data/F/faf-exercise.pcap2'
    checking file '/home/user/data/F/faf-exercise.pcap3'
    checking file '/home/user/data/F/faf-exercise.pcap4'
    checking file '/home/user/data/F/faf-exercise.pcap5'
Active plugins:
    01: basicFlow, 0.8.8
    02: tcpStates, 0.8.8
    03: fnameLabel, 0.8.8
    04: txtSink, 0.8.8
[INF] IPv4 Ver: 5, Rev: 01022020, Range Mode: 0, subnet ranges loaded: 389458 (389.46 K)
[INF] IPv6 Ver: 5, Rev: 01022020, Range Mode: 0, subnet ranges loaded: 49429 (49.43 K)
Processing list file: /home/user/data/F/faf-exercise.txt
Processing file no. 1 of 5: /home/user/data/F/faf-exercise.pcap1
Link layer type: Ethernet [EN10MB/1]
Dump start: 1258594168.120912 sec (Thu 19 Nov 2009 01:29:28 GMT)
Processing file no. 2 of 5: /home/user/data/F/faf-exercise.pcap2
Link layer type: Ethernet [EN10MB/1]
Processing file no. 3 of 5: /home/user/data/F/faf-exercise.pcap3
Link layer type: Ethernet [EN10MB/1]
Processing file no. 4 of 5: /home/user/data/F/faf-exercise.pcap4
Link layer type: Ethernet [EN10MB/1]
Processing file no. 5 of 5: /home/user/data/F/faf-exercise.pcap5
Link layer type: Ethernet [EN10MB/1]
Dump stop : 1258594491.683288 sec (Thu 19 Nov 2009 01:34:51 GMT)
Total dump duration: 323.562376 sec (5m 23s)
Finished processing. Elapsed time: 0.002678 sec
Finished unloading flow memory. Time: 0.002699 sec
...
Number of processed   flows: 4
Number of processed A flows: 2 [50.00%]
Number of processed B flows: 2 [50.00%]
Number of request     flows: 2 [50.00%]
Number of reply       flows: 2 [50.00%]
...

As you can see, the result is the same as with the -D option:

$ tcol ~/results/faf-exercise_flows.txt
%dir  flowInd  flowStat            timeFirst          timeLast           duration    numHdrDesc  numHdrs  hdrDesc       srcMac             dstMac             ethType  ethVlanID  srcIP          srcIPCC  srcIPOrg           srcPort  dstIP          dstIPCC  dstIPOrg           dstPort  l4Proto  tcpStates  fnLabel  fname
A     1        0x0400000000004000  1258594168.120912  1258594185.427506  17.306594   1           3        eth:ipv4:tcp  00:19:e3:e7:5d:23  00:08:74:38:01:b4  0x0800              143.166.11.10  us       "Dell"             64334    192.168.1.105  07       "Private network"  49330    6        0x03       1        "faf-exercise.pcap1"
B     1        0x0400000000004001  1258594168.121080  1258594191.015208  22.894128   1           3        eth:ipv4:tcp  00:08:74:38:01:b4  00:19:e3:e7:5d:23  0x0800              192.168.1.105  07       "Private network"  49330    143.166.11.10  us       "Dell"             64334    6        0x43       1        "faf-exercise.pcap1"
A     2        0x0400000000004000  1258594185.618346  1258594185.618346  0.000000    1           3        eth:ipv4:tcp  00:08:74:38:01:b4  00:19:e3:e7:5d:23  0x0800              192.168.1.105  07       "Private network"  49329    143.166.11.10  us       "Dell"             21       6        0x03       5        "faf-exercise.pcap5"
B     2        0x0400000000004001  1258594185.427515  1258594491.683288  306.255773  1           3        eth:ipv4:tcp  00:19:e3:e7:5d:23  00:08:74:38:01:b4  0x0800              143.166.11.10  us       "Dell"             21       192.168.1.105  07       "Private network"  49329    6        0x43       5        "faf-exercise.pcap5"
$

Have fun!