Tutorial: Flow name labelling
Introduction
The fnameLabel plugin tags flows according to the file or interface they originate from. It is especially useful in combination with the -R or -D options, which are discussed in the multifileIO tutorial.
Preparation
First, restore T2 to a pristine state by removing all unnecessary or older plugins from the plugin folder ~/.tranalyzer/plugins and compile the following plugins:
$ t2build -e
Are you sure you want to empty the plugin folder '/home/wurst/.tranalyzer/plugins' (y/N)? y
Plugin folder emptied
$ t2build -f tranalyzer2 basicFlow tcpStates fnameLabel txtSink
...
BUILD SUCCESSFUL
If you have not created separate data and results directories yet, please do so now in another bash window, as this will facilitate your workflow:
$ mkdir ~/data ~/results
$
The anonymized sample PCAP used in this tutorial can be downloaded here: faf-exercise.pcap. Please extract it into your data folder.
As we are using the multifileIO options, the pcap needs to be split into chunks. Just invoke the commands below:
$ cd data
$ mkdir F
$ tcpdump -r faf-exercise.pcap -w F/faf-exercise.pcap -C 1
...
$ ls F
faf-exercise.pcap faf-exercise.pcap1 faf-exercise.pcap2 faf-exercise.pcap3 faf-exercise.pcap4 faf-exercise.pcap5
$
Now you are all set for the following chapter.
plugin fnameLabel
The fnameLabel plugin not only tags flows with the name of their input file, but can also add a hash of the filename or a label derived from it: either the number contained in the filename or a specific character of it. It is predominantly used to automatically separate flows created by the -R or -D option for the training of classifiers. To see the configuration, move to the fnameLabel plugin folder and look into the fnameLabel.h file.
$ fnameLabel
$ vi src/fnameLabel.h
...
/* ========================================================================== */
/* ------------------------ USER CONFIGURATION FLAGS ------------------------ */
/* ========================================================================== */
#define FNL_LBL 1 // 1: Output label derived from input
// (Use fileNum for Tranalyzer -D option, otherwise refer to FNL_IDX)
#define FNL_IDX 0 // Use the 'FNL_IDX' letter of the filename as label
// ( -R/-i/-r options) [require FNL_LBL=1]
#define FNL_HASH 0 // 1: Output hash of filename
#define FNL_FLNM 1 // 1: Output filename
#define FNL_FREL 1 // Use absolute (0) or relative (1) filenames for fnLabel, fnHash and fname
#define FNL_NAMELEN 1024 // Max length for filename
/* ========================================================================== */
/* ------------------------- DO NOT EDIT BELOW HERE ------------------------- */
/* ========================================================================== */
...
If -D is used, the label denotes the file number in the file name regex. In all other cases, the constant FNL_IDX defines the position of the character in the filename to be used as the label. Note that if FNL_FREL=1, the position refers to the relative filename (position 0 is the first character after the last slash). If FNL_FREL=0, the position refers to the absolute path. For example:
Filename | FNL_IDX | Label (FNL_FREL=1) | Label (FNL_FREL=0) |
---|---|---|---|
/home/user/data/F/faf-exercise.pcap | 0 | f | / |
/home/user/data/F/faf-exercise.pcap | 1 | a | h |
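The rule behind the table can be reproduced with a few lines of shell. This is only a sketch of the behaviour described above, not the plugin's actual code; the function name fnl_label_char and its argument order are made up for illustration.

```shell
# Sketch of the label selection described above (NOT the plugin's code).
# fnl_label_char is a hypothetical helper.
#   $1: path as passed to T2
#   $2: FNL_IDX (0-based character position)
#   $3: FNL_FREL (1 = relative filename, 0 = absolute path)
fnl_label_char() {
    name=$1
    if [ "$3" -eq 1 ]; then
        name=${name##*/}    # keep only what follows the last slash
    fi
    # cut counts characters from 1, FNL_IDX counts from 0
    printf '%s' "$name" | cut -c "$(($2 + 1))"
}

fnl_label_char /home/user/data/F/faf-exercise.pcap 0 1    # prints f
fnl_label_char /home/user/data/F/faf-exercise.pcap 1 0    # prints h
```

Trying a candidate FNL_IDX this way before rebuilding the plugin is a cheap way to check that it points at the intended character.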
If you like, you may switch on FNL_HASH as well: it produces a hash value that identifies the filename. Here, we leave everything at its default.
fnameLabel using the -D option
$ t2 -D ~/data/F/faf-exercise.pcap1,5 -w ~/results/
================================================================================
Tranalyzer 0.8.14 (Anteater), Tarantula. PID: 56586
================================================================================
[INF] Creating flows for L2, IPv4, IPv6
Active plugins:
01: basicFlow, 0.8.14
02: tcpStates, 0.8.14
03: fnameLabel, 0.8.14
04: txtSink, 0.8.14
[INF] IPv4 Ver: 5, Rev: 16122020, Range Mode: 0, subnet ranges loaded: 406208 (406.21 K)
[INF] IPv6 Ver: 5, Rev: 17122020, Range Mode: 0, subnet ranges loaded: 51196 (51.20 K)
Processing file: /home/wurst/data/F/faf-exercise.pcap1
Link layer type: Ethernet [EN10MB/1]
Dump start: 1258594168.120912 sec (Thu 19 Nov 2009 01:29:28 GMT)
Processing file: /home/wurst/data/F/faf-exercise.pcap2
Processing file: /home/wurst/data/F/faf-exercise.pcap3
Processing file: /home/wurst/data/F/faf-exercise.pcap4
Processing file: /home/wurst/data/F/faf-exercise.pcap5
Dump stop : 1258594491.683288 sec (Thu 19 Nov 2009 01:34:51 GMT)
Total dump duration: 323.562376 sec (5m 23s)
...
Number of processed flows: 4
Number of processed A flows: 2 [50.00%]
Number of processed B flows: 2 [50.00%]
Number of request flows: 2 [50.00%]
Number of reply flows: 2 [50.00%]
...
Only 4 flows? Why is that? If you run faf-exercise.pcap with t2 -r, you get 72 flows. This is because most of the flows are generated in the first chunk, F/faf-exercise.pcap, which has no number. We started with index 1, remember?! Gotcha. If you want to process all the chunks, modify the -D option as follows: -D ~/data/F/faf-exercise.pcap,5
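To see in advance which chunks a given -D argument should cover, the range can be spelled out by hand. The helper below is a hypothetical sketch that only mirrors the behaviour observed in this tutorial (an unnumbered first chunk followed by numbered ones). It does not reproduce T2's own parsing and ignores any other forms the -D option may accept.

```shell
# Hypothetical helper: list the files a "-D <first>,<last>" argument is
# expected to cover, based solely on the behaviour shown in this tutorial.
expected_chunks() {
    first=${1%,*}     # e.g. F/faf-exercise.pcap1
    last=${1##*,}     # e.g. 5
    prefix=$(printf '%s' "$first" | sed 's/[0-9]*$//')    # strip trailing digits
    start=${first#"$prefix"}    # e.g. 1, or empty for the unnumbered chunk
    if [ -z "$start" ]; then
        printf '%s\n' "$prefix"    # the unnumbered first chunk
        start=1
    fi
    i=$start
    while [ "$i" -le "$last" ]; do
        printf '%s%s\n' "$prefix" "$i"
        i=$((i + 1))
    done
}

expected_chunks F/faf-exercise.pcap1,5    # pcap1 to pcap5: 5 files
expected_chunks F/faf-exercise.pcap,5     # pcap, then pcap1 to pcap5: 6 files
```

Comparing this list with the output of ls F makes it obvious when the first chunk has been left out.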
If you now look into the resulting flow file faf-exercise_flows.txt, you will see flows with fnLabel 1 and 5, which match the number in fname (the filename). This means that each of those two files caused flows to be created.
$ tcol ~/results/faf-exercise_flows.txt
%dir flowInd flowStat timeFirst timeLast duration numHdrDesc numHdrs hdrDesc srcMac dstMac ethType ethVlanID srcIP srcIPCC srcIPOrg srcPort dstIP dstIPCC dstIPOrg dstPort l4Proto tcpStates fnLabel fname
A 1 0x0400000000004000 1258594168.120912 1258594185.427506 17.306594 1 3 eth:ipv4:tcp 00:19:e3:e7:5d:23 00:08:74:38:01:b4 0x0800 143.166.11.10 us "Dell" 64334 192.168.1.105 07 "Private network" 49330 6 0x03 1 "faf-exercise.pcap1"
B 1 0x0400000000004001 1258594168.121080 1258594191.015208 22.894128 1 3 eth:ipv4:tcp 00:08:74:38:01:b4 00:19:e3:e7:5d:23 0x0800 192.168.1.105 07 "Private network" 49330 143.166.11.10 us "Dell" 64334 6 0x43 1 "faf-exercise.pcap1"
A 2 0x0400000000004000 1258594185.618346 1258594185.618346 0.000000 1 3 eth:ipv4:tcp 00:08:74:38:01:b4 00:19:e3:e7:5d:23 0x0800 192.168.1.105 07 "Private network" 49329 143.166.11.10 us "Dell" 21 6 0x03 5 "faf-exercise.pcap5"
B 2 0x0400000000004001 1258594185.427515 1258594491.683288 306.255773 1 3 eth:ipv4:tcp 00:19:e3:e7:5d:23 00:08:74:38:01:b4 0x0800 143.166.11.10 us "Dell" 21 192.168.1.105 07 "Private network" 49329 6 0x43 5 "faf-exercise.pcap5"
Note that if you set FNL_FREL to 0, then the absolute path, e.g., /home/user/data/F/faf-exercise.pcap1, would be printed instead of the relative one, e.g., faf-exercise.pcap1.
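For larger captures, reading the labels off the flow file by eye becomes tedious. The following sketch counts flows per fnLabel value with awk. It assumes that the flow file is tab-separated and that its first line is the column header, and the function name flows_per_label is made up for this example.

```shell
# Count flows per fnLabel value in a T2 flow file.
# Assumptions: tab-separated columns, header on the first line.
# flows_per_label is a hypothetical helper, not part of T2.
# Usage: flows_per_label ~/results/faf-exercise_flows.txt
flows_per_label() {
    awk -F'\t' '
        # Header line: remember which column holds fnLabel
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "fnLabel") col = i; next }
        # Data lines: count occurrences of each label
        col     { n[$col]++ }
        END     { for (l in n) print l "\t" n[l] }
    ' "$1" | sort
}
```

For the four flows listed above, this should report two flows for label 1 and two for label 5.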
fnameLabel using the -R option
With the -R option, we first have to create a pcap file list.
$ cd ~/data/F
$ t2caplist *[0-9] > faf-exercise.txt
$ cat faf-exercise.txt
/home/user/data/F/faf-exercise.pcap1
/home/user/data/F/faf-exercise.pcap2
/home/user/data/F/faf-exercise.pcap3
/home/user/data/F/faf-exercise.pcap4
/home/user/data/F/faf-exercise.pcap5
$
Then FNL_IDX needs to be set to the character position where the number is expected, which is position 17 in the relative filename faf-exercise.pcap1. After that, invoke T2 on this very list.
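Counting characters by hand is error-prone, so the position can also be computed. The sketch below returns the 0-based position of the first digit in the relative filename, which is a suitable FNL_IDX candidate when FNL_FREL=1 and the chunk number is the first digit in the name. The function first_digit_index is a hypothetical helper, not part of T2.

```shell
# Hypothetical helper: 0-based position of the first digit in the relative
# filename, i.e. a candidate FNL_IDX when FNL_FREL=1.
first_digit_index() {
    base=${1##*/}             # strip the directory part
    prefix=${base%%[0-9]*}    # everything before the first digit
    if [ "$prefix" = "$base" ]; then
        echo "no digit in '$base'" >&2
        return 1
    fi
    printf '%s\n' "${#prefix}"
}

first_digit_index /home/user/data/F/faf-exercise.pcap1    # prints 17
```

The printed value is the one passed to t2conf as FNL_IDX.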
$ t2conf fnameLabel -D FNL_IDX=17 && t2build fnameLabel
...
$ t2 -R ~/data/F/faf-exercise.txt -w ~/results/
================================================================================
Tranalyzer 0.8.14 (Anteater), Tarantula. PID: 27767
================================================================================
[INF] Creating flows for L2, IPv4, IPv6
Checking list file
checking file '/home/user/data/F/faf-exercise.pcap1'
checking file '/home/user/data/F/faf-exercise.pcap2'
checking file '/home/user/data/F/faf-exercise.pcap3'
checking file '/home/user/data/F/faf-exercise.pcap4'
checking file '/home/user/data/F/faf-exercise.pcap5'
Active plugins:
01: basicFlow, 0.8.14
02: tcpStates, 0.8.14
03: fnameLabel, 0.8.14
04: txtSink, 0.8.14
[INF] IPv4 Ver: 5, Rev: 16122020, Range Mode: 0, subnet ranges loaded: 406208 (406.21 K)
[INF] IPv6 Ver: 5, Rev: 17122020, Range Mode: 0, subnet ranges loaded: 51196 (51.20 K)
Processing list file: /home/user/data/F/faf-exercise.txt
Processing file no. 1 of 5: /home/user/data/F/faf-exercise.pcap1
Link layer type: Ethernet [EN10MB/1]
Dump start: 1258594168.120912 sec (Thu 19 Nov 2009 01:29:28 GMT)
Processing file no. 2 of 5: /home/user/data/F/faf-exercise.pcap2
Link layer type: Ethernet [EN10MB/1]
Processing file no. 3 of 5: /home/user/data/F/faf-exercise.pcap3
Link layer type: Ethernet [EN10MB/1]
Processing file no. 4 of 5: /home/user/data/F/faf-exercise.pcap4
Link layer type: Ethernet [EN10MB/1]
Processing file no. 5 of 5: /home/user/data/F/faf-exercise.pcap5
Link layer type: Ethernet [EN10MB/1]
Dump stop : 1258594491.683288 sec (Thu 19 Nov 2009 01:34:51 GMT)
Total dump duration: 323.562376 sec (5m 23s)
Finished processing. Elapsed time: 0.002678 sec
Finished unloading flow memory. Time: 0.002699 sec
...
Number of processed flows: 4
Number of processed A flows: 2 [50.00%]
Number of processed B flows: 2 [50.00%]
Number of request flows: 2 [50.00%]
Number of reply flows: 2 [50.00%]
...
As you can see, the result is the same as before with the -D option.
$ tcol ~/results/faf-exercise_flows.txt
%dir flowInd flowStat timeFirst timeLast duration numHdrDesc numHdrs hdrDesc srcMac dstMac ethType ethVlanID srcIP srcIPCC srcIPOrg srcPort dstIP dstIPCC dstIPOrg dstPort l4Proto tcpStates fnLabel fname
A 1 0x0400000000004000 1258594168.120912 1258594185.427506 17.306594 1 3 eth:ipv4:tcp 00:19:e3:e7:5d:23 00:08:74:38:01:b4 0x0800 143.166.11.10 us "Dell" 64334 192.168.1.105 07 "Private network" 49330 6 0x03 1 "faf-exercise.pcap1"
B 1 0x0400000000004001 1258594168.121080 1258594191.015208 22.894128 1 3 eth:ipv4:tcp 00:08:74:38:01:b4 00:19:e3:e7:5d:23 0x0800 192.168.1.105 07 "Private network" 49330 143.166.11.10 us "Dell" 64334 6 0x43 1 "faf-exercise.pcap1"
A 2 0x0400000000004000 1258594185.618346 1258594185.618346 0.000000 1 3 eth:ipv4:tcp 00:08:74:38:01:b4 00:19:e3:e7:5d:23 0x0800 192.168.1.105 07 "Private network" 49329 143.166.11.10 us "Dell" 21 6 0x03 5 "faf-exercise.pcap5"
B 2 0x0400000000004001 1258594185.427515 1258594491.683288 306.255773 1 3 eth:ipv4:tcp 00:19:e3:e7:5d:23 00:08:74:38:01:b4 0x0800 143.166.11.10 us "Dell" 21 192.168.1.105 07 "Private network" 49329 6 0x43 5 "faf-exercise.pcap5"
$
Don’t forget to reset FNL_IDX to its default value:
$ t2conf fnameLabel -D FNL_IDX=0 && t2build fnameLabel
...
$
Or use the new command: t2conf --reset fnameLabel
Have fun!