Tutorial: Multi File I/O

When files grow to infinity

It has happened to me, and probably to you as well: somebody hands you 20 TByte of pcaps. You start several T2 instances as background processes, and after 7 TByte something goes wrong and you have to start all over again. Grrrrr.

Although T2 has no problem with huge pcap files, they are a nuisance, as you will probably agree. But what if you split them up and end up with 2000 files of 10 GByte each? Don’t worry, the anteater can handle that.

Now you have written the most sophisticated and ingenious online post-processing for your flow file, and suddenly you run out of disk space. Bummer! Especially if you are only interested in a certain time span or selection of traffic, you may want to split the resulting flow files into more manageable sizes.

And what happens if the pcaps are copied to your computer by some obscure process, and you don’t want T2 to time out when he runs out of food? Then he should wait for the new ones to arrive while preserving his internal state. The polling mode comes to the rescue.

Preparation

To ensure that no old or unnecessary plugins are loaded, please clean your plugin directory and rebuild the standard plugins:

$ t2build -e
Are you sure you want to empty the plugin folder '/home/wurst/.tranalyzer/plugins' (y/N)? y
Plugin folder emptied
$ t2build tranalyzer2 basicFlow basicStats tcpStates txtSink
...
$ 

A good practice for analysis and mining jobs is to create a separate data and results directory as follows:

$ mkdir ~/data
$ mkdir ~/results
$ 

Download the pcap annoloc2.pcap into your data folder if you didn’t already. Now fragment it into sequences of smaller pcaps using tcpdump (files of about 10 MB) and editcap (files of 100000 packets), so that we can test different filename formats.

$ cd ~/data
$ mkdir -p S T
$ tcpdump -r annoloc2.pcap -w S/annoloc2S.pcap -C 10
...
$ ls S
annoloc2S.pcap  annoloc2S.pcap1  annoloc2S.pcap2  annoloc2S.pcap3  annoloc2S.pcap4  annoloc2S.pcap5  annoloc2S.pcap6  annoloc2S.pcap7  annoloc2S.pcap8
$
$ editcap -c 100000 annoloc2.pcap T/annoloc2T.pcap
$ ls T
annoloc2T_00000_20020523183501.pcap  annoloc2T_00003_20020523183507.pcap  annoloc2T_00006_20020523183514.pcap  annoloc2T_00009_20020523183520.pcap  annoloc2T_00012_20020523183526.pcap
annoloc2T_00001_20020523183503.pcap  annoloc2T_00004_20020523183509.pcap  annoloc2T_00007_20020523183516.pcap  annoloc2T_00010_20020523183522.pcap
annoloc2T_00002_20020523183505.pcap  annoloc2T_00005_20020523183511.pcap  annoloc2T_00008_20020523183518.pcap  annoloc2T_00011_20020523183524.pcap
$

Now you are ready for some kungfu reading.

Read from several defined pcaps in a row

Assume you have a lot of files that are not conveniently numbered as in our case, but ordered in time over months and years. Then you can use the -R option, which takes a file containing a list of pcaps.

-R PCAPLIST

It processes all the pcap files listed in PCAPLIST. T2 keeps its internal state during the file change, thus all pcaps are treated as one large pcap.

The processing order is defined by the order of the filenames in the text file, so no sequential numbering is necessary. However, each pcap must be specified with its absolute path. To generate the PCAPLIST you may use the commands below.

$ ls $PWD/S/annoloc2S* | awk '{print}' | sort > pcap_Slist.txt
$ cat pcap_Slist.txt
/home/wurst/data/S/annoloc2S.pcap
/home/wurst/data/S/annoloc2S.pcap1
/home/wurst/data/S/annoloc2S.pcap2
/home/wurst/data/S/annoloc2S.pcap3
/home/wurst/data/S/annoloc2S.pcap4
/home/wurst/data/S/annoloc2S.pcap5
/home/wurst/data/S/annoloc2S.pcap6
/home/wurst/data/S/annoloc2S.pcap7
/home/wurst/data/S/annoloc2S.pcap8
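One caveat of the plain sort used above: it orders names lexically, so with ten or more fragments annoloc2S.pcap10 would be listed before annoloc2S.pcap2. A minimal sketch of a workaround follows, assuming GNU sort with its -V (version sort) option is available; make_pcaplist is a hypothetical helper name, not a T2 tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of T2): print the absolute paths of all regular
# files in DIR whose names start with PREFIX, in natural (version) order.
make_pcaplist() {
    local dir prefix f
    dir=$(cd "$1" && pwd) || return 1    # resolve DIR to an absolute path
    prefix=$2
    for f in "$dir/$prefix"*; do
        [ -f "$f" ] && printf '%s\n' "$f"
    done | sort -V    # -V (GNU extension): x.pcap2 is listed before x.pcap10
}

# Usage sketch: make_pcaplist ~/data/S annoloc2S > pcap_Slist.txt
```

Since T2 processes the pcaps in the order given in the list, the ordering only has to be right once, when the list is generated.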

Lines starting with ‘#’ are considered as comments and thus ignored by T2. An easier way is to use the t2caplist script to generate such a list.

$ t2caplist -h
Usage:
    t2caplist [OPTION...] <FILE|DIR>

Optional arguments:
    -d depth          List pcaps up to the given depth
    -L                Follow symbolic links
    -r                List pcaps recursively
    -s                Do not sort the list
    -v                Report invalid files to stderr
    -h, -?, --help    Show this help, then exit

It can even follow symbolic links and sort the files, but here we just generate a list and see what happens.

$ cd data
$ t2caplist S > pcap_Slist.txt
$ t2 -R pcap_Slist.txt -w ~/results/S/
================================================================================
Tranalyzer 0.8.6 (Anteater), Tarantula. PID: 29678
================================================================================
[INF] Creating flows for L2, IPv4, IPv6
Checking list file
    checking file '/home/wurst/data/S/annoloc2S.pcap'
    checking file '/home/wurst/data/S/annoloc2S.pcap1'
    checking file '/home/wurst/data/S/annoloc2S.pcap2'
    checking file '/home/wurst/data/S/annoloc2S.pcap3'
    checking file '/home/wurst/data/S/annoloc2S.pcap4'
    checking file '/home/wurst/data/S/annoloc2S.pcap5'
    checking file '/home/wurst/data/S/annoloc2S.pcap6'
    checking file '/home/wurst/data/S/annoloc2S.pcap7'
    checking file '/home/wurst/data/S/annoloc2S.pcap8'
Active plugins:
    01: basicFlow, 0.8.6
    02: tcpStates, 0.8.6
    03: txtSink, 0.8.6
[INF] basicFlow: IPv4 Ver: 4, Rev: 01072019, Range Mode: 0, subnet ranges loaded: 308731 (308.73 K)
[INF] basicFlow: IPv6 Ver: 4, Rev: 01072019, Range Mode: 0, subnet ranges loaded: 21494 (21.49 K)
Processing list file: /home/wurst/data/S/pcap_Slist.txt
Processing file no. 1 of 9: /home/wurst/data/S/annoloc2S.pcap
Link layer type: Ethernet [EN10MB/1]
Dump start: 1022171701.691172 sec (Thu 23 May 2002 16:35:01 GMT)
[WRN] snapL2Length: 54 - snapL3Length: 40 - IP length in header: 1500
Processing file no. 2 of 9: /home/wurst/data/S/annoloc2S.pcap1
Link layer type: Ethernet [EN10MB/1]
Processing file no. 3 of 9: /home/wurst/data/S/annoloc2S.pcap2
Link layer type: Ethernet [EN10MB/1]
Processing file no. 4 of 9: /home/wurst/data/S/annoloc2S.pcap3
Link layer type: Ethernet [EN10MB/1]
Processing file no. 5 of 9: /home/wurst/data/S/annoloc2S.pcap4
Link layer type: Ethernet [EN10MB/1]
Processing file no. 6 of 9: /home/wurst/data/S/annoloc2S.pcap5
Link layer type: Ethernet [EN10MB/1]
Processing file no. 7 of 9: /home/wurst/data/S/annoloc2S.pcap6
Link layer type: Ethernet [EN10MB/1]
Processing file no. 8 of 9: /home/wurst/data/S/annoloc2S.pcap7
Link layer type: Ethernet [EN10MB/1]
Processing file no. 9 of 9: /home/wurst/data/S/annoloc2S.pcap8
Link layer type: Ethernet [EN10MB/1]
Dump stop : 1022171726.640398 sec (Thu 23 May 2002 16:35:26 GMT)
Total dump duration: 24.949226 sec
Finished processing. Elapsed time: 0.487618 sec
Finished unloading flow memory. Time: 0.600816 sec
Percentage completed: 100.00%
Number of processed packets: 1219015 (1.22 M)
Number of processed bytes: 64082726 (64.08 M)
Number of raw bytes: 844642686 (844.64 M)
Number of pcap bytes: 83587182 (83.59 M)
Number of IPv4 packets: 1218588 (1.22 M) [99.96%]
Number of IPv6 packets: 180 [0.01%]
Number of A packets: 564232 (564.23 K) [46.29%]
Number of B packets: 654783 (654.78 K) [53.71%]
Number of A bytes: 29448132 (29.45 M) [45.95%]
Number of B bytes: 34634594 (34.63 M) [54.05%]
Average A packet load: 52.19
Average B packet load: 52.89
--------------------------------------------------------------------------------
tcpStates: Aggregated anomaly flags: 0xdf
--------------------------------------------------------------------------------
Headers count: min: 2, max: 4, average: 3.01
Number of GRE packets: 20 [0.00%]
Number of IGMP packets: 12 [0.00%]
Number of ICMP packets: 3059 (3.06 K) [0.25%]
Number of ICMPv6 packets: 11 [0.00%]
Number of TCP packets: 948743 (948.74 K) [77.83%]
Number of TCP bytes: 52643546 (52.64 M) [82.15%]
Number of UDP packets: 266900 (266.90 K) [21.89%]
Number of UDP bytes: 11234272 (11.23 M) [17.53%]
Number of IPv4 fragmented packets: 2284 (2.28 K) [0.19%]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Number of processed   flows: 17589 (17.59 K)
Number of processed A flows: 9980 (9.98 K) [56.74%]
Number of processed B flows: 7609 (7.61 K) [43.26%]
Number of request     flows: 9933 (9.93 K) [56.47%]
Number of reply       flows: 7656 (7.66 K) [43.53%]
Total   A/B    flow asymmetry: 0.13
Total req/rply flow asymmetry: 0.13
Number of processed   packets/flows: 69.31
Number of processed A packets/flows: 56.54
Number of processed B packets/flows: 86.05
Number of processed total packets/s: 48859.83 (48.86 K)
Number of processed A+B packets/s: 48859.83 (48.86 K)
Number of processed A   packets/s: 22615.21 (22.61 K)
Number of processed   B packets/s: 26244.62 (26.24 K)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Number of average processed flows/s: 704.99
Average full raw bandwidth: 270835712 b/s (270.84 Mb/s)
Average snapped bandwidth : 20548206 b/s (20.55 Mb/s)
Average full bandwidth : 270269600 b/s (270.27 Mb/s)
Max number of flows in memory: 15206 (15.21 K) [5.80%]
Memory usage: 0.07 GB [0.11%]
Aggregate flow status: 0x000018fa0202d044
[WRN] L3 SnapLength < Length in IP header
[WRN] L4 header snapped
[WRN] Consecutive duplicate IP ID
[WRN] IPv4/6 fragmentation header packet missing
[WRN] IPv4/6 packet fragmentation sequence not finished
[INF] IPv4
[INF] IPv6
[INF] IPv4/6 fragmentation
[INF] IPv4/6 in IPv4/6
[INF] GRE encapsulation
[INF] SSDP/UPnP flows
[INF] Ethernet flows
[INF] ARP flows
$

First, T2 checks whether all files exist and are sound. Then he processes the pcaps one after the other, in the order listed in pcap_Slist.txt, and terminates with a standard end report.
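If the list is generated by a script or edited by hand, a quick sanity check before a long run may save some time. The sketch below only mirrors the rules stated in this tutorial (comment lines start with ‘#’, paths must be absolute, files must exist); check_pcaplist is a hypothetical helper, and skipping blank lines is an assumption, not documented T2 behaviour. It does not test whether a pcap is sound, that is still T2’s job.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check for a -R list file (not part of T2).
# Returns 0 if every entry is an absolute path to an existing file.
check_pcaplist() {
    local list=$1 line bad=0
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            ''|'#'*) continue ;;    # blank lines (assumption) and comments
            /*) ;;                  # absolute path: check existence below
            *) echo "not an absolute path: $line" >&2; bad=1; continue ;;
        esac
        [ -f "$line" ] || { echo "missing file: $line" >&2; bad=1; }
    done < "$list"
    return "$bad"
}

# Usage sketch: check_pcaplist pcap_Slist.txt && t2 -R pcap_Slist.txt -w ~/results/S/
```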

Read from a sequence of pcaps

Imagine you have a humongous amount of pcaps to process, and lucky you, they are produced with an index in the file name. Then the -D option is the way to go.

The -D option requires a FILEPREFIX, which may contain the wildcard *. If the files have an extension, you have to specify it. The general syntax is shown below:

 -D FILEPREFIX[#Start][*][.ext][#Start][:SCHR][,#Stop]

Here #Start denotes the start index, which is either embedded in the file name or appended to it, and #Stop the stop index. If #Start is omitted, T2 starts at ‘0’ or assumes that the first file carries no number. If #Stop is omitted, T2 will wait for the next pcap when he runs out of food.

SCHR denotes the search characters that tell T2 where to find the #Start number in an arbitrary file name. It can contain up to three characters. By default SCHR is set to p, as defined in tranalyzer.h. Open the latter and search for -D option parameters.

$ tranalyzer2
$ vi src/tranalyzer.h
...
// -D option parameters
#define RROP      0    // round robin operation
#define POLLTM    5    // poll timing in sec for files
#define MFPTMOUT  0    // > 0: timeout in sec for poll timing > POLLTM, 0: no poll timout
#define SCHR     'p'   // separating char for number (refer to the doc for examples)
...

POLLTM denotes the interval at which T2 checks whether the next missing file has become available in his data directory. If a file index is missing, aka no more food for the anteater, he will wait and poll every POLLTM seconds. This and the other constants are discussed under Polling timeout.

We chose ‘p’ as the default because tcpdump adds the index at the end of the file name, behind the pcap extension, i.e. out.pcapNUM. Nevertheless, T2 also covers the more complicated editcap filename format.

The following table summarises the supported naming patterns and the commands required. Note the quotes ("), which are necessary to prevent the shell from interpreting special characters such as "*".

Filenames                                                   Command
out, out1, out2, …                                          t2 -D out:t -w .
out.pcap, out.pcap1, out.pcap2, …                           t2 -D out.pcap -w .
out.pcap, out.pcap01, out.pcap02, …                         t2 -D out.pcap00 -w .
out.pcap, out1.pcap, out2.pcap, …                           t2 -D "out*.pcap:t" -w .
out0.pcap, out1.pcap, out2.pcap, …                          t2 -D out0.pcap:t -w .
out00.pcap, out01.pcap, out02.pcap, …                       t2 -D out00.pcap:t -w .
out_00_Wurst.pcap, out_01_Nudel.pcap, out_02_Knoedel.pcap   t2 -D "out_00_*.pcap:t_,2" -w .
out_24.4.20h00.pcap, out_24.4.2016.20h00.pcap1, …           t2 -D "out*.pcap" -w .
out_24.4.20h00.pcap00, out_24.04.20h00.pcap01, …            t2 -D "out*.pcap00" -w .
out0.pcap, out1.pcap, out2.pcap, …                          t2 -D out0.pcap:t -w .
out.pcap00, out.pcap01, out.pcap02, …                       t2 -D out.pcap00 -w .

So if you want to process all files in the tcpdump split format from index 2 to 4:

$ t2 -D "~/data/S/annoloc2S.pcap2,4" -w ~/results/S/
================================================================================
Tranalyzer 0.8.6 (Anteater), Tarantula. PID: 29704
================================================================================
[INF] Creating flows for L2, IPv4, IPv6
Active plugins:
    01: basicFlow, 0.8.6
    02: tcpStates, 0.8.6
    03: txtSink, 0.8.6
[INF] basicFlow: IPv4 Ver: 4, Rev: 01072019, Range Mode: 0, subnet ranges loaded: 308731 (308.73 K)
[INF] basicFlow: IPv6 Ver: 4, Rev: 01072019, Range Mode: 0, subnet ranges loaded: 21494 (21.49 K)
Processing file: /home/wurst/data/S/annoloc2S.pcap2
Link layer type: Ethernet [EN10MB/1]
Dump start: 1022171707.765746 sec (Thu 23 May 2002 16:35:07 GMT)
[WRN] snapL2Length: 42 - snapL3Length: 28 - IP length in header: 89
Processing file: /home/wurst/data/S/annoloc2S.pcap3
Processing file: /home/wurst/data/S/annoloc2S.pcap4
Dump stop : 1022171716.626282 sec (Thu 23 May 2002 16:35:16 GMT)
Total dump duration: 8.860536 sec
Finished processing. Elapsed time: 0.183310 sec
Finished unloading flow memory. Time: 0.246442 sec
Percentage completed: 100.00%
Number of processed packets: 437472 (437.47 K)
Number of processed bytes: 23000464 (23.00 M)
Number of raw bytes: 304122809 (304.12 M)
Number of pcap bytes: 30000088 (30.00 M)
Number of IPv4 packets: 437336 (437.34 K) [99.97%]
Number of IPv6 packets: 41 [0.01%]
Number of A packets: 206259 (206.26 K) [47.15%]
Number of B packets: 231213 (231.21 K) [52.85%]
Number of A bytes: 10730714 (10.73 M) [46.65%]
Number of B bytes: 12269750 (12.27 M) [53.35%]
Average A packet load: 52.03
Average B packet load: 53.07
--------------------------------------------------------------------------------
tcpStates: Aggregated anomaly flags: 0xdf
--------------------------------------------------------------------------------
Headers count: min: 2, max: 4, average: 3.00
Number of GRE packets: 7 [0.00%]
Number of IGMP packets: 2 [0.00%]
Number of ICMP packets: 1106 (1.11 K) [0.25%]
Number of ICMPv6 packets: 1 [0.00%]
Number of TCP packets: 341543 (341.54 K) [78.07%]
Number of TCP bytes: 18937070 (18.94 M) [82.33%]
Number of UDP packets: 94711 (94.71 K) [21.65%]
Number of UDP bytes: 3989442 (3.99 M) [17.35%]
Number of IPv4 fragmented packets: 820 [0.19%]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Number of processed   flows: 8887 (8.89 K)
Number of processed A flows: 5060 (5.06 K) [56.94%]
Number of processed B flows: 3827 (3.83 K) [43.06%]
Number of request     flows: 5014 (5.01 K) [56.42%]
Number of reply       flows: 3873 (3.87 K) [43.58%]
Total   A/B    flow asymmetry: 0.14
Total req/rply flow asymmetry: 0.13
Number of processed   packets/flows: 49.23
Number of processed A packets/flows: 40.76
Number of processed B packets/flows: 60.42
Number of processed total packets/s: 49373.09 (49.37 K)
Number of processed A+B packets/s: 49373.09 (49.37 K)
Number of processed A   packets/s: 23278.39 (23.28 K)
Number of processed   B packets/s: 26094.70 (26.09 K)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Number of average processed flows/s: 1002.99
Average full raw bandwidth: 274586400 b/s (274.59 Mb/s)
Average snapped bandwidth : 20766658 b/s (20.77 Mb/s)
Average full bandwidth : 274012928 b/s (274.01 Mb/s)
Max number of flows in memory: 8452 (8.45 K) [3.22%]
Memory usage: 0.07 GB [0.10%]
Aggregate flow status: 0x000018fa0202d044
[WRN] L3 SnapLength < Length in IP header
[WRN] L4 header snapped
[WRN] Consecutive duplicate IP ID
[WRN] IPv4/6 fragmentation header packet missing
[WRN] IPv4/6 packet fragmentation sequence not finished
[INF] IPv4
[INF] IPv6
[INF] IPv4/6 fragmentation
[INF] IPv4/6 in IPv4/6
[INF] GRE encapsulation
[INF] SSDP/UPnP flows
[INF] Ethernet flows
[INF] ARP flows

The same works for the editcap format. Note again the compulsory quotes around the wildcard.

$ t2 -D "~/data/T/annoloc2T_00002_*.pcap:T_,4" -w ~/results/T/
================================================================================
Tranalyzer 0.8.6 (Anteater), Tarantula. PID: 29707
================================================================================
[INF] Creating flows for L2, IPv4, IPv6
Active plugins:
    01: basicFlow, 0.8.6
    02: tcpStates, 0.8.6
    03: txtSink, 0.8.6
[INF] basicFlow: IPv4 Ver: 4, Rev: 01072019, Range Mode: 0, subnet ranges loaded: 308731 (308.73 K)
[INF] basicFlow: IPv6 Ver: 4, Rev: 01072019, Range Mode: 0, subnet ranges loaded: 21494 (21.49 K)
Processing file: /home/wurst/data/T/annoloc2T_00002_20020523183505.pcap
Link layer type: Ethernet [EN10MB/1]
Dump start: 1022171705.853848 sec (Thu 23 May 2002 16:35:05 GMT)
[WRN] snapL2Length: 54 - snapL3Length: 40 - IP length in header: 1500
Processing file: /home/wurst/data/T/annoloc2T_00003_20020523183507.pcap
Processing file: /home/wurst/data/T/annoloc2T_00004_20020523183509.pcap
Dump stop : 1022171711.974877 sec (Thu 23 May 2002 16:35:11 GMT)
Total dump duration: 6.121029 sec
Finished processing. Elapsed time: 0.216992 sec
Finished unloading flow memory. Time: 0.269568 sec
Percentage completed: 79.21%
Number of processed packets: 300000 (300.00 K)
Number of processed bytes: 15773420 (15.77 M)
Number of raw bytes: 207961760 (207.96 M)
Number of pcap bytes: 25973744 (25.97 M)
Number of IPv4 packets: 299892 (299.89 K) [99.96%]
Number of IPv6 packets: 21 [0.01%]
Number of A packets: 143169 (143.17 K) [47.72%]
Number of B packets: 156831 (156.83 K) [52.28%]
Number of A bytes: 7467202 (7.47 M) [47.34%]
Number of B bytes: 8306218 (8.31 M) [52.66%]
Average A packet load: 52.16
Average B packet load: 52.96
--------------------------------------------------------------------------------
tcpStates: Aggregated anomaly flags: 0xdf
--------------------------------------------------------------------------------
Headers count: min: 2, max: 4, average: 3.00
Number of GRE packets: 13 [0.00%]
Number of ICMP packets: 746 [0.25%]
Number of ICMPv6 packets: 5 [0.00%]
Number of TCP packets: 234132 (234.13 K) [78.04%]
Number of TCP bytes: 12984016 (12.98 M) [82.32%]
Number of UDP packets: 65014 (65.01 K) [21.67%]
Number of UDP bytes: 2737988 (2.74 M) [17.36%]
Number of IPv4 fragmented packets: 530 [0.18%]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Number of processed   flows: 7462 (7.46 K)
Number of processed A flows: 4237 (4.24 K) [56.78%]
Number of processed B flows: 3225 (3.23 K) [43.22%]
Number of request     flows: 4202 (4.20 K) [56.31%]
Number of reply       flows: 3260 (3.26 K) [43.69%]
Total   A/B    flow asymmetry: 0.14
Total req/rply flow asymmetry: 0.13
Number of processed   packets/flows: 40.20
Number of processed A packets/flows: 33.79
Number of processed B packets/flows: 48.63
Number of processed total packets/s: 49011.37 (49.01 K)
Number of processed A+B packets/s: 49011.37 (49.01 K)
Number of processed A   packets/s: 23389.70 (23.39 K)
Number of processed   B packets/s: 25621.67 (25.62 K)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Number of average processed flows/s: 1219.08 (1.22 K)
Average full raw bandwidth: 271799744 b/s (271.80 Mb/s)
Average snapped bandwidth : 20615384 b/s (20.62 Mb/s)
Average full bandwidth : 271227328 b/s (271.23 Mb/s)
Max number of flows in memory: 7127 (7.13 K) [2.72%]
Memory usage: 0.06 GB [0.10%]
Aggregate flow status: 0x000018fa0202d044
[WRN] L3 SnapLength < Length in IP header
[WRN] L4 header snapped
[WRN] Consecutive duplicate IP ID
[WRN] IPv4/6 fragmentation header packet missing
[WRN] IPv4/6 packet fragmentation sequence not finished
[INF] IPv4
[INF] IPv6
[INF] IPv4/6 fragmentation
[INF] IPv4/6 in IPv4/6
[INF] GRE encapsulation
[INF] SSDP/UPnP flows
[INF] Ethernet flows
[INF] ARP flows
$

The end reports differ because tcpdump and editcap split the pcap at different positions.

Polling timeout

If T2 runs out of files, the default behaviour of the -D option is to wait for the next file. So you could leave him running somewhere, lurking for more food, until you copy the next pcap into his bowl. Try this:

$ t2 -D ~/data/S/annoloc2S.pcap -w ~/results/S/
================================================================================
Tranalyzer 0.8.6 (Anteater), Tarantula. PID: 23626
================================================================================
[INF] Creating flows for L2, IPv4, IPv6
Active plugins:
    01: basicFlow, 0.8.6
    02: basicStats, 0.8.6
    03: tcpStates, 0.8.6
    04: txtSink, 0.8.6
[INF] basicFlow: IPv4 Ver: 4, Rev: 01072019, Range Mode: 0, subnet ranges loaded: 308758 (308.76 K)
[INF] basicFlow: IPv6 Ver: 4, Rev: 01072019, Range Mode: 0, subnet ranges loaded: 21494 (21.49 K)
Processing file: /home/wurst/data/S/annoloc2S.pcap
Link layer type: Ethernet [EN10MB/1]
Dump start: 1022171701.691172 sec (Thu 23 May 2002 16:35:01 GMT)
[WRN] snapL2Length: 54 - snapL3Length: 40 - IP length in header: 1500
Processing file: /home/wurst/data/S/annoloc2S.pcap1
Processing file: /home/wurst/data/S/annoloc2S.pcap2
Processing file: /home/wurst/data/S/annoloc2S.pcap3
Processing file: /home/wurst/data/S/annoloc2S.pcap4
Processing file: /home/wurst/data/S/annoloc2S.pcap5
Processing file: /home/wurst/data/S/annoloc2S.pcap6
Processing file: /home/wurst/data/S/annoloc2S.pcap7
Processing file: /home/wurst/data/S/annoloc2S.pcap8
......
Processing file: /home/wurst/data/S/annoloc2S.pcap9
............

Now open another bash window and copy annoloc2S.pcap to index 9. This does not make sense as data, but it helps to demonstrate T2’s reaction.

$ cd ~/data/S
$ cp annoloc2S.pcap annoloc2S.pcap9
$

In the T2 window you will suddenly see that he grabs the new file, processes it and waits for the next victim. Now imagine that No 9 is missing: then T2 waits forever, even if additional pcaps with a higher index are copied into his data folder. Sometimes No 9 will never come, which brings everything to a sudden halt. To avoid that, e.g. for certain overall statistical analyses or for monitoring, it is preferable to skip the missing file and move on. For that purpose T2 implements the poll timeout constant MFPTMOUT. It defines the number of seconds after which T2 moves on to the next file index.

// -D option parameters
#define RROP      0    // round robin operation
#define POLLTM    5    // poll timing in sec for files
#define MFPTMOUT  0    // > 0: timeout in sec for poll timing > POLLTM, 0: no poll timout
#define SCHR     'p'   // separating char for number (refer to the doc for examples)
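To make the interplay of POLLTM and MFPTMOUT more tangible, here is a small shell emulation of the waiting logic as described in this tutorial. It is an illustration only, not T2’s actual implementation; wait_for_file and its arguments are hypothetical names.

```shell
#!/usr/bin/env bash
# Illustration only, NOT T2's code: wait for FILE, checking every POLLTM
# seconds; give up after MFPTMOUT seconds (MFPTMOUT = 0 means wait forever).
wait_for_file() {
    local file=$1 polltm=${2:-5} mfptmout=${3:-0} waited=0
    until [ -e "$file" ]; do
        if [ "$mfptmout" -gt 0 ] && [ "$waited" -ge "$mfptmout" ]; then
            return 1    # timed out: the caller moves on to the next index
        fi
        sleep "$polltm"
        waited=$((waited + polltm))
    done
    return 0            # the file has arrived
}

# Usage sketch: wait_for_file ~/data/S/annoloc2S.pcap9 5 10 || echo "skipping No 9"
```

With the defaults from tranalyzer.h (POLLTM=5, MFPTMOUT=0) this loop never gives up, which matches the endless waiting seen above.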

So move index 9 to index 10, so that we have a gap.

$ cd ~/data/S
$ mv annoloc2S.pcap9 annoloc2S.pcap10
$

Then set the poll timeout to 10 seconds, so that T2 waits that long for No 9 to arrive and otherwise moves on to No 10. Recompile and rerun T2 on the same pcap.

$ t2conf tranalyzer2 -D MFPTMOUT=10
$ t2build tranalyzer2
...
$ t2 -D ~/data/S/annoloc2S.pcap -w ~/results/S/ 
================================================================================
Tranalyzer 0.8.6 (Anteater), Tarantula. PID: 24773
================================================================================
[INF] Creating flows for L2, IPv4, IPv6
Active plugins:
    01: basicFlow, 0.8.6
    02: basicStats, 0.8.6
    03: tcpStates, 0.8.6
    04: txtSink, 0.8.6
[INF] basicFlow: IPv4 Ver: 4, Rev: 01072019, Range Mode: 0, subnet ranges loaded: 308758 (308.76 K)
[INF] basicFlow: IPv6 Ver: 4, Rev: 01072019, Range Mode: 0, subnet ranges loaded: 21494 (21.49 K)
Processing file: /home/wurst/data/S/annoloc2S.pcap
Link layer type: Ethernet [EN10MB/1]
Dump start: 1022171701.691172 sec (Thu 23 May 2002 16:35:01 GMT)
[WRN] snapL2Length: 54 - snapL3Length: 40 - IP length in header: 1500
Processing file: /home/wurst/data/S/annoloc2S.pcap1
Processing file: /home/wurst/data/S/annoloc2S.pcap2
Processing file: /home/wurst/data/S/annoloc2S.pcap3
Processing file: /home/wurst/data/S/annoloc2S.pcap4
Processing file: /home/wurst/data/S/annoloc2S.pcap5
Processing file: /home/wurst/data/S/annoloc2S.pcap6
Processing file: /home/wurst/data/S/annoloc2S.pcap7
Processing file: /home/wurst/data/S/annoloc2S.pcap8
.....Processing file: /home/wurst/data/S/annoloc2S.pcap10
...........
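Before starting a long -D run it can help to know where the gaps are. The sketch below lists the missing indices of a tcpdump-style sequence; it assumes the naming used in this tutorial, where index 0 carries no number (annoloc2S.pcap, annoloc2S.pcap1, ...). missing_indices is a hypothetical helper, not a T2 tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every index in 0..STOP for which no file exists.
# Assumes tcpdump naming: BASE, BASE1, BASE2, ... (index 0 has no number).
missing_indices() {
    local base=$1 stop=$2 i f
    for (( i = 0; i <= stop; i++ )); do
        if (( i == 0 )); then f=$base; else f=$base$i; fi
        [ -e "$f" ] || echo "$i"
    done
}

# Usage sketch: missing_indices ~/data/S/annoloc2S.pcap 10
```

After the mv above, this should report index 9 as the only gap.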

Round robin operation

To automate the flow file post-processing and to conserve disk space, a round robin approach is very helpful. The round robin rollover index should be adapted to the post-processing speed and the size of the fragments. As a test, switch on RROP, set the rollover index to 8 on the command line and reset the polling timeout, as we do not need it for the following demonstration:

$ t2conf tranalyzer2 -D RROP=1 -D MFPTMOUT=0
$ t2build tranalyzer2
...
$ t2 -D ~/data/S/annoloc2S.pcap,8 -w ~/results/S/
================================================================================
Tranalyzer 0.8.6 (Anteater), Tarantula. PID: 24084
================================================================================
[INF] Creating flows for L2, IPv4, IPv6
Active plugins:
    01: basicFlow, 0.8.6
    02: basicStats, 0.8.6
    03: tcpStates, 0.8.6
    04: txtSink, 0.8.6
[INF] basicFlow: IPv4 Ver: 4, Rev: 01072019, Range Mode: 0, subnet ranges loaded: 308758 (308.76 K)
[INF] basicFlow: IPv6 Ver: 4, Rev: 01072019, Range Mode: 0, subnet ranges loaded: 21494 (21.49 K)
Processing file: /home/wurst/data/S/annoloc2S.pcap
Link layer type: Ethernet [EN10MB/1]
Dump start: 1022171701.691172 sec (Thu 23 May 2002 16:35:01 GMT)
[WRN] snapL2Length: 54 - snapL3Length: 40 - IP length in header: 1500
Processing file: /home/wurst/data/S/annoloc2S.pcap1
Processing file: /home/wurst/data/S/annoloc2S.pcap2
Processing file: /home/wurst/data/S/annoloc2S.pcap3
Processing file: /home/wurst/data/S/annoloc2S.pcap4
Processing file: /home/wurst/data/S/annoloc2S.pcap5
Processing file: /home/wurst/data/S/annoloc2S.pcap6
Processing file: /home/wurst/data/S/annoloc2S.pcap7
Processing file: /home/wurst/data/S/annoloc2S.pcap8
Processing file: /home/wurst/data/S/annoloc2S.pcap
Processing file: /home/wurst/data/S/annoloc2S.pcap1
Processing file: /home/wurst/data/S/annoloc2S.pcap2
Processing file: /home/wurst/data/S/annoloc2S.pcap3
Processing file: /home/wurst/data/S/annoloc2S.pcap4
Processing file: /home/wurst/data/S/annoloc2S.pcap5
Processing file: /home/wurst/data/S/annoloc2S.pcap6
Processing file: /home/wurst/data/S/annoloc2S.pcap7
Processing file: /home/wurst/data/S/annoloc2S.pcap8
Processing file: /home/wurst/data/S/annoloc2S.pcap
Processing file: /home/wurst/data/S/annoloc2S.pcap1
Processing file: /home/wurst/data/S/annoloc2S.pcap2
Processing file: /home/wurst/data/S/annoloc2S.pcap3
Processing file: /home/wurst/data/S/annoloc2S.pcap4
^C[INF] SIGINT: Stop flow creation: 0x0002
Processing file: /home/wurst/data/S/annoloc2S.pcap
Processing file: /home/wurst/data/S/annoloc2S.pcap1
Processing file: /home/wurst/data/S/annoloc2S.pcap2
Processing file: /home/wurst/data/S/annoloc2S.pcap3
Processing file: /home/wurst/data/S/annoloc2S.pcap4
Processing file: /home/wurst/data/S/annoloc2S.pcap5
Processing file: /home/wurst/data/S/annoloc2S.pcap6
Processing file: /home/wurst/data/S/annoloc2S.pcap7
Processing file: /home/wurst/data/S/annoloc2S.pcap8
Processing file: /home/wurst/data/S/annoloc2S.pcap
^C[INF] SIGINT: Stop flow creation: 0x0001
Dump stop : 1022171704.219830 sec (Thu 23 May 2002 16:35:04 GMT)
Total dump duration: 2.528658 sec
Finished processing. Elapsed time: 2.593898 sec
...

Interrupt it with 2 * ^C or send a t2stat -TERM command from another bash window.
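The index sequence seen in the output above can be written down in one line of arithmetic. This is an inference from that run (indices 0 to 8, then back to 0), not taken from the T2 source; rr_next is a hypothetical name.

```shell
#!/usr/bin/env bash
# Inferred from the run above, not from the T2 source: after index STOP the
# sequence starts again at 0.
rr_next() {
    local i=$1 stop=$2
    echo $(( (i + 1) % (stop + 1) ))
}

# Usage sketch: rr_next 8 8 gives the index that follows annoloc2S.pcap8
```

Whatever produces the pcaps then has stop+1 slots to overwrite, so the rollover index bounds the disk space needed for the input.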

Split output files

As with pcaps, you can split flow files into smaller chunks, measured either in bytes or in number of flows. The general command line option is defined as follows:

-W PREFIX[:SIZE][,START]

The expression before the : defines the output file name prefix; the expression following it denotes the maximum file size of each fragment. If omitted, it defaults to OFRWFILELN, defined in tranalyzer.h:

// -W option parameters
#define OFRWFILELN 5E8 // default fragmented output file length (500MB)

START defines the index of the first file generated. If omitted it defaults to 0.

The SIZE of the files can be specified in bytes (default), KB (‘K’), MB (‘M’) or GB (‘G’). Scientific notation, i.e., 1e5 or 1E5 (=100000), can be used as well. If no size is specified, then the ‘:’ can be omitted.

If an ‘f’ is appended, the unit is the flow count. Hence, the file chunks produced all contain the same number of flows. Some typical examples are shown below.

Command                                               Fragment size  Start index  Output files
t2 -r ~/data/annoloc2.pcap -W ~/results/out:1.5E9,10  1.5GB          10           out10, out11, …
t2 -r ~/data/annoloc2.pcap -W ~/results/out:1.5e9,5   1.5GB          5            out5, out6, …
t2 -r ~/data/annoloc2.pcap -W ~/results/out:1.5G,1    1.5GB          1            out1, out2, …
t2 -r ~/data/annoloc2.pcap -W ~/results/out:5000K     5MB            0            out0, out1, …
t2 -r ~/data/annoloc2.pcap -W ~/results/out:5Kf       5000 flows     0            out0, out1, …
t2 -r ~/data/annoloc2.pcap -W ~/results/out:2.5G      2.5GB          0            out0, out1, …
t2 -r ~/data/annoloc2.pcap -W ~/results/out,6         OFRWFILELN     6            out6, out7, …
t2 -r ~/data/annoloc2.pcap -W ~/results/out           OFRWFILELN     0            out0, out1, …
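To double-check what a given SIZE expression amounts to, the suffix rules described above can be emulated in the shell. This is a sketch under the assumption of decimal multipliers (K = 1000, M = 10^6, G = 10^9), which is what the equivalence of 1.5G and 1.5E9 in the examples suggests; size_to_count is a hypothetical helper and not how T2 parses the option.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expand a -W SIZE expression such as 1.5G, 5000K, 1E5
# or 5Kf into a plain number plus its unit (bytes or flows).
# Assumption: decimal multipliers, K = 1000, M = 10^6, G = 10^9.
size_to_count() {
    local spec=$1 unit=bytes mult=1
    case $spec in
        *f) unit=flows; spec=${spec%f} ;;    # trailing f: count flows
    esac
    case $spec in
        *K) mult=1000;       spec=${spec%K} ;;
        *M) mult=1000000;    spec=${spec%M} ;;
        *G) mult=1000000000; spec=${spec%G} ;;
    esac
    # awk handles fractions and scientific notation (1.5, 1e5, 1E5)
    awk -v n="$spec" -v m="$mult" -v u="$unit" \
        'BEGIN { printf "%.0f %s\n", n * m, u }'
}

# Usage sketch: size_to_count 5Kf
```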

Try them out and see what happens. Although round robin mode is useful in production, it is advisable to reset it now (it is still switched on from the last chapter); otherwise you end up in a loop with files constantly being overwritten.

$ t2conf tranalyzer2 -D RROP=0
$ t2build tranalyzer2
...
$

A prominent application in production environments is the combination of the -D and -W options as shown below, with at most 1000 flows per file and the devil’s start index 666:

$ t2 -D ~/data/S/annoloc2S.pcap,8 -W ~/results/F/:1000f,666
================================================================================
Tranalyzer 0.8.6 (Anteater), Tarantula. PID: 29723
================================================================================
[INF] Creating flows for L2, IPv4, IPv6
Active plugins:
    01: basicFlow, 0.8.6
    02: tcpStates, 0.8.6
    03: txtSink, 0.8.6
[INF] basicFlow: IPv4 Ver: 4, Rev: 01072019, Range Mode: 0, subnet ranges loaded: 308731 (308.73 K)
[INF] basicFlow: IPv6 Ver: 4, Rev: 01072019, Range Mode: 0, subnet ranges loaded: 21494 (21.49 K)
Processing file: /home/wurst/data/S/annoloc2S.pcap
Link layer type: Ethernet [EN10MB/1]
Dump start: 1022171701.691172 sec (Thu 23 May 2002 16:35:01 GMT)
[WRN] snapL2Length: 54 - snapL3Length: 40 - IP length in header: 1500
Processing file: /home/wurst/data/S/annoloc2S.pcap1
Processing file: /home/wurst/data/S/annoloc2S.pcap2
Processing file: /home/wurst/data/S/annoloc2S.pcap3
Processing file: /home/wurst/data/S/annoloc2S.pcap4
Processing file: /home/wurst/data/S/annoloc2S.pcap5
Processing file: /home/wurst/data/S/annoloc2S.pcap6
Processing file: /home/wurst/data/S/annoloc2S.pcap7
Processing file: /home/wurst/data/S/annoloc2S.pcap8
Dump stop : 1022171726.640398 sec (Thu 23 May 2002 16:35:26 GMT)
Total dump duration: 24.949226 sec
Finished processing. Elapsed time: 0.512757 sec
Finished unloading flow memory. Time: 0.626417 sec
Percentage completed: 100.00%
Number of processed packets: 1219015 (1.22 M)
Number of processed bytes: 64082726 (64.08 M)
Number of raw bytes: 844642686 (844.64 M)
Number of pcap bytes: 83587182 (83.59 M)
Number of IPv4 packets: 1218588 (1.22 M) [99.96%]
Number of IPv6 packets: 180 [0.01%]
Number of A packets: 564232 (564.23 K) [46.29%]
Number of B packets: 654783 (654.78 K) [53.71%]
Number of A bytes: 29448132 (29.45 M) [45.95%]
Number of B bytes: 34634594 (34.63 M) [54.05%]
Average A packet load: 52.19
Average B packet load: 52.89
--------------------------------------------------------------------------------
tcpStates: Aggregated anomaly flags: 0xdf
--------------------------------------------------------------------------------
Headers count: min: 2, max: 4, average: 3.01
Number of GRE packets: 20 [0.00%]
Number of IGMP packets: 12 [0.00%]
Number of ICMP packets: 3059 (3.06 K) [0.25%]
Number of ICMPv6 packets: 11 [0.00%]
Number of TCP packets: 948743 (948.74 K) [77.83%]
Number of TCP bytes: 52643546 (52.64 M) [82.15%]
Number of UDP packets: 266900 (266.90 K) [21.89%]
Number of UDP bytes: 11234272 (11.23 M) [17.53%]
Number of IPv4 fragmented packets: 2284 (2.28 K) [0.19%]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Number of processed   flows: 17589 (17.59 K)
Number of processed A flows: 9980 (9.98 K) [56.74%]
Number of processed B flows: 7609 (7.61 K) [43.26%]
Number of request     flows: 9933 (9.93 K) [56.47%]
Number of reply       flows: 7656 (7.66 K) [43.53%]
Total   A/B    flow asymmetry: 0.13
Total req/rply flow asymmetry: 0.13
Number of processed   packets/flows: 69.31
Number of processed A packets/flows: 56.54
Number of processed B packets/flows: 86.05
Number of processed total packets/s: 48859.83 (48.86 K)
Number of processed A+B packets/s: 48859.83 (48.86 K)
Number of processed A   packets/s: 22615.21 (22.61 K)
Number of processed   B packets/s: 26244.62 (26.24 K)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Number of average processed flows/s: 704.99
Average full raw bandwidth: 270835712 b/s (270.84 Mb/s)
Average snapped bandwidth : 20548206 b/s (20.55 Mb/s)
Average full bandwidth : 270269600 b/s (270.27 Mb/s)
Max number of flows in memory: 15206 (15.21 K) [5.80%]
Memory usage: 0.07 GB [0.11%]
Aggregate flow status: 0x000018fa0202d044
[WRN] L3 SnapLength < Length in IP header
[WRN] L4 header snapped
[WRN] Consecutive duplicate IP ID
[WRN] IPv4/6 fragmentation header packet missing
[WRN] IPv4/6 packet fragmentation sequence not finished
[INF] IPv4
[INF] IPv6
[INF] IPv4/6 fragmentation
[INF] IPv4/6 in IPv4/6
[INF] GRE encapsulation
[INF] SSDP/UPnP flows
[INF] Ethernet flows
[INF] ARP flows
$ ls ~/results/F
annoloc2S_flows.txt666  annoloc2S_flows.txt669  annoloc2S_flows.txt672  annoloc2S_flows.txt675  annoloc2S_flows.txt678  annoloc2S_flows.txt681  annoloc2S_headers.txt
annoloc2S_flows.txt667  annoloc2S_flows.txt670  annoloc2S_flows.txt673  annoloc2S_flows.txt676  annoloc2S_flows.txt679  annoloc2S_flows.txt682
annoloc2S_flows.txt668  annoloc2S_flows.txt671  annoloc2S_flows.txt674  annoloc2S_flows.txt677  annoloc2S_flows.txt680  annoloc2S_flows.txt683
$

How to process several different files

Often a multitude of different pcaps, uncorrelated in time and source, has to be processed in the background. For that you are better off writing a script yourself. Here is an example.

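#!/bin/bash
# mtran: run T2 on a numbered series of unrelated pcaps, one after the other.

if [ $# -ne 4 ]; then
    echo "Usage: mtran filename extension startIndex endIndex"
    exit 1
fi

NAME=$1
EXT=$2
START=$3
END=$4

for (( i = START; i <= END; i++ )); do
    rfile="$HOME/data/$NAME$i.$EXT"    # input pcap
    wfile="$HOME/results/$NAME$i"      # output prefix for this pcap
    echo "Processing: $rfile, $wfile"
    # skip indices for which no pcap exists
    if [ -e "$rfile" ]; then
        t2 -r "$rfile" -w "$wfile"
    fi
done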

Make sure that the polling timeout and the round robin mode are reset for the following tutorials, if not already done earlier.

$ t2conf tranalyzer2 -D MFPTMOUT=0 -D RROP=0
$ t2build tranalyzer2
...
$

Have fun and may the anteater be with you!