Tutorial: Geolocation and WHOIS behind it

Introduction

This tutorial details the different features of T2 concerning geolocation and the determination of the organization behind an IP address. There are two options:

basicFlow: T2 geolocation and organization
geoip: open source geolocation using the GeoIP/MaxMind DB

Note that the geoip DB is considerably slower than basicFlow.

Preparation

Before we start, T2 needs to be prepared. If you did not complete the previous tutorials, just follow the procedure described below.

First, restore T2 into a pristine state by removing all unnecessary or older plugins from the plugin folder ~/.tranalyzer/plugins, then compile the following plugins:

$ t2build -e
Are you sure you want to empty the plugin folder '/home/wurst/.tranalyzer/plugins' (y/N)? y
Plugin folder emptied
$ t2build -f tranalyzer2 basicFlow basicStats tcpStates connStat txtSink
...
BUILD SUCCESSFUL

If you have not yet created separate data and results directories, please do so now in another bash window; this facilitates your workflow:

$ mkdir ~/data ~/results
$ cd ~/data

The anonymized sample PCAP used in this tutorial can be downloaded here: faf-exercise.pcap. Please extract it into your data folder. Now you are all set for T2 IP label experiments.

basicFlow subnet and IP labeling

T2 provides its own geolabeling and IP identification service, so there is no longer any need to look up every IP address in a MaxMind DB or via whois. The necessary files are updated with each version of T2. The bzip2-compressed subnet files for IPv4/6 are extracted by the autogen.sh script or by t2build using the programs under utils/. We will look at this below.

$ basicFlow
$ ls
AUTHORS  autogen.sh  ChangeLog  configure.ac  COPYING  doc  Makefile.am  NEWS  README  src  subnets4.txt.bz2  subnets6.txt.bz2  t2plconf  tests  tor  utils
$

Now move to the src/ directory. The subnetHL[46].c files contain our binary-vector search algorithm. All .h files contain configuration constants.

$ cd src
$ ls
basicFlow.c  basicFlow.h  Makefile.am  subnetHL4.c  subnetHL4.h  subnetHL6.c  subnetHL6.h  utils.h
$

Open basicFlow.h and look for the user defined switches concerning subnets as shown below:

$ vi basicFlow.h

BFO_SUBNET_TEST activates the subnet labeling. It is switched on by default. If the GRE, L2TP or TEREDO output switches (not shown here) are activated, the subnet labeling can be activated separately for these addresses. We leave them off because the pcaps in this tutorial do not contain any of these encapsulations.

To be close to the default geoip plugin output, we switch on the Autonomous System Number (ASN) and the latitude/longitude output as indicated below. BFO_SUBNET_HEX toggles between a human readable whois output and a hex coded one, which can be a powerful selection mechanism when searching large flow files. We leave this option off for now.

Now open utils.h:

$ vi utils.h

The SUBRNG constant defines the search mode, either CIDR or ranges. The range mode has the advantage that any range can be defined by one single line whereas the CIDR notation would need many lines in the subnet file. We leave it at the default CIDR.
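To make the difference between the two modes concrete, here is a minimal Python sketch (not part of T2) that counts how many CIDR lines an arbitrary range would need. It only uses the standard ipaddress module; the example addresses and the helper name range_to_cidrs are made up for illustration.

```python
import ipaddress


def range_to_cidrs(first, last):
    """Return the minimal list of CIDR blocks covering first..last (inclusive)."""
    start = ipaddress.ip_address(first)
    end = ipaddress.ip_address(last)
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]


if __name__ == "__main__":
    # One line in range mode, several lines in CIDR mode.
    for cidr in range_to_cidrs("10.0.0.1", "10.0.0.6"):
        print(cidr)
```

A range that is aligned on a power of two, such as 192.168.1.0-192.168.1.255, collapses into a single CIDR line, so the two notations only differ for unaligned ranges.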

The WHOLEN, CNTYLEN and CTYLEN constants define the length of the WHOIS, county and city columns respectively in the basicFlow output. The latter two have been present since basicFlow 0.8.6 and are controlled by CNTYCTY, which is off by default. In that case the generated binary subnet files do not contain this information, because we do not want to load information we do not use anyway.

SUBVERS defines the subnet version. Different versions are NOT compatible. t2build will warn you if there is a discrepancy. So leave it at the default value.

Save all open files and rebuild basicFlow, basicStats and connStat, because basicStats and connStat depend on the subnetHL[46].c routines if BFO_SUBNET_TEST is activated. You may also rebuild all plugins built so far, which is shorter to type. Instead of editing the files by hand you can also use the t2conf command:

$ t2conf basicFlow -D BFO_SUBNET_ASN=1 -D BFO_SUBNET_LL=1
$ t2build -R
...

$ t2 -r ~/data/faf-exercise.pcap -w ~/results

================================================================================
Tranalyzer 0.8.6 (Anteater), Tarantula. PID: 12542
================================================================================
[INF] Creating flows for L2, IPv4, IPv6
Active plugins:
    01: basicFlow, 0.8.6
    02: basicStats, 0.8.6
    03: tcpStates, 0.8.6
    04: connStat, 0.8.6
    05: txtSink, 0.8.6
[INF] basicFlow: IPv4 Ver: 3, Rev: 01072019, Range Mode: 0, subnet ranges loaded: 312747 (312.75 K)
[INF] basicFlow: IPv6 Ver: 3, Rev: 01072019, Range Mode: 0, subnet ranges loaded: 21494 (21.49 K)
Processing file: /home/wurst/faf-exercise.pcap
Link layer type: Ethernet [EN10MB/1]
Dump start: 1258544215.037210 sec (Wed 18 Nov 2009 11:36:55 GMT)
Dump stop : 1258594491.683288 sec (Thu 19 Nov 2009 01:34:51 GMT)
Total dump duration: 50276.646078 sec (13h 57m 56s)
Finished processing. Elapsed time: 0.004831 sec
Finished unloading flow memory. Time: 0.004860 sec
Percentage completed: 100.00%
Number of processed packets: 5902 (5.90 K)
Number of processed bytes: 4993414 (4.99 M)
Number of raw bytes: 4993414 (4.99 M)
Number of pcap bytes: 5087870 (5.09 M)
Number of IPv4 packets: 5902 (5.90 K) [100.00%]
Number of A packets: 1986 (1.99 K) [33.65%]
Number of B packets: 3916 (3.92 K) [66.35%]
Number of A bytes: 209315 (209.31 K) [4.19%]
Number of B bytes: 4784099 (4.78 M) [95.81%]
Average A packet load: 105.40
Average B packet load: 1221.68 (1.22 K)
--------------------------------------------------------------------------------
basicStats: Biggest Talker: 143.166.11.10 (US): 3101 (3.10 K) [52.54%] packets
basicStats: Biggest Talker: 143.166.11.10 (US): 4436320 (4.44 M) [88.84%] bytes
tcpStates: Aggregated anomaly flags: 0x4a
connStat: Number of unique source IPs: 25
connStat: Number of unique destination IPs: 26
connStat: Number of unique source/destination IPs connections: 10
connStat: Max unique number of source IP / destination port connections: 18
connStat: IP prtcon/sdcon, prtcon/scon: 1.800000, 0.720000
connStat: Source IP with max connections: 192.168.1.104: 2 connections
connStat: Destination IP with max connections: 77.67.44.206 (FR): 1 connections
--------------------------------------------------------------------------------
Headers count: min: 3, max: 3, average: 3.00
Number of TCP packets: 5902 (5.90 K) [100.00%]
Number of TCP bytes: 4993414 (4.99 M) [100.00%]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Number of processed   flows: 72
Number of processed A flows: 36 [50.00%]
Number of processed B flows: 36 [50.00%]
Number of request     flows: 36 [50.00%]
Number of reply       flows: 36 [50.00%]
Total   A/B    flow asymmetry: 0.00
Total req/rply flow asymmetry: 0.00
Number of processed   packets/flows: 81.97
Number of processed A packets/flows: 55.17
Number of processed B packets/flows: 108.78
Number of processed total packets/s: 0.12
Number of processed A+B packets/s: 0.12
Number of processed A   packets/s: 0.04
Number of processed   B packets/s: 0.08
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Number of average processed flows/s: 0.00
Average full raw bandwidth: 795 b/s
Average full bandwidth : 792 b/s
Max number of flows in memory: 18 [0.01%]
Memory usage: 0.07 GB [0.11%]
Aggregate flow status: 0x0000000000004000
[INF] IPv4

Note that the biggest talkers and connectors are now labeled with the country code, if one is found.

Let’s print the essential columns of the flow file relevant to geolocation and whois.

tawk '{ print wildcard("^(src|dst)IP") }' ~/results/faf-exercise_flows.txt | sort -Vru -k1,1 | tcol

srcIP           srcIPASN  srcIPCC  srcIPWho                       srcIPLat_Lng_relP      dstIP          dstIPASN  dstIPCC  dstIPWho                   dstIPLat_Lng_relP
198.189.255.75  0         us       "California State University"  666_666_-1             192.168.1.104  0         07       "Private network"          666_666_-1
192.168.1.105   0         07       "Private network"              666_666_-1             192.168.1.1    0         07       "Private network"          666_666_-1
192.168.1.104   0         07       "Private network"              666_666_-1             77.67.44.206   3257      fr       "GTT Communications Inc."  48.71667_2.25_80
192.168.1.103   0         07       "Private network"              666_666_-1             192.168.1.1    0         07       "Private network"          666_666_-1
192.168.1.102   0         07       "Private network"              666_666_-1             192.168.1.1    0         07       "Private network"          666_666_-1
192.168.1.1     0         07       "Private network"              666_666_-1             192.168.1.103  0         07       "Private network"          666_666_-1
143.166.11.10   0         us       "Dell"                         30.51748_-97.67207_80  192.168.1.105  0         07       "Private network"          666_666_-1
77.67.44.206    3257      fr       "GTT Communications Inc."      48.71667_2.25_80       192.168.1.104  0         07       "Private network"          666_666_-1
63.245.221.11   395642    us       "Mozilla Corporation"          38.6409_-121.5228_80   192.168.1.104  0         07       "Private network"          666_666_-1

A 666 in the latitude/longitude column means that no location is defined, which is also indicated by the radius -1. If you look in the subnets4.txt file you can confirm the IPv4 labeling. We will look at these files in detail in the section "Internal WHOIS: subnet your own" below.
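When post-processing the flow file, the 666/-1 convention has to be handled explicitly. The following Python sketch splits a Lat_Lng_relP value into its three parts and maps the "no location" case to None. The field order is taken from the column name; the names Position and parse_lat_lng_relp are hypothetical and not part of T2.

```python
from typing import NamedTuple, Optional

NO_LOCATION = 666.0  # sentinel for "no location defined" in the flow file


class Position(NamedTuple):
    lat: float
    lng: float
    radius: float


def parse_lat_lng_relp(field: str) -> Optional[Position]:
    """Parse e.g. '48.71667_2.25_80'; return None for '666_666_-1'."""
    lat, lng, radius = (float(part) for part in field.split("_"))
    if lat == NO_LOCATION or lng == NO_LOCATION or radius < 0:
        return None
    return Position(lat, lng, radius)


if __name__ == "__main__":
    print(parse_lat_lng_relp("48.71667_2.25_80"))
    print(parse_lat_lng_relp("666_666_-1"))
```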

If you want to add the county and city columns, set CNTYCTY to 1 and recompile basicFlow with the -f option, because this information is then added to the binary subnet file, which is copied to the plugins directory. In the following we omit the county and city info to produce clearer output. If you want to add it, just follow the steps below:

$ t2conf basicFlow -D CNTYCTY=1
$ t2build -R -f
...
$ t2 -r ~/data/faf-exercise.pcap -w ~/results
...

tawk '{ print wildcard("^(src|dst)IP") }' ~/results/faf-exercise_flows.txt | sort -Vru -k1,1 | tcol

srcIP           srcIPASN  srcIPCC  srcIPCnty     srcIPCty      srcIPWho                       srcIPLat_Lng_relP      dstIP          dstIPASN  dstIPCC  dstIPCnty     dstIPCty     dstIPWho                   dstIPLat_Lng_relP
198.189.255.75  0         us       "ca"          "bakersfiel"  "California State University"  666_666_-1             192.168.1.104  0         07       "-"           "-"          "Private network"          666_666_-1
192.168.1.105   0         07       "-"           "-"           "Private network"              666_666_-1             192.168.1.1    0         07       "-"           "-"          "Private network"          666_666_-1
192.168.1.104   0         07       "-"           "-"           "Private network"              666_666_-1             77.67.44.206   3257      fr       "Ile-de-Fra"  "Palaiseau"  "GTT Communications Inc."  48.71667_2.25_80
192.168.1.103   0         07       "-"           "-"           "Private network"              666_666_-1             192.168.1.1    0         07       "-"           "-"          "Private network"          666_666_-1
192.168.1.102   0         07       "-"           "-"           "Private network"              666_666_-1             192.168.1.1    0         07       "-"           "-"          "Private network"          666_666_-1
192.168.1.1     0         07       "-"           "-"           "Private network"              666_666_-1             192.168.1.103  0         07       "-"           "-"          "Private network"          666_666_-1
143.166.11.10   0         us       "Texas"       "Round Rock"  "Dell"                         30.51748_-97.67207_80  192.168.1.105  0         07       "-"           "-"          "Private network"          666_666_-1
77.67.44.206    3257      fr       "Ile-de-Fra"  "Palaiseau"   "GTT Communications Inc."      48.71667_2.25_80       192.168.1.104  0         07       "-"           "-"          "Private network"          666_666_-1
63.245.221.11   395642    us       "California"  "Sacramento"  "Mozilla Corporation"          38.6409_-121.5228_80   192.168.1.104  0         07       "-"           "-"          "Private network"          666_666_-1

Because we do not like to waste memory, some of the column values are truncated. You can increase the values of the CNTYLEN and CTYLEN constants and redo the steps above.
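To get a feeling for a suitable column width, a small Python sketch like the one below can list the names that would be cut and the width needed to keep them whole. The width of 10 characters is only inferred from the truncated values in the output above ("bakersfiel", "Ile-de-Fra"); whether CNTYLEN and CTYLEN also count a terminating character is not stated here, so check utils.h before picking a value. The helper names are hypothetical.

```python
def truncated(values, width):
    """Return the values that would lose characters in a column of `width` characters."""
    return [v for v in values if len(v) > width]


def needed_width(values):
    """Smallest number of characters that holds every value untruncated."""
    return max((len(v) for v in values), default=0)


if __name__ == "__main__":
    names = ["Texas", "Ile-de-France", "Round Rock", "Bakersfield"]
    print(truncated(names, 10))  # names longer than 10 characters
    print(needed_width(names))   # length of the longest name
```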

TOR address labeling

By default, TOR addresses are integrated into the subnet file by the subconv script under basicFlow/utils/ when t2build or autogen.sh is invoked. You can switch this off by editing the autogen.sh file and removing the -t option of subconv. A flow file containing TOR addresses is shown below. I currently do not have an anonymized pcap for you to play with; I’m on it.

$ t2 -r ~/data/wurst.pcap -w ~/results
...
Aggregate flow status: 0x010038f2c098fb04
[WRN] L3 SnapLength < Length in IP header
[WRN] Consecutive duplicate IP ID
[WRN] IPv4/6 fragmentation header packet missing
[WRN] IPv4/6 packet fragmentation sequence not finished
[INF] IPv4
[INF] IPv6
[INF] IPv4/6 fragmentation
[INF] VLAN encapsulation
[INF] MPLS encapsulation
[INF] L2TP encapsulation
[INF] PPP/HDLC encapsulation
[INF] GRE encapsulation
[INF] AYIYA tunnel
[INF] Teredo tunnel
[INF] CAPWAP/LWAPP tunnel
[INF] Ethernet flows
[INF] Authentication Header (AH)
[INF] Encapsulating Security Payload (ESP)
[INF] TOR addresses

Note that the end report indicates that TOR addresses are present. In the flow file, TOR addresses are labeled with a "TOR," prefix in the Who column. Alternatively, you can select all TOR traffic with the TORADD bit in flowStat as shown below.

tawk 'bitsanyset($flowStat,0x0100000000000000) { print $dir, $flowInd, $flowStat, wildcard("^(src|dst)IP") }' ~/results/wurst_flows.txt | tcol

%dir  flowInd  flowStat            srcIP         srcIPASN  srcIPCC  srcIPWho                       srcIPLat_Lng_relP    dstIP         dstIPASN  dstIPCC  dstIPWho                       dstIPLat_Lng_relP
A     29388    0x0100000000004300  N.U.D.E   	 3303      ch       "Bluewin"                      46.20222_6.14569_80  L.O.L.U       8437      at       "TOR,Hutchison Drei Austria "  16.37208_48.20849_1
B     29388    0x0100000000004301  L.O.L.U       8437      at       "TOR,Hutchison Drei Austria "  16.37208_48.20849_1  N.U.D.E       3303      ch       "Bluewin"                      46.20222_6.14569_80
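The same selection can be reproduced outside of tawk. The Python sketch below mirrors the bitsanyset() test on the flowStat column; the mask is the one used in the tawk command above, while the function name bits_any_set and the constant name are just illustrative.

```python
TORADD = 0x0100000000000000  # flowStat bit used in the tawk filter above


def bits_any_set(value: str, mask: int) -> bool:
    """True if any bit of `mask` is set in the hex string `value` (e.g. '0x01...')."""
    return (int(value, 16) & mask) != 0


if __name__ == "__main__":
    for flow_stat in ("0x0100000000004300", "0x0000000000004000"):
        print(flow_stat, bits_any_set(flow_stat, TORADD))
```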

geoip plugin

T2 supports the open source legacy GeoLite databases and the newer MaxMind GeoLite2/GeoIP2 databases. Note that MaxMind has not provided updates for its legacy GeoLite DBs since January 2019.

Now move to the geoip plugin and look into it:

$ geoip
$ ls
AUTHORS  autogen.sh  ChangeLog  configure.ac  COPYING  doc  GeoLite2-City.mmdb.gz  GeoLiteCity.dat.gz  GeoLiteCityv6.dat.gz  Makefile.am  NEWS  README  scripts  src  t2plconf  tests
$

Note the GeoIP DB: GeoLiteCity.dat.gz and GeoLiteCityv6.dat.gz as well as the MaxMind DB: GeoLite2-City.mmdb.gz. If you move into the scripts folder you see two scripts:

genkml.sh: map coordinates to Google Earth
updatedb.sh: update DB

The first maps a flow file to a Google Earth KML file to produce an earth view with the location of the various IPs. The second updates the DBs. Run t2doc geoip for detailed information.

Now move to the src/ directory and look into the geoip.h file:

$ cd src
$ ls
geoip.c  geoip.h  Makefile.am
$ vi geoip.h

The most important setting is the type of DB. Since 0.8.4, the default is the MaxMind DB. As you can see, the classification of the source and destination IP can be enabled separately. The output of country, city, language, etc. can also be enabled individually. For this tutorial we leave everything in the default configuration as shown below.

...
// user defines
#define GEOIP_LEGACY     0 // Whether to use GeoLite2 (0) or the GeoLite legacy database (1)

#define GEOIP_SRC        1 // whether or not to display geo info for the source IP
#define GEOIP_DST        1 // whether or not to display geo info for the destination IP

#define GEOIP_CONTINENT  2 // 0: no continent, 1: name (GeoLite2), 2: two letters code
#define GEOIP_COUNTRY    2 // 0: no country, 1: name, 2: two letters code, 3: three letters code (Legacy)
#define GEOIP_CITY       1 // whether or not to display the city of the IP
#define GEOIP_POSTCODE   1 // whether or not to display the postal code of the IP
#define GEOIP_POSITION   1 // whether or not to display the position (latitude, longitude) of the IP
#define GEOIP_METRO_CODE 0 // whether or not to display the metro (dma) code of the IP (US only)

#if GEOIP_LEGACY == 0
#define GEOIP_ACCURACY   1    // whether or not to display the accuracy (GeoLite2)
#define GEOIP_TIMEZONE   1    // whether or not to display the time zone (GeoLite2)
#define GEOIP_LANG       "en" // Output language: en, de, fr, es, ja, pt-BR, ru, zh-CN, ...
#define GEOIP_BUFSIZE    64   // buffer size
#else // GEOIP_LEGACY == 1
#define GEOIP_REGION     1 // 0: no region,  1: name, 2: code
#define GEOIP_AREA_CODE  0 // whether or not to display the telephone area code of the IP
#define GEOIP_NETMASK    1 // 0: no netmask, 1: netmask as int (cidr),
                           // 2: netmask as hex (IPv4 only), 3: netmask as IP (IPv4 only)
#define GEOIP_DB_CACHE   2 // 0: read DB from file system (slower, least memory)
                           // 1: index cache (cache frequently used index only)
                           // 2: memory cache (faster, more memory)
#endif // GEOIP_LEGACY == 1

#define GEOIP_UNKNOWN    "--" // Representation of unknown locations (GeoIP's default)
...

Now compile the plugin and rerun T2 on the same pcap.

$ t2build geoip
...
$ t2 -r ~/data/faf-exercise.pcap -w ~/results/
...
$

To compare with the basicFlow output, I aggregated the same columns as above:

tawk '{ print $srcIP, wildcard("^srcIp"), $dstIP, wildcard("^dstIp") }' ~/results/faf-exercise_flows.txt | sort -Vru -k1,1 | tcol

srcIP           srcIpContinent  srcIpCountry  srcIpCity        srcIpPostcode  srcIpAccuracy  srcIpLat   srcIpLong    srcIpTimeZone          dstIP          dstIpContinent  dstIpCountry  dstIpCity  dstIpPostcode  dstIpAccuracy  dstIpLat   dstIpLong  dstIpTimeZone
198.189.255.75  NA              US            "Long Beach"     90802          5              33.763000  -118.177400  "America/Los_Angeles"  192.168.1.104  --              --            "--"       --             0              0.000000   0.000000   ""
192.168.1.105   --              --            "--"             --             0              0.000000   0.000000     ""                     192.168.1.1    --              --            "--"       --             0              0.000000   0.000000   ""
192.168.1.104   --              --            "--"             --             0              0.000000   0.000000     ""                     77.67.44.206   EU              IE            "--"       --             200            53.347200  -6.243900  "Europe/Dublin"
192.168.1.103   --              --            "--"             --             0              0.000000   0.000000     ""                     192.168.1.1    --              --            "--"       --             0              0.000000   0.000000   ""
192.168.1.102   --              --            "--"             --             0              0.000000   0.000000     ""                     192.168.1.1    --              --            "--"       --             0              0.000000   0.000000   ""
192.168.1.1     --              --            "--"             --             0              0.000000   0.000000     ""                     192.168.1.103  --              --            "--"       --             0              0.000000   0.000000   ""
143.166.11.10   NA              US            "--"             --             1000           37.751000  -97.822000   "America/Chicago"      192.168.1.105  --              --            "--"       --             0              0.000000   0.000000   ""
77.67.44.206    EU              IE            "--"             --             200            53.347200  -6.243900    "Europe/Dublin"        192.168.1.104  --              --            "--"       --             0              0.000000   0.000000   ""
63.245.221.11   NA              US            "Mountain View"  94041          50             37.389300  -122.078300  "America/Los_Angeles"  192.168.1.104  --              --            "--"       --             0              0.000000   0.000000   ""
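The two data sources do not always agree, e.g. for 77.67.44.206 (Palaiseau, FR in basicFlow versus a position near Dublin, IE in geoip). To quantify such differences, the great-circle distance between the two positions can be computed. The sketch below uses the standard haversine formula with a mean Earth radius of 6371 km; it is an illustration only and not part of either plugin.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


if __name__ == "__main__":
    # 77.67.44.206: basicFlow position versus geoip position from the tables above
    print(round(haversine_km(48.71667, 2.25, 53.3472, -6.2439)), "km")
```

Keep in mind that both plugins also report an accuracy radius, so a large distance does not necessarily mean that one of them is wrong.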

Hex code labeling

As mentioned above, T2 supports hex code labeling, which is a powerful flow selection mechanism, as integer AND operations are much faster than string comparisons. Open basicFlow.h and set BFO_SUBNET_HEX to 1, rebuild all and rerun t2, as indicated below:

$ t2conf basicFlow -D BFO_SUBNET_HEX=1
$ t2build -R
...
$ t2 -r ~/data/faf-exercise.pcap -w ~/results/
...
$

Now the strings are gone and replaced by 32 bit hex numbers:

tawk '{ print wildcard("^(src|dst)IP") }' ~/results/faf-exercise_flows.txt | sort -Vru -k1,1 | tcol

srcIP           srcIPASN  srcIPCC     srcIPLat_Lng_relP      dstIP          dstIPASN  dstIPCC     dstIPLat_Lng_relP
198.189.255.75  0         0xe9004dcc  666_666_-1             192.168.1.104  0         0x01015ea4  666_666_-1
192.168.1.105   0         0x01015ea4  666_666_-1             192.168.1.1    0         0x01015ea4  666_666_-1
192.168.1.104   0         0x01015ea4  666_666_-1             77.67.44.206   3257      0x4b00a662  48.71667_2.25_80
192.168.1.103   0         0x01015ea4  666_666_-1             192.168.1.1    0         0x01015ea4  666_666_-1
192.168.1.102   0         0x01015ea4  666_666_-1             192.168.1.1    0         0x01015ea4  666_666_-1
192.168.1.1     0         0x01015ea4  666_666_-1             192.168.1.103  0         0x01015ea4  666_666_-1
143.166.11.10   0         0xe90075f3  30.51748_-97.67207_80  192.168.1.105  0         0x01015ea4  666_666_-1
77.67.44.206    3257      0x4b00a662  48.71667_2.25_80       192.168.1.104  0         0x01015ea4  666_666_-1
63.245.221.11   395642    0xe9011db5  38.6409_-121.5228_80   192.168.1.104  0         0x01015ea4  666_666_-1

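The 32 bit hex code carries the country code in its upper 8 bits (mask 0xff000000) and a TOR flag in bit 0x00800000; the lower bits identify the organization.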

The code to text resolution can be found in

  • who[46]CntryCds.txt
  • who[46]OrgCds.txt
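A minimal Python sketch of how such a code could be taken apart. The country mask 0xff000000 and the TOR bit 0x00800000 are the ones used by the tawk filters in this tutorial; treating all remaining lower bits as the organization code (ORG_MASK) is an assumption, so verify it against who[46]OrgCds.txt before relying on it. The function name decode_cc is hypothetical.

```python
COUNTRY_MASK = 0xFF000000  # upper 8 bits, as used in the tawk country filter
TOR_BIT = 0x00800000       # TOR flag in srcIPCC/dstIPCC
ORG_MASK = 0x007FFFFF      # ASSUMPTION: all remaining bits are the organization code


def decode_cc(code: str) -> dict:
    """Split a hex coded srcIPCC/dstIPCC value into country, TOR flag and organization."""
    value = int(code, 16)
    return {
        "country": (value & COUNTRY_MASK) >> 24,
        "tor": bool(value & TOR_BIT),
        "org": value & ORG_MASK,
    }


if __name__ == "__main__":
    for code in ("0xe9004dcc", "0x0f80c5a4"):
        fields = decode_cc(code)
        print(code, hex(fields["country"]), fields["tor"], hex(fields["org"]))
```

The numeric country and organization values still have to be resolved to text with the two files listed above.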

You can now select all flows of a certain country and/or organization with a simple tawk script, e.g. all flows whose source country code is 0xe9 (us):

tawk 'and(strtonum($srcIPCC), 0xff000000) == 0xe9000000 || hdr() { print wildcard("^(src|dst)IP") }' ~/results/faf-exercise_flows.txt | sort -Vru -k1,1 | tcol

srcIP           srcIPASN  srcIPCC     srcIPLat_Lng_relP      dstIP          dstIPASN  dstIPCC     dstIPLat_Lng_relP
198.189.255.75  0         0xe9004dcc  666_666_-1             192.168.1.104  0         0x01015ea4  666_666_-1
143.166.11.10   0         0xe90075f3  30.51748_-97.67207_80  192.168.1.105  0         0x01015ea4  666_666_-1
63.245.221.11   395642    0xe9011db5  38.6409_-121.5228_80   192.168.1.104  0         0x01015ea4  666_666_-1

In srcIPCC or dstIPCC, the bit 0x00800000 indicates a TOR address. Alternatively, you can select TOR flows with the flowStat bit alone, as shown below for the pcap I have not anonymized yet.

tawk 'bitsanyset($flowStat, 0x0100000000000000) { print $dir, $flowInd, $flowStat, wildcard("^(src|dst)IP") }' ~/results/wurst_flows.txt | tcol

%dir  flowInd  flowStat            srcIP         srcIPASN  srcIPCC     srcIPLat_Lng_relP    dstIP         dstIPASN  dstIPCC     dstIPLat_Lng_relP
A     29388    0x0100000000004300  K.A.C.K  	 3303      0x2c003a60  46.20222_6.14569_80  S.H.I.T   	  8437      0x0f80c5a4  16.37208_48.20849_1
B     29388    0x0100000000004301  S.H.I.T   	 8437      0x0f80c5a4  16.37208_48.20849_1  K.A.C.K   	  3303      0x2c003a60  46.20222_6.14569_80

Internal WHOIS: subnet your own

Which admin has not asked himself WHO, WHERE and WHY the fuck somebody is doing what he is doing, or how to find an in-house IP such as 10.23.4.5? Yeah, I did, lots of times, and got weary of looking things up in Excel sheets, logs or, if I was lucky, DBs. Now try to do that for 1000 addresses and hand over a report in no time.

As the private IPv4/6 address space is hopefully only listed inside your organization, we need to build our own subnet file. Building one is fairly easy if the IP to location and organization mapping is available as a tab-separated or CSV file. So that you can expand or rewrite the current subnet files, T2 ships with the .txt version and the scripts to convert it to the T2 compatible binary version. That is why the initial build of basicFlow takes a bit longer.

Let’s now look at the basicFlow directory after the plugin is compiled. The HL.txt files are intermediate files on the way to the binary HL.bin format. The original is the decompressed subnets[46].txt file, which contains all the information.

$ ls
aclocal.m4  autom4te.cache  config.h     config.status  COPYING  libtool   Makefile.am  README    subnets4_HLP.bin  subnets4.txt      subnets6_HLP.txt  subnets6.txt.bz2  tor
AUTHORS     build-aux       config.h.in  configure      doc      m4        Makefile.in  src       subnets4_HLP.txt  subnets4.txt.bz2  subnets6_HL.txt   t2plconf          utils
autogen.sh  ChangeLog       config.log   configure.ac   INSTALL  Makefile  NEWS         stamp-h1  subnets4_HL.txt   subnets6_HLP.bin  subnets6.txt      tests
$

Open subnets4.txt; the IPv6 file is built in a similar fashion.

lsx subnets4.txt

#                                   3    21062019
# IPCIDR        Msk     IP range                        CtryWhoCode    ASN      Radius  Latitude        Longitude      Country  County       City         Org
# Begin IPv4 private address space
0.0.0.0/32      32      0.0.0.0-0.0.0.0                 0x0101d28b      0       -1.0    666.000000      666.000000      00      -            -            Unspecified
10.0.0.0/8      8       10.0.0.0-10.255.255.255         0x01015ea4      0       -1.0    666.000000      666.000000      04      -            -            Private network
127.0.0.0/8     8       127.0.0.0-127.255.255.255       0x0101019a      0       -1.0    666.000000      666.000000      01      -            -            Loopback
100.64.0.0/10   10      100.64.0.0-100.127.255.255      0x01018e73      0       -1.0    666.000000      666.000000      20      -            -            Shared address space
169.254.0.0/16  16      169.254.0.0-169.254.255.255     0x0100ff11      0       -1.0    666.000000      666.000000      02      -            -            Link-local
172.16.0.0/12   12      172.16.0.0-172.31.255.255       0x01015ea4      0       -1.0    666.000000      666.000000      05      -            -            Private network
192.0.0.0/24    24      192.0.0.0-192.0.0.255           0x01015ea4      0       -1.0    666.000000      666.000000      06      -            -            Private network
192.0.2.0/24    24      192.0.2.0-192.0.2.255           0x0101aaf5      0       -1.0    666.000000      666.000000      21      -            -            TEST-NET-1
192.88.99.0/24  24      192.88.99.0-192.88.99.255       0x0100caca      0       -1.0    666.000000      666.000000      60      -            -            IPv6 to IPv4 relay
192.168.0.0/16  16      192.168.0.0-192.168.255.255     0x01015ea4      0       -1.0    666.000000      666.000000      07      -            -            Private network
198.18.0.0/15   15      198.18.0.0-198.119.255.255      0x01015ea4      0       -1.0    666.000000      666.000000      08      -            -            Private network
198.51.100.0/16 16      198.51.100.0-198.51.100.255     0x0101aaf6      0       -1.0    666.000000      666.000000      22      -            -            TEST-NET-2
203.0.113.0/24  24      203.0.113.0-203.0.113.255       0x0101aaf7      0       -1.0    666.000000      666.000000      23      -            -            TEST-NET-3
224.0.0.0/4     4       224.0.0.0-239.255.255.255       0x01011e1b      0       -1.0    666.000000      666.000000      10      -            -            Multicast
240.0.0.0/4     4       240.0.0.0-255.255.255.254       0x0101702b      0       -1.0    666.000000      666.000000      24      -            -            Reserved
255.255.255.255/32  32  255.255.255.255-255.255.255.255 0x01003c9a      0       -1.0    666.000000      666.000000      11      -            -            Broadcast
# End IPv4 privat address space
1.0.0.0/24      24      1.0.0.0-1.0.0.255               0xe9000a24      13335   80.0    34.052230      -118.243680      us      California   Los Angeles  APNIC Research and Development
1.0.1.0/24      24      1.0.1.0-1.0.1.255               0x31003c7c      0       80.0    26.061390       119.306110      cn      Fujian       Fuzhou       CHINANET FUJIAN PROVINCE NETWORK
1.0.2.0/24      24      1.0.2.0-1.0.2.255               0x310043dc      0       80.0    26.061390       119.306110      cn      Fujian       Fuzhou       CHINANET FUJIAN PROVINCE NETWORK
1.0.4.0/22      22      1.0.4.0-1.0.7.255               0x1001b2e9      0       80.0   -37.814000       144.963320      au      Victoria     Melbourne    Wirefreebroadband Pty Ltd
...
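If you want to generate or check such a file with a script, each data line can be split into the columns named in the header. The Python sketch below assumes that the columns are separated by tabs, which is not confirmed by the listing above (the Org, County and City values may contain spaces, so plain whitespace splitting would not work). The field names and the function parse_subnet_line are hypothetical.

```python
FIELDS = ("cidr", "mask", "ip_range", "code", "asn", "radius",
          "lat", "lng", "country", "county", "city", "org")


def parse_subnet_line(line: str):
    """Parse one data line of subnets4.txt; return None for comments and blank lines.

    ASSUMPTION: columns are tab-separated.
    """
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return None
    parts = [p.strip() for p in line.split("\t")]
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(parts)}")
    rec = dict(zip(FIELDS, parts))
    rec["mask"] = int(rec["mask"])
    rec["code"] = int(rec["code"], 16)  # CtryWhoCode, hex
    rec["asn"] = int(rec["asn"])
    for key in ("radius", "lat", "lng"):
        rec[key] = float(rec[key])
    return rec


if __name__ == "__main__":
    sample = "\t".join(["1.0.0.0/24", "24", "1.0.0.0-1.0.0.255", "0xe9000a24", "13335",
                        "80.0", "34.052230", "-118.243680", "us", "California",
                        "Los Angeles", "APNIC Research and Development"])
    print(parse_subnet_line(sample))
```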

You can now write your own subnet file or modify the original one. Make a copy of subnets4.txt first, so that you have an easy way to restore the default. Let’s define the 192.168.0.0/16 network a bit more precisely by adding two more lines describing the Knoedelrutschen company with one /24 and one /28 network:

...
192.168.0.0/16                      16   192.168.0.0-192.168.255.255      0x010136e0   0      -1.0    666.000000  666.000000   07    -    -     Private network
# Begin Knoedelrutschen company internal network
192.168.1.0/24                      24   192.168.1.0-192.168.1.255        0x010136e0   0      -1.0    666.000000  666.000000   eu    -    -     Knoedelrutschen Inc
# Begin Knoedelrutschen company internal sub networks
192.168.1.0/28                      28   192.168.1.0-192.168.1.15         0x010136e0   0       0.05   48.856892   2.350850     fr    -    -     KRI, Managers, Eifeltower, over paid
# End Knoedelrutschen company internal sub networks
198.18.0.0/15                       15   198.18.0.0-198.119.255.255       0x010136e0   0      -1.0    666.000000  666.000000   08    -    -     Private network
...
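When adding lines by hand, it is easy to mistype the Msk or IP range column. A small Python sketch such as the following derives both columns from the CIDR with the standard ipaddress module, so they cannot drift apart. The helper name cidr_columns is hypothetical, and the remaining columns still have to be filled in manually.

```python
import ipaddress


def cidr_columns(cidr: str) -> tuple:
    """Return (IPCIDR, Msk, IP range) for a subnet file line.

    strict=True rejects a CIDR whose host bits are set, e.g. '192.168.1.1/24'.
    """
    net = ipaddress.ip_network(cidr, strict=True)
    return cidr, net.prefixlen, f"{net.network_address}-{net.broadcast_address}"


if __name__ == "__main__":
    for cidr in ("192.168.1.0/24", "192.168.1.0/28"):
        print("\t".join(str(col) for col in cidr_columns(cidr)))
```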

Because autogen.sh decompresses subnets4.txt.bz2 and thus overwrites the subnet file, we first need to bzip2 our subnets4.txt and then build basicFlow with the -f option. For beginners, that is the easiest way to regenerate the binary and ship it to the ~/.tranalyzer/plugins/ folder. Then rerun t2 with the pcap.

$ bzip2 -cf subnets4.txt > subnets4.txt.bz2
$ t2build -f basicFlow

...
$ t2 -r ~/data/faf-exercise.pcap -w ~/results/
...
$

Now open the flow file and you will see your IP labeling.

tawk '{ print wildcard("^(src|dst)IP") }' ~/results/faf-exercise_flows.txt | sort -Vru -k1,1 | tcol

srcIP	        srcIPASN    srcIPCC srcIPWho			        srcIPLat_Lng_relP	dstIP		dstIPASN	dstIPCC	dstIPWho			dstIPLat_Lng_relP
198.189.255.75	0	    us	    "California State University"	666_666_-1		192.168.1.104	0		eu	"Knoedelrutschen Inc"		666_666_-1
192.168.1.105	0	    eu	    "Knoedelrutschen Inc"		666_666_-1		192.168.1.1	0		fr	"KRI, Managers, Eifeltower, "	48.85689_2.35085_0.05
192.168.1.104	0	    eu	    "Knoedelrutschen Inc"		666_666_-1		77.67.44.206	3257		fr	"GTT Communications Inc."	48.71667_2.25_80
192.168.1.103	0	    eu	    "Knoedelrutschen Inc"		666_666_-1		192.168.1.1	0		fr	"KRI, Managers, Eifeltower, "	48.85689_2.35085_0.05
192.168.1.102	0	    eu	    "Knoedelrutschen Inc"		666_666_-1		192.168.1.1	0		fr	"KRI, Managers, Eifeltower, "	48.85689_2.35085_0.05
192.168.1.1	0	    fr	    "KRI, Managers, Eifeltower, "	48.85689_2.35085_0.05	192.168.1.103	0		eu	"Knoedelrutschen Inc"		666_666_-1
143.166.11.10	0	    us	    "Dell"			        30.51748_-97.67207_80	192.168.1.105	0		eu	"Knoedelrutschen Inc"		666_666_-1
77.67.44.206	3257	    fr	    "GTT Communications Inc."	        48.71667_2.25_80	192.168.1.104	0		eu	"Knoedelrutschen Inc"		666_666_-1
63.245.221.11	395642	    us	    "Mozilla Corporation"		38.6409_-121.5228_80	192.168.1.104	0		eu	"Knoedelrutschen Inc"		666_666_-1

Since the engineering department is the most important part of a company, let’s extend the network definition with one more /26 network:

...
192.168.0.0/16                      16   192.168.0.0-192.168.255.255      0x010136e0   0      -1.0    666.000000  666.000000   07        Private network
# Begin Knoedelrutschen company internal network
192.168.1.0/24                      24   192.168.1.0-192.168.1.255        0x010136e0   0       1000.0 666.000000  666.000000   eu	 Knoedelrutschen Inc
# Begin Knoedelrutschen company internal sub networks
192.168.1.0/28                      28   192.168.1.0-192.168.1.15         0x010136e0   0       0.05   48.856892   2.350850     fr        KRI, Managers, Eifeltower, over paid
192.168.1.64/26                     26   192.168.1.64-192.168.1.127       0x010136e0   0       0.01   46.947990   7.459672     ch        Engineers, Bern, @bears
# End Knoedelrutschen company internal sub networks
198.18.0.0/15                       15   198.18.0.0-198.19.255.255        0x010136e0   0      -1.0    666.000000  666.000000   08        Private network
...

Compress with bzip2, rebuild basicFlow and rerun t2.

$ bzip2 -cf subnets4.txt > subnets4.txt.bz2
$ t2build -f basicFlow

...
$ t2 -r ~/data/faf-exercise.pcap -w ~/results/
$

Note that the engineers are now properly labeled. An address inside 192.168.1.0/24 but outside the managers and engineers networks would still be labeled as Knoedelrutschen Inc.

tawk '{ print wildcard("^(src|dst)IP") }' ~/results/faf-exercise_flows.txt | sort -Vru -k1,1 | tcol

srcIP	        srcIPASN    srcIPCC srcIPWho			    srcIPLat_Lng_relP	    dstIP	    dstIPASN	dstIPCC	dstIPWho			dstIPLat_Lng_relP
198.189.255.75	0	    us	    "California State University"   666_666_-1		    192.168.1.104   0		ch	"Engineers, Bern, @bears"	46.94799_7.459672_0.01
192.168.1.105	0	    ch	    "Engineers, Bern, @bears"	    46.94799_7.459672_0.01  192.168.1.1	    0		fr	"KRI, Managers, Eifeltower, "	48.85689_2.35085_0.05
192.168.1.104	0	    ch	    "Engineers, Bern, @bears"	    46.94799_7.459672_0.01  77.67.44.206    3257	fr	"GTT Communications Inc."	48.71667_2.25_80
192.168.1.103	0	    ch	    "Engineers, Bern, @bears"	    46.94799_7.459672_0.01  192.168.1.1	    0		fr	"KRI, Managers, Eifeltower, "	48.85689_2.35085_0.05
192.168.1.102	0	    ch	    "Engineers, Bern, @bears"	    46.94799_7.459672_0.01  192.168.1.1	    0		fr	"KRI, Managers, Eifeltower, "	48.85689_2.35085_0.05
192.168.1.1	0	    fr	    "KRI, Managers, Eifeltower, "   48.85689_2.35085_0.05   192.168.1.103   0		ch	"Engineers, Bern, @bears"	46.94799_7.459672_0.01
143.166.11.10	0	    us	    "Dell"			    30.51748_-97.67207_80   192.168.1.105   0		ch	"Engineers, Bern, @bears"	46.94799_7.459672_0.01
77.67.44.206	3257	    fr	    "GTT Communications Inc."	    48.71667_2.25_80	    192.168.1.104   0		ch	"Engineers, Bern, @bears"	46.94799_7.459672_0.01
63.245.221.11	395642	    us	    "Mozilla Corporation"	    38.6409_-121.5228_80    192.168.1.104   0		ch	"Engineers, Bern, @bears"	46.94799_7.459672_0.01
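
The labeling above behaves as if the most specific matching subnet wins. The following shell sketch illustrates that idea only. It is not basicFlow's actual search routine (that lives in the subnetHL[46].c files), and both the function name most_specific and the simplified "first-last<TAB>label" input format are assumptions made for this example.

```shell
# most_specific IP
# Hypothetical helper, not part of T2.
# Reads "first-last<TAB>label" lines from stdin and prints the label of the
# smallest range that contains IP, or "-" if no range matches.
most_specific() {
    awk -F'\t' -v ip="$1" '
        # Convert a dotted IPv4 address to a number
        function toint(a,   p) {
            split(a, p, ".")
            return ((p[1] * 256 + p[2]) * 256 + p[3]) * 256 + p[4]
        }
        BEGIN { n = toint(ip); best = "-"; size = -1 }
        {
            split($1, r, "-")
            lo = toint(r[1]); hi = toint(r[2])
            # Keep the match with the fewest addresses seen so far
            if (n >= lo && n <= hi && (size < 0 || hi - lo < size)) {
                size = hi - lo
                best = $2
            }
        }
        END { print best }
    '
}
```

With only the /24 and the engineers' /26 as input, an address outside the /26 falls back to the company label:

$ printf '192.168.1.0-192.168.1.255\tKnoedelrutschen Inc\n192.168.1.64-192.168.1.127\tEngineers\n' | most_specific 192.168.1.200
Knoedelrutschen Inc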

So far we have used CIDR mode; let’s now test range mode. Open utils.h and set SUBRNG to 1, or simply use the t2conf command below.

$ t2conf basicFlow -D SUBRNG=1
$

Now t2 uses the third column (the range) of the subnet file. Add a new network as listed below. If an entry has a dash in the CIDR column and CIDR mode is configured, the entry is ignored, because the range cannot be a single CIDR block. The column that is not in use (CIDR or range, depending on the mode) can hold any value; in CIDR mode, a non-CIDR range would have to be split into several CIDR rows. Here we clearly have a non-CIDR network and we are in range mode anyway. The SW and HW engineers are now separated.

...
192.168.0.0/16                      16   192.168.0.0-192.168.255.255      0x010136e0   0      -1.0    666.000000  666.000000   07        Private network
# Begin Knoedelrutschen company internal network
192.168.1.0/24                      24   192.168.1.0-192.168.1.255        0x010136e0   0       1000.0 666.000000  666.000000   eu	 Knoedelrutschen Inc
# Begin Knoedelrutschen company internal sub networks
192.168.1.0/28                      28   192.168.1.0-192.168.1.15         0x010136e0   0       0.05   48.856892   2.350850     fr        KRI, Managers, Eifeltower, over paid
192.168.1.0/28                      26   192.168.1.64-192.168.1.103       0x010136e0   0       0.01   46.947990   7.459672     ch        HW-Engineers, Bern, @bears
-                                   26   192.168.1.104-192.168.1.108      0x010136e0   0       0.01   46.947990   7.459672     ch        SW-Engineers, Bern, @bears
# End Knoedelrutschen company internal sub networks
198.18.0.0/15                       15   198.18.0.0-198.19.255.255        0x010136e0   0      -1.0    666.000000  666.000000   08        Private network
...
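
To see why a range such as 192.168.1.104-192.168.1.108 cannot go into the CIDR column, it helps to check whether a range is exactly one CIDR block: its size must be a power of two and its first address must be a multiple of that size. The helper below is a small sketch of that test. The name range_to_cidr is made up for this example and is not a T2 tool.

```shell
# range_to_cidr FIRST LAST
# Hypothetical helper, not part of T2.
# Prints the CIDR block that is exactly FIRST-LAST, or "-" if the range
# cannot be written as a single CIDR block.
range_to_cidr() {
    awk -v first="$1" -v last="$2" '
        # Convert a dotted IPv4 address to a number
        function toint(a,   p) {
            split(a, p, ".")
            return ((p[1] * 256 + p[2]) * 256 + p[3]) * 256 + p[4]
        }
        BEGIN {
            lo = toint(first)
            size = toint(last) - lo + 1
            # Find the smallest power of two s >= size and the matching prefix length
            bits = 32
            for (s = 1; s < size; s *= 2) bits--
            # A CIDR block has a power-of-two size and starts on a multiple of it
            if (size < 1 || s != size || lo % size != 0) print "-"
            else print first "/" bits
        }
    '
}
```

For example:

$ range_to_cidr 192.168.1.64 192.168.1.127
192.168.1.64/26
$ range_to_cidr 192.168.1.104 192.168.1.108
-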

Again, compress with bzip2, rebuild and rerun t2.

$ bzip2 -cf subnets4.txt > subnets4.txt.bz2
$ t2build -f basicFlow

...
$ t2 -r ~/data/faf-exercise.pcap -w ~/results/
$

If you look into the flow file, you will now see that there are also SW-Engineers:

tawk '{ print wildcard("^(src|dst)IP") }' ~/results/faf-exercise_flows.txt | sort -Vru -k1,1 | tcol

srcIP	        srcIPASN    srcIPCC srcIPWho			    srcIPLat_Lng_relP	    dstIP	    dstIPASN	dstIPCC	dstIPWho			dstIPLat_Lng_relP
198.189.255.75	0	    us	    "California State University"   666_666_-1		    192.168.1.104   0		ch	"SW-Engineers, Bern, @bears"	46.94799_7.459672_0.01
192.168.1.105	0	    ch	    "SW-Engineers, Bern, @bears"    46.94799_7.459672_0.01  192.168.1.1     0		fr	"KRI, Managers, Eifeltower, "	48.85689_2.35085_0.05
192.168.1.104	0	    ch	    "SW-Engineers, Bern, @bears"    46.94799_7.459672_0.01  77.67.44.206    3257	fr	"GTT Communications Inc."	48.71667_2.25_80
192.168.1.103	0	    ch	    "HW-Engineers, Bern, @bears"    46.94799_7.459672_0.01  192.168.1.1     0		fr	"KRI, Managers, Eifeltower, "	48.85689_2.35085_0.05
192.168.1.102	0	    ch	    "HW-Engineers, Bern, @bears"    46.94799_7.459672_0.01  192.168.1.1     0		fr	"KRI, Managers, Eifeltower, "	48.85689_2.35085_0.05
192.168.1.1	0	    fr	    "KRI, Managers, Eifeltower, "   48.85689_2.35085_0.05   192.168.1.103   0		ch	"HW-Engineers, Bern, @bears"	46.94799_7.459672_0.01
143.166.11.10	0	    us	    "Dell"			    30.51748_-97.67207_80   192.168.1.105   0		ch	"SW-Engineers, Bern, @bears"	46.94799_7.459672_0.01
77.67.44.206	3257	    fr	    "GTT Communications Inc."	    48.71667_2.25_80	    192.168.1.104   0		ch	"SW-Engineers, Bern, @bears"	46.94799_7.459672_0.01
63.245.221.11	395642	    us	    "Mozilla Corporation"	    38.6409_-121.5228_80    192.168.1.104   0		ch	"SW-Engineers, Bern, @bears"	46.94799_7.459672_0.01
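
Once the labels are in place, a quick summary per label is often more useful than scanning rows. The sketch below counts rows per distinct value of a named column using plain awk. It assumes the flow file is tab-separated with a single header line whose first field may start with '%'. The function name count_by_column is hypothetical and not part of T2.

```shell
# count_by_column NAME
# Hypothetical helper, not part of T2.
# Reads a tab-separated table with one header line from stdin and prints
# "count<TAB>value" for every distinct value found in column NAME,
# most frequent first.
count_by_column() {
    awk -F'\t' -v name="$1" '
        NR == 1 {
            sub(/^% */, "", $1)               # drop the header marker, if any
            for (i = 1; i <= NF; i++) if ($i == name) col = i
            if (!col) { print "column not found: " name > "/dev/stderr"; exit 1 }
            next
        }
        { count[$col]++ }
        END { for (v in count) print count[v] "\t" v }
    ' | sort -t "$(printf '\t')" -k1,1nr -k2,2
}
```

A possible call on the flow file from this tutorial:

$ count_by_column srcIPWho < ~/results/faf-exercise_flows.txt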

t2whois

Suppose you want to write your own subnet file or just test a few IPs without using whois or any other DB: t2whois lets you query the anteater DB. It is compiled alongside basicFlow. Try the following commands to get acquainted with t2whois.

t2whois -h

Usage:
    t2whois [OPTION...] [INPUT...]

Input
    -               If no input is provided, read from stdin
    ip              Read IP address(es) directly from the command line
    -r file         Read IP address(es) from 'file'

Optional arguments:
    -d file         Binary subnet file to use for IPv4
    -e file         Binary subnet file to use for IPv6

    -o field(s)     Field(s) to output (in order). Many fields can be selected
                    by using multiple '-o' options or by separating the fields
                    with a comma, e.g., -o field1,field2. Valid field names are
                    ip, netmask, net, mask, range, who, country, county, city,
                    asn, lat, lng, prec, id

    -q              Do not display an interactive prompt when reading from stdin

    -k file         Generate a KML 'file'

    -l              Output one line per IP
    -H              Do not output the header with -l option
    -t char         Start character(s) for column header (-l option) ["%"]

    -s char         Column separator for output ["\t"]

    -L              Describe the available fields and exit

    -V              Show info about the database (version, ...) and exit

    -h              Show help options and exit

t2whois 77.67.44.206 63.245.221.11

IP        	77.67.44.206
Network/Mask	77.67.0.0/17
Range     	77.67.0.0 - 77.67.127.255
Who       	GTT Communications Inc.
Country   	fr
ASN       	3257
Latitude  	48.716671
Longitude 	2.250000
Precision 	80.000000
NetID     	0x4b00a662


IP        	63.245.221.11
Network/Mask	63.245.208.0/20
Range     	63.245.208.0 - 63.245.223.255
Who       	Mozilla Corporation
Country   	us
ASN       	395642
Latitude  	38.640900
Longitude 	-121.522797
Precision 	80.000000
NetID     	0xe9011db5

t2whois -l 77.67.44.206 88.67.56.56

%IP           Network/Mask  Range                      Who                      Country  ASN   Latitude   Longitude  Precision  NetID
77.67.44.206  77.67.0.0/17  77.67.0.0 - 77.67.127.255  GTT Communications Inc.  fr       3257  48.716671  2.250000   80.000000  0x4b00a662
88.67.56.56   88.67.0.0/18  88.67.0.0 - 88.67.63.255   ARCOR AG                 de       3209  50.143280  8.571110   80.000000  0x39000ba9

For interactive mode, run t2whois without any input:

t2whois

[INF] Enter an IPv4/6 address, 'header', 'help' or 'quit' to exit

>>> help
The following commands are available:
    ip         get information about the IPv4/6 address 'ip'
    header     display the header when '-l' option was used
    help       show this help
    quit       exit the program
>>> 88.67.56.56

IP              88.67.56.56
Network/Mask    88.67.0.0/18
Range           88.67.0.0 - 88.67.63.255
Who             ARCOR AG
Country         de
ASN             3209
Latitude        50.143280
Longitude       8.571110
Precision       80.000000
NetID           0x39000ba9

>>>

And if you want to look up all public hosts in your flow file:

tawk '!privip($srcIP) { print $srcIP } !privip($dstIP) { print $dstIP }' ~/results/faf-exercise_flows.txt | sort -u | t2whois -l

%IP             Network/Mask     Range                          Who                          Country  ASN     Latitude    Longitude    Precision  NetID
143.166.11.10   143.166.0.0/16   143.166.0.0 - 143.166.255.255  Dell                         us       0       30.517477   -97.672066   80.000000  0xe90075f3
198.189.255.75  198.189.0.0/16   198.189.0.0 - 198.189.255.255  California State University  us       0       666.000000  666.000000   -1.000000  0xe9004dcc
63.245.221.11   63.245.208.0/20  63.245.208.0 - 63.245.223.255  Mozilla Corporation          us       395642  38.640900   -121.522797  80.000000  0xe9011db5
77.67.44.206    77.67.0.0/17     77.67.0.0 - 77.67.127.255      GTT Communications Inc.      fr       3257    48.716671   2.250000     80.000000  0x4b00a662

If you would like to select only certain columns, list the available fields with -L and pass the ones you want to -o:

t2whois -L

The fields available are:
	ip         	IP
	netmask    	Network/Mask
	net        	Network
	mask       	Mask
	range      	Range
	who        	Who
	loc        	Location
	asn        	ASN
	lat        	Latitude
	lng        	Longitude
	prec       	Precision
	id         	NetID

tawk -H '{ print $srcIP }' ~/results/faf-exercise_flows.txt | sort -u | t2whois -l -o ip,netmask,asn,country,who

%IP		Network/Mask	ASN	Country	Who
143.166.11.10	143.166.0.0/16	0	us	Dell
192.168.1.1	192.168.0.0/16	0	07	Private network
192.168.1.102	192.168.0.0/16	0	07	Private network
192.168.1.103	192.168.0.0/16	0	07	Private network
192.168.1.104	192.168.0.0/16	0	07	Private network
192.168.1.105	192.168.0.0/16	0	07	Private network
198.189.255.75	198.189.0.0/16	0	us	California State University
63.245.221.11	63.245.208.0/20	395642	us	Mozilla Corporation
77.67.44.206	77.67.0.0/17	3257	fr	GTT Communications Inc.

Let us finish this section with an example of the t2whois -k option, which generates a KML file.

$ tawk -H '{ print shost(); print dhost() }' ~/results/faf-exercise_flows.txt | t2whois -k ~/results/faf-exercise.kml
$ ls ~/results | grep -F .kml
faf-exercise.kml
$

The faf-exercise.kml file contains information about each IP (as specified with the t2whois -o option) and its location (latitude, longitude). This KML (Keyhole Markup Language) file can then be loaded in, e.g., Google Maps or Google Earth, which will display each IP at its recorded location.

Figures: t2whois can generate KML files to display the location of addresses, and additional information is also readily available.
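
To give an idea of what such a file looks like, the sketch below builds a minimal KML document from tab-separated "ip, lat, lng, who" lines. It is only an illustration of the format: the layout that t2whois -k actually writes is not shown in this tutorial and may differ. The function name whois_to_kml is hypothetical, and treating out-of-range coordinates such as 666 as "no position" is an assumption based on the outputs above.

```shell
# whois_to_kml
# Hypothetical helper, not part of T2.
# Reads tab-separated lines "ip<TAB>lat<TAB>lng<TAB>who" from stdin and
# prints a minimal KML document with one placemark per usable line.
whois_to_kml() {
    awk -F'\t' '
        # Escape the characters that are special in XML text
        function esc(s) {
            gsub(/&/, "\\&amp;", s)
            gsub(/</, "\\&lt;", s)
            gsub(/>/, "\\&gt;", s)
            return s
        }
        BEGIN {
            print "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            print "<kml xmlns=\"http://www.opengis.net/kml/2.2\"><Document>"
        }
        /^%/ { next }                                          # skip header lines
        $2 < -90 || $2 > 90 || $3 < -180 || $3 > 180 { next }  # no usable position (e.g. 666)
        {
            # KML expects longitude first, then latitude
            printf "<Placemark><name>%s</name><description>%s</description>", esc($1), esc($4)
            printf "<Point><coordinates>%s,%s</coordinates></Point></Placemark>\n", $3, $2
        }
        END { print "</Document></kml>" }
    '
}
```

It could be fed from t2whois using the -l and -o options described above (the output file name is arbitrary):

$ tawk -H '{ print shost(); print dhost() }' ~/results/faf-exercise_flows.txt | sort -u | t2whois -l -o ip,lat,lng,who | whois_to_kml > ~/results/sketch.kml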

You can also load your own binary subnet file using the -d (IPv4) or -e (IPv6) option. Try t2doc scripts for more documentation.

Right! This is all for now. And don’t forget to reset the configuration of basicFlow for the next tutorials:

$ t2conf basicFlow -D BFO_SUBNET_ASN=0 -D BFO_SUBNET_LL=0 -D BFO_SUBNET_HEX=0 -D SUBRNG=0
$ t2build -R
...
$

Have fun!