Tutorial: Geolocation and WHOIS behind it

Introduction

This tutorial details the different features of T2 concerning geolocation and the determination of the organization behind an IP address. There are two options:

basicFlow: T2 geolocation and organization
geoip: open source geolocation (GeoIP/MaxMind DB)

Note that the geoip DB is considerably slower than basicFlow.

Preparation

Before we start, we need to prepare T2. If you have not completed the previous tutorials, just follow the procedure described below.

First, restore T2 to a pristine state by removing all unnecessary or older plugins from the plugin folder ~/.tranalyzer/plugins, then compile the following plugins:

$ t2build -e
Are you sure you want to empty the plugin folder '/home/wurst/.tranalyzer/plugins' (y/N)? y
Plugin folder emptied
$ t2build -f tranalyzer2 basicFlow basicStats tcpStates connStat txtSink
...
BUILD SUCCESSFUL

If you have not created separate data and results directories yet, please do so now in another bash window; this facilitates your workflow:

$ mkdir ~/data ~/results
$ cd ~/data

The anonymized sample PCAP used in this tutorial can be downloaded here: faf-exercise.pcap. Please extract it into your data folder. Now you are all set for T2 IP labeling experiments.

basicFlow subnet and IP labeling

T2 provides its own geolabeling and IP identification service, so there is no longer any need to look up a MaxMind DB or to run whois on every IP address. The necessary files are updated with each version of T2. The bzip2-compressed subnet files for IPv4/6 are extracted by the autogen.sh script or by t2build, using the programs under utils/. We will look at them below.

$ basicFlow
$ ls
AUTHORS  autogen.sh  ChangeLog  configure.ac  COPYING  doc  Makefile.am  NEWS  README  src  subnets4.txt.bz2  subnets6.txt.bz2  t2plconf  tests  tor  utils
$

Now move to the src/ directory. The subnetHL[46].c files contain our binary-vector search algorithm. All .h files contain configuration constants.

$ cd src
$ ls
basicFlow.c  basicFlow.h  Makefile.am  subnetHL4.c  subnetHL4.h  subnetHL6.c  subnetHL6.h  utils.h
$
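The idea behind a binary search over subnet ranges can be sketched in a few lines of Python. This is only an illustration, not the subnetHL[46].c code: it assumes the subnet table has already been flattened into sorted, non-overlapping integer ranges, each carrying a label. The three entries are examples taken from the private address section of the subnet file shown later in this tutorial; the names RANGES and lookup() are hypothetical.

```python
import bisect
import ipaddress
from typing import Optional


def _ip(addr: str) -> int:
    """Convert a dotted IPv4 string to an integer."""
    return int(ipaddress.ip_address(addr))


# Hypothetical flattened table: sorted, non-overlapping (first, last, label)
# ranges. Nested subnets are assumed to have been split up beforehand.
RANGES = [
    (_ip("10.0.0.0"), _ip("10.255.255.255"), "Private network"),
    (_ip("127.0.0.0"), _ip("127.255.255.255"), "Loopback"),
    (_ip("192.168.0.0"), _ip("192.168.255.255"), "Private network"),
]
STARTS = [first for first, _, _ in RANGES]


def lookup(addr: str) -> Optional[str]:
    """Return the label of the range containing addr, or None, in O(log n)."""
    value = _ip(addr)
    # Index of the last range whose start is <= value
    i = bisect.bisect_right(STARTS, value) - 1
    if i >= 0 and value <= RANGES[i][1]:
        return RANGES[i][2]
    return None


print(lookup("192.168.1.104"))  # Private network
print(lookup("8.8.8.8"))        # None
```

Sorting once and bisecting on the range starts keeps every lookup logarithmic in the number of subnets, which matters when several hundred thousand ranges are loaded.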

Open basicFlow.h and look for the user-defined switches concerning subnets:

$ vi basicFlow.h

BFO_SUBNET_TEST activates the subnet labeling and is switched on by default. If the GRE, L2TP or TEREDO output switches (not shown here) are activated, subnet labeling can be activated separately for these addresses. We leave them off because the pcaps in this tutorial do not contain any of these encapsulations.

To stay close to the default geoip plugin output, we switch on the Autonomous System Numbers (ASN) and the latitude/longitude output (BFO_SUBNET_ASN and BFO_SUBNET_LL). BFO_SUBNET_HEX toggles between a human-readable WHOIS output and a hex-coded one, which can be a powerful selection mechanism when searching large flow files. We leave this option off for now.

Now open utils.h:

$ vi utils.h

The SUBRNG constant defines the search mode: either CIDR or ranges. The range mode has the advantage that any range can be defined by a single line, whereas CIDR notation may need many lines in the subnet file. We leave it at the default, CIDR.
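To see why a single range line can replace many CIDR lines, Python's standard ipaddress module can split an arbitrary address range into the minimal set of CIDR blocks that CIDR mode would need. The range 10.0.0.1-10.0.0.254 is an arbitrary example, not one taken from the T2 subnet file.

```python
import ipaddress

first = ipaddress.ip_address("10.0.0.1")
last = ipaddress.ip_address("10.0.0.254")

# Smallest set of CIDR blocks that exactly covers first..last
cidrs = [str(net) for net in ipaddress.summarize_address_range(first, last)]

print(len(cidrs))  # 14 CIDR lines versus a single range line
for net in cidrs:
    print(net)
```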

The WHOLEN, CNTYLEN and CTYLEN constants define the length of the WHOIS, County and City columns, respectively, in the basicFlow output. The latter two have been available since basicFlow 0.8.7 and are controlled by CNTYCTY, which is off by default. Hence, the generated binary subnet files do not contain this information, as we do not want to load information we do not use anyway.

SUBVERS defines the subnet version. Different versions are NOT compatible. t2build will warn you if there is a discrepancy. So leave it at the default value.

Save all open files and rebuild basicFlow, basicStats and connStat, because basicStats and connStat depend on the subnetHL[46].c routines if BFO_SUBNET_TEST is activated. You may also simply rebuild all plugins built so far, which is shorter to type. Instead of editing the files by hand, you can also use the t2conf command:

$ t2conf basicFlow -D BFO_SUBNET_ASN=1 -D BFO_SUBNET_LL=1
$ t2build -R
...

$ t2 -r ~/data/faf-exercise.pcap -w ~/results

================================================================================
Tranalyzer 0.8.7 (Anteater), Tarantula. PID: 12542
================================================================================
[INF] Creating flows for L2, IPv4, IPv6
Active plugins:
    01: basicFlow, 0.8.7
    02: basicStats, 0.8.7
    03: tcpStates, 0.8.7
    04: connStat, 0.8.7
    05: txtSink, 0.8.7
[INF] basicFlow: IPv4 Ver: 3, Rev: 01072019, Range Mode: 0, subnet ranges loaded: 312747 (312.75 K)
[INF] basicFlow: IPv6 Ver: 3, Rev: 01072019, Range Mode: 0, subnet ranges loaded: 21494 (21.49 K)
Processing file: /home/wurst/faf-exercise.pcap
Link layer type: Ethernet [EN10MB/1]
Dump start: 1258544215.037210 sec (Wed 18 Nov 2009 11:36:55 GMT)
Dump stop : 1258594491.683288 sec (Thu 19 Nov 2009 01:34:51 GMT)
Total dump duration: 50276.646078 sec (13h 57m 56s)
Finished processing. Elapsed time: 0.004831 sec
Finished unloading flow memory. Time: 0.004860 sec
Percentage completed: 100.00%
Number of processed packets: 5902 (5.90 K)
Number of processed bytes: 4993414 (4.99 M)
Number of raw bytes: 4993414 (4.99 M)
Number of pcap bytes: 5087870 (5.09 M)
Number of IPv4 packets: 5902 (5.90 K) [100.00%]
Number of A packets: 1986 (1.99 K) [33.65%]
Number of B packets: 3916 (3.92 K) [66.35%]
Number of A bytes: 209315 (209.31 K) [4.19%]
Number of B bytes: 4784099 (4.78 M) [95.81%]
Average A packet load: 105.40
Average B packet load: 1221.68 (1.22 K)
--------------------------------------------------------------------------------
basicStats: Biggest Talker: 143.166.11.10 (US): 3101 (3.10 K) [52.54%] packets
basicStats: Biggest Talker: 143.166.11.10 (US): 4436320 (4.44 M) [88.84%] bytes
tcpStates: Aggregated anomaly flags: 0x4a
connStat: Number of unique source IPs: 25
connStat: Number of unique destination IPs: 26
connStat: Number of unique source/destination IPs connections: 10
connStat: Max unique number of source IP / destination port connections: 18
connStat: IP prtcon/sdcon, prtcon/scon: 1.800000, 0.720000
connStat: Source IP with max connections: 192.168.1.104: 2 connections
connStat: Destination IP with max connections: 77.67.44.206 (FR): 1 connections
--------------------------------------------------------------------------------
Headers count: min: 3, max: 3, average: 3.00
Number of TCP packets: 5902 (5.90 K) [100.00%]
Number of TCP bytes: 4993414 (4.99 M) [100.00%]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Number of processed   flows: 72
Number of processed A flows: 36 [50.00%]
Number of processed B flows: 36 [50.00%]
Number of request     flows: 36 [50.00%]
Number of reply       flows: 36 [50.00%]
Total   A/B    flow asymmetry: 0.00
Total req/rply flow asymmetry: 0.00
Number of processed   packets/flows: 81.97
Number of processed A packets/flows: 55.17
Number of processed B packets/flows: 108.78
Number of processed total packets/s: 0.12
Number of processed A+B packets/s: 0.12
Number of processed A   packets/s: 0.04
Number of processed   B packets/s: 0.08
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Number of average processed flows/s: 0.00
Average full raw bandwidth: 795 b/s
Average full bandwidth : 792 b/s
Max number of flows in memory: 18 [0.01%]
Memory usage: 0.07 GB [0.11%]
Aggregate flow status: 0x0000000000004000
[INF] IPv4

Note that the biggest talkers and connectors are now labeled with their country code, if one is found.

Let’s print the essential columns of the flow file relevant to geolocation and whois.

tawk '{ print wildcard("^(src|dst)IP") }' ~/results/faf-exercise_flows.txt | sort -Vru -k1,1 | tcol

srcIP           srcIPASN  srcIPCC  srcIPWho                       srcIPLat_Lng_relP      dstIP          dstIPASN  dstIPCC  dstIPWho                   dstIPLat_Lng_relP
198.189.255.75  0         us       "California State University"  666_666_80             192.168.1.104  0         07       "Private network"          666_666_-1
192.168.1.105   0         07       "Private network"              666_666_-1             192.168.1.1    0         07       "Private network"          666_666_-1
192.168.1.104   0         07       "Private network"              666_666_-1             77.67.44.206   3257      us       "GTT Communications Inc."  42.3636_-71.08521_80
192.168.1.103   0         07       "Private network"              666_666_-1             192.168.1.1    0         07       "Private network"          666_666_-1
192.168.1.102   0         07       "Private network"              666_666_-1             192.168.1.1    0         07       "Private network"          666_666_-1
192.168.1.1     0         07       "Private network"              666_666_-1             192.168.1.103  0         07       "Private network"          666_666_-1
143.166.11.10   0         us       "Dell"                         30.51748_-97.67207_80  192.168.1.105  0         07       "Private network"          666_666_-1
77.67.44.206    3257      us       "GTT Communications Inc."      42.3636_-71.08521_80   192.168.1.104  0         07       "Private network"          666_666_-1
63.245.221.11   395642    us       "Mozilla Corporation"          38.6409_-121.5228_80   192.168.1.104  0         07       "Private network"          666_666_-1

A value of 666 for latitude and longitude means that no location is defined; for special-purpose ranges such as private networks, the radius is additionally set to -1. If you look into the subnets4.txt file, you can confirm the IPv4 labeling. We will look at these files in detail in the section "Internal WHOIS: subnet your own" below.
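If you post-process the flow file yourself, the 666 sentinel must be treated as missing data rather than as a coordinate. A minimal Python sketch, assuming the combined column has exactly the form latitude_longitude_relP seen in the tables above; the Location type and the function name are our own.

```python
from typing import NamedTuple, Optional

UNDEFINED = 666.0  # sentinel for "no location defined"


class Location(NamedTuple):
    lat: Optional[float]
    lng: Optional[float]
    rel_p: float  # radius; -1 for special-purpose ranges


def parse_lat_lng_relp(field: str) -> Location:
    """Split e.g. '42.3636_-71.08521_80' into (lat, lng, relP)."""
    lat_s, lng_s, rel_s = field.split("_")
    lat, lng = float(lat_s), float(lng_s)
    if lat == UNDEFINED or lng == UNDEFINED:
        # Keep the radius, but drop the fake coordinates
        return Location(None, None, float(rel_s))
    return Location(lat, lng, float(rel_s))


print(parse_lat_lng_relp("42.3636_-71.08521_80"))
print(parse_lat_lng_relp("666_666_-1"))
```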

If you want to add the county and city columns, set CNTYCTY to 1 and rebuild basicFlow with the -f option, because this information now has to be added to the binary subnet file, which is then copied into the plugin directory. In the rest of this tutorial we omit the county and city information to produce clearer output. If you want to add it, just follow the steps below:

$ t2conf basicFlow -D CNTYCTY=1
$ t2build -R -f
...
$ t2 -r ~/data/faf-exercise.pcap -w ~/results
...

tawk '{ print wildcard("^(src|dst)IP") }' ~/results/faf-exercise_flows.txt | sort -Vru -k1,1 | tcol

srcIP           srcIPASN  srcIPCC  srcIPCnty     srcIPCty      srcIPWho                       srcIPLat_Lng_relP      dstIP          dstIPASN  dstIPCC  dstIPCnty     dstIPCty     dstIPWho                   dstIPLat_Lng_relP
198.189.255.75  0         us       "ca"          "long beach"  "California State University"  666_666_80             192.168.1.104  0         07       "-"           "-"          "Private network"          666_666_-1
192.168.1.105   0         07       "-"           "-"           "Private network"              666_666_-1             192.168.1.1    0         07       "-"           "-"          "Private network"          666_666_-1
192.168.1.104   0         07       "-"           "-"           "Private network"              666_666_-1             77.67.44.206   3257      us       "Massachuse"  "Cambridge"  "GTT Communications Inc."  42.3636_-71.08521_80
192.168.1.103   0         07       "-"           "-"           "Private network"              666_666_-1             192.168.1.1    0         07       "-"           "-"          "Private network"          666_666_-1
192.168.1.102   0         07       "-"           "-"           "Private network"              666_666_-1             192.168.1.1    0         07       "-"           "-"          "Private network"          666_666_-1
192.168.1.1     0         07       "-"           "-"           "Private network"              666_666_-1             192.168.1.103  0         07       "-"           "-"          "Private network"          666_666_-1
143.166.11.10   0         us       "Texas"       "Round Rock"  "Dell"                         30.51748_-97.67207_80  192.168.1.105  0         07       "-"           "-"          "Private network"          666_666_-1
77.67.44.206    3257      us       "Massachuse"  "Cambridge"   "GTT Communications Inc."      42.3636_-71.08521_80   192.168.1.104  0         07       "-"           "-"          "Private network"          666_666_-1
63.245.221.11   395642    us       "California"  "Sacramento"  "Mozilla Corporation"          38.6409_-121.5228_80   192.168.1.104  0         07       "-"           "-"          "Private network"          666_666_-1

Because we do not like to waste memory, some of the columns are truncated, e.g., "Massachuse". To avoid this, increase the values of the CNTYLEN and CTYLEN constants in utils.h and redo the steps above.

TOR address labeling

By default, TOR addresses are integrated into the subnet file by the subconv script under basicFlow/utils/ when t2build or autogen.sh is invoked. You can switch this off by editing the autogen.sh file and removing the -t option of subconv. Below, a flow file containing TOR addresses is shown. Unfortunately, I currently do not have an anonymized pcap for you to play with, but I’m on it.

$ t2 -r ~/data/wurst.pcap -w ~/results
...
Aggregate flow status: 0x010038f2c098fb04
[WRN] L3 SnapLength < Length in IP header
[WRN] Consecutive duplicate IP ID
[WRN] IPv4/6 fragmentation header packet missing
[WRN] IPv4/6 packet fragmentation sequence not finished
[INF] IPv4
[INF] IPv6
[INF] IPv4/6 fragmentation
[INF] VLAN encapsulation
[INF] MPLS encapsulation
[INF] L2TP encapsulation
[INF] PPP/HDLC encapsulation
[INF] GRE encapsulation
[INF] AYIYA tunnel
[INF] Teredo tunnel
[INF] CAPWAP/LWAPP tunnel
[INF] Ethernet flows
[INF] Authentication Header (AH)
[INF] Encapsulating Security Payload (ESP)
[INF] TOR addresses

Note that the end report indicates that TOR addresses are present. In the flow file, the WHOIS labels of TOR addresses are prefixed with "TOR,". Alternatively, you can select all TOR traffic via the TORADD bit (0x0100000000000000) in flowStat, as shown below:

tawk 'bitsanyset($flowStat,0x0100000000000000) { print $dir, $flowInd, $flowStat, wildcard("^(src|dst)IP") }' ~/results/wurst_flows.txt | tcol

%dir  flowInd  flowStat            srcIP         srcIPASN  srcIPCC  srcIPWho                       srcIPLat_Lng_relP    dstIP         dstIPASN  dstIPCC  dstIPWho                       dstIPLat_Lng_relP
A     29388    0x0100000000004300  N.U.D.E       3303      ch       "Bluewin"                      46.20222_6.14569_80  L.O.L.U       8437      at       "TOR,Hutchison Drei Austria "  16.37208_48.20849_1
B     29388    0x0100000000004301  L.O.L.U       8437      at       "TOR,Hutchison Drei Austria "  16.37208_48.20849_1  N.U.D.E       3303      ch       "Bluewin"                      46.20222_6.14569_80
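The bitsanyset() test in the tawk command boils down to a bitwise AND on the flowStat value. A small Python sketch of the same check, using the TORADD bit from the command above and flowStat values from this tutorial; the function name is hypothetical.

```python
TORADD = 0x0100000000000000  # flowStat bit marking TOR addresses


def is_tor_flow(flow_stat: str) -> bool:
    """True if the hex flowStat string has the TORADD bit set."""
    return (int(flow_stat, 16) & TORADD) != 0


for fs in ("0x0100000000004300", "0x0100000000004301", "0x0000000000004000"):
    print(fs, is_tor_flow(fs))
```

The same test applied to the aggregate flow status in the end report (0x010038f2c098fb04 for wurst.pcap) also yields True, which is why the report lists "TOR addresses".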

geoip plugin

T2 supports the open source legacy GeoLite databases and the newer MaxMind GeoLite2 databases. Note that MaxMind has not provided any updates for its legacy GeoLite DBs since January 2019.

Now move to the geoip plugin folder and look around:

$ geoip
$ ls
AUTHORS  autogen.sh  ChangeLog  configure.ac  COPYING  doc  GeoLite2-City.mmdb.gz  GeoLiteCity.dat.gz  GeoLiteCityv6.dat.gz  Makefile.am  NEWS  README  scripts  src  t2plconf  tests
$

Note the legacy GeoIP DBs GeoLiteCity.dat.gz and GeoLiteCityv6.dat.gz as well as the MaxMind GeoLite2 DB GeoLite2-City.mmdb.gz. If you move into the scripts folder, you will see two scripts:

genkml.sh: map coordinates to Google Earth
updatedb.sh: update the DBs

The first converts a flow file into a Google Earth KML file, producing an earth view with the locations of the various IPs. The second updates the DBs. Run t2doc geoip for detailed information.
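To give an idea of what such a conversion involves, here is a minimal Python sketch that writes one KML Placemark per located IP. It is not genkml.sh, and the input list is hypothetical (values taken from the basicFlow output above). Keep in mind that KML expects coordinates in longitude,latitude order; IPs without a location (666) are skipped.

```python
from xml.sax.saxutils import escape

# Hypothetical input: (ip, label, latitude, longitude)
points = [
    ("77.67.44.206", "GTT Communications Inc.", 42.3636, -71.08521),
    ("63.245.221.11", "Mozilla Corporation", 38.6409, -121.5228),
    ("192.168.1.104", "Private network", 666.0, 666.0),  # no location
]


def to_kml(points) -> str:
    """Build a minimal KML document with one Placemark per located IP."""
    marks = []
    for ip, label, lat, lng in points:
        if lat == 666.0 or lng == 666.0:
            continue  # 666 means "no location defined"
        marks.append(
            "  <Placemark>\n"
            f"    <name>{escape(ip)}</name>\n"
            f"    <description>{escape(label)}</description>\n"
            # KML coordinate order is longitude,latitude
            f"    <Point><coordinates>{lng},{lat}</coordinates></Point>\n"
            "  </Placemark>"
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">\n<Document>\n'
        + "\n".join(marks)
        + "\n</Document>\n</kml>\n"
    )


print(to_kml(points))
```

Saving the printed output as a .kml file and opening it in Google Earth should show two pins.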

Now move to the src/ directory and look into the geoip.h file:

$ cd src
$ ls
geoip.c  geoip.h  Makefile.am
$ vi geoip.h

The most important setting is the type of DB (GEOIP_LEGACY). Since version 0.8.4, the default is the MaxMind GeoLite2 DB. As you can see, geolocation of the source and destination IPs can be enabled separately. The country, city, language, etc. output can also be configured. For this tutorial, we leave everything at the default configuration shown below:

...
// user defines
#define GEOIP_LEGACY     0 // Whether to use GeoLite2 (0) or the GeoLite legacy database (1)

#define GEOIP_SRC        1 // whether or not to display geo info for the source IP
#define GEOIP_DST        1 // whether or not to display geo info for the destination IP

#define GEOIP_CONTINENT  2 // 0: no continent, 1: name (GeoLite2), 2: two letters code
#define GEOIP_COUNTRY    2 // 0: no country, 1: name, 2: two letters code, 3: three letters code (Legacy)
#define GEOIP_CITY       1 // whether or not to display the city of the IP
#define GEOIP_POSTCODE   1 // whether or not to display the postal code of the IP
#define GEOIP_POSITION   1 // whether or not to display the position (latitude, longitude) of the IP
#define GEOIP_METRO_CODE 0 // whether or not to display the metro (dma) code of the IP (US only)

#if GEOIP_LEGACY == 0
#define GEOIP_ACCURACY   1    // whether or not to display the accuracy (GeoLite2)
#define GEOIP_TIMEZONE   1    // whether or not to display the time zone (GeoLite2)
#define GEOIP_LANG       "en" // Output language: en, de, fr, es, ja, pt-BR, ru, zh-CN, ...
#define GEOIP_BUFSIZE    64   // buffer size
#else // GEOIP_LEGACY == 1
#define GEOIP_REGION     1 // 0: no region,  1: name, 2: code
#define GEOIP_AREA_CODE  0 // whether or not to display the telephone area code of the IP
#define GEOIP_NETMASK    1 // 0: no netmask, 1: netmask as int (cidr),
                           // 2: netmask as hex (IPv4 only), 3: netmask as IP (IPv4 only)
#define GEOIP_DB_CACHE   2 // 0: read DB from file system (slower, least memory)
                           // 1: index cache (cache frequently used index only)
                           // 2: memory cache (faster, more memory)
#endif // GEOIP_LEGACY == 1

#define GEOIP_UNKNOWN    "--" // Representation of unknown locations (GeoIP's default)
...

Now compile the plugin and rerun T2 on the same pcap:

$ t2build geoip
...
$ t2 -r ~/data/faf-exercise.pcap -w ~/results/
...
$

To compare with the basicFlow output, let's print the corresponding columns:

tawk '{ print $srcIP, wildcard("^srcIp"), $dstIP, wildcard("^dstIp") }' ~/results/faf-exercise_flows.txt | sort -Vru -k1,1 | tcol

srcIP           srcIpContinent  srcIpCountry  srcIpCity        srcIpPostcode  srcIpAccuracy  srcIpLat   srcIpLong    srcIpTimeZone          dstIP          dstIpContinent  dstIpCountry  dstIpCity  dstIpPostcode  dstIpAccuracy  dstIpLat   dstIpLong  dstIpTimeZone
198.189.255.75  NA              US            "Long Beach"     90802          5              33.763000  -118.177400  "America/Los_Angeles"  192.168.1.104  --              --            "--"       --             0              0.000000   0.000000   ""
192.168.1.105   --              --            "--"             --             0              0.000000   0.000000     ""                     192.168.1.1    --              --            "--"       --             0              0.000000   0.000000   ""
192.168.1.104   --              --            "--"             --             0              0.000000   0.000000     ""                     77.67.44.206   EU              IE            "--"       --             200            53.347200  -6.243900  "Europe/Dublin"
192.168.1.103   --              --            "--"             --             0              0.000000   0.000000     ""                     192.168.1.1    --              --            "--"       --             0              0.000000   0.000000   ""
192.168.1.102   --              --            "--"             --             0              0.000000   0.000000     ""                     192.168.1.1    --              --            "--"       --             0              0.000000   0.000000   ""
192.168.1.1     --              --            "--"             --             0              0.000000   0.000000     ""                     192.168.1.103  --              --            "--"       --             0              0.000000   0.000000   ""
143.166.11.10   NA              US            "--"             --             1000           37.751000  -97.822000   "America/Chicago"      192.168.1.105  --              --            "--"       --             0              0.000000   0.000000   ""
77.67.44.206    EU              IE            "--"             --             200            53.347200  -6.243900    "Europe/Dublin"        192.168.1.104  --              --            "--"       --             0              0.000000   0.000000   ""
63.245.221.11   NA              US            "Mountain View"  94041          50             37.389300  -122.078300  "America/Los_Angeles"  192.168.1.104  --              --            "--"       --             0              0.000000   0.000000   ""

Hex code labeling

As mentioned above, T2 supports hex code labeling, which is a powerful flow selection mechanism, as integer AND operations are much faster than string comparisons. Open basicFlow.h and set BFO_SUBNET_HEX to 1 (or use t2conf), rebuild all plugins and rerun t2, as indicated below:

$ t2conf basicFlow -D BFO_SUBNET_HEX=1
$ t2build -R
...
$ t2 -r ~/data/faf-exercise.pcap -w ~/results/
...
$

The strings are now gone, replaced by 32-bit hex numbers in the srcIPCC and dstIPCC columns. Now you can select all flows of a certain country and/or organization with a simple tawk script. Let’s print all srcIP and dstIP columns to see what the output looks like now:

tawk '{ print wildcard("^(src|dst)IP") }' ~/results/faf-exercise_flows.txt | sort -Vru -k1,1 | tcol

srcIP           srcIPASN  srcIPCC     srcIPLat_Lng_relP      dstIP          dstIPASN  dstIPCC     dstIPLat_Lng_relP
198.189.255.75  0         0x808045ea  666_666_80             192.168.1.104  0         0x0401743a  666_666_-1
192.168.1.105   0         0x0401743a  666_666_-1             192.168.1.1    0         0x0401743a  666_666_-1
192.168.1.104   0         0x0401743a  666_666_-1             77.67.44.206   3257      0x8080bcc6  42.3636_-71.08521_80
192.168.1.103   0         0x0401743a  666_666_-1             192.168.1.1    0         0x0401743a  666_666_-1
192.168.1.102   0         0x0401743a  666_666_-1             192.168.1.1    0         0x0401743a  666_666_-1
192.168.1.1     0         0x0401743a  666_666_-1             192.168.1.103  0         0x0401743a  666_666_-1
143.166.11.10   0         0x8080787f  30.51748_-97.67207_80  192.168.1.105  0         0x0401743a  666_666_-1
77.67.44.206    3257      0x8080bcc6  42.3636_-71.08521_80   192.168.1.104  0         0x0401743a  666_666_-1
63.245.221.11   395642    0x808131eb  38.6409_-121.5228_80   192.168.1.104  0         0x0401743a  666_666_-1

The 32-bit code is structured as follows:

cccc cccc cTww wwww wwww wwww wwww wwww

c:	country code (9 bits)
T:	TOR notification bit (1 bit)
w:	WHOIS code (22 bits)

The mapping from codes to text can be found in the following files under basicFlow/utils:

  • whoCntryCds.txt
  • whoOrgCds.txt
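The bit layout above translates directly into three masks. The Python sketch below splits a code into its fields. The mask values follow from the layout (and match the 0xff800000 country mask and the 0x00400000 TOR bit used further down), whereas the constant and function names are our own. Turning the numeric fields back into text still requires the code files listed above.

```python
from typing import Tuple

COUNTRY_MASK = 0xFF800000  # 9 country bits (c)
TOR_BIT = 0x00400000       # 1 TOR bit (T)
WHOIS_MASK = 0x003FFFFF    # 22 WHOIS bits (w)


def decode(code: int) -> Tuple[int, bool, int]:
    """Split a 32-bit srcIPCC/dstIPCC code into (country, tor, whois)."""
    country = (code & COUNTRY_MASK) >> 23
    tor = bool(code & TOR_BIT)
    whois = code & WHOIS_MASK
    return country, tor, whois


for code in (0x808045EA, 0x0EC0CADF, 0x1D003835):
    c, t, w = decode(code)
    print(f"0x{code:08x}: country=0x{c:03x} tor={t} whois=0x{w:06x}")
```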

Let’s select all flows originating from any organization in the USA. According to whoCntryCds.txt, the US code is 0x80800000, so we keep only the country bits (mask 0xff800000) and compare:

tawk 'and(strtonum($srcIPCC), 0xff800000) == 0x80800000 || hdr() { print wildcard("^(src|dst)IP") }' ~/results/faf-exercise_flows.txt | sort -Vru -k1,1 | tcol

srcIP           srcIPASN  srcIPCC     srcIPLat_Lng_relP      dstIP          dstIPASN  dstIPCC     dstIPLat_Lng_relP
198.189.255.75  0         0x808045ea  666_666_80             192.168.1.104  0         0x0401743a  666_666_-1
143.166.11.10   0         0x8080787f  30.51748_-97.67207_80  192.168.1.105  0         0x0401743a  666_666_-1
77.67.44.206    3257      0x8080bcc6  42.3636_-71.08521_80   192.168.1.104  0         0x0401743a  666_666_-1
63.245.221.11   395642    0x808131eb  38.6409_-121.5228_80   192.168.1.104  0         0x0401743a  666_666_-1

In srcIPCC and dstIPCC, the bit 0x00400000 indicates a TOR address. Alternatively, you can select TOR flows simply via the flowStat bit 0x0100000000000000, as shown below for traffic I generated on my computer:

tawk 'bitsanyset($flowStat, 0x0100000000000000) { print $dir, $flowInd, $flowStat, wildcard("^(src|dst)IP") }' ~/results/wurst_flows.txt | tcol

%dir  flowInd  flowStat            srcIP     srcIPASN  srcIPCC     srcIPLat_Lng_relP    dstIP     dstIPASN  dstIPCC     dstIPLat_Lng_relP
A     29388    0x0100000000004300  K.A.C.K   3303      0x1d003835  46.20222_6.14569_80  S.H.I.T   8437      0x0ec0cadf  16.37208_48.20849_1
B     29388    0x0100000000004301  S.H.I.T   8437      0x0ec0cadf  16.37208_48.20849_1  K.A.C.K   3303      0x1d003835  46.20222_6.14569_80

As mentioned above, the code 0x0ec0cadf (the dstIP of the A flow and the srcIP of the B flow) has the TOR bit set, so it is a TOR address, and the whole flow is labeled as TOR in flowStat. As homework, try to select all TOR flows in faf-exercise.pcap using srcIPCC. Are there any?

Internal WHOIS: subnet your own

Which admin has never asked himself WHO, WHERE and WHY the fuck somebody is doing what he is doing, or how to locate an in-house IP such as 10.23.4.5? Yeah, I did, lots of times, and got weary of looking things up in Excel sheets, logs or, if I was lucky, DBs. Now try doing that for 1000 addresses and handing over a report in no time.

As your private IPv4/6 address space is hopefully only documented inside your organization, we need to build our own subnet file. Building one is fairly easy if the mapping from IP to location and organization is available as a tab-separated or CSV file. So that you can extend or rewrite the current subnet files, T2 ships with the .txt versions together with scripts that convert them to the T2-compatible binary format. That is why the initial build of basicFlow takes a bit longer.

Let’s now look at the basicFlow directory after the plugin has been compiled. The subnets[46]_HL.txt and subnets[46]_HLP.txt files are intermediate files used to produce the binary subnets[46]_HLP.bin files. The original is the decompressed subnets[46].txt file, which contains all the information.

$ ls
aclocal.m4  autom4te.cache  config.h     config.status  COPYING  libtool   Makefile.am  README    subnets4_HLP.bin  subnets4.txt      subnets6_HLP.txt  subnets6.txt.bz2  tor
AUTHORS     build-aux       config.h.in  configure      doc      m4        Makefile.in  src       subnets4_HLP.txt  subnets4.txt.bz2  subnets6_HL.txt   t2plconf          utils
autogen.sh  ChangeLog       config.log   configure.ac   INSTALL  Makefile  NEWS         stamp-h1  subnets4_HL.txt   subnets6_HLP.bin  subnets6.txt      tests
$

Open subnets4.txt (the IPv6 file is built in a similar fashion):

lsx subnets4.txt

#                                   4    01112019                                                                                                                
# IPCIDR                            Msk  IPrange                          CtryWhoCode  ASN    Uncert  Latitude    Longitude    Country  County      City         Org
# Begin IPv4 private address space                                                                                                                               
0.0.0.0/32                          32   0.0.0.0-0.0.0.0                  0x0081f466   0      -1.0    666.000000  666.000000   00       -           -            Unspecified
10.0.0.0/8                          8    10.0.0.0-10.255.255.255          0x0281743a   0      -1.0    666.000000  666.000000   04       -           -            Private network
127.0.0.0/8                         8    127.0.0.0-127.255.255.255        0x010114eb   0      -1.0    666.000000  666.000000   01       -           -            Loopback
100.64.0.0/10                       10   100.64.0.0-100.127.255.255       0x0601a495   0      -1.0    666.000000  666.000000   20       -           -            Shared address space
169.254.0.0/16                      16   169.254.0.0-169.254.255.255      0x01810f71   0      -1.0    666.000000  666.000000   02       -           -            Link-local
172.16.0.0/12                       12   172.16.0.0-172.31.255.255        0x0301743a   0      -1.0    666.000000  666.000000   05       -           -            Private network
192.0.0.0/24                        24   192.0.0.0-192.0.0.255            0x0381743a   0      -1.0    666.000000  666.000000   06       -           -            Private network
192.0.2.0/24                        24   192.0.2.0-192.0.2.255            0x0681d621   0      -1.0    666.000000  666.000000   21       -           -            TEST-NET-1
192.88.99.0/24                      24   192.88.99.0-192.88.99.255        0x0a00ea60   0      -1.0    666.000000  666.000000   60       -           -            IPv6 to IPv4 relay
192.168.0.0/16                      16   192.168.0.0-192.168.255.255      0x0401743a   0      -1.0    666.000000  666.000000   07       -           -            Private network
198.18.0.0/15                       15   198.18.0.0-198.119.255.255       0x0481743a   0      -1.0    666.000000  666.000000   08       -           -            Private network
198.51.100.0/16                     16   198.51.100.0-198.51.100.255      0x0701d622   0      -1.0    666.000000  666.000000   22       -           -            TEST-NET-2
203.0.113.0/24                      24   203.0.113.0-203.0.113.255        0x0781d623   0      -1.0    666.000000  666.000000   23       -           -            TEST-NET-3
224.0.0.0/4                         4    224.0.0.0-239.255.255.255        0x050133db   0      -1.0    666.000000  666.000000   10       -           -            Multicast
240.0.0.0/4                         4    240.0.0.0-255.255.255.254        0x08018937   0      -1.0    666.000000  666.000000   24       -           -            Reserved
255.255.255.255/32                  32   255.255.255.255-255.255.255.255  0x05803f25   0      -1.0    666.000000  666.000000   11       -           -            Broadcast
# End IPv4 private address space                                                                                                                                 
1.0.0.0/24                          24   1.0.0.0-1.0.0.255                0x80801aa6   13335  80.0    34.052230   -118.243680  us       California  Los Angeles  APNIC Research and Development
1.0.1.0/24                          24   1.0.1.0-1.0.1.255                0x248051b7   0      80.0    26.061390   119.306110   cn       Fujian      Fuzhou       CHINANET FUJIAN PROVINCE NETWORK
1.0.4.0/22                          22   1.0.4.0-1.0.7.255                0x14020f97   0      80.0    -37.814000  144.963320   au       Victoria    Melbourne    Wirefreebroadband Pty Ltd
1.0.8.0/21                          21   1.0.8.0-1.0.15.255               0x248051b9   0      80.0    23.116670   113.250000   cn       Guangdong   Guangzhou    CHINANET Guangdong province network
1.0.16.0/24                         24   1.0.16.0-1.0.16.255              0x4500cf68   0      80.0    35.689506   139.691700   jp       Tokyo       Tokyo        i2ts
1.0.32.0/19                         19   1.0.32.0-1.0.63.255              0x248051b9   0      80.0    23.116670   113.250000   cn       Guangdong   Guangzhou    CHINANET Guangdong province network
...

You can now write your own subnet file or modify the original one; in the latter case, make a copy of subnets4.txt first so that you can easily restore the default. Let’s define the 192.168.0.0/16 network a bit more precisely by adding two subnet lines describing the Knoedelrutschen company: one /24 and one /28 network:

...
192.168.0.0/16                      16   192.168.0.0-192.168.255.255      0x010136e0   0      -1.0    666.000000  666.000000   07    -    -     Private network
# Begin Knoedelrutschen company internal network
192.168.1.0/24                      24   192.168.1.0-192.168.1.255        0x010136e0   0      -1.0    666.000000  666.000000   eu    -    -     Knoedelrutschen Inc
# Begin Knoedelrutschen company internal sub networks
192.168.1.0/28                      28   192.168.1.0-192.168.1.15         0x010136e0   0       0.05   48.856892   2.350850     fr    -    -     KRI, Managers, Eifeltower, over paid
# End Knoedelrutschen company internal sub networks
198.18.0.0/15                       15   198.18.0.0-198.119.255.255       0x010136e0   0      -1.0    666.000000  666.000000   08    -    -     Private network
...
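When adding lines by hand, the Msk and IPrange columns must agree with the CIDR in the first column. If your inventory is available as a CSV file, a small script can derive these columns for you. The Python sketch below is a hypothetical helper: it assumes a CSV layout of (cidr, CtryWhoCode, country, organization), writes lines in the column order of the subnets4.txt header shown above, assumes tab-separated columns (check this against the shipped file), and fills ASN, uncertainty, position, county and city with the placeholder values used for the private ranges.

```python
import csv
import io
import ipaddress

# Hypothetical inventory: cidr, CtryWhoCode, country, organization
INVENTORY = """\
192.168.1.0/24,0x010136e0,eu,Knoedelrutschen Inc
192.168.1.0/28,0x010136e0,fr,"KRI, Managers"
"""


def subnet_line(cidr: str, code: str, country: str, org: str) -> str:
    """Build one tab-separated subnets4.txt-style line from a CIDR."""
    # strict=True rejects CIDRs with host bits set, e.g. 192.168.1.5/28
    net = ipaddress.ip_network(cidr, strict=True)
    ip_range = f"{net.network_address}-{net.broadcast_address}"
    fields = [
        str(net), str(net.prefixlen), ip_range, code,
        "0",                         # ASN (unknown)
        "-1.0",                      # uncertainty, as for private ranges
        "666.000000", "666.000000",  # latitude, longitude: undefined
        country, "-", "-",           # country, county, city
        org,
    ]
    return "\t".join(fields)


for row in csv.reader(io.StringIO(INVENTORY)):
    print(subnet_line(*row))
```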

Because autogen.sh decompresses subnets4.txt.bz2 and thus overwrites the subnet file, you first need to bzip2 your subnets4.txt and then build basicFlow with the -f option. For beginners, this is the easiest way to rebuild the binary and ship it to the ~/.tranalyzer/plugins/ folder. Then rerun t2 on the pcap:

$ bzip2 -cf subnets4.txt > subnets4.txt.bz2
$ t2build -f basicFlow
...
$ t2 -r ~/data/faf-exercise.pcap -w ~/results/
...
$
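The compression step can also be scripted, for instance to keep the shipped archive as a backup before it is overwritten (the copy recommended above). The sketch below uses Python's standard bz2 and shutil modules; the function name backup_and_compress and the '.bz2.orig' backup name are hypothetical choices, not something t2build knows about.

```python
import bz2
import shutil
from pathlib import Path


def backup_and_compress(txt_path: str) -> Path:
    """Recompress an edited subnet file, like `bzip2 -cf subnets4.txt > subnets4.txt.bz2`.

    Before the first overwrite, the existing .bz2 is copied to a hypothetical
    '.bz2.orig' backup so that the shipped default can be restored later.
    """
    src = Path(txt_path)
    dst = src.with_name(src.name + ".bz2")
    backup = src.with_name(src.name + ".bz2.orig")
    if dst.exists() and not backup.exists():
        shutil.copy2(dst, backup)  # keep the default archive only once
    data = src.read_bytes()
    dst.write_bytes(bz2.compress(data))
    # Sanity check: the new archive must decompress to exactly the edited file.
    if bz2.decompress(dst.read_bytes()) != data:
        raise RuntimeError(f"round-trip check failed for {dst}")
    return dst
```

Calling backup_and_compress("subnets4.txt") would replace the bzip2 command; t2build -f basicFlow is still needed afterwards.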

Now open the flow file to see your IP labels:

tawk '{ print wildcard("^(src|dst)IP") }' ~/results/faf-exercise_flows.txt | sort -Vru -k1,1 | tcol

srcIP	        srcIPASN    srcIPCC srcIPWho			        srcIPLat_Lng_relP	dstIP		dstIPASN	dstIPCC	dstIPWho			dstIPLat_Lng_relP
198.189.255.75	0	    us	    "California State University"	666_666_-1		192.168.1.104	0		eu	"Knoedelrutschen Inc"		666_666_-1
192.168.1.105	0	    eu	    "Knoedelrutschen Inc"		666_666_-1		192.168.1.1	0		fr	"KRI, Managers, Eifeltower, "	48.85689_2.35085_0.05
192.168.1.104	0	    eu	    "Knoedelrutschen Inc"		666_666_-1		77.67.44.206	3257		fr	"GTT Communications Inc."	42.3636_-71.08521_80	
192.168.1.103	0	    eu	    "Knoedelrutschen Inc"		666_666_-1		192.168.1.1	0		fr	"KRI, Managers, Eifeltower, "	48.85689_2.35085_0.05
192.168.1.102	0	    eu	    "Knoedelrutschen Inc"		666_666_-1		192.168.1.1	0		fr	"KRI, Managers, Eifeltower, "	48.85689_2.35085_0.05
192.168.1.1	0	    fr	    "KRI, Managers, Eifeltower, "	48.85689_2.35085_0.05	192.168.1.103	0		eu	"Knoedelrutschen Inc"		666_666_-1
143.166.11.10	0	    us	    "Dell"			        30.51748_-97.67207_80	192.168.1.105	0		eu	"Knoedelrutschen Inc"		666_666_-1
77.67.44.206	3257	    fr	    "GTT Communications Inc."	        42.3636_-71.08521_80	192.168.1.104	0		eu	"Knoedelrutschen Inc"		666_666_-1
63.245.221.11	395642	    us	    "Mozilla Corporation"		38.6409_-121.5228_80	192.168.1.104	0		eu	"Knoedelrutschen Inc"		666_666_-1

As the most important part of a company is the engineering department, let’s extend the network definition with one more /26 network:

...
192.168.0.0/16                      16   192.168.0.0-192.168.255.255      0x010136e0   0      -1.0    666.000000  666.000000   07        Private network
# Begin Knoedelrutschen company internal network
192.168.1.0/24                      24   192.168.1.0-192.168.1.255        0x010136e0   0       1000.0 666.000000  666.000000   eu	 Knoedelrutschen Inc
# Begin Knoedelrutschen company internal sub networks
192.168.1.0/28                      28   192.168.1.0-192.168.1.15         0x010136e0   0       1.5    48.856892   2.350850     fr        KRI, Managers, Eifeltower, over paid
192.168.1.64/26                     26   192.168.1.64-192.168.1.127       0x010136e0   0       0.01   46.947990   7.459672     ch        Engineers, Bern, @bears
# End Knoedelrutschen company internal sub networks
198.18.0.0/15                       15   198.18.0.0-198.19.255.255        0x010136e0   0      -1.0    666.000000  666.000000   08        Private network
...

Compress the file with bzip2, rebuild basicFlow and rerun t2:

$ bzip2 -cf subnets4.txt > subnets4.txt.bz2
$ t2build -f basicFlow
...
$ t2 -r ~/data/faf-exercise.pcap -w ~/results/
$

Note that the engineers are now properly labeled. An address inside 192.168.1.0/24 but outside the managers' and engineers' networks would still be labeled as Knoedelrutschen Inc.

tawk '{ print wildcard("^(src|dst)IP") }' ~/results/faf-exercise_flows.txt | sort -Vru -k1,1 | tcol

srcIP	        srcIPASN    srcIPCC srcIPWho			    srcIPLat_Lng_relP	    dstIP	    dstIPASN	dstIPCC	dstIPWho			dstIPLat_Lng_relP
198.189.255.75	0	    us	    "California State University"   666_666_-1		    192.168.1.104   0		ch	"Engineers, Bern, @bears"	46.94799_7.459672_0.01
192.168.1.105	0	    ch	    "Engineers, Bern, @bears"	    46.94799_7.459672_0.01  192.168.1.1	    0		fr	"KRI, Managers, Eifeltower, "	48.85689_2.35085_0.05
192.168.1.104	0	    ch	    "Engineers, Bern, @bears"	    46.94799_7.459672_0.01  77.67.44.206    3257	fr	"GTT Communications Inc."	42.3636_-71.08521_80
192.168.1.103	0	    ch	    "Engineers, Bern, @bears"	    46.94799_7.459672_0.01  192.168.1.1	    0		fr	"KRI, Managers, Eifeltower, "	48.85689_2.35085_0.05
192.168.1.102	0	    ch	    "Engineers, Bern, @bears"	    46.94799_7.459672_0.01  192.168.1.1	    0		fr	"KRI, Managers, Eifeltower, "	48.85689_2.35085_0.05
192.168.1.1	0	    fr	    "KRI, Managers, Eifeltower, "   48.85689_2.35085_0.05   192.168.1.103   0		ch	"Engineers, Bern, @bears"	46.94799_7.459672_0.01
143.166.11.10	0	    us	    "Dell"			    30.51748_-97.67207_80   192.168.1.105   0		ch	"Engineers, Bern, @bears"	46.94799_7.459672_0.01
77.67.44.206	3257	    fr	    "GTT Communications Inc."	    42.3636_-71.08521_80    192.168.1.104   0		ch	"Engineers, Bern, @bears"	46.94799_7.459672_0.01
63.245.221.11	395642	    us	    "Mozilla Corporation"	    38.6409_-121.5228_80    192.168.1.104   0		ch	"Engineers, Bern, @bears"	46.94799_7.459672_0.01
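The labels above appear to follow a most-specific-match rule: 192.168.1.104 lies in the /16, the /24 and the /26, and the /26 label wins. T2 itself performs the lookup with the binary-vector search in subnetHL4.c; the sketch below swaps in a plain linear longest-prefix scan over a hypothetical in-memory copy of the rows, purely to illustrate the rule.

```python
import ipaddress
from typing import Optional

# Hypothetical in-memory copy of the rows above: (CIDR, label).
SUBNETS = [
    ("192.168.0.0/16", "Private network"),
    ("192.168.1.0/24", "Knoedelrutschen Inc"),
    ("192.168.1.0/28", "KRI, Managers, Eifeltower, over paid"),
    ("192.168.1.64/26", "Engineers, Bern, @bears"),
]

NETWORKS = [(ipaddress.ip_network(cidr), label) for cidr, label in SUBNETS]


def lookup(ip: str) -> Optional[str]:
    """Return the label of the most specific (longest-prefix) network containing ip."""
    addr = ipaddress.ip_address(ip)
    best = None
    for net, label in NETWORKS:
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, label)
    return best[1] if best else None
```

A linear scan is fine for a handful of rows; a real subnet file with hundreds of thousands of entries calls for a sorted or tree-based search.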

So far we have been using the CIDR mode; let’s now test the range mode. Open utils.h and set SUBRNG to 1, or simply use the t2conf command below:

$ t2conf basicFlow -D SUBRNG=1
$

Now t2 uses the third column (the range) of the subnet file. Replace the engineers' /26 row with the two rows listed below. If CIDR mode is configured and an entry has a dash in the CIDR column, the entry is ignored, as its range is definitely not a CIDR network. Otherwise, the column that is not used by the current mode may hold any value, since a non-CIDR range can only be expressed as several CIDR rows. Here we clearly have non-CIDR networks, and we are in range mode anyway. The SW and HW engineers are now separated.

...
192.168.0.0/16                      16   192.168.0.0-192.168.255.255      0x010136e0   0      -1.0    666.000000  666.000000   07        Private network
# Begin Knoedelrutschen company internal network
192.168.1.0/24                      24   192.168.1.0-192.168.1.255        0x010136e0   0       1000.0 666.000000  666.000000   eu        Knoedelrutschen Inc
# Begin Knoedelrutschen company internal sub networks
192.168.1.0/28                      28   192.168.1.0-192.168.1.15         0x010136e0   0       1.5    48.856892   2.350850     fr        KRI, Managers, Eifeltower, over paid
192.168.1.0/28                      26   192.168.1.64-192.168.1.103       0x010136e0   0       0.01   46.947990   7.459672     ch        HW-Engineers, Bern, @bears
-                                   26   192.168.1.104-192.168.1.108      0x010136e0   0       0.01   46.947990   7.459672     ch        SW-Engineers, Bern, @bears
# End Knoedelrutschen company internal sub networks
198.18.0.0/15                       15   198.18.0.0-198.19.255.255        0x010136e0   0      -1.0    666.000000  666.000000   08        Private network
...
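To see why a non-CIDR range would need several CIDR rows, Python's standard ipaddress.summarize_address_range() splits an inclusive range into the minimal list of CIDR blocks. The sketch applies it to the two engineer ranges, reading the SW-Engineers range as 192.168.1.104-192.168.1.108; range_to_cidrs is a hypothetical wrapper.

```python
import ipaddress
from typing import List


def range_to_cidrs(first: str, last: str) -> List[str]:
    """Split an inclusive first-last address range into the minimal list of CIDR blocks."""
    nets = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last))
    return [str(net) for net in nets]


print(range_to_cidrs("192.168.1.64", "192.168.1.103"))   # ['192.168.1.64/27', '192.168.1.96/29']
print(range_to_cidrs("192.168.1.104", "192.168.1.108"))  # ['192.168.1.104/30', '192.168.1.108/32']
```

Each range needs two CIDR rows, whereas range mode describes it with a single row.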

So again bzip2, rebuild and rerun t2:

$ bzip2 -cf subnets4.txt > subnets4.txt.bz2
$ t2build -f basicFlow
...
$ t2 -r ~/data/faf-exercise.pcap -w ~/results/
$

If you look at the flow file, you will now see that the engineers are split into HW-Engineers and SW-Engineers:

tawk '{ print wildcard("^(src|dst)IP") }' ~/results/faf-exercise_flows.txt | sort -Vru -k1,1 | tcol

srcIP	        srcIPASN    srcIPCC srcIPWho			    srcIPLat_Lng_relP	    dstIP	    dstIPASN	dstIPCC	dstIPWho			dstIPLat_Lng_relP
198.189.255.75	0	    us	    "California State University"   666_666_-1		    192.168.1.104   0		ch	"SW-Engineers, Bern, @bears"	46.94799_7.459672_0.01
192.168.1.105	0	    ch	    "SW-Engineers, Bern, @bears"    46.94799_7.459672_0.01  192.168.1.1     0		fr	"KRI, Managers, Eifeltower, "	48.85689_2.35085_0.05
192.168.1.104	0	    ch	    "SW-Engineers, Bern, @bears"    46.94799_7.459672_0.01  77.67.44.206    3257	fr	"GTT Communications Inc."	42.3636_-71.08521_80
192.168.1.103	0	    ch	    "HW-Engineers, Bern, @bears"    46.94799_7.459672_0.01  192.168.1.1     0		fr	"KRI, Managers, Eifeltower, "	48.85689_2.35085_0.05
192.168.1.102	0	    ch	    "HW-Engineers, Bern, @bears"    46.94799_7.459672_0.01  192.168.1.1     0		fr	"KRI, Managers, Eifeltower, "	48.85689_2.35085_0.05
192.168.1.1	0	    fr	    "KRI, Managers, Eifeltower, "   48.85689_2.35085_0.05   192.168.1.103   0		ch	"HW-Engineers, Bern, @bears"	46.94799_7.459672_0.01
143.166.11.10	0	    us	    "Dell"			    30.51748_-97.67207_80   192.168.1.105   0		ch	"SW-Engineers, Bern, @bears"	46.94799_7.459672_0.01
77.67.44.206	3257	    fr	    "GTT Communications Inc."	    42.3636_-71.08521_80    192.168.1.104   0		ch	"SW-Engineers, Bern, @bears"	46.94799_7.459672_0.01
63.245.221.11	395642	    us	    "Mozilla Corporation"	    38.6409_-121.5228_80    192.168.1.104   0		ch	"SW-Engineers, Bern, @bears"	46.94799_7.459672_0.01
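The difference between the two modes can be sketched in Python as follows. This is an illustration under assumptions, not basicFlow's code: it assumes, as the output above suggests, that the narrowest matching entry wins, and that in CIDR mode rows with '-' in the CIDR column are dropped. The rows are copied from the range-mode excerpt, with the /24 range written as 192.168.1.0-192.168.1.255 and the SW-Engineers range read as 192.168.1.104-192.168.1.108; build_table and lookup are hypothetical names, and only IPv4 is handled.

```python
import ipaddress
from typing import List, Optional, Tuple

# (CIDR column, range column, label), copied from the range-mode rows above.
ROWS = [
    ("192.168.0.0/16", "192.168.0.0-192.168.255.255", "Private network"),
    ("192.168.1.0/24", "192.168.1.0-192.168.1.255", "Knoedelrutschen Inc"),
    ("192.168.1.0/28", "192.168.1.0-192.168.1.15", "KRI, Managers, Eifeltower, over paid"),
    ("192.168.1.0/28", "192.168.1.64-192.168.1.103", "HW-Engineers, Bern, @bears"),
    ("-", "192.168.1.104-192.168.1.108", "SW-Engineers, Bern, @bears"),
]

Interval = Tuple[int, int, str]


def build_table(rows, range_mode: bool) -> List[Interval]:
    """Convert rows into (first, last, label) integer intervals for the chosen mode."""
    table = []
    for cidr, rng, label in rows:
        if range_mode:
            first, last = (int(ipaddress.ip_address(a)) for a in rng.split("-"))
        elif cidr == "-":
            continue  # not a CIDR network: ignored in CIDR mode
        else:
            net = ipaddress.ip_network(cidr)
            first, last = int(net.network_address), int(net.broadcast_address)
        table.append((first, last, label))
    return table


def lookup(table: List[Interval], ip: str) -> Optional[str]:
    """Return the label of the narrowest interval containing ip (first one on ties)."""
    x = int(ipaddress.ip_address(ip))
    hits = [(last - first, label) for first, last, label in table if first <= x <= last]
    return min(hits, key=lambda h: h[0])[1] if hits else None
```

In range mode, 192.168.1.103 maps to HW-Engineers and 192.168.1.104 to SW-Engineers, matching the flow file; in CIDR mode the SW row is skipped and 192.168.1.104 falls back to Knoedelrutschen Inc.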

t2whois

Suppose you want to write your own subnet file, or just test a few IPs without using whois or any other DB: t2whois lets you query the anteater DB. It is compiled alongside basicFlow. Try the following commands to get acquainted with t2whois:

t2whois -h

Usage:
    t2whois [OPTION...] [INPUT...]

Input
    -               If no input is provided, read from stdin
    ip              Read IP address(es) directly from the command line
    -r file         Read IP address(es) from 'file'

Optional arguments:
    -d file         Binary subnet file to use for IPv4
    -e file         Binary subnet file to use for IPv6

    -o field(s)     Field(s) to output (in order). Many fields can be selected
                    by using multiple '-o' options or by separating the fields
                    with a comma, e.g., -o field1,field2. Valid field names are
                    ip, netmask, net, mask, range, who, country, county, city,
                    asn, lat, lng, prec, id

    -q              Do not display an interactive prompt when reading from stdin

    -k file         Generate a KML 'file'

    -l              Output one line per IP
    -H              Do not output the header with -l option
    -t char         Start character(s) for column header (-l option) ["%"]

    -s char         Column separator for output ["\t"]

    -L              Describe the available fields and exit

    -V              Show info about the database (version, ...) and exit

    -h              Show help options and exit

t2whois 77.67.44.206 63.245.221.11

IP        	77.67.44.206
Network/Mask	77.67.0.0/17
Range     	77.67.0.0 - 77.67.127.255
Who       	GTT Communications Inc.
Country   	us
County    	Massachuse
City      	Cambridge
ASN       	3257
Latitude  	42.363598
Longitude 	-71.085205
Precision 	80.000000
NetID     	0x8080bcc6


IP        	63.245.221.11
Network/Mask	63.245.208.0/20
Range     	63.245.208.0 - 63.245.223.255
Who       	Mozilla Corporation
Country   	us
County    	California
City      	Sacramento
ASN       	395642
Latitude  	38.640900
Longitude 	-121.522797
Precision 	80.000000
NetID     	0x808131eb

t2whois -l 77.67.44.206 63.245.221.11

%IP            Network/Mask     Range                          Who                      Country  County      City        ASN     Latitude   Longitude    Precision  NetID
77.67.44.206   77.67.0.0/17     77.67.0.0 - 77.67.127.255      GTT Communications Inc.  us       Massachuse  Cambridge   3257    42.363598  -71.085205   80.000000  0x8080bcc6
63.245.221.11  63.245.208.0/20  63.245.208.0 - 63.245.223.255  Mozilla Corporation      us       California  Sacramento  395642  38.640900  -121.522797  80.000000  0x808131eb
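Since the -l output has one row per IP, it is easy to post-process. A minimal Python sketch, assuming the default column separator (-s, a tab) and header prefix (-t, "%") from the help text above, and that the header was not suppressed with -H; parse_t2whois_lines is a hypothetical name.

```python
from typing import Dict, List


def parse_t2whois_lines(text: str, sep: str = "\t", hdr: str = "%") -> List[Dict[str, str]]:
    """Parse `t2whois -l` output into one dict per IP, keyed by column header."""
    rows: List[Dict[str, str]] = []
    header = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if header is None:
            if not line.startswith(hdr):
                raise ValueError("missing header line (was -H used?)")
            header = [h.strip() for h in line[len(hdr):].split(sep)]
            continue
        values = [v.strip() for v in line.split(sep)]
        rows.append(dict(zip(header, values)))
    return rows
```

The text could come, for example, from subprocess.run(["t2whois", "-l", ...], capture_output=True, text=True).stdout.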

For the interactive mode, run t2whois without any argument:

t2whois

[INF] Enter an IPv4/6 address, 'header', 'help' or 'quit' to exit

>>> help
The following commands are available:
    ip         get information about the IPv4/6 address 'ip'
    header     display the header when '-l' option was used
    help       show this help
    quit       exit the program
>>> 88.67.56.56

IP        	88.67.56.56
Network/Mask	88.67.0.0/18
Range     	88.67.0.0 - 88.67.63.255
Who       	ARCOR AG
Country   	de
County    	Hessen
City      	Eschborn
ASN       	3209
Latitude  	50.143280
Longitude 	8.571110
Precision 	80.000000
NetID     	0x28801c7f
>>>

And if you want to look up all public hosts in your flow file:

tawk '!privip($srcIP) { print $srcIP } !privip($dstIP) { print $dstIP }' ~/results/faf-exercise_flows.txt | sort -u | t2whois -l

%IP             Network/Mask     Range                          Who                          Country  County      City        ASN     Latitude    Longitude    Precision  NetID
143.166.11.10   143.166.0.0/16   143.166.0.0 - 143.166.255.255  Dell                         us       Texas       Round Rock  0       30.517477   -97.672066   80.000000  0x8080787f
198.189.255.75  198.189.0.0/16   198.189.0.0 - 198.189.255.255  California State University  us       ca          long beach  0       666.000000  666.000000   80.000000  0x808045ea
63.245.221.11   63.245.208.0/20  63.245.208.0 - 63.245.223.255  Mozilla Corporation          us       California  Sacramento  395642  38.640900   -121.522797  80.000000  0x808131eb
77.67.44.206    77.67.0.0/17     77.67.0.0 - 77.67.127.255      GTT Communications Inc.      us       Massachuse  Cambridge   3257    42.363598   -71.085205   80.000000  0x8080bcc6
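The same selection can be reproduced outside tawk, e.g., to feed another tool. In this sketch Python's ipaddress is_private property stands in for tawk's privip(); the two are not guaranteed to classify every special-purpose range identically. public_hosts is a hypothetical helper that takes (srcIP, dstIP) pairs already extracted from the flow file.

```python
import ipaddress
from typing import Iterable, List, Tuple


def public_hosts(pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Collect unique non-private IPs from (srcIP, dstIP) pairs.

    The result is sorted byte-wise, like `sort -u` in the C locale.
    """
    hosts = set()
    for src, dst in pairs:
        for ip in (src, dst):
            if not ipaddress.ip_address(ip).is_private:
                hosts.add(ip)
    return sorted(hosts)
```

Applied to the address pairs from the flow file, it yields the same four public hosts, in the same order, as listed above.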

If you want to select only certain columns, first list the available fields with -L, then pass the ones you need to -o:

t2whois -L

The fields available are:
	ip         	IP
	netmask    	Network/Mask
	net        	Network
	mask       	Mask
	range      	Range
	who        	Who
	loc        	Location
	asn        	ASN
	lat        	Latitude
	lng        	Longitude
	prec       	Precision
	id         	NetID

tawk -H '{ print $srcIP }' ~/results/faf-exercise_flows.txt | sort -u | t2whois -l -o ip,netmask,asn,country,who

%IP		Network/Mask	ASN	Country	Who
143.166.11.10	143.166.0.0/16	0	us	Dell
192.168.1.1	192.168.0.0/16	0	07	Private network
192.168.1.102	192.168.0.0/16	0	07	Private network
192.168.1.103	192.168.0.0/16	0	07	Private network
192.168.1.104	192.168.0.0/16	0	07	Private network
192.168.1.105	192.168.0.0/16	0	07	Private network
198.189.255.75	198.189.0.0/16	0	us	California State University
63.245.221.11	63.245.208.0/20	395642	us	Mozilla Corporation
77.67.44.206	77.67.0.0/17	3257	fr	GTT Communications Inc.

Let us finish this section with an example of the t2whois -k option, which generates a KML file.

$ tawk -H '{ print shost(); print dhost() }' ~/results/faf-exercise_flows.txt | t2whois -k ~/results/faf-exercise.kml
$ ls ~/results | grep -F .kml
faf-exercise.kml
$

The faf-exercise.kml file contains information about each IP (as specified with the t2whois -o option) and its location (latitude, longitude). This KML (Keyhole Markup Language) file can then be loaded into, e.g., Google Maps or Google Earth, which will display each IP at its geolocated position.

(Figure: a KML file generated by t2whois, displaying each address at its location together with additional information.)
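To illustrate what such a file contains, here is a minimal KML document built with Python's standard xml.etree.ElementTree. It is not t2whois's own output (which fields t2whois writes is controlled by -o); to_kml is a hypothetical helper, and skipping 666 coordinates is an assumption based on the 666.000000 values used for private networks in the subnet file. Note that KML lists longitude before latitude.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

KML_NS = "http://www.opengis.net/kml/2.2"


def to_kml(points: Iterable[Tuple[str, str, float, float]]) -> str:
    """Build a minimal KML document with one Placemark per (ip, who, lat, lng)."""
    ET.register_namespace("", KML_NS)  # serialize KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for ip, who, lat, lng in points:
        if lat == 666.0 or lng == 666.0:  # assumed "unknown location" placeholder
            continue
        pm = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(pm, f"{{{KML_NS}}}name").text = ip
        ET.SubElement(pm, f"{{{KML_NS}}}description").text = who
        point = ET.SubElement(pm, f"{{{KML_NS}}}Point")
        # KML coordinates are "longitude,latitude[,altitude]".
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lng},{lat}"
    return ET.tostring(kml, encoding="unicode")
```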

You can also use your own binary subnet files with the -d (IPv4) and -e (IPv6) options. Try t2doc scripts for more documentation.

Right! This is all for now. And don’t forget to reset the configuration of basicFlow for the next tutorials:

$ t2conf basicFlow -D BFO_SUBNET_ASN=0 -D BFO_SUBNET_LL=0 -D BFO_SUBNET_HEX=0 -D SUBRNG=0 -D CNTYCTY=0
$ t2build -R -f
...
$

Have fun!