Tutorial: Plugin geo labeling

Since version 0.8.8, all subnet processing is provided by the core. This means that every plugin can now profit from its services, and dependencies on other plugins, e.g. basicFlow, are no longer necessary. Thus, it has become easier to write your own geo plugins, which is what we will do in this tutorial. An introduction to IPv4/6 flow processing is included as well.

But first, delete all unnecessary plugins from the plugin folder ~/.tranalyzer/plugins and compile only tranalyzer2, basicFlow and txtSink.

$ t2build -e
Are you sure you want to empty the plugin folder '/home/wurst/.tranalyzer/plugins' (y/N)? y
Plugin folder emptied
$ t2build tranalyzer2 basicFlow txtSink
...

If you have not read the previous tutorials, here is the basic plugin which we will extend: tcpGeoWin. Download it, unpack it and move it to your ~/tranalyzer2/plugins directory.

The anonymized sample pcap can be downloaded here: annoloc2.pcap. Please extract it under your data folder ~/data; create the folder first if it does not exist yet. Now you are all set for subnet programming.

The core subnet processing

As subnet processing is now part of the core, the flow and packet structures each provide an index into the global IPv4/6 subnet table.

$ tranalyzer2
$ vi src/flow.h

So in your program you access the subnet index for the source and destination address via the flow structure:

  • flowP->subnetNrSrc
  • flowP->subnetNrDst

The same is accessible in the packet structure, see <----

In your program, e.g. at the claimLayer4Information() callback, you access the index via the packet structure:

  • packet->subnetNrSrc
  • packet->subnetNrDst

That’s it. So how do I convert this ominous index to viable subnet information? Easy, use the following macros:

Your code must react to the following core switches in order to avoid compiler errors or unexpected results if you reconfigure the Tranalyzer core.

Enabling switches reside in tranalyzer.h.

  • IPV6_ACTIVATE: Controls the focus on IPv4, IPv6 or both
  • SUBNET_ON: Enables the subnet functions in the core
  • AGGREGATIONFLAG: Controls the aggregation modes of flows. The code SUBNET=0x80 also enables the subnet functions, like SUBNET_ON.
  • SUBNET_INIT: Switch for plugin programmers enabling the subnet functions. A logical OR of SUBNET_ON and SUBNET.

We leave everything at default.
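How these switches combine can be sketched with the preprocessor alone. The values below are stand-ins picked for illustration (only SUBNET=0x80 is taken from the text above); the real definitions live in tranalyzer.h and may differ. The two helper functions are hypothetical and only report which branch was compiled.

```c
#include <stdbool.h>

/* Stand-in values for illustration only; the real ones are set in tranalyzer.h. */
#define SUBNET_ON       1       /* assumed: subnet functions enabled in the core */
#define SUBNET          0x80    /* aggregation code named in the text */
#define AGGREGATIONFLAG 0x00    /* assumed: no aggregation mode selected */

/* Mirrors the description of SUBNET_INIT: SUBNET_ON or the SUBNET aggregation bit. */
#define SUBNET_INIT (SUBNET_ON || (AGGREGATIONFLAG & SUBNET))

/* Hypothetical helper: true if per-flow geo labeling code would be compiled in,
 * i.e. subnets are active and there is exactly one IP per flow. */
bool geo_labeling_enabled(void) {
#if SUBNET_ON != 0 && (AGGREGATIONFLAG & SUBNET) == 0
    return true;
#else
    return false;
#endif
}

/* Hypothetical helper: true if any subnet functionality is available to a plugin. */
bool subnet_available(void) {
#if SUBNET_INIT != 0
    return true;
#else
    return false;
#endif
}
```

With the stand-in values both functions return true. Setting AGGREGATIONFLAG to 0x80 keeps subnet_available() true but turns geo_labeling_enabled() off, which is the case the plugin code has to guard against.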

An important part for a programmer is the set of macros T2 supplies to facilitate subnet tests and information extraction. They also shield the user from the global table pointers defined in the core main.c:

The definition of the IPv4/6 structs is located at the end of subnetHL.h in the utils folder. Note that CNTYCTY is switched off by default, saving space in the binary subnet file.

The said macros are defined in the middle part of subnetHL.h. Nevertheless, you may use the global pointers to access any info using the supplied index. Your choice.
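To illustrate the idea of a macro that hides the table pointers, here is a minimal stand-alone sketch. Everything prefixed with demo or DEMO is hypothetical: the row layout, the table contents, the pointer names and the macro are not the ones from subnetHL.h, and treating index 0 as the "unknown" entry is an assumption.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified table row; the real structs are in subnetHL.h. */
typedef struct {
    const char *loc;  /* country code */
    const char *org;  /* organisation */
} demo_subnet_t;

/* Made-up tables; index 0 is assumed to mean "no match". */
static const demo_subnet_t demoTable4[] = {
    { "--", "unknown" },
    { "us", "Example Org" },
    { "jp", "Example University" },
};
static const demo_subnet_t demoTable6[] = {
    { "--", "unknown" },
    { "ch", "Example Net" },
};

/* Stand-ins for the global table pointers owned by the core. */
static const demo_subnet_t * const demoTable4P = demoTable4;
static const demo_subnet_t * const demoTable6P = demoTable6;

/* Hypothetical macros: pick the table by IP version, then index it. */
#define DEMO_SUBNET_LOC(dest, ipver, idx) \
    do { \
        (dest) = ((ipver) == 6) ? demoTable6P[(idx)].loc : demoTable4P[(idx)].loc; \
    } while (0)

#define DEMO_SUBNET_ORG(dest, ipver, idx) \
    do { \
        (dest) = ((ipver) == 6) ? demoTable6P[(idx)].org : demoTable4P[(idx)].org; \
    } while (0)
```

A caller only needs the IP version and the index, e.g. `const char *loc; DEMO_SUBNET_LOC(loc, 4, 2);`. Going through a macro keeps plugin code independent of which table pointers exist in a given configuration.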

That is all you need to know for now. Let’s implement geo stuff now.

Implementing IPv4/6 switches in onFlowTerminate()

Open tcpGeoWin.c and scroll to onFlowTerminate(). The preceding callbacks are the same as in the previous tutorials.

We want to count all packets whose window size is below TCPWIN_THRES and store that count in the gwz structure if it is higher than the count previously stored for the same IP address.

As indicated by <-----, flowP->subnetNrSrc contains the subnet index for the specific IP address, provided by the core. That is all you need to access. Easy, eh?

We store it in gwz.sID[i] for later processing. The macro FLOW_IPVER, used in the listing below, derives the IP version from the flow status. We need it in order to store the IPv4/6 address appropriately. The switch IPV6_ACTIVATE > 0 covers the different modes of Tranalyzer: IPv4 only, IPv6 only, or both. The same applies to the storage of the IPs below.

The switch SUBNET_ON != 0 && (AGGREGATIONFLAG & SUBNET) == 0 activates the code only if the core processes subnets and no aggregation mode is on, as we only want the standard case here, where we have only one IP per flow. Yes, I could produce a simpler switch for that; I will do so later.

Note the T2_CMP_FLOW_IP() macro which simplifies IPv4/6 comparison.

$ tcpGeoWin
$ vi src/tcpGeoWin.c
...
void onFlowTerminate(unsigned long flowIndex) {
    const flow_t * const flowP = &flows[flowIndex]; // <--
    tcpWinFlow_t * const tcpWinFlowP = &tcpWinFlows[flowIndex];
    float f = 0;

    if (tcpWinFlowP->pktTcpCnt) {
        f = (float)tcpWinFlowP->winThCnt/(float)tcpWinFlowP->pktTcpCnt; // produce a useful relative number
        pktTcpCnt += tcpWinFlowP->pktTcpCnt;
    }

    if (tcpWinFlowP->winThCnt && tcpWinFlowP->pktTcpCnt >= TCPWIN_MINPKTS) {
        const int wzi = gwz.wzi;

#if SUBNET_ON != 0 && (AGGREGATIONFLAG & SUBNET) == 0                   // compile only if SUBNET core is on and no aggregation mode
        if (wzi < TCPWIN_MAXWSCNT) { // If array full, stop saving
            int i;
            const uint_fast8_t ipver = FLOW_IPVER(flowP);
            for (i = 0; i < wzi; i++) {
                if (T2_CMP_FLOW_IP(gwz.wzip[i].addr, flowP->srcIP, ipver)) break; // compare whether IP exists
            }

            if (tcpWinFlowP->winThCnt > gwz.wzCnt[i]) {                 // only update if count is greater than the previous one
                gwz.tcpCnt[i] = tcpWinFlowP->pktTcpCnt;                 // save tcp packet count
                gwz.wzCnt[i] = tcpWinFlowP->winThCnt;                   // save below-threshold count
                if (i == wzi) {
                    gwz.wzip[i].ver = ipver;                            // save IP ver
#if IPV6_ACTIVATE > 0
                    gwz.wzip[i].addr = flowP->srcIP;                    // save IPv4/6
#else // IPV6_ACTIVATE == 0
                    gwz.wzip[i].addr.IPv4 = flowP->srcIP.IPv4;          // save IPv4
#endif // IPV6_ACTIVATE
                    gwz.sID[i] = flowP->subnetNrSrc;                    // save subnetID from core
                    gwz.wzi++;                                          // increment global window size counter
                }
            }
        }
#endif // SUBNET_ON != 0 && (AGGREGATIONFLAG & SUBNET) == 0             // compile only if SUBNET core is on and no aggregation mode
    }

    if (tcpWinFlowP->stat) { // update the global vars
        winStatG |= tcpWinFlowP->stat;
        winThCntG += tcpWinFlowP->winThCnt;
    }

#if BLOCK_BUF == 0
    OUTBUF_APPEND_U8(main_output_buffer, tcpWinFlowP->stat);
    OUTBUF_APPEND_U32(main_output_buffer, tcpWinFlowP->winThCnt);
    OUTBUF_APPEND_U32(main_output_buffer, tcpWinFlowP->tcpWinInit);
    OUTBUF_APPEND_FLT(main_output_buffer, f);
#endif // BLOCK_BUF == 0

}

One flow variable access and you have all subnet info you need.
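The bookkeeping performed in onFlowTerminate() can also be looked at in isolation. The following sketch is not the plugin code: the struct, the capacity constant and the function name are hypothetical, it handles IPv4 only, and, unlike the listing above, it still updates existing entries once the table is full.

```c
#include <stdint.h>

#define DEMO_MAXCNT 4  /* hypothetical capacity, plays the role of TCPWIN_MAXWSCNT */

/* Hypothetical stand-in for the gwz structure. */
typedef struct {
    uint32_t ip[DEMO_MAXCNT];   /* IPv4 only, to keep the sketch short */
    uint32_t cnt[DEMO_MAXCNT];  /* highest below-threshold count seen per IP */
    uint32_t sID[DEMO_MAXCNT];  /* subnet index delivered with the flow */
    int n;                      /* number of used slots */
} demo_tab_t;

/* Record (ip, cnt, subnetID).
 * Returns the slot index, or -1 if the IP is new and the table is full. */
int demo_update(demo_tab_t *t, uint32_t ip, uint32_t cnt, uint32_t subnetID) {
    int i;
    for (i = 0; i < t->n; i++) {
        if (t->ip[i] == ip) break;        /* IP already known */
    }

    if (i == t->n) {                      /* new IP: append if there is room */
        if (t->n >= DEMO_MAXCNT) return -1;
        t->ip[i] = ip;
        t->sID[i] = subnetID;             /* subnet index is stored once per IP */
        t->cnt[i] = cnt;
        t->n++;
        return i;
    }

    if (cnt > t->cnt[i]) t->cnt[i] = cnt; /* keep only the highest count */
    return i;
}
```

Starting from a zeroed table (`demo_tab_t t = {0};`), repeated calls for the same IP keep a single slot holding the largest count, while the subnet index never has to be looked up again.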

The configuration is located in tcpGeoWin.h. Most of it we already know from earlier tutorials. New is the subnet ID sID in the gwz_t struct. Look for the <-----

Adding subnet info to the summary file

After the free call for the tcpWinFlows struct we open the file TCPWIN_FNSUP. As above, we output subnet info only if subnets are activated and standard flow aggregation is used. The macro T2_IP_TO_STR converts IPv4/6 addresses to human-readable strings. SUBNET_LOC and SUBNET_ORG select the country code and the organisation given the stored subnet index. That is all you need, and you are done. Look for <-----

void onApplicationTerminate() {

    free(tcpWinFlows); // free the tcpWin Flows

    // open TCPWIN statistics file
    FILE *fp;
    int i, ipver;
    char srcIP[INET6_ADDRSTRLEN];

    fp = t2_open_file(baseFileName, TCPWIN_FNSUP, "w");
    if (UNLIKELY(!fp)) { // if file cannot be opened print warning and return;
        T2_PWRN("tcpWin", "Failed to open file: %s", TCPWIN_FNSUP);
        return;
    }

#if SUBNET_ON != 0 && (AGGREGATIONFLAG & SUBNET) == 0             // compile only if SUBNET core is on and no aggregation mode
    fprintf(fp, "# IP\tCntry\tOrg\twinTcpCnt\twinRelThCnt\n"); // print header
#else // many IP's / flow
    fprintf(fp, "# IP\tpktTcpCnt\twinRelThCnt\n"); // print header
#endif // SUBNET_ON != 0 && (AGGREGATIONFLAG & SUBNET) == 0       // compile only if SUBNET core is on and no aggregation mode
    for (i = 0; i < gwz.wzi; i++) {
        ipver = gwz.wzip[i].ver;
        T2_IP_TO_STR(gwz.wzip[i].addr, ipver, srcIP, INET6_ADDRSTRLEN);    // transfer IP to string
#if SUBNET_ON != 0 && (AGGREGATIONFLAG & SUBNET) == 0             // compile only if SUBNET core is on and no aggregation mode
        char *loc, *org;
        SUBNET_LOC(loc, ipver, gwz.sID[i]);         // <---- get country for IP
        SUBNET_ORG(org, ipver, gwz.sID[i]);         // <---- get organization for IP
        fprintf(fp, "%s\t%s\t%s\t%"PRIu32"\t%f\n", srcIP, loc, org, gwz.wzCnt[i], (float)gwz.wzCnt[i]/gwz.tcpCnt[i]); // print in file
#else // many IP's / flow
        fprintf(fp, "%s\t%"PRIu32"\t%f\n", srcIP, gwz.wzCnt[i], (float)gwz.wzCnt[i]/gwz.tcpCnt[i]); // print in file
#endif // SUBNET_ON != 0 && (AGGREGATIONFLAG & SUBNET) == 0
    }

    fclose(fp);
}

Two macros and you are subnet ready.
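If you are curious what an address-to-string conversion of this kind involves, the POSIX function inet_ntop() does the job for both IP versions. The helper below is a hypothetical stand-alone sketch; it is not the definition of T2_IP_TO_STR, and whether the macro is built on inet_ntop() is not claimed here.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: format a raw address of the given IP version.
 * addr points to 4 bytes (IPv4) or 16 bytes (IPv6) in network byte order.
 * dst should hold at least INET6_ADDRSTRLEN bytes.
 * Returns dst on success, NULL on failure. */
const char *demo_ip_to_str(const void *addr, int ipver, char *dst, socklen_t len) {
    const int af = (ipver == 6) ? AF_INET6 : AF_INET;
    return inet_ntop(af, addr, dst, len);
}
```

A buffer of INET6_ADDRSTRLEN bytes, as used for srcIP in the listing above, is large enough for either version.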

Adding subnet calls to the pluginReport(FILE *stream) callback

Adding subnet info to the end report is straightforward, using the same switches as above.

So you are all set. Compile and run t2:

$ t2build tcpGeoWin
...
$ t2 -r ~/data/annoloc2.pcap -w ~/results
================================================================================
Tranalyzer 0.8.8 (Anteater), Tarantula. PID: 25078
================================================================================
[INF] Creating flows for L2, IPv4, IPv6
Active plugins:
    01: basicFlow, 0.8.8
    02: tcpGeoWin, 0.8.8
    03: txtSink, 0.8.8
[INF] IPv4 Ver: 5, Rev: 01022020, Range Mode: 0, subnet ranges loaded: 389669 (389.67 K)
[INF] IPv6 Ver: 5, Rev: 01022020, Range Mode: 0, subnet ranges loaded: 104862 (104.86 K)
Processing file: /home/wurst/data/annoloc2.pcap
Link layer type: Ethernet [EN10MB/1]
Dump start: 1022171701.691172 sec (Thu 23 May 2002 16:35:01 GMT)
[WRN] snapL2Length: 54 - snapL3Length: 40 - IP length in header: 1500
Dump stop : 1022171726.640398 sec (Thu 23 May 2002 16:35:26 GMT)
Total dump duration: 24.949226 sec
Finished processing. Elapsed time: 0.438014 sec
Finished unloading flow memory. Time: 0.580417 sec
Percentage completed: 100.00%
Number of processed packets: 1219015 (1.22 M)
Number of processed bytes: 64082726 (64.08 M)
Number of raw bytes: 844642686 (844.64 M)
Number of pcap bytes: 83586990 (83.59 M)
Number of IPv4 packets: 1218608 (1.22 M) [99.97%]
Number of IPv6 packets: 160 [0.01%]
Number of A packets: 564227 (564.23 K) [46.29%]
Number of B packets: 654788 (654.79 K) [53.71%]
Number of A bytes: 29447862 (29.45 M) [45.95%]
Number of B bytes: 34634864 (34.63 M) [54.05%]
Average A packet load: 52.19
Average B packet load: 52.89
--------------------------------------------------------------------------------
tcpGeoWin: IP: 216.237.125.166, country: us, Org: Infortech Corporation
tcpGeoWin: Aggregated status flags: 0x01
tcpGeoWin: Number of tcp winsize packets below threshold 1: 2415 [0.25%]
--------------------------------------------------------------------------------
Headers count: min: 2, max: 4, average: 3.01
Number of GRE packets: 247 [0.02%]
Number of IGMP packets: 12 [0.00%]
Number of ICMP packets: 3059 (3.06 K) [0.25%]
Number of ICMPv6 packets: 11 [0.00%]
Number of TCP packets: 948743 (948.74 K) [77.83%]
Number of TCP bytes: 52643546 (52.64 M) [82.15%]
Number of UDP packets: 266900 (266.90 K) [21.89%]
Number of UDP bytes: 11234272 (11.23 M) [17.53%]
Number of IPv4 fragmented packets: 2284 (2.28 K) [0.19%]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Number of processed   flows: 17086 (17.09 K)
Number of processed A flows: 9704 (9.70 K) [56.80%]
Number of processed B flows: 7382 (7.38 K) [43.20%]
Number of request     flows: 9661 (9.66 K) [56.54%]
Number of reply       flows: 7425 (7.42 K) [43.46%]
Total   A/B    flow asymmetry: 0.14
Total req/rply flow asymmetry: 0.13
Number of processed   packets/flows: 71.35
Number of processed A packets/flows: 58.14
Number of processed B packets/flows: 88.70
Number of processed total packets/s: 48859.83 (48.86 K)
Number of processed A+B packets/s: 48859.83 (48.86 K)
Number of processed A   packets/s: 22615.01 (22.61 K)
Number of processed   B packets/s: 26244.82 (26.24 K)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Number of average processed flows/s: 684.83
Average full raw bandwidth: 270835712 b/s (270.84 Mb/s)
Average snapped bandwidth : 20548206 b/s (20.55 Mb/s)
Average full bandwidth : 270268576 b/s (270.27 Mb/s)
Max number of flows in memory: 17086 (17.09 K) [6.52%]
Memory usage: 0.10 GB [0.15%]
Aggregate flow status: 0x0c0098fa0202d044
[WRN] L3 SnapLength < Length in IP header
[WRN] L4 header snapped
[WRN] Consecutive duplicate IP ID
[WRN] IPv4/6 payload length > framing length
[WRN] IPv4/6 fragmentation header packet missing
[WRN] IPv4/6 packet fragmentation sequence not finished
[INF] Ethernet flows
[INF] IPv4 flows
[INF] IPv6 flows
[INF] ARP
[INF] IPv4/6 fragmentation
[INF] IPv4/6 in IPv4/6
[INF] GRE encapsulation
[INF] SSDP/UPnP
$

If the file is too big, you may sort by the 4th column, the TCP window threshold count (winTcpCnt).

$ tawk -s '#' 't2sort(winTcpCnt)' annoloc2_tcpwin.txt | tcol
216.237.125.166  us     Infortech Corporation           210        0.489510
36.152.156.46    cn     China Mobile Communications Co  76         0.962025
138.212.187.203  jp     ASAHI KASEI CORPORATION         76         1.000000
201.98.31.61     mx     Uninet S.A. de C.V.             64         0.164948
200.44.192.225   ve     CANTV Servicios                 62         0.196825
138.212.186.191  jp     ASAHI KASEI CORPORATION         62         0.247012
138.212.185.150  jp     ASAHI KASEI CORPORATION         48         0.246154
138.212.186.160  jp     ASAHI KASEI CORPORATION         47         0.229268
193.87.5.62      sk     Zdruzenie pouzivatelov Slovens  33         0.140426
138.212.186.52   jp     ASAHI KASEI CORPORATION         33         0.203704
193.86.108.236   cz     T-Mobile Czech Republic a.s.    30         0.035419
201.9.136.60     br     Telemar Norte Leste S.A.        28         0.142132
201.9.140.14     br     Telemar Norte Leste S.A.        24         0.123711
138.212.186.60   jp     ASAHI KASEI CORPORATION         21         0.150000
216.32.165.228   us     CenturyLink Communications      20         0.526316
133.26.84.187    jp     Meiji University                20         0.009620
216.138.126.57   us     Airband Communications          18         0.529412
216.56.159.22    us     WiscNet                         15         0.157895
216.217.165.245  us     Windstream Communications LLC   14         0.269231
201.123.124.98   mx     Gestión de direccionamiento    14         0.041176
215.64.214.183   us     Network DoD                     11         0.478261
210.87.23.0      au     Hotline Support Pty Ltd         11         0.289474
216.91.166.92    us     CenturyLink Communications      10         0.357143
213.53.140.197   nl     Verizon Nederland B.V.          10         0.454545
193.87.97.162    sk     Zdruzenie pouzivatelov Slovens  10         0.011862
19.112.1.129     us     Ford Motor Company              10         0.102041
138.212.191.84   jp     ASAHI KASEI CORPORATION         10         0.312500
...

If you would like to access the county and city information, you need to switch on CNTYCTY in subnetHL.h and recompile with t2build -R -f.

Exercise: Print County and City info in the end report.

Query the IP tables

Assume that a protocol response, such as DNS, contains an IP. If you want to know who is behind that IP, you need to query the IP tables yourself.

In order to do so, you have to call the core functions subnet_testHL4 and subnet_testHL6 for IPv4 and IPv6 addresses, respectively, as shown below. The return value is the subnet ID, which you can use to fetch the geo and whois data as already indicated above.

subnetTable[46]P may not exist depending on the configuration of T2, so to make things simpler, just use the following macros instead:

SUBNET_TEST_IP4(subnetID, ip4);
SUBNET_TEST_IP4(subnetID, ip);

SUBNET_TEST_IP6(subnetID, ip);

SUBNET_TEST_IP(subnetID, ip, 4); // For IPv4
SUBNET_TEST_IP(subnetID, ip, 6); // For IPv6
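Conceptually, such a test maps an address to the index of the range that contains it. The sketch below shows one way to do that with a binary search over sorted, non-overlapping IPv4 ranges. It is an illustration only: the table, its contents and demo_subnet_test4() are hypothetical, and this is not claimed to be the algorithm or the data layout the core uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range entry; addresses are in host byte order. */
typedef struct {
    uint32_t first;   /* first address of the range */
    uint32_t last;    /* last address of the range */
    const char *loc;  /* made-up label */
} demo_range_t;

/* Sorted and non-overlapping; entry 0 is a placeholder for "not found". */
static const demo_range_t demoRanges[] = {
    { 0,           0,           "--" },
    { 0x0A000000u, 0x0AFFFFFFu, "aa" },  /* 10.0.0.0     - 10.255.255.255  */
    { 0xC0A80000u, 0xC0A8FFFFu, "bb" },  /* 192.168.0.0  - 192.168.255.255 */
    { 0xC6336400u, 0xC63364FFu, "cc" },  /* 198.51.100.0 - 198.51.100.255  */
};
#define DEMO_NUM_RANGES (sizeof(demoRanges) / sizeof(demoRanges[0]))

/* Returns the index of the range containing ip, or 0 if there is none. */
uint32_t demo_subnet_test4(uint32_t ip) {
    size_t lo = 1, hi = DEMO_NUM_RANGES;  /* search the half-open interval [lo, hi) */
    while (lo < hi) {
        const size_t mid = lo + (hi - lo) / 2;
        if (ip < demoRanges[mid].first) {
            hi = mid;                     /* ip lies before this range */
        } else if (ip > demoRanges[mid].last) {
            lo = mid + 1;                 /* ip lies after this range */
        } else {
            return (uint32_t)mid;         /* ip is inside this range */
        }
    }
    return 0;
}
```

The returned index then plays the same role as the subnet index stored in the flow: it is the only thing you need to keep in order to fetch the labels later.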

It is beneficial to store IPv4 and IPv6 addresses in one structure if you are in dual mode. On the other hand, if only IPv4 addresses are processed, the ip4Addr_t structure does not waste the 96 bits that a 32-bit IPv4 address leaves unused in a structure sized for a 128-bit IPv6 address. I advise using the T2 structures, as the macros rely on them.

The relevant address structures are defined in networkHeaders.h, as shown in the extract below.
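As a rough illustration of the trade-off, and explicitly not the definitions from networkHeaders.h, the following sketch uses hypothetical demo types: an IPv4-only struct, a union that can hold either version, and a comparison helper that needs the IP version to know how many bytes are meaningful.

```c
#include <stdint.h>
#include <string.h>
#include <netinet/in.h>

/* Simplified stand-ins, NOT the types from networkHeaders.h. */
typedef struct {
    struct in_addr IPv4;          /* 4 bytes */
} demo_ip4Addr_t;

typedef union {
    struct in_addr  IPv4;         /* uses the first 4 bytes */
    struct in6_addr IPv6;         /* uses all 16 bytes */
} demo_ipAddr_t;

/* Address plus version, in the spirit of the ver/addr pair used for gwz.wzip[]. */
typedef struct {
    uint8_t ver;
    demo_ipAddr_t addr;
} demo_ipVAddr_t;

/* Hypothetical helper: returns 1 if both addresses are equal for the given version. */
int demo_same_ip(const demo_ipAddr_t *a, const demo_ipAddr_t *b, uint8_t ver) {
    if (ver == 6) {
        return memcmp(&a->IPv6, &b->IPv6, sizeof(struct in6_addr)) == 0;
    }
    return a->IPv4.s_addr == b->IPv4.s_addr;  /* only the IPv4 part counts */
}
```

The union lets one array hold both address families at the price of always reserving room for IPv6, which is why the version has to travel with the address.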

Instead of using the subnet indexes from the packet structure you can now define your own and use the SUBNET_TEST_IP[46]() macros or the subnet_testHL[46]() functions. Try it on tcpGeoWin, it should produce the same output.

Have fun experimenting with subnet info!

The next tutorial will teach you all about plugin dependencies.

See Also