Tutorial: Post-processing with TAWK

This tutorial presents tawk functionality through various scenarios. tawk works just like awk, but lets you access columns by name. It also provides helper functions, such as host() or port(). For an overview, refer to the Alphabetical List of TAWK Functions. Custom functions can be added to the t2custom folder, from which they can be loaded automatically.
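
To make the idea of name-based column access concrete, the following minimal sketch emulates it with plain awk. It is not how tawk is implemented. The sample file, its column names (srcIP, dstIP, dstPort) and the col array are assumptions made for this example only.

```shell
# Create a small tab-separated sample file (hypothetical data)
work_dir=$(mktemp -d)
printf '%%srcIP\tdstIP\tdstPort\n'  > "$work_dir/sample_flows.txt"
printf '10.0.0.1\t10.0.0.2\t80\n'  >> "$work_dir/sample_flows.txt"
printf '10.0.0.3\t10.0.0.4\t443\n' >> "$work_dir/sample_flows.txt"

# Plain awk: build a name -> column number map from the header row,
# then address fields by name instead of by position
by_name=$(awk -F'\t' '
    NR == 1 {
        sub(/^%/, "", $1)                      # strip the leading header marker
        for (i = 1; i <= NF; i++) col[$i] = i  # remember each column position
        next
    }
    $(col["dstPort"]) == 80 { print $(col["srcIP"]) }
' "$work_dir/sample_flows.txt")

echo "$by_name"
```

tawk takes care of this bookkeeping itself, so a program only needs to refer to the column names.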

Prerequisites

This tutorial assumes a working knowledge of awk.

Dependencies

gawk version 4.1 or later is required.

Kali/Ubuntu       sudo apt-get install gawk
Arch              sudo pacman -S gawk
Fedora/Red Hat    sudo yum install gawk
Gentoo            sudo emerge gawk
openSUSE          sudo zypper install gawk
macOS             brew install gawk (Homebrew package manager)

Installation

The recommended way to install tawk is to install t2_aliases as documented in README.md.

Documentation (Man Pages)

The man pages for tawk and t2nfdump (more on that later) can be installed by running: ./install.sh man. Once installed, they can be consulted by running man tawk and man t2nfdump respectively.

General Introduction

Command line options

First, run tawk -h to list the available command line options:

Usage:
    tawk [OPTION...] 'program' file_flows.txt
    tawk [OPTION...] -I file_flows.txt 'program'

Input arguments:
    -I file             Alternative way to specify the input file

Optional arguments:
    -N num              Row number where column names are to be found
    -s char             First character for the row listing the columns name
    -F fs               Use 'fs' as input field separator
    -O fs               Use 'fs' as output field separator
    --csv               Set input and output separators to ',' and
                        extract names from first row
    --zeek              Configure tawk to work with Bro/Zeek log files
    -n                  Load nfdump functions
    -e                  Load examples functions
    -X xerfile          Specify the '.xer' file to use with -k and -x options
    -x outfile          Run the fextractor on the extracted data
    -P                  Extract specific packets instead of whole flows
    -k                  Run Wireshark on the extracted data
    -t                  Do not validate column names
    -r                  Try renaming invalid columns (suffix them with '_')
    -H                  Do not output the header (column names)
    -c[=u]              Output command line as a comment
                        (use -c=u for UTC instead of localtime)

Help and documentation arguments:
    -l[=n], --list[=n]  List column names and numbers
    -g[=n], --func[=n]  List available functions

    -d fname            Display function 'fname' documentation
    -V vname[=value]    Display variable 'vname' documentation

    -L                  Decode all variables from Tranalyzer log file

    -D                  Display tawk PDF documentation

    -?, -h, --help      Show help options and exit

-s and -N Options

The -s option specifies the starting character(s) of the row containing the column names (default: %). If several rows start with the specified character(s), the last one is used as column names. To change this behaviour, specify the row number with the -N option. For example, if rows 1 to 5 start with # and row 3 contains the column names, invoke tawk as follows: tawk -s "#" -N 3. If the row with column names does not start with a special character, use -s "".
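
The selection rule described above can be illustrated with a short plain awk sketch. It only mimics the documented behaviour (last matching row wins unless a row number is given) and is not tawk's own code. The sample file and the pick_header helper are hypothetical.

```shell
# Hypothetical file: rows 1 to 5 start with '#', row 3 holds the column names
hdr_file=$(mktemp)
cat > "$hdr_file" <<'EOF'
# Generated by some tool
# Date: today
#time host bytes
# Units: seconds, name, bytes
# End of header
1 alpha 100
2 beta 200
EOF

# usage: pick_header PREFIX ROWNUM FILE   (ROWNUM = 0 means "last matching row")
pick_header() {
    awk -v pfx="$1" -v n="$2" '
        n > 0 && NR == n              { hdr = $0; exit }  # explicit row number
        n == 0 && index($0, pfx) == 1 { hdr = $0 }        # keep the last match
        END { print hdr }
    ' "$3"
}

last_match=$(pick_header "#" 0 "$hdr_file")   # corresponds to using -s "#" alone
row_three=$(pick_header "#" 3 "$hdr_file")    # corresponds to -s "#" -N 3

echo "$last_match"
echo "$row_three"
```

Without a row number, the sketch picks the comment on row 5 rather than the column names, which is why the row number is needed for such a file.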

What features (columns) are available?

What functions are available?

Alternatively, refer to the Alphabetical List of TAWK Functions.

How to use a specific function?

How to interpret a specific column?

How to decode all aggregated fields in a Tranalyzer log file?

Ignore all flows between private IPs

Replace the protocol number by its string representation, e.g., 6 -> TCP

Replace the Unix timestamp used for timeFirst and timeLast by their value in UTC

Replace the Unix timestamp used for timeFirst and timeLast by their values in localtime

Inspect the flow number 1234 in the flow file

Follow a specific flow, e.g., the flow with flow index 1234, in the packet file

Inspect the packet number 1234 in the packet file

Follow a flow (similar to Wireshark follow TCP/UDP stream):

$ tawk 'follow_stream(1)' file_packets.txt

Recreate a binary file transferred in a B flow:

$ tawk 'follow_stream(1, 3, "B")' file_packets.txt | xxd -p -r > out.data

Extract all flows whose HTTP Host: header matches google using Wireshark field names

Extract the DNS query field from all flows where at least one DNS answer was seen (using Wireshark field names)

Open all ICMP flows involving the network 1.2.3.4/24 in Wireshark

Create a PCAP file with all TCP flows on port 80 or 8080

Writing a tawk Function

  • Ideally one function per file (where the filename is the name of the function)
  • Private functions are prefixed with an underscore
  • Always declare local variables 8 spaces after the function arguments
  • Local variables are prefixed with an underscore
  • Use uppercase letters and two leading and two trailing underscores for global variables
  • Include all referenced functions
  • Files should be structured as follows:
  • Copy your files into the t2custom folder.
  • To have your functions automatically loaded, include them in the file t2custom/t2custom.load.
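
As an illustration of the naming and layout conventions listed above, here is a minimal sketch of a function file. The function kib(), its helper _round1() and the global __KIB__ are invented for this example and are not part of tawk. The lines that include referenced functions are left out, because the exact file layout expected by tawk is not reproduced in this section. The file is exercised with plain awk, so tawk is not needed to try it.

```shell
fn_dir=$(mktemp -d)

# kib.awk: one public function per file, with the file named after the function
cat > "$fn_dir/kib.awk" <<'EOF'
# Private helper: name prefixed with an underscore
function _round1(x) {
    return int(x * 10 + 0.5) / 10
}

# Public function: converts a byte count to KiB, rounded to one decimal.
# The local variable _kib is declared 8 spaces after the argument
# and is prefixed with an underscore.
function kib(bytes,        _kib) {
    # Global variable: uppercase, two leading and two trailing underscores
    if (!__KIB__) __KIB__ = 1024
    _kib = bytes / __KIB__
    return _round1(_kib)
}
EOF

# Small driver program used only to call the function
printf '{ print kib($1) }\n' > "$fn_dir/main.awk"

kib_out=$(echo 1536 | awk -f "$fn_dir/kib.awk" -f "$fn_dir/main.awk")
echo "$kib_out"
```

Declaring locals as extra parameters is the usual awk idiom for local scope. The 8 spaces make the boundary between real arguments and locals easy to spot.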

Using tawk Within Scripts

To use tawk from within a script:

  • Create a TAWK variable pointing to the script: TAWK="$T2HOME/scripts/tawk/tawk" (make sure to replace $T2HOME with the actual path to your Tranalyzer folder, i.e., the folder that contains scripts/)
  • Call tawk as follows: $TAWK 'dport(80)' file.txt
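
The two steps above can be combined into a small wrapper. This is a sketch only: the fallback location $HOME/tranalyzer2 and the http_flows helper are assumptions made for the example, while the dport(80) call is the one from the text above.

```shell
# Placeholder default: set T2HOME to the real Tranalyzer path on your system
T2HOME="${T2HOME:-$HOME/tranalyzer2}"
TAWK="$T2HOME/scripts/tawk/tawk"

# Hypothetical helper: print flows with destination port 80 from the given file,
# or report that tawk could not be found
http_flows() {
    if [ ! -x "$TAWK" ]; then
        echo "tawk not found at $TAWK" >&2
        return 127
    fi
    "$TAWK" 'dport(80)' "$1"
}
```

A script would then call, e.g., http_flows file.txt. Checking that the file is executable first gives a clearer error than a failed command lookup.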

Using tawk With Non-Tranalyzer files

tawk can also be used with files which were not produced by Tranalyzer.

  • The input field separator can be specified with the -F option, e.g., tawk -F ',' 'program' file.csv
  • The row listing the column names can start with any character specified with the -s option, e.g., tawk -s '#' 'program' file.txt
  • No column name may be identical to a function name (tawk will rename such columns with a trailing underscore unless the -t option is used)
  • Valid column names must start with a letter (a-z, A-Z) and can be followed by any number of alphanumeric characters or underscores
  • If no column names are present, use the -t option to prevent tawk from trying to validate the column names.
  • If the column names are different from those used by Tranalyzer, refer to the next section.
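
Before feeding a foreign file to tawk, it can help to check its header against the naming rule above. The following plain awk sketch does that for a comma-separated file. The sample file and its column names are made up, and the check covers only the character rule, not clashes with function names.

```shell
# Hypothetical CSV file whose first row lists the column names
csv_file=$(mktemp)
printf 'src_ip,2nd-hop,bytes,host name\n' >  "$csv_file"
printf '10.0.0.1,10.0.0.9,100,alpha\n'    >> "$csv_file"

# Report every column name that does not start with a letter
# or contains anything other than letters, digits and underscores
invalid=$(awk -F',' '
    NR == 1 {
        for (i = 1; i <= NF; i++)
            if ($i !~ /^[A-Za-z][A-Za-z0-9_]*$/)
                print i ": " $i
        exit
    }
' "$csv_file")

echo "$invalid"
```

Here columns 2 and 4 are reported, since one starts with a digit and contains a hyphen, and the other contains a space.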

Mapping External Column Names to Tranalyzer Column Names

If the column names are different from those used by Tranalyzer, a mapping between the different names can be made in the file scripts/tawk/my_vars. The format of the file is as follows:

Once edited, run tawk with the -i $T2HOME/scripts/tawk/my_vars option and the external column names will be automatically used by tawk functions, such as tuple2(). For more details, refer to the my_vars file itself.

Using tawk with Bro/Zeek Files

To use tawk with Bro/Zeek log files, use either the --bro or the --zeek option.

Examples

For more examples, refer to the tawk -d option, e.g., tawk -d aggr: every function is documented and comes with a set of examples. For more complex examples, have a look at the scripts/t2fm/tawk/ folder. The complete documentation can be consulted by running tawk -d all.

See Also