{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "\n", "This tutorial explains how to use Tranalyzer to extract the *bytes-per-burst* (BPB) feature from TLS-encrypted YouTube video streams and to recognize which video title a new test sample contains, or to detect that it is a new video. This is an implementation of recent work by Dubin et al. [1]." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The bytes-per-burst feature\n", "\n", "A flow is viewed as a signal of packets over time. This signal is then transformed into a series of *bursts*. A burst is defined here as a set of consecutive packets in which each packet arrives within a certain time window of the previous one. (Note that this is not a regular binning of the time dimension: a burst can be arbitrarily long in time as long as the next packet arrives within that window.) Each burst is represented by the total number of bytes in all the packets aggregated into it. The resulting sequence of per-burst byte counts is then used to characterize the flow." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Extracting bursts from a flow\n", "\n", "Given the following series of packets:\n", "\n", "![Packet length](files/img/bpb/packets.png)\n", "\n", "Using a time window of 50 ms in the `nFrstPkts` plugin, the following bursts are extracted:\n", "\n", "(Note the logarithmic scale on the y-axis and the changing y-limits between the two plots.)\n", "\n", "![Extracted bursts](files/img/bpb/bursts.png)\n", "\n", "Plots like this can also be generated for any given flow file using the `fpsGplt` and `t2plot` scripts. More information on this can be found in the documentation or in the traffic mining tutorial." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Identifying YouTube flows\n", "\n", "In order to identify YouTube flows in a larger PCAP traffic dump, the *Server Name Indication (SNI)* TLS extension is used. 
The *sslDecode* plugin for Tranalyzer makes the server name available in the `sslServerName` column of the flow file." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prerequisites\n", "\n", "* Tranalyzer version 0.8.1lmw4 or higher,\n", "* A folder containing your training data:\n", "    * Dubin et al. [1] provide their data at: http://www.cse.bgu.ac.il/title_fingerprinting/dataset_chrome_100\n", "    * The PCAP files used for training are expected to be at the following location: `DATA_PATH/{Class1,Class2,...,ClassN}/Train/*.pcap`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Implementation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We are going to use a few modules:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "#!/usr/bin/env python3\n", "\n", "import argparse\n", "import matplotlib.pyplot as plt\n", "import tempfile\n", "import subprocess\n", "import threading\n", "import os\n", "from os.path import dirname, basename, isfile, splitext, normpath, expanduser\n", "import sys\n", "import glob\n", "import pickle" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Configuration\n", "\n", "We need the path to the Tranalyzer directory and the path to the PCAP files that we are going to use to create our model." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Configuration: Adapt to your system.\n", "T2_ROOT = '{}/code/tranalyzer2-0.9.0'.format(expanduser('~'))\n", "DATA_PATH = '/mnt/{}/BPB/data_short'.format(os.environ['USER'])\n", "\n", "VERBOSE = False" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Automatically derived\n", "T2_PATH = '{}/tranalyzer2/src/tranalyzer'.format(T2_ROOT)\n", "T2CONF = '{}/scripts/t2conf/t2conf'.format(T2_ROOT)\n", "T2BUILD = '{}/autogen.sh'.format(T2_ROOT)\n", "BPBS_PATH = 'bpbs.data'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setting up Tranalyzer\n", "\n", "First, we need to set up Tranalyzer to include the required non-standard plugins:\n", "\n", "* `nFrstPkts` to get the signal for the first few packets of a flow, and\n", "* `sslDecode` to identify YouTube flows.\n", "\n", "We can use `t2conf` to configure the plugins to our liking. For this tutorial, we set the minimum time window that defines a burst to 50ms and the number of packets to analyze to 200.\n", "\n", "To build Tranalyzer and the plugins, we use the included `autogen` build script." 
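, "\n", "\n", "To make the 50 ms burst window concrete, here is a minimal pure-Python sketch of the burst definition used in this tutorial. It is only an illustration, not the `nFrstPkts` implementation: the function name `group_into_bursts`, the inclusive comparison against the window and the sample packets are all assumptions.\n", "\n", "```python\n", "def group_into_bursts(packets, window=0.05):\n", "    # packets: iterable of (timestamp_seconds, length_bytes), sorted by time.\n", "    # Returns the total byte count of each burst.\n", "    bursts = []\n", "    last_ts = None\n", "    for ts, length in packets:\n", "        if last_ts is not None and ts - last_ts <= window:\n", "            bursts[-1] += length   # gap small enough: same burst\n", "        else:\n", "            bursts.append(length)  # first packet or gap too large: new burst\n", "        last_ts = ts\n", "    return bursts\n", "\n", "# Hypothetical packets, not taken from the dataset.\n", "packets = [(0.00, 1500), (0.01, 1500), (0.04, 500), (0.50, 1200), (0.52, 300)]\n", "print(group_into_bursts(packets))\n", "```\n", "\n", "With these made-up packets, the first three fall into one burst and the last two into another, because only the 0.46 s gap exceeds the window."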
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "REQUIRED_PLUGINS = ['nFrstPkts', 'sslDecode']\n", "\n", "def setup_t2():\n", "    options = {\n", "        'NFRST_IAT': '0',\n", "        'NFRST_MINIATS': '0',\n", "        'NFRST_MINIATU': '50000',\n", "        'NFRST_PLAVE': '0',\n", "        'NFRST_PKTCNT': '200'\n", "    }\n", "\n", "    # Configure\n", "    for opt, val in options.items():\n", "        subprocess.call([T2CONF, 'nFrstPkts', '-D', f'{opt}={val}'])\n", "\n", "    # Build\n", "    subprocess.call([T2BUILD])\n", "    for plugin in REQUIRED_PLUGINS:\n", "        subprocess.call([T2BUILD, plugin])\n", "\n", "def definitely_need_setup_t2():\n", "    active_plugins = subprocess.check_output([T2BUILD, '-l'], text=True).splitlines()\n", "    return not all(plugin in active_plugins for plugin in REQUIRED_PLUGINS)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Running Tranalyzer\n", "\n", "We need a function that, given a PCAP file, runs Tranalyzer to determine the bursts and returns the path to the resulting flow file for further processing:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def run_t2(infile, outdir):\n", "    subprocess.call([\n", "        T2_PATH,\n", "        '-r', infile,\n", "        '-w', outdir\n", "    ], stdout=subprocess.DEVNULL)\n", "    return '{}/{}_flows.txt'.format(outdir, splitext(basename(infile))[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Extract the BPB features\n", "\n", "Given a flow file, we now need to extract a list of numbers that correspond to the total number of bytes in each burst. For this, we run a small (T)AWK script via `tawk`, followed by some basic post-processing in the shell." 
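, "\n", "\n", "The shell post-processing used in the next cell (split on `;`, then keep the part before the first `_`) can also be written in plain Python. The following is a minimal sketch under the assumption that the column value is a `;`-separated list whose entries start with the byte count; `parse_bursts` and the sample string are made up for illustration.\n", "\n", "```python\n", "def parse_bursts(field):\n", "    # Split the column value into bursts, then keep the leading byte count of each.\n", "    return [int(entry.split('_')[0]) for entry in field.split(';') if entry]\n", "\n", "# Hypothetical column value, not real Tranalyzer output.\n", "sample = '434420_0.000;1595848_1.250;359344_2.500'\n", "print(parse_bursts(sample))\n", "```\n", "\n", "Empty entries are skipped, so an empty column simply yields an empty list."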
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def get_bpb(flowfile):\n", "    EXTRACT_SCRIPT = '''\n", "    source \"$1/scripts/t2utils.sh\"\n", "    flowfile=\"$2\"\n", "\n", "    cat $flowfile | $TAWK -t -I $flowfile \\\n", "        '($dir == \"B\" && $sslServerName ~ /.googlevideo.com/) {\n", "            print $numBytesSnt, $L2L3L4Pl_Iat_nP\n", "        }' |\n", "        sort -nr | # sort by byte count, descending\n", "        head -n 1 | # get biggest flow\n", "        cut -d$'\\t' -f2 | # get nFrstPkts output (burst)\n", "        tr ';' '\\n' | # one burst per line\n", "        cut -d'_' -f1 | # extract byte count from each burst\n", "        cat\n", "    '''\n", "    proc = subprocess.Popen(['/bin/bash', '-c', EXTRACT_SCRIPT, '', T2_ROOT, flowfile], stdout=subprocess.PIPE)\n", "    bpbs = [int(bpb) for bpb in proc.stdout] if proc.stdout else []\n", "    return bpbs\n", "\n", "def extract_bpb(pcap):\n", "    with tempfile.TemporaryDirectory() as tempdir:\n", "        flowfile = run_t2(pcap, tempdir)\n", "        bpb = get_bpb(flowfile)\n", "    return bpb" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's check the output using a random PCAP in our training data path:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[434420, 1595848, 359344, 1811560, 351882, 1682472, 1665230, 329019, 2101566, 1793397, 356024, 1789932, 2101566, 338444, 2101566, 1377684, 341082, 1830971, 2084460, 337281, 2101566, 1792682, 328908, 2101566, 1957214, 445567, 1129815]\n" ] } ], "source": [ "try:\n", "    print(extract_bpb(glob.glob(f'{DATA_PATH}/*/Train/*.pcap')[0]))\n", "except IndexError:\n", "    # glob returned an empty list, so there is no first element.\n", "    print('Unable to find any PCAPs in your training data. Are you sure you set DATA_PATH correctly?')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We extract the bursts for several PCAPs in parallel to speed up the process. 
Each thread is given a PCAP file, runs Tranalyzer, extracts the bursts and stores them into a unique location per thread." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "def thread_t2_bpb(bpbs, index, pcap):\n", "    bpb = extract_bpb(pcap)\n", "    bpbs[index] = bpb\n", "\n", "def get_bpbs_for_class(c):\n", "    pcaps_root = '{}/{}/Train'.format(DATA_PATH.rstrip('/'), c)\n", "    pcaps = glob.glob('{}/*.pcap'.format(pcaps_root))\n", "    bpbs = []\n", "\n", "    max_pcaps = 6\n", "    chunk_size = max_pcaps # `chunk_size` must divide `max_pcaps`.\n", "    pcaps_chunks = zip(*[iter(pcaps[:max_pcaps])]*chunk_size)\n", "    for chunk in map(list, pcaps_chunks):\n", "        threads_results = [None] * chunk_size\n", "        threads = [threading.Thread(target=thread_t2_bpb, args=(threads_results, i, pcap))\n", "                   for i, pcap in enumerate(chunk)]\n", "        for t in threads:\n", "            t.start()\n", "        for t in threads:\n", "            t.join()\n", "        for bpbs_c in threads_results:\n", "            bpbs.append(bpbs_c)\n", "    return bpbs\n", "\n", "def load_bpbs(path):\n", "    with open(path, 'rb') as fh:\n", "        return pickle.load(fh)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's look at the BPBs for all samples of a class. 
For each sample of this class, we get a list of numbers, each giving the number of bytes in one burst:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[4110, 129278, 778713, 268405, 3728018, 4098388, 384251, 3486785, 379773, 3845582, 375686, 1108149], [4990, 66262, 7826406, 384251, 3486785, 379773, 3845582, 375686, 1108149], [4110, 66262, 4098388, 384251, 3486785, 379773, 3845582, 375686, 1108149], [4400, 129278, 149475, 627878, 268405, 402514, 1608766, 405435, 1203498, 4098388, 384251, 3486785, 379773, 1906720, 1938862, 375686, 1108149], [4110, 129278, 149475, 896283, 2011280, 405435, 4931516, 4098388, 384251, 3486785, 379773, 3845582, 375686, 1108149], [4110, 129278, 1287473, 670919, 1554910, 2094745, 8215957, 384251, 3335118, 379773, 3881665]]\n" ] } ], "source": [ "print(get_bpbs_for_class('Avengers'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Building the model\n", "\n", "Using these building blocks, we can now write a function `learn` that, given a list of classes, fetches the corresponding PCAP files from the user-defined path at the top of this script, extracts the bursts for all PCAPs of each class using Tranalyzer, and stores the resulting features in a dictionary.\n", "\n", "We persist this dictionary to disk so that we can call this program again in test mode, give it a new PCAP, and get the video title that most closely matches the unknown sample, given the model.\n", "\n", "Note that here, we are storing the bursts in a list rather than a set (as in the paper). This helps us understand the data better when exploring it visually later on. It does not have an impact on classification accuracy." 
] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "def learn(classes, *, write_to=None):\n", "    bpbs = { c: [] for c in classes }\n", "    for c in classes:\n", "        print('Extracting bursts for class {}'.format(c))\n", "        c_bpb = get_bpbs_for_class(c)\n", "        for b in c_bpb:\n", "            # NOTE: Could be set() to improve performance in production, but we're using\n", "            #       list() to preserve the order of the bursts and produce more meaningful plots.\n", "            bpbs[c].append(list(b))\n", "    if write_to:\n", "        with open(write_to, 'wb') as fh:\n", "            pickle.dump(bpbs, fh)\n", "    return bpbs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Classification\n", "\n", "To classify an unknown sample, a simple nearest neighbor approach is used.\n", "\n", "A video is represented as a set of integers, each representing one burst in the signal.\n", "\n", "The unknown sample is assigned the video title of the known sample with which it shares the most bursts." 
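, "\n", "\n", "As a small worked example of this matching rule, the sketch below scores an unknown sample against two classes by counting shared burst sizes with a set intersection. All names (`shared_bursts`, `video_a`, `video_b`) and all values are made up for illustration. Unlike a plain membership count over lists, the intersection counts each distinct burst size only once.\n", "\n", "```python\n", "def shared_bursts(xs, ys):\n", "    # Number of distinct burst sizes that occur in both samples.\n", "    return len(set(xs) & set(ys))\n", "\n", "# Hypothetical training samples per class and one unknown sample.\n", "known = {\n", "    'video_a': [[4110, 129278, 384251], [4110, 66262, 384251]],\n", "    'video_b': [[5000, 70000, 900000]],\n", "}\n", "unknown = [4110, 66262, 384251, 12345]\n", "\n", "# Score each class by its best-matching sample, then pick the best class.\n", "scores = {title: max(shared_bursts(unknown, s) for s in samples)\n", "          for title, samples in known.items()}\n", "best = max(scores, key=scores.get)\n", "print(best, scores)\n", "```\n", "\n", "Here the second sample of `video_a` shares three bursts with the unknown sample, so `video_a` is the nearest neighbor."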
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "def similarity(xs, ys):\n", "    return sum(x in ys for x in xs)\n", "\n", "def nearest_neighbors(x, bpbs, n=3):\n", "    top, sims = [None] * n, [0] * n\n", "    for c in bpbs:\n", "        best_in_class = 0\n", "        j = -1\n", "        for y in bpbs[c]:\n", "            j += 1\n", "            s = similarity(x, y)\n", "            for i in range(n):\n", "                if s > sims[i] and s > best_in_class:\n", "                    sims[i] = s\n", "                    best_in_class = s\n", "                    top[i] = c\n", "                    break\n", "    return list(zip(top, sims))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Program options\n", "\n", "Parsing the options for our program:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "def parse_args(argv=sys.argv[1:]):\n", "    parser = argparse.ArgumentParser(description='Classifier for YouTube video streams')\n", "    group = parser.add_argument_group('Classifier options')\n", "    modes = set(['learn', 'test'])\n", "    parser.add_argument('mode', metavar='mode', choices=modes, nargs=1,\n", "                        help='Mode to run in. Available: ' + ', '.join(modes))\n", "    group.add_argument('-t', '--test', default='', type=str, help='Path to pcap to classify')\n", "    group.add_argument('-f', '--force', action='store_true', help='Overwrite existing files without prompting')\n", "    group.add_argument('-s', '--setup', action='store_true', help='Perform initial setup of tranalyzer')\n", "    return parser.parse_args(argv)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exploring the data\n", "\n", "To understand our data better, we can plot the extracted bursts for each sample and examine them side by side." 
] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "def plot_bursts(bpbs, title=''):\n", "    width = 10\n", "    cols = 3\n", "    rows = (len(bpbs) + cols - 1 ) // cols\n", "    fig = plt.figure(figsize=(width, 3 * rows))\n", "    ax1 = plt.subplot(rows, cols, 1)\n", "    for i, bpb in enumerate(map(list, bpbs), 1):\n", "        plt.subplot(rows, cols, i, sharey=ax1)\n", "\n", "        # Optional: Filter out audio bursts <=500K.\n", "        #bpb = [x for x in bpb if (x > 500*1024)]\n", "\n", "        # Optional: Sort for alternative visualization perspective.\n", "        #bpb.sort()\n", "\n", "        plt.bar(x=range(len(bpb)), height=bpb)\n", "        plt.title(title + ' ' + str(i), fontsize=12, fontweight=0)\n", "\n", "    plt.tight_layout()\n", "    plt.show(block=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Putting it all together\n", "\n", "We first train our model using the training data. Afterwards, the program can be run in test mode and will output the top matches for a new unknown encrypted video stream sample." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "def main(argv=None):\n", "    args = parse_args(argv)\n", "\n", "    if args.setup or definitely_need_setup_t2():\n", "        print('Setting up tranalyzer ... ', end='')\n", "        setup_t2()\n", "        print('done')\n", "\n", "    # run t2 on all PCAPs and extract BPB set for each class\n", "    if args.mode[0] == 'learn':\n", "        print('Storing learned BPB models to file: {}'.format(BPBS_PATH))\n", "        if isfile(BPBS_PATH) and not args.force:\n", "            print('WARNING: The model file {} already exists. Overwrite it? [yN] '.format(BPBS_PATH), end='')\n", "            if input().lower() != 'y':\n", "                print('Exiting.')\n", "                sys.exit(1)\n", "        classes = [basename(normpath(p)) for p in glob.glob('{}/**/'.format(DATA_PATH), recursive=False)]\n", "        classes = classes[:10]\n", "        print('Found {} classes, building model now ...'.format(len(classes)))\n", "        bpbs = learn(classes, write_to=BPBS_PATH)\n", "        print('Done building model, ready to test now.')\n", "\n", "        # NOTE: Collected BPB features are stored in a file and are used when this program is\n", "        #       invoked in \"test\" mode.\n", "\n", "        if VERBOSE:\n", "            for c in classes[:3]:\n", "                print('Bursts for class {}:'.format(c))\n", "                plot_bursts(list(bpbs[c]), c) # Plot the bursts of each of the first three classes.\n", "    else:\n", "        if not args.test:\n", "            print(\"Missing path to test PCAP file. See help page.\")\n", "            sys.exit(1)\n", "\n", "        pcap_test = args.test\n", "        bpbs = load_bpbs(BPBS_PATH)\n", "        bpb_test = extract_bpb(pcap_test)\n", "\n", "        print('Bursts of test sample ({}):'.format(basename(pcap_test)))\n", "        plot_bursts([bpb_test], 'Test sample')\n", "\n", "        top = nearest_neighbors(bpb_test, bpbs, 3)\n", "        result = top[0][0]\n", "        print('Classification result: {}'.format(result))\n", "        print('')\n", "        print('(Top 3: {})'.format(top))\n", "        plot_bursts(bpbs[result], result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the purposes of this tutorial, let's first train our model, then test it on a new PCAP:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Storing learned BPB models to file: bpbs.data\n", "WARNING: The model file bpbs.data already exists. Overwrite it? 
[yN] y\n", "Found 10 classes, building model now ...\n", "Extracting bursts for class Jennifer_Lopez_On_The_Floor\n", "Extracting bursts for class Democratic_Town_Hall\n", "Extracting bursts for class Fast_and_Furious_six\n", "Extracting bursts for class Coolio_Gangsters_Paradise\n", "Extracting bursts for class Lenny_Kravitz_American_Woman\n", "Extracting bursts for class Disconnect\n", "Extracting bursts for class Jungle_Book\n", "Extracting bursts for class Robbie_Williams_Supreme\n", "Extracting bursts for class Meghan_Trainor_All_About_That_Bass\n", "Extracting bursts for class fifty_Cent_In_Da_Club\n", "Done building model, ready to test now.\n", "----------------------------------------\n", "Bursts of test sample (Fast_and_Furious_six_Train00_40_30.pcap):\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAARMAAADQCAYAAAAkooUWAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4xLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvDW2N/gAAEK5JREFUeJzt3X+MVeWdx/H3R0R0cS2gaC2wha6TVey2/phFsjUbg8aO2hYTNcVtlbQYug1mbeLuik27pFoT7G6La2LbECVCY6SsmkpbDKGoa3dblfHHqsgaxp9MYfkhoFitFv3uH+eZ9TDcYe6988CZmft5JTf3nu95znnOUeeT5zzn3KsiAjOzgTqs6gMws+HBYWJmWThMzCwLh4mZZeEwMbMsHCZmloXDxIYtSf8r6eyqj6NVOEyGMUlvlV4fSHqntPylAez3UUlfznmsg4mk0ZLulfSqpJA0vepjGgocJsNYRBzd8wJeAz5fqt1V9fENYgH8B/C3wK6Kj2XIcJi0MEkjJH1b0kuSdki6S9KYtG60pOWSdkraLekxSWMlfR/4K+D2NML5fo391tw2rfuapP+RtEdSl6SvlrbrSLVvpeP5naQLJc2U9KKk1yVdW2q/UNLdaRSxR9I6Sac2eq69RcTbEXFrRPwX8MGA/iG3EIdJa/tH4HzgbGAi8EdgUVp3FXA4MAE4DrgaeC8irgXWAVelEc61++21j23Tui3ABcAxwN8Bt/UKgI+n4/gosBBYAlwKfAo4D7hJ0oRS+0uApcA44H7gPkkjGjxXy8Bh0tq+BsyPiM0R8QfgO8AXJYnij2088OcRsTci1kXE7+vcb5/bRsTKiHg5Cr+iuJwoT5K+DfxLROwFlgMnAP8aEb+PiKeAF4G/LLX/TdrnHynC5zjgjAbP1TI4vOoDsGqkP6JJwCpJ5W97HgYcC9xBMTq4R9LRwDLg2xHxfh2773NbSV8AvgWclPr6E+DXpW23R0TPpcU76X1raf07wNGl5U09HyJir6TNwMcaPNcddZyT9cMjkxYVxdfFfwfMiIgxpdeREbEjIt6NiH+OiJOBvwEuA2b1bN7PvmtuK2k08O/AjcDxETEGeBAYyOhgUs+HdHnzMWBzI+c6gL6txGHS2n4MLJQ0CUDS8ZI+nz6fJ2mqpMOAN4G9QM+oZCvwib52e
oBtjwJGAtuAD9Io5ZwBnsNfS/qcpJHAPwGvA082cq59nMMoSUemxSNKn60PDpPW9j3gV8CDkvYAv+HD+YYJFBOae4DngFXAirRuEXClpF2SvldjvzW3TaOAfwB+TvFHf3FaNxD3Al+luIV7CXBJH5diBzrXWl6luKQ6lmJe5x1JHx3gsQ5r8o8j2VAlaSFwXERcVfWxmEcmZpZJ3WGSHvp5StIv0vKU9DDSRkk/lXREqo9Ky11p/eTSPq5P9RckfbZU70i1LknzS/WG+zCzajQyMrkG2FBavhlYFBFtFNerc1J9DrArIk6iuLa+GUDSVIq7AacCHcAPU0CNAG6jeJBpKnB5attwH9ZaImK+L3EGj7rCRNJE4CLg9rQsYAZwT2qylGIyDWBmWiatPze1nwksT7cNXwa6gGnp1RURL0XEexQPKs1ssg8zq0i9D63dQnHb7U/T8rHA7vSUIkA3xQw+6X0T/P9DRG+k9hOAR0v7LG+zqVf9rCb72OeZAUlzgbkAo0ePPvPkk0+u83TNDOCJJ57YERHj62nbb5hI+hywLSKekHROT7lG0+hnXV/1WqOjA7Xvr/8PCxGLgcUA7e3t0dnZWWMzM+uLpFfrbVvPyOQzwBckXQgcSfEFrVuAMZIOTyOHiXz41GE3xVOJ3ZIOBz4C7CzVe5S3qVXf0UQfZlaRfudMIuL6iJgYEZMpJlAfjIgvAQ9RfJsTYDbFQ0oAK9Myaf2D6XHmlRSPVI+SNAVoAx6n+AZqW7pzc0TqY2XaptE+zKwiA/mi33XAcknfBZ6i+HIX6f0nkrooRguzACJivaQVwPMUj1fP63lSUdLVwGpgBLAkItY304eZVadlnoD1nIlZ4yQ9ERHt9bT1E7BmloXDxMyy8I8jmR3A5Pm/3K/2ysKLKjiSwc8jEzPLwmFiZlk4TMwsC4eJmWXhMDGzLBwmZpaFw8TMsnCYmFkWDhMzy8JhYmZZOEzMLAuHiZll4TAxsywcJmaWhcPEzLJwmJhZFg4TM8vCYWJmWThMzCwLh4mZZeEwMbMsHCZmloXDxMyycJiYWRYOEzPLwmFiZlk4TMwsC4eJmWXhMDGzLA7vr4GkI4FHgFGp/T0RsUDSFGA5MA54ErgiIt6TNApYBpwJvA58MSJeSfu6HpgDvA/8fUSsTvUO4N+AEcDtEbEw1Rvuwwa3yfN/uV/tlYUXVXAklls9I5N3gRkR8WngNKBD0nTgZmBRRLQBuyhCgvS+KyJOAhaldkiaCswCTgU6gB9KGiFpBHAbcAEwFbg8taXRPsysOv2GSRTeSosj0yuAGcA9qb4UuDh9npmWSevPlaRUXx4R70bEy0AXMC29uiLipYh4j2IkMjNt02gfZlaRuuZM0gjiaWAbsAZ4EdgdEXtTk25gQvo8AdgEkNa/ARxbrvfapq/6sU300fu450rqlNS5ffv2ek7VzJpUV5hExPsRcRowkWIkcUqtZum91gghMtYP1Me+hYjFEdEeEe3jx4+vsYmZ5dLQ3ZyI2A08DEwHxkjqmcCdCGxOn7uBSQBp/UeAneV6r236qu9oog8zq0i/YSJpvKQx6fNRwHnABuAh4NLUbDZwf/q8Mi2T1j8YEZHqsySNSndp2oDHgXVAm6Qpko6gmKRdmbZptA8zq0i/t4aBE4Gl6a7LYcCKiPiFpOeB5ZK+CzwF3JHa3wH8RFIXxWhhFkBErJe0Ange2AvMi4j3ASRdDaymuDW8JCLWp31d10gfZladfsMkIp4BTq9Rf4li/qR3/Q/AZX3s6ybgphr1VcCqHH2YWTX8BKyZZeEwMbMsHCZmloXDxMyycJiYWRYOEzPLwmFiZlk4TMwsC4eJmWXhMDGzLBwmZpZFPV/0MxsW/PuzB5dHJmaWhcPEzLJwmJhZFg4TM8vCYWJmWThMzCwLh4mZZeEwMbMsHCZmloXDxMyycJiYWRYOEzPLwmFiZlk4TMwsC4eJmWXhMDGzLBwmZpaFw8TMsnCYmFkW/YaJp
EmSHpK0QdJ6Sdek+jhJayRtTO9jU12SbpXUJekZSWeU9jU7td8oaXapfqakZ9M2t0pSs32YWTXqGZnsBa6NiFOA6cA8SVOB+cDaiGgD1qZlgAuAtvSaC/wIimAAFgBnAdOABT3hkNrMLW3XkeoN9WFm1en31+kjYguwJX3eI2kDMAGYCZyTmi0FHgauS/VlERHAo5LGSDoxtV0TETsBJK0BOiQ9DBwTEb9N9WXAxcADjfaRjtWa4F9ut4FqaM5E0mTgdOAx4ISeP970fnxqNgHYVNqsO9UOVO+uUaeJPnof71xJnZI6t2/f3sipmlmD6g4TSUcD9wLfiIg3D9S0Ri2aqB/wcOrZJiIWR0R7RLSPHz++n12a2UDUFSaSRlIEyV0RcV8qb02XL6T3baneDUwqbT4R2NxPfWKNejN9mFlF6rmbI+AOYENE/KC0aiXQc0dmNnB/qX5luuMyHXgjXaKsBs6XNDZNvJ4PrE7r9kianvq6ste+GunDzCpSz/8e9DPAFcCzkp5OtW8CC4EVkuYArwGXpXWrgAuBLuBt4CsAEbFT0o3AutTuhp7JWODrwJ3AURQTrw+kekN9mFl16rmb85/UnqMAOLdG+wDm9bGvJcCSGvVO4JM16q832oeZVcNPwJpZFg4TM8vCYWJmWThMzCwLh4mZZeEwMbMsHCZmloXDxMyycJiYWRYOEzPLop7v5phZBYbaD1Z5ZGJmWThMzCwLh4mZZeEwMbMsHCZmloXDxMyycJiYWRYOEzPLwg+tmVVsqD2c1hePTMwsC4eJmWXhMDGzLBwmZpaFw8TMsnCYmFkWDhMzy8JhYmZZOEzMLAuHiZll4TAxsyz6DRNJSyRtk/RcqTZO0hpJG9P72FSXpFsldUl6RtIZpW1mp/YbJc0u1c+U9Gza5lZJarYPM6tOPSOTO4GOXrX5wNqIaAPWpmWAC4C29JoL/AiKYAAWAGcB04AFPeGQ2swtbdfRTB9mVq1+wyQiHgF29irPBJamz0uBi0v1ZVF4FBgj6UTgs8CaiNgZEbuANUBHWndMRPw2IgJY1mtfjfRhZhVqds7khIjYApDej0/1CcCmUrvuVDtQvbtGvZk+9iNprqROSZ3bt29v6ATNrDG5f89ENWrRRL2ZPvYvRiwGFgO0t7f3t1+zg2q4/G5JX5odmWztubRI79tSvRuYVGo3EdjcT31ijXozfZhZhZoNk5VAzx2Z2cD9pfqV6Y7LdOCNdImyGjhf0tg08Xo+sDqt2yNperqLc2WvfTXSh5lVqN/LHEl3A+cAx0nqprgrsxBYIWkO8BpwWWq+CrgQ6ALeBr4CEBE7Jd0IrEvtboiInkndr1PcMToKeCC9aLQPM6tWv2ESEZf3sercGm0DmNfHfpYAS2rUO4FP1qi/3mgfZq2uynkZPwFrZlk4TMwsC/+vLsxawKG4/PHIxMyycJiYWRYOEzPLwmFiZlk4TMwsC4eJmWXhW8OW3XD/dqzV5pGJmWXhMDGzLHyZM0z5UsMONYeJDUkOy8HHYWLDjoOmGp4zMbMsHCZmloXDxMyy8JzJEOf5gdYzWP+de2RiZlk4TMwsC4eJmWXhORMbFPqaBzgU8wPN9DFY5y2q5DAxy6iVQ8aXOWaWhcPEzLLwZc4Q0MpDZxs6HCZ1GqwTgYfCYD0uG1wcJi3GwWAHi+dMzCwLj0wGyM8omBUcJgeJA8NazZC9zJHUIekFSV2S5ld9PGatbkiGiaQRwG3ABcBU4HJJU6s9KrPWNiTDBJgGdEXESxHxHrAcmFnxMZm1NEVE1cfQMEmXAh0RcVVavgI4KyKu7tVuLjA3Lf4F8EID3RwH7MhwuEORz7011Tr3j0fE+Ho2HqoTsKpR2y8VI2IxsLipDqTOiGhvZtuhzufuc2/GUL3M6QYmlZYnApsrOhYzY+iGyTqgTdIUSUcAs4CVFR+TWUsbkpc5EbFX0tXAamAEsCQi1mfupqnLo2HC596aBnTuQ3IC1swGn6F6mWNmg
4zDxMyycJj00mqP6UtaImmbpOdKtXGS1kjamN7HVnmMB4OkSZIekrRB0npJ16R6K5z7kZIel/Tf6dy/k+pTJD2Wzv2n6eZG3RwmJS36mP6dQEev2nxgbUS0AWvT8nCzF7g2Ik4BpgPz0r/rVjj3d4EZEfFp4DSgQ9J04GZgUTr3XcCcRnbqMNlXyz2mHxGPADt7lWcCS9PnpcDFh/SgDoGI2BIRT6bPe4ANwARa49wjIt5KiyPTK4AZwD2p3vC5O0z2NQHYVFruTrVWc0JEbIHijw44vuLjOagkTQZOBx6jRc5d0ghJTwPbgDXAi8DuiNibmjT8377DZF91PaZvw4eko4F7gW9ExJtVH8+hEhHvR8RpFE+PTwNOqdWskX06TPblx/QLWyWdCJDet1V8PAeFpJEUQXJXRNyXyi1x7j0iYjfwMMW80RhJPQ+yNvzfvsNkX35Mv7ASmJ0+zwbur/BYDgpJAu4ANkTED0qrWuHcx0sakz4fBZxHMWf0EHBpatbwufsJ2F4kXQjcwoeP6d9U8SEdVJLuBs6h+Pr5VmAB8DNgBfBnwGvAZRHRe5J2SJN0NvBr4Fngg1T+JsW8yXA/909RTLCOoBhQrIiIGyR9guKmwzjgKeDLEfFu3ft1mJhZDr7MMbMsHCZmloXDxMyycJiYWRYOEzPLwmFiZlk4TMwsi/8D3CCK7myprkgAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Classification result: Fast_and_Furious_six\n", "\n", "(Top 3: [('Fast_and_Furious_six', 30), ('Disconnect', 1), ('Jungle_Book', 1)])\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "if __name__ == '__main__':\n", " main(['learn'])\n", "\n", " print('-' * 40)\n", "\n", " test_pcap = f'{DATA_PATH}/Fast_and_Furious_six/Test/Fast_and_Furious_six_Train00_40_30.pcap'\n", " main(['test', '-t', test_pcap])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion\n", "\n", "We wrote a short python program to build and train a nearest neighbor model to classify encrypted YouTube video streams using Tranalyzer.\n", "\n", "Download the jupyter notebook for this tutorial [here](/download/data/bpb-classifier.ipynb).\n", "\n", "If you have any questions or feedback, please do not hesitate to contact us!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# References\n", "\n", "[1] R. Dubin, A. Dvir, O. Pele and O. Hadar, \"I Know What You Saw Last Minute—Encrypted HTTP Adaptive Video Streaming Title Classification,\" in IEEE Transactions on Information Forensics and Security, vol. 12, no. 12, pp. 3039-3049, Dec. 2017. doi: 10.1109/TIFS.2017.2730819 https://ieeexplore.ieee.org/document/7987775" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.1" } }, "nbformat": 4, "nbformat_minor": 2 }