v2ray-deep-packet-inspection's Introduction

Protecting the World Against Tyranny

Please consider donating to the National Bank of Ukraine (NBU), the central bank of Ukraine. Your donation helps freedom fighters and Ukrainian civilians in a humanitarian crisis.

If you are against funding bombs and arms, you could also donate to the Come Back Alive charity, which supports the Ukrainian armed forces with defense needs, including medical assistance and rehabilitation. The organization is transparent about donations and its spending.

Thank you for your support.

What

This is a collection of Python notebooks from my ongoing research on deep packet inspection.

  1. V2Ray Deep Packet Inspection Notebook demonstrates my work on deep packet inspection and classification of V2Ray traffic. For more details, please visit my blog post.
  2. Adversarial Examples for Traffic Classifiers Notebook demonstrates white-box adversarial examples that evade the V2Ray classifier. For more details, please visit my blog post.
  3. A Classification Model for V2Ray TLS + WebSocket settings is a pre-trained model for V2Ray traffic that uses TLS + WebSocket settings. For more details, please visit my blog post.

v2ray-deep-packet-inspection's People

Contributors

rickyzhang82

v2ray-deep-packet-inspection's Issues

Interesting project

Can you analyze where the salient features come from: the header (and if so, which part) or the payload?
Thank you!

Wrong way to split the dataset

An important result of this repository is that V2Ray traffic can be easily detected by a CNN model. However, the way the data set is split appears to be wrong. The code below comes from this repository and undersamples both the training set and the validation set. Undersampling the training set is fine, but undersampling the validation/test set is not a good choice: it lets the classifier cheat, because the ratio of positive to negative examples in the test set has been artificially adjusted. What matters in machine learning is that the test set stays independent of the training set.



import numpy as np
import math
import os.path
from pathlib import Path
import glob
from tensorflow.keras.utils import Sequence

FIXED_PACKET_SIZE = 1500
NUM_OF_PACKETS_PER_FILE = 16
RESCALE_FACTOR = 1./255

# v2ray traffic tag
TRAINING_DATA_PERCENTAGE = 0.8
PACKET_FILE_EXT = '*.bin'


def rglob(data_root, file_ext):
    files = list()
    for filePath in Path(data_root).rglob(file_ext):
        files.append(str(filePath))
    return files


def binary_classification(packet_path, match_string=V2RAY_HOST_TAG):
    """Binary network traffic classification function

    :param packet_path: file path to packet
    :param match_string: substring of the file path that marks V2Ray traffic
    :return: 1 if it is V2Ray traffic, 0 otherwise.
    """
    if packet_path.find(match_string) != -1:
        return 1
    else:
        return 0


def generate_train_validation_packet_path_list(data_root, training_pct=TRAINING_DATA_PERCENTAGE, eqaul_size=True):
    file_list = rglob(data_root, PACKET_FILE_EXT)
    v2ray_file_list = [file_path for file_path in file_list if binary_classification(file_path) == 1]
    non_v2ray_file_list = [file_path for file_path in file_list if binary_classification(file_path) == 0]

    if eqaul_size:
        cut_off_count = min(len(v2ray_file_list), len(non_v2ray_file_list))
        v2ray_file_size = cut_off_count
        non_v2ray_file_size = cut_off_count
    else:
        v2ray_file_size = len(v2ray_file_list)
        non_v2ray_file_size = len(non_v2ray_file_list)

    v2ray_indexes = np.arange(len(v2ray_file_list))
    np.random.shuffle(v2ray_indexes)
    non_v2ray_indexes = np.arange(len(non_v2ray_file_list))
    np.random.shuffle(non_v2ray_indexes)

    training_file_list = [v2ray_file_list[index]
                          for index in v2ray_indexes[:math.ceil(v2ray_file_size * training_pct)]] + \
                         [non_v2ray_file_list[index]
                          for index in non_v2ray_indexes[:math.ceil(non_v2ray_file_size * training_pct)]]

    validation_file_list = [v2ray_file_list[index]
                            for index in v2ray_indexes[math.ceil(v2ray_file_size * training_pct): v2ray_file_size]] + \
                           [non_v2ray_file_list[index]
                            for index in non_v2ray_indexes[math.ceil(non_v2ray_file_size * training_pct): non_v2ray_file_size]]

    print("Statistics: ")
    print("Total V2ray traffic %d, Total non-V2ray traffic %d" % (len(v2ray_file_list), len(non_v2ray_file_list)))
    print("Output train traffic %d, Total validation traffic %d" % (len(training_file_list), len(validation_file_list)))

    return training_file_list, validation_file_list


# Generate the training and validation file lists
train_file_list, val_file_list = generate_train_validation_packet_path_list(data_root=DATA_ROOT, eqaul_size=True)
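
One possible fix, as a minimal sketch rather than the repository's method: split each class into training and validation parts first, then undersample only the training part. The helper name `split_then_undersample`, its parameters, and the fixed seed are hypothetical and chosen for illustration only.

```python
import math
import random


def split_then_undersample(v2ray_files, non_v2ray_files, training_pct=0.8, seed=0):
    """Split each class first, then balance only the training portion.

    Hypothetical helper for illustration; not part of the repository.
    """
    rng = random.Random(seed)
    train_by_class, validation = [], []
    for files in (v2ray_files, non_v2ray_files):
        shuffled = list(files)
        rng.shuffle(shuffled)
        cut = math.ceil(len(shuffled) * training_pct)
        train_by_class.append(shuffled[:cut])
        # The validation part keeps every remaining file, so its class
        # ratio stays close to the ratio in the collected data.
        validation.extend(shuffled[cut:])

    # Undersample the larger class in the training part only.
    smallest = min(len(part) for part in train_by_class)
    training = [path for part in train_by_class for path in part[:smallest]]
    return training, validation
```

With 10 V2Ray files and 100 other files, this would give a balanced training list of 16 files and a validation list of 22 files that keeps the original 1:10 ratio. Whether the collected data itself reflects real-world traffic proportions is a separate question that this sketch does not address.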

Related issue/discussion:

v2fly/v2ray-core#557
v2ray/discussion#569

Ref:
https://datascience.stackexchange.com/questions/61858/oversampling-undersampling-only-train-set-only-or-both-train-and-validation-set
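
To see why a rebalanced validation set can flatter a classifier, here is a small worked example. The true-positive and false-positive rates are made up for illustration and are not measurements from this repository; `precision` is a hypothetical helper that applies the standard definition of precision to a given class mix.

```python
def precision(true_positive_rate, false_positive_rate, prevalence):
    """Fraction of flagged flows that really are V2Ray, for a given class mix."""
    flagged_true = true_positive_rate * prevalence
    flagged_false = false_positive_rate * (1.0 - prevalence)
    return flagged_true / (flagged_true + flagged_false)


# Made-up rates, chosen only to illustrate the effect.
TPR, FPR = 0.99, 0.01

balanced = precision(TPR, FPR, prevalence=0.5)   # half of the test flows are V2Ray
skewed = precision(TPR, FPR, prevalence=0.01)    # 1% of the test flows are V2Ray
print(f"balanced test set: {balanced:.2f}, skewed test set: {skewed:.2f}")
```

With these assumed rates, precision is about 0.99 on the balanced set but only about 0.50 when V2Ray makes up 1% of the traffic, so the same classifier looks far better on an artificially balanced test set.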

data set

Could you please share the data set you use?