Coder Social home page Coder Social logo

reproduce's Introduction

REproduce

Coverage Status

Introduction

REproduce is a program that reads raw packet values such as a pcap file, converts them into log-type streams through specific field values or characteristics, and outputs the conversion result to a file or to a kafka server.

Packet translation is up to the transport layer, and the protocols currently supported are Ethernet, IP, ARP, TCP, UDP, and ICMP. Also, logs and plain text files are converted to a new type of log stream by adding or removing information according to their attributes.

Function Specification

  • The program converts packets into log-type streams
  • The program performs a conversion that converts the log to a new format, and removes unnecessary elements or adds features
  • The program sends transformed streams via kafka platform

1. Data entry

Specify a single pcap file or network interface or plain text file such as log to be converted through the program's options.

2. Conversion

REproduce converts the incoming packet or pcap file, log to a stream format with space as delimiter, as in the following Conversion Format. The conversion of packet starts with the sec value in the time_t structure representing the timestamp, and then converts from the lower layer to the higher layer of the protocol.

Conversion Example

Packet

1531980829 Ethernet2 a4:7b:2c:1f:eb:61 40:61:86:82:e9:26 IP 4 5 0 56325 19069 64 127 7184 59.x.x.91 121.x.x.134 ip_opt TCP 3389 63044 1092178785 2869829243 20 AP 64032 5779 0

[Seconds of Timestamp] [Protocol Name] [Destination MAC Address] [Source MAC Address] [Protocol Name] [Version] [IHL] [ToS] [Total Length] [Identification] [Fragment Offset] [TTL] [Header Checksum] [Source IP Address] [Destination IP Address] [Presence of option field] [Protocol name] [Source Port Address] [Destination Port Address] [Squence Number] [Acknowledge Number] [Hlen] [Flags(UAPRSF)] [Window Size] [Checksum] [Urgent Pointer]

Log

20180906 e1000 enp0s3 N Link Up 1000Mbps Full Duplex Flow Control: RX

See more details in appendix.

3. Event id

REproduce insert Event id in all messages.

  • Format: 0.9.8 or later
    Event id (64bit) = system time in seconds(upper 32bit) + sequence number(lower 24bit) + datasource id(lowest 8bit)

  • Format: 0.9.7 and lower
    Event id (64bit) = datasource id(upper 16bit) + sequence number(lower 48bit)

4. Output

REproduce outputs the converted result in a form specified by the user(Stdout, File, Transmission to kafka server).

Usage

Program Usage

> reproduce [OPTIONS]

OPTIONS

  -b: kafka broker list, [host1:port1,host2:port2,..] (default: localhost:9092)
  -c: send count
  -d: data source id (1~255). (default: 1)
  -e: evaluation mode. output statistical result of transmission after job is terminated or stopped
  -E: entropy ratio. The amount of maximum entropy allowed for a
      session (0.0 < entropy ratio <= 1.0). Default is 0.9.
      Only relevant for network packets.
  -f: packet filter syntax when input is NIC or PCAP
      (reference : https://www.tcpdump.org/manpages/pcap-filter.7.html)
  -g: follow the growing input file
  -h: help
  -i: input [PCAPFILE/LOGFILE/DIR/NIC]
      If no 'i' option is given, input is internal sample data
      If DIR is given, the g option is not supported.
  -j: set initial sequence number. (24bit size)
  -k: kafka config file (Ex: kafka.conf)
      it overrides default kafka config to user kafka config
  -m: match [Pattern FILE]
  -n: prefix of file name to send multiple files or directory
  -o: output [TEXTFILE/none]
      If no 'o' option is given, output is kafka
  -p: queue period time. how much time keep queued data. (default: 3)
  -q: queue size. how many bytes send once to kafka. (default: 900000)
  -r: record [prefix of offset file]
      using this option will start the conversion after the previous
      conversion. The offset file name is managed by [input file]_[prefix].
      Except when the input is a NIC.
  -s: skip count
  -t: kafka topic (default: pcap)
      If the broker does not have a corresponding topic, the broker fails
      unless there is a setting that automatically creates the topic.
  -v: REproduce watches the input directory and sends it when new files are found.

Kafka Config

When transferring the converted result via kafka, various options can be set through the file specified with the 'k' option. The configuration consists of two sections: global settings and topic settings. Each section consists of properties and values. An example of a configuration file is following.

Examples

  • Convert pcap file and send it to kafka server:
    • reproduce -i test.pcap -b 192.168.10.1:9092 -t sample_topic
  • Convert log file and send it to kafka server:
    • reproduce -i LOG_20180906 -b 192.168.10.1:9092 -t sample_topic
  • Save result file after converting pcap file:
    • reproduce -i test.pcap -o result.txt
  • Skip 10000 packets and convert 1000 packets in pcap file and evaluate performance:
    • reproduce -i test.pcap -s 10000 -c 1000 -o none -e
  • Convert it while following, If the content of the input file continue to grow
    • reproduce -i test.pcap -g
  • Convert only udp packets of traffic to and from network interface enp0s3
    • reproduce -i enp0s3 -f "udp" -o none
  • When transmitting to kafka once, queue up to 10Kbytes, and if transmission interval is delayed more than 2 seconds, send immediately
    • reproduce -i test.pcap -q 10240 -p 2
  • Send all log files beginning with 'msg' prefix in the '/data/LOG' directory and it's subdirectory
    • reproduce -i /data/LOG -n msg -b 192.168.4.5:9092 -t syslog -e
  • Send all log files in the '/data/LOG' directory and it's subdirectory. And polling the directory periodically (default 3 seconds). If new files found, send it too.
    • reproduce -i /data/LOG -v -b 192.168.4.5:9092 -t syslog -e

Report Example

REproduce creates or opens /report/<Kafka topic name> first. If it failed, it will try to open ./<Kafka topic name>.

root@bada-unbuntu:~/REproduce# ./REproduce -i test.pcap -e -c 10000000
root@bada-unbuntu:~/REproduce# tail -f /report/<Kafka topic name>
--------------------------------------------------
Time:                       Mon Jul  1 14:34:23 2019
Input(PCAP):                test.pcap(976.56M)
Datasource ID:              1
Input ID:                   1 ~ 10000000
Output(KAFKA):              localhost:9092(pcap)
Statistics(Min/Max/Avg):    60/4428/121.97bytes
Process Count:              6905184(803.185116MB)
Skip Count:                 0(0B)
Elapsed Time:               23.86s
Performance:                70.22MBps/419.14Kpps

Sessions Information Example

  • 0.9.6 version doesn't create or update sessions.txt and only send payload data.

REproduce creates /report/sessions.txt-YYYYMMDDHHMMSS or ./sessions.txt-YYYYMMDDHHMMSS.

  • Fields: event_id, sip, dip, proto, sport, dport
  • You can get the packet number by removing the upper 16 bits (datasource_id) from the event_id
  • The report and session files share the 'YYYYMMDDHHMMSS' value, and the value means that REproduce process or docker launch time.
root@bada-unbuntu:~/REproduce# tail -f /report/sessions-20190712143111.txt
281474976840922,316206727,3530930811,6,443,64015
281474976840923,2540037634,3530917189,6,80,63841
281474976840965,3547547186,3422459274,6,5001,63819
281474976840983,1890545971,3530923615,6,14811,10681
281474976841013,2877264388,3530918422,6,51305,5500
281474976841017,1749831047,3530917189,6,80,63848
281474976841024,3551661079,3422459095,1,0,0
281474976841027,1360408062,3530921781,6,53345,51000

Performance

Test environment

  • CPU : Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
  • Memory : 64GB
  • Cores(Utilization) : 1(100%)

Result

Contents Speed
Packet Conversion Only 80.07MBps / 477.94Kpps
Kafka Transmission Only 770.97MBps / 4619.52Kpps
Packet Conversion + Kafka Transmission 71.63MBps / 427.58Kpps

Issue

To do

  • Support More protocols
  • Define the conversion of log and Implement it

Appendix

Conversion Format

Ethernet

Order Type description example
1 Text Protocol Name Ethernet2
2 MAC Address Destination MAC Address 40:61:86:82:e9:26
3 MAC Address Source MAC Address a4:7b:2c:1f:eb:61

IP

Order Type description example
1 Text Protocol Name IP
2 Decimal Version 4
3 Decimal IHL 5
4 Decimal ToS 0
5 Decimal Total Length 10240
6 Decimal Identification 29865
7 Decimal Fragment Offset 64
8 Decimal TTL 54
9 Decimal Header Checksum 49311
10 IP Address Source IP Address 125.209.230.110
11 IP Address Destination IP Address 59.7.91.240
12 Text Presence of option field ip_opt

ARP

Order Type description example
1 Text Protocol name ARP
2 Text Keyword To Protocol Type Field Request
* Request
* Reply
* Reverse Request
* Reverse Reply
* Inverse Request
* Inverse Reply
* NACK Reply
3 Text Keyword To Hardware Type Field Ethernet
* Ethernet
* TokenRing
* ArcNet
* FrameRelay
* Strip
* IEEE 1394
* ATM
4 According to the OP Code, it is divided as follows
4-a [Text] [IP Address] [Text] [IP Address] OP Code : 1(ARP Request) who-has 192.168.0.254 tell 192.168.0.1
4-b [IP Address] [Text] [MAC Address] OP Code : 2(ARP Reply) 192.168.0.254 is-at a4:7b:2c:3f:eb:24
4-c [Text] [MAC Address] [Text] [MAC Address] OP Code : 3(RARP Request) who-is 192.168.0.254 tell 192.168.0.1
4-d [MAC Address] [Text] [IP Address] OP Code : 4(RARP Reply) a4:7b:2c:3f:eb:24 at 192.168.0.254
4-e [Text] [IP Address] [Text] [IP Address] OP Code : 8(InARP Request) who-is 40:61:86:82:e9:26 tell a4:7b:2c:3f:eb:24
4-f [MAC Address] [Text] [IP Address] OP Code : 9(InARP Reply) a4:7b:2c:3f:eb:24 at 192.168.0.254

ICMP

Order Type description example
1 Text Protocol name ICMP
2 Decimal Type 8
3 Decimal Code 0
4 Decimal Checksum 1048
5 Text Keyword To ICMP Type Field ttl_expired
* ttl_expired
* echo_reply

TCP

Order Type description example
1 Text Protocol name TCP
2 Decimal Source Port Address 16493
3 Decimal Destination Port Address 80
4 Decimal Squence Number 2622175950
5 Decimal Acknowledge Number 416662581
6 Decimal Hlen 20
7 Text Flags(UAPRSF) AS
8 Decimal Window Size 134
9 Decimal Checksum 32214
10 Decimal Urgent Pointer 0

UDP

Order Type description example
1 Text Protocol name UDP
2 Decimal Source Port Address 15948
3 Decimal Destination Port Address 53
4 Decimal Length 1048
5 Decimal Checksum 30584

Building Docker Images

try this in a directory that has a Dockerfile:

docker build -t registry.gitlab.com/resolutions/reproduce:0.9.8 .

Running Docker Images

Run with the following command

docker run --mount type=bind,source=[the directory containing the target],target=/data \
           --mount type=bind,source=[report or log directory],target=/report reproduce:latest -i [the target file] -b localhost:9092 -t topic

License

Copyright 2018-2021 Petabi, Inc.

Licensed under Apache License, Version 2.0 (the "License"); you may not use this crate except in compliance with the License.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See LICENSE for the specific language governing permissions and limitations under the License.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be licensed as above, without any additional terms or conditions.

reproduce's People

Contributors

dependabot[bot] avatar kimhyunwooo avatar minshao avatar msk avatar syncpark avatar vvalgenti avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.