evilsocket / ergo

🧠 A tool that makes AI easier.

License: Other

Python 98.45% Shell 1.55%
keras machine-learning deep-learning neural-networks gpu dataset training-algorithm ergo tensorflow

ergo's Introduction

ergo

ergo (from the Latin sentence "Cogito ergo sum") is a command line tool that makes machine learning with Keras easier.

It can be used to:

  • scaffold new projects in seconds and customize only a minimum amount of code.
  • encode samples, import and optimize CSV datasets and train the model with them.
  • visualize the model structure, loss and accuracy functions during training.
  • determine how each of the input features affects the accuracy by differential inference.
  • export a simple REST API to use your models from a server.

Installing

sudo pip3 install ergo-ai

Installing from Sources

git clone https://github.com/evilsocket/ergo.git
cd ergo
sudo pip3 install -r requirements.txt
python3 setup.py build
sudo python3 setup.py install

Enable GPU support (optional)

Make sure you have CUDA 11 and cuDNN 8.0 installed and then:

sudo pip3 uninstall tensorflow
sudo pip3 install tensorflow-gpu

Usage

To print the general help menu:

ergo help

To print action specific help:

ergo <action> -h

Start by printing the available actions with ergo help. To verify your installation, you can also print the software versions (ergo, Keras and TensorFlow) and some hardware information with ergo info.

Creating a Project

Once ready, create a new project named example (run ergo create -h to see how to customize the initial model):

ergo create example

Inside the newly created example folder, there will be three files:

  1. prepare.py, used to preprocess your dataset and inputs (if, for instance, you're using pictures instead of a CSV file).
  2. model.py, which you can change to customize the model.
  3. train.py, which contains the training algorithm.

By default, ergo will simply read the dataset as a CSV file, build a small neural network with 10 inputs, two hidden layers of 30 neurons each and 2 outputs and use a pretty standard training algorithm.
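
To make the default topology concrete, the sketch below counts the trainable parameters of a fully connected network with the layer sizes quoted above. It assumes plain dense layers with one bias per neuron, which the text does not state explicitly, and dense_param_count is a hypothetical helper that is not part of ergo.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the number of units per layer, input first.
    Assumption: every layer is dense with one bias per unit; the
    generated model.py may differ.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total


# 10 inputs, two hidden layers of 30 neurons each, 2 outputs
default_sizes = [10, 30, 30, 2]
param_count = dense_param_count(default_sizes)
print(param_count)
```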

Exploration (optional)

Explore properties of the dataset. Ergo can generate graphs and tables that can be useful for the feature engineering of the problem.

Explore can show:

  1. Metrics of each feature (min, max, standard deviation), which can be used to discard constant features in the dataset.
  2. Correlation of each feature with the target, which gives an idea of how good each feature is as a linear predictor.
  3. Feature correlation matrix.
  4. PCA decomposition:
    • 2D projection of the data based on classes.
    • Explained variance of each principal component with 90%, 95% and 99% explanation values.
  5. K-means or DBSCAN clustering of the data.
  6. Elbow method to determine the optimal number of clusters for k-means.
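
As a rough illustration of the first two items, the sketch below computes per-feature metrics and the correlation of each feature with the label on a toy dataset. It is not ergo's implementation: the helper names, the toy rows and the use of the population standard deviation are all assumptions made for the example.

```python
import math


def feature_metrics(column):
    """Return (min, max, population standard deviation) of a numeric column."""
    mean = sum(column) / len(column)
    variance = sum((v - mean) ** 2 for v in column) / len(column)
    return min(column), max(column), math.sqrt(variance)


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either side is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


# Toy dataset: label first, then three features (the second one is constant).
rows = [
    [0, 1.0, 5.0, 0.2],
    [1, 2.0, 5.0, 0.9],
    [0, 1.5, 5.0, 0.1],
    [1, 3.0, 5.0, 0.8],
]
labels = [r[0] for r in rows]
features = list(zip(*[r[1:] for r in rows]))  # one tuple per feature column

# A zero standard deviation marks a feature that carries no information.
constant = [i for i, col in enumerate(features) if feature_metrics(col)[2] == 0.0]
correlations = [pearson(col, labels) for col in features]
print("constant features:", constant)
print("correlation with target:", [round(c, 3) for c in correlations])
```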

Example with a dataset some/path/data.csv:

ergo explore example --dataset some/path/data.csv -p

This will show the PCA decomposition of the dataset, saving (and optionally showing) the explained variance vs the number of principal component vectors used and the 2D projection of the dataset (colored by labels).

A full exploratory analysis can be performed using the --all flag:

ergo explore example --dataset some/path/data.csv --all 

Encoding (optional)

If you implemented the prepare_input function in the prepare.py script, ergo can encode raw samples (executables, images, strings or anything else) into vectors of scalars, which are then saved to a dataset.csv file suitable for training.

Example with a folder /path/to/data that contains pos and neg subfolders. In auto labeling mode, each group of samples is labeled with its parent directory name:

ergo encode example /path/to/data
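
The auto labeling rule can be pictured with the following sketch, which walks a folder and labels every file with the name of its parent directory. This only illustrates the idea and is not ergo's encoder; enumerate_labeled and the temporary layout are hypothetical.

```python
import os
import tempfile


def enumerate_labeled(root):
    """Yield (label, path) pairs, labeling each file with its parent folder name."""
    for label in sorted(os.listdir(root)):
        folder = os.path.join(root, label)
        if not os.path.isdir(folder):
            continue  # only subfolders define a label
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            if os.path.isfile(path):
                yield label, path


# Build a throwaway layout with pos/ and neg/ subfolders to try it out.
with tempfile.TemporaryDirectory() as root:
    for label, name in [("pos", "a.bin"), ("neg", "b.bin"), ("neg", "c.bin")]:
        os.makedirs(os.path.join(root, label), exist_ok=True)
        with open(os.path.join(root, label, name), "wb") as fh:
            fh.write(b"sample")
    samples = [(label, os.path.basename(p)) for label, p in enumerate_labeled(root)]

print(samples)
```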

Example with a single folder and manual labeling:

ergo encode example /path/to/data --label 'some-label'

Example with a single text file containing multiple inputs, one per line:

ergo encode example /path/to/data --label 'some-label' -m

Training

After defining the model structure and the training process, you can import a CSV dataset (first column must be the label) and start training using 2 GPUs:

ergo train example --dataset /some/path/data.csv --gpus 2

This will split the dataset into train, validation and test sets (partitioned with the --test and --validation arguments), start the training and, once finished, show the model statistics.
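
A three-way split of this kind could look like the sketch below. The default fractions, the seed and the slicing order are assumptions chosen for illustration, not values taken from ergo.

```python
import random


def split_dataset(rows, p_test=0.15, p_val=0.15, shuffle=True, seed=0):
    """Split rows into (train, validation, test) lists.

    p_test and p_val are fractions of the whole dataset. The defaults
    are illustrative only.
    """
    rows = list(rows)
    if shuffle:
        random.Random(seed).shuffle(rows)  # seeded so the split is repeatable
    n_test = int(len(rows) * p_test)
    n_val = int(len(rows) * p_val)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test


train, val, test = split_dataset(range(100), p_test=0.2, p_val=0.1)
print(len(train), len(val), len(test))
```

Every row ends up in exactly one of the three sets, so the test set stays unseen during training and validation.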

If you want to update a model and/or train it on already imported data, you can simply run:

ergo train example --gpus 2

Testing

Now it's time to visualize the model structure and how the accuracy and loss metrics changed during training (requires sudo apt-get install graphviz python3-tk):

ergo view example

If the data-test.csv file is still present in the project folder (ergo clean has not been called yet), ergo view will also show the ROC curve.

You can use the relevance command to evaluate the model on a given set (or a subset of it, see --ratio 0.1) by nulling one attribute at a time and measuring how that influenced the accuracy (feature.names is an optional file with the names of the attributes, one per line):

ergo relevance example --dataset /some/path/data.csv --attributes /some/path/feature.names --ratio 0.1
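
The idea of nulling one attribute at a time can be sketched as follows. The toy_predict function is a hypothetical stand-in for a trained model, and replacing the attribute with 0.0 is an assumption about what "nulling" means here.

```python
def accuracy(predict, X, y):
    """Fraction of rows for which predict(row) equals the label."""
    hits = sum(1 for row, label in zip(X, y) if predict(row) == label)
    return hits / len(X)


def relevance(predict, X, y):
    """Accuracy drop caused by zeroing each attribute in turn."""
    baseline = accuracy(predict, X, y)
    drops = []
    for col in range(len(X[0])):
        # Copy every row with the attribute at index col set to 0.0.
        nulled = [row[:col] + [0.0] + row[col + 1:] for row in X]
        drops.append(baseline - accuracy(predict, nulled, y))
    return baseline, drops


# Hypothetical stand-in for a trained model: only the first attribute matters.
def toy_predict(row):
    return 1 if row[0] > 0.5 else 0


X = [[0.9, 0.3], [0.1, 0.8], [0.7, 0.2], [0.2, 0.6]]
y = [1, 0, 1, 0]
baseline, drops = relevance(toy_predict, X, y)
print(baseline, drops)
```

A larger drop means the model relies more on that attribute; a drop of zero suggests the attribute could be removed without hurting accuracy on this set.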

Once you're done, you can remove the train, test and validation temporary datasets with:

ergo clean example

Inference

To load the model and start a REST API for evaluation (can be customized with --address, --port, --classes and --debug options):

ergo serve example

To run an inference on a vector of scalars:

curl "http://localhost:8080/?x=0.345,1.0,0.9,..."

If you customized the prepare_input function in prepare.py (see the Encoding section), you can run an inference on a raw sample:

curl "http://localhost:8080/?x=/path/to/sample"

The input x can also be passed as a POST request:

curl --data 'x=...' "http://localhost:8080/"
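
The same GET and POST calls can be prepared from Python with the standard library, as in the sketch below. It only builds the requests and does not send them; the helper names are hypothetical and the response format is not covered here.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8080/"  # address and port used in the curl examples


def encode_vector(vector):
    """Join scalars with commas, as in the x=0.345,1.0,0.9 example."""
    return ",".join(str(v) for v in vector)


def build_get(vector):
    """Request object for a GET inference call (not sent here)."""
    query = urlencode({"x": encode_vector(vector)}, safe=",")
    return Request(BASE_URL + "?" + query)


def build_post(vector):
    """Request object for the equivalent form-encoded POST call."""
    body = urlencode({"x": encode_vector(vector)}).encode("ascii")
    return Request(BASE_URL, data=body, method="POST")


req = build_get([0.345, 1.0, 0.9])
print(req.full_url)
# To actually send it, with `ergo serve example` running:
#   from urllib.request import urlopen
#   print(urlopen(req).read().decode())
```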

Or as a file upload:

curl -F 'x=@/path/to/file' "http://localhost:8080/"

The API can also be used to perform encoding only:

curl -F 'x=@/path/to/file' "http://localhost:8080/encode"

This will return the raw features vector that can be used for inference later.

Other commands

To reset the state of a project (WARNING: this will remove the datasets, the model files and all training statistics):

ergo clean example --all

Evaluate and compare the performance of two trained models on a given dataset and (optionally) output the differences to a JSON file:

ergo cmp example_a example_b --dataset /path/to/data.csv --to-json diffs.json

Freeze the graph and convert the model to the TensorFlow protobuf format:

ergo to-tf example

Convert the Keras model to frugally-deep format:

ergo to-fdeep example

Optimize a dataset (get unique rows and reuse 15% of the total samples, customize ratio with the --reuse-ratio argument, customize output with --output):

ergo optimize-dataset /some/path/data.csv

License

ergo was made with ♥ by the dev team and it is released under the GPL 3 license.

ergo's People

Contributors

evilsocket, nicochidt

ergo's Issues

Error while encode

Getting the below error when I try to run the encode command to classify my malicious and clean files:

:~$ ergo encode malware/ data/ --output dataset.csv
[2019-05-29 03:41:35,405] (INFO) loading project /home/remnux/malware ...
[2019-05-29 03:41:35,410] (INFO) building model for training ...
[2019-05-29 03:41:35,449] (WARNING) From /usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
[2019-05-29 03:41:35,465] (WARNING) From /usr/local/lib/python3.4/dist-packages/keras/backend/tensorflow_backend.py:3445: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use rate instead of keep_prob. Rate should be set to rate = 1 - keep_prob.
[2019-05-29 03:41:35,509] (INFO) using auto labeling
[2019-05-29 03:41:35,510] (INFO) enumerating data/clean ...
[2019-05-29 03:41:35,510] (INFO) enumerating data/malicious ...
[2019-05-29 03:41:35,510] (INFO) labeling 0 files ...
[2019-05-29 03:41:35,510] (INFO) starting 8 workers for encoding
[2019-05-29 03:41:35,522] (INFO) encoding 0 inputs to dataset.csv ...

Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.4/multiprocessing/process.py", line 254, in _bootstrap
    self.run()
  File "/usr/lib/python3.4/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.4/dist-packages/ergo/actions/encode.py", line 86, in appender
    on_progress(done, total)
  File "/usr/local/lib/python3.4/dist-packages/ergo/actions/encode.py", line 70, in on_progress
    perc = done / total
ZeroDivisionError: division by zero
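
For reference, the log above reports 0 files labeled, so total is 0 when on_progress computes done / total. A minimal sketch of a guard is shown below; safe_ratio is a hypothetical helper and not necessarily how ergo addresses this.

```python
def safe_ratio(done, total):
    """Completed fraction, returning 0.0 instead of raising when total is 0."""
    return done / total if total else 0.0


print(safe_ratio(0, 0))
print(safe_ratio(1, 4))
```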

Turning off logging

Is there a way to turn off logging? It takes a considerable amount of disk space.

Ergo command doesn't print anything

When I type ergo, no error and no output are returned and the exit code remains 0. I tried putting a Python debugger line inside the ergo script installed under bin, but the debugger doesn't start.

I use Python 3.7.3 installed with pyenv. I think the problem is on the pyenv side, but I posted the issue here because this is the first time I have had this problem: other packages are not affected, and I can run their binaries normally.

Parsing issue in ergo core

Hello,
Seems like there's a parsing issue in serialize_classification_report (utils.py).
When (as in my case) the accuracy row has no value for precision and recall, it crashes when trying to serialize to JSON. The text file is displayed properly.

I don't think counting spaces is a good idea; however, it could be a temporary fix.

image

Convert model to tflite

Hello,
I'm not sure it's an issue, probably just a question.
I'm trying to convert my model to tflite format to be used on clients (still the one from pe-av)

I've converted it to TensorFlow using ergo to-tf.
Then I'm using tflite_convert (with the input/output arrays obtained from a script found here and there, extracted from the .pb file), but I get an error:

image

I'm wondering, is that TF .pb format compatible with tflite?
How do I know for sure that I'm using the correct input/output arrays? Is there a way to extract them from the ergo model?

Error while train the model

Apologies to bother again. Using the provided dataset from your blog, but getting the below error-stack. What am I doing wrong?

(ergoproject) remnux@remnux:~$ ergo train malware --dataset dataset.csv
[2019-05-30 09:48:29,475] (INFO) loading project /home/remnux/malware ...
[2019-05-30 09:48:29,476] (INFO) building model for training ...
[2019-05-30 09:48:29,496] (WARNING) From /home/remnux/ergoproject/lib/python3.4/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
[2019-05-30 09:48:29,507] (WARNING) From /home/remnux/ergoproject/lib/python3.4/site-packages/keras/backend/tensorflow_backend.py:3445: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use rate instead of keep_prob. Rate should be set to rate = 1 - keep_prob.
[2019-05-30 09:48:29,550] (INFO) preparing data from dataset.csv ...
[2019-05-30 09:48:45,318] (INFO) data shape: (199970, 487)
[2019-05-30 09:48:45,985] (CRITICAL) 'str' object has no attribute 'shape'
[2019-05-30 09:48:45,985] (ERROR)

Traceback (most recent call last):
  File "/home/remnux/ergoproject/bin/ergo", line 81, in main
    ACTIONS[action].cb(argc - 2, sys.argv[2:])
  File "/home/remnux/ergoproject/lib/python3.4/site-packages/ergo/actions/train.py", line 62, in action_train
    prj.prepare(args.dataset, args.test, args.validation, not args.no_shuffle)
  File "/home/remnux/ergoproject/lib/python3.4/site-packages/ergo/project.py", line 143, in prepare
    return self.dataset.source(data, p_test, p_val, shuffle)
  File "/home/remnux/ergoproject/lib/python3.4/site-packages/ergo/dataset.py", line 113, in source
    log.info("detected non scalar input: %s", x.shape)
AttributeError: 'str' object has no attribute 'shape'

ModuleNotFoundError: No module named 'keras.utils.training_utils'

tried running

  1. ergo help

and

  1. ergo example

and the following error occurs:
Traceback (most recent call last):
  File "/home/hg/anaconda3/bin/ergo", line 22, in <module>
    from ergo.actions.create import action_create
  File "/home/hg/anaconda3/lib/python3.7/site-packages/ergo/actions/create.py", line 6, in <module>
    from ergo.project import Project
  File "/home/hg/anaconda3/lib/python3.7/site-packages/ergo/project.py", line 13, in <module>
    from keras.utils.training_utils import multi_gpu_model
ModuleNotFoundError: No module named 'keras.utils.training_utils'

any clues?
