data_hacking

Welcome to the Click Security Data Hacking Project

"Hacking in the sense of deconstructing an idea, hardware, anything and getting it to do something it wasn’t intended or to better understand how something works." (BSides CFP)

So hacking here means we want to quickly deconstruct data, understand what we've got and how to best utilize it for the problem at hand.

The primary motivation for these exercises is to explore the nexus of IPython, Pandas and Scikit Learn on security data of various kinds. The exercises will often intentionally show common missteps, warts in the data, paths that didn't work out that well and results that could definitely be improved upon. In general we're trying to capture what worked and what didn't; not only is that more realistic, but it's often much more informative to the reader. :)

Python Modules Used:

  • IPython: Architecture for interactive computing and presentation
  • Pandas: Python Data Analysis Library
  • Scikit Learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.
  • Matplotlib: Python 2D plotting library

Exercises:

Friendly Notebooks

Setup:

  • Required packages:

    • Brew/apt-get
      • graphviz, freetype, zmq
    • Python
      • ipython, pygraphviz, pandas, matplotlib, networkx, pyzmq, jinja2, scipy, patsy, statsmodels, pefile, macholib
  • Some of the exercises use packages from the data_hacking repository. To install those packages into your Python site-packages, run this from the root of the repository:

     %> sudo python setup.py install
  
  • To uninstall:
     %> sudo pip uninstall data_hacking
  
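Before opening the notebooks it can help to confirm that the Python packages listed above are actually importable. The following is a minimal sketch, not part of the data_hacking repository: the helper name `missing_modules` is hypothetical, and it assumes Python 3.4 or later (for `importlib.util.find_spec`). The import names mirror the package list, with the two known differences that ipython imports as `IPython` and pyzmq imports as `zmq`.

```python
import importlib.util

# Import names for the Python packages listed above. Two differ from the
# package name: ipython imports as IPython and pyzmq imports as zmq.
REQUIRED_MODULES = [
    "IPython", "pygraphviz", "pandas", "matplotlib", "networkx", "zmq",
    "jinja2", "scipy", "patsy", "statsmodels", "pefile", "macholib",
]


def missing_modules(names):
    """Return the names in `names` that cannot be located for import."""
    missing = []
    for name in names:
        try:
            # find_spec locates a top-level module without importing it.
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules were found.")
```

The check only tells you whether a module can be found; it does not verify versions or the Brew/apt-get libraries (graphviz, freetype, zmq) that some of these packages build against.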

Install IPython:

There are plenty of Google results for this, and we actually have mixed feelings about the install instructions on the IPython page. The directions work, but they direct you to download and install Anaconda or the free edition of Enthought Canopy. Both of these are prepackaged Python distributions with a bunch of stuff like NumPy, SciPy, IPython, Matplotlib, Pandas, and so on. Occasionally these will have a hitch, and then you might be a bit SOL because StackOverflow is going to say 'WTF are those things? Just do "$ pip install blah" or "$ brew install blah"'.

So we recommend you be brave and do it the normal way. In particular, this guy seems to have a pretty good write-up for Mac installs:

Running the Notebooks:

Most of the notebooks will have relative paths to some resources, data files or images. In general the easiest way we found to run ipython on the notebooks is to change into that project directory and run ipython with this alias (put in your .bashrc or whatever):

alias ipython='ipython notebook --FileNotebookManager.notebook_dir=`pwd`'

$ cd data_hacking/fun_with_syslog
$ ipython   # as aliased above
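Because the notebooks rely on relative paths, the working directory matters just as much if you lift code out of a notebook and run it from a plain Python script. The following is a minimal sketch of the same "change into the project directory first" idea; the `working_directory` helper is hypothetical and not part of the data_hacking repository.

```python
import contextlib
import os


@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the current working directory to `path`."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Always restore the original directory, even if the body raises.
        os.chdir(previous)
```

For example, wrapping notebook-derived code in `with working_directory("data_hacking/fun_with_syslog"):` would let its relative references to data files or images resolve the same way they do when IPython is started from that directory.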


Contributors:

brifordwylie, sooshie, ogrisel, xiuxiuxiaodi
