Coder Social home page Coder Social logo

imnet's Introduction

imNet: a Sequence Network Construction Toolkit

imNet is a software package for the generation and analysis of large-scale immunological and biological sequence networks. Used together with Apache Spark running on a computer cluster, imNet can be used to analyze network properties from samples of hundreds of thousands or even millions of sequences in a reasonable amount of time.

Installation

pip

The simplest way to install imnet is with pip:

$ pip install imnet

In addition to installing the imnet python library and its dependencies, this will also install the imnet-analyze script into your python bin directory.

From source

Make sure you have all the dependencies installed -- see Dependencies below.

Clone the repository and install:

$ git clone https://github.com/rokroskar/imnet.git
$ cd imnet
$ python setup.py install

If you make changes to the cython code, you will need cython and a usable C compiler.

Dependencies

You only need to worry about installing these by hand if you are installing from source or want to do development. If you just want to run imnet, installing it with pip (see above) should take care of the dependencies for you.

basic python libraries

These should all be installable via pip or conda:

  • click
  • findspark
  • python-Levenshtein
  • scipy
  • networkx
  • pandas
  • cython (optional)

Spark

If your goal is to analyze large samples (> 10k strings) then distributing the computation is strongly advised. imnet currently does this using the Apache Spark distributed computation framework. We won't go into the details of installing and running spark here; you can download it, and unpack the archive somewhere. At the very minimum, you need to set the SPARK_HOME environment variable to point to the directory where you unpacked spark

$ export SPARK_HOME=/path/to/spark

Running imnet with Apache Spark

If you are running spark on a cloud resource, please refer to the [official spark documentation] for instructions on how to start up a spark cluster. To allow imnet to run via spark you will need to provide the spark URL of the spark-master.

If your resource is an academic HPC (high-perfomance computing) cluster, we recommend that you use sparkhpc for managing spark clusters. sparkhpc greatly simplifies spawning and managing spark clusters.

Basic usage

Refer to the command-line help for usage:

$ imnet-analyze --help
Usage: imnet-analyze [OPTIONS] COMMAND [ARGS]...

Options:
  --spark-config TEXT         Spark configuration directory
  --spark-master TEXT         Spark master
  --kind [graph|degrees|all]  Which kind of output to produce
  --outdir TEXT               Output directory
  --min-ld INTEGER            Minimum Levenshtein distance
  --max-ld INTEGER            Maximum Levenshtein distance
  --sc-cutoff INTEGER         For number of strings below this, Spark will not
                              be used
  --spark / --no-spark        Whether to use Spark or not
  --help                      Show this message and exit.

Commands:
  benchmark  Run a series of benchmarks for graph and...
  directory  Process a directory of CDR3 string files
  file       Process an individual file with CDR3 strings...
  random     Run analysis on a randomly generated set of...

$ imnet-analyze random --help
Usage: imnet-analyze random [OPTIONS]

  Run analysis on a randomly generated set of strings for testing

Options:
  --nstrings INTEGER    Number of strings to generate
  --min-length INTEGER  minimum number of characters per string
  --max-length INTEGER  maximum number of characters per string
  --help                Show this message and exit.

For an example of using the imnet python library, have a look at the example notebook.

imnet's People

Contributors

enkelejdamiho avatar rokroskar avatar uweschmitt avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.