Coder Social home page Coder Social logo

bjornbredesen / gnocis Goto Github PK

View Code? Open in Web Editor NEW
2.0 2.0 1.0 63.68 MB

Gnocis is a system for the interactive and reproductible analysis and modelling of cis-regulatory DNA sequences in Python 3

License: MIT License

Makefile 0.02% Python 2.83% Jupyter Notebook 90.31% Cython 6.84%
bioinformatics library python regulatory-elements machine-learning analysis neural-networks-and-deep-learning cres cython prediction

gnocis's Introduction

Gnocis

Gnocis

Bjørn André Bredesen-Aa, 2018-

Stability


About

Large parts of complex genomes of multi-cellular organisms are non-coding. Cis-regulatory elements (CREs) are non-coding sequences that establish or modify gene transcription by multiple mechanisms. Multiple classes of CREs have been identified, including promoters, enhancers, silencers, insulators and Polycomb/Trithorax Response Elements. CREs can be identified experimentally or by means of in silico prediction. Experimental identification of CREs can depend on the cells that are used. Genome-wide in silico prediction, on the other hand, can potentially comprehensively predict CREs in a genome. In order to use machine learning for CRE prediction, a variety of functionality is required. A variety of packages exist for Python 3 for machine learning and sequence analysis, but successfully combining them requires the implementation of interfacing between them. Ensuring that the solution is efficient is important for large genomes, but can be challenging for end-users.

Gnocis is a system in Python 3 for the interactive and reproducible analysis and modelling of CRE DNA sequences. A broad suite of tools is implemented for data preparation, feature set definition, model formulation, training, cross-validation and genome-wide prediction. Gnocis employs Cython and a variety of techniques in order to optimally implement the glue necessary in order to apply machine learning for CRE analysis and prediction.


Installing

The recommended way to install Gnocis is through the PyPI package manager. In order to install via the PyPI package manager, open a terminal and execute:

pip install gnocis

Alternatively, Gnocis can be built from source. To build a wheel and install it, run:

make wheel
pip install dist/*.whl

Finally, Gnocis can be used by building from source and including the entire gnocis directory in the source tree. In order to do so, run

make all

Installing dependencies for tutorial

sudo apt-get install python3-sphinx
sudo pip3 install notebook pandas numpy matplotlib cupy sklearn tensorflow

Documentation

For the complete manual, see: https://bjornbredesen.github.io/gnocis/

For an in-depth tutorial, see the Jupyter Notebooks in the tutorial/ folder.


Features

  • DNA sequence handling
    • File format support - Loading and streaming
      • FASTA
      • 2bit
    • File format support - Saving
      • FASTA
    • Operations
      • Printing
      • Sliding window extraction
      • Reverse complement generation
  • Sequence region handling
    • File format support - Loading and saving
      • GFF
      • BED
      • Coordinate lists (chromosome:start..end)
    • Operations
      • Overlap acquisition
      • Non-overlap acquisition
      • Merged set generation
      • Exclusion set generation
      • Sequence region extraction
  • Modelling
    • Generative DNA sequence models, with training and sequence generation
      • I.i.d.
      • N'th order Markov chains
    • Confusion matrices
      • Generation from model statistics
      • Printing
      • Receiver Operating Characteristic curve generation
      • Precision Recall Curve generation
      • Area Under the Curve calculation
    • Feature models
      • Log-odds
      • Dummy
      • Support Vector Machines (via sklearn)
      • Random Forest (via sklearn)
    • Features
      • k-mer spectrum
      • Motif occurrence spectrum
      • Motif pair occurrence spectrum
  • Motifs
    • Types
      • IUPAC nucleotide motifs
      • Position Weight Matrices
      • k-mer spectra
  • Feature networks
    • Directed acyclic graphs of features
    • Transformations of feature sets: filtering; concatenation; scaling; square; ...
    • Feature network nodes for constructing models
    • Application to sequences
  • Optionally integrates with established packages
    • Numpy – for integration with external methods
    • Pandas – for integration with external methods
    • Scikit-learn – for extended analyses and classic machine learning
    • TensorFlow – for neural networks
    • Jupyter Notebooks – for interactive and reproducible analysis and modelling
  • Easy to use
  • Objects are represented by classes, with human-readable descriptions
  • Optimized with Cython
  • ...

Requirements

  • Python 3.6, 3.7, 3.8, 3.9
  • Windows, MacOS or Linux
  • C++ compiler when installing on Linux
  • Optional: Cython – required only when building from source
  • Optional: sklearn – for SVM and RF modelling
  • Optional: CuPy and CUDA – for CUDA-optimized SVM
  • Optional: TensorFlow – for neural networks

Citing

If you use Gnocis in published research, please cite: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0274338.


License

MIT License

Copyright (c) 2018- Bjørn André Bredesen-Aa

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Logo

Logo: Copyright Bjørn André Bredesen-Aa

gnocis's People

Contributors

bjornbredesen avatar

Stargazers

 avatar  avatar

Watchers

 avatar  avatar

Forkers

cmanco

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.