sonia's Introduction

SONIA

This package is not actively maintained. Please refer to the most up-to-date SoNNia package, which also includes the SONIA models: https://github.com/statbiophys/soNNia

SONIA is a Python 3.6/2.7 package for inferring selection pressures on features of amino acid CDR3 sequences. The inference maximizes the likelihood of observing a selected data sample given a representative pre-selected sample. This method was first used in Elhanati et al. (2014) to study thymic selection. For this purpose, the pre-selected sample can be generated internally using the OLGA software package, but SONIA also allows it to be supplied externally, in the same way the data sample is provided.

SONIA takes as input TCR CDR3 amino acid sequences, with or without per-sequence lists of the V and J genes that may have been used in the recombination process for that sequence. Its output is a set of selection factors: one for each combination of amino acid, (relative) position, and CDR3 length, and one for each V and J gene choice. These selection factors can be used to calculate sequence-level selection factors, which indicate how much more or less represented a sequence is in the selected pool compared to the pre-selected pool. These in turn can be used to calculate the probability of observing any sequence after selection and to sample from the selected repertoire.
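As a rough illustration of how per-feature selection factors could combine into a sequence-level factor, here is a minimal sketch. It assumes the sequence-level factor is the product of the factors of the features present in the sequence, divided by a normalization constant, and that the post-selection probability is that factor times the pre-selection probability. The feature encoding, the gene labels `V1` and `J1`, the numeric values, and the function names are all hypothetical and are not taken from the SONIA API.

```python
import math

# Hypothetical per-feature selection factors (illustrative values, not SONIA output).
# Amino acid features are keyed by (amino acid, position, CDR3 length);
# gene features are keyed by ("V", name) or ("J", name).
feature_q = {
    ("C", 0, 4): 1.2,
    ("A", 1, 4): 0.8,
    ("V", "V1"): 1.5,
    ("J", "J1"): 0.5,
}

def sequence_features(cdr3, v_gene, j_gene):
    """List the features present in one sequence (hypothetical encoding)."""
    length = len(cdr3)
    feats = [(aa, pos, length) for pos, aa in enumerate(cdr3)]
    feats.append(("V", v_gene))
    feats.append(("J", j_gene))
    return feats

def selection_factor(cdr3, v_gene, j_gene, factors, norm=1.0):
    """Product of per-feature factors; features without an entry count as 1.0."""
    log_q = sum(
        math.log(factors.get(f, 1.0))
        for f in sequence_features(cdr3, v_gene, j_gene)
    )
    return math.exp(log_q) / norm

def post_selection_probability(q_seq, p_pre):
    """Assumed relation: probability after selection = factor * pre-selection probability."""
    return q_seq * p_pre

q_seq = selection_factor("CASS", "V1", "J1", feature_q)
p_post = post_selection_probability(q_seq, p_pre=1e-9)  # p_pre is a made-up value
```

Summing logarithms instead of multiplying directly keeps the computation stable when a sequence has many features with small factors.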

Version

Latest released version: 0.2.2

Installation

SONIA is Python 2.7/3.6 software. It is available on PyPI and can be downloaded and installed through pip:

pip install sonia

SONIA is also available on GitHub.

pip sometimes fails to install the dependencies correctly. If you get an error, first try installing the dependencies separately:

pip install tensorflow
pip install matplotlib
pip install olga
pip install sonia 

For Mac users on newer Metal-capable devices, make sure to install the additional dependencies. Currently, the configuration tensorflow-macos==2.9 and tensorflow-metal==0.5.0 should work.

References

  1. Sethna Z, Isacchini G, Dupic T, Mora T, Walczak AM, Elhanati Y, Population variability in the generation and thymic selection of T-cell repertoires, (2020) bioRxiv, https://doi.org/10.1101/2020.01.08.899682
  2. Isacchini G, Sethna Z, Elhanati Y, Nourmohammad A, Mora T, Walczak AM, Generative models of T-cell receptor sequences, (2020) Phys. Rev. E 101, 062414, https://journals.aps.org/pre/abstract/10.1103/PhysRevE.101.062414
  3. Elhanati Y, Murugan A, Callan CGJ, Mora T, Walczak AM, Quantifying selection in immune receptor repertoires, (2014) PNAS 111 (27) 9875-9880, https://doi.org/10.1073/pnas.1409572111

Documentation

Extensive documentation can be found here

Contact

Any issues or questions should be addressed to us.

License

Free use of SONIA is granted under the terms of the GNU General Public License version 3 (GPLv3).

sonia's Issues

Main task list

  • Stream code of update_model
  • Save/Load to include features (Binary?)

Sequence generation failing on Windows machine

Hi, I was exploring your modules for sequence generation and came across an issue in the sequence_generation.py and sonia.py scripts. I was getting the following error:

ValueError: high is out of bounds for int32

for the part of the code that generates the seed values:

seeds=np.random.randint(2**32-1,size=num_gen_seqs)

Since no dtype is provided to np.random.randint, numpy uses its default integer type, which is platform dependent. On 64-bit Windows machines this will be int32 (you can read about it here: https://stackoverflow.com/questions/36278590/numpy-array-dtype-is-coming-as-int32-by-default-in-a-windows-10-64-bit-machine). Since that is a signed 32-bit integer, the value 2**32-1 is out of bounds. You can easily fix it by specifying the dtype when generating the seeds, for example:
seeds=np.random.randint(2**32-1,size=num_seqs,dtype=np.int64)
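The fix described above can be checked in isolation with a short script. This is a minimal sketch: the helper name `make_seeds` is hypothetical and is not part of SONIA; only `np.random.randint` with an explicit `dtype` comes from the report.

```python
import numpy as np

def make_seeds(num_gen_seqs):
    """Draw one seed per sequence from [0, 2**32 - 1).

    Passing dtype=np.int64 avoids relying on the platform default
    integer, which the report above identifies as int32 on 64-bit Windows.
    """
    return np.random.randint(2**32 - 1, size=num_gen_seqs, dtype=np.int64)

seeds = make_seeds(5)
```

With the dtype given explicitly, the upper bound fits in the requested integer type regardless of the platform default.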
