
dizak / prwlr


Integrating genetic interactions networks and phylogenetic profiles.

Home Page: https://dizak.github.io/prwlr

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%
genetic-interactions phylogenetic-profiles python

prwlr's Introduction


prwlr

prwlr (profiles crawler) integrates Genetic Interactions and Phylogenetic Profiles. prwlr aims to be a self-contained (as far as possible) and easy-to-use Python library.

prwlr's Issues

prwlr.read_network

The highest-level convenience method prwlr.read_network should be implemented in prwlr.core so that the whole network can be re-read without losing prwlr.profiles.Profile objects.

Highest level functions in __init__.py

e7525d8

Proposed feature

High-level functions should be brought into the __init__.py file. For instance, a single function should be used for getting the profiles (without databases.KEGG initialization). That would make the module easier to use and the imports shorter.

The functions can be:

  • profilize_organism - get ORF-Profile Dataframe.

  • read_sga - read an existing Costanzo SGA dataset, v1 or v2 (or v3 in the future).

  • merge - merge the SGA with the profilized organism. Just pandas.merge with predefined name suffixes and merge_on.

  • calculate_pss - calculate PSS easily, without having to use pandas apply.

Each function should check whether the columns are properly named.
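A minimal sketch of the column check each high-level function could run before doing any work. The helper name check_columns and its error message are hypothetical; the only assumption is that the input exposes its column names as an iterable, as pandas.DataFrame.columns does.

```python
def check_columns(columns, required, label="dataframe"):
    """Raise ValueError if any of the required column names is missing.

    `columns` is any iterable of column names, e.g. pandas.DataFrame.columns.
    """
    present = set(columns)
    missing = [name for name in required if name not in present]
    if missing:
        raise ValueError(
            "{} is missing required column(s): {}".format(label, ", ".join(missing))
        )
    return True
```

For example, a hypothetical merge wrapper could start with check_columns(profiles.columns, ["ORF_ID"], label="profiles") and fail early with a readable message instead of a KeyError deep inside pandas.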

Reading/saving iterable as string

A prwlr.profiles.Profile.from_string method should be implemented.

An iterable cannot be saved to a flat file as-is, but it can be saved to and read from a str with the join/split methods. Both are available on Python's built-in str and in pandas.Series.str.
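The join/split idea can be sketched with the standard library alone: each iterable is flattened into one cell, written to a CSV, and split back on reading. The separator "|" and the helper names are assumptions for illustration; the separator must never occur inside the items.

```python
import csv
import io

SEP = "|"  # assumed separator; it must never occur inside the items themselves


def iterable_to_string(items, sep=SEP):
    """Flatten an iterable into a single str that fits in one flat-file cell."""
    return sep.join(str(item) for item in items)


def string_to_list(text, sep=SEP):
    """Inverse of iterable_to_string; an empty cell becomes an empty list."""
    return text.split(sep) if text else []


# Round trip through an in-memory CSV "file" (ORF names are placeholders).
rows = [("ORF1", ["a", "b", "c"]), ("ORF2", [])]
buffer = io.StringIO()
writer = csv.writer(buffer)
for orf, items in rows:
    writer.writerow([orf, iterable_to_string(items)])

buffer.seek(0)
restored = [(orf, string_to_list(cell)) for orf, cell in csv.reader(buffer)]
print(restored == rows)  # True
```

The empty-cell guard matters because "".split("|") returns [""], not []. On a DataFrame column the same round trip is Series.str.join(SEP) before saving and Series.str.split(SEP) after reading.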

KEGG modules X-ref

976ee33

Proposed feature

The KEGG modules should be handled, probably as another prowler.apis.get_db_X_ref function. They can be fetched with e.g. http://rest.kegg.jp/link/md/K02030, which returns plain tab-separated text rather than a convoluted database entry.
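A sketch of how such a cross-reference could be fetched and parsed, assuming the response consists of lines with two tab-separated IDs (source and target). The function names are hypothetical and not part of prwlr; only parse_kegg_link is exercised offline.

```python
from urllib.request import urlopen


def parse_kegg_link(text):
    """Parse the output of a KEGG REST `link` request.

    Each non-empty line is assumed to hold two IDs separated by a tab,
    e.g. a KO ID and a module ID. Returns a list of (source, target) tuples.
    """
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        source, target = line.split("\t", 1)
        pairs.append((source, target))
    return pairs


def fetch_kegg_link(url):
    """Download and parse a link request; needs network access."""
    # e.g. url = "http://rest.kegg.jp/link/md/K02030"
    with urlopen(url) as response:
        return parse_kegg_link(response.read().decode("utf-8"))
```

Keeping parsing separate from downloading lets the parser be unit-tested on stored text, which also helps with the flaky-host problem described under the network-based tests issue.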

Costanzo_API for SGA_v2

version or commit

7d9ab00

Proposed feature

There should be a way to download Costanzo's SGA v2 data, just as there is for v1.

permute_profiles full output option

7ad31e7

Proposed feature

Returning just the PSS bins is sometimes not enough. When permuting small dataframes, there is no reason to return only that.

  • One option is to add an argument for returning the full dataframe from each iteration.

  • Another is to let the user pass a function that is applied to the permuted dataframe before the final value is returned.
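A minimal sketch combining both options, using a plain list in place of the profiles dataframe. The signature is hypothetical, not the current Stats.permute_profiles: with no function, every permuted copy is returned (full output); with a function, only its result per iteration is kept.

```python
import random


def permute_profiles(profiles, iterations, func=None, seed=None):
    """Shuffle `profiles` `iterations` times and collect one result per iteration.

    If `func` is None, the whole permuted list is returned for each iteration;
    otherwise `func(permuted)` is stored instead (e.g. a PSS-binning function).
    """
    rng = random.Random(seed)  # a seed makes the permutations reproducible
    results = []
    for _ in range(iterations):
        permuted = list(profiles)  # copy, so the input is never modified
        rng.shuffle(permuted)
        results.append(func(permuted) if func is not None else permuted)
    return results
```

Accepting a function makes the full-output argument unnecessary: the identity case is simply func=None, and anything smaller (bins, counts) is a user-supplied reduction.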

profiles.Profile comparison

Description

prowler.profiles.Profile objects are not equal even if created from the same data.

Steps to reproduce

import prowler as prwl
p1 = prwl.profiles.Profile(list('abcde'), list('abc'))
p2 = prwl.profiles.Profile(list('abcde'), list('abc'))

Expected behavior

p1 == p2

True

Actual behavior

p1 == p2

False
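Without __eq__, Python compares instances by identity, which explains the actual behavior. Below is a sketch of value-based equality on a hypothetical stand-in class (not the real prwlr.profiles.Profile), assuming two profiles are equal when they were built from the same reference and query organisms.

```python
class Profile(object):
    """Hypothetical stand-in for prwlr.profiles.Profile with value-based equality."""

    def __init__(self, reference, query):
        self.reference = list(reference)
        self.query = list(query)

    def _key(self):
        # Tuples are hashable and compare element by element.
        return (tuple(self.reference), tuple(self.query))

    def __eq__(self, other):
        if not isinstance(other, Profile):
            return NotImplemented
        return self._key() == other._key()

    def __ne__(self, other):
        # Python 3 derives != from __eq__; Python 2.7 needs it spelled out.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Equal objects must hash equally; assumes profiles are not mutated.
        return hash(self._key())
```

Defining __hash__ alongside __eq__ keeps profiles usable as dict keys and in sets; on Python 3 a class that defines only __eq__ becomes unhashable.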

Conditional network-based tests skipping

Network-based tests that require an external host to be available should be skipped unless the host is pingable. Currently, the build fails because Costanzo's supplement sites are temporarily down.
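A sketch of conditional skipping with the standard unittest module. ICMP ping usually needs elevated privileges, so this swaps in a TCP connection attempt as the reachability check; the host name and class name are placeholders, not the real supplement server or prwlr test classes.

```python
import socket
import unittest

# Hypothetical host name: replace it with the server hosting the supplement files.
SUPPLEMENT_HOST = "example.org"


def host_reachable(host, port=80, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # DNS failure, refused connection or timeout
        return False


class DownloadTests(unittest.TestCase):
    def setUp(self):
        # Checked when each test runs, not at import time, so merely
        # collecting the suite never touches the network.
        if not host_reachable(SUPPLEMENT_HOST):
            self.skipTest("{} is not reachable".format(SUPPLEMENT_HOST))

    def test_download(self):
        pass  # network-dependent assertions would go here
```

A skipped test is reported as such instead of failing the build, so an outage on the remote side no longer looks like a regression in prwlr.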

Build fails except for python=3.4

Since #93, the build fails for every Python version except 3.4. The failures occur in the apis unit tests, although this part of the code was not changed at all. All the tests pass locally.
The following possible causes should be checked:

  • dependency versions. The dependency versions should be replicated from the local environment.

apis.get_KOs_db_X_ref KeyError

884e84c

Description

apis.get_KOs_db_X_ref does not work when used for live download.

Steps to reproduce

  1. Call parse_organism_info that uses apis.get_KOs_db_X_ref without IDs, X_ref or KOs or set to

Expected behavior

It should download temporary files and convert them to pandas.DataFrames, then merge them, drop whatever should be dropped and return a proper pandas.DataFrame.

Actual behavior

It throws a KeyError:

    208         """
    209         def f(i):
--> 210             print("{i} ".format(), flush=True, end='\r')
    211             res = rq.get('{}/{}/{}/{}'.format(
    212             self.home,

KeyError: 'i'
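The traceback points at the cause: "{i} ".format() contains the named replacement field {i} but no keyword argument i is passed, and str.format raises KeyError for a missing named field. A minimal sketch of the corrected call; the helper names are hypothetical (the original code uses an inner function f).

```python
def progress_message(i):
    # "{i} ".format() raises KeyError: 'i' because the named field has no value;
    # passing i=i (or using "{} ".format(i)) supplies it.
    return "{i} ".format(i=i)


def report_progress(i):
    # flush= in print() requires Python 3.3 or newer.
    print(progress_message(i), flush=True, end="\r")
```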

pip package

version or commit

5f0b776

Proposed feature

A pip package should be prepared.

  • adding LICENSE.txt.
  • adding requirements.txt - dependencies list for building the venv with pip but also reading it in the setup.py.
  • testing the venv built with pip and requirements.txt.
  • adding MANIFEST.in with a list of files to exclude from the pip build, such as testing-related files, and to include the non-Python files that are omitted from the build by default.
  • adding setup.py - setting up the actual build.
  • adding .gitattributes for excluding unnecessary files from the GitHub release.
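For the requirements.txt item, setup.py can read the same file pip uses. A minimal sketch; read_requirements is a hypothetical helper and it only understands plain specifier lines (options such as -r or -e are skipped).

```python
def read_requirements(path="requirements.txt"):
    """Return the dependency specifiers listed in a pip requirements file.

    Blank lines, comments and pip options (lines starting with "-") are skipped.
    """
    requirements = []
    with open(path) as handle:
        for line in handle:
            line = line.split("#", 1)[0].strip()  # drop inline and full-line comments
            if line and not line.startswith("-"):
                requirements.append(line)
    return requirements
```

In setup.py the result would be passed as setuptools.setup(..., install_requires=read_requirements()), so the venv and the pip package share one dependency list.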

prwlr.save_network

The highest-level convenience method prwlr.save_network should be implemented in prwlr.core so that the whole network can be saved without losing prwlr.profiles.Profile objects.
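A sketch of the save/read pair using the standard csv module, with the network as a list of dict rows. The conversion functions are parameters, so the sketch does not assume anything about the real Profile API (a to_string/from_string pair would plug in here); function and column names are hypothetical.

```python
import csv


def save_network(rows, path, profile_columns, to_string):
    """Write dict rows to CSV, serializing the profile columns with `to_string`."""
    if not rows:
        raise ValueError("nothing to save")
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
        writer.writeheader()
        for row in rows:
            out = dict(row)  # copy, so the caller's objects stay untouched
            for column in profile_columns:
                out[column] = to_string(row[column])
            writer.writerow(out)


def read_network(path, profile_columns, from_string):
    """Read a CSV written by save_network and rebuild the profile objects."""
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    for row in rows:
        for column in profile_columns:
            row[column] = from_string(row[column])
    return rows
```

Non-profile columns come back as strings, as with any CSV. With pandas the same idea is df[col].apply(to_string) before to_csv and read_csv(..., converters={col: from_string}) when reading.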

Double testing

The command calling the unittest in the .travis.yml file is repeated.

Allow empty Stats.__init__

2912077

Proposed change

For convenience (Stats stores its results as attributes as well as returning them), it should be possible to initialize it without passing any arguments to __init__.

calculate_enrichment bad attrib

7ad31e7

Description

Cannot run stats.calculate_enrichment function.

Steps to reproduce

Pass two dataframes to stats.calculate_enrichment as described by the function __doc__

Expected behavior

stats.calculate_enrichment should return dataframe with enrichment scores.

Actual behavior

stats.calculate_enrichment gives
AttributeError: ("type object 'Columns' has no attribute '_score'", 'occurred at index 0')

Project rename

The name Prowler is already taken. The whole project must be renamed.

Switching to:

prowlr

It is available on PyPI, Anaconda Cloud and GitHub.

Before the rename is done, all the pickled files in test_data must be replaced with text-based ones so that they do not depend on the module name.
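Why pickles depend on the module name: pickle stores a reference to each object's class as module plus class name, so files pickled from the old prowler package would point at a module that no longer exists after the rename. The sketch uses fractions.Fraction as a stand-in for a prowler object and JSON as one possible text format.

```python
import json
import pickle
from fractions import Fraction

value = Fraction(1, 3)

# The pickled bytes name the defining module ("fractions" here; for objects
# defined in the old package this would be a prowler module).
pickled = pickle.dumps(value)
print(b"fractions" in pickled)  # True

# A text format stores only the data; the reading code decides which class to build.
as_text = json.dumps({"numerator": value.numerator, "denominator": value.denominator})
data = json.loads(as_text)
restored = Fraction(data["numerator"], data["denominator"])
```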

What has been done so far:

  • AnyNetworkTests
  • ApisTests
  • Bioprocesses
  • Databases
  • SelectorTests
  • SGA1Tests
  • SGA2Tests
  • StatsTests

prwlr.read_profiles

The highest-level convenience method prwlr.read_profiles should be implemented in prwlr.core, analogous to prwlr.core.read_sga. A prwlr.save_profiles method should also be available.

Travis build fails on venv

version or commit

commit b706e3e

build 149

Description

Travis build fails when creating the conda env.

Steps to reproduce

Run the build.

Expected behavior

The virtual env should be created from the yaml files.

Actual behavior

The build fails when resolving the libiconv package.

pepy badge

The PePy badge showing the number of downloads should be added to the project website.

Remove old code

version or commit

5f0b776

Proposed change

There are quite a lot of old code fragments that were rewritten and should be removed.

Selector interactions types

9d688e5

Proposed change

prowler.stats.Selector should hold proper genetic interaction names (positive DMF is not a proper one), and more interaction types should be added.

Repo structure

Proposed change

A bin directory must be added alongside the module directory. It can contain a placeholder argparse script.
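A sketch of such a placeholder script. The program name, description and the --verbose option are hypothetical; only the argparse calls are standard. Exposing main(argv) makes the CLI testable without spawning a process.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        prog="prwlr",
        description="Integrate genetic interactions networks and phylogenetic profiles.",
    )
    parser.add_argument("--version", action="version", version="%(prog)s 0.0.1")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print progress messages")
    return parser


def main(argv=None):
    # argv=None makes argparse read sys.argv[1:]; tests can pass a list instead.
    args = build_parser().parse_args(argv)
    if args.verbose:
        print("nothing to do yet")  # placeholder until real subcommands exist
    return 0
```

The script file in bin would end with the usual if __name__ == "__main__": raise SystemExit(main()) guard.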

Phylogenetic selection

commit

5f0b776

Proposed feature

  • Selecting which organisms should have a positive or negative sign in the profiles of interest.
  • Selecting which organisms should be taken into account when calculating the PSS.

It might mean that calculating PSS should be moved to Stats.
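A sketch of the first item: keeping only the profiles with a required sign pattern. It assumes a profile can be viewed as a mapping from organism name to "+" (ortholog present) or "-" (absent); this representation and the function names are illustrative, not prwlr's.

```python
def matches_pattern(profile, positive=(), negative=()):
    """True if the profile is "+" in every `positive` organism and "-" in every `negative` one.

    `profile` maps organism names to "+" or "-"; an organism missing from the
    profile never matches, because .get() returns None.
    """
    return (all(profile.get(org) == "+" for org in positive)
            and all(profile.get(org) == "-" for org in negative))


def select_profiles(profiles, positive=(), negative=()):
    """Keep only the ORFs whose profiles match the requested sign pattern."""
    return {orf: prof for orf, prof in profiles.items()
            if matches_pattern(prof, positive, negative)}
```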

Remove or merge duplicated profiles

475f16b

Proposed change

Duplicated phylogenetic profiles should be either dropped or merged.

Description

For some ORFs, KEGG Orthology gives more than one KO group ID, which produces more than one phylogenetic profile for the same ORF.

Steps to reproduce

import prowler as prwl

api = prwl.apis.KEGG_API()
api.get_organisms_ids('tmp_org_ids.csv')
api.get_org_db_X_ref('Saccharomyces cerevisiae', 'orthology', out_file_name='tmp_org_KO.csv')

Expected behavior

api.org_db_X_ref_df[api.org_db_X_ref_df.duplicated(subset=['ORF_ID'])]
ORF_ID KEGG_ID

Actual behavior

api.org_db_X_ref_df[api.org_db_X_ref_df.duplicated(subset=['ORF_ID'])]
ORF_ID KEGG_ID
YBR019C K01785
RDN37-1 K01982
RDN37-2 K01982
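Both options from the proposed change can be sketched on plain (ORF_ID, KEGG_ID) pairs. The helper names and the "|" separator are hypothetical; merging keeps every KO ID of an ORF in one cell, while dropping discards ORFs whose KO assignment is ambiguous.

```python
from collections import OrderedDict


def merge_duplicates(pairs, sep="|"):
    """Merge (ORF_ID, KEGG_ID) pairs so each ORF appears once.

    Multiple KO IDs of one ORF are joined with `sep` in first-seen order.
    """
    merged = OrderedDict()
    for orf, ko in pairs:
        kos = merged.setdefault(orf, [])
        if ko not in kos:
            kos.append(ko)
    return [(orf, sep.join(kos)) for orf, kos in merged.items()]


def drop_ambiguous(pairs, sep="|"):
    """Alternative: keep only ORFs that map to exactly one KO ID."""
    return [(orf, kos) for orf, kos in merge_duplicates(pairs, sep) if sep not in kos]
```

On the dataframe itself, pandas offers the dropping variant directly as org_db_X_ref_df.drop_duplicates(subset=['ORF_ID'], keep=False).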

Profiles-distance measure other than PSS

commit

5f0b776

Proposed feature

  • The distance between phylogenetic profiles should probably be measured differently than just with pair-wise PSS. See this paper.
  • Besides the built-in measures, Profiles.profile.calculate_pss should take the function used for distance calculation as an argument.
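A sketch of a calculate_pss that takes the measure as an argument. It assumes profiles are equal-length sequences of "+"/"-" and that PSS counts the positions where two profiles agree; both are assumptions here, and the Hamming count is just one example of an alternative measure. The optional positions argument restricts the comparison to selected organisms.

```python
def pss(profile_a, profile_b):
    """Assumed PSS: number of positions at which the two profiles agree."""
    return sum(a == b for a, b in zip(profile_a, profile_b))


def hamming(profile_a, profile_b):
    """Alternative measure: number of positions at which the profiles differ."""
    return sum(a != b for a, b in zip(profile_a, profile_b))


def calculate_pss(profile_a, profile_b, method=pss, positions=None):
    """Compare two equal-length profiles with `method`.

    `positions` (optional indices) restricts the comparison to selected organisms.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")
    if positions is not None:
        profile_a = [profile_a[i] for i in positions]
        profile_b = [profile_b[i] for i in positions]
    return method(profile_a, profile_b)
```

Any callable taking two profiles and returning a number can be passed as method, so new measures need no change to calculate_pss itself.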

vectorized calculate_pss

976ee33

Proposed change

The top-level function calculate_pss should use numpy.vectorize. It speeds things up, even though according to the docs numpy.vectorize is mainly a convenience, probably due to reduced iteration overhead.
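A sketch of applying a scalar PSS function element-wise with numpy.vectorize. As elsewhere, PSS is assumed to be the number of agreeing positions between two "+"/"-" profile strings; the arrays are illustrative.

```python
import numpy as np


def pss(profile_a, profile_b):
    """Assumed PSS: count of positions where two +/- profile strings agree."""
    return sum(a == b for a, b in zip(profile_a, profile_b))


# np.vectorize broadcasts the scalar function over array arguments;
# otypes fixes the output dtype instead of guessing it from a trial call.
vectorized_pss = np.vectorize(pss, otypes=[int])

query = np.array(["++-+", "+---", "----"])
array = np.array(["+--+", "+---", "++++"])
scores = vectorized_pss(query, array)
print(scores)  # [3 4 0]
```

On a dataframe, the two profile columns would be passed as arrays (Series.values), replacing a row-wise DataFrame.apply.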

Amend __doc__

Amend the __doc__ strings. They need to be good enough not only for the CLI but also for publishing as HTML.

  • apis.py
  • core.py
  • databases
  • errors.py
  • network.py
  • profiles.py
  • stats.py
  • utils.py

Amend KEGG database parser

commit

7d9ab00

Proposed change

Parsing KEGG databases works for KEGG Orthology but is still not very reliable. It does not really work, or works poorly, for the other KEGG databases. The main problem is probably handling multi-line fields properly in the regex.
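One way to handle multi-line fields in a single regex, assuming the KEGG flat-file layout where a field name starts at column 0 and continuation lines are indented by 12 spaces (entries end with "///"). This is a sketch, not prwlr's parser; indented sub-fields (e.g. inside REFERENCE) are not handled.

```python
import re

# A field name at line start, then its value plus any continuation lines
# that begin with the 12-space left margin. Without re.DOTALL, "." stops at
# newlines, so each continuation line must be matched explicitly.
FIELD_RE = re.compile(r"^([A-Z][A-Z_]*) +(.*(?:\n {12}.*)*)", re.MULTILINE)


def parse_kegg_entry(text):
    """Return {field name: [value lines]} for one KEGG flat-file entry."""
    fields = {}
    for match in FIELD_RE.finditer(text):
        name, body = match.group(1), match.group(2)
        lines = [line.strip() for line in body.split("\n")]
        fields.setdefault(name, []).extend(lines)
    return fields
```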

Complete unittests

commit

7d9ab00

Proposed change

The aim is to test as much as possible. Although test coverage will increase once the legacy code is finally removed, there are still methods with no tests at all.

At the moment, nosetests --with-coverage --cover-package prowler gives:

Name                      Stmts   Miss  Cover
---------------------------------------------
prowler.py                    4      0   100%
prowler/apis.py             126     58    54%
prowler/databases.py        186     72    61%
prowler/errors.py             8      0   100%
prowler/genome.py           198    183     8%
prowler/interactions.py     134    124     7%
prowler/network.py           34     22    35%
prowler/profiles.py          64     11    83%
prowler/stats.py            367    278    24%
prowler/templater.py         26     17    35%
prowler/utils.py             45     33    27%
---------------------------------------------
TOTAL                      1192    798    33%
----------------------------------------------------------------------
Ran 28 tests in 1.830s

OK

Convert to version-agnostic or python3

2912077

Proposed change

Being stuck with Python 2.7 is simply a shame. Version-agnostic code would be best, as Python 2.7 is still in use; if that turns out to be too difficult to maintain, Python 3 should be the way to go.
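A few common version-agnostic patterns, sketched without third-party packages. The names PY2, string_types and print_progress are illustrative; the six library provides ready-made equivalents (six.PY2, six.string_types, six.moves.urllib).

```python
import sys

PY2 = sys.version_info[0] == 2

try:  # Python 3 location
    from urllib.request import urlopen
except ImportError:  # Python 2.7 location
    from urllib2 import urlopen

# basestring exists only on 2.7; the conditional keeps Python 3 from evaluating it.
string_types = (basestring,) if PY2 else (str,)  # noqa: F821


def print_progress(message):
    # print(..., flush=True) is Python 3.3+ only; writing and flushing stdout
    # explicitly behaves the same on 2.7 and 3.x.
    sys.stdout.write("{} \r".format(message))
    sys.stdout.flush()
```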

Make the network dataframes as small as possible

846797a

Proposed change

All the columns in the KEGG Orthology dataframe besides ORF_ID and ENTRY seem redundant. Not parsing them should be the default. Stripping down the final interactions dataframe halves the time of each permutation (during the brute-force permutation test).

v0.0.1

The first version to be published on PyPI will be v0.0.1.

Update __doc__

commit

de66cb4

Proposed change

A number of docstrings refer to the old code, e.g. the doc of databases.SGA1 mentions the non-existent Ortho_Interactions.interact_df, the precursor of the current class.

permutation test

2912077

Proposed change

  • Stats._permute_profiles should be cleaned up: drop_dups is not needed, and it uses .size instead of __len__.
  • Multiprocessing in Stats.permute_profiles should finally be made functional.

project's page update

172131a

Proposed change

The project's page still holds the code examples from before #58 was closed. Things to do:

  • Update content.

  • Change theme to cayman.
