Build fails except for python=3.4

Since #93 the build fails for every Python version except 3.4. The failures occur in the apis unit tests, although this part of the code was not changed at all. All the tests pass locally.
The following possible causes should be checked:

Dependency versions. The dependency versions should be replicated from the local environment.

v0.0.1

The first version published on PyPI will be v0.0.1.

pip package

version or commit

5f0b776

Proposed feature

pip package should be prepared.

adding LICENSE.txt.
adding requirements.txt - the dependency list for building the venv with pip, also read by setup.py.
testing the venv built with pip and requirements.txt.
adding MANIFEST.in with a list of files to exclude from the pip build, like testing-related files, and including the non-python files which are omitted from the build by default.
adding setup.py - setting up the actual build.
adding .gitattributes for excluding unnecessary files from the github release.
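The requirements.txt item above could be handled roughly as follows. This is only a sketch: parse_requirements is a hypothetical helper, and the sample content (package names and the version pin) is illustrative, not the project's actual dependency list.

```python
def parse_requirements(text):
    """Return requirement specifiers from the text of a requirements file."""
    requirements = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and pip options such as "-r other.txt".
        if not line or line.startswith(("#", "-")):
            continue
        requirements.append(line)
    return requirements


# Illustrative content only; the real file may list different packages.
sample = """\
# runtime dependencies
numpy
pandas>=1.0

-r other.txt
"""

install_requires = parse_requirements(sample)

# In setup.py the same list could be passed on, for example:
#   with open("requirements.txt") as handle:
#       setup(..., install_requires=parse_requirements(handle.read()))
```

Keeping the list in one file means the venv and the built package cannot drift apart.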

Complete unittests

commit

7d9ab00

Proposed change

The aim is to test as much as possible. Although the test coverage will increase once the legacy code is finally removed, there are still methods with no tests at all.

At the moment, nosetests --with-coverage --cover-package prowler gives:

Name                      Stmts   Miss  Cover
---------------------------------------------
prowler.py                    4      0   100%
prowler/apis.py             126     58    54%
prowler/databases.py        186     72    61%
prowler/errors.py             8      0   100%
prowler/genome.py           198    183     8%
prowler/interactions.py     134    124     7%
prowler/network.py           34     22    35%
prowler/profiles.py          64     11    83%
prowler/stats.py            367    278    24%
prowler/templater.py         26     17    35%
prowler/utils.py             45     33    27%
---------------------------------------------
TOTAL                      1192    798    33%
----------------------------------------------------------------------
Ran 28 tests in 1.830s

OK

pepy badge

The PePy badge showing the number of downloads should be added to the project website.

permute_profiles full output option

7ad31e7

Proposed feature

Returning just PSS bins is sometimes too little. When permuting small dataframes, there is no reason for returning just that.

One option is to add an arg for returning the full dataframe from each iteration.
Another one is to let the user pass a function that would be applied to the permuted dataframe before returning the final value.
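Both options can be covered by a single optional callable. The sketch below uses plain lists instead of dataframes to stay dependency-free; the function name, its parameters (n_iter, summarize, seed) and the default behaviour are assumptions for illustration, not the existing prowler API.

```python
import random


def permute_profiles(profiles, n_iter, summarize=None, seed=None):
    """Shuffle profiles n_iter times and collect one result per iteration.

    With summarize=None the full permuted list is kept; otherwise the
    user-supplied callable is applied to each permuted list first.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_iter):
        permuted = list(profiles)  # copy, so the input is left untouched
        rng.shuffle(permuted)
        results.append(permuted if summarize is None else summarize(permuted))
    return results


full_output = permute_profiles(["a", "b", "c"], n_iter=3, seed=0)
summaries = permute_profiles(["a", "b", "c"], n_iter=3, summarize=len, seed=0)
```

Returning the full permutation is only sensible for small inputs; for large ones the callable keeps memory use bounded.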

species selectors opts

bc0dba2

Proposed change

An equivalent of any should be incorporated into the species selectors.

Update doc

commit

de66cb4

Proposed change

There is a bunch of docs that refer to the old code, e.g. the doc from databases.SGA1 contains info about the non-existing Ortho_Interactions.interact_df, which was the precursor of the current class.

project's page update

172131a

Proposed change

The project's page still holds the code examples from before #58 was closed. Things to do:

Update content.
Change theme to cayman.

permutation test

2912077

Proposed change

Stats._permute_profiles should be cleaned up, as drop_dups is not needed and it uses .size instead of __len__.
multiprocessing in Stats.permute_profiles should finally be functional.

easier databases.parse_organism_info

ee6c460

Proposed change

databases.parse_organism_info should be usable without requiring any awareness of apis.

contribution guidelines

commit

5f0b776

Proposed feature

Contribution guidelines should be prepared.

Remove old code

version or commit

5f0b776

Proposed change

There are quite a lot of old pieces of code that were rewritten and should be removed.

Phylogenetic selection

commit

5f0b776

Proposed feature

Selecting which organisms should have a positive or negative sign in the profiles of interest.
Selecting which organisms should be accounted for when calculating the PSS.

It might mean that calculating the PSS should be moved to Stats.

Allow empty Stats.__init__

2912077

Proposed change

For convenient use of Stats, which works through attributes as well as return values, it should initialize without any args passed to __init__.
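A minimal sketch of what this could look like, assuming every constructor argument gets a None default. The class below is a hypothetical stand-in, and the attribute and method names (dataframe, load) are made up for illustration.

```python
class Stats(object):
    """Hypothetical stand-in for prowler.stats.Stats."""

    def __init__(self, dataframe=None):
        # Nothing is required up front; data can be attached later.
        self.dataframe = dataframe

    def load(self, dataframe):
        """Attach data after construction and return self for chaining."""
        self.dataframe = dataframe
        return self


empty = Stats()                     # no arguments needed
filled = Stats().load([1, 2, 3])    # data supplied afterwards
```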

simple project website

a3e4746

Proposed feature

The prowler project should have its small, simple landing page at github pages.

Make the network dataframes as small as possible

846797a

Proposed change

All the columns in the KEGG Orthology dataframe besides ORF_ID and ENTRY seem redundant. Not parsing them should be the default action. Stripping down the final interactions dataframe halves the time of each permutation (during the brute-force permutation test).

Highest level functions in __init__.py

e7525d8

Proposed feature

High-level functions should be brought to the __init__.py file. For instance, one function should be used for getting the profiles (without databases.KEGG initialization). That would make the module easier to use and make the import shorter.

The functions can be:

profilize_organism - get ORF-Profile Dataframe.
read_sga - read an existing Costanzo sga, v1 or v2 (or v3 in the future).
merge - merge sga with a profilized organism. Just pandas.merge with predefined name suffixes and merge_on.
calculate_pss - easily calculate pss without the need to use apply from pandas.

Each function should evaluate whether the columns are properly named.
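The column check could be shared by all of these functions. The helper below is a hypothetical sketch; it accepts any container of column names, so a list or a dataframe's columns attribute would both work. The column names in the usage lines are taken from elsewhere in these issues and may not match what each function really needs.

```python
def require_columns(columns, required):
    """Raise ValueError if any required column name is absent."""
    missing = [name for name in required if name not in columns]
    if missing:
        raise ValueError(
            "Missing expected columns: {}".format(", ".join(missing))
        )


# Passes silently when everything is present.
require_columns(["ORF_ID", "KEGG_ID"], required=["ORF_ID"])

# Fails early, with a readable message, when a column is misnamed.
try:
    require_columns(["ORF", "KEGG_ID"], required=["ORF_ID", "KEGG_ID"])
except ValueError as error:
    message = str(error)
```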

setup.py with scripts

Proposed change

The setup.py file should detect the prwlr CLI script.

wiki documentation

commit

5f0b776

Proposed feature

Wiki pages should be prepared.

demo jupyter notebook

commit

5f0b776

Proposed feature

A jupyter notebook with a demo of the prowler capabilities should be prepared on binder and presented alongside the wiki.

Travis build fails on venv

version or commit

commit b706e3e

build 149

Description

Travis build fails when creating conda env.

Steps to reproduce

Run the build.

Expected behavior

The virtual env should be created from the yaml files.

Actual behavior

The build fails when resolving the libiconv package.

vectorized calculate_pss

976ee33

Proposed change

The top-level function calculate_pss should utilize numpy.vectorize. It speeds things up, probably by reducing the iteration overhead, even though the docs describe numpy.vectorize as provided mainly for convenience rather than performance.

minimalistic conda venv files

172131a

Proposed change

The conda venv yml files should be prepared in a minimalistic way, no versions specified if not needed.

Convert to version-agnostic or python3

2912077

Proposed change

Being stuck with python2.7 is simply a shame. Version-agnostic code would be best, as python2.7 is still in use, though if it turns out to be too difficult to maintain, python3 should be the way to go.

project page code highlight and indent

7ad31e7

Proposed change

Syntax highlighting and a different indent should be applied for readability.

ProfInt returning rather than holding

2912077

Proposed change

ProfInt class is small. There is no need for it to hold an attribute instead of returning the result.

Repo structure

Proposed change

A bin directory must be added parallel to the module directory. It can contain a placeholder argparse script.
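A placeholder of this kind might look like the sketch below. The program name and description are assumptions, and the script deliberately does nothing beyond parsing its arguments.

```python
import argparse


def build_parser():
    """Create the placeholder parser; real options would be added later."""
    return argparse.ArgumentParser(
        prog="prwlr",
        description="Placeholder command-line interface.",
    )


def main(argv=None):
    """Parse the arguments (sys.argv when argv is None) and exit cleanly."""
    build_parser().parse_args(argv)
    return 0

# A script in bin/ would typically end by calling main() and passing its
# return value to sys.exit().
```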

conda package

commit

5f0b776

Proposed feature

conda package should be prepared

Review and rewrite network

commit

4ba0ec8

Proposed change

Code restructuring might be needed for network, just as it was for genome and interactions.

Double testing

The command calling the unittest in the .travis.yml file is repeated.

profiles.Profile comparison

Description

prowler.profiles.Profile objects are not equal even if created from the same data source.

Steps to reproduce

import prowler as prwl

p1 = prwl.profiles.Profile(list('abcde'), list('abc'))
p2 = prwl.profiles.Profile(list('abcde'), list('abc'))

Expected behavior

p1 == p2

True

Actual behavior

p1 == p2

False
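Without an __eq__ method Python compares objects by identity, which would explain the result above. A value-based comparison could be sketched as follows; the class is a simplified, hypothetical stand-in, and the attribute names reference and query are assumptions about what the two constructor arguments mean.

```python
class Profile(object):
    """Simplified stand-in for prowler.profiles.Profile."""

    def __init__(self, reference, query):
        # Tuples are stored so that instances can also be hashed.
        self.reference = tuple(reference)
        self.query = tuple(query)

    def __eq__(self, other):
        if not isinstance(other, Profile):
            return NotImplemented
        return (self.reference, self.query) == (other.reference, other.query)

    def __ne__(self, other):
        # Needed on Python 2, where != does not fall back to __eq__.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Equal objects must hash equally, e.g. for use in sets.
        return hash((self.reference, self.query))


p1 = Profile(list('abcde'), list('abc'))
p2 = Profile(list('abcde'), list('abc'))
p3 = Profile(list('abcde'), list('ab'))
```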

Conditional network-based tests skipping

The network-based tests that require an external host to be available should be skipped unless the host is pingable. Currently, the build fails because Costanzo's supplement sites are temporarily down.
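One way to sketch this with unittest is shown below. Note that it checks reachability with a TCP connection rather than an ICMP ping, which is a swapped-in technique that needs no extra privileges. The host name is a placeholder and the helper host_reachable is hypothetical.

```python
import socket
import unittest


def host_reachable(host, port=80, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        connection = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # On Python 3 this covers DNS failures, refusals and timeouts.
        return False
    connection.close()
    return True


# Placeholder; the real supplement host is not named here.
SUPPLEMENT_HOST = "example.org"


class NetworkBasedTests(unittest.TestCase):
    def setUp(self):
        # Checked lazily, per test, so importing the module stays offline.
        if not host_reachable(SUPPLEMENT_HOST):
            self.skipTest("{} is not reachable".format(SUPPLEMENT_HOST))

    def test_download(self):
        # The real download assertions would go here.
        pass
```

A skipped test is reported as such, so the build stays green while still showing that the network checks did not run.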

Amend KEGG database parser

commit

7d9ab00

Proposed change

Parsing the KEGG database works for KEGG Orthology but is still not very reliable. For the other KEGG databases it works poorly or not at all. The main problem is probably the lack of a proper way of handling multiline entries in the regex.

Project rename

The name Prowler is already taken. The whole project must be renamed.

Switching to:

prowlr

It is free on PyPI, Anaconda Cloud and github.

Before the rename is done, all the pickled files in test_data must be replaced with the text ones so that they do not depend on the module name.

What has been done so far:

prwlr.save_network

The highest-level convenience method prwlr.save_network should be implemented in prwlr.core so that the whole network can be saved without losing prwlr.profiles.Profile objects.

Costanzo_API for SGA_v2

version or commit

7d9ab00

Proposed feature

There should be a feature for downloading Costanzo's SGA v2, just as there is for v1.

Accidental suffix hard-coding

Description

The query suffix is hard-coded to _Q. It does not manifest until the module-level settings are changed; nevertheless, it is a bad typo.

prwlr/prwlr/databases.py

Line 66 in f62feb8

PROF_Q = "PROF_Q".format(QUERY_SUF)
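The quoted line has no replacement field in its format string, so the argument is silently ignored. A sketch of the presumably intended form follows; the default value of QUERY_SUF and the helper with_suffix are assumptions for illustration.

```python
QUERY_SUF = "_Q"  # assumed module-level default


def with_suffix(base, suffix):
    """Build a column name from a base and a configurable suffix."""
    return "{}{}".format(base, suffix)


# Same result as before with the default suffix...
PROF_Q = with_suffix("PROF", QUERY_SUF)

# ...but a changed suffix is now honoured, unlike in the quoted line,
# where the argument to format() has no field to fill.
ignored = "PROF_Q".format("_X")
honoured = with_suffix("PROF", "_X")
```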

Selector interactions types

9d688e5

Proposed change

prowler.stats.Selector should hold proper genetic interactions names (positive DMF is not one) and there should be more of them.

prwlr.read_network

The highest-level convenience method prwlr.read_network should be implemented in prwlr.core so that the whole network can be re-read without losing prwlr.profiles.Profile objects.

multiprocessing load optimization

79741b5

Proposed change

multiprocessing in prowler.stats.Stats.permute_profiles works but the load is not sufficiently distributed on large machines.

Reading/saving iterable as string

prwlr.profiles.Profile.from_string method should be implemented.

An iterable cannot be saved to a flat file directly but can be written to and read from a str with the join/split methods. Both are available in Python's standard library and in pandas.Series.str.
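The join/split round trip could be sketched like this. The class is a hypothetical, single-attribute stand-in for prwlr.profiles.Profile, and both to_string and the comma separator are assumptions; only from_string is named in the issue.

```python
class Profile(object):
    """Hypothetical stand-in holding a sequence of string values."""

    def __init__(self, values):
        self.values = list(values)

    def to_string(self, sep=","):
        """Flatten the values into one string for a flat file."""
        return sep.join(self.values)

    @classmethod
    def from_string(cls, text, sep=","):
        """Rebuild a Profile from a string written by to_string."""
        return cls(text.split(sep))


original = Profile(["+", "-", "+"])
text = original.to_string()            # suitable for a CSV cell
restored = Profile.from_string(text)
```

The separator must not occur in the values themselves, otherwise the round trip is lossy.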

apis.get_KOs_db_X_ref KeyError

884e84c

Description

apis.get_KOs_db_X_ref does not work when used for live download.

Steps to reproduce

Call parse_organism_info that uses apis.get_KOs_db_X_ref without IDs, X_ref or KOs or set to

Expected behavior

It should download temporary files and convert them to pandas.DataFrames, then merge them, drop whatever should be dropped and return a proper pandas.DataFrame.

Actual behavior

It throws a KeyError:

    208         """
    209         def f(i):
--> 210             print("{i} ".format(), flush=True, end='\r')
    211             res = rq.get('{}/{}/{}/{}'.format(
    212             self.home,

KeyError: 'i'
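The traceback points at a named replacement field, {i}, that is formatted without any arguments. A minimal reproduction and one possible fix are sketched below; the function names are made up for illustration.

```python
def broken_progress_text(i):
    # Mirrors the failing line: format() gets no value for the field "i".
    return "{i} ".format()


def progress_text(i):
    # Pass the value by keyword so the named field can be resolved.
    return "{i} ".format(i=i)


try:
    broken_progress_text(3)
except KeyError as error:
    missing_field = error.args[0]

fixed = progress_text(3)
```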

Profiles-distance measure other than PSS

commit

5f0b776

Proposed feature

The distance between phylogenetic profiles should probably be measured differently than just with pair-wise PSS. See this paper.
Besides built-in ways of doing this, Profiles.profile.calculate_pss should take the function used for distance calculation as an argument.
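Passing the measure as an argument could look like the sketch below. The two measures are generic examples (a count of matching positions and a Jaccard index over "+" entries); neither is claimed to be the PSS definition used in the project, and the method parameter name is an assumption.

```python
def matching_positions(first, second):
    """Count the positions at which two profiles agree."""
    return sum(a == b for a, b in zip(first, second))


def jaccard_index(first, second):
    """Shared "+" positions divided by positions with "+" in either."""
    pairs = list(zip(first, second))
    both = sum(a == "+" and b == "+" for a, b in pairs)
    either = sum(a == "+" or b == "+" for a, b in pairs)
    return float(both) / either if either else 0.0


def calculate_pss(first, second, method=matching_positions):
    """Score two equal-length profiles with a pluggable measure."""
    if len(first) != len(second):
        raise ValueError("Profiles must have the same length.")
    return method(first, second)


default_score = calculate_pss("++-", "+--")
jaccard_score = calculate_pss("++-", "+--", method=jaccard_index)
```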

prwlr.read_profiles

The highest-level convenience method prwlr.read_profiles should be implemented in prwlr.core, analogous to prwlr.core.read_sga. There should also be prwlr.save_profiles available.

Amend doc

Amend the __doc__ strings. These need to be good enough not only for the CLI but also for publishing as HTML.

calculate_enrichment bad attrib

7ad31e7

Description

Cannot run stats.calculate_enrichment function.

Steps to reproduce

Pass two dataframes to stats.calculate_enrichment as described by the function __doc__

Expected behavior

stats.calculate_enrichment should return dataframe with enrichment scores.

Actual behavior

stats.calculate_enrichment gives
AttributeError: ("type object 'Columns' has no attribute '_score'", 'occurred at index 0')

KEGG modules X-ref

976ee33

Proposed feature

The KEGG modules should be handled, probably as another prowler.apis.get_db_X_ref function. They can be fetched with e.g. http://rest.kegg.jp/link/md/K02030, which returns plain CSV rather than a convoluted database entry.

Remove or merge duplicated profiles

475f16b

Proposed change

Duplicated phylogenetic profiles should be either dropped or merged.

Description

For some ORFs KEGG Orthology gives more than one KO group ID. It produces more than one phylogenetic profile.

Steps to reproduce

import prowler as prwl

api = prwl.apis.KEGG_API()
api.get_organisms_ids('tmp_org_ids.csv')
api.get_org_db_X_ref('Saccharomyces cerevisiae', 'orthology', out_file_name='tmp_org_KO.csv')

Expected behavior

api.org_db_X_ref_df[api.org_db_X_ref_df.duplicated(subset=['ORF_ID'])]

ORF_ID	KEGG_ID

Actual behavior

api.org_db_X_ref_df[api.org_db_X_ref_df.duplicated(subset=['ORF_ID'])]

ORF_ID	KEGG_ID
YBR019C	K01785
RDN37-1	K01982
RDN37-2	K01982
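The two options, dropping or merging, can be sketched without pandas on plain (ORF_ID, KEGG_ID) pairs. The rows below are made-up placeholders rather than real KEGG data, and both helper names are hypothetical.

```python
from collections import OrderedDict


def drop_duplicates(rows):
    """Keep only the first KEGG ID seen for every ORF ID."""
    seen = set()
    kept = []
    for orf_id, kegg_id in rows:
        if orf_id not in seen:
            seen.add(orf_id)
            kept.append((orf_id, kegg_id))
    return kept


def merge_duplicates(rows):
    """Collect all KEGG IDs of every ORF ID into one list."""
    merged = OrderedDict()
    for orf_id, kegg_id in rows:
        merged.setdefault(orf_id, []).append(kegg_id)
    return merged


# Placeholder identifiers, for illustration only.
rows = [("ORF_A", "K00001"), ("ORF_A", "K00002"), ("ORF_B", "K00003")]

dropped = drop_duplicates(rows)
merged = merge_duplicates(rows)
```

Dropping is simpler but discards information; merging keeps every KO group and leaves the decision on how to combine the profiles to a later step.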

permute_profiles has no opt for PSS

7ad31e7

Description

prowler.stats.permute_profiles does not accept the recent feature of PSS calculation with different distance measures. It should.

dizak / prwlr

prwlr's Issues
