cogent3 / c3dev

cogent3 developer tools

Dockerfile 75.61% Shell 24.39%

c3dev's Introduction

cogent3 is a mature python library for analysis of genomic sequence data. We endeavour to provide a first-class experience within Jupyter notebooks, but the algorithms also support parallel execution on compute systems with thousands of processors.

📣 Feature Announcements 📣

Faster sequence format parsers 💨

We have faster implementations of the parsers for Fasta and GenBank sequence formats. These are used by our standard loading mechanisms. If you just want to get the contents of files in those formats as standard Python types, use cogent3.parser.fasta.iter_fasta_records() or cogent3.parser.genbank.iter_genbank_records().

Supporting third-party apps as plugins 🔌

Cogent3 now provides support for plugins! Third-party developers can deploy their code as cogent3 apps with just a few lines. See the demo project.

Post any questions you have in cogent3 discussions.

The developers of Cogent3 and IQ-TREE2 announce piqtree2 🎉

Speaking of plugins, our first major third-party plugin is piqtree2. Try it out and give us feedback.

New core data types improve efficiency and flexibility

The cogent3 development team 👾 are hard at work modernising the core internals 💪🛠.

In this release the Sequence, SequenceCollection, MolType, GeneticCode, and alphabet classes have all been rewritten from scratch with an eye to simplifying the code while improving its flexibility and performance. (We're working on alignments for the next release.)

The "new-style" objects enhance performance by supporting the access of the underlying data in various formats (i.e. numpy arrays, bytes or strings). You can create "new-style" objects by setting the new_type=True argument in top-level functions (make_seq, load_seq, make_unaligned_seqs, get_moltype, get_code). These are not yet the default and are not fully integrated into the existing code. They can also differ in their API relative to the classes they replace.

We encourage experimentation in cases where integration with old objects is NOT required and look forward to any feedback!

Who is it for?

Anyone who wants to analyse sequence divergence using robust statistical models

cogent3 is unique in providing numerous non-stationary Markov models for modelling sequence evolution, including codon models. cogent3 also includes an extensive collection of time-reversible models (again including novel codon models). We have done more than just invent these new methods: we have established the most robust algorithms for their implementation and their suitability for real data. Additionally, there are novel signal processing methods focussed on statistical estimation of integer period signals.

🎬 Demo non-reversible substitution model

cogent3-demo-composable.mp4

Anyone who wants to undertake exploratory genomic data analysis

Beyond our novel methods, cogent3 provides an extensive suite of capabilities for manipulating and analysing sequence data. You can manipulate sequences by their annotations, e.g.

🎬 Demo sequences with annotations

cogent3-demo-new-ann.mp4

You can also read standard tabular and biological data formats, perform multiple sequence alignment using any cogent3 substitution model, reconstruct and manipulate phylogenetic trees, manipulate tabular data, visualise phylogenies and much more.

Beginner friendly approach to genomic data analysis

Our cogent3.app module provides a very different approach to using the library capabilities. Expertise in structural programming concepts is not essential!

🎬 Demo friendly coding

cogent3-demo-composable.mp4

Installation?

$ pip install cogent3

Install `extra` -- adds visualisation support

The extra group includes python libraries required for visualisation, i.e. plotly, kaleido, psutil and pandas.

$ pip install "cogent3[extra]"

Install `dev` -- adds `cogent3` development related libraries

The dev group includes python libraries required for development of cogent3.

$ pip install "cogent3[dev]"

Install the development version

$ pip install git+https://github.com/cogent3/cogent3.git@develop#egg=cogent3

Project Information

cogent3 is released under the BSD-3 license. Documentation is at cogent3.org, and the code is on GitHub. If you would like to contribute (and we hope you do!), we have created a companion c3dev GitHub repo which provides details on how to contribute and some useful tools for doing so.

Project History

cogent3 is a descendant of PyCogent. While there is much in common with PyCogent, the amount of change has been substantial, motivating the name change to cogent3. This name has been chosen because cogent was always the import name (dating back to PyEvolve in 2004) and it's Python 3 only.

Given this history, we are grateful to the multitude of individuals who have made contributions over the years. Many of these contributors were also co-authors on the original PyEvolve and PyCogent publications. Individual contributions can be seen by using "view git blame" on individual lines of code on GitHub, through git log in the terminal, and more recently the changelog.

Compared to PyCogent version 1.9, there have been a massive number of changes. These include integration of many of the new developments on algorithms and modelling published by the Huttley lab over the last decade. We have also modernised our dependencies. For example, we now use plotly for visualisation, tqdm for progress bar display, concurrent.futures and mpi4py.futures for parallel process execution, nox and pytest for unit testing.

Funding

Cogent3 has received funding support from the Australian National University and an Essential Open Source Software for Science Grant from the Chan Zuckerberg Initiative.

c3dev's People

Contributors

Stargazers

Watchers

Forkers

thomas-la christopherbradley u6052029 jamesmartini wjjmjh stephenrogers1 kiratalreja3 khiron fredjaya

c3dev's Issues

Development docker-compose.yml and Dockerfile

Run in a local path containing a clone of your fork of the cogent3 repo

python version pinned
pip install cogent3 requirements.txt
copy cogent3 clone
add .git pre-commit hooks
black
isort
prod node with a bash entry point
test node with a pytest entry point
debug node using ptvsd with a pytest entry point
ensure security keys allow the project to be checked in from the container

create azure pipeline account for project

Original report by GavinH (Bitbucket: 557058:e40c23e1-e273-4527-a2f8-5de5876e870d).

Gavin to make project level

Direct new contributors to wiki page

The goal here is to make the wiki page the first, go-to resource for new contributors. This includes:

From the c3dev readme, move the conda/non-conda install instructions to an appropriate place in c3dev wiki
Updating the c3dev readme to explicitly direct contributors to start with the c3dev wiki
~~Clarifying that cogent3 should be first installed (for pytest to work)~~ included in the readme, but not wiki
Maybe update the cogent3 readme as well

add a command line script that identifies function usage cases

Original report by GavinH (Bitbucket: 557058:e40c23e1-e273-4527-a2f8-5de5876e870d).

We want a basic script that, given a path to a cogent3 python file, looks for the functions in that file and then all places where they are used in <PyCogent3>/cogent3 and <PyCogent3>/tests.

There may already be a python package for this. If not, use click to create a simple command line interface. See cleanup.py and its corresponding hook in setup.py.
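The search described above could be sketched with the standard library ast module. This is only an illustration: the function names are hypothetical, matching is by name alone (so two functions sharing a name are not distinguished), and the click command line wrapper is left out.

```python
import ast
from pathlib import Path


def defined_functions(source: str) -> set[str]:
    """Return the names of all functions and methods defined in source."""
    return {
        node.name
        for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def find_usages(names: set[str], source: str) -> list[tuple[str, int]]:
    """Return (name, line number) for each reference to one of names."""
    found = []
    for node in ast.walk(ast.parse(source)):
        # plain references, e.g. func(x)
        if isinstance(node, ast.Name) and node.id in names:
            found.append((node.id, node.lineno))
        # attribute references, e.g. module.func(x) or obj.method(x)
        elif isinstance(node, ast.Attribute) and node.attr in names:
            found.append((node.attr, node.lineno))
    return sorted(found, key=lambda pair: (pair[1], pair[0]))


def scan(target: Path, search_dirs: list[Path]) -> dict[str, list[tuple[str, int]]]:
    """Map each file under search_dirs to usages of functions defined in target."""
    names = defined_functions(target.read_text())
    results = {}
    for root in search_dirs:
        for path in sorted(root.rglob("*.py")):
            if path.resolve() == target.resolve():
                continue  # skip the file that defines the functions
            hits = find_usages(names, path.read_text())
            if hits:
                results[str(path)] = hits
    return results
```

Calling scan with the path to one cogent3 file and the cogent3 and tests directories would give a per-file list of candidate usages to review by hand.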

how to contribute wiki page

instructions introducing the style, clone, PR dance.
point to ticket label for "good first issue"

Update unit testing guidelines

The testing guidelines still describe writing tests with unittest (as for PyCogent) instead of pytest.

Updating them will need either a discussion with others to outline the testing guidelines, or a reference to pytest best practices.

DEV: identify a tool for selectively building dependency graph

Use case here is we have some modules that need eventually to be removed as they duplicate functionality in other libraries on which we already depend. However, given the size of those modules it is rather daunting to approach the problem.

We need the ability to create a dependency graph that is conditioned on a specific module so we can understand where it is being used.

The specific use case is cogent3.maths.stats.special.
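Until a dedicated tool is chosen, a rough version of the conditioned graph can be had by listing the files that import the module of interest. The sketch below uses only the standard library. The function names are hypothetical, and relative imports are ignored, which a real tool would need to resolve.

```python
import ast
from pathlib import Path


def imported_modules(source: str) -> set[str]:
    """Return dotted names of modules imported by source (absolute imports only)."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module)
            # "from pkg import sub" may be importing the submodule pkg.sub
            modules.update(f"{node.module}.{alias.name}" for alias in node.names)
    return modules


def uses_module(source: str, target: str) -> bool:
    """True if source imports target or anything beneath it."""
    return any(
        name == target or name.startswith(target + ".")
        for name in imported_modules(source)
    )


def dependents(root: Path, target: str) -> list[str]:
    """List the files under root that import target."""
    return sorted(
        str(path)
        for path in root.rglob("*.py")
        if uses_module(path.read_text(), target)
    )
```

For the use case above, dependents(Path("cogent3"), "cogent3.maths.stats.special") would list the files to inspect before removing the module.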

Investigating plugin architectures

Below I list some of @Nick-Foto's work exploring two different libraries

I note that I have not come across any actual open source projects using stevedore, but I know of several using pluggy. I also know pluggy has a very significant community (pytest) behind it.

Stevedore

Blog post describing Stevedore

Pluggy

Conference talk on pluggy

https://github.com/Nick-Foto/plugin_sample

The Datasette project uses it.

Provide how-to's for each type of issue?

Currently, there is a dedicated how-to for working on deprecating code.

There are also some "tackling issues" pointers that are specific to particular issue types, e.g. adding a new feature.

I think the [How to contribute code] page should be reserved for high-level advice that is shared across all issue types.

Implement scriv developer tool for changelog management

Here's the scriv repo. The project devs will need a brief rundown on how this works, and then it needs setting up in the cogent3 repo.

Restructure wiki home page

Rewrite and restructure the wiki home page as a succinct how-to guide.

Move reference material, such as style guides, to a separate dedicated page.

Use dropdowns #37

conda recipe for developer environment

Original report by GavinH (Bitbucket: 557058:e40c23e1-e273-4527-a2f8-5de5876e870d).

This should also create an IPython kernel.

I'm attaching to this ticket a very old demo of a conda recipe for the legacy version of pycogent. It needs to be updated to include the current dependencies, dropping old ones etc.

implementing precommit hooks as a github action

We want to avoid code style issues when doing reviews of PRs.

We need black and isort linters to start with. Set them up as a GitHub action run on each PR.

Ideally, these would be applied by developers before the push to their own GitHub fork of cogent3.

Question: If these are set up on cogent3#develop, do forks also run them?

add `line_profiler` to the environment file

Original report by GavinH (Bitbucket: 557058:e40c23e1-e273-4527-a2f8-5de5876e870d).

See here, used for looking at performance line-by-line

Remove all references to using mercurial from wiki

DEV: add check for test methods

Original report by GavinH (Bitbucket: 557058:e40c23e1-e273-4527-a2f8-5de5876e870d).

Need to add, to included_tests.py, a function that checks for:

test methods (names beginning with test_) that are incorrectly indented
test methods that are crippled (e.g. names beginning with est_)
test methods that are commented out

Write guidelines for documentation contributors

Address the remarks made in this cogent3 issue in a wiki entry.

Clarify dev environment and write docs

The proposed development environments include using:

docker
flit (#38 (comment))

Should both approaches be supported and documented, or just one? Maybe a question for @khiron?

Once confirmed, can proceed with writing dev install documentation (i.e. remove current conda documentation)

Clarify nox

To make it more accessible for new contributors, add some additional nox information when installing cogent3 for development.

Suggestions:

Add one sentence on the purpose of nox
One sentence on the suggested installation method (pip? apt?)
Update the code chunk to the following (up until this step in the wiki, you're still in repos/c3dev):

cd ~/repos/cogent3
nox -s test-3.8

Update the python versions in code chunks #35
Provide an example of a nox test passing


write script that reformats docstrings to numpy style

Original report by GavinH (Bitbucket: 557058:e40c23e1-e273-4527-a2f8-5de5876e870d).

BEFORE writing anything, check whether somebody has already written a docstring reformatter that can be used to solve this issue. (If so, make a comment below.)

Many PyCogent3 docstrings have the format

function/method summary

Arguments:
   - param: description

That format needs to be refactored to the

function/method summary

Parameters
----------
param
    description

style (line length <= 80 characters). (The complete numpy style reference is much more extensive than what we require, I expect.)

NOTE: This script will be a part of the pyco3dev repo NOT PyCogent3.

Make that script a command line installable. As an example, look at the pyco3dev/cleanup.py and associated entry in setup.py (the cleanup insertion point).

The script should:

identify functions/methods with docstrings
check whether those match the format I've provided above
parse that format to extract the key components (summary, parameter names and descriptions)
track public methods / functions (no leading _) that do NOT have a docstring
Make sure it works on a few files, but don't commit any changes to the repo.
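The core conversion could look something like the following sketch. It assumes the docstring has already been extracted and dedented (for instance with inspect.cleandoc), that every argument line follows the "- param: description" pattern shown above, and that a blank line ends the arguments block. The function name is hypothetical and the 80 character limit is not enforced here.

```python
import re

# matches lines such as "   - param: description"
ARG_LINE = re.compile(r"^\s*-\s*(\w+)\s*:\s*(.*)$")


def to_numpy_style(docstring: str) -> str:
    """Convert an 'Arguments:' block into a numpy style 'Parameters' section."""
    out = []
    in_args = False
    for line in docstring.splitlines():
        stripped = line.strip()
        if stripped == "Arguments:":
            in_args = True
            out.extend(["Parameters", "----------"])
        elif in_args and (match := ARG_LINE.match(line)):
            # parameter name on its own line, description indented beneath it
            out.extend([match.group(1), f"    {match.group(2)}"])
        elif in_args and stripped:
            # continuation of the previous description
            out.append(f"    {stripped}")
        else:
            in_args = False  # a blank line ends the arguments block
            out.append(line)
    return "\n".join(out)
```

Docstrings without an "Arguments:" line pass through unchanged, which makes it safe to run over files that are already partly converted.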

Recommended way of updating forked origin with remote?

Should contributors adhere strictly to either merge or rebase before PRs?

Are there any other git-specific things to consider? i.e. https://numpy.org/doc/stable/dev/development_workflow.html#additional-things-you-might-want-to-do

project toml broke editable install with pip 19.1

Original report by GavinH (Bitbucket: 557058:e40c23e1-e273-4527-a2f8-5de5876e870d).

With pip 19.1 installed, I get error messages indicating that cogent3 cannot be installed in editable mode and suggesting --no-use-pep517, but I cannot get that to work.

Please see if you can reproduce and then fix!

Update how to deprecate code


ignore

Deleted

Port docker setup and docs from cogent3/iqtree2 into cogent3/cogent3

Port the Dockerfile and entrypoint.sh from cogent3/iqtree2 to a new docker folder in the cogent3/cogent3 repo

Port the README.md from cogent3/iqtree2 to cogent3/c3dev
update dependencies for cogent3

Create a README.md in the docker folder of cogent3/cogent3 to describe using a docker container to run cogent3 in linux as a user


Update python version in wiki

In "Installing Cogent3 for development", the example code chunks use 3.8 (e.g. nox -s test-3.8), but the conda environment uses 3.10 by default. Either replace all with 3.10 or use the placeholder <python-version>.

update c3dev to use numba

add to yml file, remove cython and rtd sphinx theme

Remove mercurial commands page

The Mercurial how-to and https://github.com/cogent3/c3dev/wiki/Configuring-Mercurial pages are not needed; new contributors use git.

Reinclude linters as pre-commit hooks

The linters already run as a GitHub Action; this is mainly so they can also be run locally. Add to dev_install.

#23 (comment)

Add how-to for docker install

Docker to be the main dev install/environment - add to dev_install

Reasons why to use docker over native environment #44 (comment)

Move native install to separate page.

create code of conduct entry on wiki

See https://www.contributor-covenant.org/version/1/4/code-of-conduct.html

New wiki page with that content.

Review how to write documentation

Identify anything unclear.

Automate production of requirements.txt from pyproject.toml

investigate methods to ensure pyproject.toml is the source for all dependency requirements

nox testing requirements before PRs

For example - if developing and contributing changes to cogent3 with python=3.11, is a successful test with only nox -s test-3.11 sufficient?

Anything else to consider?

include the script to revert .c files

Original report by GavinH (Bitbucket: 557058:e40c23e1-e273-4527-a2f8-5de5876e870d).

Need to convert the following into a python script so it gets auto-installed with the dev environment

find cogent3 -name "*.c" -print0 | xargs -0 hg revert
find cogent3 -name "*.c.orig"
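A Python equivalent of the two shell commands might look like the sketch below. The function names are hypothetical. The revert command defaults to the hg revert used above and is passed as a parameter so that it can be swapped for whatever the current version control workflow needs.

```python
import subprocess
from pathlib import Path


def find_c_files(root: Path) -> list[Path]:
    """Equivalent of: find <root> -name "*.c"."""
    return sorted(root.rglob("*.c"))


def find_orig_files(root: Path) -> list[Path]:
    """Equivalent of: find <root> -name "*.c.orig"."""
    return sorted(root.rglob("*.c.orig"))


def revert_c_files(root: Path, revert_cmd: tuple[str, ...] = ("hg", "revert")) -> None:
    """Run the revert command once over every .c file under root."""
    paths = [str(p) for p in find_c_files(root)]
    if paths:  # skip the call entirely when nothing matched
        subprocess.run([*revert_cmd, *paths], check=True)
```

Passing the file list as separate arguments avoids the quoting problems that -print0 and xargs -0 work around in the shell version.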

DEV trial the pypi and bioconda release

This project is ready to be uploaded to PyPI and (for the first time) to bioconda.

Let's use that as a trial for building the GitHub action for simultaneous release of cogent3.

ENH: add wiki page on installing and using git

needs to be for each operating system describing config
resource for main git commands

Checking git hooks

Pre-commit hooks
Currently, the pre-commit script will print a success message regardless of whether black or isort fails.

Add black and isort to c3dev_environment.yml
Add a catch for black or isort fails in the pre-commit sample

Related to #23
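The missing catch amounts to checking each linter's exit status and returning non-zero if any of them failed. A minimal sketch, written in Python rather than as the shell sample, with a hypothetical function name:

```python
import subprocess
import sys


def run_checks(commands: list[list[str]]) -> int:
    """Run every command and return 0 only if all of them succeed."""
    failed = []
    for command in commands:
        # keep going after a failure so that every problem is reported at once
        if subprocess.run(command).returncode != 0:
            failed.append(" ".join(command))
    for name in failed:
        print(f"pre-commit check failed: {name}", file=sys.stderr)
    return 1 if failed else 0
```

A hook script could then end with sys.exit(run_checks([["black", "--check", "."], ["isort", "--check-only", "."]])), assuming both tools are installed in the environment, so that git aborts the commit when either check fails.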

Pre-push hooks
Currently, all test sessions are run (see: output) and pushing to origin fails.

Add instructions in the wiki to restrict nox in the pre-push hook to a single session (nox -s test-3.10) rather than running all test sessions
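One way to avoid hard-coding the version is to derive the session name from the interpreter running the hook. This sketch assumes the nox sessions keep the test-<major>.<minor> naming seen in this document. The function names are hypothetical.

```python
import sys


def nox_session(version: tuple[int, int] = sys.version_info[:2]) -> str:
    """Name of the nox test session for a Python version, e.g. 'test-3.10'."""
    major, minor = version
    return f"test-{major}.{minor}"


def pre_push_command(version: tuple[int, int] = sys.version_info[:2]) -> list[str]:
    """Command line that runs only the session matching the given version."""
    return ["nox", "-s", nox_session(version)]
```

The hook would pass the result to subprocess.run and exit with its return code, so a failing session still blocks the push.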

add to dev requirements

sphinx-autobuild
pytest-mpi

Add instructions for submitting discussions/issues

Aim to provide guidelines for whether users and contributors should submit a GitHub issue vs. discussion.

Based off the system to be used in the workshop - most things should be first submitted as a discussion, which can then be escalated to an issue by a core dev if warranted.

Relevant: https://gavinhuttley.github.io/tib/contribute.html

make small case for azure test failing for cogent3

the azure test is failing for the following cogent3 test, which indicates we might need to be able to specify a container with more than 1 process.

can you modify your demo scitrack azure pipeline case to add this one test and see if you can establish a config that allows this to pass?

   def test_create_processes(self):
        """Procressor pool should create multiple distingue processes"""
        index = [2, 3, 4, 5, 6, 7, 8, 9, 10]
>       result = parallel.map(get_process_value, index, max_workers=2, use_mpi=False)

test_util/test_parallel.py:39: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../.tox/py37/lib/python3.7/site-packages/cogent3/util/parallel.py:168: in map
    return list(imap(f, s, max_workers, use_mpi, if_serial, chunksize))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

f = <function get_process_value at 0x7efcc01a7320>, s = [2, 3, 4, 5, 6, 7, ...]
max_workers = 2, use_mpi = False, if_serial = 'raise', chunksize = None

    def imap(f, s, max_workers=None, use_mpi=False, if_serial="raise", chunksize=None):
        """
        Parameters
        ----------
        f : callable
            function that operates on values in s
        s : iterable
            series of inputs to f
        max_workers : int or None
            maximum number of workers. Defaults to 1-maximum available.
        use_mpi : bool
            use MPI for parallel execution
        if_serial : str
            action to take if conditions will result in serial execution. Valid
            values are 'raise', 'ignore', 'warn'. Defaults to 'raise'.
        chunksize : int or None
            Size of data chunks executed by worker processes. Defaults to None
            where stable chunksize is determined by set_default_chunksize()
    
        Returns
        -------
        imap is a generator yielding result of f(s[i]), map returns the result
        series
        """
    
        if_serial = if_serial.lower()
        assert if_serial in ("ignore", "raise", "warn"), f"invalid choice '{if_serial}'"
    
        # If max_workers is not defined, get number of all processes available
        # minus 1 to leave for master process
        if use_mpi:
            if not USING_MPI:
                raise RuntimeError("Cannot use MPI")
    
            err_msg = (
                "Execution in serial. For parallel MPI execution, use:\n"
                " $ mpirun -n 1 <executable script>"
            )
    
            if COMM.Get_attr(MPI.UNIVERSE_SIZE) == 1 and if_serial == "raise":
                raise RuntimeError(err_msg)
            elif COMM.Get_attr(MPI.UNIVERSE_SIZE) == 1 and if_serial == "warn":
                warnings.warn(UserWarning, msg=err_msg)
    
            if not max_workers:
                max_workers = COMM.Get_attr(MPI.UNIVERSE_SIZE) - 1
    
            if not chunksize:
                chunksize = set_default_chunksize(s, max_workers)
    
            with MPIfutures.MPIPoolExecutor(max_workers=max_workers) as executor:
                for result in executor.map(f, s, chunksize=chunksize):
                    yield result
        else:
            if not max_workers:
                max_workers = multiprocessing.cpu_count() - 1
>           assert max_workers < multiprocessing.cpu_count()
E           AssertionError
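The traceback stops at the assertion max_workers < multiprocessing.cpu_count() with max_workers = 2, which would fail on any machine reporting two or fewer CPUs. A small illustration of clamping the request instead of asserting follows. safe_max_workers is a hypothetical helper, not part of cogent3.

```python
import multiprocessing


def safe_max_workers(requested, cpu_count=None):
    """Clamp a requested worker count so one CPU is left for the parent process."""
    if cpu_count is None:
        cpu_count = multiprocessing.cpu_count()
    limit = max(cpu_count - 1, 1)  # never drop below a single worker
    return limit if not requested else min(requested, limit)
```

With two CPUs the clamped value is 1, which satisfies the assertion but would not give the multiple distinct processes the test expects, so a container exposing more than two CPUs is probably still needed for this test.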

Add wiki _sidebar

Will update the following structure over time here.

home
  conduct
  dev_install
    repos
    devtools
    test_install
  contributing_code
    issue_types
      type_x
      type_y
    dev_cycle
    testing
  writing_docs
  references

develop git-based process

git precommit hooks for black, etc.
implement in devconfig.py
