
c3dev's Introduction


cogent3 is a mature Python library for analysis of genomic sequence data. We endeavour to provide a first-class experience within Jupyter notebooks, but the algorithms also support parallel execution on compute systems with thousands of processors.

📣 Feature Announcements 📣

Faster sequence format parsers 💨

We have faster implementations of the parsers for Fasta and GenBank sequence formats. These are used by our standard loading mechanisms. If you just want to get the contents of files in those formats as standard Python types, use cogent3.parser.fasta.iter_fasta_records() or cogent3.parser.genbank.iter_genbank_records().
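The exact signatures and return types of `iter_fasta_records()` and `iter_genbank_records()` are best checked in the cogent3 documentation. As a rough illustration of what "standard Python types" can mean for FASTA, the pure-Python sketch below streams FASTA text and yields plain `(label, sequence)` string tuples. `iter_fasta_pairs` is a hypothetical name; it is not cogent3's parser and makes no claim about cogent3's output format.

```python
import io
from typing import Iterator, TextIO


def iter_fasta_pairs(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (label, sequence) pairs from FASTA text as plain str objects.

    Hypothetical helper for illustration; not cogent3's parser.
    """
    label = None
    chunks: list[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if label is not None:
                yield label, "".join(chunks)  # emit the previous record
            label, chunks = line[1:].strip(), []
        elif label is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if label is not None:
        yield label, "".join(chunks)  # emit the final record


example = io.StringIO(">s1 desc\nACGT\nAC\n>s2\nGG\n")
records = list(iter_fasta_pairs(example))
# records == [("s1 desc", "ACGTAC"), ("s2", "GG")]
```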

Supporting third-party apps as plugins 🔌

Cogent3 now provides support for plugins! Third-party developers can deploy their code as cogent3 apps with just a few lines. See the demo project.

Post any questions you have in cogent3 discussions.

The developers of Cogent3 and IQ-TREE2 announce piqtree2 🎉

Speaking of plugins, our first major third-party plugin is piqtree2. Try it out and give us feedback.

New core data types improve efficiency and flexibility

The cogent3 development team 👾 are hard at work modernising the core internals 💪🛠.

In this release the Sequence, SequenceCollection, MolType, GeneticCode, and alphabet classes have all been rewritten from scratch with an eye to simplifying the code while improving its flexibility and performance. (We're working on alignments for the next release.)

The "new-style" objects enhance performance by supporting the access of the underlying data in various formats (i.e. numpy arrays, bytes or strings). You can create "new-style" objects by setting the new_type=True argument in top-level functions (make_seq, load_seq, make_unaligned_seqs, get_moltype, get_code). These are not yet the default and are not fully integrated into the existing code. They can also differ in their API relative to the classes they replace.
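To illustrate why keeping a single underlying representation helps, here is a toy class that stores a DNA sequence once, as alphabet indices in a `bytes` buffer, and produces integer, bytes, or string views on demand. `ToySeq` is hypothetical and is not cogent3's `Sequence`; the TCAG index order is an arbitrary choice for the sketch, and a real implementation would more likely hold a numpy array.

```python
ALPHABET = b"TCAG"  # index order is an arbitrary choice for this sketch
LOOKUP = {code: index for index, code in enumerate(ALPHABET)}


class ToySeq:
    """Hypothetical sketch: one compact buffer, several views on demand."""

    def __init__(self, text: str) -> None:
        try:
            # iterating bytes yields ints, so LOOKUP maps character codes to indices
            self._data = bytes(LOOKUP[code] for code in text.upper().encode("ascii"))
        except KeyError as err:
            raise ValueError(f"{chr(err.args[0])!r} is not in the alphabet") from err

    def to_indices(self) -> list[int]:
        """Integer codes, the form numerical algorithms usually want."""
        return list(self._data)

    def __bytes__(self) -> bytes:
        """Characters as bytes, rebuilt from the stored indices."""
        return bytes(ALPHABET[index] for index in self._data)

    def __str__(self) -> str:
        return bytes(self).decode("ascii")
```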

We encourage experimentation in cases where integration with old objects is NOT required and look forward to any feedback!

Who is it for?

Anyone who wants to analyse sequence divergence using robust statistical models

cogent3 is unique in providing numerous non-stationary Markov models for modelling sequence evolution, including codon models. cogent3 also includes an extensive collection of time-reversible models (again including novel codon models). We have done more than just invent these new methods; we have established the most robust algorithms for their implementation and demonstrated their suitability for real data. Additionally, there are novel signal processing methods focussed on statistical estimation of integer period signals.

🎬 Demo non-reversible substitution model
cogent3-demo-composable.mp4
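The reversible versus non-reversible distinction can be checked directly on a rate matrix Q. A model is time-reversible when it satisfies detailed balance, π_i Q_ij = π_j Q_ji, for its stationary distribution π. When every off-diagonal rate is positive, this is equivalent to requiring that each 3-cycle through one reference state has the same rate product in both directions, which avoids solving for π. The function below is a plain-Python sketch; `is_reversible` is a hypothetical name, not a cogent3 API. Reversibility is a separate property from stationarity, which concerns whether sequence composition is at the model's equilibrium.

```python
from itertools import permutations


def is_reversible(Q: list[list[float]], tol: float = 1e-12) -> bool:
    """True if the rate matrix Q satisfies detailed balance.

    Assumes every off-diagonal rate is positive. Then Q is reversible iff, for
    all states j != k (both != 0), Q[0][j]*Q[j][k]*Q[k][0] == Q[0][k]*Q[k][j]*Q[j][0];
    the implied stationary weights are pi_j proportional to Q[0][j] / Q[j][0].
    """
    n = len(Q)
    if any(Q[i][j] <= 0 for i in range(n) for j in range(n) if i != j):
        raise ValueError("this shortcut requires all off-diagonal rates > 0")
    for j, k in permutations(range(1, n), 2):
        forward = Q[0][j] * Q[j][k] * Q[k][0]
        backward = Q[0][k] * Q[k][j] * Q[j][0]
        if abs(forward - backward) > tol:
            return False
    return True


# Jukes-Cantor: all exchange rates equal, so reversible.
JC69 = [[-1.0 if i == j else 1 / 3 for j in range(4)] for i in range(4)]

# An arbitrary unrestricted rate matrix; each row sums to zero.
GENERAL = [
    [-0.9, 0.3, 0.3, 0.3],
    [0.1, -0.5, 0.2, 0.2],
    [0.4, 0.1, -0.7, 0.2],
    [0.3, 0.3, 0.3, -0.9],
]
```

Here `is_reversible(JC69)` is True, while for `GENERAL` the cycle 0→1→2→0 has rate product 0.3 × 0.2 × 0.4 = 0.024 against 0.3 × 0.1 × 0.1 = 0.003 in reverse, so it returns False.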

Anyone who wants to undertake exploratory genomic data analysis

Beyond our novel methods, cogent3 provides an extensive suite of capabilities for manipulating and analysing sequence data. For example, you can manipulate sequences by their annotations:

🎬 Demo sequences with annotations
cogent3-demo-new-ann.mp4

Plus, you can read standard tabular and biological data formats, perform multiple sequence alignment using any of the cogent3 substitution models, carry out phylogenetic reconstruction and tree manipulation, manipulate tabular data, visualise phylogenies and much more.

Beginner friendly approach to genomic data analysis

Our cogent3.app module provides a very different approach to using the library capabilities. Expertise in structural programming concepts is not essential!

🎬 Demo friendly coding
cogent3-demo-composable.mp4
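The idea behind composable apps is that small, single-purpose steps are chained into a pipeline with an operator rather than written as nested function calls. The toy class below mimics chaining steps with `+`. `ToyApp` and the example steps are hypothetical and this is not cogent3's implementation; see the cogent3.app documentation for how real apps are obtained and combined.

```python
from typing import Any, Callable


class ToyApp:
    """Hypothetical composable step (not cogent3's app class).

    `first + second` builds a new step that runs `first`, then feeds its
    result to `second`, so pipelines read left to right.
    """

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def __call__(self, value: Any) -> Any:
        return self.func(value)

    def __add__(self, other: "ToyApp") -> "ToyApp":
        return ToyApp(lambda value: other(self(value)))


strip = ToyApp(str.strip)
upper = ToyApp(str.upper)
count_gc = ToyApp(lambda seq: sum(base in "GC" for base in seq))

pipeline = strip + upper + count_gc
# pipeline("  acggt \n") -> 3
```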

Installation

$ pip install cogent3

Install extra -- adds visualisation support

The extra group includes python libraries required for visualisation, i.e. plotly, kaleido, psutil and pandas.

$ pip install "cogent3[extra]"

Install dev -- adds cogent3 development related libraries

The dev group includes python libraries required for development of cogent3.

$ pip install "cogent3[dev]"

Install the development version

$ pip install git+https://github.com/cogent3/cogent3.git@develop#egg=cogent3

Project Information

cogent3 is released under the BSD-3 license, documentation is at cogent3.org, while cogent3 code is on GitHub. If you would like to contribute (and we hope you do!), we have created a companion c3dev GitHub repo which provides details on how to contribute and some useful tools for doing so.

Project History

cogent3 is a descendant of PyCogent. While there is much in common with PyCogent, the amount of change has been substantial, motivating the name change to cogent3. This name has been chosen because cogent was always the import name (dating back to PyEvolve in 2004) and it's Python 3 only.

Given this history, we are grateful to the multitude of individuals who have made contributions over the years. Many of these contributors were also co-authors on the original PyEvolve and PyCogent publications. Individual contributions can be seen by using "view git blame" on individual lines of code on GitHub, through git log in the terminal, and, more recently, in the changelog.

Compared to PyCogent version 1.9, there have been a massive number of changes. These include integration of many of the new developments on algorithms and modelling published by the Huttley lab over the last decade. We have also modernised our dependencies. For example, we now use plotly for visualisation, tqdm for progress bar display, concurrent.futures and mpi4py.futures for parallel process execution, and nox and pytest for unit testing.

Funding

Cogent3 has received funding support from the Australian National University and an Essential Open Source Software for Science Grant from the Chan Zuckerberg Initiative.


c3dev's People

Contributors

fredjaya, gavinhuttley, katherinecaley, khiron, thomas-la, tla6677658, u6052029, wjjmjh


c3dev's Issues

Development docker-compose.yml and Dockerfile

Run in a local path containing a clone of your fork of the cogent3 repo

  • python version pinned
  • pip install the cogent3 requirements.txt
  • copy cogent3 clone
  • add .git pre-commit hooks
  • black
  • isort
  • prod node with a bash entry point
  • test node with a pytest entry point
  • debug node using ptvsd with a pytest entry point
  • ensure security keys allow the project to be checked in from the container

Direct new contributors to wiki page

The goal here is to make the wiki page the first, go-to resource for new contributors. This includes:

  • From the c3dev readme, move the conda/non-conda install instructions to an appropriate place in c3dev wiki
  • Updating the c3dev readme to explicitly direct contributors to start with the c3dev wiki
  • Clarifying that cogent3 should be installed first (for pytest to work); this is included in the readme, but not the wiki
  • Maybe update the cogent3 readme as well

add a command line script that identifies function usage cases

Original report by GavinH (Bitbucket: 557058:e40c23e1-e273-4527-a2f8-5de5876e870d, ).


We want a basic script that, given a path to a cogent3 python file, finds the functions defined in that file and then all places where each is used in <PyCogent3>/cogent3 and <PyCogent3>/tests.

There may already be a python package for this. If not, use click to create a simple command line interface. See cleanup.py and its corresponding hook in setup.py.
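A minimal sketch of such a script, using the standard-library `ast` module to collect definitions and references. It uses argparse instead of click only so that it runs without extra dependencies; all function names are hypothetical. Matching is by bare name, so it over-reports when unrelated objects share a name; a more precise tool would resolve imports.

```python
import argparse
import ast
from pathlib import Path
from typing import Optional


def defined_functions(source: str) -> set[str]:
    """Names of all functions and methods (including nested ones) in `source`."""
    return {
        node.name
        for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def find_usages(names: set[str], source: str) -> list[tuple[str, int]]:
    """(name, line) for each bare-name or attribute reference to one of `names`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and node.id in names:
            hits.append((node.id, node.lineno))
        elif isinstance(node, ast.Attribute) and node.attr in names:
            hits.append((node.attr, node.lineno))
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))  # ast.walk is not line-ordered


def main(argv: Optional[list[str]] = None) -> None:
    parser = argparse.ArgumentParser(
        description="Report where functions defined in TARGET are used."
    )
    parser.add_argument("target", type=Path, help="python file to take function names from")
    parser.add_argument("search_dirs", type=Path, nargs="+", help="directories to scan")
    args = parser.parse_args(argv)
    names = defined_functions(args.target.read_text())
    for directory in args.search_dirs:
        for path in sorted(directory.rglob("*.py")):
            for name, lineno in find_usages(names, path.read_text()):
                print(f"{path}:{lineno}: {name}")
```

`main` could then be registered as a console-script entry point in setup.py, in the same way as the cleanup.py hook.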

DEV: identify a tool for selectively building dependency graph

Use case here is we have some modules that need eventually to be removed as they duplicate functionality in other libraries on which we already depend. However, given the size of those modules it is rather daunting to approach the problem.

We need the ability to create a dependency graph that is conditioned on a specific module so we can understand where it is being used.

The specific use case is cogent3.maths.stats.special.
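A stdlib-only sketch of a reverse-dependency query conditioned on one module: parse each module's imports with `ast`, then collect every module that imports the target directly or through other scanned modules. It takes an in-memory mapping of module name to source text so it is easy to test; building that mapping from a checkout (walking the package and converting paths to dotted names) is omitted. Relative imports are ignored as a simplification, and the function names and example modules are hypothetical.

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Absolute module names imported by `source` (relative imports are skipped)."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            found.add(node.module)
            # `from pkg import name` may be importing a submodule, so record pkg.name too
            found.update(f"{node.module}.{alias.name}" for alias in node.names)
    return found


def _refers_to(module: str, target: str) -> bool:
    return module == target or module.startswith(target + ".")


def dependents_of(sources: dict[str, str], target: str) -> list[str]:
    """Modules in `sources` (name -> source text) that import `target`,
    directly or via other modules in `sources`."""
    imports = {name: imported_modules(text) for name, text in sources.items()}
    found: set[str] = set()
    frontier = [target]
    while frontier:
        current = frontier.pop()
        for name, modules in imports.items():
            if name not in found and any(_refers_to(m, current) for m in modules):
                found.add(name)
                frontier.append(name)  # modules importing this one depend on target too
    return sorted(found)


# Hypothetical modules for illustration only.
EXAMPLE = {
    "mod_a": "from cogent3.maths.stats.special import some_function\n",
    "mod_b": "from cogent3.maths.stats import special\n",
    "mod_c": "import numpy\n",
    "mod_d": "import cogent3.maths.stats.special as sp\n",
    "mod_e": "import mod_a\n",
}
```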

Investigating plugin architectures

Below I list some of @Nick-Foto's work exploring two different libraries:

I note that I have not come across any actual open source projects using stevedore, but I know of several using pluggy. I also know pluggy has a very significant community (pytest) behind it.

Stevedore

Blog post describing Stevedore

Pluggy

Conference talk on pluggy

The Datasette project uses it.

Provide how-to's for each type of issue?

Currently, there is a dedicated how-to for working on deprecating code.

There are also some "tackling issues" pointers that are specific to, e.g., adding a new feature.

I think the [How to contribute code] page should be reserved for high-level advice that is shared across all issue types.

implementing precommit hooks as a github action

We want to avoid code style issues when doing reviews of PRs.

We need black and isort linters to start with, set up as a GitHub Action that runs on each PR.

Ideally, these would be applied by developers before the push to their own GitHub fork of cogent3.

Question: If these are setup on cogent3#develop, do forks also run them?

Clarify dev environment and write docs

The proposed development environments include using:

Should both approaches be supported and documented, or just one? Maybe a question for @khiron?

Once confirmed, can proceed with writing dev install documentation (i.e. remove current conda documentation)

Clarify nox

To make it more accessible for new contributors, add some additional nox information when installing cogent3 for development.

Suggestions:

  • Add one sentence on the purpose of nox
  • One sentence on the suggested installation method (pip? apt?)
  • Update the code chunk to the following (up to this step in the wiki, you are still in repos/c3dev):
cd ~/repos/cogent3
nox -s test-3.8
  • Update the python versions in code chunks #35
  • Provide an example of a nox test passing

write script that reformats docstrings to numpy style

Original report by GavinH (Bitbucket: 557058:e40c23e1-e273-4527-a2f8-5de5876e870d, ).


BEFORE writing anything, check whether somebody has already written a docstring reformatter that can be used to solve this issue. (If so, make a comment below.)

Many PyCogent3 docstrings have the format

function/method summary

Arguments:
   - param: description

That format needs to be refactored to the

function/method summary

Parameters
----------
param
    description

style (line length <= 80 characters). (The complete numpy style reference is much more extensive than what we require, I expect.)

NOTE: This script will be a part of the pyco3dev repo NOT PyCogent3.

Make that script installable as a command line tool. As an example, look at pyco3dev/cleanup.py and the associated entry in setup.py (the cleanup insertion point).

The script should:

  • identify functions/methods with docstrings
  • determine whether those match the format I've provided above
  • parse that format to extract the key components (summary, parameter names and descriptions)
  • track public methods / functions (no leading _) that do NOT have a docstring
  • Make sure it works on a few files, but don't commit any changes to the repo.
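A possible starting point covering the steps above, assuming the legacy layout is exactly a summary, an `Arguments:` line, then `- name: description` items, and that this block ends the docstring (any later lines are treated as continuations of the last description). All function names are hypothetical. The sketch only returns text and never writes files, consistent with not committing changes.

```python
import ast
import inspect
import re
import textwrap

# A legacy parameter line: "- name: description"
_PARAM_LINE = re.compile(r"^\s*-\s*(?P<name>\w+)\s*:\s*(?P<desc>.*)$")


def is_legacy_format(docstring: str) -> bool:
    """True if the docstring has a line consisting solely of 'Arguments:'."""
    return any(line.strip() == "Arguments:" for line in docstring.splitlines())


def to_numpy_style(docstring: str, width: int = 80) -> str:
    """Rewrite a legacy 'Arguments:' docstring as a numpy 'Parameters' section."""
    if not is_legacy_format(docstring):
        return docstring  # not the legacy layout; leave untouched
    lines = inspect.cleandoc(docstring).splitlines()
    start = [line.strip() for line in lines].index("Arguments:")
    summary = "\n".join(lines[:start]).strip()
    params: list[list[str]] = []
    for line in lines[start + 1:]:
        match = _PARAM_LINE.match(line)
        if match:
            params.append([match["name"], match["desc"].strip()])
        elif line.strip() and params:
            params[-1][1] += " " + line.strip()  # continuation of previous description
    out = [summary, "", "Parameters", "----------"]
    for name, desc in params:
        out.append(name)
        out.extend(textwrap.wrap(desc, width=width, initial_indent="    ", subsequent_indent="    "))
    return "\n".join(out)


def public_without_docstring(source: str) -> list[str]:
    """Public (no leading underscore) functions/methods that lack a docstring."""
    return sorted(
        node.name
        for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and not node.name.startswith("_")
        and ast.get_docstring(node) is None
    )
```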

Port docker setup and docs from cogent3/iqtree2 into cogent3/cogent3

Port the Dockerfile and entrypoint.sh from cogent3/iqtree2 into a new docker folder in the cogent3/cogent3 repo

Port the README.md from cogent3/iqtree2 to cogent3/c3dev
update dependencies for cogent3

Create a README.md in the docker folder of cogent3/cogent3 to describe using a docker container to run cogent3 in linux as a user

nox testing requirements before PRs

For example - if developing and contributing changes to cogent3 with python=3.11, is a successful test with only nox -s test-3.11 sufficient?

Anything else to consider?

Checking git hooks

Pre-commit hooks
Currently, the pre-commit script prints a success message regardless of whether black or isort fails.

  • Add black and isort to c3dev_environment.yml
  • Add a catch for black or isort fails in the pre-commit sample

Related to #23
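One way to make the pre-commit sample report failure correctly is to propagate each tool's exit status instead of printing success unconditionally. The Python sketch below assumes black and isort are on PATH (`black --check` and `isort --check-only` exit non-zero when files would be changed) and treats a missing tool as a failure. `run_checks` and `CHECKS` are hypothetical names, not part of the existing sample.

```python
import subprocess
import sys
from typing import Sequence

# Check-only invocations: each exits non-zero if it would reformat files.
CHECKS = (
    ("black", "--check", "."),
    ("isort", "--check-only", "."),
)


def run_checks(commands: Sequence[Sequence[str]]) -> int:
    """Run every command and return 0 only if all of them succeed."""
    failed = []
    for command in commands:
        try:
            returncode = subprocess.run(list(command)).returncode
        except FileNotFoundError:
            returncode = 127  # tool not installed: count it as a failure
        if returncode != 0:
            failed.append(" ".join(command))
    if failed:
        print("pre-commit checks FAILED: " + "; ".join(failed), file=sys.stderr)
        return 1
    print("pre-commit checks passed")
    return 0
```

Saved as an executable `.git/hooks/pre-commit` with a python3 shebang, the script would end with `sys.exit(run_checks(CHECKS))`; git aborts the commit when the hook exits non-zero.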

Pre-push hooks
Currently, all test sessions are run (see: output) and pushing to origin fails.

  • Add instructions to the wiki on limiting the pre-push hook to a single nox session (e.g. nox -s test-3.10) instead of running all test sessions
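A pre-push hook can run only the nox session matching the interpreter in use. This sketch assumes, as the `nox -s test-3.10` command implies, that the noxfile defines a `test` session parametrised by Python version so that sessions are named `test-<major>.<minor>`; the function names are hypothetical.

```python
import subprocess
import sys


def current_nox_session(prefix: str = "test") -> str:
    """nox session name for the running interpreter, e.g. 'test-3.11'."""
    major, minor = sys.version_info[:2]
    return f"{prefix}-{major}.{minor}"


def pre_push_command() -> list[str]:
    """Run a single session instead of every session in the noxfile."""
    return ["nox", "-s", current_nox_session()]


def main() -> int:
    try:
        return subprocess.run(pre_push_command()).returncode
    except FileNotFoundError:
        print("nox is not installed; cannot run pre-push tests", file=sys.stderr)
        return 1
```

A `.git/hooks/pre-push` file would call `sys.exit(main())`; a non-zero exit stops the push.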

make small case for azure test failing for cogent3

the azure test is failing for the following cogent3 test, which indicates we might need to be able to specify a container with more than 1 process.

can you modify your demo scitrack azure pipeline case to add this one test and see if you can establish a config that allows this to pass?

   def test_create_processes(self):
        """Procressor pool should create multiple distingue processes"""
        index = [2, 3, 4, 5, 6, 7, 8, 9, 10]
>       result = parallel.map(get_process_value, index, max_workers=2, use_mpi=False)

test_util/test_parallel.py:39: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../.tox/py37/lib/python3.7/site-packages/cogent3/util/parallel.py:168: in map
    return list(imap(f, s, max_workers, use_mpi, if_serial, chunksize))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

f = <function get_process_value at 0x7efcc01a7320>, s = [2, 3, 4, 5, 6, 7, ...]
max_workers = 2, use_mpi = False, if_serial = 'raise', chunksize = None

    def imap(f, s, max_workers=None, use_mpi=False, if_serial="raise", chunksize=None):
        """
        Parameters
        ----------
        f : callable
            function that operates on values in s
        s : iterable
            series of inputs to f
        max_workers : int or None
            maximum number of workers. Defaults to 1-maximum available.
        use_mpi : bool
            use MPI for parallel execution
        if_serial : str
            action to take if conditions will result in serial execution. Valid
            values are 'raise', 'ignore', 'warn'. Defaults to 'raise'.
        chunksize : int or None
            Size of data chunks executed by worker processes. Defaults to None
            where stable chunksize is determined by set_default_chunksize()
    
        Returns
        -------
        imap is a generator yielding result of f(s[i]), map returns the result
        series
        """
    
        if_serial = if_serial.lower()
        assert if_serial in ("ignore", "raise", "warn"), f"invalid choice '{if_serial}'"
    
        # If max_workers is not defined, get number of all processes available
        # minus 1 to leave for master process
        if use_mpi:
            if not USING_MPI:
                raise RuntimeError("Cannot use MPI")
    
            err_msg = (
                "Execution in serial. For parallel MPI execution, use:\n"
                " $ mpirun -n 1 <executable script>"
            )
    
            if COMM.Get_attr(MPI.UNIVERSE_SIZE) == 1 and if_serial == "raise":
                raise RuntimeError(err_msg)
            elif COMM.Get_attr(MPI.UNIVERSE_SIZE) == 1 and if_serial == "warn":
                warnings.warn(UserWarning, msg=err_msg)
    
            if not max_workers:
                max_workers = COMM.Get_attr(MPI.UNIVERSE_SIZE) - 1
    
            if not chunksize:
                chunksize = set_default_chunksize(s, max_workers)
    
            with MPIfutures.MPIPoolExecutor(max_workers=max_workers) as executor:
                for result in executor.map(f, s, chunksize=chunksize):
                    yield result
        else:
            if not max_workers:
                max_workers = multiprocessing.cpu_count() - 1
>           assert max_workers < multiprocessing.cpu_count()
E           AssertionError
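The assertion fails whenever `max_workers` is not strictly below the CPU count, for example `max_workers=2` on a machine reporting 2 CPUs. A hypothetical helper that clamps the request with a warning, rather than asserting, is sketched below; it is not cogent3's actual fix. Separately, the quoted `warnings.warn(UserWarning, msg=err_msg)` does not match the standard signature `warnings.warn(message, category)` and would raise `TypeError` if that branch were reached; the sketch uses the standard form.

```python
import multiprocessing
import warnings
from typing import Optional


def resolve_max_workers(requested: Optional[int] = None, available: Optional[int] = None) -> int:
    """Return a usable worker count, reserving one CPU for the coordinating
    process where possible (always at least one worker)."""
    if available is None:
        available = multiprocessing.cpu_count()
    limit = max(1, available - 1)
    if requested is None or requested < 1:
        return limit
    if requested > limit:
        warnings.warn(
            f"max_workers={requested} exceeds the {limit} available worker(s); using {limit}",
            UserWarning,
        )
        return limit
    return requested
```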

Add wiki _sidebar

Will update the following structure over time here.

home
  conduct
  dev_install
    repos
    devtools
    test_install
  contributing_code
    issue_types
      type_x
      type_y
    dev_cycle
    testing
  writing_docs
  references
