darcy220606 / ampcombi

AMPcombi parses and filters results from AMP prediction tools

License: MIT License

Python 99.08% Shell 0.92%
amp antimicrobial-genes-annotation antimicrobial-peptides genes

ampcombi's People

Contributors

darcy220606, louperelo

Forkers

maxibor

ampcombi's Issues

Replace `--tooldict` python dictionary structure with standard cli flags

Currently, AMPcombi's command-line interface requires the --tooldict input to be a Python dictionary:

--tooldict '{"ampir":"ampir.tsv", "amplify":".tsv", "macrel":".prediction", "neubi":"neubi.fasta", "hmmer_hmmsearch":".txt"}'

This results in an inconsistent interface (the user has to think in both bash and Python), and furthermore the nested quotes make it difficult to wrap the command in e.g. pipeline code.

I would recommend replacing this with an explicit flag per tool that defines the tool's input file suffix, e.g.

--ampir-suffix 'ampir.tsv' --amplify-suffix '.tsv' --macrel-suffix '.prediction' --neubi-suffix 'neubi.fasta' --hmmsearch-suffix '.txt'

This will make the UX much smoother, lower the mental load when using the tool, and make it easier to integrate into pipelines.
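A minimal sketch of how the proposed per-tool flags could be declared with argparse and turned back into the mapping that --tooldict carries today. The flag names follow the proposal above; the helper names (`build_suffix_parser`, `suffixes_from_args`) and the use of the example suffixes as defaults are assumptions made for illustration, not AMPcombi's actual code.

```python
import argparse

# Suffixes copied from the --tooldict example above; they are illustrative
# defaults for this sketch, not AMPcombi's documented defaults.
DEFAULT_SUFFIXES = {
    "ampir": "ampir.tsv",
    "amplify": ".tsv",
    "macrel": ".prediction",
    "neubi": "neubi.fasta",
    "hmmsearch": ".txt",
}


def build_suffix_parser():
    """Create one --<tool>-suffix flag per tool instead of a single --tooldict."""
    parser = argparse.ArgumentParser(prog="ampcombi")
    for tool, suffix in DEFAULT_SUFFIXES.items():
        parser.add_argument(
            f"--{tool}-suffix",
            default=suffix,
            help=f"file suffix of the {tool} output (default: %(default)s)",
        )
    return parser


def suffixes_from_args(args):
    """Rebuild the tool-to-suffix mapping that --tooldict used to carry."""
    return {tool: getattr(args, f"{tool}_suffix") for tool in DEFAULT_SUFFIXES}


args = build_suffix_parser().parse_args(["--macrel-suffix", "macrel.prediction"])
tooldict = suffixes_from_args(args)
```

Only the flag that differs from its default has to be passed, and no nested quoting is needed when the call is embedded in pipeline code.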

Fix log file

In funcscan, the log file is overwritten for every sample instead of being appended to. The patch should make AMPcombi look for a file with the same name in the output folder and, if one exists, append to it.
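One possible shape for the fix, sketched with the standard library only. The function name `append_log` and the log file name `ampcombi.log` are hypothetical; the point is that opening the file in append mode covers both the "file already exists" and the "first sample" case.

```python
import tempfile
from pathlib import Path


def append_log(outdir, sample, message, log_name="ampcombi.log"):
    """Append one line to the shared log, creating the file on first use."""
    log_path = Path(outdir) / log_name
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" appends when the file exists and creates it otherwise,
    # so no explicit existence check is needed.
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(f"[{sample}] {message}\n")
    return log_path


# Two samples writing to the same output folder, as in a funcscan run.
with tempfile.TemporaryDirectory() as outdir:
    append_log(outdir, "sample1", "finished")
    log_path = append_log(outdir, "sample2", "finished")
    log_lines = log_path.read_text(encoding="utf-8").splitlines()
```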

add argparse to define input commands

Add input command options with argparse.
First command to implement:
--amp-results: path to the folder that contains the tools' result outputs

Others (see issues):

  • specific input paths
  • output directory
  • probability cutoff
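The options listed above could be sketched as follows. `--amp-results` and `--outdir` are named in the issues; the flag name `--cutoff`, its default of 0.5 and the default output folder name are assumptions chosen for illustration.

```python
import argparse


def build_parser():
    """Declare the first set of AMPcombi input options."""
    parser = argparse.ArgumentParser(
        prog="ampcombi",
        description="Parse and filter results from AMP prediction tools.",
    )
    parser.add_argument(
        "--amp-results",
        dest="amp_results",
        required=True,
        help="path to the folder that contains the tools' result outputs",
    )
    parser.add_argument(
        "--outdir",
        default="ampcombi",  # assumed default, not confirmed by the issues
        help="name of the output directory",
    )
    parser.add_argument(
        "--cutoff",  # hypothetical flag name for the probability cutoff
        type=float,
        default=0.5,
        help="probability cutoff applied to the tools' predictions",
    )
    return parser


args = build_parser().parse_args(["--amp-results", "results/amp", "--cutoff", "0.8"])
```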

Add gbk extractor script

This issue requests the addition of a gbk extractor, which constructs gbk files based on the contig id. This lets the user obtain only the gbk files that contain the contig of interest, for easy downstream manipulation and exploration (requested by Rosa Herbst).

Create tool options: --main directory option --amp-results

Tool input:
[ ] Give a path to the folder which contains the files via --amp-results (like funcscan); put in the README how the folder should be structured
[ ] The input folder should be structured like funcscan run/amp/amptool/Sample1/output.tsv -> subdirectory names will be the prefix for samples, which we can use to create the output directories, which should be /ampcombi/sample1/table.tsv
[ ] OR work with sample sheets?
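A sketch of how sample names could be derived from the folder layout described above, assuming the folder passed to --amp-results contains one subdirectory per tool and, inside it, one subdirectory per sample. The helper names are hypothetical, and the sketch assumes a single result file per tool and sample (a second file would overwrite the first in the mapping).

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def collect_sample_files(amp_results):
    """Map sample name -> {tool name: result file} for <tool>/<sample>/<file>."""
    samples = defaultdict(dict)
    for path in sorted(Path(amp_results).glob("*/*/*")):
        if path.is_file():
            tool, sample = path.parts[-3], path.parts[-2]
            samples[sample][tool] = path
    return dict(samples)


def output_table_path(outdir, sample):
    """Place each sample's table in its own subdirectory of the output folder."""
    return Path(outdir) / sample / "table.tsv"


# Build a throwaway folder tree to show what the function returns.
with tempfile.TemporaryDirectory() as root:
    for tool, sample in [("ampir", "sample1"), ("macrel", "sample1"), ("ampir", "sample2")]:
        result = Path(root) / tool / sample / "output.tsv"
        result.parent.mkdir(parents=True)
        result.write_text("")
    found = collect_sample_files(root)
    tools_per_sample = {sample: sorted(tools) for sample, tools in found.items()}
```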

Create flag for probability threshold

Filtering is not done in funcscan but in AMPcombi!
Apply a probability threshold to each tool's results (except hmmer, which would use an e-value threshold) after formatting and before merging.
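A minimal pandas sketch of that filtering step, applied per tool on the already formatted table. The column names `prob` and `evalue`, the function name and the default cutoffs are assumptions for illustration.

```python
import pandas as pd


def apply_cutoff(table, tool, prob_cutoff=0.5, evalue_cutoff=0.05):
    """Filter one tool's formatted table before it is merged with the others."""
    if tool == "hmmer_hmmsearch":
        # hmmsearch reports e-values, where smaller means more significant.
        return table[table["evalue"] <= evalue_cutoff].reset_index(drop=True)
    return table[table["prob"] >= prob_cutoff].reset_index(drop=True)


# Made-up example tables.
ampir = pd.DataFrame({"contig_id": ["c1", "c2"], "prob": [0.92, 0.31]})
hmmer = pd.DataFrame({"contig_id": ["c1", "c3"], "evalue": [1e-8, 0.4]})

ampir_kept = apply_cutoff(ampir, "ampir")
hmmer_kept = apply_cutoff(hmmer, "hmmer_hmmsearch")
```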

adapt README to hmmer input and output

hmmer hmmsearch has different output options.
At the moment, AMPcombi is designed to parse the default output (no parameter set).
It is also set to parse only one hmm-file per sample, corresponding to one hmmsearch model.

Note to consider : Append and concat

FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.

Should we change the .append to pd.concat instead?
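For reference, `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so switching is needed for the code to keep working on current pandas. A small sketch of the replacement pattern, with made-up tables and column names:

```python
import pandas as pd

# Made-up per-tool tables standing in for the parsed results.
tables = [
    pd.DataFrame({"contig_id": ["c1"], "prob": [0.9]}),
    pd.DataFrame({"contig_id": ["c2"], "prob": [0.4]}),
]

# Old pattern (warns on pandas 1.4+, fails on pandas 2.0+):
#     summary = pd.DataFrame()
#     for table in tables:
#         summary = summary.append(table)

# Replacement: collect the frames in a list and concatenate once.
summary = pd.concat(tables, ignore_index=True)
```

Concatenating once at the end also avoids copying the growing table on every loop iteration.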

Output directory

Name output files according to samples (use sample names from directory), use --outdir to name the output folder.

Create subdirectory with sample names, and each subdir should have one resulting table.

Add the different thresholds to AMPcombi summary

Add the different thresholds to AMPcombi summary as requested by Rosa Herbst:

  • (2) e-value cut-off <0.05 : don't remove entries, just remove the alignment hits (classification) : DONE
  • (4) stop codon : retain sequences that have stop codons at the 3' and 5' ends (50 aa before and after) : DONE
  • (5) tertiary structure : (psipred) alpha or beta chain : DONE
  • (6) physical property : basic/acidic (PepFUN) : DONE
  • (7) ABC transporter should be before AMP (10 CDSs before and after the hit) : DONE

Remove contigs assigned as non-amps

Eliminate the non-AMPs if a tool's settings (e.g. AMPLIFY) output them as well. If more than one tool was run, the condition would be that the contig was classified as non-AMP by all of them.
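One way to express the "non-AMP in all tools" condition on a merged table, sketched with pandas. The column names, the function name and the cutoff of 0.5 are assumptions; a missing probability is treated as 0 here.

```python
import pandas as pd


def drop_non_amps(merged, prob_cols, cutoff=0.5):
    """Keep a contig if at least one tool scores it at or above the cutoff."""
    is_amp_somewhere = (merged[prob_cols].fillna(0) >= cutoff).any(axis=1)
    return merged[is_amp_somewhere].reset_index(drop=True)


# Made-up merged table: c2 is below the cutoff in every tool.
merged = pd.DataFrame(
    {
        "contig_id": ["c1", "c2", "c3"],
        "prob_ampir": [0.9, 0.2, None],
        "prob_amplify": [0.1, 0.3, 0.7],
    }
)
kept = drop_non_amps(merged, ["prob_ampir", "prob_amplify"])
```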

Create an input check

[ ] Check if the given path and files exist and print error message if folders/files do not exist or are not found
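A short sketch of such a check using pathlib. The function names and the wording of the error message are hypothetical.

```python
import sys
import tempfile
from pathlib import Path


def missing_paths(paths):
    """Return the given files or folders that do not exist."""
    return [str(path) for path in paths if not Path(path).exists()]


def require_paths(paths):
    """Stop with an error message if any input is missing."""
    missing = missing_paths(paths)
    if missing:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit("ERROR: input not found: " + ", ".join(missing))


# One existing file, one missing file, inside an existing folder.
with tempfile.TemporaryDirectory() as folder:
    present = Path(folder) / "sample1.tsv"
    present.write_text("")
    absent = Path(folder) / "sample2.tsv"
    not_found = missing_paths([folder, present, absent])
    absent_name = Path(not_found[0]).name
```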

Amp database

Add an 'if' statement to look for the same date.fasta file/directory; if there is one, don't download it again. This is for funcscan, so that the file is not downloaded again every time a sample is run.
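A sketch of the check, under the assumption that the database file is named after the download date (for example `2023-01-31.fasta`); the actual naming scheme is not stated in the issue. The download itself is passed in as a function, and `fake_download` is only a stand-in so that the sketch runs without network access.

```python
import tempfile
from datetime import date
from pathlib import Path


def ensure_database(db_dir, download, today=None):
    """Return (path, downloaded); call download(path) only if the file is missing."""
    today = today or date.today()
    fasta = Path(db_dir) / f"{today.isoformat()}.fasta"  # assumed naming scheme
    if fasta.exists():
        return fasta, False
    fasta.parent.mkdir(parents=True, exist_ok=True)
    download(fasta)
    return fasta, True


def fake_download(target):
    # Stand-in for the real download so that the sketch runs offline.
    target.write_text(">placeholder\nAAAA\n")


stamp = date(2023, 1, 31)  # fixed date to keep the example deterministic
with tempfile.TemporaryDirectory() as db_dir:
    _, first_run = ensure_database(db_dir, fake_download, today=stamp)
    _, second_run = ensure_database(db_dir, fake_download, today=stamp)
```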

remove head outdir or make it optional

Remove the head output directory <- This was chosen!
OR
Make it optional
If only one sample is processed, a head directory would not be necessary and the results can be written into a directory with the sample name.
If outdir is chosen, subdirectories with sample names are created inside it
Make sure it is created in the working directory
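The "make it optional" variant could look roughly like this. The function name and its parameters are hypothetical: without an outdir the sample directory is created directly in the working directory, otherwise it is nested inside the head directory.

```python
import tempfile
from pathlib import Path


def sample_outdir(sample, outdir=None, workdir="."):
    """Create and return the directory that receives one sample's results."""
    base = Path(workdir)
    target = base / outdir / sample if outdir else base / sample
    target.mkdir(parents=True, exist_ok=True)
    return target


with tempfile.TemporaryDirectory() as workdir:
    flat = sample_outdir("sample1", workdir=workdir)
    nested = sample_outdir("sample1", outdir="ampcombi", workdir=workdir)
    flat_rel = flat.relative_to(workdir).as_posix()
    nested_rel = nested.relative_to(workdir).as_posix()
    both_exist = flat.is_dir() and nested.is_dir()
```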

Create tool options: --toolname-outfile

[ ] We give the user the possibility to indicate specific file paths, like: --ampir-outfile, --amplify-outfile etc. (would be in case of few samples - AMPcombi has to be run manually for each sample)
[ ] How to handle different samples? --amp-results points to output of 1 sample? Output is one table per sample in one output directory

Assign 0 to empty amp probability

Which marker should be used to flag a contig that was not identified as an AMP by one of the tools? (Python gives NaN at merge; add zero instead.)
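A small pandas illustration of the behaviour and of the proposed fix: an outer merge leaves NaN where a tool has no entry for a contig, and `fillna(0)` on the probability columns replaces it with zero. Table contents and column names are made up.

```python
import pandas as pd

ampir = pd.DataFrame({"contig_id": ["c1", "c2"], "prob_ampir": [0.9, 0.7]})
macrel = pd.DataFrame({"contig_id": ["c2", "c3"], "prob_macrel": [0.8, 0.6]})

# The outer merge keeps contigs seen by either tool; missing values become NaN.
merged = ampir.merge(macrel, on="contig_id", how="outer")

# Replace NaN with 0 in the probability columns only.
prob_cols = [col for col in merged.columns if col.startswith("prob_")]
merged[prob_cols] = merged[prob_cols].fillna(0)

by_contig = merged.set_index("contig_id")
```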

Add `DRAMP` db alternative

Sometimes the DRAMP database is down, so perhaps add an option to use an alternative AMP database, e.g. CAMP, APD or dbAMP.

how to treat hmmer output from several models?

hmmsearch has a special output format, with large header and footer sections. Each output file refers to one run with one hmm-model.
The model name is in the line Query: .
How should results from several runs with different models on the same samples be treated?

  1. Concat all output files for one sample and extract Query name (model), evalue, contig_id? (prior concatenation)
  2. Allow several hmm.output files per sample, with model name in the filename and parse these? (no concatenation)
    -> should this happen in an extra module?
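For option 1, the model names could be pulled from the `Query:` lines of the concatenated output along these lines. The function name is hypothetical and the sample text is a heavily trimmed, made-up stand-in for real hmmsearch output; extracting e-values and contig ids would need additional parsing that is not shown.

```python
import re

# Matches the "Query:" line of hmmsearch's default text output and captures
# the first token after it, which is the model name.
QUERY_LINE = re.compile(r"^Query:\s+(\S+)")


def model_names(hmmsearch_text):
    """Return the model names in the order in which they appear."""
    names = []
    for line in hmmsearch_text.splitlines():
        match = QUERY_LINE.match(line)
        if match:
            names.append(match.group(1))
    return names


# Made-up stand-in for two concatenated hmmsearch outputs of one sample.
concatenated = """\
# hmmsearch :: search profile(s) against a sequence database
Query:       modelA  [M=86]
[ok]
# hmmsearch :: search profile(s) against a sequence database
Query:       modelB  [M=120]
[ok]
"""
found_models = model_names(concatenated)
```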
