The ampcombi from darcy220606

Currently, the bash-based interface of AMPcombi requires the --tooldict input to be a Python dictionary:

--tooldict '{"ampir":"ampir.tsv", "amplify":".tsv", "macrel":".prediction", "neubi":"neubi.fasta", "hmmer_hmmsearch":".txt"}'

This results in an inconsistent way of interfacing with the tool (you have to think in both bash and Python), and furthermore the nested quotes make it difficult to wrap the command in e.g. pipeline code.

I would recommend replacing this with an explicit flag per tool that defines its input file suffix, e.g.

--ampir-suffix 'ampir.tsv' --amplify-suffix '.tsv' --macrel-suffix 'prediction' --neubi-suffix 'neubi.fasta' --hmmsearch-suffix '.txt'

This will make the UX much smoother, lower the mental load when using the tool, and make it easier to integrate into pipelines.
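
A minimal sketch of how such per-tool flags could be declared with argparse. The flag names follow the proposal above; the default values and the helper names `build_parser` and `suffix_map` are assumptions for illustration, not AMPcombi's actual interface.

```python
import argparse

# Tool names taken from the --tooldict example; defaults are illustrative only.
DEFAULT_SUFFIXES = {
    "ampir": "ampir.tsv",
    "amplify": ".tsv",
    "macrel": ".prediction",
    "neubi": "neubi.fasta",
    "hmmsearch": ".txt",
}


def build_parser():
    """Create a parser with one --<tool>-suffix flag per supported tool."""
    parser = argparse.ArgumentParser(prog="ampcombi")
    for tool, suffix in DEFAULT_SUFFIXES.items():
        parser.add_argument(
            f"--{tool}-suffix",
            dest=f"{tool}_suffix",
            default=suffix,
            help=f"File suffix of {tool} result files (default: %(default)s)",
        )
    return parser


def suffix_map(args):
    """Rebuild the tool -> suffix dictionary that the code can use internally."""
    return {tool: getattr(args, f"{tool}_suffix") for tool in DEFAULT_SUFFIXES}


args = build_parser().parse_args(["--macrel-suffix", "prediction", "--amplify-suffix", ".tsv"])
tool_suffixes = suffix_map(args)
```

Internally the code can keep working with a dictionary, so only the command-line layer would need to change.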

Fix log file

The log file is not appended to but rewritten for every sample, which is a problem in funcscan. The patch should make AMPcombi look for a file with the same name in the output folder and append to it.
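
A small sketch of the append behaviour, assuming the log is a plain text file. The function name `write_log` and the file name `ampcombi.log` are hypothetical.

```python
import tempfile
from pathlib import Path


def write_log(outdir, message, log_name="ampcombi.log"):
    """Append a message to the log in outdir, creating the file if needed."""
    log_path = Path(outdir) / log_name
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" appends to an existing file and creates it otherwise,
    # so entries written for earlier samples are kept.
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(message.rstrip("\n") + "\n")
    return log_path


# Two samples writing to the same output folder, as in a funcscan run.
with tempfile.TemporaryDirectory() as tmp:
    write_log(tmp, "sample_1: done")
    log_file = write_log(tmp, "sample_2: done")
    log_lines = log_file.read_text(encoding="utf-8").splitlines()
```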

format output summary

The summary.tsv contains the row index numbers, which should not be written.
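
If the summary is written with pandas, the index column can be dropped at write time. A minimal sketch; the column names are made up for illustration.

```python
import io

import pandas as pd

summary = pd.DataFrame(
    {"contig_id": ["contig_1", "contig_2"], "prob_ampir": [0.91, 0.42]}
)

buffer = io.StringIO()
# index=False drops the row index column; sep="\t" writes tab-separated values.
summary.to_csv(buffer, sep="\t", index=False)
header = buffer.getvalue().splitlines()[0]
```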

Create an input argument for the fileending and for tools

Associate the file ending with each tool. This is currently hardcoded, and the user should be able to assign a file ending to each individual tool, e.g. in a dictionary.

Add pydamage / metawrap and MMseqs2 as optional flags

Add pydamage output as an optional flag
Add MAGs identification input as an optional flag (from metawrap output)
Add the MMseqs contig classification as an optional flag

Remove the alignment classification from the Rshiny table for hits with an e-value > 0.05 (only retain the hits)

Create script for alignment step

create a bash script that runs diamond

Merge reformatted tables

[ ] Check which tools were run, read in and merge accordingly

Add `AMPTransformer` to the summary table

https://github.com/Brendan-P-Moore/AMPTransformer

Publish in pypi and in conda

add argparse to define input commands

add input command options with argparse:
First command to implement:
--amp-results: path to folder which contains tools results outputs

Others (see issues):

specific input paths
output directory
probability cutoff

Add if statement for conda env for diamond

Add an if statement: if a conda environment named diamond cannot be found, create the diamond conda environment from the yaml file.
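
One way this check could look, sketched under the assumption that `conda` is on the PATH. The function names and the YAML file name are hypothetical; `ensure_env` is defined but not called here because it needs a working conda installation.

```python
import json
import subprocess
from pathlib import Path


def env_exists(env_paths, name):
    """Return True if any environment path ends in the given name."""
    return any(Path(p).name == name for p in env_paths)


def ensure_env(name="diamond", yaml_file="environment.yml"):
    """Create the environment from a YAML file if conda does not list it."""
    listing = subprocess.run(
        ["conda", "env", "list", "--json"],
        check=True, capture_output=True, text=True,
    )
    env_paths = json.loads(listing.stdout)["envs"]
    if not env_exists(env_paths, name):
        # Assumes the YAML file describes the environment for `name`.
        subprocess.run(
            ["conda", "env", "create", "--name", name, "--file", yaml_file],
            check=True,
        )


# The lookup itself can be checked without conda installed.
found = env_exists(["/opt/conda", "/opt/conda/envs/diamond"], "diamond")
```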

Include ensembleamppred in the script

Add the interactive AMPcombi summary Rshiny flexboard to AMPcombi

Create fasta from the merged table

For alignment: build new fasta from merged table contig_ids? (see code in https://github.com/louperelo/longmetarg/blob/main/bin/read_analysis.py ) -> fasta with sequences of AMP hits as input to Diamond or MMseq

Idea: convert the input contig faa to a table, join the sequences to the merged table, and then use the AMP contig IDs and sequences to create a fasta.
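
A rough sketch of that idea using only the standard library. The function names are hypothetical, and it assumes the contig id is the first word of each FASTA header.

```python
def read_faa(text):
    """Parse FASTA text into {sequence_id: sequence}."""
    sequences = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            # The id is the first word of the header; skip empty headers.
            current = fields[0] if fields else None
            if current is not None:
                sequences[current] = ""
        elif current is not None:
            sequences[current] += line
    return sequences


def hits_to_fasta(contig_ids, sequences):
    """Build FASTA text for the AMP hit ids that have a sequence."""
    records = [f">{cid}\n{sequences[cid]}" for cid in contig_ids if cid in sequences]
    return "\n".join(records) + ("\n" if records else "")


faa = ">contig_1 # 1 # 90\nMKKL\nLAGV\n>contig_2 # 5 # 60\nMSTA\n"
amp_fasta = hits_to_fasta(["contig_2", "contig_9"], read_faa(faa))
```

Ids from the merged table that have no sequence in the faa (here `contig_9`) are silently skipped; whether that should be an error instead is a design choice.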

Update readme with new changes

For release 0.1.9

Write the check function that prints the tools present in the dir

Add gbk extractor script

This issue requests the addition of a gbk extractor, which constructs gbk files based on the contig id. This helps the user downstream to obtain only the gbk files with the contig of interest for easy manipulation and exploration - requested by Rosa Herbst

Create tool options: --main directory option --amp-results

Tool input:
[ ] Give a path to the folder which contains the files via --amp-results (like funcscan; put in the README how the folder should be structured)
[ ] The input folder should be structured like funcscan: run/amp/amptool/Sample1/output.tsv -> subdirectory names will be the sample prefixes, which we can use to create the output directories; these should be /ampcombi/sample1/table.tsv
[ ] OR work with sample sheets?
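
The folder layout described in the list above (tool directory, then sample directory, then result file) could be walked roughly like this. The function name `collect_results` and the file names are made up for the example.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def collect_results(amp_results):
    """Map sample name -> {tool name: [result files]} from <tool>/<sample>/<file>."""
    samples = defaultdict(dict)
    for tool_dir in sorted(Path(amp_results).iterdir()):
        if not tool_dir.is_dir():
            continue
        for sample_dir in sorted(tool_dir.iterdir()):
            if sample_dir.is_dir():
                files = sorted(p for p in sample_dir.iterdir() if p.is_file())
                samples[sample_dir.name][tool_dir.name] = files
    return dict(samples)


# Build a throwaway folder tree to show the expected structure.
with tempfile.TemporaryDirectory() as tmp:
    for tool, sample, name in [
        ("ampir", "sample_1", "sample_1.ampir.tsv"),
        ("macrel", "sample_1", "sample_1.prediction"),
        ("ampir", "sample_2", "sample_2.ampir.tsv"),
    ]:
        path = Path(tmp) / tool / sample / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    found = collect_results(tmp)
    layout = {sample: sorted(tools) for sample, tools in found.items()}
```

The sample names recovered this way can then be reused as output subdirectory names.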

Create flag for probability threshold

Filtering is not done in funcscan but in AMPcombi!
Put a probability-threshold on tools (except hmmer, which would be evalue) BEFORE merging AFTER formatting

Add lengths cut-off 0 - 100 aa

adapt README to hmmer input and output

hmmer hmmsearch has different output options.
At the moment AMPcombi is designed to parse the default output (no parameter set)
Also it is set to parse only one hmm-file per sample, corresponding to one hmmsearch model.

Include neubi results in the script

Note: the neubi results are in a fasta file, so they should first be converted to a table and then reformatted.

Note to consider : Append and concat

FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.

Should we change the .append to pd.concat instead?
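
For reference, `DataFrame.append` was removed in pandas 2.0, so the switch is needed sooner or later. A minimal before/after sketch with made-up column names:

```python
import pandas as pd

ampir = pd.DataFrame({"contig_id": ["contig_1"], "prob_amp": [0.91]})
macrel = pd.DataFrame({"contig_id": ["contig_2"], "prob_amp": [0.67]})

# Deprecated form: merged = ampir.append(macrel, ignore_index=True)
# Replacement that gives the same rows and a fresh 0..n-1 index:
merged = pd.concat([ampir, macrel], ignore_index=True)
```

When appending inside a loop, collecting the frames in a list and calling `pd.concat` once at the end avoids copying the growing table on every iteration.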

Output directory

Name output files according to samples (use sample names from directory), use --outdir to name the output folder.

Create subdirectory with sample names, and each subdir should have one resulting table.

Add the summary from AMPGram

Add the summary tables from new tool : AMPGram

Write the Readme

Add the different thresholds to AMPcombi summary

Add the different thresholds to AMPcombi summary as requested by Rosa Herbst:

(2) e-value cut-off <0.05 : don't remove entries, just remove the alignment hits (classification) : DONE
(4) stop codon : retain sequences that have stop codons in 3' and 5' ends (50 aa before and after) : DONE
(5) tertiary structure : (psipred) alpha or beta chain : DONE
(6) physical property : basic/acidic (PepFUN) : DONE
(7) ABC transporter should be before AMP (10 CDSs before and after the hit) : DONE

Stop Rscript from running after every sample summary

Change it so that the function 'html_generator()' is controlled by a separate parameter that is activated independently of the complete summary, as it will otherwise create a problem in funcscan.

Remove contigs assigned as non-amps

Eliminate the non-AMPs if a tool's settings (e.g. AMPLIFY) output them as well. If more than one tool was run, the condition would be that the contig was a non-AMP in all of them.

Create an input check

[ ] Check if the given path and files exist and print error message if folders/files do not exist or are not found
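
A possible shape for this check, collecting all missing paths before failing so the user sees everything at once. The function name `check_inputs` and the message wording are hypothetical.

```python
import tempfile
from pathlib import Path


def check_inputs(paths):
    """Raise FileNotFoundError listing every path that does not exist."""
    missing = [str(p) for p in paths if not Path(p).exists()]
    if missing:
        raise FileNotFoundError("Input not found: " + ", ".join(missing))


with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp) / "sample_1.ampir.tsv"
    present.touch()
    absent = Path(tmp) / "sample_1.prediction"
    check_inputs([present])  # passes silently
    try:
        check_inputs([present, absent])
        error_message = ""
    except FileNotFoundError as err:
        error_message = str(err)
```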

Amp database

Add an 'if' statement to look for the same date.fasta file/directory, if there is one don't download it again. This is for funcscan so that not every time a sample is run the file is downloaded again
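
A sketch of the check, assuming the database file is named after the download date. The date format, the function name `get_database`, and the stand-in downloader are assumptions for illustration.

```python
import tempfile
from datetime import date
from pathlib import Path


def get_database(db_dir, download, today=None):
    """Return today's database FASTA, calling download(path) only if it is missing."""
    today = today or date.today()
    fasta = Path(db_dir) / f"{today:%Y_%m_%d}.fasta"
    if not fasta.exists():
        fasta.parent.mkdir(parents=True, exist_ok=True)
        download(fasta)
    return fasta


calls = []


def fake_download(path):
    """Stand-in for the real download; writes a placeholder record."""
    calls.append(path)
    path.write_text(">placeholder\nMKK\n", encoding="utf-8")


# Two samples processed on the same day trigger a single download.
with tempfile.TemporaryDirectory() as tmp:
    first = get_database(tmp, fake_download, today=date(2023, 1, 15))
    second = get_database(tmp, fake_download, today=date(2023, 1, 15))
    db_name = first.name
```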

Sort the merged df by probability

Sort it by the probability that is found in all tools

Merge the diamond alignment to the merge_df

Print the logo in the beginning of the main function

remove head outdir or make it optional

Remove the head output directory <- This was chosen!
OR
Make it optional
If only one sample is processed, a head directory would not be necessary and the results can be written into a directory with the sample name.
If outdir is chosen, subdirectories with sample names are created inside it
Make sure it is created in the working directory

Download database file in a result directory

Retain the downloaded DRAMP db file in the results output

Create tool options: --toolname-outfile

[ ] We give the user the possibility to indicate specific file paths, like: --ampir-outfile, --amplify-outfile etc. (would be in case of few samples - AMPcombi has to be run manually for each sample)
[ ] How to handle different samples? --amp-results points to output of 1 sample? Output is one table per sample in one output directory

Add the DATABASE function in a way that it doesn't repeat with the sample_list

Assign 0 to empty amp probability

Which marker should be used to identify that a contig was not attributed to AMPs by one of the tools? Python gives NaN at merge; add zero instead.
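
A minimal pandas sketch of the behaviour in question. The column names, and the convention that probability columns start with `prob_`, are assumptions.

```python
import pandas as pd

ampir = pd.DataFrame({"contig_id": ["contig_1", "contig_2"], "prob_ampir": [0.91, 0.42]})
macrel = pd.DataFrame({"contig_id": ["contig_2", "contig_3"], "prob_macrel": [0.67, 0.88]})

# An outer merge keeps every contig; tools that did not report it get NaN.
merged = ampir.merge(macrel, on="contig_id", how="outer")

# Replace NaN with 0 in the probability columns only.
prob_cols = [c for c in merged.columns if c.startswith("prob_")]
merged[prob_cols] = merged[prob_cols].fillna(0)
```

Restricting `fillna` to the probability columns leaves genuinely missing values in other columns untouched.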

Add `DRAMP` db alternative

Sometimes the DRAMP database is down, so perhaps add an option to use an alternative AMP database, e.g. CAMP/APD/dbAMP.

Change output to TSV instead of CSV

It would be nice to have the option to change the output format. nf-core/funcscan would be happier with TSV output, not CSV.

consider concatenating all output summaries to one

Put an extra column with sample_name first.
Then, put an extra flag to concat all output.summaries to one summary (sample name needs to be in an extra column, see issue #36 )
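
A sketch of the concatenation using the standard `csv` module, assuming every per-sample summary is a TSV with the same header. The function name and the column names are hypothetical.

```python
import csv
import io


def concat_summaries(summaries):
    """Join per-sample TSV summaries into one TSV with sample_name as first column.

    summaries maps sample name -> TSV text; all tables must share one header.
    """
    out = io.StringIO()
    writer = None
    for sample, text in summaries.items():
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        for row in reader:
            if writer is None:
                # Write the header once, taken from the first non-empty table.
                fields = ["sample_name"] + reader.fieldnames
                writer = csv.DictWriter(
                    out, fieldnames=fields, delimiter="\t", lineterminator="\n"
                )
                writer.writeheader()
            writer.writerow({"sample_name": sample, **row})
    return out.getvalue()


combined = concat_summaries({
    "sample_1": "contig_id\tprob_ampir\ncontig_1\t0.91\n",
    "sample_2": "contig_id\tprob_ampir\ncontig_7\t0.55\n",
})
```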

how to treat hmmer output from several models?

hmmsearch has a special way of output, with large header and footer part. The output refers to one run with one hmm-model.
The model name is in the `Query:` line.
How to treat results from several runs with different models on the same samples?

Concat all output files for one sample and extract Query name (model), evalue, contig_id? (prior concatenation)
Allow several hmm.output files per sample, with model name in the filename and parse these? (no concatenation)
-> should this happen in an extra module?
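
For the second option (several output files per sample), the model name could be read from each file's `Query:` line rather than from the file name. A rough sketch; the function names are hypothetical and the input texts are abbreviated stand-ins, not complete hmmsearch output.

```python
import re

# Matches lines such as "Query:       modelA  [M=120]".
QUERY_LINE = re.compile(r"^Query:\s+(\S+)")


def model_name(hmmsearch_text):
    """Return the model name from the first 'Query:' line, or None if absent."""
    for line in hmmsearch_text.splitlines():
        match = QUERY_LINE.match(line)
        if match:
            return match.group(1)
    return None


def group_by_model(outputs):
    """Map model name -> list of file names, given {file name: file text}."""
    grouped = {}
    for name, text in outputs.items():
        grouped.setdefault(model_name(text), []).append(name)
    return grouped


outputs = {
    "sample_1.modelA.txt": "# header lines omitted\nQuery:       modelA  [M=120]\n",
    "sample_1.modelB.txt": "Query:       modelB  [M=85]\n",
}
by_model = group_by_model(outputs)
```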

include --version output

We need to be able to get the version number!
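
With argparse this can be done with the built-in `version` action. A minimal sketch; the version string is a placeholder and the program name is assumed.

```python
import argparse
import contextlib
import io

__version__ = "0.1.9"  # placeholder; the real value would come from the package


def build_parser():
    """Create a parser that supports --version."""
    parser = argparse.ArgumentParser(prog="ampcombi")
    # action="version" prints the version string and exits with status 0.
    parser.add_argument(
        "--version", action="version", version=f"%(prog)s {__version__}"
    )
    return parser


# Capture what `ampcombi --version` would print.
captured = io.StringIO()
exit_code = None
try:
    with contextlib.redirect_stdout(captured):
        build_parser().parse_args(["--version"])
except SystemExit as exit_info:
    exit_code = exit_info.code
version_line = captured.getvalue().strip()
```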

Test behaviour if only one tool input is given

Jasmin observed that AMPcombi fails in the funcscan pipeline if only AMPIR is used as input. Macrel alone works, as does Ampir+Macrel. Test the other input tools.