russel88 / crisprcastyper

CCTyper: Automatic detection and subtyping of CRISPR-Cas operons

Home Page: https://typer.crispr.dk

License: MIT License

Python 99.37% Shell 0.63%

crispr-analysis cas crispr crispr-cas crispr-cas9 bioinformatics

crisprcastyper's Issues

New V-A variants

Include HMMs for new V-A variants described here: https://www.liebertpub.com/doi/10.1089/crispr.2020.0043

Refseq/Genbank accession as input

Would be useful if one could use the accession number as input, and CCtyper would automatically download fasta.
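Until such an option exists, the download step can be done separately. A minimal sketch, assuming the accession is a nuccore record and using the public NCBI E-utilities `efetch` endpoint; the function names are hypothetical and not part of CCTyper:

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def efetch_fasta_url(accession: str) -> str:
    """Build an E-utilities URL that returns a nuccore record as FASTA."""
    query = urlencode(
        {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"}
    )
    return f"{EFETCH}?{query}"

def download_fasta(accession: str, path: str) -> None:
    """Download the record and write it to `path` (needs network access)."""
    with urlopen(efetch_fasta_url(accession)) as response, open(path, "wb") as out:
        out.write(response.read())
```

The resulting file can then be passed to `cctyper` as usual.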

Optimize plot expansions

Would be useful if arrays were included when calculating which genes to add for plot expansions of CRISPR-Cas loci

Hi! Great program, has been helping me a lot! I've been running cctyper for metagenomes and it has been working for the most part. For one of my FASTA files it has been erroring with XGBoost model incompatible. I am using cctyper v 1.8.0 via mamba on a fresh environment, created as instructed on the README

Thanks for any help in this situation!

cctyper part_001.fasta part_001_results --prodigal meta
/clusterfs/jgi/groups/science/homes/mbfiamenghi/.micromamba/envs/cctyper/bin/cctyper:7: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  import pkg_resources
/clusterfs/jgi/groups/science/homes/mbfiamenghi/.micromamba/envs/cctyper/lib/python3.8/site-packages/Bio/pairwise2.py:278: BiopythonDeprecationWarning: Bio.pairwise2 has been deprecated, and we intend to remove it in a future release of Biopython. As an alternative, please consider using Bio.Align.PairwiseAligner as a replacement, and contact the Biopython developers if you still need the Bio.pairwise2 module.
  warnings.warn(
[2024-02-14 06:27:58] INFO: Running CRISPRCasTyper version 1.8.0
[2024-02-14 06:28:01] INFO: Predicting ORFs with prodigal
[2024-02-14 07:20:09] INFO: Running HMMER against Cas profiles
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 705/705 [54:41<00:00,  4.65s/it]
[2024-02-14 08:17:52] INFO: Subtyping putative operons
[2024-02-14 08:18:08] INFO: Predicting CRISPR arrays with minced
/clusterfs/jgi/groups/science/homes/mbfiamenghi/.micromamba/envs/cctyper/lib/python3.8/site-packages/Bio/pairwise2.py:278: BiopythonDeprecationWarning: Bio.pairwise2 has been deprecated, and we intend to remove it in a future release of Biopython. As an alternative, please consider using Bio.Align.PairwiseAligner as a replacement, and contact the Biopython developers if you still need the Bio.pairwise2 module.
  warnings.warn(
/clusterfs/jgi/groups/science/homes/mbfiamenghi/.micromamba/envs/cctyper/lib/python3.8/site-packages/Bio/pairwise2.py:278: BiopythonDeprecationWarning: Bio.pairwise2 has been deprecated, and we intend to remove it in a future release of Biopython. As an alternative, please consider using Bio.Align.PairwiseAligner as a replacement, and contact the Biopython developers if you still need the Bio.pairwise2 module.
  warnings.warn(
/clusterfs/jgi/groups/science/homes/mbfiamenghi/.micromamba/envs/cctyper/lib/python3.8/site-packages/Bio/pairwise2.py:278: BiopythonDeprecationWarning: Bio.pairwise2 has been deprecated, and we intend to remove it in a future release of Biopython. As an alternative, please consider using Bio.Align.PairwiseAligner as a replacement, and contact the Biopython developers if you still need the Bio.pairwise2 module.
  warnings.warn(
/clusterfs/jgi/groups/science/homes/mbfiamenghi/.micromamba/envs/cctyper/lib/python3.8/site-packages/Bio/pairwise2.py:278: BiopythonDeprecationWarning: Bio.pairwise2 has been deprecated, and we intend to remove it in a future release of Biopython. As an alternative, please consider using Bio.Align.PairwiseAligner as a replacement, and contact the Biopython developers if you still need the Bio.pairwise2 module.
  warnings.warn(
[2024-02-14 08:18:50] INFO: BLASTing for CRISPR near cas operons
[2024-02-14 08:21:23] INFO: Predicting subtype of CRISPR repeats
[2024-02-14 08:21:23] ERROR: XGBoost model incompatible

File upload fail on web server with Chrome browser

Three separate solutions to this problem:

Disable hardware acceleration in the Chrome browser settings (restart browser)
Use another browser (Firefox, Opera, Safari, and Edge do not have this problem)
Copy-paste the sequence from the file into the submission field

Access to the sequences used to generate Cas profiles

Hi @Russel88 ,

Thanks a lot for developing this valuable tool for the whole community.
I wonder if it is possible for you to share with me the sequences that were used to generate the hmm profiles?

Best,
Huanle

Problems running the program

First of all I have to say the program is great. Easy to use and the output is very comprehensive.
However, recently I have been having this error:

/Users/JFF/opt/miniconda3/envs/cctyper/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
from pandas import MultiIndex, Int64Index
[2022-04-04 14:29:15] INFO: Running CRISPRCasTyper version 1.3.0
[2022-04-04 14:29:15] INFO: Predicting ORFs with prodigal
[2022-04-04 14:29:15] INFO: Running HMMER against Cas profiles
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 691/691 [00:14<00:00, 48.44it/s]
[2022-04-04 14:29:31] INFO: Parsing HMMER output
[2022-04-04 14:29:31] INFO: Subtyping putative operons
/Users/JFF/opt/miniconda3/envs/cctyper/lib/python3.9/site-packages/cctyper/castyping.py:294: FutureWarning: In a future version of pandas all arguments of DataFrame.drop except for the argument 'labels' will be keyword-only.
single_effector_hmms = self.scores[self.scores['Hmm'].isin(list(specifics))].drop('Hmm', 1)
[2022-04-04 14:29:31] INFO: Predicting CRISPR arrays with minced
[2022-04-04 14:29:31] INFO: No CRISPRs found.
[2022-04-04 14:29:31] INFO: Plotting map of CRISPR-Cas loci
[2022-04-04 14:29:31] INFO: Removing temporary files

It's strange because I have run this same sequence on other occasions with the program and I am sure there are CRISPRs.
Do you know what might be causing this?

Kind regards,
Javier

CCTyper results

Hello! I've been using CCTyper and I'm very happy with the performance and results, but I have a question about the results of an isolate assembly. This is the CRISPR_Cas.tab file:

Contig Operon Operon_Pos Prediction CRISPRs Distances Prediction_Cas Prediction_CRISPRs
Contig51 Contig51@1 [785, 2608] I-B ['Contig51_1'] [177] Ambiguous ['I-B']

However, when I go to the cas_operons.tab file, I find:

Contig Operon Start End Prediction Complete_Interference Complete_Adaptation Best_type Best_score Genes Positions E-values CoverageSeq CoverageHMM Strand_Interference Strand_Adaptation
Contig137 Contig137@1 40 5517 I-B 100% 0% I-B 15.0 ['Cas6_0_CAS-III-B-I-B', 'Cas8b1_10_CAS-I-B', 'Cas7_0_CAS-I-B', 'Cas5_0_IB', 'Cas3_0_I'] [1, 2, 3, 4, 5] ['2.00e-53', '2.50e-250', '2.40e-48', '1.40e-19', '1.40e-34'] [0.979, 0.983, 0.935, 0.832, 0.545] [0.979, 0.991, 0.967, 0.862, 0.475] 1 NA

This file shows the same results as the cas_operons_orphan.tab file. My question is why it seems to identify a complete CRISPR/Cas locus when it doesn't find the Cas operon. Interestingly, when I annotate this isolate with Bakta, it identifies the Cas genes very close to the CRISPR array (less than 5000 nucleotides away and on the same contig). It seems that CCTyper recognizes that the CRISPR/Cas system is complete, but it doesn't display this operon. What could be happening? Thanks in advance!

Execution against protein fasta and gff

Hi,

First of all, I would like to thank you for developing this wonderful tool.

My question is about the input files of the tool.
Currently I have a genome sequence where gene prediction is a bit tricky. This genome cannot be predicted by prodigal and requires manual curation.

I would like to run CRISPRCasTyper against this genome.
Is it possible to use gbk or protein fasta and gff as input?

Best,

Keigo

Orphan CRISPR definition

Dear Russel,

Thank you very much for this unique tool.

I have a simple question. How is an orphan CRISPR defined? I suppose that a lone CRISPR on a contig could be considered an orphan, but how many ORFs are needed before an array is considered an orphan CRISPR?

Indicate completion in output

It would be useful if CCTyper in the output could give an indication of whether the systems are complete or partial

Cas6 in red in visualization?

Dear all

Why is it that the Cas6 proteins are coloured in red in the visualization, as opposed to the other enzymes of either adaptation (blue) or interference (red) modules? Is there something functionally unique about Cas6 to be coloured separately?

Thanks

Marcus

Extract spacer coordinates

Hey,

This tool (super cool btw!) extracts all spacers (true and false) in fasta file into a separate directory. Could you please let me know how to extract the coordinates for all these spacers? The coordinates available on the file crisprs_all only refer to the consensus repeat, from what I understood.

Thanks for your time!

Different results from CRISPR Cas typer web server and standalone version

Speed up CRISPR stats calculation

Long CRISPRs take time to process, because each pair of both spacers and repeats are aligned to calculate the average identity of repeats and spacers. It should be sufficient to sample a subset of the repeats/spacers to estimate the identity. Then add a CLI argument to toggle exact versus approximate identity estimation.
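A minimal sketch of the sampling idea, not CCTyper's implementation. The `identity` function is a crude position-wise stand-in for the pairwise alignment described above, and `max_pairs` is a hypothetical parameter that such a CLI toggle could control:

```python
import random
from itertools import combinations

def identity(a: str, b: str) -> float:
    """Crude identity: matching positions over the longer length (no alignment)."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))

def mean_identity(seqs, max_pairs=None, seed=0):
    """Mean pairwise identity; estimated from random pairs when max_pairs is set."""
    n = len(seqs)
    total = n * (n - 1) // 2
    if total == 0:
        return 1.0
    if max_pairs is None or total <= max_pairs:
        pairs = combinations(range(n), 2)  # exact: every pair once
        count = total
    else:
        rng = random.Random(seed)
        # approximate: draw pairs of distinct indices (pairs may repeat)
        pairs = [tuple(rng.sample(range(n), 2)) for _ in range(max_pairs)]
        count = max_pairs
    return sum(identity(seqs[i], seqs[j]) for i, j in pairs) / count
```

The exact mode costs n(n-1)/2 comparisons, while the sampled mode costs a fixed `max_pairs`, which is where the speed-up for long arrays would come from.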

add repeated sequence to repeats.fa

Hi,
I wanted to use your program to get E. coli spacers. It works well on some of my strains, but for some of them spacers are missing because of a mismatch in the last repeat sequence.
Is there a way to update the repeats.fa file in the db folder with my own list of repeat sequences?
Have a nice day,
Fabien

Docker for cctyper

Dear all

Is there a pre-built Docker image for cctyper? I tried to look for it in BioContainers, but to no avail. Thanks

Marcus

No database with conda install?

Hello, I installed cctyper with conda,
conda create -n cctyper -c conda-forge -c bioconda -c russel88 cctyper
activated the environment,
conda activate cctyper
and then tried to run cctyper
cctyper ~/Downloads/44_contigs_1000.fasta ~/Desktop/cctyper_output_44 --prodigal meta

However, it says that it cannot find the database directory. Is there an easy way to fix this?

[2022-08-03 13:40:32] INFO: Running CRISPRCasTyper version 1.3.0
[2022-08-03 13:40:32] ERROR: Could not find database directory

Thanks,

Nicholas

ValueError: invalid literal for int() with base 10: 'lengt'

Hello.
I first ran a single bin (2159 kb) and it produced results properly, but when I merged the 1472 bins (3.39 Gb) into one fasta file and ran it again, the following error was reported:

cctyper ~/virus/viwrap_input/merge.fa ~/virus/CRISPRCasTyper --prodigal meta --threads 12
[2024-03-20 14:32:44] INFO: Running CRISPRCasTyper version 1.8.0
[2024-03-20 14:32:57] INFO: Predicting ORFs with prodigal
[2024-03-20 18:56:28] INFO: Running HMMER against Cas profiles
100%|████████████████████████████████████████████████████████████████████| 705/705 [4:57:48<00:00, 25.35s/it]
/data4/machuang/miniconda3/envs/cctyper/lib/python3.8/site-packages/cctyper/hmmer.py:85: DtypeWarning: Columns (28) have mixed types. Specify dtype option on import or set low_memory=False.
  hmm_df = pd.read_csv(self.out+'hmmer.tab', sep='\s+', header=None,
Traceback (most recent call last):
  File "/data4/machuang/miniconda3/envs/cctyper/bin/cctyper", line 85, in <module>
    hmmeri.main_hmm()
  File "/data4/machuang/miniconda3/envs/cctyper/lib/python3.8/site-packages/cctyper/hmmer.py", line 26, in main_hmm
    self.load_hmm()
  File "/data4/machuang/miniconda3/envs/cctyper/lib/python3.8/site-packages/cctyper/hmmer.py", line 100, in load_hmm
    hmm_df['Pos'] = [int(re.sub(".*_","",x)) for x in hmm_df['ORF']]
  File "/data4/machuang/miniconda3/envs/cctyper/lib/python3.8/site-packages/cctyper/hmmer.py", line 100, in <listcomp>
    hmm_df['Pos'] = [int(re.sub(".*_","",x)) for x in hmm_df['ORF']]
ValueError: invalid literal for int() with base 10: 'lengt'

The hmmer.log shows

Fatal exception (source file easel.c, line 2248):
unexpected getcwd() error

What is the reason for this? Thanks for your help.
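The traceback shows that the text after the last underscore of an ORF name was 'lengt' rather than a number. One possible reading, given the HMMER crash in hmmer.log, is that a row in hmmer.tab was cut off mid-name; this is an assumption, not a confirmed diagnosis. The sketch below reproduces the parsing step from the traceback and returns None for malformed names instead of raising; `orf_position` is a hypothetical helper:

```python
import re

def orf_position(orf_name: str):
    """Return the trailing ORF index, or None when the name is malformed."""
    suffix = re.sub(r".*_", "", orf_name)  # same substitution as in the traceback
    return int(suffix) if suffix.isdigit() else None
```

Running this over the ORF column of hmmer.tab would show which rows are damaged before the whole run fails.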

faa file of individual Cas proteins?

Dear all

I notice that the typical output of cctyper includes a cas_operons.txt file, with each row potentially containing an operon with multiple Cas protein hits. However, the proteins.faa file contains all proteins used by cctyper (potentially even those not identified in the cas_operons.txt file).

Is there a way to extract the information in the "Genes" column of cas_operons.txt, and extract those amino acid sequences from proteins.faa to only include protein sequences of Cas proteins?

Thanks

Marcus

![image](https://user-images.githubusercontent.com/121949837/212599561-d771b233-4bde-4850-8e5f-98a4164f78cb.png)
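One way to do this outside CCTyper, as a sketch under two assumptions: the operons file is tab-separated with `Contig` and `Positions` columns (as in the cas_operons.tab header quoted in another issue), and protein IDs in proteins.faa follow the Prodigal pattern `<contig>_<position>`. The function names are hypothetical:

```python
import ast
import csv
import io

def cas_protein_ids(operons_tab: str) -> set:
    """Collect '<contig>_<position>' IDs from the Contig and Positions columns."""
    ids = set()
    for row in csv.DictReader(io.StringIO(operons_tab), delimiter="\t"):
        for pos in ast.literal_eval(row["Positions"]):  # e.g. "[1, 2, 3]"
            ids.add(f"{row['Contig']}_{pos}")
    return ids

def filter_fasta(faa_text: str, keep: set) -> str:
    """Keep only FASTA records whose ID (first word of the header) is in `keep`."""
    out, writing = [], False
    for line in faa_text.splitlines():
        if line.startswith(">"):
            writing = line[1:].split()[0] in keep
        if writing:
            out.append(line)
    return "\n".join(out)
```

With real files, the two inputs can be read with `pathlib.Path(...).read_text()` and the result written to a new .faa file.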

Make it possible to enhance plot with custom HMM database

Would be useful if one could supplement CCTyper with an HMM database, e.g. Pfam, which would then be used to annotate previously unknown genes in the vicinity of CRISPR-Cas loci

FileNotFoundError: blast.tab

Hi there

I was wondering if you know what caused this issue, "FileNotFoundError: [Errno 2] No such file or directory: DIR/blast.tab"?

Thanks!

Warning messages while running

Hi, I got warning messages while running this command

However, I got the results, but I'm not sure whether the warning from the program affects the results or not?

Thank you in advance.

Creating empty files when there's no data

Hello,
First of all, fantastic software! I've been trying it out and it's super cool.
I have a suggestion regarding the output files. As you mention in the README, files are only created if there is any data. I was wondering why not generate an empty file instead, maybe with the header only. The main reason is that it would integrate better with workflow management systems like snakemake, which expect an output file as a sign that the run succeeded. In general, I think it is easier to deal with files that indicate no results than with absent files.
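As a workaround in the meantime, a wrapper step can create the placeholders after a run. A minimal sketch: the `EXPECTED` mapping is hypothetical, the header is copied from the CRISPR_Cas.tab example quoted in another issue, and tab separation is assumed:

```python
from pathlib import Path

# Hypothetical mapping of output file -> header line; extend as needed.
EXPECTED = {
    "CRISPR_Cas.tab": "\t".join([
        "Contig", "Operon", "Operon_Pos", "Prediction",
        "CRISPRs", "Distances", "Prediction_Cas", "Prediction_CRISPRs",
    ]),
}

def ensure_outputs(outdir, expected=EXPECTED):
    """Create a header-only file for every expected output the run did not produce."""
    created = []
    for name, header in expected.items():
        path = Path(outdir) / name
        if not path.exists():
            path.write_text(header + "\n")
            created.append(name)
    return created
```

Calling `ensure_outputs("my_output")` at the end of the rule that runs `cctyper` lets the workflow declare these files as outputs regardless of whether any system was found.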

"Orphan" CRISPRs near low-quality single-effector HMM match -> Putative CRISPR-Cas

At the moment, low-quality single-effector HMM matches near a CRISPR are only visible in the plot and are not included in the CRISPR_Cas_putative.tab output file. This should be fixed.

Optimize hybrid system annotation

Hybrid loci in which one of the subtypes is a Class 2 are sometimes missed. Hybrid classification should be optimized to fix these cases.

Error when running cctyper: sed: cannot rename <output filename>/sed78a7GM: Permission denied

After the step "INFO: Running HMMER against Cas profiles", the job terminates immediately and I receive the following error:

sed: cannot rename <output filename>/sed78a7GM: Permission denied
Traceback (most recent call last):
  File "/opt/miniconda3/envs/cctyper/bin/cctyper", line 83, in <module>
    hmmeri.main_hmm()
  File "/opt/miniconda3/envs/cctyper/lib/python3.8/site-packages/cctyper/hmmer.py", line 26, in main_hmm
    self.load_hmm()
  File "/opt/miniconda3/envs/cctyper/lib/python3.8/site-packages/cctyper/hmmer.py", line 85, in load_hmm
    hmm_df = pd.read_csv(self.out+'hmmer.tab', sep='\s+', header=None,
  File "/opt/miniconda3/envs/cctyper/lib/python3.8/site-packages/pandas/util/_decorators.py", line 211, in wrapper
    return func(*args, **kwargs)
  File "/opt/miniconda3/envs/cctyper/lib/python3.8/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "/opt/miniconda3/envs/cctyper/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 950, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/opt/miniconda3/envs/cctyper/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 605, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/opt/miniconda3/envs/cctyper/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1442, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/opt/miniconda3/envs/cctyper/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1735, in _make_engine
    self.handles = get_handle(
  File "/opt/miniconda3/envs/cctyper/lib/python3.8/site-packages/pandas/io/common.py", line 856, in get_handle
    handle = open(
FileNotFoundError: [Errno 2] No such file or directory: '<output filename>/hmmer.tab'

I was just wondering if anyone has come across this error and if so, how they resolved it?

Thanks in advance!

Multiple cctyper runs on the same contig giving different outputs?

Dear all

I ran cctyper on a large number of contigs, and the cas_operon.tab output listed a subgroup of contigs that had Cas hits.

I then selected this set of Cas-positive contigs and re-ran cctyper. This time, the large majority of the contigs and their Cas operon positions were identical, except for a small subset in which some operons are now chopped up into multiple smaller cas operons. Interestingly, this small group of contigs now appears in cas_operon_putative.tab, meaning that these predictions have become less confident.

I wonder why this happens, given that the contigs in the second run were a subset of those in the first and otherwise identical. Thanks

Marcus

Meaning of asterisk at the end of protein sequence

Given below are two protein sequences taken from the protein.faa file obtained after running CCTyper, one with an asterisk at the end of the protein sequence and the other without.

NODE_20793_length_1284_cov_1096.518308_2 # 431 # 1024 # -1 # ID=82_2;partial=00;start_type=GTG;rbs_motif=AGGA;rbs_spacer=5-10bp;gc_cont=0.577
MKKKILSLAVVAVFGVMTMGPVMAGEVDPATVPEKKQTTLKLYLTAKEAYDMKKAEGDKV
LLIDVRTPEEIQYVGNLGDMMDANIPYQFNDISGYDEKKKVYASSLNSNFVAEVEELVNK
RGLDKDSTIIVSCRSGDRSAVSANLLAKAGYTHVYSVFDGFEGDLSKDGRRSVNGWKNAG
LPWTYNMDKAKMYFILR*

NODE_26395_length_1052_cov_597.440321_1 # 3 # 1052 # -1 # ID=98_1;partial=11;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.368
LRRKNINNMIDKIYPYIHKIIKKTFSYLTLPQQKSLALTISAFFDPPSFSLYNIASKLPL
DTSNRHKHKHLIRFLDKLLINDDFWKSYITTIFLLPHITSRKKFLTLLIDATTLKDDVWI
LSASISYENRAVPIYMELWEGVNQKYDYWARVIGFVRNMRKYLPDKFSYVIIADRGFQGE
RLPKEFKKLKLDYIIRIGENYHIKTKNGEEWRELSLLDDGKYNEVVLGKTNSIEGVNVIV
SSIKDAENKKHLKWYLMSSIKDMEKEEVVGLYAKRMWIEESFKDLKGKLRWEEYTEKLPK
FDRIKKMVIISGLSYGIQLSLGSSKQVVEQRSKGESIIRGLQNALNGVSV

Asterisk in a fasta file generally means a stop codon. My question is does the asterisk mean that the protein sequence is incomplete?
Will the prediction that a given protein is a Cas protein be trusted if there is an asterisk at the end of protein sequence?
Can I estimate the size of the protein if there is an asterisk in the sequence?
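The headers quoted above are in Prodigal's format, where the `partial` field records whether the gene runs off either contig edge ("00" means both ends were found, a "1" flags an edge that was cut off), and a trailing `*` is the translated stop codon. A minimal sketch that reads this information from a header; `parse_prodigal_header` is a hypothetical helper, not part of CCTyper:

```python
def parse_prodigal_header(header: str) -> dict:
    """Parse '>id # start # end # strand # key=value;...' into a small dict."""
    parts = [p.strip() for p in header.lstrip(">").split(" # ", 4)]
    name, start, end, strand, attrs = parts
    fields = dict(item.split("=", 1) for item in attrs.split(";") if item)
    start, end = int(start), int(end)
    return {
        "id": name,
        "complete": fields.get("partial") == "00",  # neither end hits a contig edge
        "codons": (end - start + 1) // 3,            # counts the stop codon if present
    }
```

For the first header above this gives a complete gene spanning 198 codons, that is 197 residues plus the stop shown as `*`. The second is flagged as incomplete (`partial=11`), so its 350 residues are only a lower bound on the true protein size.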

Add support for numeric fasta headers

Fasta headers which only contain numbers are currently not allowed, as they result in errors several places in the script.
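Until this is supported, numeric headers can be renamed before running CCTyper. A minimal sketch; the function name and the `seq_` prefix are hypothetical choices:

```python
def prefix_numeric_headers(fasta_text: str, prefix: str = "seq_") -> str:
    """Prepend `prefix` to FASTA IDs that consist only of digits."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            seq_id, _, rest = line[1:].partition(" ")
            if seq_id.isdigit():
                line = ">" + prefix + seq_id + (" " + rest if rest else "")
        out.append(line)
    return "\n".join(out) + "\n"
```

Headers that already contain non-digit characters are left unchanged, so the function is safe to apply to mixed files.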

(Question) How can I plot svg post-hoc after a run?

I just ran this on about 60k genomes before I realized the Drawsvg version made changes that weren't addressed in the current CCTyper implementation (see #45).

Is there a way I can load the data that ran correctly so I can still plot it using CCTyper?

Expose minced options

Hello,

Is it possible to make the minced options available when calling cctyper? Options like -minNR, -minRL, etc. are very useful when it comes to optimizing CRISPR array annotations. I have noticed that certain repeats are missed because the array did not carry the minimum number of repeats required.

Cheers,
Jimmy

A complement to CRISPRCasFinder or its own thing?

Does this program stand on its own, or as a complement to CRISPRCasFinder? I have observed that in your papers you use both. Can CCTyper be used alone to detect CRISPR-Cas on sequences?

Assembly accession in web server

Make it possible to submit an Assembly accession (GCA_* or GCF_*) to the web server in addition to the currently implemented nuccore accessions.

alternative to minced?

Thanks for the great tool!

In some cases it might be nice to be able to use CRISPR repeat/spacer arrays identified using a different tool or identified previously. For example we could pass a gff with array coordinates.

About spacer sequences

I use the software to construct a CRISPR-Cas database from the bacterial genomes in the RefSeq database.
However, I found that some spacer sequences are not in the genome, for example (spacers/NC_015711.1_1.fa):

>NC_015711.1_1:1
TCAACCAGCATTAGCACCGTCCGCGTGGCGCCCGTGT
>NC_015711.1_1:2
CTGGAGTTGTCCCCCGAGGCTGAGCCGGTGTCCCGCGT
>NC_015711.1_1:3
TCTTCCATCTGCGTCTGCGTCTGACCCTTGAACTTCG
>NC_015711.1_1:4
ATGCAGAACAGCGGCAAGGAGGCGATTATCGACCT
>NC_015711.1_1:5
GGGCAGTGAAACCCTTGGGTGGGGAAGGAGTTCTGGGGGC
>NC_015711.1_1:6
AGGAGCGCCCGCCGGCCAGACGCATAGACGACGCA

Am I missing something?

Thanks!

GCF_000219105.zip
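One thing worth checking, as an assumption rather than a confirmed explanation: a spacer will not be found by a plain text search if the array is reported on the reverse strand. A minimal sketch that searches both strands for an exact match; the function names are hypothetical:

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def find_spacer(genome: str, spacer: str):
    """Return (strand, 0-based start) of the first exact match, or None."""
    genome, spacer = genome.upper(), spacer.upper()
    pos = genome.find(spacer)
    if pos != -1:
        return "+", pos
    pos = genome.find(reverse_complement(spacer))
    if pos != -1:
        return "-", pos
    return None
```

If a spacer is found on neither strand, it is also worth checking that the genome string was built without line breaks from the FASTA file.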

Complete genome and draft genome

Hello, I am going to use this software to predict some Klebsiella pneumoniae genome data (some are fully assembled genome data, some are draft genome data at the Scaffolds level). Should I change the Prodigal Mode to make the prediction more accurate for different genomes?
Complete genome: cctyper KP1.fa my_output
A draft genome containing multiple sequences(scaffolds): cctyper KP2.fasta my_output2 --prodigal meta
Are these commands correct?
If I keep using one command without changing it, will the result be much less reliable?
Best wishes!

Fusion proteins

Include HMMs for fusion proteins, such as Cas1-RT, Cas4-Cas1, and Cas6-RT-Cas1

Option to use Prodigal-gv

Prodigal-gv is a fork of Prodigal v2.6.3 that was modified to improve detection of alternative genetic codes (especially 4 and 15), which are not uncommon in certain phage families. Standard Prodigal (with or without -p meta) doesn't do a good job of detecting these cases, resulting in highly fragmented gene predictions. Given that CRISPR systems have been found in phages, it would be nice if this gene caller was an option when using CCTyper. The command line options are identical to standard prodigal, except the name of the program is prodigal-gv instead of prodigal.

https://github.com/apcamargo/prodigal-gv
https://anaconda.org/bioconda/prodigal-gv

Error in BLASTing for CRISPR near cas operons

Hi, I fed the software with metagenomic datasets but an error was raised as follows:

[2022-06-25 15:58:22] INFO: BLASTing for CRISPR near cas operons
Traceback (most recent call last):
File "/data/linzhl/anaconda3/envs/cctyper/bin/cctyper", line 95, in
repmatch.run()
File "/data/linzhl/anaconda3/envs/cctyper/lib/python3.8/site-packages/cctyper/blast.py", line 37, in run
self.write_gff()
File "/data/linzhl/anaconda3/envs/cctyper/lib/python3.8/site-packages/cctyper/blast.py", line 369, in write_gff
all_seqs[::2] = cr.repeats
ValueError: attempt to assign sequence of size 36 to extended slice of size 35

This problem only came up in one dataset. Do you know what might be causing this?
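The traceback shows Python's extended-slice assignment failing: `all_seqs[::2] = cr.repeats` only works when the number of repeats equals the number of even-indexed slots. How `all_seqs` is sized inside CCTyper is not shown in the log, so the sketch below only illustrates the invariant (an array with n spacers needs n + 1 repeats) with a hypothetical helper:

```python
def interleave(repeats, spacers):
    """Return [repeat, spacer, repeat, ..., repeat]; needs one more repeat than spacers."""
    if len(repeats) != len(spacers) + 1:
        raise ValueError(
            f"expected {len(spacers) + 1} repeats for {len(spacers)} spacers, "
            f"got {len(repeats)}"
        )
    merged = [None] * (len(repeats) + len(spacers))
    merged[::2] = repeats   # even indices hold repeats
    merged[1::2] = spacers  # odd indices hold spacers
    return merged
```

An array in the affected dataset whose repeat and spacer counts do not differ by exactly one would trigger the error seen above.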

cctyper dies with missing files

Hi Jakob,

I was running cctyper (fresh conda install) on a large fasta file (7.7GB) and it seems to run smoothly until the minced step, where it tries to locate a missing file. Any idea what the issue might be?

Joseph.

cctyper -t 64 m64241e_210617_232502.hifi_reads.fasta cctyper.m64241e_210617_232502.hifi_reads.out
[2021-06-25 12:22:40] INFO: Running CRISPRCasTyper version 1.4.1
[2021-06-25 12:23:23] INFO: Predicting ORFs with prodigal
[2021-06-25 13:31:06] INFO: Running HMMER against Cas profiles
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 693/693 [13:06:04<00:00, 68.06s/it]
[2021-06-26 07:21:47] INFO: Parsing HMMER output
[2021-06-26 07:21:48] INFO: Subtyping putative operons
[2021-06-26 07:28:48] INFO: Predicting CRISPR arrays with minced
Traceback (most recent call last):
  File "/projects/codon_0000/apps/miniconda3/envs/cctyper/bin/cctyper", line 86, in <module>
    crispr_obj.run_minced()
  File "/projects/codon_0000/apps/miniconda3/envs/cctyper/lib/python3.9/site-packages/cctyper/minced.py", line 79, in run_minced
    self.write_spacers()
  File "/projects/codon_0000/apps/miniconda3/envs/cctyper/lib/python3.9/site-packages/cctyper/minced.py", line 156, in write_spacers
    f = open(self.out+'spacers/{}.fa'.format(crisp.crispr), 'w')
FileNotFoundError: [Errno 2] No such file or directory: 'cctyper.m64241e_210617_232502.hifi_reads.out/spacers/m64241e_210617_232502/165939000/ccs_1.fa'
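The missing path contains slashes that come from the read name (`m64241e_210617_232502/165939000/ccs`), so the spacer file name is interpreted as nested directories that do not exist. A likely workaround, sketched with hypothetical helper names, is to rewrite the FASTA IDs before running CCTyper:

```python
import re

def safe_filename(seq_id: str) -> str:
    """Replace characters that are unsafe in file names (such as '/') with '_'."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", seq_id)

def sanitize_fasta_ids(fasta_text: str) -> str:
    """Rewrite each FASTA header so its ID is safe to use as a file name."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            seq_id, _, rest = line[1:].partition(" ")
            line = ">" + safe_filename(seq_id) + (" " + rest if rest else "")
        out.append(line)
    return "\n".join(out) + "\n"
```

For a 7.7GB file it would be better to stream line by line from one file handle to another than to load the whole text, but the renaming rule stays the same.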

(Question) provide GFF file for pre computed gene calls?

Let's say you already ran prodigal and/or have gene calls in GFF format, can you skip the prodigal run and provide the GFF file?

antiSMASH provides a similar option since it requires gene calls and positions but allows for precomputed GFF file to be used.

CrisprCasTyper webserver

The web server reports a fatal error for every input. The issue has persisted for a week.

Question about spacer direction

Dear Russel,

Are the spacers corrected for their orientation, as CRISPRDetect/CRISPRDirection does?

With kind regards,
Daan

If minced detects a CRISPR with N's in the repeat sequence

... XGBoost will fail with an "Incompatible model" error. This should be handled more gracefully.
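A minimal sketch of one graceful approach, assuming repeats are available as plain strings before classification: set aside repeats with ambiguous bases and only pass the clean ones on. The function names are hypothetical and this is not CCTyper's code:

```python
def has_ambiguous_bases(seq: str) -> bool:
    """True if the sequence contains anything other than A, C, G or T."""
    return bool(set(seq.upper()) - set("ACGT"))

def split_repeats(repeats):
    """Separate repeats into (clean, ambiguous) lists."""
    clean = [r for r in repeats if not has_ambiguous_bases(r)]
    ambiguous = [r for r in repeats if has_ambiguous_bases(r)]
    return clean, ambiguous
```

The ambiguous repeats could then be reported with an unknown subtype instead of stopping the run.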

castyping.py having an issue with positional arguments.

I've had cctyper working before on another machine. Not sure if you have seen this error before.

(cctyper)user assembly % cctype CIA1\ copy.fna CIA1_cctyper
zsh: command not found: cctype
(cctyper)user assembly % cctyper CIA1\ copy.fna CIA1_cctyper
/Users/user/miniconda3/envs/cctyper/lib/python3.9/site-packages/Bio/pairwise2.py:278: BiopythonDeprecationWarning: Bio.pairwise2 has been deprecated, and we intend to remove it in a future release of Biopython. As an alternative, please consider using Bio.Align.PairwiseAligner as a replacement, and contact the Biopython developers if you still need the Bio.pairwise2 module.
warnings.warn(
[2023-05-18 10:59:17] INFO: Running CRISPRCasTyper version 1.3.0
[2023-05-18 10:59:18] INFO: Predicting ORFs with prodigal
[2023-05-18 10:59:28] INFO: Running HMMER against Cas profiles
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 691/691 [01:11<00:00, 9.65it/s]
[2023-05-18 11:01:14] INFO: Parsing HMMER output
[2023-05-18 11:01:14] INFO: Subtyping putative operons
Traceback (most recent call last):
File "/Users/user/miniconda3/envs/cctyper/bin/cctyper", line 80, in
castyper.typing()
File "/Users/user/miniconda3/envs/cctyper/lib/python3.9/site-packages/cctyper/castyping.py", line 294, in typing
single_effector_hmms = self.scores[self.scores['Hmm'].isin(list(specifics))].drop('Hmm', 1)
TypeError: drop() takes from 1 to 2 positional arguments but 3 were given
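The failing call passes the axis to `DataFrame.drop` positionally, which pandas 2.x no longer accepts (older versions only warned about it, as the FutureWarning quoted in another issue shows). A minimal sketch of the keyword forms, using a made-up two-row table rather than real CCTyper scores:

```python
import pandas as pd

# Hypothetical stand-in for the scores table; values are illustrative only.
scores = pd.DataFrame({"Hmm": ["hmm_a", "hmm_b"], "score": [120.0, 95.5]})

# Positional axis, as in the traceback: raises TypeError on pandas 2.x
# scores.drop('Hmm', 1)

# Keyword forms work on both old and new pandas versions
by_axis = scores.drop("Hmm", axis=1)
by_columns = scores.drop(columns="Hmm")
```

Editing an installed package is only a workaround; pinning pandas below 2.0 in the cctyper environment is another option, since the positional form still ran there.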
