grunwaldlab / effectr



An R package to call oomycete effectors

R 100.00%

effectr's People


zachary-foster knausb bairupraveen ramakrishna0007 savipuray biogeeker neato-nick

effectr's Issues

Create tests

http://r-pkgs.had.co.nz/tests.html

Fatal exception (source file esl_buffer.c, line 1599): zero malloc disallowed

I am trying to process a >900 MB protein file with your tool and am getting the following error.

I think it has something to do with hmmbuild. Could you please review it and let me know whether changing some parameters would make this work? Currently I am using 15 threads.

Creating HMM profile

Fatal exception (source file esl_buffer.c, line 1599):
zero malloc disallowed
sh: line 1: 413532 Aborted '/cm/shared/apps/hmmer/3.2.1/bin/hmmbuild' '--amino' '--seed' '12345' 'hmmbuild.hmm' 'MAFFT.fasta' > /dev/null

Error: File existence/permissions problem in trying to open HMM file hmmbuild.hmm.
HMM file hmmbuild.hmm not found (nor an .h3m binary of it)

HMM profile created.

Starting HMM searches

Error: File existence/permissions problem in trying to open HMM file hmmbuild.hmm.
HMM file hmmbuild.hmm not found (nor an .h3m binary of it)

hmmsearch finished!
Error in hmm.search(original.seq = fasta.file, regex.seq = REGEX, num.threads = 15) :
HMM failed, please supply a valid absolute path to ORFs
Execution halted

System which calls

I see

system("which mafft", intern = T)

in the file test_effector_summary.R. I suspect this will not work on Windows.
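A possible portable alternative, sketched on the assumption that only the location of the executable is needed: base R's `Sys.which()` searches the PATH on Windows as well as on Unix-alikes, and typically returns an empty string when the program is missing. The helper name `find_tool` is hypothetical and not part of effectR.

```r
# Hypothetical helper (not part of effectR): locate an executable on the PATH
# without shelling out to `which`, which is not available on Windows.
find_tool <- function(name) {
  path <- unname(Sys.which(name))  # typically "" when the program is not found
  if (!is.na(path) && nzchar(path)) path else NA_character_
}

mafft.exe <- find_tool("mafft")
if (is.na(mafft.exe)) {
  message("mafft was not found on the PATH")
} else {
  message("mafft found at: ", mafft.exe)
}
```

Returning `NA` for a missing program makes the "not installed" case explicit, so callers can test for it rather than work with an empty path.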

Adapt shiny app

Theoretical limitations:

How to use parameters in shiny via app launch?

Regarding regex pattern

Good evening, sir.
This is Ramakrishna.
The regex pattern "^\w{10,40}\w{1,96}R\wLR\w{1,40}eer" is returning proteins that have only the RXLR motif and proteins that have only the EER motif, separately.

Replace perl one-liners with R equivalent?

Does Perl come preinstalled on Windows? It is preinstalled on my computer. If it is not always present, and the system(paste0("perl -pi -e 's/ {2,}/\t/g' ",hmmbuild.out)) calls in hmm.search can be replaced with R code (I think it should be fairly easy), it might be worth the effort to avoid a dependency.
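A minimal sketch of what such a replacement could look like in base R, assuming the one-liner only needs to turn runs of two or more spaces into a tab and rewrite the file in place. The helper name `spaces_to_tabs` is hypothetical, and the demonstration uses a temporary file rather than real hmmsearch output.

```r
# Hypothetical helper (not part of effectR): replace runs of two or more
# spaces with a single tab, rewriting the file in place like `perl -pi -e`.
spaces_to_tabs <- function(path) {
  lines <- readLines(path)
  writeLines(gsub(" {2,}", "\t", lines), path)
  invisible(path)
}

# Small demonstration on a temporary file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("seq1   12.5  0.001", "seq2 single spaces stay"), tmp)
spaces_to_tabs(tmp)
readLines(tmp)
```

Unlike the Perl call, this reads the whole file into memory, which should be acceptable for table-sized output but is a trade-off worth noting for very large files.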

HMM failed, please supply a valid absolute path to ORFs

Hi, I am trying to use effectR and am getting the following error:

library(effectR)
fasta.file <- "HSM6XRQW_contigs_pro.fasta"
ORF <- seqinr::read.fasta(fasta.file)
REGEX <- regex.search(ORF, motif = "custom", reg.pat = "PAAR")

candidate.paar <- hmm.search(original.seq = "HSM6XRQW_contigs_pro.fasta", regex.seq = REGEX)
No alignment file is provided. Starting alignment with MAFFT.

Starting MAFFT alignment.

Executing MAFFT
Please be patient
MAFFT alignment finished!
Starting HMM

Creating HMM profile

Working... done.
Pressed and indexed 1 HMMs (1 names).
Models pressed into binary file: hmmbuild.hmm.h3m
SSI index for binary model file: hmmbuild.hmm.h3i
Profiles (MSV part) pressed into: hmmbuild.hmm.h3f
Profiles (remainder) pressed into: hmmbuild.hmm.h3p
HMM profile created.

Starting HMM searches

Error: Failed to open sequence file HSM6XRQW_contigs_pro.fasta for reading

hmmsearch finished!
Error in hmm.search(original.seq = "HSM6XRQW_contigs_pro.fasta", regex.seq = REGEX) :
HMM failed, please supply a valid absolute path to ORFs

I have tried using fasta.file instead of the original file name, but I still get the same error.
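The error message asks for an absolute path, while both attempts above pass a bare file name. A minimal sketch of resolving an absolute path with base R's `normalizePath()` before calling `hmm.search`; whether this resolves the failure in this particular case is an assumption. The helper `to_absolute` and the file `example_pro.fasta` are hypothetical.

```r
# Resolve a file name to an absolute path, failing early if it is missing.
# `to_absolute` is a hypothetical helper, not part of effectR.
to_absolute <- function(path) {
  normalizePath(path, winslash = "/", mustWork = TRUE)
}

# Demonstration with a temporary file standing in for the protein FASTA.
old.wd <- setwd(tempdir())
writeLines(c(">seq1", "MKPAARTT"), "example_pro.fasta")
fasta.abs <- to_absolute("example_pro.fasta")
setwd(old.wd)

fasta.abs  # absolute path, usable from any working directory
# The call from the report would then become (not run here):
# candidate.paar <- hmm.search(original.seq = fasta.abs, regex.seq = REGEX)
```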

Add a column with the length of the effectR candidate in effector.summary

Prof. Lina Quesada raised a very interesting question: The length of the candidate effector protein has been used as one of the thresholds to determine the viability of the protein as a plausible effector, is there a way to add the length of the candidate protein to the effector.summary() function?
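As a rough illustration of the idea, independent of how effector.summary() is implemented: if the candidates are held as a named list with one character per residue (the shape `seqinr::read.fasta()` returns by default), their lengths can be tabulated with base R. The sequences and column names below are made up for the example.

```r
# Toy stand-in for candidate sequences: a named list with one character per
# residue, the same shape seqinr::read.fasta() returns by default.
candidates <- list(
  cand_1 = strsplit("MKLRSLRDDEER", "")[[1]],
  cand_2 = strsplit("MRFLRAAA", "")[[1]]
)

# One row per candidate with its length in residues.
candidate.lengths <- data.frame(
  sequence.name = names(candidates),
  length = unname(lengths(candidates)),
  row.names = NULL
)
candidate.lengths
```

A table like this could then be merged into a summary by sequence name, or filtered against a length threshold.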

Separation of MAFFT and HMM search

Just started trying out the package. Very easy to use, I love it.

The first time I ran hmm.search on my data, MAFFT finished successfully but then hmmsearch errored out. This was my own fault: I gave a relative path instead of an absolute path to the original.seq parameter pointing to my original ORFs. But because the hmm.search function returned an error, it didn't return an object containing the finished alignment of the regex candidate RXLRs. The second time around, I gave the correct path to my original ORFs, and after another round of MAFFT I got my HMM candidate RXLRs.

Since MAFFT doesn't need the original ORFs file, it seems users could save some time by not re-aligning their regex candidate RXLRs. A couple of ideas for solutions:

1. Before running MAFFT, validate that the path given to original.seq actually leads to a FASTA file. This is probably the easiest solution to help out people like me who make simple mistakes.
2. If the MAFFT alignment succeeds when running hmm.search but the actual HMM search fails, maybe give a warning and return an object with only the Alignment and REGEX elements (and obviously without the HMM and HMM_Table elements)?
3. Make a separate function to call MAFFT and save the result into an object or file for use with hmm.search. I know I could do this in the terminal itself: I've obviously got MAFFT installed, so I could just run the alignment outside the R environment and use the import options you've already got. But it might be nice to integrate it all into the R session. I like options 1 and 2 better than this one.
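The first idea could be sketched roughly as follows. This is a hypothetical pre-flight check, not effectR code: `check_fasta_path` is an invented name, and "looks like FASTA" is reduced here to the first line starting with `>`.

```r
# Hypothetical pre-flight check (not part of effectR): confirm that a path
# points to a readable, FASTA-looking file before any alignment is started.
check_fasta_path <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path, call. = FALSE)
  }
  first.line <- readLines(path, n = 1, warn = FALSE)
  if (length(first.line) == 0 || !startsWith(first.line, ">")) {
    stop("File does not look like FASTA (no '>' header): ", path, call. = FALSE)
  }
  invisible(normalizePath(path))
}

# Demonstration: a valid temporary FASTA passes, a missing file fails fast.
ok.file <- tempfile(fileext = ".fasta")
writeLines(c(">orf1", "MKLRSLRDDEER"), ok.file)
check_fasta_path(ok.file)
result <- tryCatch(check_fasta_path("missing_orfs.fasta"),
                   error = function(e) conditionMessage(e))
result
```

Running a check like this before the alignment step means a mistyped path costs seconds rather than a full MAFFT run.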

Issue related to HMM threshold

Good morning, sir. This is Ramakrishna.
You write: "The high number of effector proteins predicted in the HMM step is a result of the low thresholds used by our package in order to obtain as many candidate effectors as possible."
Sir, what threshold does this package use, and what does "low threshold" mean here?
Also, how did you separate the redundant and non-redundant candidates?

Consider adding a few more examples of pathogen effectors

Here is a good source of bacterial effector motifs:

https://www.staff.ncl.ac.uk/p.dean/Bacterial_effectors_and_their_/body_bacterial_effectors_and_their_.html

Tests fail when mafft is not installed.

When running tests I receive:

Error: MAFFT not found in the specified path: '//mafft'
 Please check your MAFFT installation.
In addition: Warning messages:
1: running command 'which mafft' had status 1 
2: running command 'which hmmsearch' had status 1 
Execution halted

Exited with status 1.

This is reproduced as follows.

source('~/gits/effectR/tests/testthat/test_effector_summary.R')

Which dumps me into the browser on my system. Here I see:

Browse[1]> mafft.path
[1] "/"

which may be an unanticipated value?
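One way tests could avoid this failure mode, sketched under the assumption that the tests only need to know whether the external programs exist: check for them up front and skip when they are absent. The helper name `missing_tools` is hypothetical and not part of effectR's test suite.

```r
# Hypothetical guard (not part of effectR's test suite): report which
# external programs are missing so that tests can be skipped instead of
# failing with an unexpected path.
missing_tools <- function(tools = c("mafft", "hmmsearch")) {
  found <- nzchar(Sys.which(tools))
  tools[!found]
}

absent <- missing_tools()
if (length(absent) > 0) {
  message("Skipping external-tool tests; not found: ",
          paste(absent, collapse = ", "))
} else {
  message("All external tools found; tests can run.")
}
```

Inside a testthat test, the same condition could be passed to `testthat::skip_if_not()` so the test is reported as skipped rather than failed.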

Add bibliography to vignette

Regarding regex pattern

Good evening, sir.
This is Ramakrishna.
The regex pattern "^\w{10,40}\w{1,96}R\wLR\w{1,40}eer" is returning proteins as follows:
{1,96}R\wLR == proteins having only the RXLR motif, and
{1,40}eer == proteins having only the EER motif, separately.

Sir, I actually need a search that returns only proteins matching the complete pattern "^\w{10,40}\w{1,96}R\wLR\w{1,40}eer", not its parts separately. I don't want proteins that have only RXLR or only EER.
I tried subsetting the complete motifs from the data frame, but I still get proteins with
only RXLR, only EER, and both.
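A quick way to check what the pattern itself matches is to run it against a few toy sequences with base R's `grepl()`. This sketch is independent of effectR: how the package applies or reports the pattern is not verified here, and the sequences are invented for illustration.

```r
# The pattern from the question, with backslashes doubled for an R string.
pattern <- "^\\w{10,40}\\w{1,96}R\\wLR\\w{1,40}eer"

# Toy sequences (made up for illustration, not real proteins).
lead <- strrep("A", 20)
toy <- c(
  both      = paste0(lead, "RSLR", "DDDDD", "EER", "KK"),
  rxlr.only = paste0(lead, "RSLR", "DDDDDDDDDD"),
  eer.only  = paste0(lead, "DDDDD", "EER", "KK")
)

# ignore.case = TRUE lets the lowercase "eer" match uppercase residues.
hits <- grepl(pattern, toy, perl = TRUE, ignore.case = TRUE)
names(hits) <- names(toy)
hits
```

Applied as a single expression, the pattern matches only the sequence that carries RXLR followed by EER. Filtering a set of sequences with `grepl()` in this way is one option for keeping only proteins that contain both motifs.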

Mafft alignment failed to save

I'm trying to run effectR to search for RXLR effectors. When I run the hmm.search function, it runs the MAFFT alignment but then fails to run HMMER because no alignment file is found. I have tried this again with the test infestans files included with the package and it's still not working.
This is my workflow:

> fasta.file <- system.file("extdata", "test_infestans.fasta", package = "effectR")
> ORF <- seqinr::read.fasta(fasta.file)
> REGEX <- regex.search(ORF, motif="RxLR")
> candidate.rxlr <- hmm.search(original.seq = fasta.file, regex.seq = REGEX , alignment.file=NULL, save.alignment=T, mafft.path = "c:/mafft/", hmm.path = "c:/Users/alongmuir/Documents/hmmer3.0_windows/")

The alignment is then printed to the console, no candidate.rxlr file is made and then I get the following message:

MAFFT alignment finished!
Starting HMM
---
Creating HMM profile

Error in hmm.search(original.seq = fasta.file, regex.seq = REGEX, alignment.file = NULL,  : 
  No MAFFT alignment found

I'm not sure if I'm doing something wrong here. Any help would be appreciated.

custom pattern not giving domain positions

Hi,

EffectR is a brilliant package for identifying effectors.
I just had a question regarding the identification of proteins with custom motifs. For some reason, when I use a custom pattern, the resulting table does not give the domain numbers and positions. I was wondering if there is some way to fix this.
Thank you!

Best,
Savithri
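Until the table reports positions for custom patterns, they can be recovered outside the package. A minimal sketch with base R's `gregexpr()`, using invented sequences and the motif "PAAR" purely as an example; the column names are hypothetical and do not mirror effectR's output.

```r
# Toy sequences (made up for illustration) and a custom motif.
seqs <- c(prot_1 = "MKPAARTTPAARQ", prot_2 = "MKLLVTT")
motif <- "PAAR"

# gregexpr() returns, per sequence, the start of every match (-1 if none).
matches <- gregexpr(motif, seqs)

positions <- do.call(rbind, lapply(seq_along(seqs), function(i) {
  starts <- as.integer(matches[[i]])
  if (starts[1] == -1L) return(NULL)  # no motif in this sequence
  data.frame(
    sequence = names(seqs)[i],
    start = starts,
    end = starts + attr(matches[[i]], "match.length") - 1L
  )
}))
positions
```

Each row gives one motif occurrence, so a sequence with several matches appears several times and the number of rows per sequence is the motif count.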

CRAN issues

After attempting to submit the package to CRAN, I received an email reporting the following problem:

Found the following (possibly) invalid URLs:
 URL: https://cran.r-project.org/web/packages/seqinr/seqinr.pdf
   From: inst/doc/effectR.html
   Status: 200
   Message: OK
   CRAN URL not in canonical form
 The canonical URL of the CRAN page for a package is
   https://CRAN.R-project.org/package=pkgname

I think I solved the problem, but that might just mean that I'm omitting something or that there could be more potential problems and I really don't want to get banned from CRAN.

I was wondering if @knausb, @zkamvar and @zachary-foster could help me review the package quickly and check that it's fine for CRAN. So far, I've successfully:

Built the package locally and via winbuilder
Passed the tests and checks locally
Installed and passed on Travis-CI (https://travis-ci.com/Tabima/effectR/jobs/105758494)
Built, installed and used the package on Windows, MacOSX and Linux (Ubuntu and CentOS)

Am I missing something else?

Testing package in windows

The effectR package seems to be working on OSX and Ubuntu. It still needs to be tested on Windows.

Close this "Does not find CRN domains in P. infestans proteome"

Forget about it. I guess I need to find ORFs with EMBOSS rather than use genes predicted with Augustus on a masked genome.

I used effectR for the first time, and it finds many RxLRs in around 20000 proteins. But when I switch to the "CRN" motif it finds none, although I can find many LFLAK domains manually. Not sure what I'm doing wrong.

Cheers
Erik

fasta.file = "/home/augustus.aa"
ORF <- seqinr::read.fasta(fasta.file)
length(ORF)
[1] 20834
crn.cand <- regex.search(ORF,motif = "CRN")
Error in regex.search(ORF, motif = "CRN") : No CRN sequences found.

From your publication
"Changing the motif="RxLR" option to “CRN” or “custom” will allow the prediction of these other motifs of interest without reloading the ORF dataset, thus significantly reducing processing time."
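The manual LFLAK check mentioned above can be reproduced in R on plain sequence strings. This is a sketch on invented sequences; it says nothing about what effectR's built-in CRN pattern requires, which may be more than LFLAK alone (an assumption, not verified here).

```r
# Toy sequences (made up for illustration) standing in for translated ORFs.
proteins <- c(
  p1 = "MVKLFLAKTTSSAAHVLVVVP",
  p2 = "MKTTSSAAQQ",
  p3 = "MAALFLAKQQ"
)

# Flag sequences that contain the LFLAK motif anywhere.
has.lflak <- grepl("LFLAK", proteins, fixed = TRUE)
names(proteins)[has.lflak]
sum(has.lflak)

# With effectR loaded, a custom search could follow the call style used
# elsewhere in these issues (not run here):
# lflak.cand <- regex.search(ORF, motif = "custom", reg.pat = "LFLAK")
```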
