grunwaldlab / effectr
An R package to call oomycete effectors
Hi
I am trying to process a >900 MB protein file with your tool and am getting the following error.
I think it has something to do with hmmbuild. Could you please take a look and let me know whether I can change some parameters to make this work? Currently I am using 15 threads.
Creating HMM profile
Fatal exception (source file esl_buffer.c, line 1599):
zero malloc disallowed
sh: line 1: 413532 Aborted '/cm/shared/apps/hmmer/3.2.1/bin/hmmbuild' '--amino' '--seed' '12345' 'hmmbuild.hmm' 'MAFFT.fasta' > /dev/null
Error: File existence/permissions problem in trying to open HMM file hmmbuild.hmm.
HMM file hmmbuild.hmm not found (nor an .h3m binary of it)
HMM profile created.
Starting HMM searches
Error: File existence/permissions problem in trying to open HMM file hmmbuild.hmm.
HMM file hmmbuild.hmm not found (nor an .h3m binary of it)
hmmsearch finished!
Error in hmm.search(original.seq = fasta.file, regex.seq = REGEX, num.threads = 15) :
HMM failed, please supply a valid absolute path to ORFs
Execution halted
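One plausible cause of the `zero malloc disallowed` abort (an assumption, not confirmed in this thread) is that `MAFFT.fasta` was empty or truncated, for example if MAFFT ran out of memory on such a large input, so hmmbuild had nothing to read. A minimal base-R pre-flight check, written as a hypothetical helper that is not part of effectR, could stop with a clearer message before hmmbuild is called:

```r
# Hypothetical helper, not part of effectR: verify that a MAFFT alignment
# exists and contains at least one FASTA record before calling hmmbuild.
check_alignment <- function(path) {
  if (!file.exists(path)) {
    stop(sprintf("Alignment file '%s' does not exist", path), call. = FALSE)
  }
  if (file.info(path)$size == 0) {
    stop(sprintf("Alignment file '%s' is empty; MAFFT may have failed", path),
         call. = FALSE)
  }
  # Count FASTA header lines; an alignment with no records cannot build an HMM
  n_records <- sum(startsWith(readLines(path, warn = FALSE), ">"))
  if (n_records == 0) {
    stop(sprintf("Alignment file '%s' contains no FASTA records", path),
         call. = FALSE)
  }
  invisible(n_records)
}
```

Calling `check_alignment("MAFFT.fasta")` right after the alignment step would turn the hmmbuild crash into an explicit error about the alignment.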
I see `system("which mafft", intern = T)` in the file `test_effector_summary.R`. I suspect this will not work on Windows.
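A cross-platform alternative is base R's `Sys.which()`, which searches the `PATH` on both Unix-like systems and Windows without shelling out to `which`. A minimal sketch, using a hypothetical helper name that is not part of effectR:

```r
# Hypothetical helper, not part of effectR: locate the directory that holds an
# external executable without relying on the Unix-only `which` command.
find_tool_dir <- function(tool) {
  path <- unname(Sys.which(tool))  # "" when the tool is not on the PATH
  if (!nzchar(path)) {
    stop(sprintf("'%s' was not found on the PATH", tool), call. = FALSE)
  }
  dirname(path)
}
```

For example, `find_tool_dir("mafft")` would return the directory containing MAFFT, or fail with a readable message instead of returning an empty or odd path.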
Theoretical limitations:
Good evening, sir.
This is Ramakrishna.
The regex pattern "^\w{10,40}\w{1,96}R\wLR\w{1,40}eer" is returning proteins that have only the RXLR motif and proteins that have only the EER motif, separately.
Does Perl come preinstalled on Windows? It is preinstalled on my computer. If it is not always present, and the `system(paste0("perl -pi -e 's/ {2,}/\t/g' ",hmmbuild.out))` calls in `hmm.search` can be replaced with R code (I think it should be easy-ish), it might be worth the effort to avoid a dependency.
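The Perl one-liner replaces every run of two or more spaces with a tab, editing the file in place. A minimal pure-R equivalent, written as a hypothetical helper rather than anything taken from effectR, reads the file, applies `gsub()` line by line, and writes it back:

```r
# Hypothetical pure-R replacement for:
#   perl -pi -e 's/ {2,}/\t/g' <file>
# Rewrites the file in place, turning every run of two or more spaces into a tab.
collapse_spaces_to_tabs <- function(path) {
  lines <- readLines(path, warn = FALSE)
  writeLines(gsub(" {2,}", "\t", lines), path)
  invisible(path)
}
```

`collapse_spaces_to_tabs(hmmbuild.out)` would then stand in for the `system()` call. One small difference: `writeLines()` always ends the file with a newline, even if the original lacked one.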
Hi, I am trying to use effectR and am getting the following error:
library(effectR)
fasta.file <- "HSM6XRQW_contigs_pro.fasta"
ORF <- seqinr::read.fasta(fasta.file)
REGEX <- regex.search(ORF, motif = "custom", reg.pat = "PAAR")
hmm.search(original.seq = "HSM6XRQW_contigs_pro.fasta", regex.seq = REGEX)
Creating HMM profile
Working... done.
Pressed and indexed 1 HMMs (1 names).
Models pressed into binary file: hmmbuild.hmm.h3m
SSI index for binary model file: hmmbuild.hmm.h3i
Profiles (MSV part) pressed into: hmmbuild.hmm.h3f
Profiles (remainder) pressed into: hmmbuild.hmm.h3p
HMM profile created.
Starting HMM searches
Error: Failed to open sequence file HSM6XRQW_contigs_pro.fasta for reading
hmmsearch finished!
Error in hmm.search(original.seq = "HSM6XRQW_contigs_pro.fasta", regex.seq = REGEX) :
HMM failed, please supply a valid absolute path to ORFs
I have tried passing `fasta.file` instead of the original file name, but I still get the same error.
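The error message asks for an absolute path, and elsewhere in these issues a relative `original.seq` path is reported as the cause of the same failure; passing `fasta.file` does not help if it still holds a relative name. A minimal sketch, using a hypothetical helper that is not part of effectR, resolves the path with base R's `normalizePath()` and checks that the file starts with a FASTA header:

```r
# Hypothetical helper, not part of effectR: turn a possibly relative FASTA path
# into an absolute one and check that the file looks like FASTA.
as_absolute_fasta <- function(path) {
  if (!file.exists(path)) {
    stop(sprintf("File '%s' not found (working directory: %s)", path, getwd()),
         call. = FALSE)
  }
  abs_path <- normalizePath(path, mustWork = TRUE)
  first_line <- readLines(abs_path, n = 1, warn = FALSE)
  if (length(first_line) == 0 || !startsWith(first_line, ">")) {
    stop(sprintf("File '%s' does not start with a FASTA header", abs_path),
         call. = FALSE)
  }
  abs_path
}
```

It would be used as `fasta.file <- as_absolute_fasta("HSM6XRQW_contigs_pro.fasta")`, with that `fasta.file` then passed to both `seqinr::read.fasta()` and `hmm.search(original.seq = fasta.file, ...)`.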
Prof. Lina Quesada raised a very interesting question: the length of the candidate effector protein has been used as one of the thresholds for deciding whether a protein is a plausible effector. Is there a way to add the length of each candidate protein to the `effector.summary()` function?
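Until such an option exists, the lengths can be computed directly from the list returned by `seqinr::read.fasta()`, where each element is a vector of single residues. A minimal sketch with a hypothetical helper (it assumes a `*` stop symbol should not count toward the length) builds a table that could then be merged with the summary by sequence name, assuming the summary identifies sequences by their FASTA names:

```r
# Hypothetical helper, not part of effectR: protein length per sequence,
# ignoring '*' stop symbols.
protein_lengths <- function(orfs) {
  len <- vapply(orfs, function(s) sum(s != "*"), integer(1))
  data.frame(name = names(orfs), length = unname(len),
             stringsAsFactors = FALSE)
}
```

For example, `protein_lengths(ORF)` on the object read with `seqinr::read.fasta()` gives one row per ORF.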
Just started trying out the package. Very easy to use, I love it.
The first time I ran `hmm.search` on my data, MAFFT finished successfully but then hmmsearch errored out. This was my own fault: I gave a relative path instead of an absolute path to the `original.seq` parameter pointing at my original ORFs. But because the `hmm.search` function returned an error, it didn't return an object with the finished alignment of the regex candidate RXLRs. The second time I ran `hmm.search`, I gave the correct path to my original ORFs, and after another round of MAFFT I got my HMM candidate RXLRs.
Since MAFFT doesn't need the original ORFs file, it seems users could save some time by not re-aligning their regex candidate RXLRs. A couple of ideas for solutions:
1. … `original.seq` actually leads to a FASTA file. This is probably the easiest solution to help out people like me who make simple mistakes.
2. … `hmm.search`, but the actual HMM search fails, maybe give a warning and return an object with only the `Alignment` and `REGEX` elements (and obviously without the `HMM` and `HMM_Table` elements)?
3. … `hmm.search`. I know I could do this in the terminal itself: I've obviously got MAFFT installed, so I could just run the alignment outside the R environment and use the import options you've already got. But it might be nice to integrate it all into the R session. I kind of like options 1 and 2 better than this one.
Good morning, sir. This is Ramakrishna.
"The high number of effector proteins predicted in the HMM step is a result of the low thresholds used by our package in order to obtain as many candidate effectors as possible."
Sir, what threshold did you use in this package? What does "low threshold" mean?
One more question: how did you separate the non-redundant and redundant candidates?
Here is a good source of bacterial effector motifs:
When running tests I receive:
Error: MAFFT not found in the specified path: '//mafft'
Please check your MAFFT installation.
In addition: Warning messages:
1: running command 'which mafft' had status 1
2: running command 'which hmmsearch' had status 1
Execution halted
Exited with status 1.
This can be reproduced as follows:
source('~/gits/effectR/tests/testthat/test_effector_summary.R')
This drops me into the browser on my system, where I see:
Browse[1]> mafft.path
[1] "/"
Could this be an unanticipated value?
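A test suite that depends on external programs could check for them first and skip rather than fail when they are absent. A minimal sketch with a hypothetical helper, not part of effectR, built on base R's `Sys.which()`:

```r
# Hypothetical helper, not part of effectR: names of required external tools
# that are not on the PATH.
missing_tools <- function(tools = c("mafft", "hmmbuild", "hmmsearch")) {
  found <- Sys.which(tools)  # "" for every tool that cannot be found
  tools[!nzchar(found)]
}
```

Inside a testthat test this could be used as `skip_if_not(length(missing_tools()) == 0, "MAFFT/HMMER not installed")`, so a machine without MAFFT reports a skipped test instead of the `'//mafft'` error.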
Good evening, sir.
This is Ramakrishna.
The regex pattern "^\w{10,40}\w{1,96}R\wLR\w{1,40}eer" is returning proteins that have either motif on its own:
the part {1,96}R\wLR finds proteins with only the RXLR motif, and
the part {1,40}eer finds proteins with only the EER motif.
Sir, I actually need only the proteins that match the complete pattern "^\w{10,40}\w{1,96}R\wLR\w{1,40}eer", not its parts separately; I don't want proteins that have only RXLR or only EER.
I tried subsetting the complete motifs from the data frame, but I still get proteins with only RXLR, only EER, and both.
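Within a single regular expression, the RXLR and EER parts are already required together and in that order, so a sequence carrying only one of them cannot match the full pattern. If RXLR-only or EER-only proteins still appear, that suggests the two motifs are being matched by separate patterns somewhere in the workflow. A minimal sketch, using a hypothetical helper independent of effectR's internals, that applies the complete pattern with base R's `grepl()`:

```r
# Hypothetical helper, not part of effectR: keep only sequences in which an
# RXLR motif is followed, within 1-40 residues, by an EER motif.
rxlr_eer_pattern <- "^\\w{10,40}\\w{1,96}R\\wLR\\w{1,40}EER"

has_rxlr_eer <- function(seqs) {
  # ignore.case = TRUE because seqinr::read.fasta() may return lowercase residues
  grepl(rxlr_eer_pattern, seqs, perl = TRUE, ignore.case = TRUE)
}

seqs <- c(
  both      = paste0(strrep("A", 30), "RSLR", strrep("A", 20), "EER", strrep("A", 10)),
  rxlr_only = paste0(strrep("A", 30), "RSLR", strrep("A", 40)),
  eer_only  = paste0(strrep("A", 30), "EER", strrep("A", 40))
)
has_rxlr_eer(seqs)
# [1]  TRUE FALSE FALSE
```

Only the sequence containing both motifs in order passes, which is the filtering asked for above.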
I'm trying to run effectR to search for RXLR effectors. When I run the `hmm.search` function, it runs the MAFFT alignment but then fails to run HMMER because no alignment file is found. I have tried again with the test infestans files included in the package, and it's still not working.
This is my workflow:
> fasta.file <- system.file("extdata", "test_infestans.fasta", package = "effectR")
> ORF <- seqinr::read.fasta(fasta.file)
> REGEX <- regex.search(ORF, motif="RxLR")
> candidate.rxlr <- hmm.search(original.seq = fasta.file, regex.seq = REGEX , alignment.file=NULL, save.alignment=T, mafft.path = "c:/mafft/", hmm.path = "c:/Users/alongmuir/Documents/hmmer3.0_windows/")
The alignment is then printed to the console, no `candidate.rxlr` object is created, and I get the following message:
MAFFT alignment finished!
Starting HMM
---
Creating HMM profile
Error in hmm.search(original.seq = fasta.file, regex.seq = REGEX, alignment.file = NULL, :
No MAFFT alignment found
I'm not sure if I'm doing something wrong here. Any help would be appreciated.
Hi,
effectR is a brilliant package for identifying effectors.
I have a question about identifying proteins with custom motifs. For some reason, when I use a custom pattern, the resulting table does not give the domain numbers and positions. Is there some way to fix this?
Thank you!
Best,
Savithri
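Until the custom-motif table reports them, match counts and start positions can be recovered with base R's `gregexpr()`. A minimal sketch with a hypothetical helper, not part of effectR, assuming the sequences are plain character strings named by sequence ID:

```r
# Hypothetical helper, not part of effectR: number and start positions of every
# (non-overlapping) match of a custom motif in each sequence.
motif_positions <- function(seqs, pattern) {
  hits <- gregexpr(pattern, seqs, perl = TRUE, ignore.case = TRUE)
  data.frame(
    name      = names(seqs),
    # gregexpr() returns -1 when there is no match, so count only positive starts
    count     = vapply(hits, function(h) sum(h > 0), integer(1)),
    positions = vapply(hits, function(h) paste(h[h > 0], collapse = ","), character(1)),
    stringsAsFactors = FALSE
  )
}

motif_positions(c(seqA = "MKPAARLLPAAR", seqB = "MKLL"), "PAAR")
```

Because `gregexpr()` reports non-overlapping matches, overlapping occurrences of a custom motif would be counted once.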
After attempting to submit the package to CRAN, I received an email from CRAN reporting a problem:
Found the following (possibly) invalid URLs:
URL: https://cran.r-project.org/web/packages/seqinr/seqinr.pdf
From: inst/doc/effectR.html
Status: 200
Message: OK
CRAN URL not in canonical form
The canonical URL of the CRAN page for a package is
https://CRAN.R-project.org/package=pkgname
I think I solved the problem, but I might be overlooking something, or there could be other potential problems, and I really don't want to get banned from CRAN.
I was wondering if @knausb, @zkamvar and @zachary-foster could help me quickly review the package and check that it's fine for CRAN. So far, I've successfully:
Am I missing something else?
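The NOTE asks for package links in the canonical `https://CRAN.R-project.org/package=pkgname` form. A minimal sketch with a hypothetical helper that rewrites links such as the seqinr manual URL above; note that it points to the package landing page rather than directly to the PDF:

```r
# Hypothetical helper: rewrite cran.r-project.org package links into the
# canonical https://CRAN.R-project.org/package=<name> form; other URLs pass through.
canonical_cran_url <- function(url) {
  sub("^https?://cran\\.r-project\\.org/(web/)?packages/([A-Za-z0-9.]+)(/.*)?$",
      "https://CRAN.R-project.org/package=\\2", url,
      ignore.case = TRUE)
}

canonical_cran_url("https://cran.r-project.org/web/packages/seqinr/seqinr.pdf")
# [1] "https://CRAN.R-project.org/package=seqinr"
```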
The effectR package seems to be working on OS X and Ubuntu. It still needs to be tested on Windows.
Forget about it; I guess I need to find ORFs with EMBOSS rather than use genes predicted by AUGUSTUS on a masked genome.
Hi
I used effectR for the first time, and it finds many RxLRs among around 20,000 proteins. But when I switch to the "CRN" motif it finds none, even though I can find many LFLAK domains manually. Not sure what I'm doing wrong.
Cheers
Erik
fasta.file = "/home/augustus.aa"
ORF <- seqinr::read.fasta(fasta.file)
length(ORF)
[1] 20834
crn.cand <- regex.search(ORF,motif = "CRN")
Error in regex.search(ORF, motif = "CRN") : No CRN sequences found.
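To compare `regex.search()` against a manual search, the residue vectors returned by `seqinr::read.fasta()` can be collapsed to strings and scanned for the LFLAK motif directly. A minimal sketch with a hypothetical helper; it only counts LFLAK occurrences and is not effectR's own CRN pattern:

```r
# Hypothetical helper, not part of effectR: count sequences containing a motif,
# working on the list of residue vectors that seqinr::read.fasta() returns.
count_motif <- function(orfs, motif) {
  seqs <- vapply(orfs, function(s) paste(s, collapse = ""), character(1))
  sum(grepl(motif, seqs, ignore.case = TRUE))
}
```

Running `count_motif(ORF, "LFLAK")` on the same `ORF` object would show whether the motif is present in the data that `regex.search()` sees.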
From your publication:
"Changing the motif="RxLR" option to “CRN” or “custom” will allow the prediction of these other motifs of interest without reloading the ORF dataset, thus significantly reducing processing time."