effectr's People

Contributors

grunwald, neato-nick, tabima, zachary-foster, zkamvar

effectr's Issues

Fatal exception (source file esl_buffer.c, line 1599): zero malloc disallowed

Hi

I am trying to process a >900 MB protein file with your tool and am getting the error below.

I think it has something to do with hmmbuild. Could you please review it and let me know whether I can change some parameters to make this work? Currently I am using 15 threads.

Creating HMM profile

Fatal exception (source file esl_buffer.c, line 1599):
zero malloc disallowed
sh: line 1: 413532 Aborted '/cm/shared/apps/hmmer/3.2.1/bin/hmmbuild' '--amino' '--seed' '12345' 'hmmbuild.hmm' 'MAFFT.fasta' > /dev/null

Error: File existence/permissions problem in trying to open HMM file hmmbuild.hmm.
HMM file hmmbuild.hmm not found (nor an .h3m binary of it)

HMM profile created.

Starting HMM searches

Error: File existence/permissions problem in trying to open HMM file hmmbuild.hmm.
HMM file hmmbuild.hmm not found (nor an .h3m binary of it)

hmmsearch finished!
Error in hmm.search(original.seq = fasta.file, regex.seq = REGEX, num.threads = 15) :
HMM failed, please supply a valid absolute path to ORFs
Execution halted

System which calls

I see

system("which mafft", intern = T)

in the file test_effector_summary.R. I suspect this will not work on Windows.
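
One portable alternative is base R's Sys.which(), which looks an executable up on the PATH on Windows as well as on Unix-like systems. The sketch below is only an illustration of that idea; the helper name find_tool is hypothetical and is not part of effectR.

```r
# Hypothetical helper (not part of effectR): locate an executable on the PATH
# without shelling out to `which`, so it also works on Windows.
find_tool <- function(name) {
  path <- unname(Sys.which(name))   # "" when the executable is not found
  if (!nzchar(path)) NA_character_ else path
}

mafft.exe <- find_tool("mafft")
if (is.na(mafft.exe)) {
  message("mafft was not found on the PATH")
}
```

Returning NA instead of an empty string makes a missing tool easy to detect before any path is built from the result.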

Adapt shiny app

Theoretical limitations:

  • How to use parameters in shiny via app launch?

Regarding regex pattern

Good evening, sir.
This is Ramakrishna.
"^\w{10,40}\w{1,96}R\wLR\w{1,40}eer": this regex pattern is returning proteins that have only the RXLR motif and proteins that have only the EER motif, separately.

Replace perl one-liners with R equivalent?

Does Perl come preinstalled on Windows? It is preinstalled on my computer. If it is not always present, and the system(paste0("perl -pi -e 's/ {2,}/\t/g' ",hmmbuild.out)) calls in hmm.search can be replaced with R code (I think it should be easyish), it might be worth the effort to avoid a dependency.
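
A minimal sketch of what such a replacement could look like in base R, assuming the Perl one-liner only needs to turn every run of two or more spaces into a tab and rewrite the file in place. The function name replace_spaces_with_tabs is hypothetical.

```r
# Hypothetical base-R stand-in for: perl -pi -e 's/ {2,}/\t/g' <file>
# Reads the file, replaces each run of two or more spaces with one tab,
# and writes the result back to the same path.
replace_spaces_with_tabs <- function(path) {
  lines <- readLines(path)
  writeLines(gsub(" {2,}", "\t", lines), path)
  invisible(path)
}
```

Single spaces are left untouched, which matches the {2,} quantifier in the original one-liner.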

HMM failed, please supply a valid absolute path to ORFs

Hi, I am trying to use effectR and am getting the following error:

library(effectR)
fasta.file <- "HSM6XRQW_contigs_pro.fasta"
ORF <- seqinr::read.fasta(fasta.file)
REGEX <- regex.search(ORF, motif = "custom", reg.pat = "PAAR")

candidate.paar <- hmm.search(original.seq = "HSM6XRQW_contigs_pro.fasta", regex.seq = REGEX)
No alignment file is provided. Starting alignment with MAFFT.

Starting MAFFT alignment.

Executing MAFFT
Please be patient
MAFFT alignment finished!
Starting HMM

Creating HMM profile

Working... done.
Pressed and indexed 1 HMMs (1 names).
Models pressed into binary file: hmmbuild.hmm.h3m
SSI index for binary model file: hmmbuild.hmm.h3i
Profiles (MSV part) pressed into: hmmbuild.hmm.h3f
Profiles (remainder) pressed into: hmmbuild.hmm.h3p
HMM profile created.

Starting HMM searches

Error: Failed to open sequence file HSM6XRQW_contigs_pro.fasta for reading

hmmsearch finished!
Error in hmm.search(original.seq = "HSM6XRQW_contigs_pro.fasta", regex.seq = REGEX) :
HMM failed, please supply a valid absolute path to ORFs


I have tried passing fasta.file instead of the literal file name, but I still get the same error.
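
The error message asks for an absolute path, and another report in this list traces the same message to a relative path. A small sketch of how a relative file name could be converted before it is passed to original.seq, using only base R; the helper name absolute_fasta_path is hypothetical, and whether this resolves the case above is an assumption.

```r
# Hypothetical helper: fail early if the file is missing, otherwise
# return its absolute path (forward slashes on every platform).
absolute_fasta_path <- function(path) {
  if (!file.exists(path)) {
    stop("FASTA file not found: ", path, call. = FALSE)
  }
  normalizePath(path, winslash = "/", mustWork = TRUE)
}

# Usage idea (the file must exist in the working directory):
# fasta.file <- absolute_fasta_path("HSM6XRQW_contigs_pro.fasta")
```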

Add a column with the length of the effectR candidate in effector.summary

Prof. Lina Quesada raised a very interesting question. The length of a candidate effector protein has been used as one of the thresholds for deciding whether the protein is a plausible effector. Is there a way to add the length of the candidate protein to the output of the effector.summary() function?
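
As a rough illustration of the idea, the sketch below computes one length per sequence from a named list of character vectors with one residue per element, which is assumed here to mirror what seqinr::read.fasta returns. The function name candidate_lengths and the column names are hypothetical; how the result would be merged into the effector.summary() output is not shown.

```r
# Hypothetical sketch: one row per candidate with its sequence length.
# `seqs` is assumed to be a named list of character vectors,
# one residue per element.
candidate_lengths <- function(seqs) {
  data.frame(
    name = names(seqs),
    length = unname(lengths(seqs)),
    stringsAsFactors = FALSE
  )
}

toy <- list(cand1 = c("m", "k", "r"), cand2 = c("m", "a"))
candidate_lengths(toy)
```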

Separation of MAFFT and HMM search

Just started trying out the package. Very easy to use, I love it.

The first time I ran hmm.search on my data, MAFFT finished successfully but then hmmsearch errored out. This was my own fault: I gave a relative path instead of an absolute path to the original.seq parameter leading to my original ORFs. But because the hmm.search function returned an error, it didn't return an object that had the finished alignment of regex candidate RXLRs. The second time around running hmm.search, I gave the correct path to my original ORFs, and after another round of MAFFT I got my HMM candidate RXLRs.

Since MAFFT doesn't need the original ORFs file, seems like users could save some time in not re-aligning their regex candidate RXLRs. A couple of ideas for solutions:

  1. Before running MAFFT, validate the path given to original.seq actually leads to a fasta file. This is probably the easiest solution to help out people like me who make simple mistakes
  2. If the MAFFT alignment succeeds when running hmm.search but the actual hmm search fails, maybe give a warning and return an object with only the Alignment and REGEX elements (and obviously without the HMM, HMM_Table elements)?
  3. Make another separate function to call MAFFT and save it into an object or file for executing with hmm.search. I know I could do this in the terminal itself. I've obviously got MAFFT installed, so I could just run the alignment outside the R environment and use the import options you've already got. But it might be nice to integrate it all into the R session. I kind of like options 1 and 2 better than this one.
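
Option 2 could be sketched roughly as follows. This is a generic illustration with base R's tryCatch, not effectR's code: align_step and search_step are hypothetical placeholders for the MAFFT and hmmsearch stages, and the element names simply echo those mentioned above.

```r
# Hypothetical sketch of option 2: keep the finished alignment even when
# the later search step fails, and report the failure as a warning.
run_with_partial_result <- function(align_step, search_step) {
  alignment <- align_step()   # expensive step, run once
  tryCatch(
    list(Alignment = alignment, HMM = search_step(alignment)),
    error = function(e) {
      warning("HMM search failed: ", conditionMessage(e), call. = FALSE)
      list(Alignment = alignment)   # partial result, no HMM element
    }
  )
}
```

A caller can then check whether the HMM element is present and, if not, fix the path and rerun only the search with the saved alignment.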

Issue related to HMM threshold

Good morning, sir. This is Ramakrishna.
"The high number of effector proteins predicted in the HMM step is a result of the low thresholds used by our package in order to obtain as many candidate effectors as possible."
Sir, what threshold did you use in this package?
What does "low threshold" mean?
One more question: how did you separate the non-redundant and redundant candidates?

Tests fail when mafft is not installed.

When running tests I receive:

Error: MAFFT not found in the specified path: '//mafft'
 Please check your MAFFT installation.
In addition: Warning messages:
1: running command 'which mafft' had status 1 
2: running command 'which hmmsearch' had status 1 
Execution halted

Exited with status 1.

This is reproduced as follows.

source('~/gits/effectR/tests/testthat/test_effector_summary.R')

Which dumps me into the browser on my system. Here I see:

Browse[1]> mafft.path
[1] "/"

which may be an unanticipated value?

Regarding regex pattern

Good evening, sir.
This is Ramakrishna.
"^\w{10,40}\w{1,96}R\wLR\w{1,40}eer": this regex pattern is returning proteins where
{1,96}R\wLR == only the RXLR motif is present, and
{1,40}eer == only the EER motif is present.

Sir, I actually need to search for proteins that match the complete pattern "^\w{10,40}\w{1,96}R\wLR\w{1,40}eer", not its two parts separately. I do not want proteins that have only RXLR or only EER.
I tried subsetting the complete motifs from the data frame, but I am still getting proteins with only RXLR, with only EER, and with both.
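
For reference, a small check with base R's grepl shows how the complete pattern behaves on toy sequences. This only tests the pattern itself; how regex.search applies it internally is not shown here, and the three sequences are made up for illustration.

```r
# The complete pattern from the question, with backslashes escaped for R.
pattern <- "^\\w{10,40}\\w{1,96}R\\wLR\\w{1,40}eer"

# Made-up toy sequences (not real proteins).
both      <- paste0(strrep("M", 15), "RSLR", strrep("A", 5), "EER", "KKK")
rxlr_only <- paste0(strrep("M", 15), "RSLR", strrep("A", 10))
eer_only  <- paste0(strrep("M", 15), strrep("A", 5), "EER")

# ignore.case = TRUE lets the lowercase "eer" match uppercase residues.
grepl(pattern, c(both, rxlr_only, eer_only), perl = TRUE, ignore.case = TRUE)
# TRUE FALSE FALSE
```

On these inputs the single pattern matches only the sequence that carries both motifs in the required order, so filtering a character vector with grepl and this pattern is one way to keep just the complete matches.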

Mafft alignment failed to save

I'm trying to run effectR to search for RXLR effectors. When I run the hmm.search function, it runs the MAFFT alignment but then fails to run HMMER because no alignment file is found. I have tried this again with the test infestans files included in the package, and it's still not working.
This is my workflow:

> fasta.file <- system.file("extdata", "test_infestans.fasta", package = "effectR")
> ORF <- seqinr::read.fasta(fasta.file)
> REGEX <- regex.search(ORF, motif="RxLR")
> candidate.rxlr <- hmm.search(original.seq = fasta.file, regex.seq = REGEX , alignment.file=NULL, save.alignment=T, mafft.path = "c:/mafft/", hmm.path = "c:/Users/alongmuir/Documents/hmmer3.0_windows/")

The alignment is then printed to the console, no candidate.rxlr file is made, and then I get the following message:

MAFFT alignment finished!
Starting HMM
---
Creating HMM profile

Error in hmm.search(original.seq = fasta.file, regex.seq = REGEX, alignment.file = NULL,  : 
  No MAFFT alignment found

I'm not sure if I'm doing something wrong here. Any help would be appreciated.

custom pattern not giving domain positions

Hi,

EffectR is a brilliant package for identifying effectors.
I just had a question regarding the identification of proteins with custom motifs. For some reason, when I use a custom pattern, the resulting table does not give the domain numbers and positions. I was wondering if there is some way to fix this.
Thank you!

Best,
Savithri
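
Until the table reports positions for custom patterns, one workaround is to compute them directly with base R's gregexpr. The sketch below is an assumption-laden illustration: motif_positions is a hypothetical helper, the sequence is made up, and PAAR is used only as an example custom pattern.

```r
# Hypothetical helper: start and end positions of every match of
# `pattern` in a single sequence string (1-based, inclusive).
motif_positions <- function(sequence, pattern) {
  hits <- gregexpr(pattern, sequence, perl = TRUE)[[1]]
  if (hits[1] == -1) {
    return(data.frame(start = integer(0), end = integer(0)))
  }
  starts <- as.integer(hits)
  data.frame(start = starts, end = starts + attr(hits, "match.length") - 1L)
}

motif_positions("MKPAARLLPAAR", "PAAR")
# two matches: positions 3-6 and 9-12
```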

CRAN issues

After attempts to submit the package to CRAN, I received an email with a problem in CRAN:

Found the following (possibly) invalid URLs:
 URL: https://cran.r-project.org/web/packages/seqinr/seqinr.pdf
   From: inst/doc/effectR.html
   Status: 200
   Message: OK
   CRAN URL not in canonical form
 The canonical URL of the CRAN page for a package is
   https://CRAN.R-project.org/package=pkgname

I think I solved the problem, but that might just mean that I'm omitting something or that there could be more potential problems, and I really don't want to get banned from CRAN.

I was wondering if @knausb, @zkamvar and @zachary-foster could help me review the package real quick and check that it's fine for CRAN. So far, I've successfully:

  • Built the package locally and via winbuilder
  • Passed the tests and checks locally
  • Installed and passed on Travis-CI (https://travis-ci.com/Tabima/effectR/jobs/105758494)
  • Built, installed and used the package in Windows, MacOSX and Linux (Ubuntu and CentOS)

Am I missing something else?

Close this "Does not find CRN domains in P. infestans proteome"

Forget about it. I guess I need to find ORFs with EMBOSS, not use genes predicted with AUGUSTUS on a masked genome.


Hi

I used your effectR for the first time, and it finds many RxLRs in around 20000 proteins. But when I switch to the "CRN" motif it finds none, although I can find many LFLAK domains manually. Not sure what I'm doing wrong.

Cheers
Erik

fasta.file = "/home/augustus.aa"
ORF <- seqinr::read.fasta(fasta.file)
length(ORF)
[1] 20834
crn.cand <- regex.search(ORF,motif = "CRN")
Error in regex.search(ORF, motif = "CRN") : No CRN sequences found.

From your publication:
"Changing the motif="RxLR" option to "CRN" or "custom" will allow the prediction of these other motifs of interest without reloading the ORF dataset, thus significantly reducing processing time."
