hajkd / ltrpred

De novo annotation of young retrotransposons

Home Page: https://hajkd.github.io/LTRpred/

License: GNU General Public License v2.0

R 78.33% UnrealScript 12.84% TeX 8.10% Dockerfile 0.73%

ltr ltr-transposons ltr-retrotransposons genome evolution pipeline diversification

ltrpred's Issues

Seeking advice for my test case, test run output validation, and post-processing of files

I am seeking help in a few different areas. Some context first: here are my LTR discovery goals. For my genome of interest, I want to report the following:
Type 1. Full-length LTRs
Type 2. (Internally) Truncated LTRs (both ends intact)
Type 3. Solo LTRs (missing one of the ends)
Type 4. Orphan LTRs (missing both ends, but with recognizable internal features)
Can your LTRpred tool report all 4 cases listed above, or only LTRs of type 1 and type 3?

In any case, installation of a few missing dependencies went smoothly. Most of them, I already had installed on my laptop. I ran LTRpred on a full-length genome, as a test run, with quick run syntax shown below.

library(LTRpred)
LTRpred(genome.file = "TEST.fasta", cluster = TRUE, cores = 4)

The run's STDOUT is shown in the attached file:
LTRpred_TEST_STDOUT_UPDATED.txt

The output folder contents are shown in another attached file:
LTRpred_FolderListing_Expanded.txt

Here are my observations about the output files / folders - could you please add your thoughts to these?

_ltrdigest/_index_ltrdigest.fsa : The suffixarray index file used to predict putative LTR retrotransposons with LTRdigest.

MISSING - Is this a problem? However, the derivative .fsa.* files are present, as shown below:

$ ls *_index_ltrdigest.fsa*
TEST_index_ltrdigest.fsa.des
TEST_index_ltrdigest.fsa.md5
TEST_index_ltrdigest.fsa.sds
TEST_index_ltrdigest.fsa.esq
TEST_index_ltrdigest.fsa.prj
TEST_index_ltrdigest.fsa.ssp

_ltrdigest/-ltrdigest_pdom__ali.fas : Stores the alignment information for all matches of the given protein domain model to the translations of all candidates.

MISSING - Is this a problem? However, other related files are generated, see below:

ls *-ltrdigest_pdom_*.fas
TEST-ltrdigest_pdom_RNase_H.fas           TEST-ltrdigest_pdom_RVT_2.fas             TEST-ltrdigest_pdom_rve.fas
TEST-ltrdigest_pdom_RNase_H_aa.fas        TEST-ltrdigest_pdom_RVT_2_aa.fas          TEST-ltrdigest_pdom_rve_aa.fas
TEST-ltrdigest_pdom_RVT_1.fas             TEST-ltrdigest_pdom_Retrotrans_gag.fas
TEST-ltrdigest_pdom_RVT_1_aa.fas          TEST-ltrdigest_pdom_Retrotrans_gag_aa.fas

*_orfs_nt.fsa : Stores the predicted open reading frames within the predicted LTR transposons as DNA sequence.

MISSING - Is this a problem?

*_orfs_aa.fsa : Stores the predicted open reading frames within the predicted LTR transposons as protein sequence.

MISSING - Is this a problem?

*_LTRpred_DataSheet.csv : Stores the output table as data sheet.

MISSING - Is this a problem?

However, a tsv file was created. Can this tsv output be used interchangeably for downstream processing, instead of the expected csv? Or do I need to use your `pred2csv` function?

To generate a GFF file containing the 4 different types of LTRs based on their degree of sequence completeness (see top of this post), has LTRpred already generated files that I can parse to glean this information using sed, awk, Perl, etc.? We are total newbies at R.

If not, I'd like to first start with generating solo-LTR annotation. My understanding is that for this, I would need to use ltr.cn, am I right? If not, what functions do I use? Where can I find this info?

     ltr.cn(data.sheet, LTR.fasta_3ltr, LTR.fasta_5ltr, genome,
       ltr.similarity = 70, scope.cutoff = 0.85, perc.ident.cutoff = 70,
       output = NULL, max.hits = 65000, eval = 1e-10, cores = 1)

Thanks, in advance.
Charlotte
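
For the sed/awk question above, a minimal sketch of a first step might be to tally which feature types a prediction GFF actually contains. This assumes the file is tab-separated GFF3 with the feature type in column 3. The function name `gff_feature_counts` and the example path are hypothetical, and the tally alone does not classify elements into the four completeness types listed in the post.

```shell
# Tally GFF3 feature types (column 3), skipping comment/directive lines.
# Assumption: tab-separated GFF3 with 9 columns per feature line.
# gff_feature_counts is a hypothetical helper, not part of LTRpred.
gff_feature_counts() {
  awk -F'\t' '!/^#/ && NF >= 9 { n[$3]++ }
              END { for (t in n) print n[t] "\t" t }' "$1" |
    LC_ALL=C sort -k2,2
}

# Example call (hypothetical path, adjust to your own output folder):
# gff_feature_counts path/to/your_LTRdigestPrediction.gff
```

Each output line is a count followed by a feature type, which shows at a glance whether features such as LTRs or target site duplications were annotated before any further filtering is attempted.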

How to pull LTRpred in singularity

Hi,
Thank you for your software.

Since I had problems accessing files with Docker, I installed Singularity, but I am not able to pull the LTRpred image with it.

Please give me your suggestions.

kind regards ramky

How to fix this run error?

Hi,

How can I fix this run error when executing the S. cerevisiae test case with LTRpred?

BTW, the 1st test with Hsapiens_ChrY_ltrpred input completed without any error or warning messages. More details below. If you need any additional information, please let me know.

Please help. Thank you, in advance.

P.S. I much prefer using the R terminal rather than a Docker container.

R version information (updated just today)

$ R --version
R version 4.0.2 (2020-06-22) -- "Taking Off Again"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin17.0 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under the terms of the
GNU General Public License versions 2 or 3.
For more information about these matters see
https://www.gnu.org/licenses/.

STDOUT of failed test run using downloaded S. cerevisiae genome as input

> Scerevisiae_genome
[1] "_ncbi_downloads/genomes/Saccharomyces_cerevisiae_genomic_refseq.fna.gz"
> LTRpred(
+     genome.file = Scerevisiae_genome,
+     trnas       = paste0(system.file("tRNAs/", package = "LTRpred"),"sacCer3-tRNAs.fa"),
+     hmms        = paste0(system.file("HMMs/", package = "LTRpred"), "hmm_*"),
+     cluster     = TRUE,
+     clust.sim   = 0.9,
+     copy.number.est = TRUE,  
+     cores = 4
+ )
vsearch v2.14.1_macos_x86_64, 8.0GB RAM, 4 cores
https://github.com/torognes/vsearch

Running LTRpred on genome '_ncbi_downloads/genomes/Saccharomyces_cerevisiae_genomic_refseq.fna.gz' with 4 core(s) and searching for retrotransposons using the overlaps option (overlaps = 'no') ...


The output folder '/Users/anand/Desktop/Mtr_Nod_TC_Working_Docs/LTRpred/Saccharomyces_cerevisiae_genomic_refseq_ltrpred' does not seem to exist yet and will be created ...


LTRpred - Step 1:
Run LTRharvest...
LTRharvest: Generating index file Saccharomyces_cerevisiae_genomic_refseq_ltrharvest/Saccharomyces_cerevisiae_genomic_refseq_index.fsa with gt suffixerator...
Running LTRharvest and writing results to Saccharomyces_cerevisiae_genomic_refseq_ltrharvest...
LTRharvest analysis finished!


LTRpred - Step 2:
Run LTRdigest...
Generating index file Saccharomyces_cerevisiae_genomic_refseq_ltrdigest/Saccharomyces_cerevisiae_genomic_refseq_index_ltrdigest.fsa with suffixerator...
LTRdigest: Sort index file...
Running LTRdigest and writing results to Saccharomyces_cerevisiae_genomic_refseq_ltrdigest...
LTRdigest analysis finished!


LTRpred - Step 3:
Import LTRdigest Predictions...

Input:  Saccharomyces_cerevisiae_genomic_refseq_ltrdigest/Saccharomyces_cerevisiae_genomic_refseq_LTRdigestPrediction.gff  -> Row Number:  283
Remove 'NA' -> New Row Number:  283
(1/8) Filtering for repeat regions has been finished.
(2/8) Filtering for LTR retrotransposons has been finished.
(3/8) Filtering for inverted repeats has been finished.
(4/8) Filtering for LTRs has been finished.
(5/8) Filtering for target site duplication has been finished.
(6/8) Filtering for primer binding site has been finished.
(7/8) Filtering for protein match has been finished.
(8/8) Filtering for RR tract has been finished.


LTRpred - Step 4:
Perform ORF Prediction using 'usearch -fastx_findorfs' ...
usearch v11.0.667_i86osx32, 4.0Gb RAM (8.6Gb total), 4 cores
(C) Copyright 2013-18 Robert C. Edgar, all rights reserved.
https://drive5.com/usearch

License: [email protected]

00:00 2.2Mb   100.0% Working

WARNING: Input has lower-case masked sequences

Join ORF Prediction table: nrow(df) = 36 candidates.
unique(ID) = 36 candidates.
unique(orf.id) = 36 candidates.
Perform clustering of similar LTR transposons using 'vsearch --cluster_fast' ...
vsearch v2.14.1_macos_x86_64, 8.0GB RAM, 4 cores
https://github.com/torognes/vsearch

Running CLUSTpred with 90% as sequence similarity threshold using 4 cores ...
vsearch v2.14.1_macos_x86_64, 8.0GB RAM, 4 cores
https://github.com/torognes/vsearch

Reading file /Users/anand/Desktop/Mtr_Nod_TC_Working_Docs/LTRpred/Saccharomyces_cerevisiae_genomic_refseq_ltrdigest/Saccharomyces_cerevisiae_genomic_refseq-ltrdigest_complete.fas 100%
248182 nt in 36 seqs, min 5189, max 24168, avg 6894
Sorting by length 100%
Counting k-mers 100% 
Clustering 100%  
Sorting clusters 100%
Writing clusters 100% 
Clusters: 10 Size min 1, max 19, avg 3.6
Singletons: 7, 19.4% of seqs, 70.0% of clusters
Sorting clusters by abundance 100%
CLUSTpred output has been stored in: /Users/anand/Desktop/Mtr_Nod_TC_Working_Docs/LTRpred/Saccharomyces_cerevisiae_genomic_refseq_ltrpred
Join Cluster table: nrow(df) = 36 candidates.
unique(ID) = 36 candidates.
unique(orf.id) = 36 candidates.
Error: Problem with `summarise()` input `Clust_cn`.
✖ could not find function "n"
ℹ Input `Clust_cn` is `n()`.
ℹ The error occurred in group 1: Clust_Cluster = "cl_3".
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/dplyr_error>
Problem with `summarise()` input `Clust_cn`.
✖ could not find function "n"
ℹ Input `Clust_cn` is `n()`.
ℹ The error occurred in group 1: Clust_Cluster = "cl_3".
Backtrace:
 1. LTRpred::LTRpred(...)
 8. base::.handleSimpleError(...)
 9. dplyr:::h(simpleError(msg, call))
Run `rlang::last_trace()` to see the full context.
> rlang::last_trace()
<error/dplyr_error>
Problem with `summarise()` input `Clust_cn`.
✖ could not find function "n"
ℹ Input `Clust_cn` is `n()`.
ℹ The error occurred in group 1: Clust_Cluster = "cl_3".
Backtrace:
    █
 1. ├─LTRpred::LTRpred(...)
 2. │ ├─dplyr::filter(...)
 3. │ ├─dplyr::summarise(dplyr::group_by(res, Clust_Cluster), Clust_cn = n())
 4. │ └─dplyr:::summarise.grouped_df(...)
 5. │   └─dplyr:::summarise_cols(.data, ...)
 6. │     ├─base::withCallingHandlers(...)
 7. │     └─mask$eval_all_summarise(quo)
 8. └─base::.handleSimpleError(...)
 9.   └─dplyr:::h(simpleError(msg, call))

Session Information

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] LTRpred_1.1.0  devtools_2.3.1 usethis_1.6.1 

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5           ape_5.4-1            lattice_0.20-41     
 [4] prettyunits_1.1.1    ps_1.3.4             Biostrings_2.56.0   
 [7] assertthat_0.2.1     rprojroot_1.3-2      digest_0.6.25       
[10] utf8_1.1.4           BiocFileCache_1.12.1 R6_2.4.1            
[13] backports_1.1.8      stats4_4.0.2         RSQLite_2.2.0       
[16] httr_1.4.2           pillar_1.4.6         zlibbioc_1.34.0     
[19] rlang_0.4.7          progress_1.2.2       curl_4.3            
[22] callr_3.4.3          blob_1.2.1           S4Vectors_0.26.1    
[25] desc_1.2.0           downloader_0.4       readr_1.3.1         
[28] stringr_1.4.0        bit_4.0.4            biomaRt_2.44.1      
[31] philentropy_0.4.0    compiler_4.0.2       askpass_1.1         
[34] pkgconfig_2.0.3      BiocGenerics_0.34.0  pkgbuild_1.1.0      
[37] tcltk_4.0.2          openssl_1.4.2        tidyselect_1.1.0    
[40] tibble_3.0.3         IRanges_2.22.2       XML_3.99-0.5        
[43] fansi_0.4.1          dbplyr_1.4.4         crayon_1.3.4        
[46] dplyr_1.0.2          biomartr_0.9.2       withr_2.2.0         
[49] rappdirs_0.3.1       grid_4.0.2           nlme_3.1-148        
[52] lifecycle_0.2.0      DBI_1.1.0            magrittr_1.5        
[55] cli_2.0.2            stringi_1.4.6        XVector_0.28.0      
[58] fs_1.5.0             remotes_2.2.0        testthat_2.3.2      
[61] ellipsis_0.3.1       generics_0.0.2       vctrs_0.3.2         
[64] tools_4.0.2          bit64_4.0.2          Biobase_2.48.0      
[67] glue_1.4.1           purrr_0.3.4          hms_0.5.3           
[70] processx_3.4.3       pkgload_1.1.0        parallel_4.0.2      
[73] AnnotationDbi_1.50.3 BiocManager_1.30.10  sessioninfo_1.1.1   
[76] memoise_1.1.0

ltrpred_data folder contents

Saccharomyces_cerevisiae_genomic_refseq_ltrpred:
total 464
-rw-r--r--   1 anand  staff   1.3K Aug 20 13:25 Saccharomyces_cerevisiae_genomic_refseq-ltrdigest_complete.fas_CLUSTpred.log
-rw-r--r--   1 anand  staff   4.1K Aug 20 13:25 Saccharomyces_cerevisiae_genomic_refseq-ltrdigest_complete.fas_CLUSTpred.uc
-rw-r--r--   1 anand  staff   2.7K Aug 20 13:25 Saccharomyces_cerevisiae_genomic_refseq-ltrdigest_complete.fas_CLUSTpred.blast6out
drwxr-xr-x   6 anand  staff   192B Aug 20 13:11 .
-rw-r--r--   1 anand  staff   213K Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq-ltrdigest_complete.fas_ORF_prediction_nt.fsa
drwxr-xr-x@ 24 anand  staff   768B Aug 20 13:11 ..

Saccharomyces_cerevisiae_genomic_refseq_ltrdigest:
total 6912
-rw-r--r--   1 anand  staff    28K Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq_LTRdigestPrediction.gff
-rw-r--r--   1 anand  staff   248K Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq-ltrdigest_complete.fas
-rw-r--r--   1 anand  staff    16K Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq-ltrdigest_3ltr.fas
-rw-r--r--   1 anand  staff    15K Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq-ltrdigest_5ltr.fas
-rw-r--r--   1 anand  staff   106B Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq-ltrdigest_ppt.fas
-rw-r--r--   1 anand  staff   112B Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq-ltrdigest_pbs.fas
-rw-r--r--   1 anand  staff   5.5K Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq-ltrdigest_tabout.csv
-rw-r--r--   1 anand  staff    19K Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq-ltrdigest_pdom_RVT_2.fas
-rw-r--r--   1 anand  staff    27K Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq-ltrdigest_pdom_RVT_2.ali
-rw-r--r--   1 anand  staff   7.1K Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq-ltrdigest_pdom_RVT_2_aa.fas
-rw-r--r--   1 anand  staff    12K Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq-ltrdigest_pdom_rve.fas
-rw-r--r--   1 anand  staff    16K Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq-ltrdigest_pdom_rve.ali
-rw-r--r--   1 anand  staff   4.7K Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq-ltrdigest_pdom_rve_aa.fas
-rw-r--r--   1 anand  staff   522B Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq-ltrdigest_pdom_RVT_1.fas
-rw-r--r--   1 anand  staff   912B Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq-ltrdigest_pdom_RVT_1.ali
-rw-r--r--   1 anand  staff   198B Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq-ltrdigest_pdom_RVT_1_aa.fas
drwxr-xr-x  28 anand  staff   896B Aug 20 13:11 .
-rw-r--r--   1 anand  staff   306B Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq-ltrdigest_pdom_Retrotrans_gag.fas
-rw-r--r--   1 anand  staff   443B Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq-ltrdigest_pdom_Retrotrans_gag.ali
-rw-r--r--   1 anand  staff   126B Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq-ltrdigest_pdom_Retrotrans_gag_aa.fas
-rw-r--r--   1 anand  staff   1.9K Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq-ltrdigest_conditions.csv
-rw-r--r--   1 anand  staff   433B Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq_index_ltrdigest.fsa.prj
-rw-r--r--   1 anand  staff    72B Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq_index_ltrdigest.fsa.ssp
-rw-r--r--   1 anand  staff   2.9M Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq_index_ltrdigest.fsa.esq
-rw-r--r--   1 anand  staff   561B Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq_index_ltrdigest.fsa.md5
-rw-r--r--   1 anand  staff   128B Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq_index_ltrdigest.fsa.sds
-rw-r--r--   1 anand  staff   1.3K Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq_index_ltrdigest.fsa.des
drwxr-xr-x@ 24 anand  staff   768B Aug 20 13:11 ..

Saccharomyces_cerevisiae_genomic_refseq_ltrharvest:
total 236808
-rw-r--r--   1 anand  staff    21K Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq_Prediction_sorted.gff
drwxr-xr-x  16 anand  staff   512B Aug 20 13:11 .
drwxr-xr-x@ 24 anand  staff   768B Aug 20 13:11 ..
-rw-r--r--   1 anand  staff   4.3K Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq_Details.tsv
-rw-r--r--   1 anand  staff   222K Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq_BetweenLTRSeqs.fsa
-rw-r--r--   1 anand  staff   250K Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq_FullLTRretrotransposonSeqs.fsa
-rw-r--r--   1 anand  staff    21K Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq_Prediction.gff
-rw-r--r--   1 anand  staff   4.0M Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq_index.fsa.llv
-rw-r--r--   1 anand  staff    12M Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq_index.fsa.lcp
-rw-r--r--   1 anand  staff   465B Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq_index.fsa.prj
-rw-r--r--   1 anand  staff    93M Aug 20 13:11 Saccharomyces_cerevisiae_genomic_refseq_index.fsa.suf
-rw-r--r--   1 anand  staff    72B Aug 20 13:10 Saccharomyces_cerevisiae_genomic_refseq_index.fsa.ssp
-rw-r--r--   1 anand  staff   2.9M Aug 20 13:10 Saccharomyces_cerevisiae_genomic_refseq_index.fsa.esq
-rw-r--r--   1 anand  staff   561B Aug 20 13:10 Saccharomyces_cerevisiae_genomic_refseq_index.fsa.md5
-rw-r--r--   1 anand  staff   128B Aug 20 13:10 Saccharomyces_cerevisiae_genomic_refseq_index.fsa.sds
-rw-r--r--   1 anand  staff   1.3K Aug 20 13:10 Saccharomyces_cerevisiae_genomic_refseq_index.fsa.des

_ncbi_downloads:
total 0
drwxr-xr-x@ 24 anand  staff   768B Aug 20 13:11 ..
drwxr-xr-x   6 anand  staff   192B Aug 20 13:10 genomes
drwxr-xr-x   3 anand  staff    96B Aug 20 13:10 .

Hsapiens_ChrY_ltrpred:
total 136
drwxr-xr-x@ 24 anand  staff   768B Aug 20 13:11 ..
-rw-r--r--   1 anand  staff   1.4K Aug 20 12:55 Hsapiens_ChrY_LTRpred.bed
-rw-r--r--   1 anand  staff    13K Aug 20 12:55 Hsapiens_ChrY_LTRpred.gff
-rw-r--r--   1 anand  staff    16K Aug 20 12:55 Hsapiens_ChrY_LTRpred_DataSheet.tsv
drwxr-xr-x  12 anand  staff   384B Aug 20 12:55 .
-rw-r--r--   1 anand  staff    11K Aug 20 12:55 Hsapiens_ChrY-ltrdigest_complete.fas_ORF_prediction_nt.fsa
drwxr-xr-x  18 anand  staff   576B Aug 20 12:48 Hsapiens_ChrY_ltrdigest
drwxr-xr-x  15 anand  staff   480B Aug 20 12:47 Hsapiens_ChrY_ltrharvest
-rw-r--r--   1 anand  staff   850B Aug 19 12:30 CLUSTpred.log
-rw-r--r--   1 anand  staff   2.7K Aug 19 12:30 CLUSTpred.uc
-rw-r--r--   1 anand  staff     0B Aug 19 12:11 CLUSTpred.blast6out
-rw-r--r--   1 anand  staff    11K Aug 19 12:11 Hsapiens_ChrY-ltrdigest_complete.fas_nt.fsa

ltrdigest_complete.fas_DfamAnnotation.out' does not exist

Hi, LTRpred is crashing after step 4. Dfam was downloaded manually and placed in the directory where LTRpred is run, and it is passed with annotate = "Dfam", Dfam.db = "dfam" in the LTRpred R script.

:~/ltrpred$ ls -lht dfam
Dfam.hmm.h3f
Dfam.hmm.h3i
Dfam.hmm.h3m
Dfam.hmm.h3p
Dfam.hmm

perl /usr/local/bin/dfamscan.pl -help
Command line options for controlling /usr/local/bin/dfamscan.pl
-------------------------------------------------------------------------------
   --help       : prints this help messeage
   --version    : prints version information for this program and
                  both nhmmscan and trf
   Requires either
    --dfam_infile <s>    Use this is you've already run nhmmscan, and
                         just want to perfom dfamscan filtering/sorting.
                         The file must be the one produced by nhmmscan's
                         --dfamtblout flag.
                         (Note: must be nhmmscan output, not nhmmer output)
   or both of these
    --fastafile <s>      Use these if you want dfamscan to control a
    --hmmfile <s>        run of nhmmscan, then do filtering/sorting

`LTRpred - Step 4:
Perform ORF Prediction using 'usearch -fastx_findorfs' ...
00:02 121Mb   100.0% Working

WARNING: Input has lower-case masked sequences

Join ORF Prediction table: nrow(df) = 4828 candidates.
unique(ID) = 4828 candidates.
unique(orf.id) = 4828 candidates.

A HMMer search against the Dfam database located at 'dfam' using 16 cores is performed to annotate de novo predicted retrotransposons ...
Run Dfam scan...
Fatal exception (source file esl_hmm.c, line 198):
malloc of size -307968 failed
Aborted (core dumped)
Error running command:
nhmmscan --noali -E 0.001 --dfamtblout /tmp/nXeK2iJYcP --cpu=16 dfam/Dfam.hmm /home/ltrpred/epo_ltrdigest/epo-ltrdigest_complete.fas
Finished Dfam scan!
A dfam query file has been generated and stored at/home/ltrpred/epo-ltrdigest_complete.fas_DfamAnnotation.out.

Error: The file '/home/ltrpred/epo-ltrdigest_complete.fas_DfamAnnotation.out' does not exist! Please check the correct path to the dfam.file.

In addition: Warning message:
`data_frame()` is deprecated as of tibble 1.1.0.
Please use `tibble()` instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
Execution halted`

Any suggestions, please?

Running LTRpred docker container in userspace with udocker

The LTRpred docker container can be run on systems where docker is not available and root privileges are not an option via udocker. Udocker is written in Python and can be installed via pip. The following sequence of commands starts up the LTRpred container in user space:

# Create and activate conda environment, optional
conda create -n udocker_prod -c defaults python=2.7
conda activate udocker_prod

# Install most recent version from udocker github repo
pip install git+https://github.com/indigo-dc/udocker

# Prepare container
udocker pull drostlab/ltrpred
udocker create --name=ltrpred drostlab/ltrpred

# Required on some systems to run the container
export PROOT_NO_SECCOMP=1

# Run container
udocker run ltrpred

Paper typo

Dear @HajkD,

I believe there is a typo here:
https://github.com/HajkD/LTRpred/blob/master/paper.md line 33

Here, we introduce the LTRpred pipeline which allows to de novo annotate functional and thuse potentially...

Fasta header truncated in ltrdigest_tabout.csv

Hi,

I found that the FASTA header is truncated in ltrdigest_tabout.csv.

$ awk '{print $4}' ltrdigest_tabout.csv | head
acc|NEIGHBOR|GQ36565
acc|NEIGHBOR|GQ36825
acc|GENBANK|HQ704802
acc|GENBANK|HQ704802
acc|GENBANK|HQ704802
acc|NEIGHBOR|GU11155
acc|GENBANK|GU385355
acc|GENBANK|GU385356
acc|GENBANK|GU385357

The original fasta header is like this:

acc|GENBANK|HQ704802.1|Organic Lake phycodnavirus 1 genomic sequence.|Organic Lake phycodnavirus 1|ENV|25-JUL-2016
acc|NEIGHBOR|GQ365650.1|HIV-1 isolate 05.BR.NSP24 from Brazil, complete genome.|Human immunodeficiency virus 1|VRL|20-DEC-2009
acc|NEIGHBOR|GQ365651.1|HIV-1 isolate 01.BR.RGS45 from Brazil, complete genome.|Human immunodeficiency virus 1|VRL|20-DEC-2009
acc|NEIGHBOR|GQ365652.1|HIV-1 isolate 01.BR.RGS69 from Brazil, complete genome.|Human immunodeficiency virus 1|VRL|20-DEC-2009
acc|NEIGHBOR|GQ368252.1|Avian adeno-associated virus isolate YZ-1, complete genome.|Avian adeno-associated virus|VRL|05-JAN-2011
acc|NEIGHBOR|GU111555.1|HIV-1 isolate RBF168 from France, complete genome.|Human immunodeficiency virus 1|VRL|24-JUL-2016
acc|GENBANK|GU385355.1|Equine infectious anemia virus isolate FDDV-2 tat (s1) and gag protein (gag) genes, complete cds; pol polyprotein (pol) gene, partial cds; and S2 (s2), truncated envelope polyprotein (env), and Rev (s3) genes, complete cds.|Equine infectious anemia virus|VRL|25-JUL-2016
acc|GENBANK|GU385356.1|Equine infectious anemia virus isolate FDDV-15-4 tat (s1) and gag protein (gag) genes, complete cds; pol polyprotein (pol) gene, partial cds; and S2 (s2) and truncated envelope polyprotein (env) genes, complete cds.|Equine infectious anemia virus|VRL|25-JUL-2016
acc|GENBANK|GU385357.1|Equine infectious anemia virus isolate FDDV-7 tat (s1) and gag protein (gag) genes, complete cds; pol polyprotein (pol) gene, partial cds; and S2 (s2), truncated envelope polyprotein (env), and Rev (s3) genes, complete cds.|Equine infectious anemia virus|VRL|25-JUL-2016

Would it be possible to include the whole FASTA header in ltrdigest_tabout.csv?
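
The values printed above are all exactly 20 characters long, which suggests a fixed-width cut. One possible workaround, sketched here under the assumption that headers follow the `acc|SOURCE|ACCESSION|...` pattern shown above, is to shorten each header to its accession before running the pipeline and to keep a lookup table for restoring the full header afterwards. The function name `shorten_fasta_headers` and all file names are hypothetical. This does not change how LTRdigest writes its table.

```shell
# Rewrite each FASTA header to its third '|'-separated field (the accession)
# and write a lookup table: short ID <TAB> original header.
# Assumption: headers look like ">acc|GENBANK|HQ704802.1|...".
# Headers without a third field fall back to their first field.
shorten_fasta_headers() {
  in_fa=$1; out_fa=$2; map_tsv=$3
  awk -v map="$map_tsv" '
    /^>/ {
      full = substr($0, 2)            # header without the leading ">"
      split(full, f, "|")
      id = (f[3] != "") ? f[3] : f[1]
      print id "\t" full > map        # remember the original header
      print ">" id
      next
    }
    { print }                         # sequence lines pass through unchanged
  ' "$in_fa" > "$out_fa"
}

# Example call (hypothetical file names):
# shorten_fasta_headers input.fasta input_short.fasta header_map.tsv
```

The lookup table can later be joined against the fourth column of ltrdigest_tabout.csv to recover the full description for each hit.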

Error in .normarg_input_filepath(filepath)

Hi @HajkD,
I keep getting this error in the filtering step just before usearch clustering.
Which file is failing to be parsed at this stage?

Input:  /disk2/nguinkal/Zander_Project/pipelines/LTRPred/Sluc_ltrdigest/Sluc_LTRdigestPrediction.gff  -> Row Number:  115807
Remove 'NA' -> New Row Number:  115807
(1/8) Filtering for repeat regions has been finished.
(2/8) Filtering for LTR retrotransposons has been finished.
(3/8) Filtering for inverted repeats has been finished.
(4/8) Filtering for LTRs has been finished.
(5/8) Filtering for target site duplication has been finished.
(6/8) Filtering for primer binding site has been finished.
(7/8) Filtering for protein match has been finished.
(8/8) Filtering for RR tract has been finished.
Error in .normarg_input_filepath(filepath) : 
  'filepath' must be a character vector with no NAs
Calls: LTRpred ... fasta.index -> open_input_files -> .normarg_input_filepath
Execution halted

Best,
Julien

License in the repository is missing

JOSS requires a plain-text LICENSE file with the contents of an OSI approved software license

R installation instructions need to be updated

When tested on a clean R/Rstudio installation (rocker/rstudio), the installation instructions require manual intervention

install.packages(c("tidyverse", "data.table", "seqinr", "biomartr", "ape", "dtplyr", "devtools"))

ERROR: dependencies ‘biomaRt’, ‘Biostrings’ are not available for package ‘biomartr’
* removing ‘/usr/local/lib/R/site-library/biomartr’
Warning in install.packages :
  installation of package ‘biomartr’ had non-zero exit status

Installing biomaRt and Biostrings first and then installing biomartr worked. Please adjust the instructions accordingly.

"Error: Tibble columns must have compatible sizes" at Step 6

Hi,
I am getting an error that causes the pipeline to terminate prematurely.

Here is my R script:

library(LTRpred)
LTRpred(genome.file = "felv.fasta")

And the LTRpred log:

$ Rscript test_simple.r
Warning message:
package ‘LTRpred’ was built under R version 4.0.3
vsearch v2.17.0_linux_x86_64, 1006.5GB RAM, 112 cores
https://github.com/torognes/vsearch

Running LTRpred on genome 'felv.fasta' with 1 core(s) and searching for retrotransposons using the overlaps option (overlaps = 'no') ...

No hmm files were specified, thus the internal HMM library will be used! See '/home/khanlab/anaconda3/envs/RVDBAnnotation/lib/R/library/LTRpred/HMMs/hmm_*' for details.
No tRNA files were specified, thus the internal tRNA library will be used! See '/home/khanlab/anaconda3/envs/RVDBAnnotation/lib/R/library/LTRpred/tRNAs/tRNA_library.fa' for details.
The output folder '/home/khanlab/users/pei-ju.chin/projects/ltr_pred/FeLv_test/felv_ltrpred' does not seem to exist yet and will be created ...

LTRpred - Step 1:
Run LTRharvest...
LTRharvest: Generating index file felv_ltrharvest/felv_index.fsa with gt suffixerator...
Running LTRharvest and writing results to felv_ltrharvest...
LTRharvest analysis finished!

LTRpred - Step 2:
Run LTRdigest...
Generating index file felv_ltrdigest/felv_index_ltrdigest.fsa with suffixerator...
LTRdigest: Sort index file...
Running LTRdigest and writing results to felv_ltrdigest...
LTRdigest analysis finished!

LTRpred - Step 3:
Import LTRdigest Predictions...

Input: felv_ltrdigest/felv_LTRdigestPrediction.gff -> Row Number: 247
Remove 'NA' -> New Row Number: 247
(1/8) Filtering for repeat regions has been finished.
(2/8) Filtering for LTR retrotransposons has been finished.
(3/8) Filtering for inverted repeats has been finished.
(4/8) Filtering for LTRs has been finished.
(5/8) Filtering for target site duplication has been finished.
(6/8) Filtering for primer binding site has been finished.
(7/8) Filtering for protein match has been finished.
(8/8) Filtering for RR tract has been finished.

LTRpred - Step 4:
Perform ORF Prediction using 'usearch -fastx_findorfs' ...
usearch v11.0.667_i86linux32, 4.0Gb RAM (1055Gb total), 112 cores
(C) Copyright 2013-18 Robert C. Edgar, all rights reserved.
https://drive5.com/usearch

License: personal use only

00:00 37Mb 100.0% Working

WARNING: Input has lower-case masked sequences

Join ORF Prediction table: nrow(df) = 52 candidates.
unique(ID) = 22 candidates.
unique(orf.id) = 42 candidates.

LTRpred - Step 5:
Perform methylation context quantification..
Join methylation context (CG, CHG, CHH, CCG) count table: nrow(df) = 52 candidates.
unique(ID) = 22 candidates.
unique(orf.id) = 42 candidates.
Copy files to result folder '/home/khanlab/users/pei-ju.chin/projects/ltr_pred/FeLv_test/felv_ltrpred'.

LTRpred - Step 6:
Starting retrotransposon evolutionary age estimation by comparing the 3' and 5' LTRs using the molecular evolution model 'K80' and the mutation rate '1.3e-07' (please make sure the mutation rate can be assumed for your species of interest!) for 52 predicted elements ...

Please be aware that evolutionary age estimation based on 3' and 5' LTR comparisons are only very rough time estimates and don't take reverse-transcription mediated retrotransposon recombination between family members of retroelements into account! Please consult Sanchez et al., 2017 Nature Communications and Drost & Sanchez, 2019 Genome Biology and Evolution for more details on retrotransposon recombination.
Error: Tibble columns must have compatible sizes.

Size 52: Existing data.
Size 31: Column ltr_name.
ℹ Only values of size one are recycled.
Backtrace:
█
└─LTRpred::LTRpred(genome.file = "felv.fasta")
  └─LTRpred::ltr_age_estimation(...)
    └─tibble::tibble(...)
      └─tibble:::tibble_quos(xs[!is.null], .rows, .name_repair)
        └─tibble:::vectbl_recycle_rows(res, first_size, j, given_col_names[[j]])

Warning message:
data_frame() was deprecated in tibble 1.1.0.
Please use tibble() instead.
This warning is displayed once every 8 hours.
Call lifecycle::last_warnings() to see where this warning was generated.
Execution halted

I am not interested in evolutionary age estimation in Step 6. Is it possible to bypass this step?

Thanks!

Installation instructions of dependencies should be updated

The LTRpred package requires six command-line tools (see https://hajkd.github.io/LTRpred/articles/Introduction.html#installation). The installation instructions need to be updated because they refer to tool versions that are two to three releases older than the current ones, so the documented commands no longer work. For example:

wget ftp://selab.janelia.org/pub/software/hmmer3/3.1b2/hmmer-3.1b2.tar.gz

doesn't work. Only the latest version can be downloaded:

wget http://eddylab.org/software/hmmer/hmmer-3.3.tar.gz

The same applies to GenomeTools and the other tools, so their installation instructions should be updated as well.

Issue with genome fasta header format

Dear Hajk,

I noticed a problem with certain FASTA header formats. In my case, the headers are:

Chr1
Chr2
Chr3
Chr4
Chr5
Chr6
Chr7
Scaffold_1
Scaffold_2
Scaffold_3

The tool cuts off the part of the scaffold name after the underscore in the gff and bed result files, which makes it impossible to correctly locate the predicted LTR transposons.

Renaming the headers so that they no longer contain an underscore seems to fix the issue:

Chr1
Chr2
Chr3
Chr4
Chr5
Chr6
Chr7
Scaffold1
Scaffold2
Scaffold3

Thanks
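
The renaming workaround described above can be applied without editing the genome by hand. The following is a minimal shell sketch, not part of LTRpred: the function name `sanitize_fasta_headers` and the file names are hypothetical, and it assumes that header lines start with `>` as in standard FASTA.

```shell
# Hypothetical helper: remove underscores from FASTA header lines only.
# Lines that do not start with '>' (the sequence itself) are left untouched.
sanitize_fasta_headers() {
  sed '/^>/ s/_//g' "$1"
}

# Usage (file names are placeholders):
# sanitize_fasta_headers genome.fasta > genome.renamed.fasta
```

Removing underscores can in principle make two different headers identical, so it may be worth checking the result with `grep '^>' genome.renamed.fasta | sort | uniq -d`, which prints nothing when all headers are still unique. Keeping the original file makes it possible to map the renamed IDs in the gff and bed output back to the original scaffold names.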

Error in Join solo LTR Copy Number Estimation table

Hello @HajkD,
I keep getting this error at the "Join solo LTR Copy Number Estimation table" stage, right after "Finished LTR CNV estimation!":

Filter hit results...
Estimate CNV for each LTR sequence...
Finished LTR CNV estimation!
Join solo LTR Copy Number Estimation table: nrow(df) = 8387 candidates.
unique(ID) = 8387 candidates.
unique(orf.id) = 8387 candidates.
Error: Column `cn_3ltr` must be length 8387 (the number of rows) or one, not 0
Stop executing

Then I checked the intermediate files and found that G_soloLTRs_3ltr.bed and G_solo_LTRs_5ltr.bed are named slightly differently. The same difference appears in your source code:

    # write estimated solo LTR loci to LTRpred output folder
    cn2bed(
        solo.ltr.cn$pred_3ltr,
        type = "solo",
        filename = paste0(chopped.foldername, "_soloLTRs_3ltr"),
        output = output.path
    )

    cn2bed(
        solo.ltr.cn$pred_5ltr,
        type = "solo",
        filename = paste0(chopped.foldername, "_solo_LTRs_5ltr"),
        output = output.path
    )

Could this naming inconsistency in the code be the cause of the error? Thank you!

Best,
zyq

‘httpuv’, ‘sourcetools’ had non-zero exit status

How can I solve this problem? It occurs when installing LTRpred and starting the pipeline.

biocLite("HajkD/LTRpred")
BioC_mirror: https://bioconductor.org
Using Bioconductor 3.6 (BiocInstaller 1.28.0), R 3.4.4 (2018-03-15).
Installing github package(s) ‘HajkD/LTRpred’
Downloading GitHub repo HajkD/LTRpred@master
from URL https://api.github.com/repos/HajkD/LTRpred/zipball/master
Installing LTRpred
trying URL 'https://cloud.r-project.org/src/contrib/amap_0.8-14.tar.gz'
Content type 'application/x-gzip' length 259358 bytes (253 KB)
==================================================
downloaded 253 KB

Installing amap
'/usr/lib/R/bin/R' --no-site-file --no-environ --no-save --no-restore --quiet
CMD INSTALL '/tmp/Rtmp2VZBrC/devtools61bed5b1463/amap'
--library='/home/aset/R/x86_64-pc-linux-gnu-library/3.4' --install-tests

installing source package ‘amap’ ...
** package ‘amap’ successfully unpacked and MD5 sums checked
checking for gcc... gcc
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ANSI C... none needed
checking for pthread_create in -lpthread... yes
configure: creating ./config.status
config.status: creating src/Makevars
** libs
g++ -I/usr/share/R/include -DNDEBUG -I/usr/local/include/ -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c acprob.cpp -o acprob.o
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG -I/usr/local/include/ -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c burt.c -o burt.o
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG -I/usr/local/include/ -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c diss.c -o diss.o
g++ -I/usr/share/R/include -DNDEBUG -I/usr/local/include/ -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c distance.cpp -o distance.o
g++ -I/usr/share/R/include -DNDEBUG -I/usr/local/include/ -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c hclust.cpp -o hclust.o
g++ -I/usr/share/R/include -DNDEBUG -I/usr/local/include/ -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c kmeans.cpp -o kmeans.o
g++ -I/usr/share/R/include -DNDEBUG -I/usr/local/include/ -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c matrice.cpp -o matrice.o
gfortran -fpic -g -O2 -fstack-protector-strong -c pop.f -o pop.o
/bin/bash: gfortran: command not found
/usr/lib/R/etc/Makeconf:182: recipe for target 'pop.o' failed
make: *** [pop.o] Error 127
ERROR: compilation failed for package ‘amap’
removing ‘/home/aset/R/x86_64-pc-linux-gnu-library/3.4/amap’
Installation failed: Command failed (1)
'/usr/lib/R/bin/R' --no-site-file --no-environ --no-save --no-restore --quiet
CMD INSTALL '/tmp/Rtmp2VZBrC/devtools61be2461ee30/HajkD-LTRpred-fe9851d'
--library='/home/aset/R/x86_64-pc-linux-gnu-library/3.4' --install-tests

ERROR: dependency ‘amap’ is not available for package ‘LTRpred’

removing ‘/home/aset/R/x86_64-pc-linux-gnu-library/3.4/LTRpred’
Installation failed: Command failed (1)
installation path not writeable, unable to update packages: XML, cluster,
foreign, MASS, Matrix, nlme, survival
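
The key line in this log is the one reporting that `gfortran` could not be found: `amap` contains a Fortran source file (`pop.f`), so a Fortran compiler has to be available before `amap`, and therefore LTRpred, can be built. The sketch below only prints a suggested install command and does not run it. The function name is hypothetical, and the package names (`gfortran` on Debian/Ubuntu, and Homebrew's `gcc` formula, which bundles gfortran) are assumptions about the reader's system.

```shell
# Hypothetical helper: print (do not run) a hint for installing gfortran.
gfortran_hint() {
  if command -v gfortran >/dev/null 2>&1; then
    echo "gfortran found: $(command -v gfortran)"
  elif command -v apt-get >/dev/null 2>&1; then
    echo "try: sudo apt-get install gfortran"
  elif command -v brew >/dev/null 2>&1; then
    # Homebrew's gcc formula ships gfortran
    echo "try: brew install gcc"
  else
    echo "install a Fortran compiler (gfortran) with your system's package manager"
  fi
}

gfortran_hint
```

Once a compiler is installed, re-running the LTRpred installation should get past the `pop.o` step, although other missing system dependencies may still surface afterwards.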

could not install biocLite.R

> source("http://bioconductor.org/biocLite.R")
Error: With R version 3.5 or greater, install Bioconductor packages using BiocManager; see https://bioconductor.org/install

> BiocManager::install(c("biocLite.R"))
Bioconductor version 3.9 (BiocManager 1.30.7), R 3.6.1 (2019-07-05)
Installing package(s) 'biocLite.R'
Warning message:
package ‘biocLite.R’ is not available (for R version 3.6.1)

Error: Failed to install 'LTRpred' from GitHub: (converted from warning) package ‘amap’ is not available (for R version 3.5.1)

When I run biocLite("HajkD/LTRpred"), I get the following message:

Failed to install 'LTRpred' from GitHub: (converted from warning) package ‘amap’ is not available (for R version 3.5.1)

Does that mean that LTRpred cannot be installed on R version 3.5.1? Do you have a recommended R version?

I work on macOS High Sierra.

Issue with running with singularity

Hi!

I am trying to run LTRpred with Singularity, but I am encountering a strange issue.

My commands looked like:

singularity build --sandbox ltrpred ltrpred_latest.sif
singularity shell ltrpred

R
library(LTRpred)
LTRpred(genome.file = system.file("Hsapiens_ChrY.fa", package = "LTRpred"), cores = 28 )

The error comes from step 2:

vsearch v2.14.2_linux_x86_64, 62.5GB RAM, 28 cores
https://github.com/torognes/vsearch

Running LTRpred on genome '/usr/local/lib/R/site-library/LTRpred/Hsapiens_ChrY.fa' with 28 core(s) and searching for retrotransposons using the overlaps option (overlaps = 'no') ...

LTRpred - Step 1:
Run LTRharvest...
LTRharvest: Generating index file Hsapiens_ChrY_ltrharvest/Hsapiens_ChrY_index.fsa with gt suffixerator...
Running LTRharvest and writing results to Hsapiens_ChrY_ltrharvest...
LTRharvest analysis finished!

LTRpred - Step 2:
Run LTRdigest...
Generating index file Hsapiens_ChrY_ltrdigest/Hsapiens_ChrY_index_ltrdigest.fsa with suffixerator...
LTRdigest: Sort index file...
Running LTRdigest and writing results to Hsapiens_ChrY_ltrdigest...
gt ltrdigest: error: fopen(): cannot open file '/usr/local/lib/R/site-library/LTRpred/tRNAs/tRNA_library.fa.des': Read-only file system
LTRdigest analysis finished!

LTRpred - Step 3:
Import LTRdigest Predictions...
Error: The file 'Hsapiens_ChrY_ltrdigest/Hsapiens_ChrY-ltrdigest_tabout.csv' does not exist! Please check the correct path or the correct output of LTRharvest() or LTRdigest().

I am not sure why 'tRNA_library.fa.des' rather than 'tRNA_library.fa' is being accessed in Step 2. Is this an issue with Singularity? I could not work out where in the container to find the script in order to modify the path.

Thank you for your help!
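
The `.des` file appears to be an index file that GenomeTools tries to write next to the tRNA FASTA file, which would fail here because the R library inside the container is read-only. Under that assumption, one possible workaround is to copy the library to a writable directory and point LTRpred at the copy. This is a minimal sketch: the function name and the destination directory are hypothetical, while the source path is the one reported in the log above.

```shell
# Hypothetical helper: copy a read-only resource into a writable directory
# and print the path of the copy.
stage_writable_copy() {
  src=$1
  dest_dir=$2
  mkdir -p "$dest_dir" || return 1
  cp "$src" "$dest_dir/" || return 1
  printf '%s/%s\n' "$dest_dir" "$(basename "$src")"
}

# Usage inside the container (the destination is a placeholder):
# stage_writable_copy /usr/local/lib/R/site-library/LTRpred/tRNAs/tRNA_library.fa "$HOME/ltrpred_trnas"
```

The printed path could then be passed to `LTRpred()` through its `trnas` argument, assuming that argument accepts a custom tRNA library as the log message "No tRNA files were specified" suggests. Alternatively, because the image was built with `--sandbox`, starting it with `singularity shell --writable ltrpred` may avoid the read-only error altogether.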

LTRpred installation in R failed: "Installation failed: Empty reply from server"

I have tried to install LTRpred in R but I failed:

biocLite("HajkD/LTRpred")
BioC_mirror: https://bioconductor.org
Using Bioconductor 3.6 (BiocInstaller 1.28.0), R 3.4.4 (2018-03-15).
Installing github package(s) ‘HajkD/LTRpred’
Downloading GitHub repo HajkD/LTRpred@master
from URL https://api.github.com/repos/HajkD/LTRpred/zipball/master
Installation failed: Empty reply from server

I work on MacOS X.

Does this mean that the server hosting LTRpred is down?

How can I solve this problem?

library(LTRpred)
LTRpred(genome.file = system.file("/home/aset/lic.fa", package = "LTRpred"), cluster = TRUE, cores = 4)
vsearch v2.8.0_linux_x86_64, 7.7GB RAM, 8 cores
https://github.com/torognes/vsearch

No hmm files were specified, thus the internal HMM library will be used! See '/usr/local/lib/R/site-library/LTRpred/HMMs/hmm_*' for details.
No tRNA files were specified, thus the internal tRNA library will be used! See '/usr/local/lib/R/site-library/LTRpred/tRNAs/tRNA_library.fa' for details.
Folder '_ltrpred' does not exist yet and will be created...
Folder '_ltrpred' exists already and will be used...
Starting LTRpred analysis...
Step 1:
Run LTRharvest...
LTRharvest: Generating index file _ltrharvest/_index.fsa with gt suffixerator...
gt suffixerator: error: missing argument to option "-db"
Running LTRharvest and writing results to _ltrharvest...
LTRharvest analysis finished!
Step 2:
Generating index file _ltrdigest/_index_ltrdigest.fsa with suffixerator...
gt suffixerator: error: missing argument to option "-db"
LTRdigest: Sort index file...
fopen(): cannot open file '_ltrharvest/_Prediction.gff': No such file or directory
Running LTRdigest and writing results to _ltrdigest...
gt ltrdigest: error: fopen(): cannot open file '_ltrdigest/_index_ltrdigest.fsa.esq': No such file or directory
LTRdigest analysis finished!
Step 3:
Import LTRdigest Predictions...
Error: The file '_ltrdigest/-ltrdigest_tabout.csv' does not exist! Please check the correct path or the correct output of LTRharvest() or LTRdigest().

Please send me detailed instructions for installing and configuring LTRpred.

I know how to install the dependencies (such as RepeatMasker, HMMER, LTRharvest and others), but I do not know how to configure the paths properly so that the program runs without errors.

Thanks!

could LTRpred detect DIRS/DIRS in genome contigs?

Can LTRpred detect DIRS/DIRS elements?
Does LTRpred take the LTR direction into account?
Does LTRpred take the length of the TSD into account? Could I set the parameter mindistltr = 1000?
LTRpred uses the default parameter overlaps = "all", but could I set it to overlaps = "best" instead?
Are there any other parameters that need to be set?
Is there other software besides LTRpred that can annotate DIRS/DIRS elements?
Thanks,

Running LTRPred

Dear HajKD,

it shows the following errors

43386814-085a-3de5-9412-e014919ca2fa
Error: invalid container name may already exist or wrong format

with regards

Ramky

How do I install and use the DFAM database with LTRpred?

Hi,

I downloaded dfamscan.pl to /usr/local/bin/dfamscan.pl. When I tried to display its help, I got an error:

perl /usr/local/bin/dfamscan.pl -help
Can't locate Dfamscan.pm in @INC (you may need to install the Dfamscan module) (@INC contains: /usr/local/lib/perl5/site_perl /Users/user/anaconda/envs/python3env/lib/perl5/site_perl/5.22.0/darwin-thread-multi-2level /Users/user/anaconda/envs/python3env/lib/perl5/site_perl/5.22.0 /Users/user/anaconda/envs/python3env/lib/perl5/5.22.0/darwin-thread-multi-2level /Users/user/anaconda/envs/python3env/lib/perl5/5.22.0 .) at /usr/local/bin/dfamscan.pl line 7.
BEGIN failed--compilation aborted at /usr/local/bin/dfamscan.pl line 7.

What is Dfamscan.pm? How do I download the DFAM database and make it available to LTRpred so that I can get better prediction and description of LTR retrotransposons? So far, all dfam columns in my results are NA.

I am working on Mac OS X.

Installation of dependencies fail - Docker wrapper needed

The LTRpred package requires six command-line tools. Each of them has its own dependencies and/or requires sudo privileges. Testing in Docker containers with different OS versions failed: the installation of the tools breaks with various errors, and different tools break under different conditions. After hours of attempts, it seems impossible to install all the dependencies needed to test the package.

A Dockerfile containing all the dependencies and the LTRpred package is needed.

"sh: usearch: command not found Error: It seems like you don't have USEARCH installed"

Hi,

When I run this command:

LTRpred(genome.file = system.file("Hsapiens_ChrY.fa", package = "LTRpred"))

I get the following error:

sh: usearch: command not found
Error: It seems like you don't have USEARCH installed locally on your machine or the PATH variable to the USEARCH program is not set correctly. Please consult the Installation vignette or http://drive5.com/usearch/download.html for details on how to install USEARCH.

But I have installed USEARCH. I even used it on random fasta files:

usearch -cluster_fast c_elegans.PRJNA13758.WS263.genomic.fa -id 0.9 -centroids c_elegans.PRJNA13758.WS263.genomic.fa

and it ran (albeit with a memory error). I have added both the folder in which usearch is located and the full path to the USEARCH executable to my PATH variable. Note that I have renamed the executable to usearch.

I work on Mac OSX and I have installed version 10.0.240 of USEARCH.
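
The `sh:` prefix in the error suggests that the command was launched through a non-interactive `sh`, whose PATH can differ from the PATH of an interactive terminal, for example when R is started from a GUI. The following minimal sketch reports which tools are visible on the current PATH. The function name is hypothetical, and the tool names in the usage line are the ones that appear in the logs in this thread.

```shell
# Hypothetical helper: report which of the given tools resolve on the current PATH.
# Returns 0 if all were found, 1 if at least one is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool -> $(command -v "$tool")"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# check_tools usearch vsearch gt makeblastdb
```

Running the equivalent check from within R, for instance with `system("command -v usearch")`, shows what the R session itself can see, and `Sys.getenv("PATH")` prints the PATH that R is using.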

Error: Column `cn_3ltr` must be length 48 (the number of rows) or one, not 0

Dear Dr. Hajk-Georg Drost
Recently, I used one of your programs named "LTRpred".
I keep getting this error at the "Join solo LTR Copy Number Estimation table" stage, right after "Finished LTR CNV estimation!", when using the parameter copy.number.est = TRUE.

> LTRpred(genome.file = "moso10w.fasta", cores = 16,cluster=TRUE,copy.number.est = TRUE)
vsearch v2.14.2_linux_x86_64, 125.7GB RAM, 32 cores
https://github.com/torognes/vsearch

Running LTRpred on genome 'moso10w.fasta' with 16 core(s) and searching for retrotransposons using the overlaps option (overlaps = 'no') ...


No hmm files were specified, thus the internal HMM library will be used! See '/usr/local/lib/R/site-library/LTRpred/HMMs/hmm_*' for details.
No tRNA files were specified, thus the internal tRNA library will be used! See '/usr/local/lib/R/site-library/LTRpred/tRNAs/tRNA_library.fa' for details.
The output folder '/home/rstudio/ltrpred_data/moso10w_ltrpred' does not seem to exist yet and will be created ...


LTRpred - Step 1:
Run LTRharvest...
LTRharvest: Generating index file moso10w_ltrharvest/moso10w_index.fsa with gt suffixerator...
Running LTRharvest and writing results to moso10w_ltrharvest...
LTRharvest analysis finished!


LTRpred - Step 2:
Run LTRdigest...
Generating index file moso10w_ltrdigest/moso10w_index_ltrdigest.fsa with suffixerator...
LTRdigest: Sort index file...
Running LTRdigest and writing results to moso10w_ltrdigest...
LTRdigest analysis finished!


LTRpred - Step 3:
Import LTRdigest Predictions...

Input:  moso10w_ltrdigest/moso10w_LTRdigestPrediction.gff  -> Row Number:  547
Remove 'NA' -> New Row Number:  547
(1/8) Filtering for repeat regions has been finished.
(2/8) Filtering for LTR retrotransposons has been finished.
(3/8) Filtering for inverted repeats has been finished.
(4/8) Filtering for LTRs has been finished.
(5/8) Filtering for target site duplication has been finished.
(6/8) Filtering for primer binding site has been finished.
(7/8) Filtering for protein match has been finished.
(8/8) Filtering for RR tract has been finished.


LTRpred - Step 4:
Perform ORF Prediction using 'usearch -fastx_findorfs' ...
usearch v11.0.667_i86linux32, 4.0Gb RAM (132Gb total), 32 cores
(C) Copyright 2013-18 Robert C. Edgar, all rights reserved.
https://drive5.com/usearch

License: personal use only

00:00 45Mb    100.0% Working

WARNING: Input has lower-case masked sequences

Join ORF Prediction table: nrow(df) = 66 candidates.
unique(ID) = 66 candidates.
unique(orf.id) = 66 candidates.
Perform clustering of similar LTR transposons using 'vsearch --cluster_fast' ...
vsearch v2.14.2_linux_x86_64, 125.7GB RAM, 32 cores
https://github.com/torognes/vsearch

Running CLUSTpred with 90% as sequence similarity threshold using 16 cores ...
vsearch v2.14.2_linux_x86_64, 125.7GB RAM, 32 cores
https://github.com/torognes/vsearch

Reading file /home/rstudio/ltrpred_data/moso10w_ltrdigest/moso10w-ltrdigest_complete.fas 100%  
622022 nt in 66 seqs, min 4330, max 22210, avg 9425
Sorting by length 100%
Counting k-mers 100% 
Clustering 100%  
Sorting clusters 100%
Writing clusters 100% 
Clusters: 61 Size min 1, max 3, avg 1.1
Singletons: 58, 87.9% of seqs, 95.1% of clusters
Sorting clusters by abundance 100%
CLUSTpred output has been stored in: /home/rstudio/ltrpred_data/moso10w_ltrpred
Join Cluster table: nrow(df) = 66 candidates.
unique(ID) = 66 candidates.
unique(orf.id) = 66 candidates.
Join Cluster Copy Number table: nrow(df) = 66 candidates.
unique(ID) = 66 candidates.
unique(orf.id)) = 66 candidates.


LTRpred - Step 5:
Perform methylation context quantification..
Join methylation context (CG, CHG, CHH, CCG) count table: nrow(df) = 66 candidates.
unique(ID) = 66 candidates.
unique(orf.id) = 66 candidates.
Copy files to result folder '/home/rstudio/ltrpred_data/moso10w_ltrpred'.


LTRpred - Step 6:
Starting retrotransposon evolutionary age estimation by comparing the 3' and 5' LTRs using the molecular evolution model 'K80' and the mutation rate '1.3e-07' (please make sure the mutation rate can be assumed for your species of interest!) for 66 predicted elements ...


Please be aware that evolutionary age estimation based on 3' and 5' LTR comparisons are only very rough time estimates and don't take reverse-transcription mediated retrotransposon recombination between family members of retroelements into account! Please consult Sanchez et al., 2017 Nature Communications and Drost & Sanchez, 2019 Genome Biology and Evolution for more details on retrotransposon recombination.


LTRpred - Step 7:
The LTRpred prediction table has been filtered (default) to remove potential false positives. Predicted LTRs must have an PBS or Protein Domain and must fulfill thresholds: sim = 70%; #orfs = 0. Furthermore, TEs having more than 10% of N's in their sequence have also been removed.
Input #TEs: 66
Output #TEs: 48
Perform solo LTR Copy Number Estimation....
Run makeblastdb of the genome assembly...


Building a new DB, current time: 10/28/2022 08:06:19
New DB name:   /home/rstudio/ltrpred_data/moso10w.fasta
New DB title:  moso10w.fasta
Sequence type: Nucleotide
Deleted existing Nucleotide BLAST database named /home/rstudio/ltrpred_data/moso10w.fasta
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 1 sequences in 0.072499 seconds.
Perform BLAST searches of 3' prime LTRs against genome assembly...
Perform BLAST searches of 5' prime LTRs against genome assembly...
Import BLAST results...
Filter hit results...
Estimate CNV for each LTR sequence...
Finished LTR CNV estimation!
Join solo LTR Copy Number Estimation table: nrow(df) = 48 candidates.
unique(ID) = 48 candidates.
unique(orf.id) = 48 candidates.
Error: Column `cn_3ltr` must be length 48 (the number of rows) or one, not 0
In addition: Warning message:
`data_frame()` is deprecated as of tibble 1.1.0.
Please use `tibble()` instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
>

Small details on the vignette

Dear @HajkD,
Congratulations on the software! I only have three minor points to comment:

LTRpred/vignettes/Introduction.Rmd

Line 154 in 26bda71

    
           docker run --rm -p 8787:8787 -v /put/here/your/path/to/ltrpred_data:/app/ltrpred_data -ti ltrpred

Update to: docker run --rm -p 8787:8787 -v /put/here/your/path/to/ltrpred_data:/app/ltrpred_data -ti drostlab/ltrpred

LTRpred/vignettes/Introduction.Rmd

Line 183 in 26bda71

    
           LTRpred(genome.file = "ltrpred_data/yeast_genome/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa", cores = 2)

[1] "Successful job 1 ."
Warning message:
`data_frame()` is deprecated as of tibble 1.1.0.
Please use `tibble()` instead.

Just a suggestion to update to tibble().

LTRpred/vignettes/Introduction.Rmd

Line 262 in 26bda71

    
           __Please read more details about how to transfer genome files and the Dfam database in the

You may want to add the link to the referenced section here.

Also, I think you could provide a few details/instructions on how to copy the results from the container to the host.
