annadalmolin / craft

CRAFT is a computational pipeline that predicts circRNA sequences and their molecular interactions with miRNAs and RBPs, along with their coding potential. CRAFT provides a comprehensive graphical visualization of the results, links to several knowledge databases, extensive functional enrichment analysis and the combination of predictions for different circRNAs. CRAFT helps the user explore the potential regulatory networks involving the circRNAs of interest and generate hypotheses about the cooperation of circRNAs in the regulation of biological processes.

License: Other

Shell 20.58% RMarkdown 79.42%

bioinformatics circrna-prediction mirna-mrna-interaction open-reading-frame rna-binding-proteins

craft's Introduction

CRAFT

Installation

Installation from the Docker image

The Docker image saves you from the installation burden. A Docker image of CRAFT is available from DockerHub at https://hub.docker.com/r/annadalmolin/craft; just pull it with the command:

docker pull annadalmolin/craft:v1.0

Usage

Input data

Prepare your project directory with the following files:

list_backsplice.txt: file with circRNA coordinates. The file format is a tab-separated text file, with circRNA backsplice coordinates in the first column and circRNA strand in the second. An example of list_backsplice.txt is:
```
  4:143543509-143543972	+
  11:33286413-33287511	+
  15:64499292-64500166	+
```
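A quick pre-flight check can catch formatting slips before the container is launched. The following is a minimal sketch and not part of CRAFT: `parse_backsplice_line` and `check_backsplice_file` are hypothetical helpers that apply a strict reading of the format described above (coordinates as `chromosome:start-end`, one tab, strand `+` or `-`, no indentation). The requirement that start is smaller than end is an assumption, and whether CRAFT itself tolerates deviations such as spaces instead of a tab is not stated.

```python
import re

# Strict reading of the documented format: "<chrom>:<start>-<end><TAB><strand>".
BACKSPLICE_RE = re.compile(r"^([^:\s]+):(\d+)-(\d+)\t([+-])$")


def parse_backsplice_line(line):
    """Return (chrom, start, end, strand) or raise ValueError."""
    match = BACKSPLICE_RE.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not '<chrom>:<start>-<end><TAB><strand>': {line!r}")
    chrom, start, end, strand = match.groups()
    start, end = int(start), int(end)
    if start >= end:  # assumption: coordinates are given in increasing order
        raise ValueError(f"start must be smaller than end: {line!r}")
    return chrom, start, end, strand


def check_backsplice_file(text):
    """Parse every non-empty line of a list_backsplice.txt-style string."""
    return [parse_backsplice_line(l) for l in text.splitlines() if l.strip()]
```

Calling `check_backsplice_file(open("list_backsplice.txt").read())` either returns the parsed records or stops at the first malformed line.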
path_files.txt: file with the paths to the Ensembl annotation and genome files, written as they are seen from inside the container (see "Running the analysis"). The file format is a text file with one path per row, in the following order:
1. path to annotation file
2. path to genome file
An example of path_files.txt is:
```
  /data/input/Homo_sapiens.GRCh38.104.gtf
  /data/input/Homo_sapiens.GRCh38.dna.primary_assembly.fa
```
The gene annotation (in GTF format) and the genome sequence (in FASTA format) files must be downloaded by the user from the Ensembl database and placed in the input/ directory contained in the project directory. Annotation and genome files for Homo sapiens (GRCh38) can be downloaded from http://ftp.ensembl.org/pub/release-104/gtf/homo_sapiens/ and http://ftp.ensembl.org/pub/release-104/fasta/homo_sapiens/dna/, respectively.
params.txt: file with the parameters to be set in CRAFT. The file format is a text file with one or more parameters written in each row, in the following order:
1. kind of prediction; it can be "M" for miRNA prediction, "R" for RBP prediction, "O" for ORF prediction, "MR", "MO", "RO" or "MRO" for a combination of the previous.
2. investigated species; it can be one of the species in miRBase database: hsa for Homo sapiens, mmu for Mus musculus, etc.
3. parameters for miRanda tool (optional); in a single row, they must be the miRanda_score and the miRanda_energy, in this order, separated by a tab. The user must set either both parameters or neither; default values are 80 (score) and -15 (energy).
4. parameters for beRBP tool (optional); in a single row, in order and separated by a tab, they must be the PWM/s and the RBP/s investigated. The syntax is: PWM RBP; multiple PWMs (separated by ", ") and associated RBPs (separated by ", ") are also allowed. The default is all all, which searches for all PWMs and RBPs included in the beRBP database. The user must set either both parameters or neither.
5. prefix of the genome and indexes downloaded from the UCSC website; e.g. hg38 for Homo sapiens. The human genome file (e.g. hg38.fa.gz) can be downloaded from https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/ . Index files can be obtained by following the instructions reported in https://bioinfo.vanderbilt.edu/beRBP/download/beRBP.standalone.README.txt . Genome (.fa) and indexes (.00.idx, .01.idx, .02.idx, .nhr, .nin, .nsq, .shd) must be included in the input/ directory.
6. parameters for ORFfinder tool (optional); in order, separated by tab, the user must specify: the genetic code to use, the start codon to use, the minimal ORF length, whether to ignore nested ORFs and the strand in which putative ORFs are searched. The user must set all parameters or none of them. The allowed options for each parameter are:
  1. genetic code: 1-31, see https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi for details; default: 1
  2. start codon: 0 = "ATG" only, 1 = "ATG" and alternative initiation codons, 2 = any sense codon; default: 0
  3. minimal ORF length (nt): allowed values are 30, 75, or 150; default: 30
  4. ignore nested ORFs (ORF completely placed within another). allowed values are "TRUE" or "FALSE"; default: "FALSE"
  5. strand (output ORFs on specified strand only): allowed values are "both", "plus" or "minus"; default: "plus"
7. parameters for the graphical output for a single circRNA investigated (optional, but advised); the default parameters are: l=50000, QUANTILE1="FALSE", thr1=0.95, score_miRNA=120, energy_miRNA=-22, QUANTILE2="FALSE", thr2=0.95, dGduplex_miRNA=-20, dGopen_miRNA=-11, QUANTILE3="FALSE", thr3=0.9, voteFrac_RBP=0.15, orgdb="org.Hs.eg.db", meshdb="MeSH.Hsa.eg.db", symbol2eg="org.Hs.egSYMBOL2EG", eg2uniprot="org.Hs.egUNIPROT", org="hsapiens". The user must specify only the parameters to be changed with respect to the defaults, as a comma-separated list; the parameter order does not matter. Available parameters:
  1. l: maximum length of circRNAs analyzed
  2. QUANTILE: whether to filter predictions based on a quantile threshold (thr); QUANTILE1 and thr1 are set for miRanda predictions, QUANTILE2 and thr2 for PITA predictions, QUANTILE3 and thr3 for beRBP predictions
  3. score_miRNA and energy_miRNA: respectively, score and energy values of miRanda tool. Best predictions are obtained with higher score and lower energy
  4. dGduplex_miRNA and dGopen_miRNA: respectively, dGduplex and dGopen values of PITA tool. Best predictions are obtained with lower dGduplex and higher dGopen
  5. voteFrac_RBP: voteFrac value of beRBP tool. Best predictions are obtained with higher voteFrac
  6. orgdb and meshdb: databases for miRNA enrichment analysis; the default values are "org.Hs.eg.db" and "MeSH.Hsa.eg.db", respectively (Homo sapiens)
  7. symbol2eg and eg2uniprot: databases for RBP enrichment analysis; the default values are "org.Hs.egSYMBOL2EG" and "org.Hs.egUNIPROT", respectively (Homo sapiens)
  8. org: organism, in the form: human - "hsapiens", mouse - "mmusculus"; the default value is for Homo sapiens
8. parameters for the summary graphical output for all circRNAs investigated (optional, but advised); the default parameters are the same as the previous point. The user must specify only the parameters to be changed with respect to the default, in a comma-separated list format; the parameter order does not matter. Available parameters: the same as before, except for meshdb and org. It is advised to set point 7 and point 8 parameters with the same values.
An example of params.txt file is:
```
  M
  hsa


  hg38
  
  score_miRNA=125, energy_miRNA=-25, dGduplex_miRNA=-22, dGopen_miRNA=-10
  score_miRNA=125, energy_miRNA=-25, dGduplex_miRNA=-22, dGopen_miRNA=-10, voteFrac_RBP=0.3
```
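Because the rows of params.txt are positional, it is easy to shift a value onto the wrong row. The sketch below assembles the eight rows in the documented order and leaves a blank row for every optional group that is not set, as the example above does. `build_params`, `tab_row` and `format_overrides` are hypothetical helper names, not part of CRAFT, and the trailing newline after row 8 is an assumption.

```python
ALLOWED_PREDICTIONS = {"M", "R", "O", "MR", "MO", "RO", "MRO"}


def format_overrides(overrides):
    """Render {'score_miRNA': 125} as 'score_miRNA=125' (strings get quotes).

    Pass R-style logicals as strings, e.g. {"QUANTILE1": "TRUE"}.
    """
    parts = []
    for name, value in (overrides or {}).items():
        rendered = f'"{value}"' if isinstance(value, str) else str(value)
        parts.append(f"{name}={rendered}")
    return ", ".join(parts)


def tab_row(values, expected_len):
    """Join an optional group with tabs; the group is all-or-nothing."""
    if not values:
        return ""  # a blank row keeps the defaults
    if len(values) != expected_len:
        raise ValueError(f"expected {expected_len} values, got {len(values)}")
    return "\t".join(str(v) for v in values)


def build_params(prediction, species, genome_prefix, miranda=None,
                 berbp=None, orffinder=None, single=None, summary=None):
    """Return the text of a params.txt file with rows 1 to 8 in order."""
    if prediction not in ALLOWED_PREDICTIONS:
        raise ValueError(f"invalid kind of prediction: {prediction!r}")
    rows = [
        prediction,                 # 1. kind of prediction
        species,                    # 2. miRBase species code
        tab_row(miranda, 2),        # 3. miRanda score, energy
        tab_row(berbp, 2),          # 4. beRBP PWM(s), RBP(s)
        genome_prefix,              # 5. UCSC genome/index prefix
        tab_row(orffinder, 5),      # 6. ORFfinder options
        format_overrides(single),   # 7. single-circRNA graphical output
        format_overrides(summary),  # 8. summary graphical output
    ]
    return "\n".join(rows) + "\n"
```

For instance, `build_params("M", "hsa", "hg38", single={"score_miRNA": 125})` yields a file whose rows 3, 4 and 6 are blank and whose row 8 is blank as well, so the summary output would use the defaults.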

and the following directory:

input/: directory containing the following files:
- genome and annotation files from Ensembl database, and genome and indexes files from UCSC databases (see above)
- backsplice_gene_name.txt: file with circRNA gene names. It must be created by the user. The file format is a tab-separated text file, with circRNA backsplice in the first column and circRNA host gene name in the second; the official gene name has to be used. The header line is needed. An example of backsplice_gene_name.txt is:
```
  circ_id	gene_names
  4:143543509-143543972	SMARCA5
  11:33286413-33287511	HIPK3
  15:64499292-64500166	ZNF609
```
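Since backsplice_gene_name.txt has to be created by hand, generating it from a small mapping avoids missing headers or stray spaces. This is a minimal sketch, not part of CRAFT: `gene_name_table` and `missing_gene_names` are hypothetical helpers. Ending every line, including the last, with a newline is a conventional precaution rather than a documented requirement, and the idea that every circRNA in list_backsplice.txt should have an entry is an assumption.

```python
import csv
import io


def gene_name_table(host_genes):
    """Render {circ_id: gene_name} as backsplice_gene_name.txt content."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["circ_id", "gene_names"])  # the header line is needed
    for circ_id, gene in host_genes.items():
        writer.writerow([circ_id, gene])
    return buffer.getvalue()


def missing_gene_names(circ_ids, host_genes):
    """List the circRNA IDs that have no host gene entry (assumed required)."""
    return [circ_id for circ_id in circ_ids if circ_id not in host_genes]
```

The returned string can be written to input/backsplice_gene_name.txt as is.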
- AGO2_binding_sites.bed (optional): file with validated AGO2 binding sites. The file, in BED6 format, must have the following fields: chromosome, start genomic position (0-based), end genomic position, the string "AGO2_binding_site", a dot, the strand. Make sure to use the same genome reference version as the one included in the input/ directory. An example of AGO2_binding_sites.bed is:
```
  4    143543521    143543542    AGO2_binding_site    .    +
  4    143543530    143543559    AGO2_binding_site    .    +
  4    143543562    143543607    AGO2_binding_site    .    +
```
  The number of miRNA binding sites overlapping AGO2 binding sites is written to the standard output. Check it to decide whether to keep the AGO2 overlap filter or to re-run the analysis without this information (e.g. when very few sites overlap).
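One cheap way to spot a reference mismatch before running is to compare chromosome names, since a BED file that uses UCSC-style names such as chr4 cannot overlap circRNAs given with Ensembl-style names such as 4. The sketch below is a partial check only, written under that assumption. It does not verify the assembly version, the function names are hypothetical, and a reported chromosome may simply have no AGO2 sites.

```python
def chromosomes_in_bed(bed_text):
    """Chromosome names (first column) of a BED file given as a string."""
    return {line.split()[0] for line in bed_text.splitlines() if line.strip()}


def chromosomes_in_backsplices(circ_ids):
    """Chromosome part of IDs such as '4:143543509-143543972'."""
    return {circ_id.rsplit(":", 1)[0] for circ_id in circ_ids}


def circ_chromosomes_without_ago2(bed_text, circ_ids):
    """circRNA chromosomes that never appear in the AGO2 BED file."""
    missing = chromosomes_in_backsplices(circ_ids) - chromosomes_in_bed(bed_text)
    return sorted(missing)
```

If the result lists every circRNA chromosome, a naming or reference mismatch is a likely explanation and worth checking before interpreting a low overlap count.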

Running the analysis

To run CRAFT from the Docker container use:

sudo docker run -it -v $(pwd):/data annadalmolin/craft:v1.0

All paths in path_files.txt must be relative to the directory in the container where the volume is mounted (e.g. /data/input/file_name, as detailed above). If you want the files created by the container to be owned by your user, set the user ID with "-u `id -u`":

sudo docker run -u `id -u` -it -v $(pwd):/data annadalmolin/craft:v1.0
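Because the project directory is mounted at /data, a path written in path_files.txt only resolves inside the container if it starts with /data. The sketch below translates such a path back to the host side so that its existence can be checked before the run. It is an illustration under stated assumptions: `container_to_host` and `check_path_files` are hypothetical helpers, the mount point is taken from the `-v $(pwd):/data` option shown above, and a POSIX host is assumed.

```python
from pathlib import PurePosixPath

MOUNT_POINT = PurePosixPath("/data")  # container side of `-v $(pwd):/data`


def container_to_host(container_path, project_dir):
    """Map a path from path_files.txt back to the host project directory.

    Raises ValueError when the path is not located under the mount point,
    e.g. when a host path was written into path_files.txt by mistake.
    """
    relative = PurePosixPath(container_path.strip()).relative_to(MOUNT_POINT)
    return PurePosixPath(project_dir) / relative


def check_path_files(text, project_dir):
    """Return the host-side path for every non-empty row of path_files.txt."""
    return [container_to_host(row, project_dir)
            for row in text.splitlines() if row.strip()]
```

On the host, each returned path can then be tested with `os.path.exists(str(path))`; a `ValueError` indicates a row that the container would not be able to see.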

Output data

After a successful CRAFT run, you will find the following new directories in your project directory:

- sequence_extraction/: contains intermediary files for the sequence reconstruction step
- functional_predictions/: contains the final files of the sequence reconstruction step and the three directories for miRNA, RBP and ORF predictions, respectively
- graphical_output/: contains the directory general/ with the summary predictions of all circRNAs analyzed, and a directory for each single circRNA with the specific investigation

sequence_extraction/

The output files for the sequence reconstruction step are:
- backsplice_sequence_1.fa: file with the retrieved genomic sequence for each circRNA in FASTA format
- backsplice_sequence_1.txt: tab-separated file with the retrieved genomic sequence for each circRNA in TXT format; the file lists the circRNA backsplice coordinates in the first column and the sequence in the second
- backsplice_circRNA_length_1.txt: tab-separated file with circRNA sequence length, with circRNA backsplice in the first column and circRNA length in the second
All these files are found in the functional_predictions/ directory.

functional_predictions/

The output files of the functional prediction step are (the final output of each tool is marked as such in its description):
- miRNA_detection/:
  - backsplice_sequence_per_miRNA.fa: the sequence used for miRNA prediction, obtained repeating the first 20 nt of the sequence at the end of each circRNA
  - miRanda/:
    - output_miRanda.txt: original output of miRanda
    - output_miRanda_c_per_R.txt: output of miRanda (list of miRNA binding sites), not overlapping with AGO2 binding sites, if AGO2_binding_sites.bed is provided, otherwise this file is missing
    - output_miRanda_per_R.txt: final output of miRanda (list of miRNA binding sites), overlapping with AGO2 binding sites if AGO2_binding_sites.bed is provided, otherwise it contains the list of miRNA binding sites not overlapping with AGO2 binding sites
  - PITA/:
    - pred_pita_results.tab, pred_pita_results_targets.tab, pita.err, pita.log, pred_pita_results.gxp: original output of PITA
    - pred_pita_results_targets_b.txt: output for multiple sites
    - pred_pita_results_c.txt: output of PITA (list of miRNA binding sites), not overlapping with AGO2 binding sites, if AGO2_binding_sites.bed is provided, otherwise this file is missing
    - pred_pita_results_per_R.txt: final output of PITA (list of miRNA binding sites), overlapping with AGO2 binding sites if AGO2_binding_sites.bed is provided, otherwise it contains the list of miRNA binding sites not overlapping with AGO2 binding sites
- RBP_detection/:
  - backsplice_sequence_per_RBP.fa: the sequence used for RBP prediction, obtained repeating the first 20 nt of the sequence at the end of each circRNA
  - beRBP/:
    - analysis_RBP/:
      - resultMatrix.tsv: original output of beRBP
      - resultMatrix_b.tsv: final output of beRBP in TSV (list of RBP binding sites)
      - resultMatrix_b.txt: final output of beRBP in TXT (list of RBP binding sites)
- ORF_detection/:
  - backsplice_sequence_per_ORF_MIN_LENGTH.fa: the sequence used for ORF prediction (with minimal length of the ORF = MIN_LENGTH), obtained doubling circRNA sequence twice
  - ORFfinder/:
    - result_list_ORF_MIN_LENGTH.txt, result_list_CDS_MIN_LENGTH.txt, result_text_ORF_MIN_LENGTH.txt, result_table_ORF_MIN_LENGTH.txt, ORF0_MIN_LENGTH.log, ORF1_MIN_LENGTH.log, ORF2_MIN_LENGTH.log, ORF3_MIN_LENGTH.log, ORF0_MIN_LENGTH.perf, ORF1_MIN_LENGTH.perf, ORF2_MIN_LENGTH.perf, ORF3_MIN_LENGTH.perf: original output of ORFfinder (with minimal length of the ORF = MIN_LENGTH)
    - ORF_backsplice.txt and ORF_backsplice0.txt: final output of ORFfinder (list of ORFs detected crossing the backsplice junction), respectively with ORF start position in 1-based and in 0-based format
    - ORF_backsplice_open.txt and ORF_backsplice_open0.txt: final output of ORFfinder (list of rolling ORFs detected), respectively with ORF start position in 1-based and in 0-based format
    - result_list_CDS.fa and result_list_CDS.txt: nucleotide sequence of all detected ORFs, respectively in FASTA and TXT format
    - result_list_ORF.fa and result_list_ORF.txt: amino acid sequence of all detected ORFs, respectively in FASTA and TXT format
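The sequences passed to the miRNA and RBP predictors are described above as the circRNA sequence with its first 20 nt repeated at the end, presumably so that binding sites spanning the backsplice junction can be found in a linear sequence. The sketch below illustrates that construction and how a position in the extended sequence maps back onto the circle. Both function names are hypothetical, and the 0-based position convention is an assumption made for the illustration, not CRAFT's documented output format.

```python
def extend_across_backsplice(sequence, overhang=20):
    """Append the first `overhang` nt to the end of a linearized circRNA."""
    return sequence + sequence[:overhang]


def to_circular_position(position, circ_length):
    """Map a 0-based position in the extended sequence back onto the circle."""
    return position % circ_length


# Toy 30-nt circRNA: positions 30..49 of the extended sequence repeat 0..19.
circ = "AAAAACCCCCGGGGGTTTTTACGTACGTAC"
extended = extend_across_backsplice(circ)
```

A site reported at position 32 of `extended` therefore corresponds to position 2 of the original circular sequence.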
graphical_output/

The output files for the graphical output step are:
- general/: directory with the summary predictions of all circRNAs analyzed:
  - functional_predictions_all_circRNAs.html: output HTML file summarizing all predictions of all circRNAs tested (see CRAFT paper for more details)
  - single figures pulled out from the HTML file
  - All_validated_TGs.csv: table pulled out from the HTML file; it can be loaded into Cytoscape for network analysis
- a directory for each single circRNA with its own predictions:
  - functional_predictions_CIRC_ID.html: output HTML file with the predictions related to CIRC_ID (see CRAFT paper for more details)
  - single figures and tables pulled out from the HTML file

CircRNA sequence provided by the user

If circRNA sequences are already available to the user, CRAFT does not perform the sequence reconstruction step. To let CRAFT use the provided circRNA sequences, follow these steps:

1. create the sequence_extraction/ directory in the project directory
2. add the backsplice_sequence_1.fa, backsplice_sequence_1.txt and backsplice_circRNA_length_1.txt files, in the format described above, to sequence_extraction/
3. add the backsplice_gene_name.txt file, in the format described above, to sequence_extraction/
4. to filter for miRNA binding sites overlapping AGO2 binding sites, also add the file region_to_extract_1.bed to sequence_extraction/. The file, in BED6 format, must have six tab-separated columns: circRNA chromosome, 0-based start position, 1-based end position, backsplice coordinates, score, strand. Each row represents a single separate region from which the circRNA is assembled (exon, intron, part of an exon/intron or intergenic region). An example of region_to_extract_1.bed is:
```
 11	33286412	33287511	11:33286412-33287511	.	+
 15	64499291	64500166	15:64499291-64500166	.	+
 4	143543508	143543657	4:143543508-143543972	.	+
 4	143543852	143543972	4:143543508-143543972	.	+
```
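Comparing this example with list_backsplice.txt suggests that the BED start, and the start inside the name column, are the backsplice start minus one (33286413 becomes 33286412). The sketch below builds the rows on that basis. It assumes that the coordinates in list_backsplice.txt are 1-based and inclusive, which is inferred from the examples rather than stated, and `region_rows` is a hypothetical helper, not part of CRAFT.

```python
def region_rows(chrom, start, end, strand, regions=None):
    """Build the BED6 rows of region_to_extract_1.bed for one circRNA.

    `start` and `end` are the backsplice coordinates as written in
    list_backsplice.txt (assumed 1-based, inclusive). `regions` lists the
    (start, end) pairs of the pieces forming the circRNA in the same
    convention; by default the whole backsplice interval is one region.
    """
    # The example names every row with the 0-based backsplice start.
    name = f"{chrom}:{start - 1}-{end}"
    pieces = regions or [(start, end)]
    return [f"{chrom}\t{s - 1}\t{e}\t{name}\t.\t{strand}" for s, e in pieces]
```

Joining the returned rows with newlines gives the file content; for a circRNA made of two exons, pass both exon intervals through `regions`.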

Additional notes

Functional enrichment analysis of the validated target genes of miRNAs with predicted binding sites in circRNA sequences can be performed only for Homo sapiens (hsa), Mus musculus (mmu) and Rattus norvegicus (rno).
The clarity of the output improves as filtering stringency increases; e.g., if a figure is unreadable or CRAFT crashes because of too many predictions, simply re-run the graphical part of the analysis with more stringent thresholds.

How to cite

If you use CRAFT for your analysis, please add the following citation to your references:

Dal Molin A, Gaffo E, Difilippo V, Buratin A, Tretti Parenzan C, Bresolin S, Bortoluzzi S, CRAFT: a bioinformatics software for custom prediction of circular RNA functions, Brief Bioinform. 2022 Mar 10;23(2):bbab601.

craft's Issues

Unable to get the complete results of CRAFT, but I'm able to run the beRBP tools individually

Respected professor,

For now I'm just replicating the results from your paper and files; afterwards I will produce results from my own data. In my root directory I have made a subdirectory called data, in which I have created three files:

list_backsplice.txt :
4:143543509-143543972 +
11:33286413-33287511 +
15:64499292-64500166 +

path_files.txt :

/data/input/Homo_sapiens.GRCh38.104.gtf
/data/input/Homo_sapiens.GRCh38.dna.primary_assembly.fa

params.txt :
MRO
hsa

HG38

score_miRNA=125, energy_miRNA=-25, dGduplex_miRNA=-22, dGopen_miRNA=-10
score_miRNA=125, energy_miRNA=-25, dGduplex_miRNA=-22, dGopen_miRNA=-10, voteFrac_RBP=0.3

I have also made an input subdirectory within data, which contains the FASTA file, the GTF file and the index files:
HG38.00.idx HG38.02.idx HG38.nhr HG38.nsq Homo_sapiens.GRCh38.104.gtf
HG38.01.idx HG38.fa HG38.nin HG38.shd Homo_sapiens.GRCh38.dna.primary_assembly.fa

When I run CRAFT using your command "sudo docker run -it -v $(pwd):/data annadalmolin/craft:v1.0", I get the directories sequence_extraction/, functional_predictions/ and graphical_output/ with their respective subfolders, but they are all empty. Along with that I get the following errors:

mv: cannot stat '/data/sequence_extraction/backsplice_sequence_1.fa': No such file or directory
mv: cannot stat '/data/sequence_extraction/backsplice_sequence_1.txt': No such file or directory
mv: cannot stat '/data/sequence_extraction/backsplice_circRNA_length_1.txt': No such file or directory
cat: backsplice_circRNA_length_1.txt: No such file or directory
mv: cannot stat '/data/input/backsplice_gene_name.txt': No such file or directory
cat: ../backsplice_sequence_1.fa: No such file or directory
cat: ../backsplice_sequence_1.fa: No such file or directory
cat: ../backsplice_sequence_1.fa: No such file or directory
MiRNA binding site prediction analysis completed.
cat: ../backsplice_sequence_1.fa: No such file or directory
cp: cannot stat '../backsplice_sequence_1.txt': No such file or directory
cat: ../backsplice_circRNA_length_1.txt: No such file or directory
cat: ORF_backsplice2.bed: No such file or directory
rm: cannot remove 'ORF_backsplice2.bed': No such file or directory
cat: result.txt: No such file or directory
cat: result.txt: No such file or directory
rm: cannot remove 'result.txt': No such file or directory
cat: result_list_CDS_30.txt: No such file or directory
cat: result_list_ORF_30.txt: No such file or directory
ORF prediction analysis completed.

R version 4.1.2 (2021-11-01) -- "Bird Hippie"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

rmarkdown::render('/data/.scripts/functional_predictions_single_circRNA.Rmd', output_file='functional_predictions_single_circRNA.html', output_dir='.', params = list(circ='4:143543509-143543972', score_miRNA=125, energy_miRNA=-25, dGduplex_miRNA=-22, dGopen_miRNA=-10))

processing file: functional_predictions_single_circRNA.Rmd
| | 1%
inline R code fragments

|. | 1%
label: setup (with options)
List of 1
$ include: logi FALSE

|. | 2%
ordinary text without R code

|.. | 2%
label: libraries (with options)
List of 2
$ echo : logi FALSE
$ include: logi FALSE

Bioconductor version '3.14' is out-of-date; the current release version '3.17'
is available with R version '4.3'; see https://bioconductor.org/install

Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

filter, lag
The following objects are masked from 'package:base':

intersect, setdiff, setequal, union
Attaching package: 'data.table'

The following objects are masked from 'package:dplyr':

between, first, last
Loading required package: viridisLite

Attaching package: 'reshape2'

The following objects are masked from 'package:data.table':

dcast, melt
Failed to create bus connection: No such file or directory
-- Attaching packages -------------------------------------------------------------------------------------------- tidyverse 1.3.1 --
v tidyr 1.1.4 v stringr 1.4.0
v readr 2.1.1 v forcats 0.5.1
v purrr 0.3.4
-- Conflicts ----------------------------------------------------------------------------------------------- tidyverse_conflicts() --
x data.table::between() masks dplyr::between()
x dplyr::filter() masks stats::filter()
x data.table::first() masks dplyr::first()
x dplyr::lag() masks stats::lag()
x data.table::last() masks dplyr::last()
x purrr::transpose() masks data.table::transpose()
You have loaded plyr after dplyr - this is likely to cause problems.
If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
library(plyr); library(dplyr)
Attaching package: 'plyr'

The following object is masked from 'package:purrr':

compact
The following objects are masked from 'package:dplyr':

arrange, count, desc, failwith, id, mutate, rename, summarise,
summarize
Warning message:
In system("timedatectl", intern = TRUE) :
running command 'timedatectl' had status 1

Execution halted
mv: cannot stat 'functional_predictions_single_circRNA.html': No such file or directory

R version 4.1.2 (2021-11-01) -- "Bird Hippie"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

rmarkdown::render('/data/.scripts/functional_predictions_single_circRNA.Rmd', output_file='functional_predictions_single_circRNA.html', output_dir='.', params = list(circ='11:33286413-33287511', score_miRNA=125, energy_miRNA=-25, dGduplex_miRNA=-22, dGopen_miRNA=-10))

processing file: functional_predictions_single_circRNA.Rmd
| | 1%
inline R code fragments

|. | 1%
label: setup (with options)
List of 1
$ include: logi FALSE

|. | 2%
ordinary text without R code

|.. | 2%
label: libraries (with options)
List of 2
$ echo : logi FALSE
$ include: logi FALSE

Bioconductor version '3.14' is out-of-date; the current release version '3.17'
is available with R version '4.3'; see https://bioconductor.org/install

Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

filter, lag
The following objects are masked from 'package:base':

intersect, setdiff, setequal, union
Attaching package: 'data.table'

The following objects are masked from 'package:dplyr':

between, first, last
Loading required package: viridisLite

Execution halted
mv: cannot stat 'functional_predictions_single_circRNA.html': No such file or directory

R version 4.1.2 (2021-11-01) -- "Bird Hippie"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

rmarkdown::render('/data/.scripts/functional_predictions_single_circRNA.Rmd', output_file='functional_predictions_single_circRNA.html', output_dir='.', params = list(circ='15:64499292-64500166', score_miRNA=125, energy_miRNA=-25, dGduplex_miRNA=-22, dGopen_miRNA=-10))

processing file: functional_predictions_single_circRNA.Rmd
| | 1%
inline R code fragments

|. | 1%
label: setup (with options)
List of 1
$ include: logi FALSE

|. | 2%
ordinary text without R code

|.. | 2%
label: libraries (with options)
List of 2
$ echo : logi FALSE
$ include: logi FALSE

Bioconductor version '3.14' is out-of-date; the current release version '3.17'
is available with R version '4.3'; see https://bioconductor.org/install

Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

filter, lag
The following objects are masked from 'package:base':

intersect, setdiff, setequal, union
Attaching package: 'data.table'

The following objects are masked from 'package:dplyr':

between, first, last
Loading required package: viridisLite

Attaching package: 'reshape2'

The following objects are masked from 'package:data.table':

The following object is masked from 'package:purrr':

compact
The following objects are masked from 'package:dplyr':

arrange, count, desc, failwith, id, mutate, rename, summarise,
summarize
Loading required package: stats4
Loading required package: BiocGenerics

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:dplyr':

combine, intersect, setdiff, union
The following objects are masked from 'package:stats':

IQR, mad, sd, var, xtabs
The following objects are masked from 'package:base':

Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, basename, cbind, colnames, dirname, do.call,
duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
tapply, union, unique, unsplit, which.max, which.min
Loading required package: Biobase
Welcome to Bioconductor

Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: IRanges
Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following object is masked from 'package:plyr':

rename
The following object is masked from 'package:tidyr':

expand
The following objects are masked from 'package:data.table':

first, second
The following objects are masked from 'package:dplyr':

first, rename
The following objects are masked from 'package:base':

I, expand.grid, unname
Attaching package: 'IRanges'

The following object is masked from 'package:glue':

trim
The following object is masked from 'package:plyr':

desc
The following object is masked from 'package:purrr':

reduce
The following object is masked from 'package:data.table':

shift
The following objects are masked from 'package:dplyr':

collapse, desc, slice
c
Attaching package: 'AnnotationDbi'

The following object is masked from 'package:dplyr':

select
c
Warning message:
In system("timedatectl", intern = TRUE) :
running command 'timedatectl' had status 1

Execution halted
mv: cannot stat 'functional_predictions_single_circRNA.html': No such file or directory

R version 4.1.2 (2021-11-01) -- "Bird Hippie"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

Execution halted

These are also the errors that I'm encountering:

Error: The genome file annotation_chr.genome has no valid entries. Exiting.
Error: The genome file annotation_chr.genome has no valid entries. Exiting.
mv: cannot stat '/data/input/backsplice_gene_name.txt': No such file or directory

Madam, could you please guide me with these?

can't detect input files

Hi Anna,

Thank you very much for this nice tool.

However, I ran into some problems when using it.
My system is macOS.

Here's the structure of my working directory data/:
.
├── input
│   ├── Mus_musculus.GRCm39.104.gtf
│   ├── Mus_musculus.GRCm39.dna.primary_assembly.fa
│   ├── backsplice_gene_name.txt
│   ├── mm39.00.idx
│   ├── mm39.01.idx
│   ├── mm39.fa
│   ├── mm39.ndb
│   ├── mm39.nhr
│   ├── mm39.nin
│   ├── mm39.njs
│   ├── mm39.not
│   ├── mm39.nsq
│   ├── mm39.ntf
│   ├── mm39.nto
│   └── mm39.shd
├── list_backsplice.txt
├── params.txt
└── path_files.txt

my command is:
docker run -it -v $(pwd):/data annadalmolin/craft:v1.0

the path_files.txt:
/Users/lily/tool/craft/data/input/Mus_musculus.GRCm39.104.gtf
/Users/lily/tool/craft/data/input/Mus_musculus.GRCm39.dna.primary_assembly.fa

the params.txt:
MRO
mmu
.
.
mm39
.
orgdb="org.Mm.eg.db", meshdb="MeSH.Mmu.eg.db", symbol2eg="org.Mm.egSYMBOL2EG", eg2uniprot="org.Mm.egUNIPROT", org="mmusculus"
orgdb="org.Mm.eg.db", symbol2eg="org.Mm.egSYMBOL2EG", eg2uniprot="org.Mm.egUNIPROT"

reported error:
grep: /Users/lily/tool/craft/data/input/Mus_musculus.GRCm39.104.gtf: No such file or directory
grep: /Users/lily/tool/craft/data/input/Mus_musculus.GRCm39.104.gtf: No such file or directory
cat: /Users/lily/tool/craft/data/input/Mus_musculus.GRCm39.dna.primary_assembly.fa: No such file or directory
Error: The genome file annotation_chr.genome has no valid entries. Exiting.
Error: The genome file annotation_chr.genome has no valid entries. Exiting.
rm: cannot remove 'backsplice_sequence_bed*': No such file or directory
rm: cannot remove 'circ_id.txt': No such file or directory
rm: cannot remove 'circ_length.txt': No such file or directory
rm: cannot remove 'out_region*': No such file or directory
rm: cannot remove 'region_to_extract_for_?.bed': No such file or directory
rm: cannot remove 'region_to_extract_rev_?.bed': No such file or directory
mv: cannot stat '/data/sequence_extraction/backsplice_sequence_1.fa': No such file or directory
mv: cannot stat '/data/sequence_extraction/backsplice_sequence_1.txt': No such file or directory
mv: cannot stat '/data/sequence_extraction/backsplice_circRNA_length_1.txt': No such file or directory
cat: backsplice_circRNA_length_1.txt: No such file or directory
is not a valid option. Try with one of the followings: M, R, O, MR, MO, RO, MRO.

Looking forward to your response. Thank you!

Error when predicting miRNA for chrMT circRNA

Hello,

I am trying to use CRAFT to predict miRNA binding sites for three circular RNAs, one of which comes from the mitochondrial chromosome. I am able to successfully finish predictions for two of the three circular RNAs but not for the mitochondrial one. CRAFT hangs for a long time (10+ hours) at this point:

|...............................                                       |  45%
  ordinary text without R code

  |................................                                      |  46%
label: validated_TG (with options) 
List of 2
 $ echo   : logi FALSE
 $ include: logi FALSE

And then eventually ends up giving this error:

Quitting from lines 842-907 (functional_predictions_all_circRNAs.Rmd)
Error in function (type, msg, asError = TRUE)  :
  Could not resolve host: multimir.org
Calls: <Anonymous> ... submit_request -> <Anonymous> -> .postForm -> <Anonymous> -> fun
In addition: Warning messages:
1: package(s) not installed when version(s) same as current; use `force = TRUE` to
  re-install: 'grid'
2: In read.table(file_gene_names, header = T, sep = "\t") :
  incomplete final line found by readTableHeader on '/data/functional_predictions/backsplice_gene_name.txt'

Execution halted

Here is my backsplice_gene_name.txt input file:

circ_id	gene_names
1:16891302-16893846	NBPF1
5:89791493-89802491	POLR3G
MT:9533-9756	MT-CO3

Is it an issue to use sequences from the mitochondrial chromosome with CRAFT?

Thank you for your time.

Best,
Megan
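
Two separate things show up in the log above. The fatal error, "Could not resolve host: multimir.org", is a name-resolution failure raised while contacting multimir.org; the message itself does not point at the MT chromosome. The "incomplete final line" warning only means that backsplice_gene_name.txt does not end with a newline. A minimal sketch for that second point, assuming a POSIX shell; ensure_trailing_newline is a hypothetical helper and not part of CRAFT:

```shell
#!/bin/sh
# Hypothetical helper (not part of CRAFT): append a newline to a file
# only when its last byte is not already a newline.
ensure_trailing_newline() {
    file=$1
    # Nothing to do for a missing or empty file.
    if [ ! -s "$file" ]; then
        return 0
    fi
    # Command substitution strips a trailing newline, so the result is
    # empty exactly when the last byte is a newline.
    if [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
    return 0
}
```

It could be called as ensure_trailing_newline input/backsplice_gene_name.txt before starting the container. This silences the warning only; it is not expected to change the host-resolution error.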

Failed to run the demo

@annadalmolin Hello, I failed to run the demo. I am in China. Are there any special network requirements?

CRAFT cannot detect input files

Hi,
CRAFT looks like a very convenient tool for circRNA analysis. However, I am not able to run even the test example because of some errors. I run the following command:
sudo docker run -u $(id -u) -it -v $(pwd):/data annadalmolin/craft:v1.0
After running the command I get the following errors:

cat: /data/path_files.txt: No such file or directory
cat: /data/path_files.txt: No such file or directory
cat: /data/list_backsplice.txt: No such file or directory

The input files are copies of your examples.
My working directory is:
/media/marcin/data/lymphoma_circs/test
and structure of the files is:

.
└── data
    ├── input
    │   ├── backsplice_gene_name.txt
    │   ├── est.fa
    │   ├── est.fa.gz.md5
    │   ├── hg38.2bit
    │   ├── hg38.agp
    │   ├── hg38.chromAlias.bb
    │   ├── hg38.chromAlias.txt
    │   ├── hg38.chromAlias.txt.0
    │   ├── hg38.chromFaMasked.tar
    │   ├── hg38.chromFa.tar
    │   ├── hg38.chrom.sizes
    │   ├── hg38.fa
    │   ├── hg38.fa.align
    │   ├── hg38.fa.masked
    │   ├── hg38.fa.out
    │   ├── hg38.gc5Base.bw
    │   ├── hg38.gc5Base.wib
    │   ├── hg38.gc5Base.wig
    │   ├── hg38.gc5Base.wigVarStep
    │   ├── hg38.trf.bed
    │   ├── Homo_sapiens.GRCh38.104.gtf
    │   ├── Homo_sapiens.GRCh38.dna.primary_assembly.fa
    │   ├── md5sum.txt
    │   ├── mrna.fa
    │   ├── mrna.fa.gz.md5
    │   ├── README.txt
    │   ├── refMrna.fa
    │   ├── refMrna.fa.gz.md5
    │   ├── upstream1000.fa
    │   ├── upstream1000.fa.gz.md5
    │   ├── upstream2000.fa
    │   ├── upstream2000.fa.gz.md5
    │   ├── upstream5000.fa
    │   ├── upstream5000.fa.gz.md5
    │   ├── xenoMrna.fa
    │   ├── xenoMrna.fa.gz.md5
    │   ├── xenoRefMrna.fa
    │   └── xenoRefMrna.fa.gz.md5
    ├── list_backsplice.txt
    ├── params.txt
    └── path_files.txt

Thanks for any help.

Marcin
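
In the listing above, the three text files sit in a data/ subdirectory of the working directory, while the command mounts the working directory itself at /data. Inside the container they would then appear under /data/data/ rather than /data/, which would be consistent with the cat errors. A minimal sketch of a check to run before starting the container, assuming a POSIX shell; missing_craft_inputs is a hypothetical helper, and the list of three file names is taken from the listings in these issues rather than from CRAFT itself:

```shell
#!/bin/sh
# Hypothetical helper (not part of CRAFT): print the top-level input
# files that are missing from the directory about to be mounted at /data.
missing_craft_inputs() {
    dir=$1
    # File names assumed from the directory listings in the issues above.
    for name in list_backsplice.txt path_files.txt params.txt; do
        if [ ! -f "$dir/$name" ]; then
            printf '%s\n' "$name"
        fi
    done
}
```

Running missing_craft_inputs "$(pwd)" from the directory passed to -v should print nothing; any name it prints is a file the container will not find directly under /data.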

I'm now unable to run CRAFT using your test data; earlier it was running fine with the following contents

I'm a root user with a directory called test, inside which there is another directory called input. The contents of the directories are as follows:

1.) root
1.A.) test/ :
a) list_backsplice.txt : 4:143543509-143543972 +
11:33286413-33287511 +
15:64499292-64500166 +
b) params.txt : MO
hsa

hg38

score_miRNA=125, energy_miRNA=-25, dGduplex_miRNA=-22, dGopen_miRNA=-10
score_miRNA=125, energy_miRNA=-25, dGduplex_miRNA=-22, dGopen_miRNA=-10, voteFrac_RBP=0.3.

c) path_files.txt :
/data/input/HG38.gtf
/data/input/Homo_sapiens.GRCh38.dna.primary_assembly.fa

1.B) input/ :

a) AGO2_binding_sites.bed
b) backsplice_gene_name.txt
c) hg38.02.idx
d) hg38.nin
e) hg38.nto
f) HG38.gtf
g) bigWigToWig
h) hg38.fa
i) hg38.not
j) hg38.shd
k) Homo_sapiens.GRCh38.104.gtf
l) hg38.00.idx
m) hg38.ndb
n) hg38.nsq
o) Homo_sapiens.GRCh38.dna.primary_assembly.fa
p) hg38.01.idx
q) hg38.nhr
r) hg38.ntf

Running CRAFT for long circRNA sequences

Hi,

I am interested in using CRAFT to run functional predictions on long circRNAs (3k-30k nt). The tool works perfectly well for shorter circRNAs, but for long ones it takes so long that the job runs out of time before the prediction finishes. I am running only the miRNA prediction, as I am only interested in the disease associations of these circRNAs. I have tried setting higher score thresholds, but then some circRNAs still take forever while others end up with no miRNA output. Given the wide range of lengths I am working with, I am not sure what the best solution is, and I would really appreciate your input on which parameters are best suited to circRNAs this long.

Best Regards,
Avleen

Error: CircRNA Sequences Provided by User Not Completing Predictions

Hello,

I was able to get CRAFT to complete when having it perform the sequence extraction for me. However, I am running into completion issues when providing CRAFT with the circRNA sequences.

When processing the individual circRNAs, I get this error for two of them.

  |...........................................................           |  84%
label: disease_association_RBP (with options) 
List of 2
 $ echo   : logi FALSE
 $ include: logi FALSE

Please wait we are processing your accessions ...

Error in file(file, ifelse(append, "a", "w")) :                               
  all connections are in use
Calls: <Anonymous> ... <Anonymous> -> signalCondition -> <Anonymous> -> cat -> file
In addition: There were 50 or more warnings (use warnings() to see the first 50)

Execution halted
mv: cannot stat 'functional_predictions_single_circRNA.html': No such file or directory

For the last circRNA prediction, I get this error in the same location, at 84%.

Please wait we are processing your accessions ...
Quitting from lines 2772-2830 (functional_predictions_single_circRNA.Rmd)     
Error: Can't subset columns that don't exist.
x Column `Involvement.in.disease` doesn't exist.
Backtrace:
     x
  1. +-rmarkdown::render(...)
  2. | \-knitr::knit(knit_input, knit_output, envir = envir, quiet = quiet)
  3. |   \-knitr:::process_file(text, output)
  4. |     +-base::withCallingHandlers(...)
  5. |     +-knitr:::process_group(group)
  6. |     \-knitr:::process_group.block(group)
  7. |       \-knitr:::call_block(x)
  8. |         \-knitr:::block_exec(params)
  9. |           \-knitr:::eng_r(options)
 10. |             +-knitr:::in_dir(...)
 11. |             \-knitr:::evaluate(...)
 12. |               \-evaluate::evaluate(...)
 13. |                 \-evaluate:::evaluate_call(...)
 14. |                   +-evaluate:::timing_fn(...)
 15. |                   +-base:::handle(...)
 16. |                   +-base::withCallingHandlers(...)
 17. |                   +-base::withVisible(eval(expr, envir, enclos))
 18. |                   \-base::eval(expr, envir, enclos)
 19. |                     \-base::eval(expr, envir, enclos)
 20. \-UniprotR::Get.diseases(PathologyObj)
 21.   +-dplyr::select(Pathology_object, "Involvement.in.disease")
 22.   \-dplyr:::select.data.frame(Pathology_object, "Involvement.in.disease")
 23.     \-tidyselect::eval_select(expr(c(...)), .data)
 24.       \-tidyselect:::eval_select_impl(...)
 25.         +-tidyselect:::with_subscript_errors(...)
 26.         | +-base::tryCatch(...)
 27.         | | \-base:::tryCatchList(expr, classes, parentenv, handlers)
 28.         | |   \-base:::tryCatchOne(expr, names, parentenv, handlers[[1L]])
 29.         | |     \-base:::doTryCatch(return(expr), name, parentenv, handler)
 30.         | \-tidyselect:::instrument_base_errors(expr)
 31.         |   \-base::withCallingHandlers(...)
 32.         \-tidyselect:::vars_select_eval(...)
 33.           \-tidyselect:::walk_data_tree(expr, data_mask, context_mask)
 34.             \-tidyselect:::eval_c(expr, data_mask, context_mask)
 35.               \-tidyselect:::reduce_sels(node, data_mask, context_mask, init = init)
 36.                 \-tidyselect:::walk_data_tree(new, data_mask, context_mask)
 37.                   \-tidyselect:::as_indices_sel_impl(...)
 38.                     \-tidyselect:::as_indices_impl(x, vars, strict = strict)
 39.                       \-tidyselect:::chr_as_locations(x, vars)
 40.                         \-vctrs::vec_as_location(x, n = length(vars), names = vars)
 41.                           \-(function () ...
 42.                             \-vctrs:::stop_subscript_oob(...)
 43.                               \-vctrs:::stop_subscript(...)
There were 50 or more warnings (use warnings() to see the first 50)

Execution halted
mv: cannot stat 'functional_predictions_single_circRNA.html': No such file or directory

The final prediction, which generates functional_predictions_all_circRNA.html and functional_predictions_all_circRNA.knit.md, finishes completely with no errors.

I also get this error at the beginning of the run about permission denied to access .mature_hsa.fa. I checked my input files and I don't have this file in my inputs.

Putative sequence/s already extracted or provided by the user.
/scripts/pipeline_predictions.sh: line 102: /data/input/.mature_hsa.fa: Permission denied
MiRNA binding site prediction analysis already performed.
RBP binding site prediction analysis already performed.

Any help would be greatly appreciated. Thank you for your time.

Best,

Megan
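
On the "Permission denied" line above: since .mature_hsa.fa is not among the inputs, the pipeline is presumably trying to create it in /data/input, which requires that directory to be writable by the user the container runs as. That reading is an assumption based on the message alone. A minimal sketch of a check, assuming a POSIX shell; unwritable_dirs is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper (not part of CRAFT): print every argument that is
# not an existing directory writable by the current user.
unwritable_dirs() {
    for dir in "$@"; do
        if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
            printf '%s\n' "$dir"
        fi
    done
}
```

It is meant to be run as the same user that the container will run as, for example unwritable_dirs . ./input from the project directory; the result is only meaningful for that user.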

CRAFT cannot detect input data #2

Hi,
I have an issue running an analysis. I prepared the necessary files in the following directory structure:

.
└── test
    ├── input
    │   ├── backsplice_gene_name.txt
    │   ├── hg38.fa
    │   ├── hg38.ndb
    │   ├── hg38.nhr
    │   ├── hg38.nin
    │   ├── hg38.njs
    │   ├── hg38.not
    │   ├── hg38.nsq
    │   ├── hg38.ntf
    │   ├── hg38.nto
    │   ├── hg38.shd
    │   ├── Homo_sapiens.GRCh38.dna.primary_assembly.fa
    │   ├── Homo_sapiens.GRCh38.104.gtf
    │   ├── hg38.00.idx
    │   ├── hg38.01.idx
    │   └── hg38.02.idx
    ├── list_backsplice.txt
    ├── params.txt
    └── path_files.txt

which lies inside C:\test. Then I paste the command: docker run -it -v C:\test :/data annadalmolin/craft:v1.0 into the Windows command prompt, but then I get an error:

Quitting from lines 90-153 (functional_predictions_all_circRNAs.Rmd)
Error in read.table(file_parameters, header = F) :
no lines available in input
Calls: ... withCallingHandlers -> withVisible -> eval -> eval -> read.table
In addition: Warning message:
package(s) not installed when version(s) same as current; use force = TRUE to
re-install: 'grid'

As far as I understand, it cannot detect the input data and/or some R packages are not the appropriate version. But maybe I should just run it in Linux or directly in Docker?
I'm new to bioinformatics, so I might be making some basic mistakes. I find your analysis process so cool though, I would really like to make it work :)
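
The read.table message "no lines available in input" is what R reports when the file it opens contains no readable lines, so the parameter file as seen by the container is worth inspecting. A minimal sketch, assuming a POSIX shell (for example WSL or Git Bash on Windows); inspect_text_inputs is a hypothetical helper, and whether CRAFT is sensitive to Windows line endings is not confirmed here:

```shell
#!/bin/sh
# Hypothetical helper (not part of CRAFT): classify each file as
# empty-or-missing, crlf (contains carriage returns), or ok.
inspect_text_inputs() {
    cr=$(printf '\r')
    for file in "$@"; do
        if [ ! -s "$file" ]; then
            printf '%s\tempty-or-missing\n' "$file"
        elif grep -q "$cr" "$file"; then
            printf '%s\tcrlf\n' "$file"
        else
            printf '%s\tok\n' "$file"
        fi
    done
}
```

A call such as inspect_text_inputs params.txt path_files.txt list_backsplice.txt from the project directory prints one line per file; anything other than ok is a candidate explanation to rule out first.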

The genome file annotation_chr.genome has no valid entries

Hello,
The CRAFT software is a powerful tool for me.
However, I got the following error messages when I tried to use the FASTA file from GENCODE, which I used to detect my circRNAs.
Error: The genome file annotation_chr.genome has no valid entries. Exiting.
Error: The genome file annotation_chr.genome has no valid entries. Exiting.
mv: cannot stat '/data/input/backsplice_gene_name.txt': No such file or directory
I changed the chromosome names in list_backsplice.txt and backsplice_gene_name.txt from 1 to chr1 to match my FASTA file, but the error persists.

I don't know whether I MUST use the genome file from Ensembl for CRAFT.
Could you give me some help?

Thanks in advance.

Regards,
Kingatsu
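
One thing worth ruling out here is a chromosome-naming mismatch between the two files listed in path_files.txt: GENCODE names chromosomes chr1, chrM and so on, while Ensembl uses 1, MT. Whether this is the cause of the error above is an assumption. A minimal sketch that lists the chromosome names shared by a FASTA and a GTF, assuming a POSIX shell; all three function names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers (not part of CRAFT).

# Sequence names in a FASTA file: header text up to the first whitespace.
fasta_chroms() {
    grep '^>' "$1" | sed -e 's/^>//' -e 's/[[:space:]].*$//' | sort -u
}

# Sequence names in a GTF file: first column of non-comment lines.
gtf_chroms() {
    grep -v '^#' "$1" | cut -f 1 | sort -u
}

# Names present in both files; usage: shared_chroms genome.fa annotation.gtf
shared_chroms() {
    fasta_list=$(mktemp)
    gtf_list=$(mktemp)
    fasta_chroms "$1" > "$fasta_list"
    gtf_chroms "$2" > "$gtf_list"
    comm -12 "$fasta_list" "$gtf_list"
    rm -f "$fasta_list" "$gtf_list"
}
```

If shared_chroms prints nothing for the genome and annotation in use, the two files do not agree on chromosome names, and renaming entries in list_backsplice.txt alone would not reconcile them.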

Index instructions no longer available

Hello, and thanks for the tool. The link to the beRBP instructions for genome indexing is no longer available. Could you indicate alternative sources, or the tools needed to produce the index files?

Thank you in advance
Eva

Alternative Graphical_Output Figure File Type?

Hello,

Is it possible to produce the figures in the graphical_output folder in any other format than .png? Specifically, I'm trying to produce them as .eps files.

Thank you for your time.

Best,
Megan
