Those events can be found in the data files and are plotted.
However, what do they mean?
The BSJ is neither covered by one nor by two mates, so how exactly is it covered?
First of all, thank you for making these scripts available. This is not really an issue; it is more of a request.
The full documentation for the primer module is missing. Could you solve this? Thank you
Cheers.
Right now, more detailed information is scattered across several different files instead of being part of the Excel file. This information should be unified in the generated .XLS file.
Bonus points: can we also incorporate newly detected exons from the reconstruct/FUCHS module?
The feature selection for exon and gene works without any errors. However, the exact same workflow fails for the UTR features (three_prime_utr and five_prime_utr) after quite some time (~30 minutes) with a bedtools error:
/biosw/bedtools/git_unstable/bin/bedtools
Parsing annotation...
Processed 133938 entries
Done parsing annotation
Parsing annotation...
Processed 142387 entries
Done parsing annotation
Parsing BED input file...
Done parsing BED input file:
=> 294976 peaks, 45 nt average width
Parsing annotation...
Processed 58051 entries
Done parsing annotation
Parsing circular RNA input file...
Done parsing circular RNA input file:
=> 1956 circular RNAs, 11218 nt average (theoretical unspliced) length
Starting random shuffling of input peaks
Starting data acquisition from samplings
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/biosw/python3/3.5.1/lib/python3.5/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/biosw/python3/3.5.1/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/home/tjakobi/.local/lib/python3.5/site-packages/circtools/enrichment/enrichment_check.py", line 761, in random_sample_step
circular_intersect = self.do_intersection(shuffled_peaks[iteration], circ_rna_bed)
File "/home/tjakobi/.local/lib/python3.5/site-packages/circtools/enrichment/enrichment_check.py", line 480, in do_intersection
intersect_return = base_bed.intersect(query_bed, c=True)
File "/home/tjakobi/.local/lib/python3.5/site-packages/pybedtools/bedtool.py", line 806, in decorated
result = method(self, *args, **kwargs)
File "/home/tjakobi/.local/lib/python3.5/site-packages/pybedtools/bedtool.py", line 337, in wrapped
decode_output=decode_output,
File "/home/tjakobi/.local/lib/python3.5/site-packages/pybedtools/helpers.py", line 356, in call_bedtools
raise BEDToolsError(subprocess.list2cmdline(cmds), stderr)
pybedtools.helpers.BEDToolsError:
Command was:
bedtools intersect -c -b /scratch/tjakobi/circtools/RBFOX2_HepG2_combined/pybedtools.604gh2qr.tmp -a /scratch/tjakobi/circtools/RBFOX2_HepG2_combined/pybedtools.m5g3e86b.tmp
Error message was:
Error: line number 216606 of file /scratch/tjakobi/circtools/RBFOX2_HepG2_combined/pybedtools.604gh2qr.tmp has 3 fields, but 6 were expected.
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/tjakobi//.local/bin/circtools", line 18, in <module>
import circtools
File "/home/tjakobi/.local/lib/python3.5/site-packages/circtools/__init__.py", line 2, in <module>
main()
File "/home/tjakobi/.local/lib/python3.5/site-packages/circtools/circtools.py", line 30, in main
CircTools()
File "/home/tjakobi/.local/lib/python3.5/site-packages/circtools/circtools.py", line 66, in __init__
getattr(self, args.command)()
File "/home/tjakobi/.local/lib/python3.5/site-packages/circtools/circtools.py", line 187, in enrich
enrich.run_module()
File "/home/tjakobi/.local/lib/python3.5/site-packages/circtools/enrichment/enrichment_check.py", line 180, in run_module
), range(self.cli_params.num_iterations + 1))
File "/biosw/python3/3.5.1/lib/python3.5/multiprocessing/pool.py", line 260, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/biosw/python3/3.5.1/lib/python3.5/multiprocessing/pool.py", line 608, in get
raise self._value
pybedtools.helpers.BEDToolsError:
Command was:
bedtools intersect -c -b /scratch/tjakobi/circtools/RBFOX2_HepG2_combined/pybedtools.604gh2qr.tmp -a /scratch/tjakobi/circtools/RBFOX2_HepG2_combined/pybedtools.m5g3e86b.tmp
Error message was:
Error: line number 216606 of file /scratch/tjakobi/circtools/RBFOX2_HepG2_combined/pybedtools.604gh2qr.tmp has 3 fields, but 6 were expected.
Command exited with non-zero status 1
Command being timed: "circtools enrich -c /home/tjakobi/work/projects/circRNA/encode3/dcc_k562_hepg2_total_vs_cytosol//circTest/circRNA_HepG2_RNaseR_P_signif_1percFDR.csv -b /home/tjakobi/work/data/circtools/encode_hg38_clip_peaks/HepG2/combined/RBFOX2_HepG2_combined.bed -a /home/tjakobi/work/data/circtools/input/GRCh38.85.gtf -g /home/tjakobi/work/data/circtools/input/hg38.chrom.sizes -i 10000 -I three_prime_utr -I five_prime_utr -p 40 -P 1 -T 1 -o /home/tjakobi/work/data/circtools/output/encode/hepg2/utr// -F RBFOX2_HepG2_combined -t /scratch/tjakobi/circtools/RBFOX2_HepG2_combined/"
User time (seconds): 56499.98
System time (seconds): 2763.36
Percent of CPU this job got: 3341%
Elapsed (wall clock) time (h:mm:ss or m:ss): 29:33.64
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 148304
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 3629391
Minor (reclaiming a frame) page faults: 352230918
Voluntary context switches: 96145491
Involuntary context switches: 4247851
Swaps: 0
File system inputs: 553612394
File system outputs: 294311389
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 1
Interestingly, while the first temporary BED file still exists, the second one cannot be recovered after the crash. Also, the reported number of fields differs between runs:
tjakobi@porta:[run]{0}> grep "Error: line number" slurm-90*
slurm-90457.out:Error: line number 210067 of file /scratch/tjakobi/circtools/EIF3D_HepG2_combined/pybedtools.5fffa2o0.tmp has 4 fields, but 6 were expected.
slurm-90457.out:Error: line number 210067 of file /scratch/tjakobi/circtools/EIF3D_HepG2_combined/pybedtools.5fffa2o0.tmp has 4 fields, but 6 were expected.
slurm-90458.out:Error: line number 216897 of file /scratch/tjakobi/circtools/HNRNPC_HepG2_combined/pybedtools.0f_wvxui.tmp has 4 fields, but 6 were expected.
slurm-90458.out:Error: line number 216897 of file /scratch/tjakobi/circtools/HNRNPC_HepG2_combined/pybedtools.0f_wvxui.tmp has 4 fields, but 6 were expected.
slurm-90459.out:Error: line number 280584 of file /scratch/tjakobi/circtools/QKI_HepG2_combined/pybedtools.1j18rnyf.tmp has 5 fields, but 6 were expected.
slurm-90459.out:Error: line number 280584 of file /scratch/tjakobi/circtools/QKI_HepG2_combined/pybedtools.1j18rnyf.tmp has 5 fields, but 6 were expected.
slurm-90460.out:Error: line number 216606 of file /scratch/tjakobi/circtools/RBFOX2_HepG2_combined/pybedtools.604gh2qr.tmp has 3 fields, but 6 were expected.
slurm-90460.out:Error: line number 216606 of file /scratch/tjakobi/circtools/RBFOX2_HepG2_combined/pybedtools.604gh2qr.tmp has 3 fields, but 6 were expected.
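To narrow down whether the input is already malformed or whether the temporary file was truncated, the field count of every line could be checked before the file is handed to bedtools. The following is only a minimal sketch: the function name `find_malformed_bed_lines` and the example lines are made up for illustration and are not part of circtools or pybedtools.

```python
def find_malformed_bed_lines(lines, expected_fields=6):
    """Return (line_number, field_count) for every line whose number of
    tab-separated fields differs from expected_fields."""
    malformed = []
    for line_number, line in enumerate(lines, start=1):
        stripped = line.rstrip("\n")
        # Skip blank lines and the header/comment lines a BED file may carry.
        if not stripped or stripped.startswith(("#", "track", "browser")):
            continue
        field_count = len(stripped.split("\t"))
        if field_count != expected_fields:
            malformed.append((line_number, field_count))
    return malformed


# Hypothetical example data: the second line is cut off after three fields.
example = [
    "chr1\t100\t200\tpeak1\t0\t+\n",
    "chr1\t300\t400\n",
    "chr2\t500\t600\tpeak3\t0\t-\n",
]
bad = find_malformed_bed_lines(example)
print(bad)
```

For a real file, the same function can be called on an open file handle, e.g. `find_malformed_bed_lines(open(path))`.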
It would be nice to have some kind of auto-generated diagrams visualizing the enrichment results. However, it should not be a function that just draws 2,000 plots; something more high-level would be better.
When run in feature mode, the length used to compute the peaks / length ratio is taken from the annotation.
This behavior will cause problems when the circRNA is located in an exon-rich region of the gene while the remaining linear part spans several kb of intron space (or vice versa). It would make more sense to only account for the accumulated feature length instead.
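The difference between the annotated span and the accumulated feature length can be illustrated with a small sketch. It assumes features are given as half-open `(start, end)` pairs on one chromosome; the function name `accumulated_length` and the coordinates are invented for this example and do not reflect the current circtools code.

```python
def accumulated_length(intervals):
    """Sum the lengths of half-open (start, end) intervals after merging
    overlapping or directly adjacent ones, so no base is counted twice."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # A gap: close the previous merged block and open a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlap or adjacency: extend the current merged block.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


# Hypothetical exons: two overlapping ones and a distant third one.
exons = [(100, 200), (150, 300), (5000, 5100)]
span = max(end for _, end in exons) - min(start for start, _ in exons)
acc = accumulated_length(exons)
print(span, acc)
```

With three peaks in these exons, the ratio would be 3 / 5000 when using the span but 3 / 300 when using the accumulated length, which is the effect described above.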
As far as I can see, the source page does not provide any instructions on how to install the software. Therefore, we have to provide a walkthrough for the user. However, fewer external dependencies would be even better, of course. Example: samtools only requires htslib, which is more or less okay if a compiler and gzip are installed.
Benchmarks show that the IO-heavy workflow of bedtools has a severe impact on performance. It would be reasonable to use Brandon's replacement functions instead.
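For orientation, counting overlaps in memory (the equivalent of `bedtools intersect -c`) does not need temporary files at all. The sketch below is not Brandon's code, which is not shown here; it is a generic illustration that assumes half-open BED coordinates, ignores strand, and uses made-up names (`build_index`, `count_overlaps`).

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def build_index(features):
    """Collect sorted start and end coordinates per chromosome."""
    starts, ends = defaultdict(list), defaultdict(list)
    for chrom, start, end in features:
        starts[chrom].append(start)
        ends[chrom].append(end)
    for chrom in starts:
        starts[chrom].sort()
        ends[chrom].sort()
    return starts, ends


def count_overlaps(queries, index):
    """For each query interval, count the indexed features overlapping it."""
    starts, ends = index
    counts = []
    for chrom, start, end in queries:
        # Features that begin before the query ends ...
        begun = bisect_left(starts.get(chrom, []), end)
        # ... minus those that already ended at or before the query start.
        finished = bisect_right(ends.get(chrom, []), start)
        counts.append(begun - finished)
    return counts


# Hypothetical data: three circRNAs and four peaks.
circ_rnas = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 0, 50)]
peaks = [("chr1", 150, 160), ("chr1", 190, 310),
         ("chr1", 200, 250), ("chr1", 400, 450)]
counts = count_overlaps(circ_rnas, build_index(peaks))
print(counts)
```

The index is built once per shuffled peak set, and each query then costs two binary searches instead of a file write plus a subprocess call.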
This script can then be called by the circtools platform. The infrastructure for calling R scripts has already been developed. We need to decide where the R scripts will be deployed during installation.
Right now the code is split into two files, one for single samples and another one (_twin.R) for direct one-to-one comparisons. It would make sense to merge everything into one file and infer from the user input whether we have one or two samples.
In addition to the (arbitrary) categorization of circRNAs into groups based on their length, we should add a quantile plot to the visualization routines.
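The values such a quantile plot would display can be computed without fixing any length categories in advance. This is only a sketch with invented lengths; it uses `statistics.quantiles` from the Python standard library (available from Python 3.8), whereas the actual visualization routines may well live in R.

```python
from statistics import quantiles


def length_quantiles(lengths, n=4):
    """Return the n-1 cut points that split the lengths into n equal-sized
    groups, treating the data as the complete population ("inclusive")."""
    return quantiles(lengths, n=n, method="inclusive")


# Hypothetical circRNA lengths in nucleotides.
circ_lengths = [200, 400, 600, 800, 1000]
quartiles = length_quantiles(circ_lengths)
print(quartiles)
```

Calling `length_quantiles(circ_lengths, n=10)` would give deciles instead of quartiles.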
Hi Sir
I tried to install circtools on my PC (Ubuntu 16.04) and got the following error message. Could you help with this? Thank you
$ python3 setup.py install --verbose --user
running install
Requirement already satisfied: statsmodels in /home/cd/miniconda3/lib/python3.6/site-packages (0.9.0)
Requirement already satisfied: patsy in /home/cd/miniconda3/lib/python3.6/site-packages (from statsmodels) (0.5.0)
Requirement already satisfied: pandas in /home/cd/miniconda3/lib/python3.6/site-packages (from statsmodels) (0.23.0)
Requirement already satisfied: six in /home/cd/miniconda3/lib/python3.6/site-packages (from patsy->statsmodels) (1.11.0)
Requirement already satisfied: numpy>=1.4 in /home/cd/miniconda3/lib/python3.6/site-packages (from patsy->statsmodels) (1.14.3)
Requirement already satisfied: python-dateutil>=2.5.0 in /home/cd/miniconda3/lib/python3.6/site-packages (from pandas->statsmodels) (2.7.3)
Requirement already satisfied: pytz>=2011k in /home/cd/miniconda3/lib/python3.6/site-packages (from pandas->statsmodels) (2018.4)
Bioconductor version 3.6 (BiocInstaller 1.28.0), ?biocLite for help
A new version of Bioconductor is available after installing the most recent
version of R; see http://bioconductor.org/install
BioC_mirror: https://bioconductor.statistik.tu-dortmund.de
Using Bioconductor 3.6 (BiocInstaller 1.28.0), R 3.4.4 (2018-03-15).
installation path not writeable, unable to update packages: DBI, RMySQL,
codetools, foreign, lattice, spatial
Skipping install of 'CircTest' from a github remote, the SHA1 (2fd16602) has not changed since last install.
Use force = TRUE to force installation
Skipping install of 'primex' from a github remote, the SHA1 (f715f111) has not changed since last install.
Use force = TRUE to force installation
Cloning into 'DCC'...
remote: Counting objects: 876, done.
remote: Compressing objects: 100% (14/14), done.
remote: Total 876 (delta 6), reused 17 (delta 4), pack-reused 857
Receiving objects: 100% (876/876), 226.95 KiB | 239.00 KiB/s, done.
Resolving deltas: 100% (599/599), done.
Checking connectivity... done.
Traceback (most recent call last):
File "setup.py", line 11, in <module>
from setuptools import setup
ImportError: No module named setuptools
Cloning into 'FUCHS'...
remote: Counting objects: 1928, done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 1928 (delta 2), reused 7 (delta 1), pack-reused 1920
Receiving objects: 100% (1928/1928), 36.20 MiB | 4.58 MiB/s, done.
Resolving deltas: 100% (1024/1024), done.
Checking connectivity... done.
Traceback (most recent call last):
File "setup.py", line 11, in <module>
from setuptools import setup
ImportError: No module named setuptools
Traceback (most recent call last):
File "setup.py", line 234, in <module>
'Documentation': 'http://docs.circ.tools'
File "/home/cd/miniconda3/lib/python3.6/site-packages/setuptools/__init__.py", line 129, in setup
return distutils.core.setup(**attrs)
File "/home/cd/miniconda3/lib/python3.6/distutils/core.py", line 148, in setup
dist.run_commands()
File "/home/cd/miniconda3/lib/python3.6/distutils/dist.py", line 955, in run_commands
self.run_command(cmd)
File "/home/cd/miniconda3/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "setup.py", line 71, in run
subprocess.check_call(["bash", "scripts/install_external.sh"])
File "/home/cd/miniconda3/lib/python3.6/subprocess.py", line 291, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['bash', 'scripts/install_external.sh']' returned non-zero exit status 1.
Currently, visualization is done using the raw number of circRNA isoforms per gene. It would be more helpful to normalize the count in an RPKM/FPKM-like manner, e.g. isoforms per kilobase.
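A normalization of this kind could look like the sketch below. Which length to divide by (full gene length, accumulated exon length, or something else) is not decided in this request; the sketch simply assumes a gene length in base pairs, and the function name and numbers are invented.

```python
def isoforms_per_kilobase(isoform_count, gene_length_bp):
    """Normalize a raw isoform count by gene length, in isoforms per kb."""
    if gene_length_bp <= 0:
        raise ValueError("gene length must be positive")
    return isoform_count / (gene_length_bp / 1000)


# Hypothetical genes: the long one has more isoforms in absolute terms,
# but far fewer per kilobase.
long_gene = isoforms_per_kilobase(10, 50000)
short_gene = isoforms_per_kilobase(4, 2000)
print(long_gene, short_gene)
```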
Currently, the installation fails on fresh systems when numpy is not installed BEFORE statsmodels. The order of the entries in requirements.txt or setup.py does not have any influence on the build order.
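One possible workaround is to install the build-time prerequisite in a separate pip call before everything else, because separate calls are guaranteed to run one after another. The sketch below only builds the commands; the helper names and the package list are illustrative and not taken from the circtools setup.py.

```python
import subprocess
import sys


def ordered_install_commands(build_first, remaining):
    """Build one pip command per package, prerequisites first, no duplicates."""
    ordered = list(build_first) + [p for p in remaining if p not in build_first]
    return [[sys.executable, "-m", "pip", "install", package]
            for package in ordered]


def run_install(commands):
    """Run the commands one after another and stop at the first failure."""
    for command in commands:
        subprocess.check_call(command)


# Hypothetical package list: numpy is moved to the front.
commands = ordered_install_commands(["numpy"], ["statsmodels", "numpy", "pandas"])
print([command[-1] for command in commands])
```

Calling `run_install(commands)` would then perform the actual installation in that order.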
A module to design siRNA sequences for use with circular RNAs was suggested at the SPP1738 conference and seems like a natural extension of circtools' functionality.
For n=2000 iterations, we quite often found only 1980 entries in the column for the linear raw count. Given that the sample was run with -i 2000 and -p 20, it seems that the last iterations may be skipped; the 20 missing entries would match one skipped iteration per process.
The easiest way would probably be to interface directly with the corresponding main class, which avoids calling another Python instance from within Python. We need something like an API for DCC and FUCHS that is open to the outside.
The script should be based on the existing step 1 analysis script.
However, like the rest of circtools, it has to work independently and needs to be as generic as possible.
When running the primer design install script, I get stuck in a loop and can't continue the installation:
The current version of circtools can work only with RSQLite version <= 1.1.5
Your version is 2.0
Would you like to install the 1.1.15 one? [y/n]
The current version of circtools can work only with RSQLite version <= 1.1.5
Your version is 2.0
Would you like to install the 1.1.15 one? [y/n]
The current version of circtools can work only with RSQLite version <= 1.1.5
Your version is 2.0
Would you like to install the 1.1.15 one? [y/n]
The current version of circtools can work only with RSQLite version <= 1.1.5
Your version is 2.0
Would you like to install the 1.1.15 one? [y/n]
The current version of circtools can work only with RSQLite version <= 1.1.5
Your version is 2.0
Would you like to install the 1.1.15 one? [y/n]
The current version of circtools can work only with RSQLite version <= 1.1.5
Your version is 2.0
Would you like to install the 1.1.15 one? [y/n]
The current version of circtools can work only with RSQLite version <= 1.1.5
Your version is 2.0
Would you like to install the 1.1.15 one? [y/n]
The current version of circtools can work only with RSQLite version <= 1.1.5
Your version is 2.0
Would you like to install the 1.1.15 one? [y/n]
The current version of circtools can work only with RSQLite version <= 1.1.5
Your version is 2.0
Would you like to install the 1.1.15 one? [y/n]
The current version of circtools can work only with RSQLite version <= 1.1.5
Your version is 2.0
Would you like to install the 1.1.15 one? [y/n]
^CThe current version of circtools can work only with RSQLite version <= 1.1.5
Your version is 2.0
Would you like to install the 1.1.15 one? [y/n]
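A prompt like this can avoid looping forever if it gives up after a few unusable answers and treats a closed input stream as "no answer". The install script itself is not shown in this report, so the following is only a language-agnostic idea sketched in Python; the function name `ask_yes_no` and the retry limit are assumptions.

```python
def ask_yes_no(question, read_answer=input, max_attempts=3):
    """Ask a yes/no question. Return True or False for a clear answer, or
    None if no usable answer arrives within max_attempts (or input ends)."""
    for _ in range(max_attempts):
        try:
            answer = read_answer(question + " [y/n] ").strip().lower()
        except EOFError:
            # No terminal attached (e.g. a batch job): stop asking.
            return None
        if answer in ("y", "yes"):
            return True
        if answer in ("n", "no"):
            return False
    return None


# Simulated user input: two unusable replies, then a clear "y".
replies = iter(["", "maybe", "y"])
decision = ask_yes_no("Install the older RSQLite version?",
                      read_answer=lambda prompt: next(replies))
print(decision)
```

The caller can then abort with a clear error message when the result is `None` instead of repeating the question.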