aidenlab / juicer
A One-Click System for Analyzing Loop-Resolution Hi-C Experiments
Home Page: http://aidenlab.org
License: MIT License
Hi there
I'm running HiCCUPS on a server with ~200 CPUs but no GPUs, using the following command:
java -Xmx2g -jar /path/Juicer/scripts/juicer_tools_linux_0.8.jar hiccups -m 500 -r 5000,10000 -f 0.1,0.1 -p 4,2 -i 7,5 -d 20000,20000 -c 22 --ignore_sparsity /pathpath/HMEC_HiCPro/Flow/HiCPro/HMEC/HMEC_allValidPairs.hic HMEC.hiccups.loops
this outputs the following:
Reading file: /pathpath/HMEC_HiCPro/Flow/HiCPro/HMEC/HMEC_allValidPairs.hic
HiC file version: 8
Using the following configurations for HiCCUPS:
Config res: 5000 peak: 4 window: 7 fdr: 10% radius: 20000
Config res: 10000 peak: 2 window: 5 fdr: 10% radius: 20000
Warning Hi-C map is too sparse to find many loops via HiCCUPS.
Running HiCCUPS for resolution 5000
GPU/CUDA Installation Not Detected
Exiting HiCCUPS
That's a bummer. Oftentimes there's a way to use CPUs instead of GPUs (e.g. with TensorFlow).
Does this exist? Can I use my many CPUs instead of GPUs for this task?
Hi,
I got this error while trying to run Juicer.
The job output suggests it completed successfully, but it looks like the job that runs the scripts to create _msplit_optdups.txt failed, and opt_dups.txt is an empty file.
Does this error matter?
Any thoughts you have would be really appreciated!
Thanks.
Chengfei
I would like to perform an APA analysis with your software. First of all, I have both a raw matrix and an ICE corrected matrix (500 ICE iterations) in text format. In order to perform the APA analysis, should I create the .hic file with the raw data or with the normalized one?
Once I know what kind of data to use, I should create the .hic file. I can do it, according to the docs, with the Pre tool. One of the accepted formats is the Short with score format, which has the following columns:
<str1> <chr1> <pos1> <frag1> <str2> <chr2> <pos2> <frag2> <score>
So, as I have already binned data at 80K, do I have to create this file, for instance, as below (ignoring the fragment, 0, and with the strand always set to +)?
+ chr1 80000 0 + chr1 320000 0 356
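A minimal sketch of how binned contacts could be written out in the nine-column layout quoted above. The function name `format_short_with_score` and the second example contact are hypothetical. The strand and fragment values are plain parameters because which encodings Pre accepts is exactly the open question here, so they should be checked against the Pre documentation before use.

```python
def format_short_with_score(chr1, pos1, chr2, pos2, score,
                            strand="+", frag1=0, frag2=0):
    """Return one line: <str1> <chr1> <pos1> <frag1> <str2> <chr2> <pos2> <frag2> <score>."""
    fields = [strand, chr1, pos1, frag1, strand, chr2, pos2, frag2, score]
    return " ".join(str(f) for f in fields)


# Binned contacts as (chr1, bin start 1, chr2, bin start 2, count).
# The first entry is the example from the question; the second is made up.
contacts = [
    ("chr1", 80000, "chr1", 320000, 356),
    ("chr1", 160000, "chr1", 160000, 1204),
]

lines = [format_short_with_score(*c) for c in contacts]
print("\n".join(lines))
```

Keeping the strand and fragment encodings as arguments means only the call site changes once the accepted values are confirmed.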
Once I have the .hic file, I also need a loops file. I do have a list of TADs from HiCExplorer. Is the APA analysis suitable for checking my TADs, or do I need to call the loops with Juicer?
Thank you.
Hello,
Thanks very much for juicer!
I am new to juicer, and I'd like to know how I should proceed from pre-aligned R1 and R2 BAMs. Is there a workaround that avoids re-aligning? From the error output, I can see juicer was looking for specific intermediate files to continue, which I don't have.
Any help would be appreciated!
cheers,
Simo
Dear professor,
I tried to install juicer on my computer, but I can't find the juicebox_clt.jar file. I am not sure whether it has been replaced by juicer_tools_0.7.5.jar or whether I simply didn't install it correctly.
Best,
Yu
After the update to the PBS version of the juicer scripts I am able to run juicer.sh. However now all the jobs are created but the first job for some reason terminates and ends up causing the remaining jobs to become orphans. I am just trying it on the small test data set provided in the wiki.
When I first run juicer.sh it creates 5 jobs seen here:
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
207555.merlot    AlnWrpC18126     stansfieldjc            0 R workq
207556.merlot    MStWrpC18126     stansfieldjc            0 H workq
207557.merlot    RDpWrpC18126     stansfieldjc            0 H workq
207558.merlot    SpWrp1C18126     stansfieldjc            0 H workq
207561.merlot    SpWrp2C18126     stansfieldjc            0 H workq
I then get the following email from the cluster after a minute or two:
PBS Job Id: 207556.merlot.bis.vcu.edu
Job Name: MStWrpC18126
Aborted by PBS Server
Job deleted as result of dependency on job 207555.merlot.bis.vcu.edu
And after that the remaining 3 jobs remain orphaned and on hold.
I then got the next email from the cluster:
PBS Job Id: 207555.merlot.bis.vcu.edu
Job Name: AlnWrpC18126
Post job file processing error; job 207555.merlot.bis.vcu.edu on host node10
Do you know what is going on here or how I can fix it?
Hi there,
After alignment the pipeline crashes around line 512 in the chimeric_blacklist.awk script.
Syntax error - I haven't tested yet, but it could be a stray "}".
Nicola
(-: Looking for fastq files...fastq files exist
Tue 3 Jan 2017 23:28:47 GMT
Juicer version:1.5
../juicer.sh -z ../references/genome.fa -p ../references/genome.dict -y ../restriction_sites/ -D ..
(-: Aligning files matching
opt/juicer/CPU/fastq/*_R*.fastq*
in queue to genome hg19 with site file ../restriction_sites/
(-: Created /opt/juicer/CPU/splits and
/opt/juicer/CPU/aligned.
Running command bwa mem -t 4 ../references/genome.fa opt/juicer/CPU/splits/CTCF_S1_L001_R1.fastq > opt/juicer/CPU/splits/CTCF_S1_L001_R1.fastq.sam
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 215849 sequences (19773217 bp)...
[M::mem_process_seqs] Processed 215849 reads in 1673.624 CPU sec, 1568.074 real sec
[main] Version: 0.7.15-r1140
[main] CMD: bwa mem -t 4 ../references/genome.fa
/opt/juicer/CPU/splits/CTCF_S1_L001_R1.fastq
[main] Real time: 1587.733 sec; CPU: 1684.597 sec
(-: Align of /opt/juicer/CPU/splits/CTCF_S1_L001_R1.fastq.sam done successfully
Running command bwa mem -t 4 ../references/genome.fa /opt/juicer/CPU/splits/CTCF_S1_L001_R2.fastq > /opt/juicer/CPU/splits/CTCF_S1_L001_R2.fastq.sam
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 215849 sequences (19766979 bp)...
[M::mem_process_seqs] Processed 215849 reads in 2004.714 CPU sec, 1860.978 real sec
[main] Version: 0.7.15-r1140
[main] CMD: bwa mem -t 4 ../references/genome.fa /opt/juicer/CPU/splits/CTCF_S1_L001_R2.fastq
[main] Real time: 1881.219 sec; CPU: 2015.746 sec
(-: Mem align of /opt/juicer/CPU/splits/CTCF_S1_L001_R2.fastq.sam done successfully
(-: Sort read 1 aligned file by readname completed.
(-: Sort read 2 aligned file by readname completed.
(-: /opt/juicer/CPU/splits/CTCF_S1_L001.fastq.sam created successfully.
awk: syntax error at source line 512 source file ../scripts/common/chimeric_blacklist.awk
context is
t_norm, count_abnorm) >> >>> fname1".res.txt" <<< ;
awk: illegal statement at source line 513 source file ../scripts/common/chimeric_blacklist.awk
hello,
I used juicer_tools to dump my Hi-C data recently. Juicer's dump provides three normalization methods (VC, VC_SQRT, KR), and I would like to know the principles behind them. I searched the internet and your paper (Rao et al. 2014), but only found KR.
I need to understand the Hi-C normalization method for my study, so could you explain the general principles of the three normalization methods in your tools? Related materials and references would also be helpful.
Thank you!
Yours,
J.Wan
Hi there,
I was wondering if it would be possible to allow a user to perform HiCCUPS/APA/Arrowhead analyses using their own matrices. I can imagine that not everybody has access to the original data but still wants to use this excellent toolkit.
The easiest way of doing this would be to make a conversion tool (from e.g. Hi-C summary files/validpairs) to .hic files. This will lead to more people using the "aiden-lab Hi-C ecosystem".
Thanks for both reading this issue and for developing juicer 👍
Kind regards,
Robin
(happy to help btw)
Dear,
I want to use the generate_site_positions.py script to create a restriction sites file for my study genome, but I don't know what the [location] parameter of this Python script is.
generate_site_positions.py <restriction enzyme> <genome> [location]
By the way, can I use [-s site] or, alternatively, [-y restriction site file] in juicer.sh? I mean, is the restriction sites file unnecessary if I set the [-s site] parameter?
What is the [-p chrom.sizes path] parameter, and what is its function in the juicer software?
Thanks.
Hi,
Sorry to bother you, but I am trying to extract the matrix from .hic data downloaded from GEO, and I always get the error below:
HiC file version: 8
Exception in thread "main" java.lang.NullPointerException
at juicebox.tools.clt.old.Dump.extractChromosomeRegionIndices(Dump.java:487)
at juicebox.tools.clt.old.Dump.readArguments(Dump.java:356)
at juicebox.tools.HiCTools.main(HiCTools.java:85)
Could you help me with that issue?
Best,
Yu
Hi, I've updated the CPU / chimeric_blacklist.awk script and now get an error on line 223 when running the example.
223: str[j] = and(tmp[2],16);
It may be that and() is not defined in the OS X version of awk. bwa runs and completes, then after two sorting steps there is an error:
$ ./juicer.sh -s HindIII -g hg38
(-: Looking for fastq files...fastq files exist
Fri 6 Jan 2017 13:32:35 GMT
Juicer version:1.5
....
(-: Sort read 1 aligned file by readname completed.
(-: Sort read 2 aligned file by readname completed.
(-: /Users/stuart/NGSTools/Juicer/CPU/splits/HIC003_S2_L001_001.fastq.sam created successfully.
awk: calling undefined function and
input record number 3, file /Users/stuart/NGSTools/Juicer/CPU/splits/HIC003_S2_L001_001.fastq.sam
source line number 223
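To make clear what the failing line computes: `and(tmp[2],16)` is a bitwise AND that tests the 16 bit of the SAM flag, which marks a read mapped to the reverse strand. Below is a minimal Python sketch of the same test, plus an arithmetic-only form that needs no bitwise built-in. The function names are hypothetical and this is not the Juicer implementation.

```python
def strand_bit(flag):
    """Equivalent of awk's and(flag, 16): 16 if the reverse-strand bit is set, else 0."""
    return flag & 16


def strand_bit_arithmetic(flag):
    """Same result using only integer division and modulo."""
    return 16 * ((flag // 16) % 2)


# A few common SAM flag values; both forms must agree on each of them.
for flag in (0, 16, 99, 147):
    assert strand_bit(flag) == strand_bit_arithmetic(flag)
    print(flag, strand_bit(flag))
```

Since `and()` is a gawk extension, running the script under gawk rather than the system awk is one likely way around the error.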
thanks,
Stuart
Hi,
I have the following directory structure:
references:
total 8374528
-rwxr-xr-x+ 1 sm2556 mane 3157608038 Sep 13 12:32 Homo_sapiens_assembly19.fasta
-rw-r--r--+ 1 sm2556 mane 6663 Sep 13 13:31 Homo_sapiens_assembly19.fasta.amb
-rw-r--r--+ 1 sm2556 mane 939 Sep 13 13:31 Homo_sapiens_assembly19.fasta.ann
-rw-r--r--+ 1 sm2556 mane 3095694072 Sep 13 13:30 Homo_sapiens_assembly19.fasta.bwt
-rw-r--r--+ 1 sm2556 mane 773923497 Sep 13 13:31 Homo_sapiens_assembly19.fasta.pac
-rw-r--r--+ 1 sm2556 mane 1547847040 Sep 13 13:44 Homo_sapiens_assembly19.fasta.sa
-rw-r--r--+ 1 sm2556 mane 377 Sep 13 15:19 Homo_sapiens_assembly19.sizes
restriction_sites:
total 15360
-rw-r--r--+ 1 sm2556 mane 7762896 Sep 13 11:45 hg19_HindIII_new.txt
-rw-r--r--+ 1 sm2556 mane 7762896 Sep 13 11:45 hg19_HindIII.txt
scripts:
total 92800
-rwxr-xr-x+ 1 sm2556 mane 3519 Sep 13 11:26 check.sh
-rwxr-xr-x+ 1 sm2556 mane 15349 Sep 13 11:26 chimeric_blacklist.awk
-rwxr-xr-x+ 1 sm2556 mane 1971 Sep 13 11:26 cleanup.sh
-rwxr-xr-x+ 1 sm2556 mane 3584 Sep 13 11:26 collisions.awk
-rwxr-xr-x+ 1 sm2556 mane 1616 Sep 13 11:26 countligations.sh
-rwxr-xr-x+ 1 sm2556 mane 13448 Sep 13 11:26 diploid.pl
-rw-r--r--+ 1 sm2556 mane 2449 Sep 13 11:26 diploid_split.awk
-rwxr-xr-x+ 1 sm2556 mane 5325 Sep 13 11:26 dups.awk
-rw-r--r--+ 1 sm2556 mane 3726 Sep 13 11:26 fragment_4dnpairs.pl
-rwxr-xr-x+ 1 sm2556 mane 3711 Sep 13 11:26 fragment.pl
-rw-r--r--+ 1 sm2556 mane 30745856 Sep 13 12:31 juicebox
-rw-r--r--+ 1 sm2556 mane 30745856 Sep 13 12:30 Juicebox.jar
-rw-r--r--+ 1 sm2556 mane 30751431 Sep 13 12:30 juicebox_tools.7.0.jar
-rwxr-xr-x+ 1 sm2556 mane 2388 Sep 13 11:26 juicer_arrowhead.sh
-rwxr-xr-x+ 1 sm2556 mane 3269 Sep 13 11:26 juicer_hiccups.sh
-rwxr-xr-x+ 1 sm2556 mane 3651 Sep 13 11:26 juicer_postprocessing.sh
-rwxr-xr-x+ 1 sm2556 mane 41529 Sep 13 11:26 juicer.sh
-rwxr-xr-x+ 1 sm2556 mane 4659 Sep 13 11:26 LibraryComplexity.class
-rwxr-xr-x+ 1 sm2556 mane 7204 Sep 13 11:26 LibraryComplexity.java
-rwxr-xr-x+ 1 sm2556 mane 2354 Sep 13 11:26 makemega_addstats.awk
-rwxr-xr-x+ 1 sm2556 mane 12782 Sep 13 11:26 mega.sh
-rwxr-xr-x+ 1 sm2556 mane 2455 Sep 13 11:26 relaunch_prep.sh
-rwxr-xr-x+ 1 sm2556 mane 5200 Sep 13 11:26 split_rmdups.awk
-rwxr-xr-x+ 1 sm2556 mane 14572 Sep 13 11:26 statistics.pl
-rwxr-xr-x+ 1 sm2556 mane 1751 Sep 13 11:26 stats_sub.awk
fastq:
total 0
lrwxrwxrwx 1 sm2556 mane 67 Sep 13 15:33 S1_003_HiC_R1.fastq.gz -> ../../analysis jul052016/S1_003_HiC/Unaligned/S1_003_HiC_1.fastq.gz
lrwxrwxrwx 1 sm2556 mane 67 Sep 13 15:34 S1_003_HiC_R2.fastq.gz -> ../../analysis-jul052016/S1_003_HiC/Unaligned/S1_003_HiC_2.fastq.gz
My run.sh script for the SLURM batch submission looks as follows:
#!/bin/bash
#SBATCH --partition=general
#SBATCH --job-name=Juicer
#SBATCH --ntasks=1 --nodes=1
#SBATCH --mem-per-cpu=6000
module load BWA; module load Java; bash /home/sm2556/project/hic-golden-uconn-feb0222016/hic-analysis-sept142017/scripts/juicer.sh -g hg19 -d /home/sm2556/project/hic-golden-uconn-feb0222016/hic-analysis-sept142017 -q general -l general -a 'Reference' -S 'early' -p /home/sm2556/project/hic-golden-uconn-feb022216/hic-analysis-sept142017/references/Homo_sapiens_assembly19.sizes -s 'HindIII' -y /home/sm2556/project/hic-golden-uconn-feb0222016/hic-analysis-sept142017/restriction_sites/hg19_HindIII.txt - D /home/sm2556/project/hic-golden-uconn-feb0222016/hic-analysis-sept142017 -x
The scripts folder was copied from juicer/SLURM/scripts in the cloned GitHub repository.
I get tons of error messages about dependencies not being satisfied. The part of the script that splits the fastq.gz file runs correctly, but the run still ends with an error, and the actual bwa mem call never happens on the cluster. When I tried to run the script in CPU mode, it started the alignment, but my files are too big and CPU mode will take a long time. Am I doing something wrong?
In chimeric_blacklist, the size of the mitochondria is hardcoded to hg19. This is to deal with circular chromosomes - trying to assign position correctly based on CIGAR string, sometimes the position will end up off the end of the chromosome, in which case it maps to beginning. This is the code:
# Mitochondria loops around
if (chr[j] ~ /MT/ && pos[j] >= 16569) {
    pos[j] = pos[j] - 16569;
}
Theoretically, any differently sized MT could go off the end (mouse for example); and any mitochondrial chromosome not named "MT" could also go off the end. In practice this hasn't happened; however, we should keep our eyes on this issue.
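One way the hardcoded length could be generalized, sketched in Python rather than awk: look the chromosome length up in a table (for example, one loaded from a chrom.sizes file) and wrap positions that run past the end. The names `wrap_circular`, `CHROM_SIZES`, and `CIRCULAR`, and the second table entry, are assumptions for illustration only.

```python
# Lengths of circular sequences. 16569 is the hg19 value from the awk code above;
# "toy_circular" is a made-up entry standing in for any other circular sequence.
CHROM_SIZES = {"MT": 16569, "toy_circular": 1000}
CIRCULAR = {"MT", "toy_circular"}


def wrap_circular(chrom, pos):
    """Map a position that ran off the end of a circular chromosome back to its start."""
    if chrom in CIRCULAR and pos >= CHROM_SIZES[chrom]:
        return pos - CHROM_SIZES[chrom]
    return pos


print(wrap_circular("MT", 16600))
print(wrap_circular("toy_circular", 1005))
print(wrap_circular("chr1", 16600))
```

Keying on an explicit set of circular names, instead of a pattern match on "MT", would also cover mitochondrial chromosomes that carry a different name.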
Hi,
Why can't I download juicer_tools from https://github.com/theaidenlab/juicer/wiki/Download ?
Thank You!
Thanks for your work on Hi-C scaffolding. I tried to download the code referenced in the Science paper, but it does not exist at github.com/theaidenlab/HiC-assembly-pipeline-archive. Could you help me?
Hi,
I am trying to map Hi-C raw reads downloaded from GEO using juicer.
For some samples (not all), juicer stopped in the merging step with an output file like this:
"
_### Sun Apr 30 15:21:29 EDT 2017
(-: Sort read 1 aligned file by readname completed.
(-: Sort read 2 aligned file by readname completed.
/ysm-gpfs/pi/gerstein/cy288/RenBing_fires_tissue_cellR_11_15_2016/STL003_Pancreas_Rep3/splits/SRR4272017007.fastq.sam created successfully.
***! No /ysm-gpfs/pi/gerstein/cy288/RenBing_fires_tissue_cellR_11_15_2016/STL003_Pancreas_Rep3/splits/SRR4272017007.fastq_norm.txt file created "
Also see this in the attached file:
merge-1327310.txt
It seems that the problem comes from the script:
chimeric_blacklist.awk
Could you please tell me what caused this problem?
Thank you!
Chengfei Yan
Postdoc Associate from the Gerstein Lab at Yale University
I'm using AWS EC2 instances, and I was wondering how I can utilize more than one cpu (which is how I assume the cpu "version" works). Was there something like a --threads flag? I also noticed the AMI for Juicer is for version 1.06, so I decided to install it on a fresh instance instead.
Hello!
The title says it all. Is there any way to discover why these reads are registering as chimeric ambiguous? None of the reference sets tend to have such odd stats. I have substituted the names of our conditions and genes in order to protect our ability to publish the results.
Here is the syntax used to run juicer:
module load juicer
cd /scratch/Experiment1/
juicer.sh -p $JUICER/references/hg19.chrom.sizes -s HindIII -y /usr/local/apps/juicer/juicer-1.5/SLURM/restriction_sites/hg19_HindIII.txt
Each folder has two fastq files and they are paired with the _R1.fastq.gz extension.
-bash-4.1$ head .hic -n 14
HI69/usr/local/apps/juicer/juicer-1.5/SLURM//references/hg19.chrom.sizesstatisticsExperiment description:
Sequenced Read Pairs: 51,225,101
Normal Paired: 4,837,169 (9.44%)
Chimeric Paired: 0 (0.00%)
Chimeric Ambiguous: 46,387,931 (90.56%)
Unmapped: 0 (0.00%)
Ligation Motif Present: 17,626,936 (34.41%)
Alignable (Normal+Chimeric Paired): 4,837,169 (9.44%)
Unique Reads: 4,371,746 (8.53%)
PCR Duplicates: 460,436 (0.90%)
Optical Duplicates: 4,987 (0.01%)
Library Complexity Estimate: 23,718,686
Intra-fragment Reads: 41,555 (0.08% / 0.95%)
Below MAPQ Threshold: 832,482 (1.63% / 19.04%)
-bash-4.1$ head .hic -n 14
HI /usr/local/apps/juicer/juicer-1.5/SLURM//references/hg19.chrom.sizesstatisticsExperiment description:
Sequenced Read Pairs: 53,112,216
Normal Paired: 4,531,213 (8.53%)
Chimeric Paired: 0 (0.00%)
Chimeric Ambiguous: 48,581,002 (91.47%)
Unmapped: 0 (0.00%)
Ligation Motif Present: 15,353,175 (28.91%)
Alignable (Normal+Chimeric Paired): 4,531,213 (8.53%)
Unique Reads: 4,165,313 (7.84%)
PCR Duplicates: 361,219 (0.68%)
Optical Duplicates: 4,681 (0.01%)
Library Complexity Estimate: 26,831,778
Intra-fragment Reads: 51,098 (0.10% / 1.23%)
Below MAPQ Threshold: 821,990 (1.55% / 19.73%)
-bash-4.1$ head .hic -n 14
HIn▒/usr/local/apps/juicer/juicer-1.5/SLURM//references/hg19.chrom.sizesstatisticsExperiment description:
Sequenced Read Pairs: 70,885,255
Normal Paired: 4,735,332 (6.68%)
Chimeric Paired: 1 (0.00%)
Chimeric Ambiguous: 66,149,921 (93.32%)
Unmapped: 0 (0.00%)
Ligation Motif Present: 16,650,157 (23.49%)
Alignable (Normal+Chimeric Paired): 4,735,333 (6.68%)
Unique Reads: 4,391,686 (6.20%)
PCR Duplicates: 338,623 (0.48%)
Optical Duplicates: 5,024 (0.01%)
Library Complexity Estimate: 31,443,095
Intra-fragment Reads: 50,678 (0.07% / 1.15%)
Below MAPQ Threshold: 807,014 (1.14% / 18.38%)
Thanks,
James D
General announcement -
Please use our forum, aidenlab.org/forum.html for asking questions about juicer, including anything related to installation, running the software, interpreting warnings/errors, or general questions related to 3D genomics.
Please use Github issues specifically for reporting bugs in the software or for new feature requests.
Hi,
I'm using juicer 1.5.5 and I'm running into an issue right out of the gate with a script called useuse. It's referenced in juicer.sh, but is not included in the 1.5.5 release.
source/juicer-1.5.5/UGER/scripts/juicer.sh: line 337: /broad/software/scripts/useuse: No such file or directory
Hi there,
It seems like there is a bit of a path error for script execution with the "CPU" pipeline. For example,
JUICER_INSTALL_DIR=/lab/solexa_weng/testtube/juicer
SCRIPT_DIR=/lab/solexa_weng/testtube/juicer/CPU/common/
JUICER_WORK_DIR=(absolute path to directory in current directory, fastq files properly setup there)
${JUICER_INSTALL_DIR}/CPU/juicer.sh -D $SCRIPT_DIR -g MyGenome -t 16 -z ../../MyGenome.fasta -p chrom_sizes.txt -y MyGenome.fasta_MboI.txt -d $JUICER_WORK_DIR 1>juicer.stdout.log 2>juicer.stderr.log
Runs for a bit, giving a non-exiting error:
/lab/solexa_weng/testtube/juicer/CPU/juicer.sh: line 363: /lab/solexa_weng/testtube/juicer/CPU/common//scripts/common/countligations.sh: No such file or directory
and the exiting error:
awk: fatal: can't open source file /lab/solexa_weng/testtube/juicer/CPU/common//scripts/common/chimeric_blacklist.awk' for reading (No such file or directory)
If we look at where the countligations.sh and chimeric_blacklist.awk files are:
tfallon@tak4 /lab/solexa_weng/Seq_data/Projects/Tim_Fallon/ppyralis_genome/Genome_project_reference_assemblies/version2/analyses/juicer$ find /lab/solexa_weng/testtube/juicer/ -name countligations.sh
/lab/solexa_weng/testtube/juicer/CPU/common/countligations.sh
/lab/solexa_weng/testtube/juicer/SLURM/scripts/countligations.sh
/lab/solexa_weng/testtube/juicer/AWS/scripts/countligations.sh
/lab/solexa_weng/testtube/juicer/UGER/scripts/countligations.sh
/lab/solexa_weng/testtube/juicer/LSF/scripts/countligations.sh
tfallon@tak4 /lab/solexa_weng/Seq_data/Projects/Tim_Fallon/ppyralis_genome/Genome_project_reference_assemblies/version2/analyses/juicer$ find /lab/solexa_weng/testtube/juicer/ -name chimeric_blacklist.awk
/lab/solexa_weng/testtube/juicer/CPU/common/chimeric_blacklist.awk
/lab/solexa_weng/testtube/juicer/SLURM/scripts/chimeric_blacklist.awk
/lab/solexa_weng/testtube/juicer/AWS/scripts/chimeric_blacklist.awk
/lab/solexa_weng/testtube/juicer/UGER/scripts/chimeric_blacklist.awk
/lab/solexa_weng/testtube/juicer/LSF/scripts/chimeric_blacklist.awk
It looks like the issue may be that the juicer.sh script expects "/scripts/common/" as a hardcoded prefix, whereas the CPU scripts don't follow this convention. Do you agree? Or am I executing the juicer pipeline wrong?
For some reason when I run juicer I don't get any output for arrowhead. I do, however, get a good .hic file that I can visualize in juicebox. When I run arrowhead separately using juicer tools I get very few TAD domains (~100 for the entire genome at best when messing around with the r and m settings).
When I use another program (HiCexplorer) I get a HiC matrix that looks exactly the same, but it provides me with thousands of TAD domains. Any suggestions on what the issue might be with juicer?
Additionally, when visualizing the contact matrix in juicebox I'd like to change the order in which it displays the scaffolds/chromosomes. Is there any way to do this?
One last question: I can't seem to figure out how to run HiCCUPS on my Mac or Linux machine. It requires a GPU and I have no experience using GPUs. Any suggestions on how to get it to work?
java -jar ~/tools/juicebox/juicer_tools_0.7.0.jar eigenvector VC K526.links.hic chr11 BP 100000 -p > test.txt
It looks like you're printing the HiC file version to stdout rather than stderr.
Note that Juicer is assumed to be located in /opt/juicer. When I run the command as the instructions suggest, I get the error "***! Reference sequence /opt/juicer/references/Homo_sapiens_assembly19.fasta does not exist". I don't have root access to build Juicer in /opt/. Is there any way I can use it without root?
Thanks.
On line 673 in the file juicer/SLURM/scripts/juicer.sh there is a fi.
I cannot find the matching if statement. Is this a bug?
Thank you.
I use the CPU version of juicer to dump data from two .hic files, but the program doesn't seem to recognize the .hic files. BTW, juicer works well on a single .hic file with the same command.
Juicer Tools Version 1.7.6
Resolution=10000
JUICER=/home/software/juicer/CPU/juicer_tools.jar
for j in {1..22}; do java -jar ${JUICER} dump observed NONE GSM1551601_HIC052_30.hic,GSM1551602_HIC053_30.hic ${j} ${j} BP $Resolution raw_${Resolution}.chr${j}; done
error
Could not read hic file: null
Could not read hic file: null
Could not read hic file: null
The sort by name should be -k1,1f
I'm calculating eigenvectors from hic files and when I go below 500,000 bp resolution, I get this warning:
WARNING: Pearson's and eigenvector calculation at high resolution can take a long time
and then it fails.
Is it possible to bypass this warning and forge ahead with higher resolution?
Hello, I'm downloading reference files for use with Juicer. hg19 works fine (Homo_sapiens_assembly19.* at https://s3.amazonaws.com/juicerawsmirror/opt/juicer/references), but I can't access mm9. I tried Mus_musculus_assembly9_norandom.fasta as in the "Installation" wiki, but that does not work; it fails with a 403 Forbidden response. I tried some variants on the name, but none of those was a hit. I can generate the necessary files if needed, but are they available for mm9 on AWS or elsewhere? Thanks!
Hi,
Is it necessary to have a GPU if I am not immediately interested in running HiCCUPS?
Sameet
Dear professor,
When I use the following command, I also get an error. Could you help me?
java -jar /share/nas30/liufuyan/Project/AT/Interaction/06.TAD/Soft/juicebox-master/out/artifacts/Juicebox_tools_jar/juicebox_tools.jar pre -f ../QC/digest_AT.fa.bed -q 0 tmp/93500_allValidPairs.pre_juicebox_sorted test ../QC/AT.fa.len
Skipping Chr1 30427671
Skipping Chr2 19698289
Skipping Chr3 23459830
Skipping Chr4 18585056
Skipping Chr5 26975502
Warning: Unable to process fragment file. Pre will continue without fragment file.
Start preprocess
Writing header
Writing body
java.lang.RuntimeException: No reads in Hi-C contact matrices. This could be because the MAPQ filter is set too high (-q) or because all reads map to the same fragment.
at juicebox.tools.utils.original.Preprocessor$MatrixZoomDataPP.mergeAndWriteBlocks(Preprocessor.java:1457)
at juicebox.tools.utils.original.Preprocessor$MatrixZoomDataPP.access$000(Preprocessor.java:1228)
at juicebox.tools.utils.original.Preprocessor.writeMatrix(Preprocessor.java:642)
at juicebox.tools.utils.original.Preprocessor.writeBody(Preprocessor.java:373)
at juicebox.tools.utils.original.Preprocessor.preprocess(Preprocessor.java:283)
at juicebox.tools.clt.old.PreProcessing.run(PreProcessing.java:106)
at juicebox.tools.HiCTools.main(HiCTools.java:83)
I am trying to run calculate_map_resolution.sh on GSM1551620_HIC071_merged_nodups.txt from your 2014 GEO repository. When I run the following command
./calculate_map_resolution.sh GSM1551620_HIC071_merged_nodups.txt 50bp.txt
I get this error:
../calculate_map_resolution.sh: line 104: [: -lt: unary operator expected
Thanks for your good software. However, I didn't find a clear way to call TADs or compartments. Can juicer_tools be used to call TADs or compartments (e.g. using Arrowhead, HiCCUPS, etc.)?
Many Thanks!
Hi,
Very powerful software. I just wonder whether we can use juicer to call the six subcompartments with the interchromosomal matrix?
Best,
Yu
Hi, sorry to bother you!
Could you please tell me about the three normalization methods (VC, VC_SQRT, KR)?
Are they similar to the distance normalization used when calculating the eigenvector?
Waiting for your reply!
Best wishes
Dear professor
I tried to use eigenvectors to generate compartments from .hic files. When I set the resolution to 250,000, it just prints the following and fails to generate any files:
WARNING: Pearson's and eigenvector calculation at high resolution can take a long time
and then it fails.
I have checked the issues and found that someone else has mentioned this before, but I couldn't find a solution. Should I update any files?
Best,
Yu
Hi,
I generated the input file for pre from a BAM file (generated by Babraham HiCUP) with the kind solution provided in this forum. Subsequently, I ran the pre command and generated the .hic file. I am not sure whether it was successful, as I encountered an exception.
java.lang.IndexOutOfBoundsException: Index: 6, Size: 4
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at java.util.Collections$UnmodifiableList.get(Collections.java:1211)
at juicebox.tools.utils.original.AsciiPairIterator.advance(AsciiPairIterator.java:143)
at juicebox.tools.utils.original.AsciiPairIterator.next(AsciiPairIterator.java:247)
at juicebox.tools.utils.original.Preprocessor.computeWholeGenomeMatrix(Preprocessor.java:496)
at juicebox.tools.utils.original.Preprocessor.writeBody(Preprocessor.java:374)
at juicebox.tools.utils.original.Preprocessor.preprocess(Preprocessor.java:286)
at juicebox.tools.clt.old.PreProcessing.run(PreProcessing.java:105)
at juicebox.tools.HiCTools.main(HiCTools.java:97)
My question is how do I know if the hic file generated was complete and did not stop at the point where the exception was thrown?
Thank you.
PS:
Code for converting bam to input file for pre:
samtools view read1_2.hicup.bam | awk 'BEGIN {FS="\t"; OFS="\t"} {name1=$1; str1=and($2,16); chr1=substr($3, 4); pos1=$4; mapq1=$5; getline; name2=$1; str2=and($2,16); chr2=substr($3, 4); pos2=$4; mapq2=$5; if(name1==name2) { if (chr1>chr2){print name1, str2, chr2, pos2,1, str1, chr1, pos1, 0, mapq2, mapq1} else {print name1, str1, chr1, pos1, 0, str2, chr2, pos2 ,1, mapq1, mapq2}}}' | sort -k3,3d -k7,7d > Arrowhead.input
my command for generating .hic file:
java -Xmx2g -jar /mnt/projects/wlwtan/cardiac_epigenetics/george/juicer/juicer_tools.1.6.2_linux_jcuda.0.8.jar pre -f mm9_DpnII.txt -q 30 Arrowhead.input Arrowhead.hic mm9
Hi,
Thanks for sharing the code in this much detail!
Just wondering where I can find the restriction site file:
$site_file = "/opt/juicer/restriction_sites/hg19_DpnII.txt";
Is it generated by HiCUP? I just want to know what the format looks like.
Thanks!
Hurley
On lines 407 and 452 the code is:
if [ -v shortread ] || [ "$shortreadend" -eq 1 ]
but getting a
[: -v: unary operator expected
error
I'm guessing this needs to be changed to:
if [ -v $shortread ] || [ "$shortreadend" -eq 1 ]
to call the $shortread variable. Can you please check this?
Some SLURM users can't use scontrol update. The workaround is to run in two stages. If stage is early exit, the scontrol commands should not happen.
Hello, I'm new to manipulation of hic data and am trying to extract dense matrices from a hic file, and the dump command fails:
java -jar juicer/scripts/juicer_tools.jar dump -d observed KR GSE80701_DpnII_HinfI_combo.hic 25000 arm_2L arm_2L 2L.matrix
HiC file version: 8
Exception in thread "main" java.lang.NullPointerException
at juicebox.tools.clt.old.Dump.extractChromosomeRegionIndices(Dump.java:455)
at juicebox.tools.clt.old.Dump.readArguments(Dump.java:347)
at juicebox.tools.HiCTools.main(HiCTools.java:96)
I successfully extracted sparse matrices using straw.
Could anyone help me understand the errors?
Thanks in advance
Remi - NYUSoM
There is a slight logic error in the if/else branches at the top.
If the genome is one of the listed genomes AND the location is provided, the run still fails because /seq/reference isn't universal.
You need to add an elif len(sys.argv)==3 (etc.) at line 24, or check whether len(sys.argv)==4 first and use filename= instead.
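The ordering suggested above can be sketched as follows: an explicit [location] argument wins, and the built-in genome table is consulted only when no location was given. The function `resolve_fasta` and the `KNOWN` table are hypothetical stand-ins; the script's real table and its paths under /seq/reference are not reproduced here.

```python
def resolve_fasta(argv, known_genomes):
    """Pick the FASTA path: an explicit [location] wins over the built-in table."""
    if len(argv) == 4:
        # <script> <restriction enzyme> <genome> <location>
        return argv[3]
    if len(argv) == 3 and argv[2] in known_genomes:
        # No location given: fall back to the table of known genomes.
        return known_genomes[argv[2]]
    raise SystemExit(
        "Usage: generate_site_positions.py <restriction enzyme> <genome> [location]"
    )


# Hypothetical table for illustration.
KNOWN = {"hg19": "/path/to/hg19.fasta"}

with_location = resolve_fasta(
    ["generate_site_positions.py", "HindIII", "hg19", "/data/my_hg19.fa"], KNOWN
)
without_location = resolve_fasta(["generate_site_positions.py", "HindIII", "hg19"], KNOWN)
print(with_location)
print(without_location)
```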
Hi,
I cannot seem to find an AMI corresponding to ami-458fc22f. Was the tutorial moved? Is it still available?
thanks
I am trying to run juicer on a PBS cluster using the new PBS scripts. When I run the juicer.sh script I get the following error:
Starting job to launch other jobs once splitting is complete
207474.merlot.bis.vcu.edu
below is the jID_alignwrap jobid
207474.
#PBS -W depend=afterok:207474.
qsub: illegal -W value
I think this is because of the period after the job ID number. For reference on our cluster jobs are named like this: 206998.merlot and can be called using only the number.
How can I modify the script to only use the number and drop the period from the job ID being used for the PBS -W command?
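A minimal sketch of the string handling being asked about: keep only the part of the job ID before the first period. It is written in Python for illustration, the function name `numeric_job_id` is hypothetical, and it assumes, as described above, that the cluster accepts the bare number in the dependency line.

```python
def numeric_job_id(qsub_output):
    """Return the leading numeric part of a PBS job ID such as '207474.merlot.bis.vcu.edu'."""
    job_id = qsub_output.strip().split(".", 1)[0]
    if not job_id.isdigit():
        raise ValueError(f"unexpected job ID: {qsub_output!r}")
    return job_id


# The three forms of job ID that appear in the report above.
for raw in ("207474.merlot.bis.vcu.edu", "207474.", "206998.merlot"):
    print(f"#PBS -W depend=afterok:{numeric_job_id(raw)}")
```

Inside a bash script the same trimming can be done with parameter expansion, for example `${jID_alignwrap%%.*}`, assuming the ID is held in a variable of that name.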
hi,
I am using pre to produce a .hic file, and the command line is:
java -jar juicer_tools.1.7.6_jcuda.0.8.jar pre -r 40000 -q 30 -f ../01.data/hg19_MobI.txt ../01.data/test3.txt.gz ./M3-736.hic hg19
and I ran into this problem:
Start preprocess
Writing header
Writing body
......Error: the chromosome combination 14_15 appears in multiple blocks
Do you know why this error happens? I look forward to your reply. Thank you so much.
min
I have run a single CPU version of juicer by the command
bash ~/juicer/scripts/juicer.sh -d ~/juicer/work/DNA -s none -z ~/juicer/references/hg19.fa -p ~/juicer/references/hg19.sizes -D ~/juicer -x
And I got the following error
awk: /home/ljw/juicer/scripts/common/chimeric_blacklist.awk: line 515: function and never defined
It seems that chimeric_blacklist.awk only has 513 lines. How can I fix this? Thank you.
The error message looks like this:
juicer$ ./juicer.sh -g hg19 -d XXX -s HindIII -p references/hg19.chrom.sizes
(-: Looking for fastq files...fastq files exist
Wed Jan 4 11:52:48 EST 2017
Juicer version:1.5
./juicer.sh -g hg19 -d XXX -s HindIII -p references/hg19.chrom.sizes
(-: Aligning files matching XXX/fastq/_R.fastq*
in queue to genome hg19 with site file ./restriction_sites/hg19_HindIII.txt
--- Using already created files in XXX/splits
gzip: XXX/splits/XXXHiC-HI-1_S0_R1.fastq.gz: No such file or directory
gzip: XXX/splits/XXXHiC-HI-1_S0_R2.fastq.gz: No such file or directory
The problem has to do with the soft links and relative directories. If you do ls -lh XXX/splits/XXXHiC-HI-1_S0_R1.fastq.gz it probably points to XXX/fastq/XXXHiC-HI-1_S0_R1.fastq.gz - which, from the perspective of that directory, does not exist (it would be under the splits directory).
To correct it, either run juicer from your directory (i.e., cd XXX then run juicer instead of sending in the "-d" flag; juicer calls 'pwd', which gives the absolute path) or run with the -d flag but with the absolute directory (i.e., -d /path/to/my/folder/XXX).
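The effect can be reproduced in a scratch directory. The sketch below is only an illustration, with hypothetical file names and a hypothetical helper `broken_links`: it creates one symlink with a relative target and one with an absolute target inside a splits directory, then reports which links fail to resolve.

```python
import os
import tempfile


def broken_links(directory):
    """Return names of symlinks in `directory` whose targets do not exist."""
    return sorted(
        name for name in os.listdir(directory)
        if os.path.islink(os.path.join(directory, name))
        and not os.path.exists(os.path.join(directory, name))
    )


with tempfile.TemporaryDirectory() as top:
    fastq = os.path.join(top, "fastq")
    splits = os.path.join(top, "splits")
    os.mkdir(fastq)
    os.mkdir(splits)
    open(os.path.join(fastq, "a_R1.fastq.gz"), "w").close()

    # Relative target: resolved from inside splits/, so it looks for
    # splits/fastq/a_R1.fastq.gz, which does not exist.
    os.symlink(os.path.join("fastq", "a_R1.fastq.gz"),
               os.path.join(splits, "rel_R1.fastq.gz"))
    # Absolute target: resolves no matter where the link lives.
    os.symlink(os.path.join(fastq, "a_R1.fastq.gz"),
               os.path.join(splits, "abs_R1.fastq.gz"))

    result = broken_links(splits)
    print(result)
```

Only the link with the relative target is reported as broken, which matches the gzip "No such file or directory" messages above.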