stevenwingett / hicup

Hi-C data processing pipeline

License: GNU Lesser General Public License v3.0

Perl 95.23% R 4.77%

hicup's Issues

GitHub Actions

Write a basic GitHub Actions pipeline for HiCUP.

Error when passing HiCUP parameters via command line

Ran HiCUP without a configuration file and got the following errors:

[wingetts@headstone hicup_patches]$ hicup --zip --digest /bi/scratch/Genomes/Human/GRCh38/Digest_Homo_sapiens_GRCh38_HindIII_None_14-43-31_10-02-2016.txt.gz --index /bi/scratch/Genomes/Human/GRCh38/Homo_sapiens.GRCh38 --longest 700 --shortest 50 --bowtie2 /bi/apps/bowtie2/2.4.1/bowtie2 test_dataset/*.fastq
Starting HiCUP pipeline (v0.7.3)
FASTQ quality format not specified, analysing file 'test_dataset/test_dataset1.fastq' to predict file format used
FASTQ quality set to 'Sanger'
Detecting R automatically
Found R at '/bi/apps/R/3.6.1/bin/R'
Reading genome digest file '/bi/scratch/Genomes/Human/GRCh38/Digest_Homo_sapiens_GRCh38_HindIII_None_14-43-31_10-02-2016.txt.gz' to determine Hi-C restriction enzyme
Truncating with HiCUP Truncater v0.7.3
Truncating sequences at occurrence of sequences '[AAGCTAGCTT]'
Truncating sequences
Truncating test_dataset/test_dataset1.fastq
Truncating test_dataset/test_dataset2.fastq
Truncating complete
Mapping with HiCUP Mapper v0.7.3
Using aligner 'bowtie2'
Mapping test_dataset1.trunc.fastq.gz
test_dataset1.trunc.fastq.gz Aligner error: Use of uninitialized value $bt2_args[7] in join or string at /bi/apps/bowtie2/2.4.1/bowtie2 line 423.
Use of uninitialized value $bt2_args[8] in join or string at /bi/apps/bowtie2/2.4.1/bowtie2 line 423.
Use of uninitialized value in exists at /bi/apps/bowtie2/2.4.1/bowtie2 line 81.
Use of uninitialized value in exists at /bi/apps/bowtie2/2.4.1/bowtie2 line 81.
Use of uninitialized value $bt2_args[7] in join or string at /bi/apps/bowtie2/2.4.1/bowtie2 line 459.
Use of uninitialized value $bt2_args[8] in join or string at /bi/apps/bowtie2/2.4.1/bowtie2 line 459.
Mapping test_dataset2.trunc.fastq.gz
test_dataset2.trunc.fastq.gz Aligner error: Use of uninitialized value $bt2_args[7] in join or string at /bi/apps/bowtie2/2.4.1/bowtie2 line 423.
Use of uninitialized value $bt2_args[8] in join or string at /bi/apps/bowtie2/2.4.1/bowtie2 line 423.
Use of uninitialized value in exists at /bi/apps/bowtie2/2.4.1/bowtie2 line 81.
Use of uninitialized value in exists at /bi/apps/bowtie2/2.4.1/bowtie2 line 81.
Use of uninitialized value $bt2_args[7] in join or string at /bi/apps/bowtie2/2.4.1/bowtie2 line 459.
Use of uninitialized value $bt2_args[8] in join or string at /bi/apps/bowtie2/2.4.1/bowtie2 line 459.
Mapping complete
Pairing files with HiCUP Mapper v0.7.3
Pairing test_dataset1.map.sam and test_dataset2.map.sam
Pairing complete
Filtering with HiCUP Filter v0.7.3
Processing digest file /bi/scratch/Genomes/Human/GRCh38/Digest_Homo_sapiens_GRCh38_HindIII_None_14-43-31_10-02-2016.txt.gz
Sonication protocol followed - Restriction_Enzyme1:HindIII [A^AGCTT]
Filtering test_dataset1_2.pair.bam
Filtering complete
Removing duplicates with HiCUP Deduplicator v0.7.3
De-duplicating test_dataset1_2.filt.bam
De-duplication complete
Creating HTML and text file HiCUP summary reports
HiCUP processing complete.

How to use the output file hicup.bam to run 3D-DNA

Hi,
Thank you for developing such useful tools. I have run the whole pipeline and obtained all the results. I want to use the output to run a 3D-DNA assembly. Which file should I use, and how should I process it? I hope you can help me. Thanks.

NlaIII digestion problem

We used the NlaIII enzyme for digestion in our Hi-C experiment.
I specified --re1 CATG^,NlaIII when I ran hicup_digester, and the resulting file looks fine; here is the head of the file.

But when I run HiCUP with it, it gives no results. In the log I found that there is no sequence inside the [].

Truncating with HiCUP Truncater v0.7.4
Truncating sequences at occurrence of sequences '[]'
Truncating sequences
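
For readers following this thread, here is a small sketch of how a fill-in ligation junction can be derived from a cut-site string. It is not HiCUP's code; the function name `ligation_junction` and the formula are assumptions, chosen because they reproduce the junctions printed in the logs on this page (`AAGCTAGCTT` for `A^AGCTT`, `GATCGATC` for `^GATC`). Under the same assumption, a site that leaves no 5' overhang, such as `CATG^`, gives an empty junction, which would be consistent with the empty `[]` above.

```python
def ligation_junction(re1: str) -> str:
    """Return the fill-in ligation junction for a cut site such as 'A^AGCTT'.

    Assumption: the enzyme leaves a 5' overhang that is filled in before
    blunt-end ligation, so the overhang appears twice in the junction.
    """
    cut = re1.index("^")                 # cut position on the top strand
    site = re1.replace("^", "").upper()  # recognition sequence without the caret
    # Left end keeps everything up to the bottom-strand cut,
    # right end contributes everything from the top-strand cut onwards.
    return site[: len(site) - cut] + site[cut:]


for re1 in ("A^AGCTT", "^GATC", "CATG^"):
    print(re1, repr(ligation_junction(re1)))
```

A 3' overhang enzyme would need a different rule, so the empty result for `CATG^` should be read as a hint about where to look, not as a diagnosis.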

How to choose the --format option when running hicup_mapper

Hi,
First of all, thanks and congratulations on your tool; it's very user-friendly and efficient.
I have a question about running hicup_mapper: I am not sure which --format option to use. For example, if my data were sequenced on an Illumina HiSeq 2500 instrument, should I choose Sanger, Solexa_Illumina_1.0, Illumina_1.3 or Illumina_1.5? Is there a way to find out which option to use?
Thank you in advance,
Giuseppe
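
One way to check this yourself is to look at the range of quality characters in the FASTQ file. The sketch below is a common heuristic, not HiCUP's own detection code (a log elsewhere on this page shows HiCUP predicting the format itself when --format is omitted). The function names and thresholds are assumptions: characters below ASCII 59 only occur with a Phred+33 ("Sanger") offset, and characters above ASCII 74 suggest one of the +64 encodings.

```python
from itertools import islice


def quality_lines(fastq_lines, max_reads=10000):
    """Yield the quality string (4th line of each record) from FASTQ lines."""
    picked = (line.rstrip("\n") for i, line in enumerate(fastq_lines) if i % 4 == 3)
    return islice(picked, max_reads)


def guess_quality_encoding(qualities):
    """Guess the FASTQ quality encoding from the observed character range."""
    codes = [ord(ch) for q in qualities for ch in q]
    if not codes:
        raise ValueError("no quality characters found")
    lo, hi = min(codes), max(codes)
    if lo < 59:
        return "Sanger"                        # Phred+33
    if hi > 74:
        # Only the +64 encodings reach this high.
        return "Solexa_Illumina_1.0" if lo < 64 else "Illumina_1.3 or Illumina_1.5"
    return "ambiguous"                         # range fits both offsets


record = ["@read1\n", "ACGT\n", "+\n", "!5II\n"]
print(guess_quality_encoding(quality_lines(record)))
```

Illumina pipelines from version 1.8 onwards generally write Phred+33 qualities, so data from recent instruments will usually come out as Sanger here, but checking the file is safer than assuming.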

Summary report file

Hi. I tried to run the pipeline by following this video, but at the end I don't have a summary report file. Can anyone help me find a solution? Thank you.

Check Mac compatibility

While HiCUP is not designed for use on a Mac, changing how zipped files are opened on a Mac may make it work.

What is the difference between hicup.bam and pair.bam? Why do not all files produce a hicup.bam?

UMIs incorporated into deduplicator

Hi Steven,

I'm doing a cHiC experiment using adapters containing an 11 bp UMI. I want to use identical UMIs as an additional criterion in the deduplication process to remove PCR duplicates (along with the usual criterion of removing reads that map to the same start and end coordinates). Do you have any suggestions for how I can get hicup_deduplicator to do that? Or is there a package I can use in tandem with it to account for UMIs?

Best,
Atreyo
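
To make the idea concrete, here is a minimal sketch of position-plus-UMI de-duplication. It does not use or modify hicup_deduplicator; the tuple layout and the function name `dedup_with_umi` are hypothetical, and it assumes the UMI has already been extracted for each di-tag (for example from the read name).

```python
def dedup_with_umi(ditags):
    """Keep the first di-tag seen for each (end A, end B, UMI) combination.

    Each di-tag is a tuple:
    (chrom1, pos1, strand1, chrom2, pos2, strand2, umi)
    """
    seen = set()
    kept = []
    for tag in ditags:
        end_a, end_b, umi = tag[0:3], tag[3:6], tag[6]
        # Sort the two ends so that A-B and B-A count as the same di-tag.
        key = (min(end_a, end_b), max(end_a, end_b), umi)
        if key not in seen:
            seen.add(key)
            kept.append(tag)
    return kept


ditags = [
    ("chr1", 100, "+", "chr2", 500, "-", "ACGTACGTACG"),
    ("chr2", 500, "-", "chr1", 100, "+", "ACGTACGTACG"),  # same ends, same UMI
    ("chr1", 100, "+", "chr2", 500, "-", "TTTTACGTACG"),  # same ends, new UMI
]
print(len(dedup_with_umi(ditags)))
```

With this key, two di-tags at identical coordinates are only collapsed when their UMIs also match, so independent ligation events that happen to share coordinates are retained.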

Truncation at very start of sequence.

Hi,

Hope you're well. I just noticed this and wasn't sure whether it is expected behaviour. It appears that when a ligation sequence occurs at the very start of a read, the read is truncated to its first letter, but when the ligation sequence is in the middle of the read, the read is truncated at the cut site. I don't imagine this has any impact, since such short reads probably wouldn't make it past the mapping stage, but I thought I'd post it anyway.

Hopefully the toy example explains what I mean. Each sequence in R1 contains the ligation sequence GATCGATC, but after truncation one ends in G and the other in GATC.

Thanks,
Stephen
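
For comparison, this is a sketch of one consistent truncation rule, under which both reads in the toy example would end in GATC. It is not hicup_truncater's implementation; the function name and the `keep` parameter (the number of junction bases retained, here the length of the restored DpnII site) are assumptions for illustration.

```python
def truncate_at_junction(read, junction="GATCGATC", keep=4):
    """Cut a read at the first ligation junction, keeping `keep` junction bases."""
    idx = read.find(junction)
    if idx == -1:
        return read              # no junction: leave the read untouched
    return read[: idx + keep]    # same rule wherever the junction occurs


print(truncate_at_junction("GATCGATCAAAA"))      # junction at the very start
print(truncate_at_junction("TTTTGATCGATCAAAA"))  # junction in the middle
```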

index error

Hi,
I made a configuration file and ran HiCUP.
The error message is always as follows.
HiCUP-0.8.3/hicup --config HiCUP-0.8.3/config_files/hicup_example.conf
Starting HiCUP pipeline (v0.8.3)
PLEASE NOTE: FROM VERSION 8, HICUP REQUIRES THE R PACKAGES TIDYVERSE AND PLOTLY INSTALLED
SEE DOCUMENTATION FOR MORE DETAILS
bowtie2 index file '/Users/takato/mm10.1.bt2' does not exist
bowtie2 index file '/Users/takato/mm10.2.bt2' does not exist
bowtie2 index file '/Users/takato/mm10.3.bt2' does not exist
bowtie2 index file '/Users/takato/mm10.4.bt2' does not exist
bowtie2 index file '/Users/takato/mm10.rev.1.bt2' does not exist
bowtie2 index file '/Users/takato/mm10.rev.2.bt2' does not exist
Detecting R automatically
Found R at '/opt/homebrew/bin/R'
Please change configuration file and/or command-line parameters and/or installation accordingly

Do you know how to solve this problem?
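
A quick way to check the index setting before launching the pipeline is to test for the six files named in the error message above. This is a sketch with a hypothetical helper, `missing_index_files`; the suffix list is taken from the log, and it assumes the configuration holds the index basename (for example `/Users/takato/mm10`), not a file or directory name.

```python
from pathlib import Path

# Suffixes copied from the "does not exist" messages in the log above.
BT2_SUFFIXES = (".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2")


def missing_index_files(index_basename):
    """Return the expected bowtie2 index files that are not present on disk."""
    expected = [f"{index_basename}{suffix}" for suffix in BT2_SUFFIXES]
    return [path for path in expected if not Path(path).is_file()]


for path in missing_index_files("/Users/takato/mm10"):
    print("missing:", path)
```

If every file is reported missing, the basename most likely points at the wrong folder or uses a different prefix from the one the index was built with.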

HiCUP error after pairing - Error in read.table/no lines available in input/Could not produced hicup_mapper summary/fail to read the header from "pair.bam"

Dear all,

I have updated samtools and bowtie2 and installed the required R packages (tidyverse / plotly) as well.
I don't know where the header error could come from. I have never encountered such an issue before: the .conf file I used has a very standard form, and the files were read correctly by HiCUP Truncater:

Starting HiCUP pipeline (v0.8.3)
PLEASE NOTE: FROM VERSION 8, HICUP REQUIRES THE R PACKAGES TIDYVERSE AND PLOTLY INSTALLED
SEE DOCUMENTATION FOR MORE DETAILS
Detecting R automatically
Found R at '/data/users/mzoia/anaconda3/envs/htseq_pipelines/bin/R'
Reading genome digest file '/data/projects/p616_Cis-regulatory_landscapes_in_heart_development/htseq_pipelines/HI-C/mm10_Hand2_Hi-C_GuillaumePIP/mm10_Hand2_HiC/Digest_mm10_DpnII_None_13-21-44_15-03-2022.txt' to determine Hi-C restriction enzyme
Truncating with HiCUP Truncater v0.8.3
Truncating sequences at occurrence of sequences '[GATC]'
Truncating sequences
Truncating s_2_1_Hand2FL.fastq.gz
Truncating s_2_2_Hand2FL.fastq.gz
Truncating s_1_1_Hand2MD.fastq.gz
Truncating s_1_2_Hand2MD.fastq.gz
Truncating s_3_2_Hand2HT.fastq.gz
Truncating s_3_1_Hand2HT.fastq.gz
Truncating complete
Mapping with HiCUP Mapper v0.8.3
Using aligner 'bowtie2'
Mapping complete
Pairing files with HiCUP Mapper v0.8.3
Pairing s_1_1_Hand2MD.map.sam and s_1_2_Hand2MD.map.sam
Pairing s_2_1_Hand2FL.map.sam and s_2_2_Hand2FL.map.sam
Pairing s_3_1_Hand2HT.map.sam and s_3_2_Hand2HT.map.sam
s_2_1_Hand2FL.trunc.fastq.gz does not exits in summary results hash.
s_1_1_Hand2MD.trunc.fastq.gz does not exits in summary results hash.
s_3_1_Hand2HT.trunc.fastq.gz does not exits in summary results hash.
Pairing complete
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
no lines available in input
Calls: read.delim -> read.table
Execution halted
Could not produce hicup_mapper summary bar chart: /data/users/mzoia/anaconda3/envs/htseq_pipelines/bin/Rscript /data/projects/p616_Cis-regulatory_landscapes_in_heart_development/htseq_pipelines/HI-C/Softwares_HI-C_Guillaume_pipeline/HICUP_v0.8.3/r_scripts/hicup_mapper_summary.r ./ ./hicup_mapper_summary_shHjZdRlkJ_10-02-14_25-04-2022.txt: Operation not permitted at /data/projects/p616_Cis-regulatory_landscapes_in_heart_development/htseq_pipelines/HI-C/Softwares_HI-C_Guillaume_pipeline/HICUP_v0.8.3/hicup_mapper line 240.
[main_samview] fail to read the header from "s_3_1_2_Hand2HT.pair.bam".
s_3_1_2_Hand2HT.pair.bam contains no data
[main_samview] fail to read the header from "s_2_1_2_Hand2FL.pair.bam".
s_2_1_2_Hand2FL.pair.bam contains no data
[main_samview] fail to read the header from "s_1_1_2_Hand2MD.pair.bam".
s_1_1_2_Hand2MD.pair.bam contains no data
All the files in the HiCUP pipeline have been removed for containing no data.
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=7886599.0 cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
srun: error: binfservas11: task 0: Out Of Memory

Best regards and thank you for your help,

cut site

This is more of a question than an issue!
I'm using HiCUP in an NG3C experiment that uses DpnII, but the cut sites are not biotinylated. I believe HiCUP assumes the ends are biotinylated, and unlike in some other software, I could not find a place in the config file to specify otherwise. Would this affect my results, and how should I proceed?
I appreciate any suggestions.

Potential Bowtie2 error?

Recently I used HiCUP to map my Hi-C data to a genome. I get a series of Perl errors/warnings when it runs the bowtie2 alignment, but the program does not terminate and the BAM file is still created. I wonder why these warning messages occur and whether they cause problems in the BAM files.

Here is the message in STDERR:

Starting HiCUP pipeline (v0.8.1)                                                                                                                                           
PLEASE NOTE: FROM VERSION 8, HICUP REQUIRES THE R PACKAGES TIDYVERSE AND PLOTLY INSTALLED                                                                                  
SEE DOCUMENTATION FOR MORE DETAILS                                                                                                                                         
Detecting R automatically                                                                                                                                                  
Found R at '/usr/bin/R'                                                                                                                                                    
Reading genome digest file '/data/01/user112/project/project/11.Hic/4.pbc-hic-polyploid/2.hicup/Digest_unspecified_genome_DpnII_None_17-13-20_12-07-2021.txt' to determine Hi-C restriction enzyme
Truncating with HiCUP Truncater v0.8.1                                                                                                                                     
Truncating sequences at occurrence of sequences '[GATCGATC]'
Truncating sequences                                                                                                                                                       
Truncating /data/01/user112/project/project/11.Hic/4.pbc-hic-polyploid/0.hic_reads/ye_fastp_R1.fq.gz                                                                  
Truncating /data/01/user112/project/project/11.Hic/4.pbc-hic-polyploid/0.hic_reads/ye_fastp_R2.fq.gz                                                                  
Truncating complete
Mapping with HiCUP Mapper v0.8.1
Using aligner 'bowtie2'
ye_fastp_R1.trunc.fastq.gz Aligner error: Use of uninitialized value $bt2_args[7] in join or string at /data/00/user/user112/software/bowtie2-2.4.1-linux-x86_64/bowtie2 line 423.
ye_fastp_R2.trunc.fastq.gz Aligner error: Use of uninitialized value $bt2_args[7] in join or string at /data/00/user/user112/software/bowtie2-2.4.1-linux-x86_64/bowtie2 line 423.
Use of uninitialized value $bt2_args[8] in join or string at /data/00/user/user112/software/bowtie2-2.4.1-linux-x86_64/bowtie2 line 423.                                   
Use of uninitialized value in exists at /data/00/user/user112/software/bowtie2-2.4.1-linux-x86_64/bowtie2 line 81.                                                         
Use of uninitialized value in exists at /data/00/user/user112/software/bowtie2-2.4.1-linux-x86_64/bowtie2 line 81.                                                         
Use of uninitialized value $bt2_args[7] in join or string at /data/00/user/user112/software/bowtie2-2.4.1-linux-x86_64/bowtie2 line 459.                                   
Use of uninitialized value $bt2_args[8] in join or string at /data/00/user/user112/software/bowtie2-2.4.1-linux-x86_64/bowtie2 line 459.                                   
Mapping ye_fastp_R1.trunc.fastq.gz
Use of uninitialized value $bt2_args[8] in join or string at /data/00/user/user112/software/bowtie2-2.4.1-linux-x86_64/bowtie2 line 423.                                   
Use of uninitialized value in exists at /data/00/user/user112/software/bowtie2-2.4.1-linux-x86_64/bowtie2 line 81.                                                         
Use of uninitialized value in exists at /data/00/user/user112/software/bowtie2-2.4.1-linux-x86_64/bowtie2 line 81.                                                         
Use of uninitialized value $bt2_args[7] in join or string at /data/00/user/user112/software/bowtie2-2.4.1-linux-x86_64/bowtie2 line 459.                                   
Use of uninitialized value $bt2_args[8] in join or string at /data/00/user/user112/software/bowtie2-2.4.1-linux-x86_64/bowtie2 line 459.                                   
Mapping ye_fastp_R2.trunc.fastq.gz
Mapping complete

My config file is attached; please check it.
I would really appreciate any advice or guidance you can give.

All the best,
Nemo Wu
hicup_example.conf.txt

Error in pairing while running hicup_mapper

Hi,
I am trying to run the HiCUP pipeline on some test data. I truncated the reads successfully with hicup_truncater, generating the files test_dataset1.trunc.fastq and test_dataset2.trunc.fastq (I am assuming that they correspond to the forward and reverse read sets, respectively). However, I get an error when I run hicup_mapper:

hicup_mapper --bowtie2 $BOWTIE2 --index $BWT_IDX --format Illumina_1.5 --threads $NT test_dataset1.trunc.fastq test_dataset2.trunc.fastq

Detecting R automatically
Found R at '/srv/ngsdata/dalteriog/Tools/miniconda3/bin/R'
Mapping with HiCUP Mapper v0.8.3
Using aligner 'bowtie2'
Mapping test_dataset1.trunc.fastq
Mapping test_dataset2.trunc.fastq
Mapping complete
Pairing files with HiCUP Mapper v0.8.3
Pairing test_dataset1.map.sam and test_dataset2.map.sam
Can't read './test_dataset1.map.sam' : No such file or directory at /srv/ngsdata/dalteriog/Tools/miniconda3/bin/hicup_mapper line 552.
Pairing complete
Could not delete './.test_dataset1.map.sam'
Could not delete './.test_dataset2.map.sam'
During startup - Warning message:
Setting LC_CTYPE failed, using "C"
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
no lines available in input
Calls: read.delim -> read.table
Execution halted
Could not produce hicup_mapper summary bar chart: /srv/ngsdata/dalteriog/Tools/miniconda3/bin/Rscript /srv/ngs/analysis/dalteriog/Tools/miniconda3/share/hicup-0.8.3-0/r_scripts/hicup_mapper_summary.r ./ ./hicup_mapper_summary_uEOmJNsSPk_15-55-06_29-03-2022.txt: No such file or directory at /srv/ngsdata/dalteriog/Tools/miniconda3/bin/hicup_mapper line 240.

My variables are set as follows:
$BOWTIE2 --> path to bowtie2 executable
$BWT_IDX --> path to bowtie2 index (without trailing .X.bt2)
$NT --> 8

I don't program in Perl, but I took a look at line 552 of the script:

cat /srv/ngsdata/dalteriog/Tools/miniconda3/bin/hicup_mapper | awk 'NR==552{print}'
open( FORWARD, $fileForward ) or die "Can't read '$fileForward' : $!";

I am assuming that the problem lies in pairing the mapped files, probably because they were never produced.
Is there something I am missing?

SAM flags could be improved?

From Helen Ray-Jones:

We figured out what is going wrong with WASP. The problem is that the re-mapping step sometimes swaps the designation for read 1 and read 2, compared with the initial mapping.

Example:

In “to_remap.bam”:

A00551:155:HCYCKDSXY:3:1158:21097:31516 147 9 133332642 42 150M = 133348035 0 GCCTTCCTGGCCTTCTCTTTCGCCCACAGCTCCTTTCGCTTCCTCTTCTTCCGGTCCCGTTCCTGCTTTCTCCGCCGCCTTTTCTCCAAGGCGGCAGGGGACAGCTCCTTGGCACTGCCCTGGGGGAAAGAGGCACCCACTCATTAAAGT FFF:FFFFFFF:FFF,F:FFFFFFFFFFFFFF,F,FFFFFFFF:FFFFFFFFFF,FFFF:FFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFF AS:i:-5 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:25G124 YT:Z:UU CT:Z:FAR
A00551:155:HCYCKDSXY:3:1158:21097:31516 99 9 133348035 42 146M = 133332642 0 CCGCCGCAGTCTCTCTTCCCCGCCGCGCCGCGGTCCGAAAACCTAGTCAGCCGCCGCAGCCTCTCAGCCCCGCCTCGATTTTTAGCTTTATAGGAATGCTGTTGCTTTAAATCCGAAATCCCGTGCCGGTATCAACTCTCGCGATC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFF,FF:FFFFFFFFFFF AS:i:-5 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:65G80 YT:Z:UU CT:Z:FAR

The first read at position ending 642 has FLAG 147 (“second in pair”) and the second read at position ending 035 has FLAG 99 (“first in pair”).

Then in “remapped.bam”:

A00551:155:HCYCKDSXY:3:1158:21097:31516.133332642-133348035.1.1 99 9 133332642 42 150M = 133348035 0 GCCTTCCTGGCCTTCTCTTTCGCCCGCAGCTCCTTTCGCTTCCTCTTCTTCCGGTCCCGTTCCTGCTTTCTCCGCCGCCTTTTCTCCAAGGCGGCAGGGGACAGCTCCTTGGCACTGCCCTGGGGGAAAGAGGCACCCACTCATTAAAGT FFF:FFFFFFF:FFF,F:FFFFFFFFFFFFFF,F,FFFFFFFF:FFFFFFFFFF,FFFF:FFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFF AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:150 YT:Z:UU
A00551:155:HCYCKDSXY:3:1158:21097:31516.133332642-133348035.1.1 147 9 133348035 42 146M = 133332642 0 CCGCCGCAGTCTCTCTTCCCCGCCGCGCCGCGGTCCGAAAACCTAGTCAGCCGCCGCAGCCTCTCAGCCCCGCCTCGATTTTTAGCTTTATAGGAATGCTGTTGCTTTAAATCCGAAATCCCGTGCCGGTATCAACTCTCGCGATC FFFFFFFFFFF:FF,FFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF AS:i:-5 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:65G80 YT:Z:UU

The first read at position ending 642 now has FLAG 99 (“first in pair”) and the second read at position ending 035 has FLAG 147 (“second in pair”).

This is fine but means that WASP thinks that the cigars do not match up after remapping (even though they do in reality). So we can keep these reads that WASP is discarding.
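
To see exactly what differs between the two FLAG values quoted above, the bits can be decoded with a few lines of Python. The bit meanings follow the SAM specification; the short names and the helper `decode_flag` are just labels chosen for this sketch.

```python
# SAM FLAG bits relevant to paired-end reads (values from the SAM specification).
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
}


def decode_flag(flag):
    """Return the names of the FLAG bits that are set."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


for flag in (99, 147):
    print(flag, decode_flag(flag))
```

FLAG 99 decodes to a forward-strand first read with a reverse-strand mate, and 147 to a reverse-strand second read, so comparing the two files by FLAG alone also flips the reported strand along with the read 1 / read 2 designation.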

Tutorial test data not pairing with HiCUP Mapper after bowtie2 alignment

hicup_example_conf.txt

HiCUP_error_std_output.txt

I am trying to run the HiCUP 0.8.2 pipeline on the Yale University cluster, where I have installed all dependencies (R, SAMtools, bowtie2 etc) as modules.

There is a problem when I run the test data through HiCUP. After the "Pairing complete" step, it reported:

"Can't read '/home/ap2549/project/capture_analysis_trial/HiCUP/output/test_dataset1.map.sam' : No such file or directory at /gpfs/ycga/project/noonan/ap2549/capture_analysis_trial/HiCUP/HiCUP-0.8.2/hicup_mapper line 552."

Would you please help me to solve this? I am also attaching my conf file and the command line std output for your perusal.

Best,
Atreyo

Multi-thread option for Hicup filter and deduplicator.

Hi @StevenWingett @rdali
I'm running the HiCUP pipeline on deeply sequenced paired-end FASTQ data. HiCUP provides a threads option to run the pipeline on multiple cores, but from the CPU usage log I can see that the HiCUP filter and deduplicator steps are not using all the cores provided.
I have a config file in which all the parameters for the HiCUP pipeline are specified. Can you please let me know how I can enable the threads option for the HiCUP filter and deduplicator steps?

hicup_reporter can't run if it detects other previous reports

Hello!

I've just updated HiCUP to the latest version (0.8.2) and run it as usual for a new sample. In my directory structure, the HiCUP output files for several samples are stored in the same folder. This was not a problem when generating reports with the previous method (HTML), but now everything in the HiCUP pipeline works except the hicup_reporter step, which detects that I have previous reports and won't continue with the new one.

Let me explain my case:
Several months ago I ran the pipeline (0.7.2) for 2 samples, and all was fine. Now I've updated to version 0.8.2 and run it with a new sample, and when it reaches hicup_reporter it gives me the following error:

Creating combined HiCUP summary report
Reading folder '/my/output/path/
Writing to: /my/output/path/
	3 complete summary results groups identified in folder
Output file '/my/output/path/HiCUP_summary_report_nELHKeaSsU_11-11-57_17-01-2021.txt' already exists
Output file '/my/output/path/HiCUP_summary_report_JAqzREEjBU_05-48-27_18-06-2021.txt' already exists
Please adjust configuration.

It's OK that it doesn't repeat the reports that already exist, but it would be great if it generated the one that is missing, i.e. the one for the sample currently being processed. Otherwise the pipeline cannot finish, and when running on a cluster the error kills the job.

I hope there is an easy workaround to solve this.

Thanks.

Create a hicup_reporter script

Write a hicup_reporter script that generates results separately from the hicup script.

Update documentation

Add to DockerHub / Anaconda

(If added to DockerHub, will be able to automatically generate singularity images)

HiCUP report di-tag size plot

Add calculated lengths to X-axis of size plot. Also set the starting zoom to something sensible.

Truncation of 3' Partial Ligation Sequences

Hi,

During the HiCUP truncation step, partial ligation sequences may occur at the 3' end of reads and would escape truncation. In such cases, would it be better to truncate partial sequences when they occur at the 3' end?

Thanks,
Stephen
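
A sketch of what detecting such a partial junction could look like is given below. It is an illustration of the suggestion, not existing HiCUP behaviour; the function name and the `min_len` parameter are assumptions. A junction prefix no longer than the restriction site itself (GATC for DpnII) is indistinguishable from an ordinary unligated site, so only longer prefixes are reported.

```python
def partial_junction_length(read, junction="GATCGATC", min_len=5):
    """Return the length of the longest junction prefix that ends the read.

    Only proper prefixes of at least `min_len` bases are considered;
    0 means no partial junction was found at the 3' end.
    """
    for k in range(len(junction) - 1, min_len - 1, -1):
        if read.endswith(junction[:k]):
            return k
    return 0


print(partial_junction_length("AAAAGATCGA"))  # ends with GATCGA
print(partial_junction_length("AAAAGATC"))    # plain restriction site only
```

The shorter the accepted prefix, the more genuine genomic sequence would be trimmed by chance, which is the trade-off any such option would have to expose.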

link for HiCUP v0.5.3

I want to reproduce the results of a publication, but I could not find a download link for HiCUP v0.5.3 (the version used in the paper). Where can I find it?

R error?

Hi,
I do not understand the meaning of the following messages.
Do you know what happened?

(base) [t-goto@gwB1 ~]$ HiCUP-0.8.3/hicup --config HiCUP-0.8.3/config_files/hicup_example.conf
Starting HiCUP pipeline (v0.8.3)
PLEASE NOTE: FROM VERSION 8, HICUP REQUIRES THE R PACKAGES TIDYVERSE AND PLOTLY INSTALLED
SEE DOCUMENTATION FOR MORE DETAILS
Detecting R automatically
Found R at '/home/t-goto/local/bin/R'
Reading genome digest file 'Digest_mm10_re1_unspecified_None_11-50-59_24-01-2023.txt.gz' to determine Hi-C restriction enzyme
R not found at '/home/t-goto/local/bin/R'
Detecting R automatically
Found R at '/home/t-goto/local/bin/R'
Truncating with HiCUP Truncater v0.8.3
Truncating sequences at occurrence of sequences '[AAGCTAGCTT]'
Truncating sequences
Truncating Nature_neuro/SRR12066917_1.fastq.gz
Truncating Nature_neuro/SRR12066917_2.fastq.gz

/home/t-goto/local/lib64/R/bin/exec/R: error while loading shared libraries: libblas.so.3: cannot open shared object file: No such file or directory
Could not produce hicup_truncater summary bar chart: /home/t-goto/local/bin/Rscript /lustre7/home/t-goto/HiCUP-0.8.3/r_scripts/hicup_truncater_summary.r Nature_neuro/results/ Nature_neuro/results/hicup_truncater_summary_MvZOrHJiIH_19-45-44_01-02-2023.txt: at /lustre7/home/t-goto/HiCUP-0.8.3/hicup_truncater line 281.
Truncating complete
R not found at '/home/t-goto/local/bin/R'
Detecting R automatically
Found R at '/home/t-goto/local/bin/R'
Mapping with HiCUP Mapper v0.8.3
Using aligner 'bowtie2'
Mapping Nature_neuro/results/SRR12066917_2.trunc.fastq.gz
Mapping Nature_neuro/results/SRR12066917_1.trunc.fastq.gz

Mapping complete
Pairing files with HiCUP Mapper v0.8.3
Pairing SRR12066917_1.map.sam and SRR12066917_2.map.sam
Nature_neuro/results/SRR12066917_1.trunc.fastq.gz does not exits in summary results hash.
Pairing complete
/home/t-goto/local/lib64/R/bin/exec/R: error while loading shared libraries: libblas.so.3: cannot open shared object file: No such file or directory
Could not produce hicup_mapper summary bar chart: /home/t-goto/local/bin/Rscript /lustre7/home/t-goto/HiCUP-0.8.3/r_scripts/hicup_mapper_summary.r Nature_neuro/results/ Nature_neuro/results/hicup_mapper_summary_MvZOrHJiIH_19-45-44_01-02-2023.txt: Inappropriate ioctl for device at /lustre7/home/t-goto/HiCUP-0.8.3/hicup_mapper line 240.
Nature_neuro/results/SRR12066917_1_2.pair.bam contains no data
All the files in the HiCUP pipeline have been removed for containing no data.

Processing "multi"-tag reads

The idealised view of a di-tag is not always correct, for a paired end-read may comprise components from not just two, but rather multiple regions of a genome. We have made steps to address this in a HelpDesk job which involved writing a “pre-HiCUP” script that cuts at DpnII HiC junctions in FASTQ reads. The script then takes these segments and generates all the segment-segment permutations into 2 new FASTQ files. (See http://www.bioinformatics.babraham.ac.uk/cgi-bin/helpdeskuser.cgi?action=show_job&public_id=divefeet)

We shall now expand on this, as described in the attached image from Mikhail.
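
The splitting step described above can be sketched as follows. This is not the helpdesk "pre-HiCUP" script; the function names are hypothetical, quality strings are left out for brevity, and unordered segment pairs (`itertools.combinations`) are used where the description says permutations, which is an assumption about what was meant.

```python
from itertools import combinations


def split_at_junctions(seq, junction="GATCGATC", site_len=4):
    """Split a read at every ligation junction, restoring the site on both sides."""
    segments = []
    start = 0
    while True:
        idx = seq.find(junction, start)
        if idx == -1:
            break
        segments.append(seq[start: idx + site_len])  # left piece ends in GATC
        start = idx + site_len                       # right piece starts with GATC
    segments.append(seq[start:])
    return segments


def segment_pairs(seq):
    """Return every pair of segments as candidate (read 1, read 2) di-tags."""
    return list(combinations(split_at_junctions(seq), 2))


read = "AAAAGATCGATCCCCCGATCGATCTTTT"
print(split_at_junctions(read))
print(len(segment_pairs(read)))
```

In a real script each pair would be written as one record to each of the two new FASTQ files, with the matching slices of the quality string.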

Add Scribler to the HiCUP Misc folder

Assists with referencing for people using Scribler.

hicup2homer

Check hicup2homer assigning cut position to correct end

Double digest problem

From: 温宇豪
Sent: 15 February 2019 03:08
To: Steven Wingett
Subject: A problem about HiCUP

Dear Wingett,

Hi, I'm a researcher from China and have run into a problem using HiCUP to analyse my data.
I performed a two-enzyme-digest Hi-C experiment and used hicup_digester to create the digested reference genome file with the parameter --re1 ^GATC,DpnII:AT^TAAT,AseI .
I then passed it to the configuration file and ran HiCUP, but it reported the error "The restriction site (re1) needs to be a valid DNA sequence". Could you help me to solve it?

Thanks for your time and your great package!

Yuhao WEN
Institut Pasteur of Shanghai,
Chinese Academy of Sciences
Xuhui, Shanghai 200031, P.R.China
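
For reference, the --re1 string quoted in this message has a simple structure: enzymes separated by ":", each written as a cut site with a "^" followed by a comma and a name. The sketch below splits and validates such a string. It is an illustration only and not HiCUP's own parser; the function name `parse_re1` and the accepted alphabet (A, C, G, T, N) are assumptions.

```python
import re


def parse_re1(arg):
    """Split an --re1 style string into (cut_site, enzyme_name) tuples."""
    enzymes = []
    for entry in arg.split(":"):               # one entry per enzyme
        site, _, name = entry.partition(",")   # "site,name"; the name is optional
        site = site.upper()
        bases = site.replace("^", "")
        if site.count("^") != 1 or not re.fullmatch(r"[ACGTN]+", bases):
            raise ValueError(f"'{site}' is not a valid cut site")
        enzymes.append((site, name or "unspecified"))
    return enzymes


print(parse_re1("^GATC,DpnII:AT^TAAT,AseI"))
```

If the whole string, including the ":" and the enzyme names, is handed to a check that expects a single DNA sequence, it will fail in the way the error message describes, so it is worth checking which form the configuration file expects.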

Why not use BWA mem as a mapper?

Hello!

I was wondering what the reasoning was for limiting the HiCUP pipeline to bowtie only. In my (limited) testing I've noticed that bwa seems to give many more alignments than bowtie, at least for capture Hi-C data. I'd like to use the HiCUP scripts with an alignment produced by BWA mem, but there seem to be too many incompatibilities between the output formats of hicup_mapper and bwa mem for downstream processing.

Thanks in advance!

Using hicup_deduplicator more than once

Sometimes it is necessary to use hicup_deduplicator more than once (for example when combining and then de-duplicating technical replicates). This leaves identical tags in the SAM/BAM file, which appears to violate the format (the file is rejected when importing into SeqMonk).

Find a solution to this.

HiCUP output if Tidyverse, Plotly and/or Pandoc not installed

HiCUP does not give nice output if HTML summary reports are not generated owing to lack of Tidyverse, Plotly and/or Pandoc. Make this more elegant.

How to merge data from multiple lanes or biological replicates

Hi,

What's the recommended way to merge data from biological replicates, or from a single experiment sequenced on multiple lanes? If I place the paired files on adjacent lines, HiCUP generates one output BAM/SAM file per file pair. If I have a single experiment with multiple lanes, what is the recommended way to merge them?

Thanks,

Hammad

Integrate the capture script into main pipeline

Integrate capture script into main pipeline and print the results in the HiCUP summary report.

DNase compatibility

Make HiCUP compatible with DNase protocols.

Write out BAM file of multi-mapping reads for Bori for next update

Hi Bori,

HiCUP won’t count them directly, but they can be counted/filtered using the FASTQ IDs in the resulting BAM file. I’ve made a note in the Git repository to add this feature.

Cheers,

Steven

CIGAR string

Check that HiCUP reads the CIGAR string when assigning positions

Truncation adds Ns to sequence

When truncating using multiple sequences (e.g. --re1 AGCT^AN), the N is written to the truncated sequence read in the FASTQ file.

Add R and Y in digester script

At the moment there is no provision in the digester script for sites that have “R”(A/G) or “Y” (C/T) etc.
e.g. ApoI (R^AATTY) digests.

From: A.M.A. Imam
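
One way to support such sites is to translate the IUPAC ambiguity codes into a regular expression before searching the genome. The sketch below is in Python for illustration and is not taken from the digester script; `IUPAC`, `cut_positions` and the toy sequence are assumptions made for the example.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def cut_positions(seq, re1):
    """Return 0-based cut positions of a site such as 'R^AATTY' in seq."""
    cut = re1.index("^")
    site = re1.replace("^", "").upper()
    pattern = "".join(IUPAC[base] for base in site)
    # A lookahead lets overlapping occurrences of the site all be found.
    return [m.start() + cut for m in re.finditer(f"(?={pattern})", seq.upper())]


print(cut_positions("CCGAATTCGGAAATTTCC", "R^AATTY"))
```

The same table also covers N, so a site containing N would be matched against the genome without the literal N being carried into any output sequence.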

create_baitmap_rmap.pl

Hi,
Sorry to bother you, but I have a question about running CHiCAGO after the HiCUP analysis.
I want to make the rmap and baitmap files with the following chicagoTools command:
create_baitmap_rmap.pl
which asks me to
specify i) the digest file and ii) the oligos file.

Where can I get ii) the oligos file?

Add documentation to pipeline combinations branch in separate repo

Bowtie warning message

Warning: gzbuffer added in zlib v1.2.3.5. Unable to change buffer size from default of 8192.

Stop this message from Bowtie being displayed as it is confusing users

Bowtie2 Failing?

I suspect sometimes that Bowtie2 fails (e.g. exceeds RAM), but this is not caught by HiCUP. HiCUP proceeds and then generates error messages which appear to be a problem with HiCUP (see attached).

Check this and write code for HiCUP to fail elegantly, if necessary.
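
The general pattern for catching this is to check the aligner's exit status before moving on to the next step. The sketch below shows the idea in Python; HiCUP itself is written in Perl, and `run_checked` is a hypothetical helper, with the Python interpreter standing in for the aligner so that the example runs anywhere.

```python
import subprocess
import sys


def run_checked(cmd):
    """Run a command and raise if it exits with a non-zero status.

    A negative return code means the process was killed by a signal,
    which is what an out-of-memory kill typically looks like.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{cmd[0]} exited with status {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout


# Stand-in for an aligner call that fails part-way through.
try:
    run_checked([sys.executable, "-c", "import sys; sys.exit(3)"])
except RuntimeError as err:
    print("stopping pipeline:", err)
```

Stopping at this point would make the message name the aligner as the failing step, so that later "contains no data" errors are not mistaken for problems in HiCUP itself.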

HiCUP Deduplicator Headers

The HiCUP De-duplicator script is not writing out a full header, for example
@PG ID:HiCUP Deduplicator VN:0.7.2

But instead, is only writing out:
@PG VN:0.7.2

Arima cut sites

Check the Arima cut sites exactly match HiCUP digester results

Option --nofill appears to no longer work

Recently I downloaded the latest version (0.7.2) and found a problem with hicup_digester. First I ran the following command:
./hicup_digester -z --re1 ^GATC,DpnII --genome hg19 ../DATA/hg19/chr*.fa
The program reported that the digest completed,
but when I ran the following:
./hicup --nofill --config hicup_example.conf.txt
the program returned an error:
Starting HiCUP pipeline (v0.7.2)
Detecting R automatically
Found R at '/usr/local/bin/R'
Reading genome digest file 'Digest_unspecified_genome_DpnII_None_16-42-52_18-01-2019.txt.gz' to determine Hi-C restriction enzyme
'nofill' option is not supported for multiple restriction enzyme digestion...
Can't run hicup_truncater
. at ./hicup line 324.

Hicup capture script not working

Hello,

I am working through the tutorial in the Nature Protocols paper "Detecting chromosomal interactions in Capture Hi-C data with CHiCAGO and companion tools" (https://www.nature.com/articles/s41596-021-00567-5). I have downloaded their downsampled test data (the MyLa cell line) from https://osf.io/kt67f/ and have run the HiCUP pipeline successfully. However, when I run the hicup_capture step

perl Misc/hicup_capture --baits designDir/HindIII_GWAS_baits.txt MyLa_rep1_CHiC_DS20M_R1_2.filt.bam

I get an empty object for the captured BAM file. The capture_summary.txt is attached below. Do you know why this is happening? I am using the same bait file as in the paper (from https://osf.io/sx7fu/), and am also attaching it below. I have also aligned everything to hg19 coordinates (as in the paper).

capture_summary.txt
HindIII_GWAS_baits.txt

Display percentages in HTML report

The new HTML HiCUP summary report does not display percentages in the Mapping/Truncation graph. Add these.

di-tag parameter

I have tried different values for the maximum and minimum di-tag length, and about 20% of the loops are not shared between the different parameter settings. If we don't know the fragment size selected in the experiment, how should we choose suitable values for the longest and shortest di-tag lengths?