
ay-lab / fithic


Fit-Hi-C is a tool for assigning statistical confidence estimates to chromosomal contact maps produced by genome-wide genome architecture assays such as Hi-C.

License: MIT License

Python 91.92% Shell 8.08%

fithic's Introduction

FitHiC and FitHiC2


Fit-Hi-C (or FitHiC) was initially developed by Ferhat Ay, Timothy Bailey, and William Noble (January 19th, 2014). It is currently maintained and updated by Ferhat Ay (ferhatay@lji.org) and Arya Kaul (akaul@lji.org) at the Ay Lab in the La Jolla Institute for Allergy and Immunology.

The current version is named FitHiC2 (or FitHiC 2.0) because it adds many new features compared to FitHiC, such as:

  1. finding significant inter-chromosomal interactions,

  2. applying a merging filter algorithm to filter out putative bystander interactions and keep only the direct cis-chromosomal interactions,

  3. reporting the expected contact count between interacting pairs of bins, along with the raw (observed) contact count, to assess the enrichment of the observed contact count over the expected one.

Please use the Google Group for discussions/bug reports/analysis questions. Sending an email to fithic@googlegroups.com will also post directly to the Group.

Installation

Fit-Hi-C may be installed in one of three ways.

  1. Bioconda
  2. Github
  3. Pip

Of these, we recommend installing through bioconda, which automatically installs all dependencies.

Bioconda Installation

If this is your first time using the conda distribution system, we recommend Miniconda as your conda distribution. We chose it because it is the most lightweight of Anaconda's distributions; however, you're welcome to use whichever one you like. More information on each may be found here.

Once your conda distribution has been set up, you need to add the bioconda channel to access the bioinformatics recipes hosted there. If you're unfamiliar with bioconda, I highly recommend you check out the wonderful work they've done (link here!). To set up the bioconda channel, run the following:

conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda

Afterwards, simply run:

conda install fithic

This command will automatically install a command-line executable version of Fit-Hi-C along with all of its dependencies. Please run:

fithic -V

and ensure that the version number matches the version number in the Anaconda Cloud badge at the top of this README.

Github Installation

Install git and then run:

git clone https://github.com/ay-lab/fithic.git

You will need the following dependencies installed to run Fit-Hi-C:

  • Python 3.+
  • Numpy 1.14.+
  • Scipy 1.1.+
  • Scikit-learn 0.19.+
  • SortedContainers 2.0.+
  • Matplotlib 2.2.+

The git clone command above creates a direct clone of Fit-Hi-C in your working directory with the name fithic. You may now run FitHiC2 by calling the fithic.py file in the fithic/ directory:

python fithic.py --ARGUMENTS

Cloning the repository does not automatically install the command-line version of Fit-Hi-C. If you want that functionality, follow one of the other installation methods.

PyPi Installation

Ensure that you have Python3 successfully installed on your computer. Then run:

pip install fithic

After this is done, run fithic --help to ensure all necessary dependencies have been installed. Some users report that this automatically installs all dependencies, while others say it does not. If fithic does not work properly, install the missing dependencies using pip, for example:

pip install DEPENDENCY

Testing

A good part of any software installation is being able to test that it was installed correctly.

Bioconda/PyPi Installation

Run the command:

svn export https://github.com/ay-lab/fithic/trunk/fithic/tests/

(If you receive an error, ensure that you have svn installed correctly.) This command will generate a tests folder in your working directory. Go into that folder and run ./run_tests-cli.sh to run Fit-Hi-C on a variety of data. If everything was installed correctly, you should see a final message that Fit-Hi-C executed correctly!

Github Installation

Simply navigate into the repo and run ./fithic/tests/run_tests-git.sh. If everything is working fine, you will see a final message that Fit-Hi-C executed correctly!

Using Fit-Hi-C

Congratulations! If you have gotten to this point, then you have a working, fully installed version of Fit-Hi-C running on your computer. Good on you! But that was the easy part. Now comes the difficult question.

How do I correctly use Fit-Hi-C to analyze my desired Hi-C dataset?

Correctly answering this question requires navigating through several basic understandings:

  1. What exactly is Hi-C data?
  2. What does Fit-Hi-C tell me about this Hi-C data?
  3. Why is what Fit-Hi-C tells me important?

If you feel utterly comfortable with the answers to these three questions, then feel free to skip to the next section. If you are unclear about the answer to any of the above, then read on!

What is Hi-C data?

While a beautiful, LaTeX-typeset, easy-to-understand, and comprehensive document is being created, read this!

What does Fit-Hi-C tell me about this Hi-C data?

At the beginning of this README, I stated:

Fit-Hi-C is a tool for assigning statistical confidence estimates to chromosomal contact maps produced by genome architecture assays.

The phrase 'chromosomal contact maps produced by genome architecture assays' may be faithfully reduced to 'Hi-C data.' Applying that change yields:

Fit-Hi-C is a tool for assigning statistical confidence estimates to Hi-C data.

Much less scary! From the above sentence, the only real phrase that could be misinterpreted is 'assigning statistical confidence estimates.' What does that mean? Well to find out you should read the paper Dr. Ay wrote (found here)!

Why is what Fit-Hi-C tells me important?

Fit-Hi-C tells you which contacts are significant. This is incredibly important because not all of the contacts seen in your Hi-C data are truly unexpected interactions. By assigning statistical confidence to each interaction, you will be able to determine which interactions are the most important and, consequently, which ones warrant further investigation.

Running Fit-Hi-C

Arguments


Required Arguments

-f, --fragments

The -f argument is used to pass in a full path to what we deem a 'fragments file'. Each line has 5 fields, separated by spaces or tabs. The first field is the chromosome name or number. The second and fifth fields can be any integer, as they are not needed in most cases. The third field is the coordinate of the midpoint of the fragment on that chromosome. The fourth field is the total number of observed mid-range reads (contact counts) that involve the specified fragment. All possible fragments need to be listed in this file.

One example file would look like below (excluding the header which is not a part of input):

chr extraField fragmentMid marginalizedContactCount mappable? (0/1)
1 0 15000 234 1
1 0 25000 0 0
... ... ... ... ...

Note: the file should be gzipped before providing as an input parameter
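As a minimal sketch of the format described above, the snippet below writes a gzipped, tab-separated fragments file with Python's standard library and reads it back. The helper names (write_fragments, read_fragments) and the file name are hypothetical and are not part of Fit-Hi-C; the two rows are the ones from the example above.

```python
import gzip
import os
import tempfile


def write_fragments(path, rows):
    """Write (chrom, extra, midpoint, count, mappable) rows as gzipped, tab-separated text."""
    with gzip.open(path, "wt") as handle:
        for chrom, extra, mid, count, mappable in rows:
            handle.write(f"{chrom}\t{extra}\t{mid}\t{count}\t{mappable}\n")


def read_fragments(path):
    """Read the file back as lists of string fields (split on spaces or tabs)."""
    with gzip.open(path, "rt") as handle:
        return [line.split() for line in handle if line.strip()]


# Rows taken from the example above; no header line is written.
rows = [("1", 0, 15000, 234, 1), ("1", 0, 25000, 0, 0)]
fragments_path = os.path.join(tempfile.mkdtemp(), "fragments.gz")  # hypothetical name
write_fragments(fragments_path, rows)
fragments_back = read_fragments(fragments_path)
print(fragments_back)
```

Writing through gzip.open in text mode avoids a separate compression step before passing the file to -f.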

-i, --interactions

The interactions file contains a list of mid-range contacts between the fragments/windows/meta-fragments listed in the fragments file above. Each fragment is represented by its chromosome and midpoint coordinate. Each line has 5 fields, separated by spaces or tabs. The first two fields represent the first fragment, the following two represent the second fragment, and the fifth field is the number of contacts between these two fragments. Only the fragment pairs with non-zero contact counts are listed in this file.

One example file would look like below (excluding the header which is not a part of input):

chr1 fragmentMid1 chr2 fragmentMid2 contactCount
1 15000 1 35000 23
1 15000 1 55000 12
... ... ... ... ...

Note: the file should be gzipped before providing as an input parameter
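Because every fragment named in the interactions file is expected to be listed in the fragments file, a quick consistency check before running Fit-Hi-C can save a failed run. The sketch below is one way to do that on already-decompressed lines; find_unlisted_fragments is a hypothetical helper, not a Fit-Hi-C function, and the third fragment row is invented for the example.

```python
def find_unlisted_fragments(fragment_lines, interaction_lines):
    """Return (chrom, midpoint) loci used in interactions but absent from fragments."""
    known = set()
    for line in fragment_lines:
        fields = line.split()
        known.add((fields[0], int(fields[2])))  # chromosome and fragment midpoint

    missing = []
    for line in interaction_lines:
        chr1, mid1, chr2, mid2, _count = line.split()
        for locus in ((chr1, int(mid1)), (chr2, int(mid2))):
            if locus not in known and locus not in missing:
                missing.append(locus)
    return missing


fragment_lines = [
    "1 0 15000 234 1",
    "1 0 25000 0 0",
    "1 0 35000 50 1",  # invented row so that the first interaction is consistent
]
interaction_lines = [
    "1 15000 1 35000 23",
    "1 15000 1 55000 12",  # 55000 is deliberately not listed above
]
missing_loci = find_unlisted_fragments(fragment_lines, interaction_lines)
print(missing_loci)
```

An empty result means every locus in the interactions file has a matching fragments entry.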

-o, --outdir

A full path to an output directory of your choice. If it does not already exist, Fit-Hi-C creates it for you.

-r, --resolution

Numerical value indicating the resolution of the fixed-size dataset being analyzed. If non-fixed-size data is being studied, set -r 0.
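Putting the four required arguments together, a minimal invocation could be assembled as in the sketch below. The file names, output directory, and resolution are hypothetical placeholders, and the sketch assumes the fithic executable from the Bioconda or PyPi installation is on your PATH. It only builds and prints the command; it does not run it.

```python
import shlex


def build_fithic_command(fragments, interactions, outdir, resolution):
    """Assemble a fithic call that uses only the four required arguments."""
    return [
        "fithic",
        "-f", fragments,
        "-i", interactions,
        "-o", outdir,
        "-r", str(resolution),
    ]


# All values below are placeholders chosen for illustration.
command = build_fithic_command("fragments.gz", "interactions.gz", "fithic_out", 40000)
print(shlex.join(command))
```

To actually launch the run, the list can be handed to subprocess.run(command, check=True).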


Optional Arguments

-t, --biases

Accepts - a full path to a bias file generated by ICE or Knight-Ruiz normalization for Fit-Hi-C with the following format:

chr midpoint bias
1 20000 1.061
... ... ...

Default - None

Description - Bias files help Fit-Hi-C accurately generate statistical significance estimates. If you have it, use it!

Note: the file should be gzipped before providing as an input parameter

-p, --passes

Accepts - Number of spline passes.

Default - 1

Description - Increasing it beyond 2 is unlikely to affect Fit-Hi-C's output significantly. If you don't understand what spline fit means then you have not read the paper!

-b, --noOfBins

Accepts - integer representing number of equal occupancy bins you would like Fit-Hi-C to bin your data with

Default - 100

Description - used for spline fitting

-m, --mappabilityThres

Accepts - integer representing the minimum number of hits per locus that has to exist to call it mappable

Default - 1

Description - Increasing it leads to more stringent requirements for treating an interaction as reasonable. Decreasing it leads to less stringent requirements. If you have extremely high resolution data, it may help to bump this up.

-l, --lib

Accepts - String representing prepending information for output files

Default - fithic

Description - Name of the library that is to be analyzed.

-U, --upperbound

Accepts - Integer representing upper bound for the intrachromosomal interactions to be considered in base pairs.

Default - -1 (no limit)

Description - Highly recommended to bound the intrachromosomal interactions being considered.

-L, --lowerbound

Accepts - Integer representing lower bound for the intrachromosomal interactions to be considered in base pairs.

Default - -1 (no limit)

Description - Highly recommended to bound the intrachromosomal interactions being considered.

-v, --visual

Accepts - no argument

Default - None (no plots)

Description - Use if plots of spline fitting are desired. Unfortunately, different matplotlib versions are unstable in different ways. If you're getting an error, I suggest trying to run Fit-Hi-C without this option and see if that helps.

-x, --chromosome_region

Accepts - 'interOnly', 'intraOnly', 'All'

Default - intraOnly

Description -

interOnly is used if you would only like to analyze interchromosomal interactions.

intraOnly is used if you would only like to analyze intrachromosomal interactions.

All is used if you would like to analyze inter and intrachromosomal interactions.

You may now be thinking, "Why would I ever not choose 'All'? More analysis is better!" It is not that simple. Since you are adding significantly more interactions when you analyze interchromosomal and intrachromosomal interactions in tandem, q-values will become less significant across the board. In addition, few to no datasets are at a high enough resolution to find significant interchromosomal interactions.
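To see why pooling more tests tends to weaken q-values, the sketch below applies a textbook Benjamini-Hochberg correction (the correction named in the Output section) to a few p-values, first on their own and then together with extra non-significant tests. All p-values are invented for illustration, and this is not Fit-Hi-C's own code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as the input p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


intra = [0.001, 0.01, 0.2]   # invented p-values standing in for intrachromosomal tests
extra = [0.5] * 7            # invented non-significant tests added by a larger analysis

q_intra_only = benjamini_hochberg(intra)
q_all = benjamini_hochberg(intra + extra)[:3]

for p, q_small, q_large in zip(intra, q_intra_only, q_all):
    print(f"p={p:g}  q (3 tests)={q_small:.3f}  q (10 tests)={q_large:.3f}")
```

In this toy case every q-value grows once the seven extra tests are included, even though the three original p-values are unchanged.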

-bL, --biasLowerBound

Accepts - float value of lower bound for bias values

Default - 0.5

Description - bias values below this number will be discarded

-bU, --biasUpperBound

Accepts - float value of upper bound for bias values

Default - 2

Description - bias values above this number will be discarded
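Before a run, it can help to count how many loci would fall outside the bias bounds, since a bias file where most values are out of range leaves little to analyze. The sketch below is a hypothetical pre-check, not Fit-Hi-C's internal logic; it assumes values exactly equal to a bound are kept, following the wording "below" and "above" in the descriptions, and all bias values except the first are invented.

```python
def split_biases(biases, lower=0.5, upper=2.0):
    """Split a {(chrom, midpoint): bias} mapping into kept and discarded entries."""
    kept, discarded = {}, {}
    for locus, bias in biases.items():
        # Values strictly below `lower` or strictly above `upper` are discarded.
        target = kept if lower <= bias <= upper else discarded
        target[locus] = bias
    return kept, discarded


biases = {
    ("1", 20000): 1.061,   # from the bias file example above
    ("1", 60000): 0.3,     # invented: below the lower bound
    ("1", 100000): 2.5,    # invented: above the upper bound
    ("1", 140000): 2.0,    # invented: exactly on the upper bound
}
kept, discarded = split_biases(biases)
print(f"Out of {len(biases)} loci {len(discarded)} fall outside [0.5 2]")
```

If nearly every locus ends up in the discarded set, the scaling of the bias file is worth checking before adjusting -bL or -bU.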


Other Arguments

-V, --version

Accepts - No arguments

Default - None

Description - Prints version number. Check to make sure this is the latest version based on version.log file here

-h, --help

Accepts - No arguments

Default - None

Description - prints help message with all options


Output

Each step of Fit-Hi-C, the number of which is user-defined through the -p flag, generates two output files. For step N and library name prefix denoted by ${PREFIX} the two output files will have the following names:

  1. ${PREFIX}.fithic_passN.txt
  2. ${PREFIX}.spline_passN.significances.txt.gz

The first file reports the results of equal occupancy binning in five fields. An example is shown below:

avgGenomicDist contactProb stdErr numLocusPairs CCtotal
20077 2.38e-05 2.11e-06 210 19574
20228 1.88e-05 1.44e-06 268 19662
... ... ... .. ...
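The idea behind equal occupancy binning can be sketched as follows: locus pairs are ordered by genomic distance and grouped so that each bin holds roughly the same total contact count (total contacts divided by the number of bins given with -b). This is a simplified illustration under that assumption; the function name and the data are hypothetical, and Fit-Hi-C's actual implementation may handle ties and bin boundaries differently.

```python
def equal_occupancy_bins(pairs, n_bins):
    """Group (genomic_distance, contact_count) pairs into bins of similar total count."""
    pairs = sorted(pairs)
    total = sum(count for _, count in pairs)
    desired = total / n_bins  # target number of contacts per bin

    bins, current, current_total = [], [], 0
    for distance, count in pairs:
        current.append((distance, count))
        current_total += count
        # Close the bin once it reaches the target, keeping the last bin open.
        if current_total >= desired and len(bins) < n_bins - 1:
            bins.append(current)
            current, current_total = [], 0
    if current:
        bins.append(current)
    return bins


# Invented data: many contacts at short range, fewer at long range.
pairs = [(20000, 10), (40000, 10), (60000, 5), (80000, 5), (100000, 5), (120000, 5)]
bins = equal_occupancy_bins(pairs, 2)
bin_totals = [sum(count for _, count in b) for b in bins]
print(bin_totals)
```

Note how the short-range bin spans only two distances while the long-range bin spans four, which is the point of binning by occupancy instead of by fixed distance width.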

The second file will have the exact same lines as in the input file that contains the list of mid-range contacts. This input file had 5 fields as described above. The output from each step will append the following columns to these fields:

  1. p-value: p-value of the corresponding interaction, as computed by the binomial distribution model employed in FitHiC.

  2. q-value: q-value or FDR obtained by applying Benjamini-Hochberg correction to the p-values.

  3. bias1: Bias value of the first interacting fragment.

  4. bias2: Bias value of the second interacting fragment.

  5. ExpCC: Expected contact count of the current interaction, computed using the raw contact count, spline fit probability of the raw contact count (with respect to the loop distance), and the given bias values. Enrichment of the raw (observed) contact count with respect to the expected contact count is reflected in the q-value.

chr1 fragmentMid1 chr2 fragmentMid2 contactCount p-value q-value bias1 bias2 ExpCC
1 15000 1 35000 23 1.000000e+00 1.000000e+00 1 1.2 22
1 15000 1 55000 12 2.544592e-02 1.202603e-01 1.1 0.9 4
... ... ... ... ... .. .. .. .. ..
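A common next step is to load the significances file and keep only interactions below a chosen q-value. The sketch below parses the ten columns shown above and computes an observed-over-expected ratio from contactCount and ExpCC. The function names and the 0.2 threshold are hypothetical choices for illustration; the two data rows are the ones from the example above.

```python
import gzip


def parse_significances(lines):
    """Parse whitespace-separated output lines into dictionaries, skipping the header."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) != 10 or not fields[1].isdigit():
            continue  # header line or anything that is not a data row
        records.append({
            "chr1": fields[0], "mid1": int(fields[1]),
            "chr2": fields[2], "mid2": int(fields[3]),
            "contactCount": float(fields[4]),
            "p_value": float(fields[5]), "q_value": float(fields[6]),
            "bias1": float(fields[7]), "bias2": float(fields[8]),
            "ExpCC": float(fields[9]),
        })
    return records


def read_significances(path):
    """Read a gzipped significances file from disk."""
    with gzip.open(path, "rt") as handle:
        return parse_significances(handle)


def significant(records, q_threshold):
    """Keep records whose q-value is at or below the threshold."""
    return [r for r in records if r["q_value"] <= q_threshold]


example_lines = [
    "chr1 fragmentMid1 chr2 fragmentMid2 contactCount p-value q-value bias1 bias2 ExpCC",
    "1 15000 1 35000 23 1.000000e+00 1.000000e+00 1 1.2 22",
    "1 15000 1 55000 12 2.544592e-02 1.202603e-01 1.1 0.9 4",
]
records = parse_significances(example_lines)
hits = significant(records, q_threshold=0.2)  # threshold chosen only for this example
enrichment = hits[0]["contactCount"] / hits[0]["ExpCC"]
print(len(hits), enrichment)
```

For a real run, read_significances can be pointed at the ${PREFIX}.spline_passN.significances.txt.gz file in the output directory.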

Utilities

These utilities are provided as part of Fit-Hi-C (/fithic/utils/) to aid in certain common pre-processing/post-processing steps. They are as follows:

  • HiCKRy.py (Pre-processing. Generates --biases input)
  • HiCPro2FitHiC.py (Pre-processing. Generates --interactions, --fragments, and --biases inputs)
  • createFitHiCFragments-fixedsize.py (Pre-processing. Generates --fragments input)
  • createFitHiCFragments-nonfixedsize.sh (Pre-processing. Generates --fragments input)
  • validPairs2FitHiC-fixedSize.sh (Pre-processing. Generates --interactions input)
  • createFitHiCContacts-hic.sh (Pre-processing. Generates --interactions input from .hic output)
  • visualize-UCSC.sh (Post-processing. Visualizes Fit-Hi-C interactions on the UCSC Genome Browser)
  • createFitHiCHTMLout.sh (Post-processing. Generates HTML page describing Fit-Hi-C run)
  • merge-filter.sh (Post-processing. Filters Fit-Hi-C interactions and merges nearby ones using FANCY GRAAAAAAAAAAAPH magic)
  • merge-filter-parallelized.sh (Post-processing. Filters Fit-Hi-C interactions and merges nearby ones using FANCY GRAAAAAAAAAAAPH magic + parallelizes per chr)

HiCKRy

Regardless of the implementation, we strongly recommend using a normalization method to obtain meaningful results for further analysis. The only way for Fit-Hi-C to use data from Hi-C normalization is through the bias files. As long as the bias values are scaled to have an average of 1 and high values represent loci with higher overall raw counts, Fit-Hi-C will be able to use them in significance assignment.
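If bias values come from another normalization tool and are not centered on 1, they can be rescaled before use. The sketch below divides every value by the mean so that the average becomes 1 while the ordering of loci is preserved. The helper name and the input values are hypothetical.

```python
def rescale_to_unit_mean(biases):
    """Rescale a {(chrom, midpoint): bias} mapping so that its mean is 1."""
    values = list(biases.values())
    mean = sum(values) / len(values)
    if mean <= 0:
        raise ValueError("bias values must have a positive mean")
    return {locus: value / mean for locus, value in biases.items()}


# Invented bias values with a mean of 4.
raw_biases = {("1", 20000): 2.0, ("1", 60000): 4.0, ("1", 100000): 6.0}
scaled = rescale_to_unit_mean(raw_biases)
scaled_mean = sum(scaled.values()) / len(scaled)
print(scaled, scaled_mean)
```

Dividing by the mean keeps higher values attached to loci with higher overall raw counts, which is the second condition stated above.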

HiCKRy is an in-house version of Hi-C contact map normalization using the Knight-Ruiz algorithm for fast matrix balancing. It takes the following arguments:

-i, --interactions                Path to the interactions file to generate bias values. Required.
-f, --fragments                   Path to the fragments file to generate bias values. Required.
-o, --output                      Full path to output the generated bias file to. Required.
-x, --percentOfSparseToRemove     Percent of sparse low contact count loci to remove. The default value is 0.05.

It then outputs a bias file in the format of Fit-Hi-C's -t input option.

HiCPro2FitHiC

HiC-Pro is a common Hi-C mapping tool used to extract information from the raw reads after the Hi-C assay is run. The following script enables the generation of Fit-Hi-C input directly from HiC-Pro's output.

It takes the following arguments:

-i MATRIX, --matrix MATRIX     Input matrix file with raw contact frequencies. Required.
-b BED, --bed BED     BED file with bins coordinates. Required.
-s BIAS, --bias BIAS     The bias file provided after IC normalization.
-o OUTPUT, --output OUTPUT     Output path.
-r RESOLUTION, --resolution RESOLUTION     Resolution of the matrix.

The output is the contact maps and fragments file in the format of Fit-Hi-C.

createFitHiCFragments-fixedsize

Generates the fragments file if using a fixed-size resolution with your Hi-C data.

The script takes the following arguments:

--chrLens         Path to a file describing chromosome lengths of the model organism. Required.
--resolution      Resolution of dataset being studied. Required.
--outFile         Full path to the output file desired.

Output is a fragments file in the format of Fit-Hi-C.
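For fixed-size data, the fragment midpoints follow directly from the chromosome lengths and the resolution. The sketch below shows one way to compute them, assuming each bin's midpoint is its start plus half the resolution (consistent with the 15000 and 25000 midpoints in the fragments example for 10000 bp bins). The function name and chromosome length are hypothetical, and how the real script treats a partial final bin is not covered here.

```python
def fixed_size_midpoints(chrom_lengths, resolution):
    """Return (chrom, midpoint) for every fixed-size bin on every chromosome."""
    rows = []
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, resolution):
            rows.append((chrom, start + resolution // 2))
    return rows


# Invented chromosome of 30,000 bp binned at 10,000 bp.
midpoints = fixed_size_midpoints({"chrA": 30000}, 10000)
print(midpoints)
```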

createFitHiCFragments-nonfixedsize

A bash script to generate the fragments file if the Hi-C data is RE-digested. Note, order of arguments is critical.

bash createFitHiCFragments-nonfixedsize.sh [outputFile] [RE] [fastaReferenceGenome]
        
[outputFile]               A desired output file path. Required.
[RE]                       Either the name of the restriction enzyme used, or the cutting position using “^”. For example, A^AGCTT for HindIII. Required.
[fastaReferenceGenome]     A reference genome in fasta format. Required.

validPairs2FitHiC-fixedSize

A bash script to generate the contact maps input for Fit-Hi-C from a valid pairs file. Note, order of arguments is critical.

bash validPairs2FitHiC-fixedSize.sh [resolution] [libraryName] [validPairsFile]        

[resolution]         The resolution of the dataset being studied. Required.
[libraryName]        The prefix of the file generated. Required.
[validPairsFile]     A textfile containing the validPairs. Required.

visualize-UCSC.sh

A bash script to convert Fit-Hi-C output into visualization input for UCSC's Genome Browser in 'interact' format.

DESCRIPTION:
bash visualize-UCSC.sh [inputFile] [outputFile] [QvalThresh]
        
         [inputFile]                Input Fit-Hi-C file to visualize 
         [outputFile]               Output file for UCSC to visualize 
         [QvalThresh]               Q-value threshold to filter Fit-Hi-C interactions at 

createFitHiCHTMLout

A bash script to generate an HTML report of the Fit-Hi-C run. Note, works best if Fit-Hi-C was run with --visual option.

bash createFitHiCHTMLout.sh [Library Name] [No. of passes] [Fit-Hi-C output folder]
        
[Library Name]            The library name (-l option) used during Fit-Hi-C’s run
[No. of passes]            The number of spline passes conducted by the Fit-Hi-C run
[Fit-Hi-C output folder]    Path to the output folder for that Fit-Hi-C run (-o option)

createFitHiCContacts-hic.sh

A bash script to create Fit-Hi-C contacts from .hic files.

bash createFitHiCContacts-hic.sh [Juicer's dump command] [chr1] [chr2] [Output file name]
        
[Juicer's dump command]    Full path to the output of Juicer's dump command
[chr1]                     Chromosome 1 of the argument used in Juicer's dump command
[chr2]                     Chromosome 2 of the argument used in Juicer's dump command
[Output file name]          Name of output file

merge-filter.sh

A bash script to merge nearby significant interactions and filter Fit-Hi-C output

bash merge-filter.sh [inputFile] [resolution] [outputDirectory] [fdr] [utilities]

[inputFile]                Input file of Fit-Hi-C interactions
[resolution]               Resolution used
[outputFile]               Output file to dump output to
[fdr]                      False Discovery rate to use when subsetting interactions
[utilities]                Full path to utilities folder (folder where CombineNearbyInteraction.py is)

merge-filter-parallelized.sh

A bash script that merges nearby significant interactions and filters Fit-Hi-C output in parallel. Note: you will have to modify the actual script contents to match your cluster configuration and organism.

bash merge-filter-parallelized.sh [inputFile] [resolution] [outputDirectory] [fdr] [utilities]

[inputFile]                Input file of Fit-Hi-C interactions
[resolution]               Resolution used
[outputDirectory]          Directory to dump output to
[fdr]                      False Discovery rate to use when subsetting interactions
[utilities]                Full path to utilities folder (folder where CombineNearbyInteraction.py is)

Citing Fit-Hi-C

If Fit-Hi-C was used in your analysis, please cite the following:

  1. Arya Kaul, Sourya Bhattacharyya, Ferhat Ay. 2020. "Identifying statistically significant chromatin contacts from Hi-C data with FitHiC2." Nature Protocols. 15:991-1012. doi: 10.1038/s41596-019-0273-0.

  2. Ferhat Ay, Timothy L. Bailey, William S. Noble. 2014. "Statistical confidence estimation for Hi-C data reveals regulatory chromatin contacts." Genome Research. 24(6):999-1011. doi: 10.1101/gr.160374.113.

License

Copyright (c), 2012, University of Washington

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


fithic's Issues

Illegal fragment pair

Hello. When I use fithic, I get an error: Illegal fragment pair. The following are my input files:
$ head fithic.interactionCounts
chr1 25000 chr1 25000 624.0
chr1 25000 chr1 75000 760.0
chr1 25000 chr1 125000 315.0
chr1 25000 chr1 175000 146.0
chr1 25000 chr1 225000 113.0
chr1 25000 chr1 275000 98.0
chr1 25000 chr1 325000 80.0
chr1 25000 chr1 375000 56.0
chr1 25000 chr1 425000 48.0
chr1 25000 chr1 475000 38.0

$ head fithic.fragmentMappability
chr1 0 25000 19005.0 1
chr1 50000 75000 28207.0 1
chr1 100000 125000 26814.0 1
chr1 150000 175000 18005.0 1
chr1 200000 225000 24461.0 1
chr1 250000 275000 23232.0 1

can you help me to solve this problem?
Thanks!
Best.

error while running fithic with data generated by hic-pro

I used the script in HiC-Pro to transform the 40Kb ICE-normalized Hi-C contact result matrix to a raw interaction count file and a bias file calculated by ICE, but got the following error. Could you please help me fix this?

fithic -f fithic.fragmentMappability.gz -i fithic.interactionCounts.gz -t fithic.biases.gz -o fithic_sample2 -l TU-2 -v -x intraOnly -r 40000

Reading the contact counts file to generate bins...
Interactions file read. Time took 47.469826459884644
Fragments file read. Time took 0.20023465156555176
Traceback (most recent call last):
File "/home/wg_xialin/.local/bin/fithic", line 8, in
sys.exit(main())
File "/home/wg_xialin/.local/lib/python3.8/site-packages/fithic/fithic.py", line 314, in main
biasDic = read_biases(biasFile)
File "/home/wg_xialin/.local/lib/python3.8/site-packages/fithic/fithic.py", line 671, in read_biases
chrom=words[0]; midPoint=int(words[1]); bias=float(words[2])
IndexError: list index out of range

Out of 70041 loci 70041 were discarded with biases not in range [0.5 2]

Hi,
Thank you for such a nice tool for Hi-C data processing!
I am processing DNase-HiC data. I want to apply Fithic2 to my data; I have prepared the input files from HiC-Pro.

For example:
-i [screenshot of the interactions file]
-f [screenshot of the fragments file]
-t [screenshot of the biases file]

I proceeded to run fithic with these input files; however, I encountered error #42. After reading all comments about #42 and #39, I filtered the biases using: awk '{if ($3 > "0.5" && $3 < "2"){ print}}' and from the fragments file I removed all the valid pairs with 0 using: awk '{if ($4 > "0" && $5 > "0"){print }}'.

After that, I now have:
-t [screenshot of the filtered biases file]
-f [screenshot of the filtered fragments file]

After trimming these files, I used this command to run Fithic:

  • fithic -i fithic.interactionCounts.gz -f fragmentMappability.gz -t biases.gz -r 10000 -o ./fithic-out3/

After Fithic2 ran successfully on my files, when I checked the log file, I noticed that " Out of 70041 loci 70041 were discarded with biases not in range [0.5 2]".

Here is the log file:
[screenshot of the log file]

All the values in my significance file are equal to zero. This is the significance file:
[screenshot of the significance file]

I am wondering how to resolve this issue. I don't know where I am making mistakes. I really appreciate it if you have a look into this and help me to fix this problem.

Thank you!
Best,
Nisar

FitHiC run error

Dear Ferhat,

I am interested in your tool "FitHiC"; however, I ran into a problem when trying to find loops in a specific genome region. It reported the following error:
###################
Running generate_FragPairs method ...
Complete generate_FragPairs method [OK]
Running read_ICE_biases method ...
Complete read_ICE_biases method [OK]
Running read_All_Interactions method ...
Error in Ops.factor(chr1, chr2) : level sets of factors are different
##################

my original command is:
FitHiC(fragsfile = "fithic.fragmentMappability",intersfile = "fithic.interactionCounts", outdir=getwd(),biasfile="fithic.biases",libname="ESC",noOfBins=20,distUpThres=14000000,distLowThres=13500000,visual=TRUE)

I don't know what the problem is. Could you give me some help?

Thank you so much!
Best,
Garen

Question about the output of merge-filter.sh

Hi:
I tried to run merge-filter.sh for merging spatially close, significant interactions from FitHiC2, but I am confused about the output files of merge-filter.sh .
There are many columns, like chr1 mid1 chr2 mid2 and bin1_low bin1_high bin2_low bin2_high, in output files of merge-filter.sh , which one is the location of merged interactions ? And, about the parameter "fdr", do you have any recommended settings? I found there are too many merged interactions(about more than 200000 rows) if I set "fdr" to 0.01.
Thanks in advance !
Best wishes
Qianzhao

error while using fithic for the interchromosomal interactions

Dear Dr. Ay,
I used fithic for the intrachromosomal contacts and it worked perfectly.
I am trying to use fithic 2.0.7 for the interchromosomal contacts and here is my command:

python fithic.py -f fragments_list.gz -i chr9_chrX_1mb_fithic.gz -o test/ -r 1000000 -t bias.gz -x All
python fithic.py -f fragments_list.gz -i chr9_chrX_1mb_fithic.gz -o test/ -r 1000000 -t bias.gz -x interOnly

but it gives me this error:

File "fithic.py", line 310, in main
(binStats,noOfFrags, maxPossibleGenomicDist, possibleIntraInRangeCount, possibleInterAllCount, interChrProb, baselineIntraChrProb)= generate_FragPairs(binStats, fragsFile, resolution)
File "fithic.py", line 560, in generate_FragPairs
currBin = binStats[binTracker]
KeyError: 0

Can you help me solve this issue???

Best regards,
Noha

Can't Open File

When using merge-filter.sh I run into:

python3: can't open file 'CombineNearbyInteraction.py': [Errno 2] No such file or directory

This is because a variable has been capitalized:

script=""$UTILITYFOLDER"CombineNearbyInteraction.py"

So the variable $UTILITYFOLDER is empty, whereas the variable $utilityfolder contains the argument needed. Not sure if this is a bash version issue or something.

Fragment length is not consistent with fithic resolution (-r) in tests data

When running the tests data, I noticed that you set resolution (-r) to 100000, however, the input file has a fragment size of 1000000.

Here is tests script:

#/fithic/fithic/tests/run_tests-git.sh
#line 46-49
for i in Dixon_IMR90_HindIII_hg19_w100000; do
    python3 ../fithic.py -r 100000 -l "$i" -i $inI/$i.gz -f $inF/$i.gz -b $noOfBins -p $noOfPasses -o outputs/${i}.interOnly -x interOnly
    python3 ../fithic.py -r 100000 -l "$i" -i $inI/$i.gz -f $inF/$i.gz -b $noOfBins -p $noOfPasses -o outputs/${i}.all -x All
done

Here is tests data:

#/fithic/fithic/tests/contactCounts/Dixon_IMR90_HindIII_hg19_w100000.gz
chr10   500000  chr10   500000  13850
chr10   500000  chr10   1500000 3472
chr10   500000  chr10   10500000        370

Here is log file:

#Dixon_IMR90_HindIII_hg19_w100000.fithic.log
Interactions file read successfully
-----------------------------------------------------------------------------------
-
Observed, Intra-chr in range: pairs= 215762      totalCount= 91387585
Observed, Intra-chr all: pairs= 218642   totalCount= 121700752
Observed, Inter-chr all: pairs= 3878618  totalCount= 99952107
Range of observed genomic distances [1000000 249000000]

Making equal occupancy bins
-----------------------------------------------------------------------------------
-
Observed intra-chr read counts in range 91387585
Desired number of contacts per bin      456937.925,
Number of bins  200
Equal occupancy bins generated

Looping through all possible fragment pairs in-range
-----------------------------------------------------------------------------------
-
Chromosome 'chr1',      250 mappable fragments,         -2487765 possible intra-chr fragment pairs in range,    715750 possible inter-chr fragment pairs
Chromosome 'chr10',     136 mappable fragments,         -733191 possible intra-chr fragment pairs in range,     404872 possible inter-chr fragment pairs
Chromosome 'chr11',     136 mappable fragments,         -733191 possible intra-chr fragment pairs in range,     404872 possible inter-chr fragment pairs
Chromosome 'chr12',     134 mappable fragments,         -711689 possible intra-chr fragment pairs in range,     399186 possible inter-chr fragment pairs
Chromosome 'chr13',     116 mappable fragments,         -532571 possible intra-chr fragment pairs in range,     347652 possible inter-chr fragment pairs
Chromosome 'chr14',     108 mappable fragments,         -461283 possible intra-chr fragment pairs in range,     324540 possible inter-chr fragment pairs
Chromosome 'chr15',     103 mappable fragments,         -419328 possible intra-chr fragment pairs in range,     310030 possible inter-chr fragment pairs
Chromosome 'chr16',     91 mappable fragments,  -326796 possible intra-chr fragment pairs in range,     275002 possible inter-chr fragment pairs
Chromosome 'chr17',     82 mappable fragments,  -264957 possible intra-chr fragment pairs in range,     248542 possible inter-chr fragment pairs
Chromosome 'chr18',     79 mappable fragments,  -245784 possible intra-chr fragment pairs in range,     239686 possible inter-chr fragment pairs
Chromosome 'chr19',     60 mappable fragments,  -141075 possible intra-chr fragment pairs in range,     183180 possible inter-chr fragment pairs
Chromosome 'chr2',      244 mappable fragments,         -2369499 possible intra-chr fragment pairs in range,    700036 possible inter-chr fragment pairs
Chromosome 'chr20',     64 mappable fragments,  -160719 possible intra-chr fragment pairs in range,     195136 possible inter-chr fragment pairs
Chromosome 'chr21',     49 mappable fragments,  -93654 possible intra-chr fragment pairs in range,      150136 possible inter-chr fragment pairs
Chromosome 'chr22',     52 mappable fragments,  -105627 possible intra-chr fragment pairs in range,     159172 possible inter-chr fragment pairs
Chromosome 'chr3',      199 mappable fragments,         -1574304 possible intra-chr fragment pairs in range,    579886 possible inter-chr fragment pairs
Chromosome 'chr4',      192 mappable fragments,         -1465167 possible intra-chr fragment pairs in range,    560832 possible inter-chr fragment pairs
Chromosome 'chr5',      181 mappable fragments,         -1301586 possible intra-chr fragment pairs in range,    530692 possible inter-chr fragment pairs
Chromosome 'chr6',      172 mappable fragments,         -1174947 possible intra-chr fragment pairs in range,    505852 possible inter-chr fragment pairs
Chromosome 'chr7',      160 mappable fragments,         -1016175 possible intra-chr fragment pairs in range,    472480 possible inter-chr fragment pairs
Chromosome 'chr8',      147 mappable fragments,         -857172 possible intra-chr fragment pairs in range,     436002 possible inter-chr fragment pairs
Chromosome 'chr9',      142 mappable fragments,         -799617 possible intra-chr fragment pairs in range,     421882 possible inter-chr fragment pairs
Chromosome 'chrX',      156 mappable fragments,         -965811 possible intra-chr fragment pairs in range,     461292 possible inter-chr fragment pairs
Chromosome 'chrY',      60 mappable fragments,  -141075 possible intra-chr fragment pairs in range,     183180 possible inter-chr fragment pairs
Number of all fragments= 3113
Possible, Intra-chr in range: pairs= -19082983 
Possible, Intra-chr all: pairs= 241996.0 
Possible, Inter-chr all: pairs= 4604945.0 
Desired genomic distance range   [0 inf] 
Range of possible genomic distances  [100000  249450000] 
Baseline intrachromosomal probability is 4.13229970743318e-06 
Interchromosomal probability is 2.1715785964870374e-07 
5th quantile of biases: 0.57080572791248
50th quantile of biases: 1.01076079547
95th quantile of biases: 1.20269227401
Out of 3053 loci 85 were discarded with biases not in range [0.5 2]


Calculating probability means and standard deviations of contact counts
------------------------------------------------------------------------------------
Means and error written to outputs/Dixon_IMR90_HindIII_hg19_w100000.all/Dixon_IMR90_HindIII_hg19_w100000.fithic_pass1.res100000.txt


Fitting a univariate spline to the probability means
-----------------------------------------------------------------------------------
Spline successfully fit

The line 'Possible, Intra-chr in range: pairs= -19082983' looks wrong, since a pair count should not be negative. If I set -r to 1000000, 'Intra-chr in range: pairs=' becomes a positive number and the number of significant interactions drops sharply. Shouldn't the resolution parameter (fithic -r) match the fragment length (Dixon_IMR90_HindIII_hg19_w100000.gz)?
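
One way to check the question above is to compare the value passed to -r with the spacing of fragment midpoints in the fragments file. The following is a minimal sketch, assuming the five-column fragments layout seen elsewhere on this page (chromosome, extra field, midpoint, marginalized count, mappable flag). `infer_bin_size` is a hypothetical helper and not part of FitHiC.

```python
import gzip
from collections import Counter


def infer_bin_size(fragments_path):
    """Return the most common gap between consecutive midpoints on one chromosome.

    Assumes column 1 is the chromosome and column 3 is the fragment midpoint.
    """
    gaps = Counter()
    prev_chrom, prev_mid = None, None
    opener = gzip.open if fragments_path.endswith(".gz") else open
    with opener(fragments_path, "rt") as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 3:
                continue  # skip blank or truncated lines
            chrom, mid = fields[0], int(fields[2])
            if chrom == prev_chrom and mid > prev_mid:
                gaps[mid - prev_mid] += 1
            prev_chrom, prev_mid = chrom, mid
    if not gaps:
        raise ValueError("no consecutive midpoints found")
    return gaps.most_common(1)[0][0]
```

If the returned spacing differs from the -r value used on the command line, that mismatch is worth ruling out before interpreting the log.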

Any example of using .hic file as the input?

I am struggling to get Juicer output into a form that fithic accepts. I dumped my .hic file and then tried creatFitHiCContact-hic.sh, but ended up with nothing. Are there any examples that start from a .hic file and run FitHiC?
Thanks
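
For anyone attempting the same conversion by hand, here is a minimal sketch of the idea. It assumes the dumped text has three whitespace-separated columns per line (bin start 1, bin start 2, count) for a single chromosome pair, and that FitHiC expects bin midpoints rather than bin starts. `dump_to_fithic` and its arguments are hypothetical names, not part of the FitHiC utilities.

```python
import math


def dump_to_fithic(lines, chrom1, chrom2, resolution):
    """Yield FitHiC-style interaction lines: chr1, mid1, chr2, mid2, count."""
    half = resolution // 2
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # not a "start1 start2 count" record
        start1, start2 = int(fields[0]), int(fields[1])
        count = float(fields[2])
        if math.isnan(count) or count <= 0:
            continue  # drop empty or undefined cells
        # Assumption: counts are raw (unnormalized), so rounding to int is lossless.
        yield f"{chrom1}\t{start1 + half}\t{chrom2}\t{start2 + half}\t{int(round(count))}"
```

The chromosome names passed in should match those used in the fragments file exactly (for example "chr1" versus "1").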

Update Readme

Hi, I think the Readme might be a little outdated. The docs mention a ./runall, but I couldn't find an executable by that name in the folder. Installing from PyPI worked well enough that you might just link the data files for a test run, or include them as a unit test in the module.

Also it looks like _tkinter was a requirement so you may want to add that to the requirements.

fithic crash

Hi,
I am trying to run fithic downstream of HiC-pro.
I managed to convert the Hicpro output and the iced base using the HiCPro2FitHiC utility function.
Nevertheless when running FitHiC with the produced files I get the following error which is hard for me to grasp.
Any insight is appreciated
Thanks
Francesco

Reading the contact counts file to generate bins...
Interactions file read. Time took 535.2248961925507
Traceback (most recent call last):
File "/hpcnfs/data/GN2/fgualdrini/tools/anaconda3/envs/EnvFITHIC2/bin/fithic", line 10, in <module>
sys.exit(main())
File "/hpcnfs/data/GN2/fgualdrini/tools/anaconda3/envs/EnvFITHIC2/lib/python3.6/site-packages/fithic/fithic.py", line 310, in main
(binStats,noOfFrags, maxPossibleGenomicDist, possibleIntraInRangeCount, possibleInterAllCount, interChrProb, baselineIntraChrProb)= generate_FragPairs(binStats, fragsFile, resolution)
File "/hpcnfs/data/GN2/fgualdrini/tools/anaconda3/envs/EnvFITHIC2/lib/python3.6/site-packages/fithic/fithic.py", line 542, in generate_FragPairs
maxFrags[ch]=max([int(i)-resolution/2 for i in allFragsDic[ch]])
ValueError: max() arg is an empty sequence
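
The failing line takes `max()` over the fragments recorded for one chromosome, so an empty list for any chromosome would trigger this error. A minimal sketch for spotting such chromosomes follows. It assumes a fragment counts as valid when the fifth column (the mappable flag) is 1, which is an assumption about FitHiC's internals rather than something confirmed here. Both function names are hypothetical.

```python
import gzip
from collections import Counter


def valid_fragment_counts(fragments_path):
    """Count fragments per chromosome whose fifth column (mappable flag) is 1."""
    counts = Counter()
    opener = gzip.open if fragments_path.endswith(".gz") else open
    with opener(fragments_path, "rt") as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 5:
                continue
            # Adding 0 still registers the chromosome, so empty ones show up.
            counts[fields[0]] += 1 if fields[4] == "1" else 0
    return counts


def chromosomes_without_valid_fragments(fragments_path):
    """Return chromosomes that appear in the file but have no valid fragment."""
    counts = valid_fragment_counts(fragments_path)
    return sorted(chrom for chrom, n in counts.items() if n == 0)
```

Any chromosome or scaffold returned by the second function is a candidate for removal from the fragments and interactions files before rerunning.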

TypeError: can only concatenate str (not "int") to str

Hello,
My test run (fithic/tests/run_tests-git.sh) finished successfully, but when I ran version 2.0.7 on my own files with this command:

python3 fithic.py -f fithic.fragmentMappability.gz -i fithic.interactionCounts.gz -o FitHicAmphioxus -t fithic.biases.gz -r 150000

I received this error:

Reading the contact counts file to generate bins...
Interactions file read. Time took 26.23392629623413
Traceback (most recent call last):
File "/home/user/sarigoel/Programs/FITHIC/fithic/fithic/fithic.py", line 1324, in <module>
main()
File "/home/user/sarigoel/Programs/FITHIC/fithic/fithic/fithic.py", line 323, in main
(binStats,noOfFrags, maxPossibleGenomicDist, possibleIntraInRangeCount, possibleInterAllCount, interChrProb, baselineIntraChrProb) = generate_FragPairs(observedInterAllCount, observedInterAllSum, binStats, fragsFile, resolution)
File "/home/user/sarigoel/Programs/FITHIC/fithic/fithic/fithic.py", line 600, in generate_FragPairs
print("ERROR - the chromosome " + ch + " has " + len(allFragsDic[ch]) + " valid fragments/bins and should be removed from the input fragment information !!! ")
TypeError: can only concatenate str (not "int") to str

Here is what my input files look like:

[sarigoel@myotis AMPHIOXUS]$ zcat fithic.biases.gz | head -n2
Sc7u5tJ_517 75000 1.970547623956338
Sc7u5tJ_517 225000 0.40157523166875075
[sarigoel@myotis AMPHIOXUS]$ zcat fithic.fragmentMappability.gz | head -n2
Sc7u5tJ_517 0 75000 17395 1
Sc7u5tJ_517 150000 225000 2437 1
[sarigoel@myotis AMPHIOXUS]$ zcat fithic.interactionCounts.gz | head -n2
Sc7u5tJ_517 75000 Sc7u5tJ_517 75000 1700
Sc7u5tJ_517 75000 Sc7u5tJ_517 225000 5

I used an old HicPro (version 2.10.0) to generate my initial data and used this command/script to convert it:

python3 HiCPro2FitHiC.py -i Sample1_150000.matrix -b Sample1_150000_abs.bed -s Sample1_150000_iced.matrix.biases -o . -r 150000

These files had these lengths:
3776446 Sample1_150000.matrix
3769 Sample1_150000_abs.bed
3769 Sample1_150000_iced.matrix.biases

and first two lines were as below:

==> Sample1_150000.matrix <==
1 1 1700
1 2 5

==> Sample1_150000_abs.bed <==
Sc7u5tJ_517 0 150000 1
Sc7u5tJ_517 150000 246623 2

==> Sample1_150000_iced.matrix.biases <==
1.917118534333063673e+00
3.906869898508548156e-01

The Sample1_150000_iced.matrix.biases file also had NaN values, which I guess were converted to -1.

Following the conversion the files kept their original lengths:

[sarigoel@myotis AMPHIOXUS]$ zcat fithic.interactionCounts.gz | wc -l
3776446
[sarigoel@myotis AMPHIOXUS]$ zcat fithic.fragmentMappability.gz | wc -l
3769
[sarigoel@myotis AMPHIOXUS]$ zcat fithic.biases.gz | wc -l
3769

As for the chromosome names, all start with Sc7u5tJ_ and there is no other special character than an underscore, each followed by a scaffold number.

The log file had these lines:

###########
Interactions file read successfully
Observed, Intra-chr in range: pairs= 275495 totalCount= 6213510
Observed, Intra-chr all: pairs= 275495 totalCount= 6213510
Observed, Inter-chr all: pairs= 3500951 totalCount= 7397792
Range of observed genomic distances [0 35250000]

Making equal occupancy bins
Observed intra-chr read counts in range 6213510
Desired number of contacts per bin 62135.1,
Number of bins 100
Equal occupancy bins generated

Looping through all possible fragment pairs in-range
############

Can you think of a reason that may have caused the error?
Thank you!
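
The TypeError in the traceback comes from joining the integer returned by `len()` to strings with `+`. The statement that fails is itself an error report, so the underlying condition is probably a chromosome or scaffold with too few valid fragments. A minimal sketch of the formatting fix, using a hypothetical helper name:

```python
def fragment_count_message(chrom, fragments):
    """Build the error text without concatenating an int to a str."""
    # len() returns an int; an f-string (or str(len(...))) formats it safely.
    return (
        f"ERROR - the chromosome {chrom} has {len(fragments)} valid fragments/bins "
        "and should be removed from the input fragment information !!! "
    )


def concatenation_fails(chrom, fragments):
    """Return True if the original style of concatenation raises TypeError."""
    try:
        "ERROR - the chromosome " + chrom + " has " + len(fragments) + " valid"
    except TypeError:
        return True
    return False
```

With the message fixed, the run would still stop on the same scaffold, but it would name the scaffold that needs to be removed from the input.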

--visual flag has cutoff images

Hi, I noticed that when I used the --visual flag to produce plots, all the plots were slightly cut off. I imagine this is a matplotlib issue when saving the images. Maybe adding plt.tight_layout() or something similar before saving would help? (Unless the plots are being drawn with R?)
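
A minimal sketch of the suggestion, assuming the plots are produced with matplotlib. The function name, axis labels, and data points are placeholders, not taken from FitHiC.

```python
import matplotlib

matplotlib.use("Agg")  # file-only backend, so no display is needed
import matplotlib.pyplot as plt


def save_uncropped(path):
    """Save a small example figure without clipping labels at the edges."""
    fig, ax = plt.subplots()
    ax.plot([1, 10, 100], [0.1, 0.01, 0.001])
    ax.set_xlabel("Genomic distance (bp)")
    ax.set_ylabel("Contact probability")
    fig.tight_layout()                      # fit labels inside the figure area
    fig.savefig(path, bbox_inches="tight")  # crop the output to the drawn content
    plt.close(fig)
```

Either `tight_layout()` or `bbox_inches="tight"` alone is often enough; using both is harmless.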

Merge filter generalizable?

Having multiple adjacent bin pairs all passing some constant significance threshold despite stemming from the same underlying interaction can be an issue in quite a number of other Hi-C analyses, for instance in calling differential interactions by SELFISH as implemented by your group.

The merging filter introduced here seems like a nice approach to this problem, but I was curious whether there has been any testing of whether such a method is suited to pruning the output of other tools?

Thanks!

HiCKRy.py generated extreme bias value

Hi:
When I calculated bias values using HiCKRy.py, I found some extreme values, such as chr1 30875000 2043.101940700636, whereas the bias file generated by ICE normalization in HiCPro looks normal: chr1 30875000 0.8739076476642095. My script is simple: python $KR -i $in -f $frag -o $bias
When I used the bias file generated by HiCKRy.py to find inter-chromosomal significant interactions, I obtained far too many interactions (5183320, q<0.01), but with the ICE-normalized bias file I obtained far fewer (493135, q<0.01).
Compared against the contact heatmap in juicer, the inter-chromosomal significant interactions from ICE normalization look more reliable.
Maybe I should choose ICE normalization for calling inter-chromosomal significant interactions. Could you give me some suggestions?

Best wishes
Qianzhao

fithic run error

Dear professor,
I downloaded fithic and installed it with:
python setup.py install --user

After setup I ran the command: sh run_test.sh
but the following error occurred:

observedIntraInRangeSum 2210827 desiredPerBin 22108 noOfBins 100
Plotting Duan_yeast_EcoRI.fithic_pass1.png
/share/nas2/genome/biosoft/Python/2.7.8/lib/python2.7/site-packages/matplotlib-1.4.3-py2.7-linux-x86_64.egg/matplotlib/collections.py:590: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
if self._edgecolors == str('face'):
Writing Duan_yeast_EcoRI.fithic_pass1.txt
Fit a univariate spline to the probability means
baseline intra-chr probability: 1.4239355014175277e-06 baseline inter-chr probability: 1.1954383983420704e-07
Traceback (most recent call last):
File "/home/liufuyan/.local/bin/fithic", line 9, in <module>
load_entry_point('fithic==1.0.8', 'console_scripts', 'fithic')()
File "/home/liufuyan/.local/lib/python2.7/site-packages/fithic-1.0.8-py2.7.egg/fithic/fithic.py", line 184, in main
splineXinit,splineYinit,splineResidual,isOutlier,splineFDRxinit,splineFDRyinit=fit_Spline(x,y,yerr,options.intersfile,sortedInteractions,biasDic,libname+".spline_pass1",1)
File "/home/liufuyan/.local/lib/python2.7/site-packages/fithic-1.0.8-py2.7.egg/fithic/fithic.py", line 276, in fit_Spline
newSplineY = ir.fit_transform(splineX, splineY, increasing=False)
File "/home/liufuyan/.local/lib/python2.7/site-packages/sklearn/base.py", line 436, in fit_transform
return self.fit(X, y, **fit_params).transform(X)
TypeError: fit() got an unexpected keyword argument 'increasing'

I also ran into another issue:

cannot connect to X server localhost

Could you explain how to avoid needing the X server?

Thank you !

fuyan

Scikit-learn broken command

It looks like there's something wrong with a call to the isotonic module in scikit-learn. When running the example out of the box I got the error:

TypeError: fit() got an unexpected keyword argument 'increasing'

After looking at your code, it looks like you are calling the fit_transform function from the Isotonic Regression module in scikit-learn which passes it to the fit function. It's hard to tell (https://github.com/scikit-learn/scikit-learn/blob/a24c8b46/sklearn/isotonic.py#L325) but I wasn't sure if it accepts additional parameters. It could via some mixin but removing "increasing=False" from line 276 seems to do the trick. (https://github.com/ay-lab/fithic/blob/master/fithic/fithic.py#L276)

Not sure if this is an update to scikit-learn or something that caused a change or if I managed to run it wrong.
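
For context, in current scikit-learn releases `increasing` is a constructor argument of `IsotonicRegression`, not a keyword of `fit` or `fit_transform`, so simply removing `increasing=False` makes the fit non-decreasing instead of non-increasing. The dependency-free sketch below shows what a non-increasing isotonic fit computes, using the pool-adjacent-violators approach. It illustrates the technique only and is not FitHiC's code; both function names are hypothetical.

```python
def isotonic_increasing(values):
    """Least-squares non-decreasing fit via pool-adjacent-violators."""
    blocks = []  # each block is [sum_of_values, number_of_points]
    for value in values:
        blocks.append([float(value), 1])
        # Merge backwards while the previous block mean exceeds the current one.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, size = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += size
    fitted = []
    for total, size in blocks:
        fitted.extend([total / size] * size)
    return fitted


def isotonic_decreasing(values):
    """Non-increasing fit: negate, fit non-decreasing, negate back."""
    return [-v for v in isotonic_increasing([-v for v in values])]
```

For a contact probability curve that should fall with genomic distance, the non-increasing variant is the relevant one.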

Missing chr1 in output

Hi,
Thanks for the magic tool. However, I can't find any inter- or intra-chromosomal data for chr1 in the output file. How can I fix that?

HiCKRy.py Key errors

Hi

I have been trying to run HiCKRy.py on data dumped from Juicer. The contact counts were dumped with no normalisation at 1kb resolution (we have more than 4 billion contacts). The error I keep getting from HiCKRy.py is as follows:

Creating sparse matrix...
Traceback (most recent call last):
File "HiCKRy.py", line 283, in <module>
main()
File "HiCKRy.py", line 276, in main
matrix,revFrag = loadfastfithicInteractions(args.interactions, args.fragments)
File "HiCKRy.py", line 45, in loadfastfithicInteractions
x.append(fragDic[chrom1][mid1])
KeyError: '1'

The contacts file generated looks like this:

1 87000 1 87000 2.0
1 87000 1 88000 1.0
1 137000 1 139000 1.0
1 181000 1 181000 17.0
1 181000 1 182000 2.0
1 182000 1 182000 1.0
1 187000 1 190000 1.0
1 190000 1 191000 1.0
1 597000 1 598000 1.0
1 598000 1 599000 1.0

The fragment file generated looks like this:

chr1 0 500 1 1
chr1 1000 1500 1 1
chr1 2000 2500 1 1
chr1 3000 3500 1 1
chr1 4000 4500 1 1
chr1 5000 5500 1 1
chr1 6000 6500 1 1
chr1 7000 7500 1 1
chr1 8000 8500 1 1
chr1 9000 9500 1 1

Do you have any suggestions for this or would it be easier to dump the contacts from Juicer with the KR normalisation already applied?

Thanks in advance,

James
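
In the excerpts above, the contacts file names the chromosome "1" while the fragments file uses "chr1", and the contact positions (87000, 88000, ...) do not appear among the fragment midpoints (500, 1500, ...). Either mismatch would plausibly cause a lookup failure like the KeyError shown. A minimal sketch for listing (chromosome, midpoint) keys that the interactions reference but the fragments file lacks. It assumes the midpoint is column 3 of the fragments file; `missing_fragment_keys` is a hypothetical helper.

```python
def missing_fragment_keys(fragment_lines, interaction_lines):
    """Return (chrom, midpoint) pairs used in interactions but absent from fragments."""
    known = set()
    for line in fragment_lines:
        fields = line.split()
        if len(fields) >= 3:
            known.add((fields[0], int(fields[2])))  # chromosome, midpoint
    missing = set()
    for line in interaction_lines:
        fields = line.split()
        if len(fields) < 5:
            continue
        for key in ((fields[0], int(fields[1])), (fields[2], int(fields[3]))):
            if key not in known:
                missing.add(key)
    return missing
```

An empty result means every locus in the interactions file can be found in the fragments file.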

Standard Error is always 0

Between lines 887 and 897 in fithic/fithic/fithic.py, part of the standard error calculation appears to be commented out. Is that why the standard error is always 0?

Running error

Hi,
I got the following error when running fithic with this command:
python /home/user/liu/Software/fithic/fithic/fithic.py -f fithic.fragmentMappability.gz -i fithic.interactionCounts.gz -o ./Inter -r 40000 -t fithic.biases.gz -x interOnly
By the way, the test run works successfully.

#######################################################
Reading fragments file from: fithic.fragmentMappability.gz
Reading interactions file from: fithic.interactionCounts.gz
Output path created ./Inter
Fixed size option detected... Fast version of FitHiC will be used
Resolution is 40.0 kb
Reading bias file from: fithic.biases.gz
The number of spline passes is 1
The number of bins is 100
The number of reads required to consider an interaction is 1
The name of the library for outputted files will be Emu
Upper Distance threshold is inf
Lower Distance threshold is 0
Only inter-chromosomal regions will be analyzed
Lower bound of bias values is 0.5
Upper bound of bias values is 2
All arguments processed. Running FitHiC now...

Reading the contact counts file to generate bins...
Interactions file read. Time took 154.33874917030334
Traceback (most recent call last):
File "/home/user/liu/Software/fithic/fithic/fithic.py", line 1317, in <module>
main()
File "/home/user/liu/Software/fithic/fithic/fithic.py", line 323, in main
(binStats,noOfFrags, maxPossibleGenomicDist, possibleIntraInRangeCount, possibleInterAllCount, interChrProb, baselineIntraChrProb) = generate_FragPairs(observedInterAllCount, observedInterAllSum, binStats, fragsFile, resolution)
File "/home/user/liu/Software/fithic/fithic/fithic.py", line 597, in generate_FragPairs
maxFrags[ch]=max([int(i)-resolution/2 for i in allFragsDic[ch]])
ValueError: max() arg is an empty sequence
#######################################################

Could you please help with it?

Best~
Jing

test run AttributeError

Hi,

I am trying to check whether Fithic is running correctly.
When I run the run_tests-git.sh, it gives me the following error:

Traceback (most recent call last):
File "../fithic.py", line 30, in <module>
import matplotlib
File "/gpfs/share/apps/anaconda3/gpu/5.2.0/envs/fithic/lib/python3.6/site-packages/matplotlib/__init__.py", line 107, in <module>
from . import cbook, rcsetup
File "/gpfs/share/apps/anaconda3/gpu/5.2.0/envs/fithic/lib/python3.6/site-packages/matplotlib/rcsetup.py", line 28, in <module>
from matplotlib.fontconfig_pattern import parse_fontconfig_pattern
File "/gpfs/share/apps/anaconda3/gpu/5.2.0/envs/fithic/lib/python3.6/site-packages/matplotlib/fontconfig_pattern.py", line 15, in <module>
from pyparsing import (Literal, ZeroOrMore, Optional, Regex, StringEnd,
File "/gpfs/share/apps/anaconda3/gpu/5.2.0/envs/fithic/lib/python3.6/site-packages/pyparsing/__init__.py", line 133, in <module>
version = version_info.version
AttributeError: 'version_info' object has no attribute 'version'

A similar error appears when I run fithic using only the mandatory arguments.
I wonder if you have some ideas how to fix this?

Generate biasfile/KRnorm vector

Hello,
What would you recommend for generating the biasfile (i.e. KRnorm) from the Hi-C interaction matrix? All the tools I have found compute directly the KR normalized matrix, but as far as I understand, FitHiC requires the raw matrix and the bias vector. I have tried retrieving the bias vector from the diagonal balanced matrix, but I was wondering if there is a more straightforward approach you would recommend.
Thanks!

HTML file has broken links to FitHiC outputs

I noticed the links in the HTML generated are missing the resolution specified when running FitHiC.

e.g. the equal occupancy bin statistics file:

SAMPLE.fithic_pass1.txt

should be (for resolution=5kb):

SAMPLE.fithic_pass1.[res5000.]txt

Advice on converting pairix or cooler format to fithic format

Dear fithic developers

I have a question about how to convert cooler (.cool) or pairix files into the fithic input format.
The main reason I have cool (or pairix) files is that I'm working on an experimental long-read-based Hi-C procedure, so standard Hi-C pipelines like HiC-Pro are not relevant for me, and I don't think I can use the HiCPro2FitHiC scripts provided by the fithic package. Perhaps I could ask some clarifying questions that would help other people who have cool files and want to use fithic?

So basically what I'm trying to do is convert the .cool file into the interactions and fragments files for fithic.
It seems the interactions file can be created with the cooler dump function, specifically

cooler dump --join fubar.cool

which gives a 7-column "matrix" where the first three columns are bin_i_chromosome, bin_i_start, and bin_i_end, the next three describe the interacting bin, and the last column is the contact count between the two bins. So if you average the start and end of each bin, I think that corresponds to fragmentMid1 and fragmentMid2 of the interactions file. My first question: it seems you don't count the diagonal cells in the interactions file? In other words, the contact count for bin_i vs. bin_i should not be in the interactions file?

For the fragments file, I think the 2nd and 5th columns would hold dummy values (as it doesn't matter what they are?), and I don't think they are easily found in a cool file. So the marginalizedContactCount column is the most important to fill, and based on the description it seems to be just the sum of all contacts for a given bin? So for an N x N Hi-C matrix, the marginalizedContactCount would simply be the sum of each row?

Thank you for the help.
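
A minimal sketch of the conversion described above, under several assumptions: the input is the seven-column text from `cooler dump --join` with raw integer counts, only one triangle of the matrix is listed (so each off-diagonal count is added to both bins' marginals), and the midpoint is taken as bin start plus half the resolution. The last choice follows the HiCPro2FitHiC output quoted elsewhere on this page, where the truncated final bin 150000-246623 at 150 kb is reported with midpoint 225000, so averaging start and end would give a different value for the last bin. Whether diagonal cells belong in the interactions file is left as a switch because the question is open here. `joined_dump_to_fithic` is a hypothetical name.

```python
from collections import defaultdict


def joined_dump_to_fithic(lines, resolution, keep_diagonal=True):
    """Return (interaction_lines, fragment_lines) from joined cooler dump text."""
    half = resolution // 2
    interactions = []
    marginals = defaultdict(int)
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue
        key1 = (fields[0], int(fields[1]) + half)  # (chrom, midpoint) of bin 1
        key2 = (fields[3], int(fields[4]) + half)  # (chrom, midpoint) of bin 2
        count = int(fields[6])
        # Marginals include diagonal cells even when they are not written out.
        marginals[key1] += count
        if key2 != key1:
            marginals[key2] += count
        if key1 == key2 and not keep_diagonal:
            continue
        interactions.append(f"{key1[0]}\t{key1[1]}\t{key2[0]}\t{key2[1]}\t{count}")
    fragments = [
        f"{chrom}\t0\t{mid}\t{total}\t{1 if total > 0 else 0}"
        for (chrom, mid), total in sorted(marginals.items())
    ]
    return interactions, fragments
```

Bins with no contacts at all never appear in the dump, so they are absent from the fragments list produced here. A complete fragments file would need the bin table as well.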

error while running FitHiC

Hi,
I am running version Fit-Hi-C 2.0.7.
I used the command below
fithic -f /project/roselai_228/priyatap/HiC_work/fithic_output/fithic.fragmentMappability.gz -i /project/roselai_228/priyatap/HiC_work/fithic_output/fithic.interactionCounts.gz -U 1000000 -r 1000000 -l fithit_SRR1030745_1MB -o /project/roselai_228/priyatap/HiC_work/fithic_output

and encounter the error
[Image: Screen Shot 2020-12-11 at 7 26 34 PM]

Please help me resolve this issue.
FYI, I used the command below to generate the input for fithic. It also reported an error, but it still generated three zip files, so I used two of them in the command above.

python HiCPro2FitHiC.py -i /project/roselai_228/priyatap/HiC_work/output/hic_results/matrix/SRR1030745/raw/1000000/SRR1030745_1000000.matrix -b /project/roselai_228/priyatap/HiC_work/output/hic_results/matrix/SRR1030745/raw/1000000/SRR1030745_1000000_abs.bed -s /project/roselai_228/priyatap/HiC_work/output/hic_results/matrix/SRR1030745/iced/1000000/SRR1030745_1000000_iced.matrix.biases -o /scratch/priyatap/Hicpro_output

The error I got from the HiCPro2FitHiC.py is below

[Image: Screen Shot 2020-12-11 at 6 29 45 PM]

Please help me out.
Thank you in advance for your help,
Priya

FitHic crashes

Hi,

I am trying to detect loops with FitHiC, which I installed via bioconda.

First, I converted a cooler file to HiC-Pro format plus the required bed file. Next, I used your script HiCPro2FitHiC (which is not part of the bioconda installation; maybe it should be included?) to transform the data into FitHiC-compatible input.

I now run FitHiC with:

fithic -i hmec_100kb/fithic.interactionCounts.gz -f hmec_100kb/fithic.fragmentMappability.gz -o hmec_100kb/ -r 100000

but get a crash:

GIVEN FIT-HI-C ARGUMENTS
=========================
Reading fragments file from: hmec_100kb/fithic.fragmentMappability.gz
Reading interactions file from: hmec_100kb/fithic.interactionCounts.gz
Output path being used from hmec_100kb/
Fixed size option detected... Fast version of FitHiC will be used
Resolution is 100.0 kb
No bias file
The number of spline passes is 1
The number of bins is 100
The number of reads required to consider an interaction is 1
The name of the library for outputted files will be FitHiC
Upper Distance threshold is inf
Lower Distance threshold is 0
Only intra-chromosomal regions will be analyzed
Lower bound of bias values is 0.5
Upper bound of bias values is 2
All arguments processed. Running FitHiC now...
=========================


Reading the contact counts file to generate bins...
Interactions file read. Time took 244.58135843276978
Traceback (most recent call last):
  File "/home/wolffj/miniconda3/envs/fithic2/bin/fithic", line 10, in <module>
    sys.exit(main())
  File "/home/wolffj/miniconda3/envs/fithic2/lib/python3.6/site-packages/fithic/fithic.py", line 310, in main
    (binStats,noOfFrags, maxPossibleGenomicDist, possibleIntraInRangeCount, possibleInterAllCount, interChrProb, baselineIntraChrProb)= generate_FragPairs(binStats, fragsFile, resolution)
  File "/home/wolffj/miniconda3/envs/fithic2/lib/python3.6/site-packages/fithic/fithic.py", line 542, in generate_FragPairs
    maxFrags[ch]=max([int(i)-resolution/2 for i in allFragsDic[ch]])
ValueError: max() arg is an empty sequence

Do you have any idea what I need to do to get it running?

Thanks a lot,

Joachim

list index out of range

The following are the three input files:

$ zcat fat_5000.fithic.fragment.gz |head
NC_052532.1	0	2500	0	0
NC_052532.1	0	7500	0	0
NC_052532.1	0	12500	1106	1
NC_052532.1	0	17500	3828	1
NC_052532.1	0	22500	7946	1
NC_052532.1	0	27500	1786	1
NC_052532.1	0	32500	11554	1
NC_052532.1	0	37500	4999	1
NC_052532.1	0	42500	7694	1
NC_052532.1	0	47500	10932	1
zcat fat_5000.fithic.interaction.gz |head
NC_052532.1	12500	NC_052532.1	12500	113
NC_052532.1	12500	NC_052532.1	17500	15
NC_052532.1	12500	NC_052532.1	22500	4
NC_052532.1	12500	NC_052532.1	27500	1
NC_052532.1	12500	NC_052532.1	32500	1
NC_052532.1	12500	NC_052532.1	42500	1
NC_052532.1	12500	NC_052532.1	47500	5
NC_052532.1	12500	NC_052532.1	52500	2
NC_052532.1	12500	NC_052532.1	62500	2
NC_052532.1	12500	NC_052532.1	67500	3
zcat fat_5000.fithic.bias.gz|head
NC_052532.1	2500	0.447834
NC_052532.1	7500	0.098977
NC_052532.1	12500	0.150248
NC_052532.1	17500	0.374007
NC_052532.1	22500	0.563625
NC_052532.1	27500	0.239352
NC_052532.1	32500	0.588517
NC_052532.1	37500	0.492011
NC_052532.1	42500	0.661867
NC_052532.1	47500	0.819888

And these are the error messages I get:

GIVEN FIT-HI-C ARGUMENTS
=========================
Reading fragments file from: /home/SLY68/2022/hic/juicer/down_analysis/raw/fit-hic2/fat_5000.fithic.fragment.gz
Reading interactions file from: /home/SLY68/2022/hic/juicer/down_analysis/raw/fit-hic2/fat_5000.fithic.interaction.gz
Output path created ./interOnly/
Fixed size option detected... Fast version of FitHiC will be used
Resolution is 5.0 kb
Reading bias file from: /home/SLY68/2022/hic/juicer/down_analysis/raw/fit-hic2/fat_5000.fithic.bias.gz
The number of spline passes is 2
The number of bins is 100
The number of reads required to consider an interaction is 1
The name of the library for outputted files will be FitHiC
Upper Distance threshold is inf
Lower Distance threshold is 0
Graphs will be outputted
Only inter-chromosomal regions will be analyzed
Lower bound of bias values is 0.5
Upper bound of bias values is 2
All arguments processed. Running FitHiC now...
=========================


Reading the contact counts file to generate bins...
Interactions file read. Time took 2983.205014705658
Fragments file read. Time took 0.9499287605285645
Traceback (most recent call last):
  File "/home/SLY68/anaconda3/envs/hicpro/bin/fithic", line 8, in <module>
    sys.exit(main())
  File "/home/SLY68/anaconda3/envs/hicpro/lib/python3.7/site-packages/fithic/fithic.py", line 327, in main
    biasDic = read_biases(biasFile)
  File "/home/SLY68/anaconda3/envs/hicpro/lib/python3.7/site-packages/fithic/fithic.py", line 808, in read_biases
    chrom=words[0]; midPoint=int(words[1]); bias=float(words[2])
IndexError: list index out of range
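
The failing statement reads three whitespace-separated fields from each line of the bias file, so a blank or truncated line anywhere in the file would raise this IndexError even when the first ten lines look fine. A minimal sketch for locating such lines follows. `malformed_bias_lines` is a hypothetical helper, and the three-column layout (chromosome, midpoint, bias) is taken from the excerpt above.

```python
import gzip


def malformed_bias_lines(path, limit=10):
    """Return up to `limit` (line_number, text) pairs that lack chrom/mid/bias fields."""
    problems = []
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for number, line in enumerate(handle, start=1):
            fields = line.split()
            try:
                int(fields[1])    # midpoint must be an integer
                float(fields[2])  # bias must parse as a number
            except (IndexError, ValueError):
                problems.append((number, line.rstrip("\n")))
                if len(problems) >= limit:
                    break
    return problems
```

An empty list means every line has at least three parseable fields, in which case the cause lies elsewhere.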

how to prepare my `input file`?

This software is really great, but I don't know how to prepare my input files. How should I create them from a Hi-C file in .hic, cool, or h5 format?

OverflowError: cannot convert float infinity to integer

Hi! I hope you are doing well. I am receiving this error when I try different upper bound values while running FitHiC:

(fithic) [murthys3 FitHiC]$ fithic -f test/fragmentLists/Dixon_hESC_HindIII_hg18_w40000_chr1.gz -i test/contactCounts/Dixon_hESC_HindIII_hg18_w40000_chr1.gz -o test/tested_outputs_again -L 2 -U 1000 -p 2 -b 100 -r 40000 -x intraOnly
.
.
.
.
.
.
.
.
Traceback (most recent call last):
File "/home/murthys3/.local/bin/fithic", line 10, in <module>
sys.exit(main())
File "/home/murthys3/.local/lib/python3.6/site-packages/fithic/fithic.py", line 310, in main
(binStats,noOfFrags, maxPossibleGenomicDist, possibleIntraInRangeCount, possibleInterAllCount, interChrProb, baselineIntraChrProb)= generate_FragPairs(binStats, fragsFile, resolution)
File "/home/murthys3/.local/lib/python3.6/site-packages/fithic/fithic.py", line 654, in generate_FragPairs
log.write("Range of possible genomic distances [%d %d] \n" % (minPossibleGenomicDist, maxPossibleGenomicDist)),
OverflowError: cannot convert float infinity to integer

It seems like for the fragment file, there is an upper threshold set as infinity that cannot be converted to an integer. Without the upper threshold, the FitHiC program runs successfully. Do you know how I can fix this? This seems to be the case regardless of the input file I have selected.

Thanks.
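
The traceback shows a `%d` format applied to a value that is infinite, which Python cannot convert to an integer. One possible explanation, not verified against the source, is that with -U 1000 and 40000 bp bins no fragment pair falls inside the requested range, so a distance bound keeps an infinite initial value. A minimal sketch of formatting that tolerates infinity, using hypothetical helper names:

```python
import math


def format_distance(value):
    """Format a genomic distance as an integer string, or 'inf' if unbounded."""
    if math.isinf(value):
        return "inf"
    return "%d" % value


def percent_d_overflows(value):
    """Return True if '%d' formatting of this value raises OverflowError."""
    try:
        "%d" % value
    except OverflowError:
        return True
    return False
```

Choosing -U at or above the bin size would be the first thing to try, since a smaller upper bound cannot contain any pair of distinct fixed-size bins.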

HiCKRy.py KeyError: '0'

Hi,

I'm trying to generate a bias values file and it gives me this error:

Creating sparse matrix...
Traceback (most recent call last):
File "/gpfs/data/reinberglab/home/kl3488/fithic/fithic/utils/HiCKRy.py", line 283, in <module>
main()
File "/gpfs/data/reinberglab/home/kl3488/fithic/fithic/utils/HiCKRy.py", line 276, in main
matrix,revFrag = loadfastfithicInteractions(args.interactions, args.fragments)
File "/gpfs/data/reinberglab/home/kl3488/fithic/fithic/utils/HiCKRy.py", line 45, in loadfastfithicInteractions
x.append(fragDic[chrom1][mid1])
KeyError: '0'

My contact counts file and fragment mappability files seem to look ok:

Contact counts:

chr10 100467500 chr10 100592500 1
chr10 100467500 chr10 100597500 2
chr10 100467500 chr10 100602500 2

Fragment mappability:

chr10 895000 897500 1 1
chr10 900000 902500 1 1
chr10 905000 907500 1 1

intra vs inter chromosomal counts in input

Hi, this is not a technical issue but a point that isn't documented in the manual. I know fithic is for mid-range intrachromosomal contacts. In the input files, specifically the fragments file, should the marginalized count cover intra-chromosomal counts only or all counts?

Thanks!

Matplotlib Backend Error

This is a common problem on a headless OS. When I tried to run fithic with the --visual flag:

fithic -f data/yeast_fragments.gz -i data/yeast_counts.gz -o sample -p 10 -l yeast --visual

I got the following error:

/home/asur/.local/lib/python2.7/site-packages/fithic/fithic.py:119: UserWarning: 
This call to matplotlib.use() has no effect because the backend has already
been chosen; matplotlib.use() must be called *before* pylab, matplotlib.pyplot,
or matplotlib.backends is imported for the first time.
The backend was *originally* set to u'TkAgg' by the following code:
  File "/usr/local/bin/fithic", line 7, in <module>
    from fithic.fithic import main
  File "/home/asur/.local/lib/python2.7/site-packages/fithic/fithic.py", line 25, in <module>
    from pylab import *
  File "/home/asur/.local/lib/python2.7/site-packages/pylab.py", line 1, in <module>
    from matplotlib.pylab import *
  File "/home/asur/.local/lib/python2.7/site-packages/matplotlib/pylab.py", line 257, in <module>
    from matplotlib import cbook, mlab, pyplot as plt
  File "/home/asur/.local/lib/python2.7/site-packages/matplotlib/pyplot.py", line 72, in <module>
    from matplotlib.backends import pylab_setup
  File "/home/asur/.local/lib/python2.7/site-packages/matplotlib/backends/__init__.py", line 14, in <module>
    line for line in traceback.format_stack()

  matplotlib.use('Agg')

While I noticed you did include the work around in your code by importing the appropriate matplotlib backend, it looks like pylab imports the wrong backend before that line. By moving

import matplotlib
matplotlib.use('Agg')

to the top of the fithic.py script, I was able to avoid the error. Perhaps the optional backend selection could be moved before the pylab import?
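
The ordering described above can be sketched as follows: the backend is selected first, and pyplot (or pylab) is imported only afterwards. The helper name is hypothetical and the plotted data are placeholders.

```python
import io

import matplotlib

matplotlib.use("Agg")  # select the non-interactive backend first...
import matplotlib.pyplot as plt  # ...and only then import pyplot or pylab


def render_png_bytes():
    """Draw a trivial figure and return it as PNG bytes, with no display needed."""
    fig, ax = plt.subplots()
    ax.plot([0, 1], [0, 1])
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png")
    plt.close(fig)
    return buffer.getvalue()
```

Setting the MPLBACKEND environment variable to Agg before launching Python is another way to get the same effect without editing the script.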

Difference between ICE and KR biases. Setting up the bias limits.

Hi,
I am processing HiC-Pro results from a small genome (in scaffolds) at 5000bp resolution. I used the HiCPro2FitHiC utility to convert the data and also generated the KR biases file using HiCKRy.py.

For example, I see 2 extremely high values in the ICE:
HiC_scaffold_83 6297500 88.3104735696205
HiC_scaffold_20 672500 43.444427428612
that are far from corresponding KR values:
HiC_scaffold_83 6297500 1.71088667499396
HiC_scaffold_20 672500 0.824151827905017
...otherwise, the distributions are similar, with a number of -1 values.

Which bias version is preferable?
My other question is when -bL and -bU need to be modified, and whether it is appropriate to adjust them to the bias method or to other genome- or data-specific factors.

Thank you!
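
FitHiC logs on this page report lines such as "Out of 3053 loci 85 were discarded with biases not in range [0.5 2]", so one practical way to compare the two bias files is to count how many loci each would lose under given bounds. A minimal sketch, assuming the bounds are inclusive and that -1 marks an undefined bias. `count_outside_bounds` is a hypothetical helper.

```python
def count_outside_bounds(biases, lower=0.5, upper=2.0):
    """Return (discarded, total) for bias values outside [lower, upper]."""
    discarded = sum(1 for bias in biases if bias < lower or bias > upper)
    return discarded, len(biases)
```

Running it on the third column of each bias file with a few candidate bounds shows how sensitive the retained loci are to -bL and -bU.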

How to decide the bin size in interaction loops?

Hello,

I am running fithic. The output shows only the midpoint of each fragment in an interaction. How can I determine the bin size, so that I know which genomic range potentially interacts with which other range? I assume it is simply the bin size from the earlier FitHiChip results. Thank you!

Zhikai
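
As a rough sketch of how a midpoint maps back to a genomic range, assuming fixed-size bins where each reported midpoint lies inside its own bin (this does not hold for restriction-fragment input). The function `bin_interval` is a hypothetical helper, and the resolution is the one you passed when building the inputs:

```python
def bin_interval(fragment_mid, resolution):
    """Return the half-open interval [start, end) of the fixed-size bin
    that contains fragment_mid. Assumes bins start at multiples of
    resolution."""
    start = (fragment_mid // resolution) * resolution
    return start, start + resolution


# Midpoints taken from an earlier post on this page, at 5000 bp resolution.
print(bin_interval(672500, 5000))
print(bin_interval(6297500, 5000))
```

Applying this to both midpoints of a reported interaction gives the two genomic ranges that are in contact.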

merge_filter reduces significant interactions from 12M to 10k

I am using the HiC-Pro/FitHiC tools to call interactions from two Hi-C replicates for a large genome. FitHiC2 at 20 kb resolution (40 kb to 2 Mb) found 12M interactions with q-value < 0.005 (from the ".spline_pass2.res20000.significances.txt.gz" file). After running the default merge_filter.sh, the number went down to 10k with FDR=0.05. Is this normal, or is it possible that I need to adjust some parameters in CombineNearbyInteraction.py? I could not find details on this step.
Thank you for the great toolkits and help!
Pavla
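
When comparing counts before and after filtering, it can help to count interactions at each threshold in the same way. The sketch below does this for a tab-separated table with a header row. The column name `q-value` and the header layout are assumptions about the significances file, the rows are made-up illustration data, and `count_significant` is a hypothetical helper.

```python
import csv


def count_significant(lines, threshold, column="q-value"):
    """Count rows whose value in `column` is below `threshold`.

    `lines` is any iterable of tab-separated text lines whose first
    line is a header.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return sum(1 for row in reader if float(row[column]) < threshold)


# Made-up rows, only to show the mechanics.
example = (
    "chr1\tfragmentMid1\tchr2\tfragmentMid2\tcontactCount\tp-value\tq-value\n"
    "chr1\t10000\tchr1\t70000\t25\t1e-9\t1e-6\n"
    "chr1\t10000\tchr1\t90000\t12\t1e-5\t0.003\n"
    "chr1\t30000\tchr1\t110000\t8\t1e-3\t0.02\n"
    "chr1\t50000\tchr1\t150000\t2\t0.4\t0.8\n"
).splitlines()

print(count_significant(example, 0.005))
print(count_significant(example, 0.05))
```

For a real gzipped file, the same function can be fed a handle from `gzip.open(path, "rt")`.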

q-value is always 1 when running FitHiC2 using allValidPairs input

Hi!

I hope you are doing well. I ran FitHiC2 using validPairs2FitHiC-fixedSize.sh to generate the interactions file, HiCKRy.py to generate the bias file, and createFitHiCFragments-fixedsize.py to generate the fragments file. In the output, nearly all the q-values are 1, and the bias values are consistently -1. When running directly from HiCPro2FitHiC, many of the q-values are below 1. I think the difference in results may have to do with the percentOfSparseToRemove (-x) parameter used when generating the bias file. Do you have any suggestions on how to choose parameters and cutoffs when generating the bias file, and when running the FitHiC command on input files produced by validPairs2FitHiC-fixedSize.sh, createFitHiCFragments-fixedsize.py, and HiCKRy.py?

Thanks so much in advance!

Best,
Shanta

Where to find fragments argument from HiC Pro Pipeline?

Hello,

I am trying to run Fit-Hi-C to analyze HiChIP data that I have already aligned using the HiC-Pro pipeline. I inferred from your README that the interactions argument would be the .allvalidpairs file generated by the HiC-Pro pipeline. However, I cannot find anything in a format that could work for the fragments file.

Do you know if the HiC-Pro pipeline generates a file that would work as a fragments file for Fit-Hi-C?

Any help would be greatly appreciated.

Thanks!

sequencing depth for loops

Hello,
I want to call loops using FitHiC2 at 10 kb resolution. Do you know how many valid read pairs FitHiC2 requires to call loops at 10 kb resolution? 300 million valid read pairs?

fragmentMid

Hello,
Could you tell me what fragmentMid represents?
How do I get the coordinates of the two loop anchors?


TypeError: coercing to Unicode: need string or buffer, NoneType found

Hello,
I installed fithic via pip and get the following error when running fithic on the command line:

Generating all possible intra-chromosomal fragment pairs and counting the number of all possible inter-chr fragment pairs
------------------------------------------------------------------------------------
Traceback (most recent call last):
  File "/public/home/zpxu/miniconda2/bin/fithic", line 11, in <module>
    load_entry_point('fithic==1.1.3', 'console_scripts', 'fithic')()
  File "/public/home/zpxu/miniconda2/lib/python2.7/site-packages/fithic/fithic.py", line 181, in main
    generate_FragPairs(options.fragsfile)
  File "/public/home/zpxu/miniconda2/lib/python2.7/site-packages/fithic/fithic.py", line 756, in generate_FragPairs
    infile = open(infilename, 'r')
TypeError: coercing to Unicode: need string or buffer, NoneType found

Any help is much appreciated.
Thanks.
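
In Python 2, `open(None)` raises exactly this TypeError, so the traceback suggests that the fragments path was never set, for example because the fragments option was missing from the command line. A minimal sketch of guarding against that case with argparse follows. It is not the fithic source; the parser and the option names `-f/--fragments` and `-i/--interactions` are hypothetical stand-ins for illustration.

```python
import argparse


def parse_args(argv):
    """Parse a hypothetical command line and insist on both input files."""
    parser = argparse.ArgumentParser(description="hypothetical wrapper")
    # required=True makes argparse exit with a usage message instead of
    # silently leaving the attribute as None.
    parser.add_argument("-f", "--fragments", required=True,
                        help="path to the fragments file")
    parser.add_argument("-i", "--interactions", required=True,
                        help="path to the interactions file")
    return parser.parse_args(argv)


args = parse_args(["-f", "fragments.gz", "-i", "interactions.gz"])
print(args.fragments, args.interactions)
```

With this guard, a missing option stops the run with a usage message naming the option, rather than failing later inside `open`.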

Minor bug when testing installation

Dear Ay lab,

Thank you for making and maintaining this great software for the community.

I caught one small bug when I was testing my fithic installation and thought I would let you know. In the directory biasPerLocus, the file Dixon_IMR90_HindIII_hg19_w100000.gz is missing positional and normalization information for chrY loci. This results in an error (stdout):

Error. Bias file does not contain chromosome chr10 or chrY. Please ensure you're using correct file.

A little weird b/c chr10 is there in the file. Anyway, I added dummy information for chrY (set all normalization values to 1; see attached file) and that solved the problem.

Thanks again,
Kris

Dixon_IMR90_HindIII_hg19_w100000_chrYnow.tsv.gz

How to get different contacts?

I called loops for normal and cancer samples using fithic, and I want to compare the results of the two samples. Could you give some guidance, or suggest software that can be used for this? Thank you!

ValueError: max() arg is an empty sequence

Hi,
I am running Fit-Hi-C version 2.0.7.

The input files look like this:
[three screenshots of the input files]

I used the command below
fithic -i fithic.interactionCounts.gz -f fithic.fragmentMappability.gz -t fithic.biases.gz -r 5000 -x intraOnly -U 5000000 -L 20000 -o ./23_10k -l 23_10k-intrachrom -v

and encountered the error shown in the title:
[screenshot of the error]

When I extract the information for just one chromosome from the three files into new files and run the same command, it works.

Please help me out.
Thank you in advance for your help,
Chengming
