
ocrdetector's Introduction

OCRDetector

OCRDetector is a bioinformatics pipeline that detects open chromatin regions (OCRs) across the whole genome from cell-free DNA (cfDNA) sequencing data.

The original name is OCRDetector

email- [email protected]


Usage:

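python OCRDetectBycfDNA.py [-h usage] [-i input file] [-o OCRs output bed] [-c chromosome]

Options:

-h: usage

-i: text file listing the cfDNA bam files

-o: output bed file of detected OCRs

-c: chromosome to analyze; -1 (the default) means the whole genome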

Example:

  • python OCRDetectBycfDNA.py -i bamFileList.txt -o OCRs.bed

    or

    python OCRDetectBycfDNA.py -i bamFileList.txt -o OCRs.bed -c -1

    Either command detects the OCRs of the whole genome.


  • python OCRDetectBycfDNA.py -i bamFileList.txt -o OCRs.bed -c 1

    This command detects the OCRs of chromosome 1 only.


  • python OCRDetectBycfDNA.py -h

    This command prints the usage message.



Guidance:

Step1:

First, edit bamFileList.txt so that it lists the paths of your own cfDNA-seq data files. Each input file must be a bam file that has already been indexed.
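Before running the pipeline, it can help to confirm that every listed bam file really has an index. The following is a minimal sketch, not part of OCRDetector. It assumes that bamFileList.txt holds one bam path per line (the original screenshot of the file is not available here), and the helper names read_bam_list and find_unindexed are hypothetical.

```python
import os


def read_bam_list(list_path):
    """Return the non-empty lines of the list file as bam paths.

    Assumption: the list file contains one path per line.
    """
    with open(list_path) as handle:
        return [line.strip() for line in handle if line.strip()]


def find_unindexed(bam_paths):
    """Return the bam paths that have no index file next to them."""
    missing = []
    for bam in bam_paths:
        # Common index names: sample.bam.bai, sample.bai, sample.bam.csi
        candidates = (
            bam + ".bai",
            os.path.splitext(bam)[0] + ".bai",
            bam + ".csi",
        )
        if not any(os.path.exists(path) for path in candidates):
            missing.append(bam)
    return missing
```

Calling find_unindexed(read_bam_list("bamFileList.txt")) returns the files that still need indexing, for example with samtools index.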


Step2:

python OCRDetectBycfDNA.py -i bamFileList.txt -o OCRs.bed

The above command produces the initial set of OCRs.

Step3:

Extract features and train machine learning models to filter false positives.

Step4:

Use bedtools to calculate the intersection of our OCRs and OCRs obtained from other data (ATAC-seq, DNase-seq, gene TSSs).
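In practice this step is a single bedtools call such as bedtools intersect -a OCRs.bed -b other.bed, where other.bed stands for whichever reference set you compare against. To make clear what is being computed, here is a small pure-Python sketch of interval intersection on bed records. It is an illustration only: the function names are hypothetical, and the pairwise loop is far slower than bedtools on real data.

```python
def read_bed(lines):
    """Parse bed lines into (chrom, start, end) tuples, skipping non-data lines."""
    intervals = []
    for line in lines:
        fields = line.split()
        # Skip comments, track/browser headers and malformed lines.
        if line.startswith("#") or len(fields) < 3 or fields[0] in ("track", "browser"):
            continue
        intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals


def intersect(a, b):
    """Return the overlapping part of every pair of intervals from a and b."""
    overlaps = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            # bed intervals are half-open, so intervals that only touch do not overlap.
            if start < end:
                overlaps.append((chrom_a, start, end))
    return overlaps
```

For example, an OCR at 1:100-200 and an ATAC-seq peak at 1:150-250 intersect in 1:150-200.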


Further updates will follow.


ocrdetector's Issues

reference GRCh37 required

I am unsure, but does OCRDetector require a GRCh37 reference with Ensembl-style contig names (1 rather than chr1)? Running it on my data fails as follows:

python3 OCRDetectBycfDNA.py -i /home/ubuntu/data/bamFileList.txt -o OCRs.bed
begin
************************************* round : 0 -> 142602 *************************************
get bamfile done
Traceback (most recent call last):
  File "OCRDetectBycfDNA.py", line 864, in <module>
    wpsList_Nor, lFdepth_Nor, sFdepth_Nor = callOneBed(bamfileList, contig, start, end, win=120)
  File "OCRDetectBycfDNA.py", line 316, in callOneBed
    for r in bamfile.fetch(contig, bed1, bed2):
  File "pysam/libcalignmentfile.pyx", line 1081, in pysam.libcalignmentfile.AlignmentFile.fetch
  File "pysam/libchtslib.pyx", line 685, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid contig 1
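The error suggests that the script asks for a contig named 1 while the bam header has no contig of that name, which is what one would expect for files aligned to a reference that uses chr1-style names. A possible workaround, sketched below under that assumption, is to look up the spelling the bam file actually uses before calling fetch. The helper resolve_contig is hypothetical and not part of OCRDetector; with pysam, the header names are available as AlignmentFile.references.

```python
def resolve_contig(name, references):
    """Return the spelling of contig `name` found in `references`, or None.

    `references` is any collection of contig names, for example the
    `references` attribute of an open pysam.AlignmentFile.
    """
    if name in references:
        return name
    # Try the other common naming style: "1" <-> "chr1".
    alternative = name[3:] if name.startswith("chr") else "chr" + name
    if alternative in references:
        return alternative
    return None
```

This sketch only handles the chr prefix. Other differences between naming schemes, such as MT versus chrM, would need an explicit mapping.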

error in getPeakAveHeight - startPos bigger than endPos

Hi! I have encountered an error in the getPeakAveHeight function while running OCRDetectBycfDNA.py.

I have added prints to the function to double check that wpsList is not empty. The problem seems to be that startPos is bigger than endPos. I'm not sure I understand how this is happening.

def getPeakAveHeight(wpsList, startPos, endPos):
    '''
    :param wpsList: raw data
    :param startPos: start position
    :param endPos: end position
    :return: area between the curve and its minimum over [startPos, endPos)
    '''
    if endPos > len(wpsList) or startPos == endPos:
        return 0

    print([startPos, endPos])
    print(wpsList)
    x = np.arange(startPos, endPos, 1)
    minNum = np.min(wpsList[startPos:endPos])
    minLine = np.array([minNum for i in range(endPos - startPos)])
    # lowLine = np.array([-0.5 for i in range(len(x))])
    return scipy.integrate.simps(np.subtract(np.array(wpsList)[startPos:endPos], minLine), x=x)
************************************  round :  786   ->   2841  *************************************
savgol filter done
scipy_signal_find_peaks done
[561, 587]
[0.34751679 0.34591636 0.34431592 ... 0.41062607 0.41341785 0.41620963]
[1064, 1263]
[0.34751679 0.34591636 0.34431592 ... 0.41062607 0.41341785 0.41620963]
[3526, 4162]
[0.34751679 0.34591636 0.34431592 ... 0.41062607 0.41341785 0.41620963]
[4313, 4800]
[0.34751679 0.34591636 0.34431592 ... 0.41062607 0.41341785 0.41620963]
[4900, 5141]
[0.34751679 0.34591636 0.34431592 ... 0.41062607 0.41341785 0.41620963]
[5436, 5572]
[0.34751679 0.34591636 0.34431592 ... 0.41062607 0.41341785 0.41620963]
[6121, 6342]
[0.34751679 0.34591636 0.34431592 ... 0.41062607 0.41341785 0.41620963]
[6616, 6514]
[0.34751679 0.34591636 0.34431592 ... 0.41062607 0.41341785 0.41620963]
Traceback (most recent call last):
  File "/home/julieta/Documents/github/OCRDetector/src/OCRDetectBycfDNA.py", line 881, in <module>
    ndrObjectList = findTssNDR(start, contig, peakObjectList, smoothWpsList_Nor, norm_lFdepth_Nor)
  File "/home/julieta/Documents/github/OCRDetector/src/OCRDetectBycfDNA.py", line 270, in findTssNDR
    aveHeight = getPeakAveHeight(smoothData, peakObjectList[i].endPos,
  File "/home/julieta/Documents/github/OCRDetector/src/OCRDetectBycfDNA.py", line 775, in getPeakAveHeight
    minNum = np.min(wpsList[startPos:endPos])
  File "/home/julieta/.local/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 2953, in min
    return _wrapreduction(a, np.minimum, 'min', axis, None, out,
  File "/home/julieta/.local/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 88, in _wrapreduction
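The last pair printed before the traceback is [6616, 6514], so startPos is larger than endPos. The existing guard only checks endPos > len(wpsList) and startPos == endPos, so the slice wpsList[startPos:endPos] ends up empty and np.min is called on it. The following is a minimal sketch of a stricter guard, not the author's fix. To stay dependency-free it uses the trapezoidal rule in place of scipy.integrate.simps, and the name peak_area_above_min is hypothetical.

```python
def peak_area_above_min(wps, start_pos, end_pos):
    """Area between wps[start_pos:end_pos] and its minimum; 0.0 for an invalid range."""
    # Reject empty, inverted or out-of-bounds ranges before slicing,
    # so min() never sees an empty window.
    if start_pos < 0 or end_pos > len(wps) or start_pos >= end_pos:
        return 0.0
    window = wps[start_pos:end_pos]
    floor = min(window)
    heights = [value - floor for value in window]
    # Trapezoidal rule with unit spacing between neighbouring samples.
    return sum((left + right) / 2.0 for left, right in zip(heights, heights[1:]))
```

Returning 0 for an inverted range only hides the symptom. The traceback shows findTssNDR passing peakObjectList[i].endPos as the start position, so it is probably worth checking whether neighbouring peaks can overlap at that point.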
