pipelines's Introduction

analytical tools

Analytical tools and pipelines developed by the Gaulton Lab at UCSD.

bulk_ATAC-seq

Process bulk ATAC-seq data.

Last update: 08/30/2017
Maintained by: Josh Chiou

snATAC-seq

Process and analyze single nucleus ATAC-seq data.

Last update: 12/10/2018
Maintained by: Josh Chiou

rare_variants

Rare variant testing in disrupted footprints.

Last update: 06/26/2017
Maintained by: Josh Chiou

reweight_variants

Re-weighting variants after applying epigenetic and eQTL priors.

Last update: 07/02/2017
Maintained by: Josh Chiou

ATAC-seq_footprinting

Footprinting in ATAC-seq peaks.

Last update: 06/11/2017
Maintained by: Josh Chiou

ChIP-seq

Process ChIP-seq data.

Last update: 07/21/2017
Maintained by: Josh Chiou

variant_annotation_matrix

Binary variant-by-annotation matrices.

Last update: 06/27/2017
Maintained by: Anthony Aylward

fgwas_workflow

Implementation of the workflow suggested by the FGWAS manual.

Last update: 08/10/2017
Maintained by: Anthony Aylward

ChIP-seq_imbalance

Allelic imbalance analysis and workflow for ChIP-seq data.

Last update: 09/07/2018
Maintained by: Anthony Aylward

infer_footprints

Infer TF binding footprints from DNase-seq data.

Last update: 09/24/2017
Maintained by: Mei-Lin Okino, Anthony Aylward

pipelines's People

Contributors

joshchiou, kjgaulton

pipelines's Issues

Fastq files

Hi, I was wondering how I can access the fastq files for the snATAC-seq lung data. The GEO pages for the samples say that raw data are available in SRA, but I couldn't find any SRA accession ID. I also checked https://www.lungepigenome.org/, but only the matrix, barcodes, and regions files are available there.

Thank you in advance.

Unable to generate islet cell snATAC count matrix

Hello! I am trying to generate the count matrix for the islet cell snATAC data by running snATAC_pipeline.py with fastq files pulled from GEO.

Unfortunately I am unable to run the whole pipeline and keep getting an empty file for XXX.filt.md.bam. It seemed like an error with picard, so I manually ran the following command and received an error:

java -Xmx24G -jar picard.jar MarkDuplicates INPUT=SRR12957014.compiled.filt.bam OUTPUT=SRR12957014.filt.md.bam VALIDATION_STRINGENCY=LENIENT BARCODE_TAG=BX METRICS_FILE=SRR12957014.MarkDuplicates.log REMOVE_DUPLICATES=false

[Tue Feb 01 09:45:07 EST 2022] Executing as gaov@lx14 on Linux 3.10.0-957.12.2.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_31-b13; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.26.10
INFO 2022-02-01 09:45:08 MarkDuplicates Start of doWork freeMemory: 2036051352; totalMemory: 2058354688; maxMemory: 22906667008
INFO 2022-02-01 09:45:08 MarkDuplicates Reading input file and constructing read end information.
INFO 2022-02-01 09:45:08 MarkDuplicates Will retain up to 70699589 data points before spilling to disk.
WARNING 2022-02-01 09:45:08 AbstractOpticalDuplicateFinderCommandLineProgram A field field parsed out of a read name was expected to contain an integer and did not. Read name: SRR12957014.1.13818841_. Cause: String 'SRR12957014.1.13818841_' did not start with a parsable number.
[Tue Feb 01 09:45:08 EST 2022] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=2058354688
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" picard.PicardException: UMI found with illegal characters. UMIs must match the regular expression ^[ATCGNatcgn-]*$.
at picard.sam.markduplicates.UmiUtil.getTopStrandNormalizedUmi(UmiUtil.java:73)
at picard.sam.markduplicates.MarkDuplicates.buildReadEnds(MarkDuplicates.java:679)
at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:551)
at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:258)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

I then checked the top 5 lines in XXX.compiled.bam file and got the following:

SRR12957015.1.50091124_ 99 chr1 10064 37 50M = 10307 288 CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAACCC D?DDDIHIIIIIHIHCHHGHIHCHHIIIIHHHECCEHHI1<1<FH0D0<D NM:i:0 MD:Z:50 AS:i:50 XS:i:50 XA:Z:chr4,-191043979,50M,0;chr10,-135524462,50M,0;chr7,+10197,48M2S,0;chr11,+175738,50M,1;chr12,-95475,5S45M,0; MQ:i:37 MC:Z:5S45M BX:Z:SRR12957015.1.50091124
SRR12957015.1.50024188_ 163 chr1 10100 34 50M = 10270 221 CCCTAACCCAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC DDDDDHIHHIIFIIHIIIIIIHIGHHIEHFHH1<FCFH@GF?HH1C<1DH NM:i:0 MD:Z:50 AS:i:50 XS:i:50 MQ:i:34 MC:Z:30M1D20M BX:Z:SRR12957015.1.50024188
SRR12957015.1.17592450_ 147 chr1 10153 40 36M1D14M = 10166 -38 ACCCTAACCCTAACCCTAACCCTAACCTAACCTTAACCTAACCTTAACCC CIHECCEIHIIIHHHHHEDIIIHIIHHHCHIIHHIHIHHHHHHHFDDDDD NM:i:3 MD:Z:32C3^C7C6 AS:i:33 XS:i:33 XA:Z:chrUn_gl000227,-73922,36M1D14M,3;chr10,-47667359,29M21S,0;chr17,+81195004,23S27M,0;chr20,+62918614,12M1D38M,5; MQ:i:53 MC:Z:50M BX:Z:SRR12957015.1.17592450
SRR12957015.1.17592450_ 99 chr1 10166 53 50M = 10153 38 CCCTAACCCTAACCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC DDDDCHIIIIIGIIIHHIIIIIIHHHIIIIIIIIIIIGHIIII1<FHH<1 NM:i:0 MD:Z:50 AS:i:50 XS:i:45 XA:Z:chr17,-81194997,50M,1;chr20,-62918608,6S44M,0; MQ:i:40 MC:Z:36M1D14M BX:Z:SRR12957015.1.17592450
SRR12957015.1.16969529_ 99 chr1 10251 35 50M = 10296 95 CCCTAAACCCTAACCCTAACCCTAACCCTAAACCCAACCCTAACCCTAAC DDDDDIIIIIIIIIIIIHIIIIIHIIIIIIIIGIHHIHHIHIHHHIHHIH NM:i:4 MD:Z:31C2T5C5C3 AS:i:31 XS:i:40 MQ:i:35 MC:Z:50M BX:Z:SRR12957015.1.16969529

The read names seem to be off. Could you please let me know how to resolve this?
If possible, could you please share the processed count matrix of the 3 patients?

Thank you so much in advance!

Vianne
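
A minimal sketch for checking the barcode tags before running MarkDuplicates, assuming the records are available as tab-delimited SAM text (for example from `samtools view`). The pattern is copied from the Picard exception above; the function names `bx_tag` and `invalid_barcodes` are hypothetical, and the example records are abbreviated (SEQ and QUAL replaced with `*`, and the second record is made up for contrast).

```python
import re

# Pattern copied from the Picard error message in this issue.
UMI_PATTERN = re.compile(r"^[ATCGNatcgn-]*$")


def bx_tag(sam_line):
    """Return the BX:Z tag value of one SAM record, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    for field in fields[11:]:  # optional tags start at column 12
        if field.startswith("BX:Z:"):
            return field[len("BX:Z:"):]
    return None


def invalid_barcodes(sam_lines):
    """Yield (read_name, barcode) for records whose BX tag fails the pattern."""
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        barcode = bx_tag(line)
        if barcode is not None and not UMI_PATTERN.match(barcode):
            yield line.split("\t", 1)[0], barcode


# Abbreviated record based on the output above: BX holds a read name.
bad_record = "\t".join([
    "SRR12957015.1.50024188_", "163", "chr1", "10100", "34", "50M", "=",
    "10270", "221", "*", "*", "NM:i:0", "BX:Z:SRR12957015.1.50024188",
])
# Hypothetical record whose BX tag is a nucleotide barcode.
good_record = "\t".join([
    "ACGTACGT_read1", "99", "chr1", "10064", "37", "50M", "=",
    "10307", "288", "*", "*", "NM:i:0", "BX:Z:ACGTACGT",
])

flagged = list(invalid_barcodes([bad_record, good_record]))
print(flagged)
```

In the records shown above, every BX value is an SRA read name rather than a nucleotide sequence, which would be consistent with the "illegal characters" exception.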

Wasp issues

Hi,

Thanks for sharing the ChIP-Seq allelic imbalance scripts.

I am trying to use your pipeline, and I am having trouble getting WASP running.

Did you install WASP in your environment? If so, could you give me a tip on how you did that?
Also, would you have a requirements.txt file to set up the environment?

Thanks

Generating sparseMatrix!!!

Hi, I am using your [lung_snATAC_pipeline.py] to process some sciATAC-Seq data, and I am facing an issue towards the end of the pipeline while generating the sparseMatrix.
When I convert the barcodes and regions with as.numeric to build the sparseMatrix, they are converted to NA and I get the error below.

"Error in sparseMatrix(i = as.numeric(V1), j = as.numeric(V2), x = V3, :
NA's in (i,j) are not allowed"

Was wondering if you have faced this issue?

-Pushpinder
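
One plausible cause, assuming the V1 and V2 columns hold barcode and region strings rather than numeric indices, is that a numeric conversion cannot parse them and returns NA. A minimal Python sketch of the usual workaround maps each distinct label to an integer index before building the matrix. The helper names `index_labels` and `to_triplets` and the example records are hypothetical, and this is not taken from the pipeline itself.

```python
def index_labels(labels):
    """Map each distinct label to a 1-based integer index, in first-seen order."""
    index = {}
    for label in labels:
        if label not in index:
            index[label] = len(index) + 1
    return index


def to_triplets(records):
    """Convert (barcode, region, count) string records to integer (i, j, x) triplets."""
    records = list(records)
    barcode_index = index_labels(r[0] for r in records)
    region_index = index_labels(r[1] for r in records)
    triplets = [
        (barcode_index[barcode], region_index[region], int(count))
        for barcode, region, count in records
    ]
    return triplets, barcode_index, region_index


# Hypothetical long-format records: barcode, region, count.
records = [
    ("AAACGT", "chr1:100-200", "2"),
    ("AAACGT", "chr1:500-900", "1"),
    ("TTGCAA", "chr1:100-200", "3"),
]

triplets, barcode_index, region_index = to_triplets(records)
print(triplets)
```

The indices are 1-based because R's sparseMatrix expects 1-based i and j by default; the two dictionaries keep the mapping back to the original barcode and region labels.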
