The pipelines from kjgaulton

Fastq files

Hi, I was wondering how I can access the fastq files for the snATAC-seq lung data. The GEO pages for the samples say that the raw data are available in SRA, but I couldn't find any SRA accession ID. I also checked the website https://www.lungepigenome.org/, where only the matrix, barcodes, and regions files are available.

Thank you in advance.

Unable to generate islet cell snATAC count matrix

Hello! I am trying to generate the count matrix for the islet cell snATAC data by running snATAC_pipeline.py with fastq files pulled from GEO.

Unfortunately, I cannot get the whole pipeline to run and keep getting an empty XXX.filt.md.bam file. It looked like an error from Picard, so I ran the following command manually and received an error:

java -Xmx24G -jar picard.jar MarkDuplicates INPUT=SRR12957014.compiled.filt.bam OUTPUT=SRR12957014.filt.md.bam VALIDATION_STRINGENCY=LENIENT BARCODE_TAG=BX METRICS_FILE=SRR12957014.MarkDuplicates.log REMOVE_DUPLICATES=false

[Tue Feb 01 09:45:07 EST 2022] Executing as gaov@lx14 on Linux 3.10.0-957.12.2.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_31-b13; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.26.10
INFO 2022-02-01 09:45:08 MarkDuplicates Start of doWork freeMemory: 2036051352; totalMemory: 2058354688; maxMemory: 22906667008
INFO 2022-02-01 09:45:08 MarkDuplicates Reading input file and constructing read end information.
INFO 2022-02-01 09:45:08 MarkDuplicates Will retain up to 70699589 data points before spilling to disk.
WARNING 2022-02-01 09:45:08 AbstractOpticalDuplicateFinderCommandLineProgram A field field parsed out of a read name was expected to contain an integer and did not. Read name: SRR12957014.1.13818841_. Cause: String 'SRR12957014.1.13818841_' did not start with a parsable number.
[Tue Feb 01 09:45:08 EST 2022] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=2058354688
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" picard.PicardException: UMI found with illegal characters. UMIs must match the regular expression ^[ATCGNatcgn-]*$.
at picard.sam.markduplicates.UmiUtil.getTopStrandNormalizedUmi(UmiUtil.java:73)
at picard.sam.markduplicates.MarkDuplicates.buildReadEnds(MarkDuplicates.java:679)
at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:551)
at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:258)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

I then checked the first 5 lines of the XXX.compiled.bam file and got the following:

SRR12957015.1.50091124_ 99 chr1 10064 37 50M = 10307 288 CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAACCC D?DDDIHIIIIIHIHCHHGHIHCHHIIIIHHHECCEHHI1<1<FH0D0<D NM:i:0 MD:Z:50 AS:i:50 XS:i:50 XA:Z:chr4,-191043979,50M,0;chr10,-135524462,50M,0;chr7,+10197,48M2S,0;chr11,+175738,50M,1;chr12,-95475,5S45M,0; MQ:i:37 MC:Z:5S45M BX:Z:SRR12957015.1.50091124
SRR12957015.1.50024188_ 163 chr1 10100 34 50M = 10270 221 CCCTAACCCAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC DDDDDHIHHIIFIIHIIIIIIHIGHHIEHFHH1<FCFH@GF?HH1C<1DH NM:i:0 MD:Z:50 AS:i:50 XS:i:50 MQ:i:34 MC:Z:30M1D20M BX:Z:SRR12957015.1.50024188
SRR12957015.1.17592450_ 147 chr1 10153 40 36M1D14M = 10166 -38 ACCCTAACCCTAACCCTAACCCTAACCTAACCTTAACCTAACCTTAACCC CIHECCEIHIIIHHHHHEDIIIHIIHHHCHIIHHIHIHHHHHHHFDDDDD NM:i:3 MD:Z:32C3^C7C6 AS:i:33 XS:i:33 XA:Z:chrUn_gl000227,-73922,36M1D14M,3;chr10,-47667359,29M21S,0;chr17,+81195004,23S27M,0;chr20,+62918614,12M1D38M,5; MQ:i:53 MC:Z:50M BX:Z:SRR12957015.1.17592450
SRR12957015.1.17592450_ 99 chr1 10166 53 50M = 10153 38 CCCTAACCCTAACCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC DDDDCHIIIIIGIIIHHIIIIIIHHHIIIIIIIIIIIGHIIII1<FHH<1 NM:i:0 MD:Z:50 AS:i:50 XS:i:45 XA:Z:chr17,-81194997,50M,1;chr20,-62918608,6S44M,0; MQ:i:40 MC:Z:36M1D14M BX:Z:SRR12957015.1.17592450
SRR12957015.1.16969529_ 99 chr1 10251 35 50M = 10296 95 CCCTAAACCCTAACCCTAACCCTAACCCTAAACCCAACCCTAACCCTAAC DDDDDIIIIIIIIIIIIHIIIIIHIIIIIIIIGIHHIHHIHIHHHIHHIH NM:i:4 MD:Z:31C2T5C5C3 AS:i:31 XS:i:40 MQ:i:35 MC:Z:50M BX:Z:SRR12957015.1.16969529

The read names seem to be off. Could you please let me know how to resolve this?
If possible, could you please share the processed count matrix for the 3 patients?

Thank you so much in advance!

Vianne
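
A note on the error above: the exception says the value must match ^[ATCGNatcgn-]*$, while the BX tags in the BAM excerpt hold read-name-like strings such as SRR12957015.1.50091124. Since the command passes BARCODE_TAG=BX, this suggests (but does not prove) that the BX values are what Picard rejects. Below is a minimal sketch for checking this, assuming the records are available as SAM text lines (for example from `samtools view`). The function name and the second sample record are hypothetical and not part of the pipeline.

```python
import re

# Pattern quoted in the Picard exception message above.
UMI_PATTERN = re.compile(r"^[ATCGNatcgn-]*$")


def find_invalid_tags(sam_lines, tag="BX"):
    """Return (read_name, tag_value) pairs whose tag value fails the pattern."""
    prefix = tag + ":Z:"
    invalid = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        # Optional tag fields start at the 12th column of a SAM record.
        for field in fields[11:]:
            if field.startswith(prefix):
                value = field[len(prefix):]
                if not UMI_PATTERN.match(value):
                    invalid.append((fields[0], value))
    return invalid


# Illustrative records: the first mimics the BAM excerpt above (shortened),
# the second carries a hypothetical nucleotide barcode for comparison.
records = [
    "SRR12957015.1.50091124_\t99\tchr1\t10064\t37\t4M\t=\t10307\t288"
    "\tCCCT\tD?DD\tNM:i:0\tBX:Z:SRR12957015.1.50091124",
    "read2\t99\tchr1\t10100\t34\t4M\t=\t10270\t221"
    "\tCCCT\tDDDD\tNM:i:0\tBX:Z:ACGTACGTACGT",
]

for name, value in find_invalid_tags(records):
    print(name, value)
```

If every record is reported, the barcode was probably never written into the read name or tag upstream, so the fastq files themselves would be the next thing to inspect.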

Wasp issues

Hi,

Thanks for sharing the ChIP-Seq allelic imbalance scripts.

I am trying to use your pipeline, and I am having trouble getting WASP to run.

Did you install WASP in your environment? If so, could you give me a tip on how you did that?
Also, would you have a requirements.txt file to set up the environment?

Thanks

Visualization tools

Hi, I was wondering how Figure 2a (https://doi.org/10.1038/s41586-021-03552-w) was generated. The annotation layer looks really nice. Thank you.

Generating sparseMatrix!!!

Hi, I am using your [lung_snATAC_pipeline.py] to process some sciATAC-seq data, and I am running into an issue towards the end of the pipeline, while generating the sparseMatrix.
When the barcode and region columns are converted with as.numeric to build the sparseMatrix, they become NA and I get the error below.

"Error in sparseMatrix(i = as.numeric(V1), j = as.numeric(V2), x = V3, :
NA's in (i,j) are not allowed"

I was wondering whether you have faced this issue.

-Pushpinder
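
On the error above: in R, as.numeric() returns NA for strings that are not numbers, so row or column fields that hold barcode or region names instead of integer indices would produce this message. Below is a minimal sketch for finding such rows, assuming a whitespace-delimited, three-column input (row index, column index, count) as implied by V1, V2 and V3 in the error. The function name and the sample lines are hypothetical.

```python
def find_non_numeric_rows(lines):
    """Return (line_number, text) for rows whose first two fields are not integers.

    Anything that is not a plain integer index is treated as suspect here;
    this is stricter than R's as.numeric(), which also accepts decimals.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        fields = text.split()
        if len(fields) < 3:
            bad.append((number, text))  # too few columns
            continue
        try:
            int(fields[0])
            int(fields[1])
        except ValueError:
            bad.append((number, text))  # index column holds a non-integer string
    return bad


# Illustrative input: the second row holds names instead of indices.
sample = [
    "1 1 2",
    "AAACGT chr1:100-200 1",
    "3 2 5",
]

for number, text in find_non_numeric_rows(sample):
    print(number, text)
```

Rows flagged this way would need to be mapped to integer positions (for example, the position of each barcode and region in its own list) before building the matrix.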

pipelines's Introduction

analytical tools

bulk_ATAC-seq

snATAC-seq

rare_variants

reweight_variants

ATAC-seq_footprinting

ChIP-seq

variant_annotation_matrix

fgwas_workflow

ChIP-seq_imbalance

infer_footprints
