y9c / pseudou-bidseq

🧪 New pipeline for detecting pseudouridine modification on RNA (BID-seq, etc)

Home Page: https://bidseq.chuan.science/

License: GNU General Public License v3.0

Dockerfile 3.20% Shell 8.91% Python 87.89%

rna pseudouridine modification single-base mrna rrna epitranscriptome epitranscriptomics

pseudou-bidseq's Introduction

Ψ-BID-seq

Overview of the workflow

How to use it?

A docker image containing the source code and dependencies has been published for reproducibility. You can run it using the apptainer container runtime.

The entire analysis can be completed in just three steps:

Specify the paths of the references (.fasta) and samples (.fastq) in a configuration file (.yaml).

data.yaml for example:

reference:
  contamination:
    fa: ./ref/contamination.fa
  genes:
    fa: ./ref/genes.fa
  genome:
    fa: /data/reference/genome/Mus_musculus/GRCm39.fa
    star: /data/reference/genome/Mus_musculus/star/GRCm39.release108

samples:
  mESCWT-rep1-input:
    data:
      - R1: ./test/IP16.fastq.gz
    group: mESCWT
    treated: false
  mESCWT-rep1-treated:
    data:
      - R1: ./test/IP4.fastq.gz
    group: mESCWT
    treated: true
  mESCWT-rep2-treated:
    data:
      - R1: ./test/IP5.fastq.gz
    group: mESCWT
    treated: true

You can copy and edit from this template.

Read the documentation on how to customize.
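
Before launching the pipeline, it can help to check that an edited configuration still follows the template's layout. The sketch below is not part of the pipeline: `check_config` is a hypothetical helper. It assumes the file has already been parsed into a dictionary (for example with a YAML library), that all three reference entries from the template are expected, and that every sample needs the `data`, `group`, and `treated` keys shown above.

```python
def check_config(cfg):
    """Return a list of problems found in a parsed data.yaml (empty if none)."""
    problems = []

    # Assumption: all three reference entries from the template are expected.
    references = cfg.get("reference") or {}
    for name in ("contamination", "genes", "genome"):
        if "fa" not in (references.get(name) or {}):
            problems.append(f"reference.{name}.fa is missing")

    samples = cfg.get("samples") or {}
    if not samples:
        problems.append("no samples defined")
    for sample, entry in samples.items():
        # Assumption: these three keys are required for every sample.
        for key in ("data", "group", "treated"):
            if key not in entry:
                problems.append(f"samples.{sample}.{key} is missing")
        if "treated" in entry and not isinstance(entry["treated"], bool):
            problems.append(f"samples.{sample}.treated must be true or false")
        for index, run in enumerate(entry.get("data") or []):
            if "R1" not in run:
                problems.append(f"samples.{sample}.data[{index}] has no R1")
    return problems


# A trimmed copy of the template above, written as a Python dictionary.
example = {
    "reference": {
        "contamination": {"fa": "./ref/contamination.fa"},
        "genes": {"fa": "./ref/genes.fa"},
        "genome": {"fa": "/data/reference/genome/Mus_musculus/GRCm39.fa"},
    },
    "samples": {
        "mESCWT-rep1-input": {
            "data": [{"R1": "./test/IP16.fastq.gz"}],
            "group": "mESCWT",
            "treated": False,
        },
        "mESCWT-rep1-treated": {
            "data": [{"R1": "./test/IP4.fastq.gz"}],
            "group": "mESCWT",
            "treated": True,
        },
    },
}

for problem in check_config(example):
    print(problem)
```

An empty result only means the layout matches the template; it does not confirm that the listed files exist.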

Run the entire analysis with one command:
```
apptainer run docker://y9ch/bidseq
```
The pipeline loads the configuration file named `data.yaml` from the current directory.
- Use the -c argument to specify a different configuration file. (default: data.yaml)
- Use the -j argument to set the number of jobs/cores to run in parallel. (default: 48)
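
When the pipeline is launched from a script, the two options above can be assembled programmatically. This is a minimal sketch, not the project's own tooling: `build_command` is a hypothetical helper, `mouse.yaml` is a made-up file name, and it assumes that arguments placed after the image name are forwarded to the container's run script.

```python
import shlex


def build_command(config="data.yaml", jobs=48, image="docker://y9ch/bidseq"):
    """Build the apptainer command line with the -c and -j options."""
    return ["apptainer", "run", image, "-c", str(config), "-j", str(jobs)]


# Print a shell-safe version of the command instead of running it.
print(shlex.join(build_command(config="mouse.yaml", jobs=8)))
```

The returned list can be passed to `subprocess.run` directly, which avoids quoting problems with paths that contain spaces.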
View the analysis reports and filtered sites.
Three folders will be created in the working directory (default: `workspace`).
├── align_bam
├── report_reads
└── filter_sites
- Trimming, mapping, and deduplication reports are in the report_reads folder, with the key numbers from every step summarized on one webpage (example).
- Filtered sites for Ψ detection are in the filter_sites folder. These sites have passed only the simplest filtering; you can apply custom thresholds based on your data type and quality.
- Processed mapping results (.bam) are in the align_bam folder. You can inspect any location of interest in IGV.

Documentation

Citation

cite this software

@misc{y_y9cpseudou-bidseq_2022,
  title = {y9c/{pseudoU}-{BIDseq}: v1.0},
  url = {https://zenodo.org/record/8158036},
  urldate = {2023-07-18},
  publisher = {Zenodo},
  author = {Ye, Chang},
  month = dec,
  year = {2022},
  doi = {10.5281/zenodo.8158036},
}

cite the protocol

@article{dai2023quantitative,
title={Quantitative sequencing using BID-seq uncovers abundant pseudouridines in mammalian mRNA at base resolution},
author={Dai, Qing and Zhang, Li-Sheng and Sun, Hui-Lung and Pajdzik, Kinga and Yang, Lei and Ye, Chang and Ju, Cheng-Wei and Liu, Shun and Wang, Yuru and Zheng, Zhong and others},
journal={Nature Biotechnology},
volume={41},
number={3},
pages={344--354},
year={2023},
publisher={Nature Publishing Group US New York}
}

cite the method

@article{dai_quantitative_2022,
  title = {Quantitative sequencing using {BID}-seq uncovers abundant pseudouridines in mammalian {mRNA} at base resolution},
  issn = {1087-0156},
  doi = {10.1038/s41587-022-01505-w},
  journal = {Nature Biotechnology},
  author = {Dai, Qing and Zhang, Li-Sheng and Sun, Hui-Lung and Pajdzik, Kinga and Yang, Lei and Ye, Chang and Ju, Cheng-Wei and Liu, Shun and Wang, Yuru and Zheng, Zhong and Zhang, Linda and Harada, Bryan T. and Dou, Xiaoyang and Irkliyenko, Iryna and Feng, Xinran and Zhang, Wen and Pan, Tao and He, Chuan},
  year = {2022},
  pages = {1--11},
}


pseudou-bidseq's Issues

Low mapping ratio and lost reads after realigngap and filter

Hi y9c!
I am trying your pseudoU-BIDseq pipeline. I find your workflow very efficient and your code excellent. Nice work!
I have a question. When I ran the pipeline on the mRNA samples from "Quantitative sequencing using BID-seq uncovers abundant pseudouridines in mammalian mRNA at base resolution", the genome mapping ratio was very low (~30% unique mapping and ~6% multimapping), and many reads were lost after realignGap, samtools calmd, and samtools view -e '[NM]<=5 && [NM]/(qlen-sclen)<=0.1'. On average, about 40% of reads were lost.
Is this expected, and what can I do?
Thank you!
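
To see which reads the quoted `samtools view -e` expression removes, the same condition can be written as a small predicate. This is an illustrative sketch, not code from the pipeline: `passes_nm_filter` is a hypothetical name, and it assumes `qlen` is the query length including soft-clipped bases and `sclen` is the number of soft-clipped bases, so that `qlen - sclen` is the aligned part of the read.

```python
def passes_nm_filter(nm, qlen, sclen, max_nm=5, max_rate=0.1):
    """Mirror '[NM]<=5 && [NM]/(qlen-sclen)<=0.1' for a single read."""
    aligned = qlen - sclen
    if aligned <= 0:
        # Nothing aligned; treat the read as failing (an assumption).
        return False
    return nm <= max_nm and nm / aligned <= max_rate


# A 50-base read with 3 edits passes when fully aligned...
print(passes_nm_filter(nm=3, qlen=50, sclen=0))
# ...but fails when half of it is soft-clipped (3 / 25 = 0.12).
print(passes_nm_filter(nm=3, qlen=50, sclen=25))
```

Because NM counts inserted and deleted bases as well as mismatches, short or heavily soft-clipped reads reach the 10% limit with only a few edits, which may account for part of the loss described above.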

header of result

Hi @y9c , thanks very much!
I have got the final result, but I don't know what the depth, gap, WT_ratio, and WT_fraction columns mean. Could you give me a detailed explanation?
genome.zip

Detailed parameters of STAR

I'm sorry to bother you.
Can you provide your detailed parameters of STAR?
I'm also still confused about how to set the barcode parameter. I have already removed the UMI sequences and barcodes. Do I still need to write "barcode: '-NNNNN'" in data.yaml?
Looking forward to your reply! I will really really really appreciate you!!
Best wishes

rcFastq

Hello, I ran your Docker image with:
apptainer run -B /workplace bidseq_latest.sif

It produced this error: /pipeline/bin/rcFastq: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /pipeline/bin/rcFastq)

Could you update glibc in the image?

barcode setting

Hi y9c!
I have a question. I use the R2 reads with your pipeline. They have an 8 nt UMI and a 6 nt random sequence at the 5' end, and I have already trimmed the 3' end adapter. How should I set the barcode? Would "NNNNNNNNXXXXXX-" work? Looking forward to your reply, thanks!

Filtering question

Hello, I recently used your BIDseq workflow and analysis pipeline - it is very straightforward to use, thank you for making it easily accessible!

I had one question about the filtering aspect of the pipeline. From what I can tell, the filter_sites output contains sites that would be filtered out by the filter parameters published in Dai et al. 2022: (1) deletion rate above 5% (with deletion count above five in BID-seq libraries); (2) deletion rate below 1% in ‘Input’ libraries; (3) total reads coverage depth above 20 in both BID-seq and ‘Input’ libraries; (4) deletion rate above 1.5-fold over background in any given sequence motif (defined as the deletion rates detected from RNA probes containing 0% Ψ). Would you recommend using those parameters to further filter the output sites?

Thanks!
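
The four criteria quoted above can be sketched as a single check per site. This is only an illustration of those thresholds, not the pipeline's filter and not a recommendation from the author. The `Site` fields are hypothetical names rather than the columns of the filter_sites output, and "1.5-fold over background" is read here as the treated deletion rate exceeding 1.5 times the motif background rate.

```python
from dataclasses import dataclass


@dataclass
class Site:
    # Hypothetical field names, not the pipeline's output columns.
    treated_depth: int
    treated_deletions: int
    input_depth: int
    input_deletions: int
    background_rate: float  # deletion rate of the motif on 0% Ψ probes


def passes_published_filters(site):
    """Apply the four criteria quoted from Dai et al. to one site."""
    # (3) coverage depth above 20 in both libraries (also avoids dividing by 0)
    if site.treated_depth <= 20 or site.input_depth <= 20:
        return False
    treated_rate = site.treated_deletions / site.treated_depth
    input_rate = site.input_deletions / site.input_depth
    return (
        treated_rate > 0.05                 # (1) deletion rate above 5%
        and site.treated_deletions > 5      # (1) deletion count above five
        and input_rate < 0.01               # (2) below 1% in the input library
        and treated_rate > 1.5 * site.background_rate  # (4) above background
    )


print(passes_published_filters(Site(100, 12, 80, 0, 0.02)))
print(passes_published_filters(Site(100, 12, 80, 2, 0.02)))
```

The second site fails only because its input deletion rate (2 / 80 = 2.5%) is above the 1% limit.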

Error in rule gap_realign

Hi y9c!
I am trying your pseudoU-BIDseq pipeline using SRR15082607, SRR15082609, and SRR15082610.
I get the following error when I run the pipeline:

[Wed Jan 10 14:11:17 2024]
Error in rule gap_realign:
    jobid: 25
    output: .tmp/mapping_realigned_unsorted/HeLaWT-rep1-treated_run1_contamination.cram
    shell:
        
        /pipeline/bin/realignGap -r /disk/user_18/jky_project/genome/Mesomycoplasma_hyorhinis_ATCC_17981/Mesomycoplasma_hyorhinis_ATCC_17981.fna -i .tmp/mapping_unsort/HeLaWT-rep1-treated_run1_contamination.bam -o .tmp/mapping_realigned_unsorted/HeLaWT-rep1-treated_run1_contamination.cram
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job gap_realign since they might be corrupted:
.tmp/mapping_realigned_unsorted/HeLaWT-rep1-treated_run1_contamination.cram

I wonder what I can do. Thank you so much!

Duplicated reads level is high

Hi, @y9c !
I have a question. I used the BID-pipe on my data, and the duplication level is 40%. I also used 'seqkit rmdup' on the sequences to calculate the duplication level, and it is only 20%. How does the BID-pipe calculate the duplication level, and how does that differ from seqkit rmdup? Looking forward to your reply. Thanks a lot!

automatic program termination

Hi y9c!

The program terminated on its own twice. I ran the command nohup apptainer run docker://y9ch/bidseq & in the terminal and kept the terminal open the whole time. The two log files are attached:
2024-01-11T111942.665433.snakemake.log
2024-01-12T003946.169422.snakemake.log

I wonder what I can do. Thank you so much!

samFilter: /lib64/libc.so.6: version `GLIBC_2.28' not found

Hi, may I ask what the dependencies of samFilter are? The following errors appear when I execute the samFilter file in the bin directory:
/lib64/libc.so.6: version `GLIBC_2.28' not found (required by /users/ludwig/ebu571/ebu571/BIDseq/pseudoU-BIDseq-main/bin/samFilter)
/lib64/libc.so.6: version `GLIBC_2.18' not found (required by /users/ludwig/ebu571/ebu571/BIDseq/pseudoU-BIDseq-main/bin/samFilter)
/lib64/libc.so.6: version `GLIBC_2.25' not found (required by /users/ludwig/ebu571/ebu571/BIDseq/pseudoU-BIDseq-main/bin/samFilter)
Thanks.
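
Errors like these mean that the binary was linked against a newer glibc than the system provides. A quick way to compare the two is sketched below. The helper names are hypothetical, and the required version (2.28) is simply the highest one from the messages above.

```python
import platform


def parse_version(text):
    """Turn a version string such as '2.28' into a comparable tuple (2, 28)."""
    return tuple(int(part) for part in text.split("."))


def glibc_is_sufficient(found, required):
    """Return True if the found glibc version is at least the required one."""
    return parse_version(found) >= parse_version(required)


# platform.libc_ver() reports the C library the Python executable is linked
# against; it returns empty strings when that cannot be determined.
library, version = platform.libc_ver()
if library == "glibc" and version:
    print(version, "sufficient:", glibc_is_sufficient(version, "2.28"))
else:
    print("could not detect a glibc version on this system")
```

Comparing tuples rather than strings matters here: as text, "2.9" would sort after "2.28", although 2.9 is the older release.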
