y9c / pseudou-bidseq

🧪 New pipeline for detecting pseudouridine modification on RNA (BID-seq, etc)

Home Page: https://bidseq.chuan.science/

License: GNU General Public License v3.0

Dockerfile 3.20% Shell 8.91% Python 87.89%
rna pseudouridine modification single-base mrna rrna epitranscriptome epitranscriptomics

pseudou-bidseq's Introduction

Ψ-BID-seq

Overview of the workflow

How to use it?

A docker image containing the source code and dependencies has been published for reproducibility. You can run it using the apptainer container runtime.

The entire analysis can be completed in just three steps:

  1. Specify the paths of the references (.fasta) and samples (.fastq) in a configuration file (.yaml).

    Example data.yaml:
    reference:
      contamination:
        fa: ./ref/contamination.fa
      genes:
        fa: ./ref/genes.fa
      genome:
        fa: /data/reference/genome/Mus_musculus/GRCm39.fa
        star: /data/reference/genome/Mus_musculus/star/GRCm39.release108
    
    samples:
      mESCWT-rep1-input:
        data:
          - R1: ./test/IP16.fastq.gz
        group: mESCWT
        treated: false
      mESCWT-rep1-treated:
        data:
          - R1: ./test/IP4.fastq.gz
        group: mESCWT
        treated: true
      mESCWT-rep2-treated:
        data:
          - R1: ./test/IP5.fastq.gz
        group: mESCWT
        treated: true

    You can copy and edit from this template.

    Read the documentation on how to customize.

  2. Run the entire analysis with one command:

    apptainer run docker://y9ch/bidseq

    By default, the pipeline loads the configuration file named `data.yaml` in the current directory.
    • Use the -c argument to specify a different configuration file (default: data.yaml).
    • Use the -j argument to set the number of jobs/cores to run in parallel (default: 48).

  3. View the analytics reports and filtered sites.

    Three folders will be created in the working directory (default: `workspace`):

    ├── align_bam
    ├── report_reads
    └── filter_sites

    • Trimming, mapping, and deduplication reports are in the report_reads folder, with key numbers from all steps summarized on one webpage (example).
    • Filtered sites for Ψ detection are in the filter_sites folder. These sites have passed only the simplest filtering; you can apply customized thresholds to them based on your data type and quality.
    • Processed mapping results (.bam) are in the align_bam folder. You can inspect any location of interest in IGV.
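
Before launching a long run, it can help to sanity-check the configuration and assemble the command programmatically. The following is a minimal sketch, not part of the pipeline: `check_samples` and `build_command` are hypothetical helper names, the config is written as a Python dict mirroring the example data.yaml (in practice you would load the file with a YAML parser), and the rule that every group needs both a treated and an input library is an assumption. The `-c` and `-j` arguments are the ones described in step 2.

```python
# Hypothetical in-memory mirror of the "samples" section of the example data.yaml.
CONFIG = {
    "samples": {
        "mESCWT-rep1-input": {
            "data": [{"R1": "./test/IP16.fastq.gz"}], "group": "mESCWT", "treated": False},
        "mESCWT-rep1-treated": {
            "data": [{"R1": "./test/IP4.fastq.gz"}], "group": "mESCWT", "treated": True},
        "mESCWT-rep2-treated": {
            "data": [{"R1": "./test/IP5.fastq.gz"}], "group": "mESCWT", "treated": True},
    },
}


def check_samples(config):
    """Return a list of human-readable problems found in the samples section."""
    problems = []
    groups = {}  # group name -> set of `treated` flags seen in that group
    for name, sample in config.get("samples", {}).items():
        runs = sample.get("data") or []
        if not runs or any("R1" not in run for run in runs):
            problems.append(f"{name}: every entry under 'data' needs an R1 file")
        if not isinstance(sample.get("treated"), bool):
            problems.append(f"{name}: 'treated' should be true or false")
            continue
        groups.setdefault(sample.get("group"), set()).add(sample["treated"])
    # Assumption: a group is only useful with both an input and a treated library.
    for group, flags in groups.items():
        if flags != {True, False}:
            problems.append(f"group {group}: needs both treated and input samples")
    return problems


def build_command(config_file="data.yaml", jobs=48):
    """Assemble the apptainer command with the -c and -j arguments from step 2."""
    return ["apptainer", "run", "docker://y9ch/bidseq",
            "-c", config_file, "-j", str(jobs)]


if __name__ == "__main__":
    print(check_samples(CONFIG))
    print(" ".join(build_command("data.yaml", 8)))
```

The command is returned as a list so it can be passed directly to `subprocess.run` without shell quoting issues.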

Documentation

Read more

Citation

  • cite this software

    @misc{y_y9cpseudou-bidseq_2022,
      title = {y9c/{pseudoU}-{BIDseq}: v1.0},
      url = {https://zenodo.org/record/8158036},
      urldate = {2023-07-18},
      publisher = {Zenodo},
      author = {Ye, Chang},
      month = dec,
      year = {2022},
      doi = {10.5281/zenodo.8158036},
    }
  • cite the protocol

    @article{dai2023quantitative,
    title={Quantitative sequencing using BID-seq uncovers abundant pseudouridines in mammalian mRNA at base resolution},
    author={Dai, Qing and Zhang, Li-Sheng and Sun, Hui-Lung and Pajdzik, Kinga and Yang, Lei and Ye, Chang and Ju, Cheng-Wei and Liu, Shun and Wang, Yuru and Zheng, Zhong and others},
    journal={Nature Biotechnology},
    volume={41},
    number={3},
    pages={344--354},
    year={2023},
    publisher={Nature Publishing Group US New York}
    }
  • cite the method

    @article{dai_quantitative_2022,
      title = {Quantitative sequencing using {BID}-seq uncovers abundant pseudouridines in mammalian {mRNA} at base resolution},
      issn = {1087-0156},
      doi = {10.1038/s41587-022-01505-w},
      journal = {Nature Biotechnology},
      author = {Dai, Qing and Zhang, Li-Sheng and Sun, Hui-Lung and Pajdzik, Kinga and Yang, Lei and Ye, Chang and Ju, Cheng-Wei and Liu, Shun and Wang, Yuru and Zheng, Zhong and Zhang, Linda and Harada, Bryan T. and Dou, Xiaoyang and Irkliyenko, Iryna and Feng, Xinran and Zhang, Wen and Pan, Tao and He, Chuan},
      year = {2022},
      pages = {1--11},
    }

 

Copyright © 2021-present Chang Y

pseudou-bidseq's Issues

Low mapping ratio and lost reads after realigngap and filter

Hi y9c!
I am trying your pseudoU-BIDseq pipeline. Your workflow is very efficient and your code is excellent. Nice work!
I have a question. When I ran the pipeline on the mRNA samples from "Quantitative sequencing using BID-seq uncovers abundant pseudouridines in mammalian mRNA at base resolution", the genome mapping ratio was very low (~30% unique mapping and ~6% multimapping). I also lost many reads after realignGap, samtools calmd, and samtools view -e '[NM]<=5 && [NM]/(qlen-sclen)<=0.1'; on average about 40% of reads were lost.
Is this expected, and what can I do?
Thank you!
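
To see which reads the quoted `samtools view -e` expression keeps, the predicate can be written out explicitly. This is only an illustrative sketch: `passes_mismatch_filter` is a hypothetical name, the read values are made up, `qlen - sclen` is treated as the number of query bases that are not soft-clipped, and returning False for a read with no such bases is an assumption.

```python
def passes_mismatch_filter(nm, qlen, sclen, max_nm=5, max_rate=0.1):
    """Mirror of '[NM]<=5 && [NM]/(qlen-sclen)<=0.1' for a single read.

    nm    -- edit distance to the reference (the NM tag)
    qlen  -- query length of the alignment
    sclen -- number of soft-clipped bases
    """
    aligned = qlen - sclen
    if aligned <= 0:
        return False  # assumption: drop reads with no aligned bases
    return nm <= max_nm and nm / aligned <= max_rate


if __name__ == "__main__":
    # Made-up (NM, qlen, sclen) triples for illustration.
    reads = [(3, 50, 10), (5, 45, 5), (6, 100, 0), (0, 30, 0)]
    kept = [r for r in reads if passes_mismatch_filter(*r)]
    print(f"kept {len(kept)} of {len(reads)} reads")
```

Because the rate is computed over the non-clipped length, short or heavily soft-clipped reads fail the 10% limit with only a few mismatches or gaps.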

header of result

Hi @y9c , thanks very much!
I have got the final result, but I don't know what the columns depth, gap, WT_ratio, and WT_fraction mean. Could you give me a detailed explanation?
genome.zip

Detailed parameters of STAR

I'm sorry to bother you.
Can you provide your detailed parameters of STAR?
I'm also still confused about setting the barcode parameter. I have already removed the UMI sequences and barcodes, so do I still need to write "barcode: '-NNNNN'" in data.yaml?
Looking forward to your reply! I will really really really appreciate you!!
Best wishes

rcFastq

Hello, I ran your Docker image:
apptainer run -B /workplace bidseq_latest.sif

It produces this error: /pipeline/bin/rcFastq: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /pipeline/bin/rcFastq)

Can you update glibc?

barcode setting

Hi y9c!
I have a question. I am using the R2 reads with your pipeline. They have an 8 nt UMI and a 6 nt random sequence at the 5' end, and I have already trimmed the 3' adapter. How should I set the barcode? Would "NNNNNNNNXXXXXX-" work? Looking forward to your reply, thanks!

Filtering question

Hello, I recently used your BIDseq workflow and analysis pipeline - it is very straightforward to use, thank you for making it easily accessible!

I had one question about the filtering aspect of the pipeline. From what I can tell, filter_sites contains sites that would be filtered out by the filter parameters published in Dai et al. 2022: (1) deletion rate above 5% (with deletion count above five in BID-seq libraries); (2) deletion rate below 1% in ‘Input’ libraries; (3) total reads coverage depth above 20 in both BID-seq and ‘Input’ libraries; (4) deletion rate above 1.5-fold over background in any given sequence motif (defined as the deletion rates detected from RNA probes containing 0% Ψ). Would you recommend using those parameters to further filter the output sites?

Thanks!
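
The four criteria quoted in this question can be applied as a post-filter on the output sites. The sketch below is one possible reading of them, not the pipeline's own code: the function name and the field names (`treated_depth`, `treated_deletions`, `input_depth`, `input_deletions`, `motif_background_rate`) are hypothetical and would need to be mapped to the actual columns of the filter_sites files.

```python
def passes_published_filters(site, min_treated_rate=0.05, min_deletions=5,
                             max_input_rate=0.01, min_depth=20, min_fold=1.5):
    """Apply the four criteria quoted above to one site (a dict of counts)."""
    t_depth = site["treated_depth"]
    i_depth = site["input_depth"]
    # (3) coverage depth above 20 in both the BID-seq and the input library.
    if t_depth <= min_depth or i_depth <= min_depth:
        return False
    t_rate = site["treated_deletions"] / t_depth
    i_rate = site["input_deletions"] / i_depth
    return (
        # (1) deletion rate above 5% with more than five deletions (BID-seq).
        t_rate > min_treated_rate and site["treated_deletions"] > min_deletions
        # (2) deletion rate below 1% in the input library.
        and i_rate < max_input_rate
        # (4) deletion rate above 1.5-fold over the motif background.
        and t_rate > min_fold * site["motif_background_rate"]
    )


if __name__ == "__main__":
    # Made-up site for illustration only.
    site = {"treated_depth": 100, "treated_deletions": 12,
            "input_depth": 80, "input_deletions": 0,
            "motif_background_rate": 0.02}
    print(passes_published_filters(site))
```

Keeping the thresholds as keyword arguments makes it easy to loosen or tighten them for a given data type and quality.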

Error in rule gap_realign

Hi y9c!
I am trying your pseudoU-BIDseq pipeline using SRR15082607, SRR15082609, and SRR15082610.
I get the following error when running the pipeline:

[Wed Jan 10 14:11:17 2024]
Error in rule gap_realign:
    jobid: 25
    output: .tmp/mapping_realigned_unsorted/HeLaWT-rep1-treated_run1_contamination.cram
    shell:
        
        /pipeline/bin/realignGap -r /disk/user_18/jky_project/genome/Mesomycoplasma_hyorhinis_ATCC_17981/Mesomycoplasma_hyorhinis_ATCC_17981.fna -i .tmp/mapping_unsort/HeLaWT-rep1-treated_run1_contamination.bam -o .tmp/mapping_realigned_unsorted/HeLaWT-rep1-treated_run1_contamination.cram
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job gap_realign since they might be corrupted:
.tmp/mapping_realigned_unsorted/HeLaWT-rep1-treated_run1_contamination.cram

What can I do? Thank you so much!

Duplicated reads level is high

Hi, @y9c !
I have a question. I used the BID-pipe on my data, and the reported duplication level is 40%. When I use 'seqkit rmdup' by sequence to calculate the duplication level, it is only 20%. How does the BID-pipe calculate the duplication level, and how does that differ from seqkit rmdup? Looking forward to your reply. Thanks a lot!
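
As a rough illustration of why two duplication estimates can disagree, the rate depends entirely on what counts as "the same read". The sketch below compares a sequence-based key (the idea behind removing duplicates by sequence) with a mapping-position-plus-UMI key. The second key is only a hypothetical alternative, not a statement of how the pipeline counts duplicates, and the reads are made up.

```python
def duplication_rate(keys):
    """Fraction of reads that are duplicates under the given identity keys."""
    keys = list(keys)
    if not keys:
        return 0.0
    return (len(keys) - len(set(keys))) / len(keys)


# Made-up reads: r3 differs from r1/r2 by one base (e.g. a sequencing error)
# but shares their mapping position and UMI.
READS = [
    {"seq": "ACGT", "chrom": "chr1", "pos": 100, "umi": "AAAA"},  # r1
    {"seq": "ACGT", "chrom": "chr1", "pos": 100, "umi": "AAAA"},  # r2
    {"seq": "ACGA", "chrom": "chr1", "pos": 100, "umi": "AAAA"},  # r3
    {"seq": "TTGC", "chrom": "chr2", "pos": 50, "umi": "CCCC"},   # r4
    {"seq": "TTGA", "chrom": "chr2", "pos": 50, "umi": "GGGG"},   # r5
]

if __name__ == "__main__":
    by_sequence = duplication_rate(r["seq"] for r in READS)
    by_position_umi = duplication_rate(
        (r["chrom"], r["pos"], r["umi"]) for r in READS)
    print(by_sequence, by_position_umi)
```

Under the sequence key only exact copies collapse, so reads that differ by a single base are all counted as unique, while a position-based key groups them together.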

samFilter: /lib64/libc.so.6: version `GLIBC_2.28' not found

Hi, may I ask what the dependencies for samFilter are? The following error appears when executing the samFilter file under the bin directory.
/lib64/libc.so.6: version `GLIBC_2.28' not found (required by /users/ludwig/ebu571/ebu571/BIDseq/pseudoU-BIDseq-main/bin/samFilter)
/lib64/libc.so.6: version `GLIBC_2.18' not found (required by /users/ludwig/ebu571/ebu571/BIDseq/pseudoU-BIDseq-main/bin/samFilter)
/lib64/libc.so.6: version `GLIBC_2.25' not found (required by /users/ludwig/ebu571/ebu571/BIDseq/pseudoU-BIDseq-main/bin/samFilter).
Thanks.
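
Errors like this mean the host's glibc is older than the version the prebuilt binary was linked against. A quick way to check is to compare the two versions numerically, as in the sketch below. The helper names are hypothetical; `platform.libc_ver()` is the standard-library call that reports the C library of the running Python interpreter, which is normally the system glibc on Linux.

```python
import platform


def version_tuple(text):
    """Turn '2.28' into (2, 28) so versions compare numerically, not as text."""
    return tuple(int(part) for part in text.split("."))


def glibc_satisfies(required, found=None):
    """Return True if the found (or detected) glibc is at least `required`."""
    if found is None:
        lib, found = platform.libc_ver()
        if lib != "glibc" or not found:
            return False  # not glibc, or the version could not be detected
    return version_tuple(found) >= version_tuple(required)


if __name__ == "__main__":
    print("detected C library:", platform.libc_ver())
    # Versions named in the error messages reported in these issues.
    for needed in ("2.28", "2.34"):
        print(needed, glibc_satisfies(needed))
```

Comparing tuples rather than strings matters here: as text, "2.9" sorts after "2.28", but as a version it is older. Running the pipeline through the published container avoids the dependency on the host's glibc for binaries built for that image.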
