
ENCODE whole-genome bisulfite sequencing (WGBS) pipeline

License: MIT License

bioinformatics wgbs methylation ngs pipeline

wgbs-pipeline's Introduction

wgbs-pipeline


Overview

An ENCODE pipeline for processing whole-genome bisulfite sequencing (WGBS) and reduced representation bisulfite sequencing (RRBS) data using gemBS for alignment and methylation extraction.

Installation

  1. Git clone this pipeline.

    $ git clone https://github.com/ENCODE-DCC/wgbs-pipeline
  2. Install Caper, which requires Java >= 1.8, Docker, and Python >= 3.6. Caper is a Python wrapper for Cromwell.

    $ pip install caper  # use pip3 if it doesn't work
  3. Follow Caper's README carefully to configure it for your platform (local, cloud, cluster, etc.)

IMPORTANT: Configure your Caper configuration file ~/.caper/default.conf correctly for your platform.
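
For reference, a minimal local-backend configuration might look like the following. This is a sketch only: the paths are placeholders, and cluster or cloud backends need additional settings (see Caper's README and the example configuration files in the issue reports below).

backend=local

# Local directory for localized files and Cromwell's intermediate files.
local-loc-dir=/path/to/scratch

# Paths to the Cromwell and Womtool JARs that Caper downloads on first run.
cromwell=/path/to/cromwell.jar
womtool=/path/to/womtool.jar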

Usage

To verify your installation, you can run the pipeline on a test data set by invoking the following command from the root of the cloned repository.

Note: this will incur some cost when running in cloud environments.

$ caper run wgbs-pipeline.wdl -i tests/functional/json/test_wgbs.json --docker

For detailed usage, see usage

Inputs

See inputs
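
For reference, the input JSON files shown in the issue reports further down this page all share the following shape. This sketch uses placeholder paths and is not an exhaustive list of inputs; note that wgbs.fastqs is a triply nested list, with each read pair in the innermost list.

{
  "wgbs.reference": "/path/to/reference.fa.gz",
  "wgbs.extra_reference": "/path/to/conversion_control.fa.gz",
  "wgbs.fastqs": [
    [
      [
        "/path/to/sample_R1.fastq.gz",
        "/path/to/sample_R2.fastq.gz"
      ]
    ]
  ],
  "wgbs.sample_names": ["my_sample"],
  "wgbs.underconversion_sequence_name": "NC_001416.1"
}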

Outputs

See outputs

Contributing

We welcome comments, questions, suggestions, bug reports, feature requests, and pull requests (PRs). Please use one of the existing GitHub issue templates if applicable. When contributing code, please follow the Developer Guidelines.

wgbs-pipeline's People

Contributors

bek, paul-sud


wgbs-pipeline's Issues

[bug] localization error when using Cromwell via Docker

This isn't a bug proper, but rather an issue I can link from the documentation to explain the problem with using Cromwell inside Docker. I'll likely open and close it once I've fleshed it out fully:

Localization via hard link has failed: /cromwell-executions/wgbs/47d02aff-5371-478e-a412-54189e06b303/call-flatten_/inputs/-1474066501/flowcell_1_1_1.fastq.gz -> /opt/data/fastq/flowcell_1_1_1.fastq.gz: Invalid cross-device link

Carry on! And thanks to @Bek for the help and for pointing this out earlier!
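
If you hit this error, one possible workaround (a sketch, not verified against this pipeline) is to tell Cromwell to fall back to soft links or copies instead of hard links, via a custom backend file. The file name and the --backend-file flag here are illustrative; check your Caper version's documentation:

# custom_backend.conf (hypothetical), passed as: caper run ... --backend-file custom_backend.conf
backend {
  providers {
    Local {
      config {
        filesystems {
          local {
            # Try soft links first, then copy; avoids hard links that
            # fail with "Invalid cross-device link" across filesystems.
            localization: ["soft-link", "copy"]
          }
        }
      }
    }
  }
}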

File latency issue?

Describe the bug
The pipeline creates a file that it doesn't think exists. In the example below, one case of this would be /redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/inputs/667055062/indexes.tar.gz, which does exist when I check after the fact.

This is running on a compute cluster. I've repeated this multiple times, so it's not just a one-off error.

My first guess is that this is due to file system latency. If this were Snakemake, I'd know that I just needed to increase the value passed to --latency-wait, but I'm not sure how to set something like that for this pipeline.
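
For what it's worth, Cromwell has no direct equivalent of --latency-wait, but one commonly used workaround for shared-filesystem latency (a sketch, not verified against this pipeline) is a script-epilogue that sleeps and syncs after each task, set in a custom backend file:

# custom_backend.conf (hypothetical); merge into the slurm provider's config block
backend {
  providers {
    slurm {
      config {
        # Runs after each task's command finishes; gives NFS/Lustre time
        # to make freshly written files visible before Cromwell checks outputs.
        script-epilogue = "sleep 30 && sync"
      }
    }
  }
}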

OS/Platform

  • OS/Platform: Ubuntu 20.04.6 LTS
  • Pipeline version: 1.1.7
  • Caper version: 2.3.2

Caper configuration file

backend=slurm

# SLURM partition. DEFINE ONLY IF REQUIRED BY YOUR CLUSTER'S POLICY.
# You must define it for Stanford Sherlock.
slurm-partition=serial

# SLURM account. DEFINE ONLY IF REQUIRED BY YOUR CLUSTER'S POLICY.
# You must define it for Stanford SCG.
slurm-account=

# Local directory for localized files and Cromwell's intermediate files.
# If not defined then Caper will make .caper_tmp/ on CWD or `local-out-dir`.
# /tmp is not recommended since Caper store localized data files here.
local-loc-dir=

cromwell=/redacted/caper/2.3.2/jars/cromwell-82.jar
womtool=/redacted/caper/2.3.2/jars/womtool-82.jar

Input JSON file

{
  "wgbs.benchmark_mode": true,
  "wgbs.extra_reference": "/redacted2/encode-wgbs/wgbs-pipeline/tests/data/conversion_control.fa.gz",
  "wgbs.fastqs": [
    [
      [
        "/redacted2/encode-wgbs/wgbs-pipeline/tests/data/sample5_data_1_200000.fastq.gz",
        "/redacted2/encode-wgbs/wgbs-pipeline/tests/data/sample5_data_2_200000.fastq.gz"
      ]
    ]
  ],
  "wgbs.reference": "/redacted2/encode-wgbs/wgbs-pipeline/tests/data/sacCer3.fa.gz",
  "wgbs.sample_names": [
    "sample5"
  ],
  "wgbs.underconversion_sequence_name": "NC_001416.1"
}

Error log

2023-10-04 21:10:58,310|caper.cli|INFO| Cromwell stdout: /redacted/wgbs_test/cromwell.out.1
2023-10-04 21:10:58,315|caper.caper_base|INFO| Creating a timestamped temporary directory. /redacted/wgbs_test/.caper_tmp/wgbs-pipeline/20231004_211058_313834
2023-10-04 21:10:58,315|caper.caper_runner|INFO| Localizing files on work_dir. /redacted/wgbs_test/.caper_tmp/wgbs-pipeline/20231004_211058_313834
2023-10-04 21:10:58,686|caper.caper_workflow_opts|INFO| Singularity image found in WDL metadata. wdl=/redacted2/encode-wgbs/1.1.8/wgbs-pipeline.wdl, s=docker://encodedcc/wgbs-pipeline:1.1.7
2023-10-04 21:10:58,706|caper.cromwell|INFO| Validating WDL/inputs/imports with Womtool...
2023-10-04 21:11:04,509|caper.nb_subproc_thread|INFO| Subprocess finished successfully.
2023-10-04 21:11:04,510|caper.cromwell|INFO| Passed Womtool validation.            
2023-10-04 21:11:04,510|caper.caper_runner|INFO| launching run: wdl=/redacted2/encode-wgbs/1.1.8/wgbs-pipeline.wdl, inputs=/redacted/wgbs_test/test_wgbs.json, backend_conf=/redacted/wgbs_test/.caper_tmp/wgbs-pipeline/20231004_211058_313834/backend.co
2023-10-04 21:11:15,543|caper.cromwell_workflow_monitor|INFO| Workflow: id=91e49ae4-9226-4824-af45-301fc1a815e8, status=Submitted
2023-10-04 21:11:15,605|caper.cromwell_workflow_monitor|INFO| Workflow: id=91e49ae4-9226-4824-af45-301fc1a815e8, status=Running
2023-10-04 21:11:23,864|caper.cromwell_workflow_monitor|INFO| Task: id=91e49ae4-9226-4824-af45-301fc1a815e8, task=wgbs.make_conf:-1, retry=0, status=CallCached
2023-10-04 21:11:26,814|caper.cromwell_workflow_monitor|INFO| Task: id=91e49ae4-9226-4824-af45-301fc1a815e8, task=wgbs.make_metadata_csv:-1, retry=0, status=CallCached
2023-10-04 21:11:29,834|caper.cromwell_workflow_monitor|INFO| Task: id=91e49ae4-9226-4824-af45-301fc1a815e8, task=wgbs.index_reference:-1, retry=0, status=CallCached
2023-10-04 21:11:35,809|caper.cromwell_workflow_monitor|INFO| Task: id=91e49ae4-9226-4824-af45-301fc1a815e8, task=wgbs.prepare:-1, retry=0, status=CallCached
2023-10-04 21:11:44,809|caper.cromwell_workflow_monitor|INFO| Task: id=91e49ae4-9226-4824-af45-301fc1a815e8, task=wgbs.map:0, retry=0, status=Started, job_id=2081286
2023-10-04 21:11:44,837|caper.cromwell_workflow_monitor|INFO| Task: id=91e49ae4-9226-4824-af45-301fc1a815e8, task=wgbs.map:0, retry=0, status=Running
2023-10-04 21:11:51,943|caper.cromwell_workflow_monitor|INFO| Task: id=91e49ae4-9226-4824-af45-301fc1a815e8, task=wgbs.map:0, retry=0, status=Done
2023-10-04 21:11:59,788|caper.cromwell_workflow_monitor|INFO| Task: id=91e49ae4-9226-4824-af45-301fc1a815e8, task=wgbs.map:0, retry=1, status=Started, job_id=2081287
2023-10-04 21:11:59,796|caper.cromwell_workflow_monitor|INFO| Task: id=91e49ae4-9226-4824-af45-301fc1a815e8, task=wgbs.map:0, retry=1, status=Running
2023-10-04 21:12:04,068|caper.cromwell_workflow_monitor|INFO| Task: id=91e49ae4-9226-4824-af45-301fc1a815e8, task=wgbs.map:0, retry=1, status=Done
2023-10-04 21:12:05,042|caper.cromwell_workflow_monitor|INFO| Workflow: id=91e49ae4-9226-4824-af45-301fc1a815e8, status=Failed
2023-10-04 21:12:15,586|caper.cromwell_metadata|INFO| Wrote metadata file. /redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/metadata.json
2023-10-04 21:12:15,587|caper.cromwell|INFO| Workflow failed. Auto-troubleshooting...
2023-10-04 21:12:15,589|caper.nb_subproc_thread|ERROR| Cromwell failed. returncode=1
2023-10-04 21:12:15,589|caper.cli|ERROR| Check stdout in /redacted/wgbs_test/cromwell.out.1
* Started troubleshooting workflow: id=91e49ae4-9226-4824-af45-301fc1a815e8, status=Failed
* Found failures JSON object.                                                      
[                                                                                  
    {                                                                              
        "causedBy": [                                                              
            {                                                                      
                "message": "Job wgbs.map:0:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.",
                "causedBy": []                                                     
            }                                                                      
        ],                                                                         
        "message": "Workflow failed"                                               
    }                                                                              
]                                                                                  
* Recursively finding failures in calls (tasks)...                                 
                                                                                   
==== NAME=wgbs.map, STATUS=RetryableFailure, PARENT=                               
SHARD_IDX=0, RC=1, JOB_ID=2081286                                                  
START=2023-10-04T21:11:41.051Z, END=2023-10-04T21:11:54.791Z                       
STDOUT=/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/execution/stdout
STDERR=/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/execution/stderr
STDERR_CONTENTS=                                                                   
tar: /redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/inputs/667055062/indexes.tar.gz: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now                                         
ln: failed to access '/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/execution/mapping/**/*.bam': No such file or directory
ln: failed to access '/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/execution/mapping/**/*.csi': No such file or directory
ln: failed to access '/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/execution/mapping/**/*.bam.md5': No such file or directory
ln: failed to access '/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/execution/mapping/**/*.json': No such file or directory
                                                                                   
                                                                                   
==== NAME=wgbs.map, STATUS=Failed, PARENT=                                         
SHARD_IDX=0, RC=1, JOB_ID=2081287                                                  
START=2023-10-04T21:11:55.035Z, END=2023-10-04T21:12:04.072Z                       
STDOUT=/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/attempt-2/execution/stdout
STDERR=/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/attempt-2/execution/stderr
STDERR_CONTENTS=                                                                   
tar: /redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/attempt-2/inputs/667055062/indexes.tar.gz: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now                                         
ln: failed to access '/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/attempt-2/execution/mapping/**/*.bam': No such file or directory
ln: failed to access '/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/attempt-2/execution/mapping/**/*.csi': No such file or directory
ln: failed to access '/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/attempt-2/execution/mapping/**/*.bam.md5': No such file or directory
ln: failed to access '/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/attempt-2/execution/mapping/**/*.json': No such file or directory
                                                                                   

[question] difference between version 1 and version 2 WDL?

I see two wdl pipeline files:

$ ls wgbs-pipeline*
wgbs-pipeline-v2.wdl  wgbs-pipeline.wdl

along with test.wdl (which I believe is the one I should use) and some others:

$ ls *.wdl
bs-call.wdl  mapping.wdl  test.wdl  wgbs-pipeline-v2.wdl  wgbs-pipeline.wdl

How do I know what these are, and which to use? I think it would be intuitive if there were two clearly "main" pipelines in the folder: one named after the repo (wgbs-pipeline.wdl) and one for testing (test.wdl). Can we discuss what these other files are and how they are used?
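
In the meantime, one generic way to inspect what each WDL actually expects (standard Womtool usage, not specific to this repo) is to ask Womtool for its inputs:

$ java -jar womtool.jar inputs wgbs-pipeline.wdl
$ java -jar womtool.jar inputs wgbs-pipeline-v2.wdl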

Cannot find '__main__' module in '' when running installation test

Describe the bug
I was trying to install the ENCODE WGBS pipeline to analyze some Methyl-Seq data. Everything seemed fine until I tested the installation with the command from the README:
caper run wgbs-pipeline.wdl -i tests/functional/json/test_wgbs.json

The pipeline runs and assigns a work ID but fails within the first or second step, giving the following error message:
/home/ubuntu/anaconda3/bin/python3: can't find '__main__' module in ''

Any insight on this would be greatly appreciated.
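
One thing worth checking (a guess based on the README, not a confirmed diagnosis): the command above omits the --docker flag used in the README's test invocation, and without a container the task scripts bundled in the pipeline image may not be found on the host:

$ caper run wgbs-pipeline.wdl -i tests/functional/json/test_wgbs.json --docker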

OS/Platform

  • OS/Platform: Ubuntu 16.04
  • Conda version: v4.9.2
  • Pipeline version: cloned from GitHub on 02/15/2021
  • Caper version: v1.4.2

Caper configuration file

backend=local

# Hashing strategy for call-caching (3 choices)
# This parameter is for local (local/slurm/sge/pbs) backend only.
# This is important for call-caching,
# which means re-using outputs from previous/failed workflows.
# Cache will miss if different strategy is used.
# "file" method has been default for all old versions of Caper<1.0.
# "path+modtime" is a new default for Caper>=1.0,
#   file: use md5sum hash (slow).
#   path: use path.
#   path+modtime: use path and modification time.
local-hash-strat=path+modtime

# Local directory for localized files and Cromwell's intermediate files
# If not defined, Caper will make .caper_tmp/ on local-out-dir or CWD.
# /tmp is not recommended here since Caper store all localized data files
# on this directory (e.g. input FASTQs defined as URLs in input JSON).
local-loc-dir=

cromwell=/home/ubuntu/.caper/cromwell_jar/cromwell-52.jar
womtool=/home/ubuntu/.caper/womtool_jar/womtool-52.jar

Input JSON file
Contents of the test json: test_wgbs.json

{
  "wgbs.benchmark_mode": true,
  "wgbs.extra_reference": "tests/data/conversion_control.fa.gz",
  "wgbs.fastqs": [
    [
      [
        "tests/data/sample5_data_1_200000.fastq.gz",
        "tests/data/sample5_data_2_200000.fastq.gz"
      ]
    ]
  ],
  "wgbs.reference": "tests/data/sacCer3.fa.gz",
  "wgbs.sample_names": [
    "sample5"
  ],
  "wgbs.underconversion_sequence_name": "NC_001416.1"
}

Error log
Caper automatically runs a troubleshooter for failed workflows. If it doesn't, get the WORKFLOW_ID of your failed workflow with caper list, or directly use a metadata.json file from Caper's output directory.

$ caper debug [WORKFLOW_ID_OR_METADATA_JSON_FILE]

Output of caper debug metadata.json on the failed run

* Started troubleshooting workflow: id=6ce87d1d-27ca-4f38-8fbd-bec0ac88cc41, status=Failed
* Found failures JSON object.
[
    {
        "message": "Workflow failed",
        "causedBy": [
            {
                "causedBy": [],
                "message": "Job wgbs.make_conf:NA:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details."
            },
            {
                "causedBy": [],
                "message": "Job wgbs.make_metadata_csv:NA:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details."
            }
        ]
    }
]
* Recursively finding failures in calls (tasks)...

==== NAME=wgbs.make_conf, STATUS=RetryableFailure, PARENT=
SHARD_IDX=-1, RC=1, JOB_ID=45272
START=2021-02-15T20:57:45.333Z, END=2021-02-15T20:57:58.862Z
STDOUT=/home/ubuntu/20210215_IPI_MethylSeq_TestRun/software/wgbs-pipeline/wgbs/6ce87d1d-27ca-4f38-8fbd-bec0ac88cc41/call-make_conf/execution/stdout
STDERR=/home/ubuntu/20210215_IPI_MethylSeq_TestRun/software/wgbs-pipeline/wgbs/6ce87d1d-27ca-4f38-8fbd-bec0ac88cc41/call-make_conf/execution/stderr
STDERR_CONTENTS=
/home/ubuntu/anaconda3/bin/python3: can't find '__main__' module in ''


==== NAME=wgbs.make_conf, STATUS=Failed, PARENT=
SHARD_IDX=-1, RC=1, JOB_ID=45345
START=2021-02-15T20:58:01.217Z, END=2021-02-15T20:58:14.905Z
STDOUT=/home/ubuntu/20210215_IPI_MethylSeq_TestRun/software/wgbs-pipeline/wgbs/6ce87d1d-27ca-4f38-8fbd-bec0ac88cc41/call-make_conf/attempt-2/execution/stdout
STDERR=/home/ubuntu/20210215_IPI_MethylSeq_TestRun/software/wgbs-pipeline/wgbs/6ce87d1d-27ca-4f38-8fbd-bec0ac88cc41/call-make_conf/attempt-2/execution/stderr
STDERR_CONTENTS=
/home/ubuntu/anaconda3/bin/python3: can't find '__main__' module in ''


==== NAME=wgbs.make_metadata_csv, STATUS=RetryableFailure, PARENT=
SHARD_IDX=-1, RC=1, JOB_ID=45292
START=2021-02-15T20:57:47.212Z, END=2021-02-15T20:57:58.862Z
STDOUT=/home/ubuntu/20210215_IPI_MethylSeq_TestRun/software/wgbs-pipeline/wgbs/6ce87d1d-27ca-4f38-8fbd-bec0ac88cc41/call-make_metadata_csv/execution/stdout
STDERR=/home/ubuntu/20210215_IPI_MethylSeq_TestRun/software/wgbs-pipeline/wgbs/6ce87d1d-27ca-4f38-8fbd-bec0ac88cc41/call-make_metadata_csv/execution/stderr
STDERR_CONTENTS=
/home/ubuntu/anaconda3/bin/python3: can't find '__main__' module in ''


==== NAME=wgbs.make_metadata_csv, STATUS=Failed, PARENT=
SHARD_IDX=-1, RC=1, JOB_ID=45364
START=2021-02-15T20:58:03.205Z, END=2021-02-15T20:58:17.405Z
STDOUT=/home/ubuntu/20210215_IPI_MethylSeq_TestRun/software/wgbs-pipeline/wgbs/6ce87d1d-27ca-4f38-8fbd-bec0ac88cc41/call-make_metadata_csv/attempt-2/execution/stdout
STDERR=/home/ubuntu/20210215_IPI_MethylSeq_TestRun/software/wgbs-pipeline/wgbs/6ce87d1d-27ca-4f38-8fbd-bec0ac88cc41/call-make_metadata_csv/attempt-2/execution/stderr
STDERR_CONTENTS=
/home/ubuntu/anaconda3/bin/python3: can't find '__main__' module in ''

TypeError in wgbs.map Step

Describe the bug
At the wgbs.map step, I get a TypeError:

==== NAME=wgbs.map, STATUS=Failed, PARENT=
SHARD_IDX=1, RC=1, JOB_ID=9608
START=2021-10-15T03:04:21.289Z, END=2021-10-15T03:09:23.663Z
STDOUT=/resource3/data/WGBS/Processed_Caper/wgbs/9f04e4dd-2a84-4af6-8731-d3c49f6e2782/call-map/shard-1/attempt-2/execution/stdout
STDERR=/resource3/data/WGBS/Processed_Caper/wgbs/9f04e4dd-2a84-4af6-8731-d3c49f6e2782/call-map/shard-1/attempt-2/execution/stderr
STDERR_CONTENTS=
:
: Command map started at 2021-10-14 20:08:16.349606
:
: ------------ Mapping Parameters ------------
: Sample barcode : sample_1
: Data set : 1
: No. threads : 8
: Index : indexes/hg38.BS.gem
: Paired : False
: Read non stranded: False
: Type : SINGLE
: Input Files : ./fastq/1/Control_S1_L004_R2_001.fastq.gz
: Output dir : ./mapping/sample_1
:
: Bisulfite Mapping...
TypeError: sequence item 14: expected str instance, NoneType found
ln: failed to access '/resource3/data/WGBS/Processed_Caper/wgbs/9f04e4dd-2a84-4af6-8731-d3c49f6e2782/call-map/shard-1/attempt-2/execution/mapping/**/*.bam': No such file or directory
ln: failed to access '/resource3/data/WGBS/Processed_Caper/wgbs/9f04e4dd-2a84-4af6-8731-d3c49f6e2782/call-map/shard-1/attempt-2/execution/mapping/**/*.csi': No such file or directory
ln: failed to access '/resource3/data/WGBS/Processed_Caper/wgbs/9f04e4dd-2a84-4af6-8731-d3c49f6e2782/call-map/shard-1/attempt-2/execution/mapping/**/*.bam.md5': No such file or directory
ln: failed to access '/resource3/data/WGBS/Processed_Caper/wgbs/9f04e4dd-2a84-4af6-8731-d3c49f6e2782/call-map/shard-1/attempt-2/execution/mapping/**/*.json': No such file or directory

How can I resolve this error?
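
One detail that stands out in the log above (a guess, not a confirmed diagnosis): the run was treated as single-ended ('Paired : False', 'Type : SINGLE') with only an R2 file listed. The test JSONs elsewhere on this page place both reads of a pair in the innermost list of wgbs.fastqs, e.g. (the R1 filename here is hypothetical):

"wgbs.fastqs": [
  [
    [
      "./fastq/Control_S1_L004_R1_001.fastq.gz",
      "./fastq/Control_S1_L004_R2_001.fastq.gz"
    ]
  ]
]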

OS/Platform

  • OS/Platform: CentOS Linux 7 on a Slurm-managed HPC
  • Conda version: conda 4.10.3
  • Pipeline version: 1.1.7
  • Caper version: 1.6.3

Caper configuration file
default.conf.txt

Error log
Caper automatically runs a troubleshooter for failed workflows. If it doesn't, get the WORKFLOW_ID of your failed workflow with caper list, or directly use a metadata.json file from Caper's output directory.

$ caper debug [WORKFLOW_ID_OR_METADATA_JSON_FILE]

cromwell.out.txt

Input JSON File
json_input.txt

Software used by the workflow

Thanks for your contribution. I want to follow your process and implement it step by step, but I can't find detailed documentation of the methods, software, and workflow used, such as samtools, Bismark, or Picard?

Make sure to check that your docker daemon is running before trying to run the pipeline or it will fail

Describe the bug
Hello, I'm trying to bring the ENCODE pipelines to my school, and I'm working my way through learning how to use them. I'm getting a better handle on what to do with .wdl and input JSON files.
I've been playing around with cloning the wgbs-pipeline and running the sample code as described in the README, which was working last week. However, this week I started getting an error message when I tried running the pipeline from scratch.

Here are the commands I ran:
$ git clone https://github.com/ENCODE-DCC/wgbs-pipeline.git
$ pip3 install caper
$ caper run wgbs-pipeline.wdl -i tests/functional/json/test_wgbs.json --docker

The caper failure id is 0b655180-c147-4524-bed5-92d854487050
I tried running the debug on that error code but had an issue connecting to the caper server.

OS/Platform

  • OS/Platform: Ubuntu 20.04.2 LTS,
  • Conda version: 4.10.3
  • Pipeline version: v1.1.7
  • Caper version: 2.1.1

Caper configuration file

caper_config
Input JSON file

- name: test_wgbs
  tags:
    - functional
  command: >-
    tests/caper_run.sh
    wgbs-pipeline.wdl
    tests/functional/json/test_wgbs.json
  files:
    - path: test-output/gembs.conf
      md5sum: 1ad8f25544fa7dcb56383bc233407b54
    - path: test-output/gembs_metadata.csv
      md5sum: 9dd5d3bee6e37ae7dbbf4a29edd0ed3f
    - path: test-output/indexes.tar.gz
      md5sum: bde2c7f6984c318423cb7354c4c37d44
    - path: test-output/gemBS.json
      md5sum: d6ef6f4d2ee7e4c3d2e8c439bb2cb618
    - path: test-output/glob-e97d885c83d966247d485dc62b6ae799/sacCer3.contig.sizes
      md5sum: 0497066e3880c6932cf6bde815c42c40
    - path: test-output/glob-c8599c0b9048b55a8d5cfaad52995a94/sample_sample5.bam
      md5sum: fc1d87ed4f9f7dab78f58147c02d06c9
    - path: test-output/glob-e42c489c9c1355a3e5eca0071600f795/sample_sample5.bam.csi
      md5sum: b37be1c10623f32bbe73c364325754b0
    - path: test-output/glob-13824b1e03fdcf315fe2424593870e56/sample_sample5.bam.md5
      md5sum: 56f31539029eab274ff0ac03e84e214a
    - path: test-output/glob-c0e92e4e9fb050e7e70bb645748b45dd/0.json
      md5sum: cbf4ba8d84384779c626b298a9a60b96
    - path: test-output/glob-1e6c456aecc092f75370b54a5806588f/sample_sample5.bcf
      md5sum: 7cafb436b89898e852f971f1f3b20fc6
    - path: test-output/glob-804650e4b0c9cc57f1bbc0b3919d1f73/sample_sample5.bcf.csi
      md5sum: 156c39bd2bcc0bc83052eb4171f83507
    - path: test-output/glob-95d24e89d025dc63935acc9ded9f8810/sample_sample5.bcf.md5
      md5sum: 8ebe942fe48b07e1a3455f572aadf57b
    - path: test-output/glob-0b0236659b9524643e6454061959b28c/sample_sample5_pos.bw
      md5sum: c30bc10a258ca4f1fe67f115c4c2db10
    - path: test-output/glob-041e1709c7dd1f320426281eb4649f9b/sample_sample5_neg.bw
      md5sum: f49ab06a51d9c4a8e663f0472e70eb06
    - path: test-output/glob-708835e6a0042d33b00b6937266734f5/sample_sample5_chg.bb
      md5sum: cc123bff807e0637864d387628d410fa
    - path: test-output/glob-f70a6609728d4fb1448dba1f41361a30/sample_sample5_cpg.bb
      md5sum: f67f273c68577197ecf19a8bb92c925c
    - path: test-output/glob-52f916d7cc14a5bcfb168d6910e04b56/sample_sample5_chg.bed.gz
      md5sum: 6994fafbf9eab44ff6e7fafa421fffbc
    - path: test-output/glob-2b5148d6967b43eee33eb370fd36b70e/sample_sample5_chh.bed.gz
      md5sum: 31a69396b7084520c04bd80f2cabfd59
    - path: test-output/glob-f31cca1fcab505e10c2fe5ff003b211a/sample_sample5_cpg.bed.gz
      md5sum: 67193e21cc34b76d12efba1a19df6644
    - path: test-output/glob-b76cddd256e1197e0b726acc7184afc4/sample_sample5_cpg.txt.gz
      md5sum: fccd9c9c5b4fea80890abd536fd76a35
    - path: test-output/glob-24ceb385eea2ca53f1e6c4a1438ccd21/sample_sample5_cpg.txt.gz.tbi
      md5sum: a1e08686f568af353e9026c1de00c25d
    - path: test-output/glob-40c90aa4516b00209d682b819b1d021f/sample_sample5_non_cpg.txt.gz
      md5sum: 6809ee8479439454aa502ae11f48d91c
    - path: test-output/glob-664ff83c3881df2363da923f006b098b/sample_sample5_non_cpg.txt.gz.tbi
      md5sum: 6cdebb4ad2ea184ca4783acb350ae038
    - path: test-output/gembs_map_qc.json
      md5sum: 26b5238ab7bb5b195d1cf8127767261c
    - path: test-output/glob-65c481a690a62b639d918bb70927f25e/sample_sample5.isize.png
      md5sum: f0277a185298dee7156ec927b02466c7
    - path: test-output/mapping_reports/mapping/sample_sample5.mapq.png
      md5sum: 49837be15f24f23c59c50241cf504614
    - path: test-output/glob-1aeed469ae5d1e8d7cbca51e8758b781/ENCODE.html
      md5sum: 163401e0bf6a377c2a35dc4bf9064574
    - path: test-output/glob-1aeed469ae5d1e8d7cbca51e8758b781/0.html
      md5sum: 604b6f1a4c641b7d308a4766d97cadb7
    - path: test-output/glob-1aeed469ae5d1e8d7cbca51e8758b781/sample_sample5.html
      md5sum: b86ce9e15ab8e1e9c45f467120c22649
    - path: test-output/glob-1aeed469ae5d1e8d7cbca51e8758b781/0.isize.png
      md5sum: 82a262d0bf2dadb02239272da490bba1
    - path: test-output/glob-1aeed469ae5d1e8d7cbca51e8758b781/0.mapq.png
      md5sum: d8c3af2eae5af12f1eb5dd9ec4e225bb
    - path: test-output/glob-1aeed469ae5d1e8d7cbca51e8758b781/style.css
      md5sum: a09ae01f70fa6d2461e37d5814ceb579
    - path: test-output/coverage.bw
      md5sum: afa224c2037829dccacea4a67b6fa84a
    - path: test-output/average_coverage_qc.json
      md5sum: 82ce31e21d361d52a7f19dce1988b827
    - path: test-output/bed_pearson_correlation_qc.json
      should_exist: false

Error log
caper_error_0b655180-c147-4524-bed5-92d854487050_terminal

$ caper debug [WORKFLOW_ID_OR_METADATA_JSON_FILE]

caper_error_0b655180-c147-4524-bed5-92d854487050

Thank you so much!
Best,
Jake Lehle

Failed with test data

OS/Platform

  • OS/Platform: Ubuntu 20.04.3 LTS
  • Conda version: conda 4.11.0
  • Pipeline version: v1.1.8
  • Caper version: v2.1.3

Caper configuration file

backend=local

# Hashing strategy for call-caching (3 choices)
# This parameter is for local (local/slurm/sge/pbs/lsf) backend only.
# This is important for call-caching,
# which means re-using outputs from previous/failed workflows.
# Cache will miss if different strategy is used.
# "file" method has been default for all old versions of Caper<1.0.
# "path+modtime" is a new default for Caper>=1.0,
#   file: use md5sum hash (slow).
#   path: use path.
#   path+modtime: use path and modification time.
local-hash-strat=path+modtime

# Metadata DB for call-caching (reusing previous outputs):
# Cromwell supports restarting workflows based on a metadata DB
# DB is in-memory by default
#db=in-memory

# If you use 'caper server' then you can use one unified '--file-db'
# for all submitted workflows. In such case, uncomment the following two lines
# and defined file-db as an absolute path to store metadata of all workflows
#db=file
#file-db=

# If you use 'caper run' and want to use call-caching:
# Make sure to define different 'caper run ... --db file --file-db DB_PATH'
# for each pipeline run.
# But if you want to restart then define the same '--db file --file-db DB_PATH'
# then Caper will collect/re-use previous outputs without running the same task again
# Previous outputs will be simply hard/soft-linked.


# Local directory for localized files and Cromwell's intermediate files
# If not defined, Caper will make .caper_tmp/ on local-out-dir or CWD.
# /tmp is not recommended here since Caper store all localized data files
# on this directory (e.g. input FASTQs defined as URLs in input JSON).
local-loc-dir=/mnt/storage/hong/caper

cromwell=/home/hong/.caper/cromwell_jar/cromwell-65.jar
womtool=/home/hong/.caper/womtool_jar/womtool-65.jar

Input JSON file

{
  "wgbs.benchmark_mode": true,
  "wgbs.extra_reference": "tests/data/conversion_control.fa.gz",
  "wgbs.fastqs": [
    [
      [
        "tests/data/sample5_data_1_200000.fastq.gz",
        "tests/data/sample5_data_2_200000.fastq.gz"
      ]
    ]
  ],
  "wgbs.reference": "tests/data/sacCer3.fa.gz",
  "wgbs.sample_names": [
    "sample5"
  ],
  "wgbs.underconversion_sequence_name": "NC_001416.1"
}

Error log
Caper automatically runs a troubleshooter for failed workflows. If it doesn't, get the WORKFLOW_ID of your failed workflow with caper list, or directly use a metadata.json file from Caper's output directory.

* Found failures JSON object.
[
    {
        "message": "Workflow failed",
        "causedBy": [
            {
                "message": "Job wgbs.make_conf:NA:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.",
                "causedBy": []
            },
            {
                "message": "Job wgbs.make_metadata_csv:NA:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.",
                "causedBy": []
            }
        ]
    }
]
* Recursively finding failures in calls (tasks)...

==== NAME=wgbs.make_conf, STATUS=RetryableFailure, PARENT=
SHARD_IDX=-1, RC=1, JOB_ID=588529
START=2022-02-24T14:20:39.912Z, END=2022-02-24T14:20:52.057Z
STDOUT=/home/hong/wgbs-pipeline/wgbs/9cd9270a-b1e7-45cb-a706-260cd3685f1c/call-make_conf/execution/stdout
STDERR=/home/hong/wgbs-pipeline/wgbs/9cd9270a-b1e7-45cb-a706-260cd3685f1c/call-make_conf/execution/stderr
STDERR_CONTENTS=
/home/hong/anaconda3/envs/wgbs/bin/python3: can't find '__main__' module in '/home/hong/wgbs-pipeline/wgbs/9cd9270a-b1e7-45cb-a706-260cd3685f1c/call-make_conf/execution/'

STDERR_BACKGROUND_CONTENTS=
/home/hong/anaconda3/envs/wgbs/bin/python3: can't find '__main__' module in '/home/hong/wgbs-pipeline/wgbs/9cd9270a-b1e7-45cb-a706-260cd3685f1c/call-make_conf/execution/'



==== NAME=wgbs.make_conf, STATUS=Failed, PARENT=
SHARD_IDX=-1, RC=1, JOB_ID=592187
START=2022-02-24T14:20:54.103Z, END=2022-02-24T14:21:01.920Z
STDOUT=/home/hong/wgbs-pipeline/wgbs/9cd9270a-b1e7-45cb-a706-260cd3685f1c/call-make_conf/attempt-2/execution/stdout
STDERR=/home/hong/wgbs-pipeline/wgbs/9cd9270a-b1e7-45cb-a706-260cd3685f1c/call-make_conf/attempt-2/execution/stderr
STDERR_CONTENTS=
/home/hong/anaconda3/envs/wgbs/bin/python3: can't find '__main__' module in '/home/hong/wgbs-pipeline/wgbs/9cd9270a-b1e7-45cb-a706-260cd3685f1c/call-make_conf/attempt-2/execution/'

STDERR_BACKGROUND_CONTENTS=
/home/hong/anaconda3/envs/wgbs/bin/python3: can't find '__main__' module in '/home/hong/wgbs-pipeline/wgbs/9cd9270a-b1e7-45cb-a706-260cd3685f1c/call-make_conf/attempt-2/execution/'



==== NAME=wgbs.make_metadata_csv, STATUS=RetryableFailure, PARENT=
SHARD_IDX=-1, RC=1, JOB_ID=588545
START=2022-02-24T14:20:40.106Z, END=2022-02-24T14:20:52.057Z
STDOUT=/home/hong/wgbs-pipeline/wgbs/9cd9270a-b1e7-45cb-a706-260cd3685f1c/call-make_metadata_csv/execution/stdout
STDERR=/home/hong/wgbs-pipeline/wgbs/9cd9270a-b1e7-45cb-a706-260cd3685f1c/call-make_metadata_csv/execution/stderr
STDERR_CONTENTS=
/home/hong/anaconda3/envs/wgbs/bin/python3: can't find '__main__' module in '/home/hong/wgbs-pipeline/wgbs/9cd9270a-b1e7-45cb-a706-260cd3685f1c/call-make_metadata_csv/execution/'

STDERR_BACKGROUND_CONTENTS=
/home/hong/anaconda3/envs/wgbs/bin/python3: can't find '__main__' module in '/home/hong/wgbs-pipeline/wgbs/9cd9270a-b1e7-45cb-a706-260cd3685f1c/call-make_metadata_csv/execution/'



==== NAME=wgbs.make_metadata_csv, STATUS=Failed, PARENT=
SHARD_IDX=-1, RC=1, JOB_ID=593365
START=2022-02-24T14:20:56.102Z, END=2022-02-24T14:21:03.969Z
STDOUT=/home/hong/wgbs-pipeline/wgbs/9cd9270a-b1e7-45cb-a706-260cd3685f1c/call-make_metadata_csv/attempt-2/execution/stdout
STDERR=/home/hong/wgbs-pipeline/wgbs/9cd9270a-b1e7-45cb-a706-260cd3685f1c/call-make_metadata_csv/attempt-2/execution/stderr
STDERR_CONTENTS=
/home/hong/anaconda3/envs/wgbs/bin/python3: can't find '__main__' module in '/home/hong/wgbs-pipeline/wgbs/9cd9270a-b1e7-45cb-a706-260cd3685f1c/call-make_metadata_csv/attempt-2/execution/'

STDERR_BACKGROUND_CONTENTS=
/home/hong/anaconda3/envs/wgbs/bin/python3: can't find '__main__' module in '/home/hong/wgbs-pipeline/wgbs/9cd9270a-b1e7-45cb-a706-260cd3685f1c/call-make_metadata_csv/attempt-2/execution/'


2022-02-24 15:21:17,695|caper.nb_subproc_thread|ERROR| Cromwell failed. returncode=1
2022-02-24 15:21:17,695|caper.cli|ERROR| Check stdout in /home/hong/wgbs-pipeline/cromwell.out.3
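
The stderr here has the same signature as the earlier "can't find '__main__' module" report, and the configuration shows a plain local backend with a conda Python. A guess, not a confirmed diagnosis: run the test through a container so the tasks use the pipeline image, e.g.

$ caper run wgbs-pipeline.wdl -i tests/functional/json/test_wgbs.json --docker

or, where Docker is unavailable, Caper's --singularity flag (the log in the latency issue above shows this pipeline being run with a Singularity image).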

Update to gemBS v3

Hello,

As far as I can tell you created this pipeline based on gemBS v2. Are you planning to update it for gemBS v3?

Best,
Bekir
