ianmisner commented on June 17, 2024

I should note that I'm not running this on a cluster, just a Linux box with 32 CPUs.

biocoder commented on June 17, 2024

Did you use a params.yaml file? Can you share your command call and/or your params.yaml?


ianmisner commented on June 17, 2024

Yes, I used the YAML file. Here are my command and my file.

perl lncRNApipe --conf Pence_18_SEP_15params.yaml

biocoder commented on June 17, 2024

OK, I know what the problem is. For now, use the updated params.yaml below. The code was expecting schedulerOpts: and clusterOpts:, and it failed because you are not running on a cluster and may have removed them. I have kept both keys in the updated file, but they will be commented out in the actual job script. I will fix the code and publish a release later.

# This is the configuration file, where you can specify the command line
# options for each of the modules of lncRNApipe. If you do not wish to
# run a module, simply disable it and all the lines following
# it (if any) by prefixing them with #
# Specify output directory.
# In the example below, a  new output directory called "lncRNApipe_test"
# will be created at the mentioned path /Users/krantikonganti/Projects/scratch_test.
outputDir: /home/imisner/Desktop/new-lncRNApipe2

# Indicate, if we should overwrite the output directory if it already exists.
overwriteOutputDir: YES

# Let lncRNApipe know if you intend to perform transcript assembly
# using tophat / cufflinks or just identify ncRNAs from already
# assembled transcripts.
performTranscriptAssembly: YES

# Specify number of threads / CPUs to use where possible.
# If running on a grid, each job script will run with this many
# CPUs.
CPUs: 20

# Specify scheduler type.
# Valid options are PBS, SGE, LSF or NONE to disable grid computing.
scheduler: NONE

# Mention batch submission command.
# For example, for LSF, it is bsub, for PBS or SGE, it
# is qsub.
#batchSubCmd: qsub

# Specify scheduler options on separate line
# specific to your job running environment.
# ********************* !! IMPORTANT !! ***********************
# *************************************************************
# Different clusters use different job parameter names. For
# example, our cluster uses num_threads as the parameter name to
# request number of CPUs in SGE. Some clusters may use
# num_cpu. Since it is difficult to guess, please provide
# number of CPUs you want to use based on your grid environment
# below, which is equal to "CPUs:" option above. Yes, we know that
# this is kind of redundant, but it is a necessary evil. Whatever
# job parameters you provide after the - , they will appear
# exactly in the job script.
# *************************************************************
# *************************************************************
# Ex for PBS (job parameters start with a PBS directive):
# PBS -V
# PBS -l nodes=1:ppn=16,walltime=24:00:00
# Ex for SGE (job parameters start with $ sign):
# $ -N lncRNApipe
# $ -l num_threads=4
schedulerOpts:
 - # none

clusterOpts:
 - # none

# Specify if your reads are in FASTA or FASTQ format.
readType: FASTQ

# Specify if your reads are SE (single-end only), PE (paired-end only) or MIXED
libType: PE

# Specify if you want to use ENSEMBL or UCSC as source for annotation.
# "PLEASE MAKE SURE" that you are using the same sourceDB for tophat
# alignment, i.e. bowtie indexes created from the same sourceDB for the
# same assembly version. As downloading genome indices on-the-fly is time
# consuming and since many users may have genome indices installed for
# various NGS analysis anyways, we leave it to you to provide "CORRECT"
# genome index below in tophat and cufflinks options' configuration.
sourceDB: UCSC

# Choose assembly version so that we can download up-to-date annotation on the fly.
# You can choose to supply your own annotation file for consistency below.
# ENSEMBL and UCSC represents species name differently in the URLs.
# To view species names for ENSEMBL do, "perl lncRNApipe --list ENSEMBL".
# Go to to view UCSC version names.
# For example, at UCSC, for mouse, it is mm9 or mm10, for rat, it is rn5 or rn6 etc...
species: mm10

# If you want to use your own annotation file, provide full path to the annotation
# file of your choice. This option overrides "sourceDB: " and "species: " options above.
# Again, "PLEASE MAKE SURE", you are providing the same genome index from the
# same source. In the example below, "bowtieGenomeIndex" is rn4 genome index created
# from FASTA files from UCSC.
#useThisAnnotation: /data/ref_annotation/rn4_UCSC/genes.gtf

# Provide Unix path to directory where the read files are located and also
# provide read file names.
# If your data is just SE, then disable "r2: " below by prefixing it with a #.
# Separate replicates of sample by a comma. Separate different samples by a |.
 readsDir: /home/imisner/4Big/Pence/Raw_reads
 r1: EP1_ACAGTG_L001_R1_pe.fastq,EP2_GTGAAA_L001_R1_pe.fastq,EP3_CGATGT_L002_R1_pe.fastq|PC1_GCCAAT_L001_R1_pe.fastq,PC2_AGTCAA_L001_R1_pe.fastq,PC3_TGACCA_L002_R1_pe.fastq|XE1_CTTGTA_L001_R1_pe.fastq,XE2_AGTTCC_L001_R1_pe.fastq,XE3_CAGATC_L002_R1_pe.fastq
 r2: EP1_ACAGTG_L001_R2_pe.fastq,EP2_GTGAAA_L001_R2_pe.fastq,EP3_CGATGT_L002_R2_pe.fastq|PC1_GCCAAT_L001_R2_pe.fastq,PC2_AGTCAA_L001_R2_pe.fastq,PC3_TGACCA_L002_R2_pe.fastq|XE1_CTTGTA_L001_R2_pe.fastq,XE2_AGTTCC_L001_R2_pe.fastq,XE3_CAGATC_L002_R2_pe.fastq

# If you want to use TRIMMOMATIC to trim reads, then provide
# TRIMMOMATIC options. TRIMMOMATIC provides the following adapter
# files: NexteraPE-PE.fa, TruSeq2-PE.fa, TruSeq2-SE.fa, TruSeq3-PE-2.fa,
# TruSeq3-PE.fa and TruSeq3-SE.fa. Choose one below or provide full path
# to the adapter sequence file you want to use.
# No need to provide -threads, as it will be handled by lncRNApipe.
# - ILLUMINACLIP:TruSeq3-SE.fa:2:30:10
# - LEADING:20
# - MINLEN:25

# Provide bowtie genome index, full path to genome multi FASTA and tophat options.
# "PLEASE MAKE SURE" that you are using the same genome indices for the sourceDB
# mentioned above. Do not use UCSC genome indices if you requested ENSEMBL above in
# "sourceDB: " or vice versa.
# In the example below, it is assumed that you have already created a transcriptome
# index. If you have not created one, go through tophat manual on how to just create
# transcriptome index ( If you
# use "--transcriptome-index" below and the index does not exist, it will keep overwriting
# while the jobs are running.
# No need to provide FASTQ files, as they will be handled by lncRNApipe.
# No need to provide -o, as it will be handled by lncRNApipe.
# No need to provide -p, as it will be handled by lncRNApipe.
#ftp://igenome:[email protected]/Mus_musculus/UCSC/mm10/Mus_musculus_UCSC_mm10.tar.gz
genomeFasta: /home/imisner/Datasets/Mus_musculus/UCSC/mm10/Sequence/WholeGenomeFasta/genome.fa
bowtieGenomeIndex: /home/imisner/Datasets/Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/genome
 - -x 1
 - -M
 - --b2-sensitive
 - --no-coverage-search

# Provide cufflinks options and also "PLEASE MAKE SURE" you provide the same reference
# FASTA for the sourceDB you want to use in case of bias correction.
# No need to provide -g, as it will be handled by lncRNApipe.
# No need to provide -o, as it will be handled by lncRNApipe.
# No need to provide -p, as it will be handled by lncRNApipe.
# No need to provide input files, as they will be handled by lncRNApipe.
 - -u
 - -b /home/imisner/Datasets/Mus_musculus/UCSC/mm10/Sequence/WholeGenomeFasta/genome.fa

# Provide cuffcompare options.
# ******************************** !! IMPORTANT !! ******************************
# *******************************************************************************
# If you only want to run lncRNApipe without the transcript assembly stage, then
# provide assembled transcripts in GTF format, otherwise, no options are needed.
# *******************************************************************************
# *******************************************************************************
# No need to provide -r, as it will be handled by lncRNApipe.
# Provide "-i transcript_assembly_list.txt" if you did not run assembly stage,
# where transcript_assembly_list.txt contains full path to assembled transcript
# files.
 #- -i /data/cufflinks/assembly_list.txt

# Provide categorize_ncRNAs options.
# See "perl lncRNApipe -h cat" for description of options.
categorize_ncRNAs:
 - -sample-names "EP,PC,XE"
 - -len 200
 - -min-exons 1
 - -ov 80
 - -inc

# Provide options.
# Generally only "-ov" is required in either
# case of [performTranscriptAssembly: YES] or
# [performTranscriptAssembly: NO].
# When you supply your own known ncRNAs file to
# compare against with "-sf", then you MUST also
# specify its file format (Ex: gtf or bed) with
# "-sff" option.
# See "perl lncRNApipe --h get" for description of options.
 - -ov 80
# - -sf test
# - -sff gtf

# By default, RNAfold is not run since it is very slow.
# It is generally recommended to generate RNAfold plots
# based on the transcript of your interest after you have
# investigated the results, but you can still enable it
# in the pipeline by uncommenting "runRNAfold: YES" to
# run RNAfold with default options.
# To pass command line options to RNAfold, define them
# after the line "RNAfold:"
# Mention any option other than "-p" and "--noPS" as they
# are automatically handled by lncRNApipe.
# See "perl lncRNApipe -h rna" for description of options.
runRNAfold: YES
# - --circ

# Provide options to cmscan. Running cmscan with
# default options provides good matches in
# most cases, but in any case you want to add
# extra options, do it here.
# To run cmscan with default options, use
# "runcmscan: YES".
# Add additional options you want to pass to cmscan
# after line "cmscan:".
# Provide any options other than "-o", "--tblout"
# and "--cpu" as they are automatically handled by
# lncRNApipe.
# See "perl lncRNApipe -h inf" for description of options.
runcmscan: YES
# - -E 9.0

# Provide cuffmerge options.
# If you have replicates, final predicted lncRNAs will be merged.
# No need to provide -s, as it will be handled by lncRNApipe.
# No need to provide -g, as it will be handled by lncRNApipe.
 - --min-isoform-fraction 0.05

# Provide cuffdiff options if you want to run differential expression
# tests between known ncRNAs and between novel ncRNAs in your samples.
# Normally, cuffdiff is only run on identified known and novel ncRNAs.
# If you want to run cuffdiff on all transcripts, i.e to identify
# differentially expressed transcripts (end of typical tuxedo pipeline),
# then change "runCuffdiffForAllTranscripts" to YES.
# If you have mentioned "-sample-names " above in "categorize_ncRNAs: ",
# no need to mention -L, else provide -L option here.
# No need to provide -o, as it will be handled by lncRNApipe.
# No need to provide -p, as it will be handled by lncRNApipe.
# No need to provide input files, if you have run tophat, cufflinks above.
# If "performTranscriptAssembly" is NO, then provide your own BAM files here prefixed by
# -bam option.
# Separate replicate BAM files with comma as you generally do with cuffdiff command.
runCuffdiffForAllTranscripts: YES
 - -u
 - -b /home/imisner/Datasets/Mus_musculus/UCSC/mm10/Sequence/WholeGenomeFasta/genome.fa
  # - -bam /data/aligned_rn4_UCSC/sample1/rep1.bam,/data/aligned_rn4_UCSC/sample1/rep2.bam /data/aligned_rn4_UCSC/sample2/rep1.bam,/data/aligned_rn4_UCSC/sample2/rep2.bam
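
The `r1:` / `r2:` convention described in the config above (commas between replicates of a sample, `|` between samples) can be sketched as follows. This is only an illustration of the convention, not lncRNApipe's own parsing code; the function names and the shortened file names are hypothetical.

```python
def parse_read_groups(value):
    """Split an r1:/r2: value into samples (by '|') and replicates (by ',')."""
    return [
        [rep.strip() for rep in sample.split(",") if rep.strip()]
        for sample in value.split("|")
    ]


def pair_reads(r1_value, r2_value):
    """Pair R1 and R2 files sample by sample, failing loudly on a mismatch."""
    r1 = parse_read_groups(r1_value)
    r2 = parse_read_groups(r2_value)
    # The per-sample replicate counts must match between r1 and r2.
    if [len(s) for s in r1] != [len(s) for s in r2]:
        raise ValueError("r1 and r2 list different numbers of samples or replicates")
    return [list(zip(a, b)) for a, b in zip(r1, r2)]


if __name__ == "__main__":
    # Hypothetical, shortened file names: two samples, the first with two replicates.
    r1 = "EP1_R1.fastq,EP2_R1.fastq|PC1_R1.fastq"
    r2 = "EP1_R2.fastq,EP2_R2.fastq|PC1_R2.fastq"
    for index, sample in enumerate(pair_reads(r1, r2), start=1):
        print(f"sample {index}: {sample}")
```

Checking that `r1:` and `r2:` describe the same number of samples and replicates before launching a run is cheap and catches a misplaced `,` or `|` early.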

ianmisner commented on June 17, 2024

With your config file I get this error:

[imisner@bioinformatics lncRNApipe]$ perl lncRNApipe --conf Pence_21_SEP_15params.yaml 

Mon Sep 21 12:37:59 2015        Validating options...

Mon Sep 21 12:37:59 2015        Starting ☲☴ lncRNApipe Pipeline...

No value(s) provided for [batchSubCmd: ]. Please use only QSUB or BSUB or SBATCH.

Please check your config file [ Pence_21_SEP_15params.yaml ] and provide correct value(s).

ianmisner commented on June 17, 2024

OK, I uncommented the

#batchSubCmd: qsub

line and now it's running. Thanks for the help!!!
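
The failure mode in this thread (a key the pipeline expects is commented out or missing) can be caught before launching a run. Below is a minimal sketch, assuming that `batchSubCmd`, `schedulerOpts` and `clusterOpts` are the keys expected by the version discussed here. That list is taken from the messages above, not from the lncRNApipe source, and the helper names are hypothetical.

```python
import re

# Assumption: keys reported in this thread as expected by the pipeline.
EXPECTED_KEYS = ("batchSubCmd", "schedulerOpts", "clusterOpts")


def active_keys(text):
    """Return the set of keys found on non-commented lines of a params file."""
    keys = set()
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and lines disabled with a leading '#'.
        if not stripped or stripped.startswith("#"):
            continue
        match = re.match(r"([A-Za-z_][\w-]*)\s*:", stripped)
        if match:
            keys.add(match.group(1))
    return keys


def missing_keys(text, expected=EXPECTED_KEYS):
    """List the expected keys that are absent or commented out."""
    present = active_keys(text)
    return [key for key in expected if key not in present]


if __name__ == "__main__":
    # Hypothetical fragment with batchSubCmd still commented out.
    sample = "scheduler: NONE\n#batchSubCmd: qsub\nschedulerOpts:\n - # none\n"
    print("Missing or commented-out keys:", missing_keys(sample))
```

This is a plain line scan rather than a YAML parse, so it only reports whether a key is present and active, not whether its value is valid.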

biocoder commented on June 17, 2024

Nice. I created a new release fixing the issues you were having.

biocoder commented on June 17, 2024

Did the pipeline work? I identified a bug where the software keeps waiting until the cufflinks job actually starts. I have fixed it in the latest version of the software, v1.0.7.

