nextflow-io / nf-validation Goto Github PK

View Code? Open in Web Editor NEW

47.0 5.0 19.0 3.01 MB

Params validation plugin for Nextflow pipelines

Home Page: https://nextflow-io.github.io/nf-validation/

License: Apache License 2.0

Groovy 99.01% Makefile 0.71% Shell 0.06% HTML 0.23%

json-schema nextflow plugin samplesheet validation

nf-validation's Introduction

A Nextflow plugin to work with validation of pipeline parameters and sample sheets.

Important

`nf-validation` has now been renamed to `nf-schema`.

The nf-validation plugin will not receive any future updates. Please update your pipelines to use nf-schema instead.

See https://github.com/nextflow-io/nf-schema for details.

This change was necessary to prevent older versions of nf-core pipelines from with unpinned plugin references from breaking when updating to the latest version of nf-validation. Please pin the version of nf-schema in your pipeline's nextflow.config file:

plugins { id '[email protected]' }

Introduction

This Nextflow plugin provides a number of functions that can be included into a Nextflow pipeline script to work with parameter and sample sheet schema. Using these functions you can:

📖 Print usage instructions to the terminal (for use with --help)
✍️ Print log output showing parameters with non-default values
✅ Validate supplied parameters against the pipeline schema
📋 Validate the contents of supplied sample sheet files
🛠️ Create a Nextflow channel with a parsed sample sheet

Supported sample sheet formats are CSV, TSV and YAML (simple).

Quick Start

Declare the plugin in your Nextflow pipeline configuration file:

plugins {
  id 'nf-validation'
}

This is all that is needed - Nextflow will automatically fetch the plugin code at run time.

Note

The snippet above will always try to install the latest version, good to make sure that the latest bug fixes are included! However, this can cause difficulties if running offline. You can pin a specific release using the syntax [email protected]

You can now include the plugin helper functions into your Nextflow pipeline:

include { validateParameters; paramsHelp; paramsSummaryLog; fromSamplesheet } from 'plugin/nf-validation'

// Print help message, supply typical command line usage for the pipeline
if (params.help) {
   log.info paramsHelp("nextflow run my_pipeline --input input_file.csv")
   exit 0
}

// Validate input parameters
validateParameters()

// Print summary of supplied parameters
log.info paramsSummaryLog(workflow)

// Create a new channel of metadata from a sample sheet
// NB: `input` corresponds to `params.input` and associated sample sheet schema
ch_input = Channel.fromSamplesheet("input")

Dependencies

Java 11 or later
https://github.com/everit-org/json-schema

Slack channel

There is a dedicated nf-validation Slack channel in the Nextflow Slack workspace.

Credits

This plugin was written based on code initially written within the nf-core community, as part of the nf-core pipeline template.

We would like to thank the key contributors who include (but are not limited to):

Júlia Mir Pedrol (@mirpedrol)
Nicolas Vannieuwkerke (@nvnieuwk)
Kevin Menden (@KevinMenden)
Phil Ewels (@ewels)

nf-validation's People

Contributors

Stargazers

Watchers

Forkers

rpetit3 jorgeaguileraseqera nvnieuwk robsyme mashehu mynahub koenbossers adamrtalbot louislenezet lukbut awgymer mattheww95 wackero fbdtemme giangtools ctuni nextflow-io

nf-validation's Issues

YAML format samplesheets are not properly vaildated

Discovered as a side effect of #120

yaml format samplesheets are currently not actually vaildated by the validateFile:

nf-validation/plugins/nf-validation/src/main/nextflow/validation/SchemaValidator.groovy

Lines 188 to 198 in ac5812a

    
           if(fileType == "yaml"){ 
        
               fileContent = new Yaml().load((samplesheetFile.text)) 
        
           } 
        
           else { 
        
               Map types = variableTypes(schemaFile.toString(), baseDir) 
        
               fileContent = samplesheetFile.splitCsv(header:true, strip:true, sep:delimiter) 
        
               fileContentCasted = castToType(fileContent, types) 
        
           } 
        
           if (validateFile(false, samplesheetFile.toString(), fileContentCasted, schemaFile.toString(), baseDir, s3PathCheck)) { 
        
               log.debug "Validation passed: '$samplesheetFile' with '$schemaFile'" 
        
           }

and

nf-validation/plugins/nf-validation/src/main/nextflow/validation/SchemaValidator.groovy

Lines 416 to 426 in ac5812a

    
           if(fileType == "yaml"){ 
        
               fileContent = new Yaml().load((file_path.text)) 
        
           } 
        
           else { 
        
               Map types = variableTypes(schema_name, baseDir) 
        
               fileContent = file_path.splitCsv(header:true, strip:true, sep:delimiter) 
        
               fileContentCasted = castToType(fileContent, types) 
        
           } 
        
           if (validateFile(useMonochromeLogs, key, fileContentCasted, schema_name, baseDir, s3PathCheck)) { 
        
               log.debug "Validation passed: '$key': '$file_path' with '$schema_name'" 
        
           }

This doesn't throw an error but an empty list is passed to validateFile in the yaml case instead of the fileContent.

The fix should be as simple as switching to fileContentCasted = new Yaml().load((samplesheetFile.text)).

Add support for excel samplesheets

Add a link to the slack channel in the docs

I think the docs would really benefit from a link to the Slack channel

When validating if a path exists, trap exception and add context

I'm using v0.1.0 of the validation plugin, running in NF tower on AWS. It's a somewhat complex environment and I don't own the Tower or AWS setups, and am running into frequent AWS access issues.

When an input parameter to the pipeline is a directory or file path, the plugin validates that those files exist. However, if there are any issues such as a lack of permissions, no routable network, etc. the plugin just bubbles the exception right up. And at least in the case of AWS S3 paths, you get a cryptic exception with a request ID, but no information about what path/URL was being accessed.

It would be nice for the plugin to trap any exceptions during file existence checks, and turn them into validation errors that include i) the full path being accessed and ii) the details of the exception. This way users could get errors for all paths at once, and easily see which paths are causing issues.

Support for simple arrays in YAML samplesheets

It would be great to be able to supply an arbitrary length array of files that is associated with just one other file like so:

---
- id: sample1
  normal: /path/to/some/file.bam
  tumour:
    - /path/to/tumour1.bam
    - /path/to/tumour2.bam
  cancer_type: aml

And then I would want to create some channel like: [[id: sample1], [normal_bam], [tumour_bams]]

In a perfect world that would be possible via fromSamplesheet but for now I would be happy with just having the samplesheet schema validation.

Add a new `exists` key

This key would tell the plugin to check if the files/directories specified by the corresponding parameters or samplesheet fields should be checked for existence.

This key would replace the file-path-exists, directory-path-exists and path-exists formats (since these would have to be added downstream to the tooling and to Tower.

Validation Error when trying to run test

Hi, I'm trying to implement the smrnaseq pipeline on an HPC but have been getting this error:

ERROR ~ a fault occurred in an unsafe memory access operation

Full terminal error:
nextflow run nf-core/smrnaseq -profile test,conda --outdir /gpfs/Cores/GSC/share/05_Scripts/nf-core
N E X T F L O W ~ version 23.04.3
Launching https://github.com/nf-core/smrnaseq [lonely_volhard] DSL2 - revision: nf-core/smrnaseq@18d6c84 [master]
ERROR ~ a fault occurred in an unsafe memory access operation

-- Check '.nextflow.log' file for details

nextflow version 23.04.03
HPC
Conda
Linux
nextflow.log

Specifying empty params (e.g. clusterOptions in profile) breaks with nf-validation 0.3.1

When running one of the pipelines using nf-validation with 0.3.1 and specifying the UPPMAX profile, it fails with an error:

pontus@lalage:/tmp$ nextflow run nf-core/atacseq -r 2.1.1 -profile uppmax
N E X T F L O W  ~  version 23.04.2
Launching `https://github.com/nf-core/atacseq` [nostalgic_euler] DSL2 - revision: 415795d3c1 [2.1.1]
ERROR ~ JSONObject["project"] not found.

 -- Check script '/home/pontus/.nextflow/assets/nf-core/atacseq/main.nf' at line: 52 or see '.nextflow.log' file for more details
pontus@lalage:/tmp$

after some digging, it seems to me this is because the profile sets that up in params, although with a null value.

It looks to me that is enough to have it seen by nf-validation, but when nf-validation tries to report it (

nf-validation/plugins/nf-validation/src/main/nextflow/validation/SchemaValidator.groovy

Line 315 in 01a3131

warnings << "* --${specifiedParam}: ${paramsJSON[specifiedParam]}".toString()

) it's not found in paramsJSON because that's based on the cleaned version (

nf-validation/plugins/nf-validation/src/main/nextflow/validation/SchemaValidator.groovy

Line 271 in 01a3131

def paramsJSON = new JSONObject(new JsonBuilder(cleanedParams).toString())

) which has removed anything evaluating to false.

A workaround (that makes sense) for now is to ignore the affected param, but this seems to be a bug that can be difficult to figure out and would be nice to have fixed.

When a input file is optional, `validateParameters()` should not raise an error

Problem

Consider the situation where a pipeline parameter is defined with an associated schema, such as the standard situation with the following input parameter in nextflow_schema.json:

                "input": {
                    "type": "string",
                    "format": "file-path",
                    "mimetype": "text/csv",
                    "pattern": "^\\S+\\.csv$",
                    "schema": "assets/schema_input.json",
                    "description": "Path to comma-separated file containing information about the samples in the experiment.",
                    "help_text": "You will need to create a design file with information about the samples in your experiment before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row.",
                    "fa_icon": "fas fa-file-csv"
                },

When this input parameter is optional, and nextflow calls the pipeline without defining params.input, the function validateParameters() throws an error:

ERROR ~ Argument of `file` function cannot be null

Desired behavior: no error is raised.

Versions:

[email protected]
Nextflow: 23.04.1

Nextflow log

Jun-28 17:45:18.871 [main] DEBUG nextflow.script.IncludeDef - Loading included plugin extensions with names: [validateParameters:validateParameters, paramsHelp:paramsHelp, paramsSummaryLog:paramsSummaryLog, fromSamplesheet:fromSamplesheet]; plugin Id: nf-validation
Jun-28 17:45:18.886 [main] DEBUG nextflow.validation.SchemaValidator - Starting parameters validation
Jun-28 17:45:18.979 [main] WARN  nextflow.validation.SchemaValidator - The following invalid input values have been detected:

* --schema_ignore_params: genomes


Jun-28 17:45:18.998 [main] DEBUG nextflow.Session - Session aborted -- Cause: Argument of `file` function cannot be null
Jun-28 17:45:19.007 [main] ERROR nextflow.cli.Launcher - Argument of `file` function cannot be null
java.lang.IllegalArgumentException: Argument of `file` function cannot be null
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
        at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
        at org.codehaus.groovy.reflection.CachedConstructor.invoke(CachedConstructor.java:72)
        at org.codehaus.groovy.reflection.CachedConstructor.doConstructorInvoke(CachedConstructor.java:59)
        at org.codehaus.groovy.runtime.callsite.ConstructorSite$ConstructorSiteNoUnwrap.callConstructor(ConstructorSite.java:84)
        at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallConstructor(CallSiteArray.java:59)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:263)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:277)
        at nextflow.Nextflow.file(Nextflow.groovy:99)
        at nextflow.Nextflow.file(Nextflow.groovy)
        at nextflow.validation.SchemaValidator.validateParameters(SchemaValidator.groovy:361)
        at nextflow.validation.SchemaValidator.validateParameters(SchemaValidator.groovy)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at nextflow.script.FunctionDef.invoke_a(FunctionDef.groovy:64)
        at nextflow.script.ComponentDef.invoke_o(ComponentDef.groovy:40)
        at nextflow.script.WorkflowBinding.invokeMethod(WorkflowBinding.groovy:102)
        at groovy.lang.GroovyObject$invokeMethod.call(Unknown Source)
        at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:148)
        at nextflow.script.BaseScript.invokeMethod(BaseScript.groovy:141)
        at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.callCurrent(PogoMetaClassSite.java:68)
        at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:51)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:171)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:176)
        at Script_d3adfc12.runScript(Script_d3adfc12:27)
        at nextflow.script.BaseScript.run0(BaseScript.groovy:145)
        at nextflow.script.BaseScript.run(BaseScript.groovy:192)
        at nextflow.script.ScriptParser.runScript(ScriptParser.groovy:229)
        at nextflow.script.ScriptRunner.run(ScriptRunner.groovy:224)
        at nextflow.script.ScriptRunner.execute(ScriptRunner.groovy:130)
        at nextflow.cli.CmdRun.run(CmdRun.groovy:368)
        at nextflow.cli.Launcher.run(Launcher.groovy:494)
        at nextflow.cli.Launcher.main(Launcher.groovy:653)

validateParameters raises warning when nextflow schema contains nested parameters

This issue is likely tied to the others involving nested parameters but it is worth documenting. If a nextflow_schema.json file contains a group of nested parameters you will get a warning involving access to undefined parameters even if those parameters are initialized.

A minimal example of this issue can be found here: https://github.com/mattheww95/bug_report

Support multiple input files via globbing or patterns

Nextflow has quite nice features for turning globs and patterns to channels of multiple files. E.g.:

Channel.fromPath("path/*.bam")                 // [ path/file1.bam, path/file2.bam ]
Channel.fromPath("path/{sample1,sample2}.bam") // [ path/sample1.bam, path/sample2.bam ]

I don't believe the schema supports this currently, except as a string. It would be helpful to support this to allow people to use the nf-validation tools for files but with the options of >1 file. It could coerce the object into a list of file objects or a channel, or leave it as a string so it can be passed to Channel.fromPath or Channel.fromFilePairs.

I'm not sure what the exact implementation would be @ewels suggested two options:

A new format option, e.g.: format: filePathPattern
An additional options for allowing multiple: allowMultiple: true

The use case would be people who want to run an analysis on >1 file but do not want to use a samplesheet, which carries a pretty significant overhead for simple pipelines.

ImmutableMap clashes with the `-resume` flag

When resuming a pipeline with ImmutableMap meta, following warnings are given:

WARN: [CMGG_CMGGSTRUCTURAL:CMGGSTRUCTURAL:BAM_REPEAT_ESTIMATION_EXPANSIONHUNTER:EXPANSIONHUNTER (1)] Unable to resume cached task -- See log file for details
WARN: [CMGG_CMGGSTRUCTURAL:CMGGSTRUCTURAL:BAM_REPEAT_ESTIMATION_EXPANSIONHUNTER:EXPANSIONHUNTER (4)] Unable to resume cached task -- See log file for details
WARN: [CMGG_CMGGSTRUCTURAL:CMGGSTRUCTURAL:BAM_REPEAT_ESTIMATION_EXPANSIONHUNTER:EXPANSIONHUNTER (2)] Unable to resume cached task -- See log file for details
WARN: [CMGG_CMGGSTRUCTURAL:CMGGSTRUCTURAL:BAM_REPEAT_ESTIMATION_EXPANSIONHUNTER:EXPANSIONHUNTER (3)] Unable to resume cached task -- See log file for details
WARN: [CMGG_CMGGSTRUCTURAL:CMGGSTRUCTURAL:BAM_REPEAT_ESTIMATION_EXPANSIONHUNTER:EXPANSIONHUNTER (5)] Unable to resume cached task -- See log file for details

With this as error in .nextflow.log:

com.esotericsoftware.kryo.KryoException: Unable to find class: nextflow.validation.ImmutableMap
	at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
	at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
	at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:641)
	at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:752)
	at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:143)
	at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:21)
	at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
	at com.esotericsoftware.kryo.Kryo$readClassAndObject$8.call(Unknown Source)
	at nextflow.util.KryoHelper.deserialize(SerializationHelper.groovy:181)
	at nextflow.util.KryoHelper.deserialize(SerializationHelper.groovy)
	at nextflow.util.KryoHelper$deserialize.call(Unknown Source)
	at nextflow.processor.TaskContext.deserialize(TaskContext.groovy:202)
	at nextflow.cache.CacheDB.getTaskEntry(CacheDB.groovy:88)
	at nextflow.processor.TaskProcessor.checkCachedOrLaunchTask(TaskProcessor.groovy:770)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.codehaus.groovy.runtime.callsite.PlainObjectMetaMethodSite.doInvoke(PlainObjectMetaMethodSite.java:48)
	at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:189)
	at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:57)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:203)
	at nextflow.processor.TaskProcessor.invokeTask(TaskProcessor.groovy:618)
	at nextflow.processor.InvokeTaskAdapter.call(InvokeTaskAdapter.groovy:52)
	at groovyx.gpars.dataflow.operator.DataflowOperatorActor.startTask(DataflowOperatorActor.java:120)
	at groovyx.gpars.dataflow.operator.ForkingDataflowOperatorActor.access$001(ForkingDataflowOperatorActor.java:35)
	at groovyx.gpars.dataflow.operator.ForkingDataflowOperatorActor$1.run(ForkingDataflowOperatorActor.java:58)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.ClassNotFoundException: nextflow.validation.ImmutableMap
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
	at java.base/java.lang.Class.forName0(Native Method)
	at java.base/java.lang.Class.forName(Class.java:467)
	at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:136)
	... 29 common frames omitted

Possible solution: try to find a way to make ImmutableMap available to the main flow

Feature request: allow quotes strings containing commas in .fromSamplesheet() for .csv files

At this moment, it is not possible to create a channel from a .csv file with the .fromSamplesheet() function if the .csv file contains a quoted field that contains a comma.

id,path,description
sample1,reads.sample1.fq.gz,"no problems here"
sample2,reads.sample2.fq.gz,"oh no, I cause issues"

The line in row 2 will be split on the no, as well, which is not the desired behavior.

It should be easy to add this functionality by adding the appropriate parameter value quote:true (https://www.nextflow.io/docs/latest/operator.html#splitcsv) to the following line:

nf-validation/plugins/nf-validation/src/main/nextflow/validation/SamplesheetConverter.groovy

Line 75 in b040ebb

    
           samplesheetList = fileSamplesheet.splitCsv(header:true, strip:true, sep:delimiter)

But perhaps it is even a good idea to allow all .splitCsv parameters to be passed through as well?

Default values of `0` (integer type) evaluate to false, causing `null` to be used instead of `0` during the samplesheet conversion

https://nfcore.slack.com/archives/C056RQB10LU/p1691317883677639

AWS error when iGenomes path is defined

Hi,

I'm getting the following error when a default iGenomes S3 path is defined:

ERROR ~ The AWS Access Key Id you provided does not exist in our records. (Service: Amazon S3; Status Code: 403; Error Code: InvalidAccessKeyId; Request ID: 73HW8YCDVSYJPDES; S3 Extended Request ID: 4USexpHHlbBBFqNCoadx9LEGKuMBhWsUfrdZFAZUaYVnC3bPDGWfoC8EaEY82Cr70hlgxjc9xrs=; Proxy: null)

I suppose this has something to do with the plugin trying to stage the data before all configs are loaded or something.

Have you encountered this before and what was the solution?
I tested this on a newly generated workflow.

Cheers
Matthias

When parsing samplesheets, group values into ordered lists

It's a common pattern in nf-core modules to use an ordered list instead of tuple elements. i.e. instead of:

input:
    tuple val(meta), path(input1), path(input2)

You use:

input:
    tuple val(meta), path(inputs)

We could incorporate some method of parsing the schema to use in .fromSamplesheet, e.g.:

{
    // rest of schema above
    "lists": [
        "fastq_1",
        "fastq_2"
    ]
}

Would coerce to an ordered list, with nullable values not being included.

Any chance of making release notes on GitHub for recent (and future) releases?

I, and I'm sure others, would greatly appreciate release notes with official releases of the plugin. Given the policy/practice of merging PRs without squashing it is really hard for an outsider to look through the commit history and see what's changed between versions.

Or a CHANGELOG.md or a section in the README.

The link in the readme is broken

When open the link https://nextflow-io.github.io/nf-validation this page is displayed

Move params.monochrome_logs to local scope

Currently, functions like paramsSummaryLog uses params.monochrome_logs to decide whether to make the logs coloured. I think it's cleaner to move this to local scope and provide it as an input to the function. E.g., from:

// params.monochrom_logs = true
paramsSummaryLog()

to:

// params.monochrom_logs = true
paramsSummaryLog(monochrome = params.monochrome_logs)

This would allow pipeline developers to be more flexible about how to handle these things and not rely on a global object. For all we know, someone is out there studying monochrome logs with Nextflow and is frustrated by the confusing parameter.

ImmutableMap becomes a LinkedHashMap after passing through a process

=> Is probably a nextflow issue, but I need to investigate this further

Very slow validation for single-end fastq samplesheets

Hello. I have been working on how to make use of the fromSamplesheet function to validate a samplesheet using the assets/schema_input.json file and create a channel of input data. However, I have been encountering a large difference in the time it takes to validate a samplesheet containing paired-end fastq files and single-end fastq files when using nf-validator version 0.3.1.

I have written up a method to reproduce the issue at https://github.com/apetkau/from-samplesheet-test-nf, but in brief you can run:

# Get total runtime of pipeline on a samplesheet
time nextflow run apetkau/from-samplesheet-test-nf -r main --input https://raw.githubusercontent.com/apetkau/from-samplesheet-test-nf/main/samplesheet.pe.30.csv

# Get time for validating samplesheet against schema_input.json from logs
grep 'Starting validation' -A1 .nextflow.log

You can replace the samplesheet.pe.30.csv with the listed files in the below table to run the other cases.

Type	Samplesheet	Number of samples	Total runtime for example pipeline	Time for validating samplesheet with `schema_input.json`
Paired-end	samplesheet.pe.30.csv	30	5 seconds	< 1 second
Paired-end	samplesheet.pe.60.csv	60	6 seconds	< 1 second
Single-end	samplesheet.se.30.csv	30	30 seconds	25 seconds
Single-end	samplesheet.se.60.csv	60	387 seconds	382 seconds

That is, validating the samplesheet against the schema_input.json file appears roughly constant as samples increase for paired-end samplesheets, but for single-end samplesheets going from 30 to 60 samples increases the validation time by a factor of 15x.

I am wondering if someone could help me to sort out this issue?

Thanks so much. And thanks for the amazing software. It's helped me out in my work 😄

Make the meta map immutable

The meta map should be made immutable to force pipeline developers to avoid in place map modifications according to the bytesize on meta map handling

Samplesheet fields that are not defined in samplesheet schema should be silently ignored

Problem

When using Channel.fromSamplesheet , I get warnings when there are fields present in a samplesheet that are not defined in the associated json schema:

WARN: Unable to cast value dummy_flowcell to type null: java.lang.NullPointerException: Cannot invoke "String.toLowerCase()" because "type" is null
WARN: Unable to cast value BC01 to type null: java.lang.NullPointerException: Cannot invoke "String.toLowerCase()" because "type" is null

My understanding is that these undefined fields should be silently ignored

Versions:

[email protected]
Nextflow: 23.04.1

Input & schema definitions

Samplesheet: https://raw.githubusercontent.com/koenbossers/testdata/main/nextflow_precisionfoodsafety/test_samplesheet_minimal.csv

The fields barcode_short and flowcell_id_short are not defined, and those values trigger the warnings.

Relevant section of nextflow_schema.json

    "definitions": {
        "input_output_options": {
            "title": "Input/output options",
            "type": "object",
            "fa_icon": "fas fa-terminal",
            "description": "Define where the pipeline should find input data and save output data.",
            "required": ["input", "outdir"],
            "properties": {
                "input": {
                    "type": "string",
                    "format": "file-path",
                    "mimetype": "text/csv",
                    "pattern": "^\\S+\\.csv$",
                    "schema": "assets/schema_input.json",
                    "description": "Path to comma-separated file containing information about the samples in the experiment.",
                    "help_text": "You will need to create a design file with information about the samples in your experiment before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row.",
                    "fa_icon": "fas fa-file-csv",
                    "exists": true
                },

Contents of assets/schema_input.json

{
    "$schema": "http://json-schema.org/draft-07/schema",
    "$id": "https://raw.githubusercontent.com/lcab/precisionfoodsafety/master/assets/schema_input.json",
    "title": "lcab/precisionfoodsafety pipeline - params.input schema",
    "description": "Schema for the file provided with params.input",
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "sample": {
                "type": "string",
                "pattern": "^\\S+$",
                "errorMessage": "Sample name must be provided and cannot contain spaces",
                "meta": ["sample_id"],
                "unique": ["flowcell_id", "barcode"]
            },
            "flowcell_id": {
                "type": "string",
                "pattern": "^\\S+$",
                "errorMessage": "Sample name must be provided and cannot contain spaces",
                "meta": ["flowcell_id"]
            },
            "barcode": {
                "type": "string",
                "pattern": "^barcode[0-9][0-9]",
                "errorMessage": "Must be provided and of form 'barcode[00..99]",
                "meta": ["barcode"]
            },
            "location": {
                "anyOf": [
                    {
                        "type": "string",
                        "format": "directory-path",
                        "exists": true
                    },
                    {
                        "type": "string",
                        "format": "file-path",
                        "exists": true,
                        "pattern": "^\\S+\\.f(ast)?q[\\.gz]{0,}$"
                    }
                ]
            }
        },
        "required": ["sample", "flowcell_id", "barcode", "location"]
    }
}

Is there a way to express in the schema that a file-path is required, but the path it points to is allowed to not exist?

As best I can tell, when specifying an input in my schema as a file-path or directory-path, the plugin will validate that any path given already exists. Is there a way to specify for a given parameter that the target may not exist?

Note that this is different than required/not-required. This comes up mostly for output paths, where the path may not exist yet and the pipeline is expected to create it.

Default to report an error if the rows in a sample sheet are not unique

Processing the same sample twice is usually not desired. We should add a new functionality which throws an error if rows in a sample sheet are not unique.
Add a new flag to disable this behaviour.
A sample sheet can contain additional entries which are ignored if they aren't present in the JSON schema. We should also ignore them and apply the unique only for the combination of all entries present in the schema.

Use standard JSON schema syntax for `dependentRequried`

In our sample sheet schema we write:

"sample": {
    "dependentRequired": ["fastq_1"]
}

While standard JSON schema is:

"dependentRequired": {
    "sample": ["fastq_1"]
 }

We should stick to the standard syntax.

boolean entries in parameter schema should always default to false if unspecified

Appears to be a bug in the plugin for automatically setting boolean values to false if they aren't defined in the schema.

Following on from convo here in Slack

All boolean parameters that don't have a default specified e.g.: here

Are now being printed to screen indicating they are different from the default values set by the pipeline:

------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/rnaseq v3.13.0dev
------------------------------------------------------
Core Nextflow options
  runName                         : small_dubinsky
  containerEngine                 : docker
  launchDir                       : /home/harshil/repos/nf-core/rnaseq
  workDir                         : /home/harshil/repos/nf-core/rnaseq/work
  projectDir                      : /home/harshil/repos/nf-core/rnaseq
  userName                        : harshil
  profile                         : test,docker
  configFiles                     : 

Input/output options
  input                           : https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/samplesheet/v3.10/samplesheet_test.csv
  outdir                          : ./results
  save_merged_fastq               : false

UMI options
  with_umi                        : false
  skip_umi_extract                : false
  umitools_bc_pattern             : NNNN
  umitools_dedup_stats            : false
  save_umi_intermeds              : false

Read filtering options
  bbsplit_fasta_list              : https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/reference/bbsplit_fasta_list.txt
  save_bbsplit_reads              : false
  skip_bbsplit                    : false
  remove_ribo_rna                 : false
  save_non_ribo_reads             : false

Reference genome options
  fasta                           : https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/reference/genome.fasta
  gtf                             : https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/reference/genes.gtf.gz
  gff                             : https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/reference/genes.gff.gz
  transcript_fasta                : https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/reference/transcriptome.fasta
  additional_fasta                : https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/reference/gfp.fa.gz
  hisat2_index                    : https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/reference/hisat2.tar.gz
  rsem_index                      : https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/reference/rsem.tar.gz
  salmon_index                    : https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/reference/salmon.tar.gz
  gencode                         : false
  save_reference                  : false
  igenomes_ignore                 : false

Read trimming options
  skip_trimming                   : false
  save_trimmed                    : false

Alignment options
  pseudo_aligner                  : salmon
  bam_csi_index                   : false
  star_ignore_sjdbgtf             : false
  stringtie_ignore_gtf            : false
  save_unaligned                  : false
  save_align_intermeds            : false
  skip_markduplicates             : false
  skip_alignment                  : false
  skip_pseudo_alignment           : false

Process skipping options
  skip_bigwig                     : false
  skip_stringtie                  : false
  skip_fastqc                     : false
  skip_dupradar                   : false
  skip_qualimap                   : false
  skip_rseqc                      : false
  skip_biotype_qc                 : false
  skip_deseq2_qc                  : false
  skip_multiqc                    : false
  skip_qc                         : false

Institutional config options
  config_profile_name             : Test profile
  config_profile_description      : Minimal test dataset to check pipeline function

Max job request options
  max_cpus                        : 2
  max_memory                      : 6.GB
  max_time                        : 6.h

Generic options
  help                            : false
  version                         : false
  plaintext_email                 : false
  monochrome_logs                 : false
  validationShowHiddenParams      : false
  validationFailUnrecognisedParams: false
  validationLenientMode           : false

This wasn't the case before when all of the schema validation code was in the pipeline template in lib/.

To add to this we don't have default values in the schema for anything else that evaluates to false in Groovy e.g. null , '' , 0 so we need to fix this for boolean values for consistency.

Add more file options as formats

Currently, all files are checked for existence, but that is not always ideal.

To solve this I propose changing the supported formats from file-path and directory-path to these six formats:

file-path: Simply state that this is a file without checking its existence
file-path-exists: Also check for the existence of this file
directory-path: Simply state that this is a directory without checking its existence
directory-path-exists: Also check for the existence of this directory
path: Simply state that this is a file or directory without checking its existence
path-exists: Also check for the existence of this file or directory

"exists" should only check if a path exists if the value is provided

When a parameter is not provided but "exists" is true, the plugin checks the default returning an error:
* The file or directory 'null' does not exist.

Handle no header samplesheet

cf https://raw.githubusercontent.com/nf-core/test-datasets/fetchngs/sra_ids_test.csv

Warning for undefined parameter `help`

With Nextflow 23.10.1 I get the following warning:

WARN: Access to undefined parameter `help` -- Initialise it to a default value eg. `params.help = some_value`

Where and how should I initialize help parameter?

Add support for JSON samplesheets

Include specific error incorrect schema location fromSamplesheet

Specifying an existing path to schema_filename by including $projectDir gives a cryptic error message. It would be good if the error message gave more relevant information, like removing that part.

For instance,

ch_samplesheet = Channel.fromSamplesheet(
        "input",
        schema_filename: "$projectDir/assets/schema_input.json"
    )

Will throw this error message :

...

ERROR ~ Cannot invoke "java.util.Map.containsKey(Object)" because "samplesheetValue" is null

...

Meta issue

When no meta fields are in the JSON schema, an empty map will be created. This is obviously not meant to happen and should be fixed

Example in the QuickStart section has a syntax error

I copy/pasted the example in the quick start to get up and running, and immediately got a NF error on the include line. Editing it to use semi-colons instead of commas, as follows, worked for me:

include { validateParameters; paramsHelp; paramsSummaryMap; paramsSummaryLog; validateAndConvertSamplesheet } from 'plugin/nf-validation'

Update validator library to erosb/json-sKema

The library that we use for schema validation https://github.com/everit-org/json-schema is deprecated and should be replaced by https://github.com/erosb/json-sKema#installation

This new library solves some bugs reported though slack, thanks @sralchemab for bringing this up!

Unexpected error when a parameter isn't in the schema

When a parameter has been set in the nextflow.config file, but not yet in the schema, the pipeline will fail with this weird error (project is the name of the parameter added to the config):

N E X T F L O W  ~  version 23.04.0
Launching `main.nf` [sick_boltzmann] DSL2 - revision: bfa6b06e83
ERROR ~ JSONObject["project"] not found.

I think it would be best if a warning is returned instead stating that the parameter isn't in the schema instead of failing the pipeline entirely. (@mirpedrol?)

Example config and/or make sure all plugin params are treated as expected

It was relatively easy to get the plugin up and running, but I got a bunch of NF warnings about access to undefined params, for params the plugin used (e.g. help). I thought it would be easy to resolve this, but it was more confusing and time consuming that I thought.

What I ended up with is the following in my config:

params.schema_ignore_params = "help,monochrome_logs,validationLenientMode,validationFailUnrecognisedParams"
params.monochrome_logs = false
params.help = false
params.validationLenientMode = true
params.validationFailUnrecognisedParams = false

Originally I had thought that schema_ignore_params took a list, e.g.:

params.schema_ignore_params =["help" ,"monochrome_logs", "validationLenientMode","validationFailUnrecognisedParams"]

However, when I did that I got cryptic error messages on the console like:

org.json.JSONException: JSONObject["help"] not found.

Digging in the NF log file I found:

org.json.JSONException: JSONObject["help"] not found.
        at org.json.JSONObject.get(JSONObject.java:587)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
        at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
        at groovy.lang.MetaClassImpl.getProperty(MetaClassImpl.java:1956)
        at groovy.lang.MetaClassImpl.getProperty(MetaClassImpl.java:3843)
        at org.codehaus.groovy.runtime.InvokerHelper.getProperty(InvokerHelper.java:199)
        at org.codehaus.groovy.runtime.DefaultGroovyMethods.getAt(DefaultGroovyMethods.java:421)
        at nextflow.validation.SchemaValidator.validateParameters(SchemaValidator.groovy:206)
        at nextflow.validation.SchemaValidator.validateParameters(SchemaValidator.groovy)
        ...

It looks to me like in SchemaValidator that it should also add help and monochrome_logs to the set of expectedParams?

I think it would also be helpful to have an example config snipped like the above in the README, and/or make it clear that schema_ignore_params is intended to be a CSV string.

Support JSON schema draft 2020-12 only

This has been mostly completed in #141
I created this issue for tracking purposes

`number` type in `fromSamplesheet()` doesn't work for floats

When a float is supplied and a number type has been used in the template, following error occurs:

java.lang.NumberFormatException: For input string: "0.01"
	at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67)
	at java.base/java.lang.Integer.parseInt(Integer.java:668)
	at java.base/java.lang.Integer.valueOf(Integer.java:999)
	at org.codehaus.groovy.runtime.StringGroovyMethods.toInteger(StringGroovyMethods.java:3134)
	at org.codehaus.groovy.runtime.StringGroovyMethods.asType(StringGroovyMethods.java:192)
	at nextflow.extension.Bolts.asType(Bolts.groovy:447)
	at jdk.internal.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.codehaus.groovy.runtime.metaclass.ReflectionMetaMethod.invoke(ReflectionMetaMethod.java:54)
	at org.codehaus.groovy.runtime.metaclass.NewInstanceMetaMethod.invoke(NewInstanceMetaMethod.java:54)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1258)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
	at org.codehaus.groovy.runtime.InvokerHelper.invokePojoMethod(InvokerHelper.java:1024)
	at org.codehaus.groovy.runtime.InvokerHelper.invokeMethod(InvokerHelper.java:1015)
	at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodN(ScriptBytecodeAdapter.java:180)
	at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.asType(ScriptBytecodeAdapter.java:603)
	at nextflow.validation.SamplesheetConverter.transform(SamplesheetConverter.groovy:272)
	at nextflow.validation.SamplesheetConverter.access$0(SamplesheetConverter.groovy)
	at nextflow.validation.SamplesheetConverter$_convertToList_closure4.doCall(SamplesheetConverter.groovy:167)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:274)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
	at groovy.lang.Closure.call(Closure.java:412)
	at groovy.lang.Closure.call(Closure.java:428)
	at org.codehaus.groovy.runtime.DefaultGroovyMethods.collect(DefaultGroovyMethods.java:3601)
	at org.codehaus.groovy.runtime.DefaultGroovyMethods.collect(DefaultGroovyMethods.java:3586)
	at org.codehaus.groovy.runtime.DefaultGroovyMethods.collect(DefaultGroovyMethods.java:3686)
	at nextflow.validation.SamplesheetConverter.convertToList(SamplesheetConverter.groovy:87)
	at nextflow.validation.SchemaValidator.fromSamplesheet(SchemaValidator.groovy:202)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at nextflow.plugin.extension.ChannelFactoryInstance.invoke0(ChannelFactoryInstance.groovy:68)
	at nextflow.plugin.extension.ChannelFactoryInstance.invokeExtensionMethod(ChannelFactoryInstance.groovy:82)
	at nextflow.plugin.extension.PluginExtensionProvider.invokeFactoryExtensionMethod(PluginExtensionProvider.groovy:289)
	at nextflow.Channel.$static_methodMissing(Channel.groovy:81)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
	at groovy.lang.MetaClassImpl.invokeStaticMissingMethod(MetaClassImpl.java:1572)
	at groovy.lang.MetaClassImpl.invokeStaticMethod(MetaClassImpl.java:1560)
	at groovy.lang.DelegatingMetaClass.invokeStaticMethod(DelegatingMetaClass.java:154)
	at org.codehaus.groovy.runtime.callsite.StaticMetaClassSite.call(StaticMetaClassSite.java:50)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:148)
	at Script_92adc13a$_runScript_closure1$_closure3.doCall(Script_92adc13a:329)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:274)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
	at groovy.lang.Closure.call(Closure.java:412)
	at groovy.lang.Closure.call(Closure.java:406)
	at nextflow.script.WorkflowDef.run0(WorkflowDef.groovy:204)
	at nextflow.script.WorkflowDef.run(WorkflowDef.groovy:188)
	at nextflow.script.BindableDef.invoke_a(BindableDef.groovy:51)
	at nextflow.script.ComponentDef.invoke_o(ComponentDef.groovy:40)
	at nextflow.script.WorkflowBinding.invokeMethod(WorkflowBinding.groovy:102)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeOnDelegationObjects(ClosureMetaClass.java:408)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:350)
	at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.callCurrent(PogoMetaClassSite.java:61)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:51)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:171)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:176)
	at Script_c605a15e$_runScript_closure1$_closure3.doCall(Script_c605a15e:74)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:274)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
	at groovy.lang.Closure.call(Closure.java:412)
	at groovy.lang.Closure.call(Closure.java:406)
	at nextflow.script.WorkflowDef.run0(WorkflowDef.groovy:204)
	at nextflow.script.WorkflowDef.run(WorkflowDef.groovy:188)
	at nextflow.script.BindableDef.invoke_a(BindableDef.groovy:51)
	at nextflow.script.ComponentDef.invoke_o(ComponentDef.groovy:40)
	at nextflow.script.WorkflowBinding.invokeMethod(WorkflowBinding.groovy:102)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeOnDelegationObjects(ClosureMetaClass.java:408)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:350)
	at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.callCurrent(PogoMetaClassSite.java:61)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:51)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:171)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:176)
	at Script_c605a15e$_runScript_closure2$_closure4.doCall(Script_c605a15e:88)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:274)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
	at groovy.lang.Closure.call(Closure.java:412)
	at groovy.lang.Closure.call(Closure.java:406)
	at nextflow.script.WorkflowDef.run0(WorkflowDef.groovy:204)
	at nextflow.script.WorkflowDef.run(WorkflowDef.groovy:188)
	at nextflow.script.BindableDef.invoke_a(BindableDef.groovy:51)
	at nextflow.script.ChainableDef$invoke_a.call(Unknown Source)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:139)
	at nextflow.script.BaseScript.run0(BaseScript.groovy:183)
	at nextflow.script.BaseScript.run(BaseScript.groovy:192)
	at nextflow.script.ScriptParser.runScript(ScriptParser.groovy:229)
	at nextflow.script.ScriptRunner.run(ScriptRunner.groovy:224)
	at nextflow.script.ScriptRunner.execute(ScriptRunner.groovy:130)
	at nextflow.cli.CmdRun.run(CmdRun.groovy:368)
	at nextflow.cli.Launcher.run(Launcher.groovy:494)
	at nextflow.cli.Launcher.main(Launcher.groovy:653)

The problem is that nf-validation also tries to cast a number type as an Integer, which isn't correct.

Parameter validation interface

The aim of this project is to define a common interface for Nextflow pipeline parameters validation based on the nf-core JSON schema that can be applied to any pipeline without the need to programmatically apply it in the pipeline code.

In the current validation, method signature takes the params map the JSON schema file and return an array on invalid params.

nf-validation/src/main/nextflow/validation/NfcoreSchema.groovy

Line 19 in 7423ffa

private static ArrayList validateParameters(params, jsonSchema) {

@KevinMenden @ewels How are you using the returned array?

Use camelCase for validation parameters

The validation plugin uses the parameters:

show_hidden_params
fail_unrecognised_params
lenient_mode

Considering Nextflow follows the camelCase pattern for parameters instead of underscore separation, it would be nicer to use camelCase also for those.

As related notes, it may be convenient to prefix those params with a common prefix e.g. validationParams to prevent potential conflicts with the pipeline parameters

Is this an official plugin, yet?

@pditommaso , or anyone else, is this going to be an official plugin at some point? If it already is, are the instructions for using it from just the nextflow command? I don't see this plugin in the list, here:
https://github.com/nextflow-io/plugins/blob/main/plugins.json

I could try to make a docs submission if this is officially supported, and I get a little direction.

feature request: check existance of format 'file-path' fields in samplesheet

I am using nf-validation 0.3.1 and nextflow version 23.04.1.5866

Unless I am mistaken, adding the "file-path" format does not cause that file to be checked for existence. I believe this is what happens when the type in the NF schema is a "file-path".

For example, if I put this in the schema_input.json:

{
...
            },
            "fastq_1": {
                "type": "string",
                "format": "file-path",
                "pattern": "^\\S+\\.f(ast)?q\\.gz$",
                "errorMessage": "FastQ file for reads 1 must be provided, cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'"
            },
           ...
}

fromSamplesheet correctly parses the samplesheet, but doesn't check that the path in fastq_1 actually exists, in my testing.

Validate a file included in the params JSON Schema

When a parameter has a field schema which is the path to a JSON Schema, this schema is automatically used to validate the file given by this parameter.

nf-validation/plugins/nf-validation/src/testResources/nextflow_schema_with_samplesheet.json

Line 20 in 7c5d63b

"schema": "src/testResources/samplesheet_schema.json",

Clarify error when wrong JSON schema draft was used

This error should really contain the name of the schema file it's talking about:

ERROR ~ Failed to load the meta schema:
The used schema draft (http://json-schema.org/draft-07/schema) is not correct, please use "https://json-schema.org/draft/2020-12/schema" instead.
See here for more information: https://json-schema.org/specification#migrating-from-older-drafts

AWS Healthomics doesn't recognise this plugin

Hi, i'm trying to run this gwas workflow https://github.com/genepi/nf-gwas by porting it into AWS Healthomics and running it on some test data. guide: https://catalog.us-east-1.prod.workshops.aws/workshops/76d4a4ff-fe6f-436a-a1c2-f7ce44bc5d17/en-US/workshop/create-healthomics-workflow

However, I am getting this error from AWS:

Jan-24 08:42:55.367 [main] ERROR nextflow.cli.Launcher - @unknown
nextflow.exception.IllegalModulePath: Module path must start with / or ./ prefix -- Offending module: plugin/nf-validation

it can't seem to fetch the plugin, which was specified here:
https://github.com/genepi/nf-gwas/blob/606ffe88f325de77fcdc799b8b090238920febc5/nextflow.config#L171
& used here:
https://github.com/genepi/nf-gwas/blob/606ffe88f325de77fcdc799b8b090238920febc5/main.nf#L13

what gives? is there a way for me to add this plugin into my git repository and import it locally then?

thanks

nf-validation fails when number parameters are set to 0

Hello, in https://github.com/nf-core/differentialabundance, I noticed that setting the following parameter:

"filtering_min_abundance": {
                    "type": "number",
                    "default": 1,
                    "description": "Minimum abundance value",
                    "fa_icon": "fas fa-compress-alt"
                }

to 0 will throw this error:

ERROR ~ ERROR: Validation of pipeline parameters failed!

 -- Check '.nextflow.log' file for details
The following invalid input values have been detected:

* Missing required parameter: --filtering_min_abundance

The problem is also not solved by setting 0.0; zero will not be recognized as a valid number.

It can easily be reproduced with e.g. the following command:
nextflow run nf-core/differentialabundance -profile docker,test --outdir test_zero --filtering_min_abundance 0

Confused

The first sentence in the readme confuses me:

nf-validation/README.md

Lines 3 to 4 in 0a1718e

    
           This plugins implement a validation Nextlow pipeline parameters 
        
           based on [nf-core JSON schema](https://nf-co.re/pipeline_schema_builder).

	if(fileType == "yaml"){
	fileContent = new Yaml().load((samplesheetFile.text))
	}
	else {
	Map types = variableTypes(schemaFile.toString(), baseDir)
	fileContent = samplesheetFile.splitCsv(header:true, strip:true, sep:delimiter)
	fileContentCasted = castToType(fileContent, types)
	}
	if (validateFile(false, samplesheetFile.toString(), fileContentCasted, schemaFile.toString(), baseDir, s3PathCheck)) {
	log.debug "Validation passed: '$samplesheetFile' with '$schemaFile'"
	}

	if(fileType == "yaml"){
	fileContent = new Yaml().load((file_path.text))
	}
	else {
	Map types = variableTypes(schema_name, baseDir)
	fileContent = file_path.splitCsv(header:true, strip:true, sep:delimiter)
	fileContentCasted = castToType(fileContent, types)
	}
	if (validateFile(useMonochromeLogs, key, fileContentCasted, schema_name, baseDir, s3PathCheck)) {
	log.debug "Validation passed: '$key': '$file_path' with '$schema_name'"
	}

	This plugins implement a validation Nextlow pipeline parameters
	based on [nf-core JSON schema](https://nf-co.re/pipeline_schema_builder).

nextflow-io / nf-validation Goto Github PK

nf-validation's Introduction

nf-validation has now been renamed to nf-schema.

Introduction

Quick Start

Dependencies

Slack channel

Credits

nf-validation's People

Contributors

Stargazers

Watchers

Forkers

nf-validation's Issues

Problem

Versions:

Nextflow log

Problem

Versions:

Input & schema definitions

Recommend Projects

Recommend Topics

Recommend Org

`nf-validation` has now been renamed to `nf-schema`.