
pennlinc / xcpengine


Official public repository for the XCP Engine. This tool is deprecated in favor of XCP-D and ASLPrep.

License: MIT License

Shell 60.84% R 16.23% MATLAB 0.97% TeX 0.73% Dockerfile 0.53% Python 19.25% AMPL 0.57% Roff 0.89%

xcpengine's Introduction

The BBL image processing umbrella


Deprecation Notice

xcpEngine is no longer supported. xcpEngine is almost entirely configurable -- including configurations that make no sense and would not pass peer review. Instead of maintaining this complex and potentially dangerous configurability, the most widely used fMRI postprocessing workflows from xcpEngine are available (including for surface data) and rigorously tested and supported in XCP-D.

For ASL preprocessing, we recommend switching to ASLPrep.

Overview

xcpEngine provides a configurable, modular, agnostic, and multimodal platform for neuroimage processing and quality assessment. It implements a number of high-performance denoising approaches and computes regional and voxelwise values of interest for each modality. The system provides tools for calculating functional connectivity after preprocessing has been run in FMRIPREP, as well as a standard toolkit for benchmarking pipeline performance. All pipelines from our 2017 benchmarking paper are implementable, as are the pipelines evaluated in the recent work of Parkes and colleagues.

Documentation

Detailed documentation is accessible at the pipeline's documentation hub.

If you experience any difficulties, we suggest that you search or post to the issues forum of the pipeline's GitHub repository.

xcpengine's People

Contributors

a3sha2, adrose, ariekahn, dhasegan, llevitis, mattcieslak, mwvoss, njheimbach, rciric, sattertt, smeisler, tinashemtapera, tsalo, utooley


xcpengine's Issues

Deprecate config

The xcpConfig system was never intended to be permanent. It will be replaced by a (hopefully graphical) user interface that permits configuration of inter alia:

  • pipeline/design
  • atlas
  • space/template

and potentially also retrieval of and interaction with pipeline output.

Derivative metadata → JSON

With network metadata successfully shifted to JSON format (#39), the next (more ambitious) stage of the ACCELERATOR programme will additionally demand shifting derivative neuroimages to JSON. This will permit the processing stream to interface with derivatives in a more intelligent manner.

Functionality (a usage sketch follows the list):

  • derivative : declare a new derivative and append it to the current list
  • derivative_config : add a new property to the current derivative (e.g., space, module provenance, statistic for ROI-wise quantification, type -- timeseries or map -- etc.)
  • load_derivatives : load all derivatives for iterative processing
  • derivative_parse : parse a derivative's metadata into variables that can be used by the module
  • write_derivative : update the index of derivatives to reflect all processing
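
A hypothetical sketch of how this vocabulary might look in practice, assuming a jq-backed JSON index (see the JSON dependency issue below); the field names, file layout, and output variables here are illustrative assumptions, not a final specification:

# Hypothetical entry in ${prefix}_derivatives.json:
# { "meanIntensity": { "Map":        "prestats/sub-01_meanIntensity.nii.gz",
#                      "Space":      "MNI152",
#                      "Provenance": "prestats",
#                      "Statistic":  "mean",
#                      "Type":       "map" } }

# derivative_parse might then reduce to thin jq lookups, exporting the
# parsed fields as variables for the calling module:
derivative_parse() {
   local name=$1 index=$2
   d_map=$(jq       -r ".\"${name}\".Map"       "${index}")
   d_space=$(jq     -r ".\"${name}\".Space"     "${index}")
   d_statistic=$(jq -r ".\"${name}\".Statistic" "${index}")
}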

Structural Module

Updates requested for SM:

1. Add structural metrics, starting with CC and extending to QAP metrics.
2. Rename the rescaled GMD image.
3. Add JLF/intersect capabilities.

Config issues

Getting some sed errors when running the config process.

Now, provide a name for the current analysis project.
This will also be the name of the design file.
Name: generic_pipeline
Preparing design file...
sed: can't read 1: No such file or directory
sed: can't read 1: No such file or directory

Network metadata → JSON

The network metadata file will be shifted to JSON format; thus, the functions that interface with this file will need to be modified accordingly. With the broad functionalisation of XCP routines, this transition should be simplified. This will additionally provide a platform to evaluate the feasibility of a shift to JSON.

Functionality:

  • network : declare a new network
  • network_add : add a new object to a network (e.g., map, space, timeseries, adjacency matrix)
  • network_parse : parse a network's metadata into variables that can be used by the module
  • load_networks : load all networks for potential iteration and for parsing

Leaky functions: localise all variables except output

Functions should be well-encapsulated; they should not declare any unnecessary non-local variables. Any non-local variables declared must be catalogued as outputs in the function documentation. All existing functions should be reviewed to ensure compliance.
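
A minimal bash sketch of the convention (fslnvols is used purely for illustration):

# Leaky: nvols escapes into the caller's scope.
count_volumes_leaky() {
   nvols=$(fslnvols "$1")     # pollutes the global namespace
   echo "${nvols}"
}

# Encapsulated: everything except the echoed output is local.
count_volumes() {
   local img=$1 nvols
   nvols=$(fslnvols "${img}")
   echo "${nvols}"            # the only value that escapes
}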

Error using seed module on outside data

I'm trying to use the seed module to do some seed-based connectivity on HCP volume data. The data is already cleaned and warped to MNI152_2mm space, so I shouldn't need any registrations/warps, and I don't have the structural images or any AntsCT folders for these subjects.

I made my cohort file which consists just of subID,/path/to/BOLD

In setting up the design I tried two options. At the beginning it asks which template you will warp to, so the first time I said MNI152_2mm, since that's the space I'm already in, but when I ran the seed module I got the following errors.

Skipping analysis for:

  • 214423,/data/jag/cnds/jaredz/hcpN100Sample/subImages//214423/MNI/214423_concat4_rfMRI_REST_hp2000_clean_.nii.gz
    for the following reasons.
    ::XCP-ERROR: Unable to locate functional timeseries.
    ::XCP-ERROR: Unable to locate structural image.
    ::XCP-ERROR: Unable to locate warp.
    ::XCP-ERROR: Unable to locate inverse warp.

Then I tried to say I wouldn't be doing any normalization in my analysis and I got only the ::XCP-ERROR: Unable to locate structural image.

Is there a way to get the seed module to run on outside data that's already processed and normalized?

The two design files used are attached, hcpN100*.dsn is the first, and sgACC*.dsn is the second (nvm GitHub won't seem to let me attach them, email, slack or reply here and I'll get you any files you need)

Thanks,
Jared

Cohort uniquify error

As reported by @adrose

...Constructing a pipeline based on user specifications...
* ./pcasl_201607291423-xaa.dsn
Error in write.table(cohort, cohortpath, header = F, row.names = F, quote = F,  :
 unused argument (header = F)
Calls: write.csv -> eval.parent -> eval -> eval -> write.table
Execution halted

Assimilate voxelwise nuisance into confound modelling module

  • Just as aCompCor is enabled by providing a fractional (for minimal variance explained) or numeric (for number of components) argument to the tissue-based confounds in the confound module, so voxelwise regressors should be enabled by providing a particular argument, for instance voxel.
  • This change will deprecate the locreg module and hopefully make confound modelling more intuitive.

fslcc does not support -p option in version 4.x

After the ANTsCT pipeline, xcpEngine executes fslcc with the -p option. However, FSL versions before 5.0 do not appear to support this option. It would be great if the minimal required FSL version could be verified during the dependency check.
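
A sketch of what such a check might look like, reading the version string that standard FSL installations place in ${FSLDIR}/etc/fslversion (assumed here):

# Refuse to use 'fslcc -p' on FSL < 5.0.
fslversion=$(cat "${FSLDIR}/etc/fslversion")
fslmajor=${fslversion%%.*}
if (( fslmajor < 5 )); then
   echo "::XCP-ERROR: fslcc -p requires FSL >= 5.0 (found ${fslversion})"
   exit 1
fi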

Overwrite derivatives

write_derivative should overwrite existing derivatives named identically in order to preclude duplication.

This issue only affects the ACCELERATOR branch.

Transform pathfinder

  • The objective of the transform pathfinder is the implementation of a more complete system for moving neuroimages between coordinate spaces.
  • The transform pathfinder parses image metadata to identify the source and target coordinate spaces.
  • Next, it operates on a binary, directed adjacency matrix with nodes defined as coordinate spaces and edges defined as available transforms. It computes the shortest path from the source space into the target space and recruits the appropriate transforms. (It's possible that there will eventually be multiple layers of this matrix corresponding to different transform formats -- e.g., ANTs, DRAMMS, FSL -- and the pathfinder will in this case also select the most appropriate layer.)
  • Finally, the pathfinder translates the transform pathway for use by the most appropriate utility and prints the requisite transform series to the module, utility, or function that called it. (A sketch of the search follows this list.)
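
A minimal bash sketch of the search itself, with spaces as nodes and available transforms as edges; the edge-table layout, transform paths, and ${out} variable are illustrative assumptions:

# Breadth-first search over available transforms.
# edge["src>dst"] holds the transform taking space src into space dst.
declare -A edge
edge["func>T1w"]=${out}/coreg/func2struct.txt
edge["T1w>MNI"]=${out}/norm/struct2template.nii.gz
edge["MNI>T1w"]=${out}/norm/template2struct.nii.gz

pathfind() {
   local source=$1 target=$2 node next key
   local -A prev
   local -a queue=("${source}") series=()
   while (( ${#queue[@]} )); do
      node=${queue[0]}; queue=("${queue[@]:1}")
      [[ ${node} == "${target}" ]] && break
      for key in "${!edge[@]}"; do
         [[ ${key%%>*} == "${node}" ]] || continue
         next=${key##*>}
         [[ -n ${prev[$next]:-} || ${next} == "${source}" ]] && continue
         prev[$next]=${node}
         queue+=("${next}")
      done
   done
   node=${target}
   while [[ ${node} != "${source}" ]]; do       # walk back from the target
      if [[ -z ${prev[$node]:-} ]]; then
         echo "::XCP-ERROR: no transform path ${source} -> ${target}" >&2
         return 1
      fi
      series=("${edge["${prev[$node]}>${node}"]}" "${series[@]}")
      node=${prev[$node]}
   done
   printf '%s\n' "${series[@]}"                 # transform series, in order
}

pathfind func MNI    # prints func2struct.txt, then struct2template.nii.gz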

Reserve variable for unprocessed/raw image

Reserve a variable for the unprocessed/raw image. Currently, the variable img is used to reference both the unprocessed and the processed images; the rationale was that this variable would always point to the image that further processing steps should operate upon. However, this convention leaves the raw image inaccessible to downstream analyses.

Template construction procedure with ANTs in XCP Engine

Implement the template construction procedure in XCP Engine by automating and serialising subject- and group-level ANTs template-building routines. This should be a separate pipeline from the standard structural analysis.

RERUN failures

This thread will identify any code blocks where rerun fails or is not used to good effect.

  • Slicewise renderings of registration quality are computed regardless of whether they already exist, as are coregistration quality metrics. There is no need to recompute these if coregistration is not recomputed.
  • The network module reruns the roi2ts, ts2adjmat, adjmat2pajek, withinBetween commands regardless of whether the output exists. Await the resolution of #46 before tackling this.

Verbosity level

The verbosity level is not correctly exported to any child processes.

Dynamic connectivity using MTD

Implement a module (or sequence of modules) that performs connectivity dynamics analysis using the simplest approach: the Multiplication of Temporal Derivatives (MTD).

Proposed features:

  • Computation of edgewise timeseries
  • Metaconnectivity matrix
  • Time-by-time matrix
  • Community detection in the metaconnectivity matrix (hypergraph detection)
  • Community detection in the time-by-time matrix (state detection)
  • Flexibility/state transition count computation

Breaking up the network module

The network module currently performs two separable functions:

  1. Generation of a functional connectivity network
  2. Analysis of an adjacency matrix using network-science strategies

With the introduction of non-FC matrices (for instance, tractographic connectomes) to the multimodal pipeline system, it will in many cases be desirable to perform (2) without (1). In preparation for this possibility, the network module's functionality will be divided into two separate modules, for instance:

  • fc or connectome : Computes a functional connectome using a BOLD timeseries and an atlas.
  • network : Computes network measures on an adjacency matrix. Includes community detection.

Intermediates to TMPDIR

If a TMPDIR or analogous variable is defined by the user, then any intermediates generated by the processing stream should be written to and accessed from the scratch space. An appropriate stamp may be necessary to uniquify the paths to scratch when mass parallelisation is employed.
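
One possible implementation, sketched below; the ${out} and ${subject} variables used for the uniquifying stamp are illustrative assumptions:

# Route intermediates to scratch, uniquified per run.
scratch=${TMPDIR:-${out}/tmp}
mkdir -p "${scratch}"
intermediates=$(mktemp -d "${scratch}/xcpengine.${subject}.XXXXXX")
trap 'rm -rf "${intermediates}"' EXIT    # clean up scratch on exit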

Build in a minimal structural processing pipeline

As not all groups will be interested in running the full ANTsCT pipeline, the antsCT module should be able to run just brain extraction (BE) and registration, as this will produce all of the inputs required for norm to run.

Format → NIFTI_GZ

  • XCP will process all data using the NIFTI_GZ format specification
  • XCP will no longer search for the file extension in each module
  • The localiser will localise all input images into NIFTI_GZ format. If inputs are already in NIFTI_GZ format, then the localiser will use symbolic linking to minimise redundant file creation. (A sketch of this logic follows the list.)
  • This change will enable more flexible use of the extension delimiter (.) in subject identifiers, cohort files, and elsewhere.
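
A sketch of the localiser logic, assuming FSL's imcp and fslchfiletype for the conversion step; the function and argument names are illustrative:

# Localise an input image as NIFTI_GZ.
localise() {
   local input=$1 local_copy=$2
   if [[ ${input} == *.nii.gz ]]; then
      ln -sf "${input}" "${local_copy}.nii.gz"   # already NIFTI_GZ: just link
   else
      imcp "${input}" "${local_copy}"            # copy, then convert in place
      fslchfiletype NIFTI_GZ "${local_copy}"
   fi
}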

Vectorise (anti)symmetric adjacency matrices

  • Symmetric or antisymmetric adjacency matrices should be stored as feature vectors on the filesystem to eliminate redundant information.
  • It should not be difficult to adapt analytic tools to operate on both matrices and feature vectors.
  • This will be particularly critical for large matrices like hypergraphs: an n-node symmetric matrix holds n^2 elements, but only its n(n-1)/2 off-diagonal elements in one triangle need be stored. (Power hypergraph matrix, n = 34716: 1.205200656E9 elements; Power hypergraph vector: 6.02582970E8 elements -- slightly under half.)

Allow DICO module to accept text files

Medeglia asked about this a little while back, and as I am assuming you are now aware @rciric, I am in the process of building some kind of text-to-DICOM functionality. Most likely it'll have to be done in MATLAB, but I will see if I can't do it in R or Python or something freeware.

Spatial equivalence test

  • When determining whether to transform an atlas map, test for spatial equivalence instead of existence only.
  • This will be a straightforward implementation, but it is on hold until the reformulation of space (#44).

xcpReset declares empty global variables

xcpReset can sometimes declare an empty global variable such as ANTSPATH when this is not declared in a user's bash_profile.

Maybe adding a check to see if all global variables were declared successfully would help resolve this issue?

Fractional ALFF

Fractional amplitude of low-frequency fluctuations (fALFF) should be computable as a neuroimage derivative for functional timeseries data. This may necessitate additional subroutines in the preprocessing sequence, or can be achieved by including two ALFF modules at two points in a processing stream. This issue is low-priority.
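
A sketch of one possible route using standard FSL tools; the band limits below are illustrative assumptions and must be derived from the TR and the number of volumes:

# fALFF = low-frequency amplitude / total amplitude.
# Bin k of the spectrum corresponds to k/(nvols*TR) Hz.
lo=3; hi=25                                   # hypothetical band limits
fslpspec func.nii.gz pspec                    # voxelwise power spectrum
fslmaths pspec -sqrt amp                      # amplitude spectrum
fslroi amp amp_low ${lo} $(( hi - lo ))       # low-frequency band only
fslmaths amp_low -Tmean -mul $(( hi - lo )) num    # sum over band
fslmaths amp -Tmean -mul $(fslnvols amp) den       # sum over full spectrum
fslmaths num -div den falff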

Accept cohort as argument

The cohort file (along with any filesystem-specific definitions) should be accepted as an argument by the controller engine. With this change, a new design will no longer need to be created every time that the user wishes to run a pipeline on another sample of subjects. This will also facilitate design sharing.
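
One possible interface, sketched below; the flags and file names are illustrative assumptions:

# The same design, reused across samples.
xcpEngine -d fc_design.dsn -c cohort_sample1.csv -o /data/study1/xcp
xcpEngine -d fc_design.dsn -c cohort_sample2.csv -o /data/study2/xcp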

Derivatives and other outputs: define absolute or relative

Outputs are currently defined using paths relative to the module output directory. The pipeline should also be able to accommodate definitions by absolute path. This is as simple as checking whether the first character of the path definition is /.
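
A sketch of the check; the function name and ${outdir} variable are illustrative:

# Resolve an output path that may be absolute or relative.
resolve_output() {
   local path=$1
   if [[ ${path:0:1} == / ]]; then
      echo "${path}"                  # already absolute
   else
      echo "${outdir}/${path}"        # relative to the module output dir
   fi
}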

output of seed

Hi,

After running prestats, coreg, confound, regress and seed, I get an output called sub#_sm6.nii.gz instead of a correlation map. When I overlay it on the native T1 image, it seems to have values ranging from -1500 to 160. Any ideas what could be the reason behind this?

Issues w/ structural config process

1. The image and struc variables are not properly defined in the config process.

2. The ANTs dependency check sometimes fails to get the ANTs version.

3. The default structural pipeline should be modified.

4. A template directory variable would make the antsCT init procedure more efficient.

Minimise use of non-builtins

Shell utilities that are not builtins (e.g., sed, cut, awk) are not always uniform across distributions. Their use should be avoided where possible. (This is a low-priority objective, as it is unlikely that we will be moving away from these completely in the near future.)
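
For example, bash parameter expansion can often replace simple cut/awk/sed calls; the file name below is illustrative:

file=sub-01_task-rest_bold.nii.gz
echo "${file%%.*}"                    # sub-01_task-rest_bold  (cut -d. -f1)
echo "${file##*_}"                    # bold.nii.gz            (awk -F_ '{print $NF}')
echo "${file/task-rest/task-nback}"   # substitution           (sed 's/task-rest/task-nback/')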

Dependency: JSON manipulation

With the neuroimaging community gradually adopting JSON as the standard format for metadata, the pipeline system will need to effectively interface with JSON files. (Furthermore, the hash-delimited file specification will be phased out in favour of JSON for additional flexibility and for improved human-readability.) This transition will require an additional dependency, which should be selected to minimise or eliminate installation burden; JQ is currently favoured because of its portability among Linux-based systems, its compact size, and its permissive license (MIT).
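
For instance, reading and updating a metadata field with jq might look like this; the file and field names are illustrative assumptions:

jq -r '.meanIntensity.Space' derivatives.json       # read one field
jq '.meanIntensity.Statistic = "mean"' derivatives.json \
   > tmp.json && mv tmp.json derivatives.json       # rewrite the index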

Full path execution for external commands

Any commands deployed from external analysis platforms (FSL, AFNI, ANTs) should be executed using full paths. These full paths should reference the variables defined in the XCP globals ($FSLDIR, $AFNI_PATH, $ANTSPATH). This will ensure that the user-specified version of each command is used, and furthermore that all runs of the processing system will produce consistent results unless the external commands are overwritten.
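
For example (arguments illustrative):

# Pin external commands to the configured installations,
# not whatever happens to be first on the PATH.
${FSLDIR}/bin/fslmaths func.nii.gz -Tmean mean_func.nii.gz
${AFNI_PATH}/3dTproject -input func.nii.gz -ort confounds.1D -prefix residuals.nii.gz
${ANTSPATH}/antsApplyTransforms -i map.nii.gz -r template.nii.gz -t warp.nii.gz -o map_std.nii.gz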

Task module repair/tune-up

A number of potential concerns with the task module have been noted.

  1. Unnecessary re-analysis. The task module will often re-run if the pipeline is executed after the final task output has already been produced. This occurs even if the 'task_rerun' variable is set to 'N'.
  2. Production of extraneous output directories. The task module occasionally produces an extraneous output directory (${cxt}_task) in addition to its standard output directory (task) if pipeline context numbering is switched off.

Reorient input to RPI

Any input to the processing system should be reoriented to RPI, the standard orientation of MNI data in FSL. This orientation will be handled by the localiser and should allow simplification of scripts that use spatial coordinate information.
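
One way the localiser might do this, using AFNI's 3dresample; the file names are illustrative:

# Force RPI orientation on localisation.
${AFNI_PATH}/3dresample -orient RPI -inset input.nii.gz -prefix input_rpi.nii.gz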

RPS map

Hi,

I am trying to do field map correction, but I am not sure what RPS stands for. Is it the phase map? What kind of map do I need to feed in?

Thanks,

Brain extraction uses local reference

The brain extraction in the prestats module uses the local referenceVolume variable, which may not exist in all processing cases. It does not first verify the existence of the local referenceVolume.

Prestats design variable

A new input variable has been introduced to the prestats module as part of the ACCELERATOR project (prestats_censor_contig). As a consequence, v0.5.0 design files will not be usable with v0.6.0 unless the new variable is appended to them. When v0.6.0 is released, v0.5.0 users will be notified of the compatibility issue and its fix.

Regress failure signal

If the confound regression procedure fails for any reason, the regress module should send a failure signal to the XCP Engine to halt processing for that subject.

Issue history:
AFNI's 3dTproject program, which performs confound regression, was found to exhibit bizarre behaviour (in particular, misreading input file paths) in the resting replication directory on the CFN file system. This behaviour puzzlingly appears to be confined to a single directory, as either (a) changing the output path of the XCP Engine or (b) copying the input elsewhere and calling 3dTproject was sufficient to rectify the problem and achieve replication in these instances. This unusual behaviour alerted us to the reality that 3dTproject's failure does not currently propagate to the top level, and subject-level processing continues despite the failed confound regression.

Prestats localisation error

An image localisation error occurs if the prestats module does not have processing variables correctly assigned, for instance due to an improper context. It also occurs if the prestats module is, for whatever reason, configured not to perform any processing. Specifically, instead of being copied, the input image is moved into the output directory, where it is treated as the preprocessed image.

Apply to derivatives: match any field

apply_exec should be able to match any derivative property; currently it matches only type. In this way, the roiquant module could automatically apply the appropriate statistical calculation to all derivative maps marked for that calculation type (currently stored in the Statistic property).

Apply to all derivatives of type

  • A number of utilities are designed to operate not only on the primary analyte image, but also on any appropriate derivatives. This operational mode was a necessary consequence of the old metadata structure and the lack of a functional vocabulary in v0.5.
  • These operations have led to confusing and often conflicting methods of handling derivatives. It is now necessary to standardise this procedure across all routines, utilities, functions, and modules.
  • tfilter is the most egregious example of this hideous design. It comprises over 800 lines of code dedicated to a relatively simple functionality: applying a temporal filter to time series data. The obscene bloat of this function follows from the necessities of filtering image derivatives and filtering text derivatives (~.1D) every time that the analyte is filtered.
  • The handling of text derivatives (.1D specification) will not immediately be addressed by this issue -- it is currently an open problem in pipeline development whether ~.1D files should be treated as derivatives or text output. (As of writing, the former course of action is favoured -- this will necessitate another major sweep of modules and utilities.)
  • The new metadata specification and functional vocabulary enables an improved system for managing image derivatives. One possibility presented by the new processing architecture is the introduction of a new function that will apply the same routine to all derivatives, or to all derivatives of (a) specified type(s).
  • apply_exec will accept as its first argument a (list of) derivative type(s), or all for all derivatives without regard to type. If multiple types are passed, there should eventually be support for a logical grammar chaining these together, but for now, a logical OR/set union will be assumed.
  • apply_exec will accept as its second argument the generic path where derivatives will be written, with the derivative's name replaced by the generic string %DERIVATIVE%.
  • apply_exec will accept as its third argument the program that should be used to perform the specified command.
  • apply_exec will accept as its fourth argument the command to be applied to all derivatives of the specified type, with the path to the derivative replaced by the generic string %DERIVATIVE%.
  • apply_exec will also handle the declaration and writing of any derivative that it operates upon. Writing is the difficult part -- this problem could be approached by defining a new variable called finalise that contains commands to be executed at module completion. These calls would then be encapsulated in the core routine moduleEnd.
  • A substantial number of modules and utilities will need to be updated to conform to this new implementation.

Working directory option for xcpEngine

Hi,
Would it be possible to add a working directory option in the design file, where I can designate where to execute and save the intermediate files during runtime? At the end of the pipeline, important output files would then be saved into the already existing output variables: $out_super, $out, and $out_group. We are hoping to minimize the runtime disk-space requirement with this new feature.

Thanks,
Michael

Design file to export global variables

Globals should be exported in the design file in order to ensure that they propagate to child processes (utils). This will ensure that dependencies are leveraged consistently.
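
For example, a design file might pin and export the dependency roots directly; the paths below are illustrative:

export FSLDIR=/usr/local/fsl
export AFNI_PATH=/usr/local/afni
export ANTSPATH=/usr/local/ants/bin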
