inria-empenn / bep028_bidsprov

This project forked from bids-standard/bep028_bidsprov

Organizing and coordinating BIDS extension proposal 28 : BIDS Provenance

License: Creative Commons Attribution 4.0 International

Python 84.10% Makefile 0.19% MATLAB 15.71%

bep028_bidsprov's Introduction

BEP028

This repository contains BIDS Extension Proposal 028 : BIDS-prov, a provenance framework for BIDS.

Our goal

Interpreting and comparing scientific results and enabling reusable data and analysis output require understanding provenance, i.e. how the data were generated and processed. To be useful, the provenance must be understandable, easily communicated, and captured automatically in machine accessible form. Provenance records are thus used to encode transformations between digital objects.

Who is building BEP 028 ✨

Camille Maumet (@cmaumet) and Satrajit Ghosh (@satra) are the BEP co-moderators. Here is the list of all contributors (emoji key):

Camille Maumet (@cmaumet)💻👀📖 🐛🖋 🤔🚧
Satrajit Ghosh (@satra)💻👀📖 🐛🤔
Stefan Appelhoff (@sappelhoff)🤔
Chris Markiewicz (@effigies)🤔
Yaroslav Halchenko (@yarikoptic)🤔
Jean-Baptiste Poline (@jbpoline)🤔
Rémi Adon (@remiadon)💻👀📖 🐛
Hermann Courteille (@hermann74)💻👀📖 🐛
Thomas Betton (@thomasbtnfr)💻👀📖 🐛
Cyril Regan (@cyril-data)💻👀📖 🐛

This project follows the all-contributors specification. Contributions of any kind welcome!

BIDS-prov in the NIDM project

The Neuroimaging Data Model (NIDM) is a collection of specification documents that define extensions of the W3C PROV standard for the domain of human brain mapping.

BIDS-prov is a BIDS extension that is compatible with NIDM.

How to help

Our goal is to extend BIDS so that provenance can be tracked at every stage of an experiment.

For this purpose we have to propose changes to the BIDS specification.

The BIDS specification is rendered as a webpage at https://bids-specification.readthedocs.io.

The website is built from a GitHub repository that consists of mostly markdown files at https://github.com/bids-standard/bids-specification. If you don't know much about markdown, here's a good intro guide.

Finding information and getting in touch

Google doc

The BEP028 proposal is written in a Google Doc.

Contact BIDS-Prov

The group is always open to new contributors interested in neuroimaging data sharing. To participate in discussions or to ask any question, please email us at [email protected].

Additional resources

Mature building blocks of NIDM:

New features (to be included)

Run parsers on the SPM, FSL and AFNI data

To obtain data in bids-prov format, you can use the developed parsers.

Tutorial

Code of conduct

We are committed to building a welcoming and harassment-free experience for all our contributors. As a contributor to the BIDS-Prov specification, we ask you to follow our code of conduct.

Credits: This README was built based on the BEP001 README.

bep028_bidsprov's Issues

[FSL] change signature of get_kwarg and

Is your feature request related to a problem? Please describe.

BEP028_BIDSprov/bids_prov/fsl/fsl_parser.py

Line 265 in 34fbc5b

parser, inputs_kwarg = _get_kwarg(parser, parameters["used"])

Returning a modified version of the parser parameter makes the code less readable.

Describe the solution you'd like

def _get_kwarg(parser, serie,  with_value=True):
    arg_list = []
    for u_arg in serie:
        if type(u_arg) == dict:
            parser.add_argument(u_arg["name"], nargs='+', action='append')
            arg_list.append((u_arg["name"], [u_arg["index"]]))
        if type(u_arg) == str and ":" not in u_arg:
            if with_value:
                parser.add_argument(u_arg)
                arg_list.append((u_arg, [0]))
            else:
                parser.add_argument(u_arg, action='store_true')
                arg_list.append((u_arg, []))

    return parser

Another solution is to return add_argument_list and to add the arguments outside the function. In that case, parser is no longer a parameter ...

Adequation of parser to spec BIDS-prov

Update proposal for BIDS Prov (BEP028)

Problem Statement

"BIDSProvVersion": "dev" instead of "0.0.1".
In "Activity" :
- missing : Generated (OPTIONAL)
- missing : StartedAtTime (OPTIONAL)
- missing : EndedAtTime (OPTIONAL)
- missing : Type (OPTIONAL)
In "Entity" :
- missing example for : digest: REQUIRED (Dict. For files, this would include checksums of files. It would take the form {"<checksum-name>": "value"}.)
"Environments" is missing (OPTIONAL)

--- Specifics ---

SPM
- AGENT : "version": "xxx"
- Activity : missing Command : REQUIRED
FSL, AFNI : missing : digest: REQUIRED

Rationale

to discuss

Minimal example

Update of AssociatedWith and GeneratedBy keys in the specification

I propose to rename the AssociatedWith and GeneratedBy keys of the BIDS-prov standard to wasAssociatedWith and wasGeneratedBy respectively, so that graphs can be generated from the JSON-LD files.

Problem Statement

If we keep AssociatedWith, for example, the corresponding arrows do not appear when the graph is generated. The same applies to GeneratedBy.

Rationale

AssociatedWith should become wasAssociatedWith
GeneratedBy should become wasGeneratedBy
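For existing files, the renaming could be applied with a small script along these lines. This is a sketch only: the function name rename_keys and the sample record are illustrative and are not part of the parsers.

```python
# Old key -> new key, as proposed in this issue.
KEY_RENAMES = {
    "AssociatedWith": "wasAssociatedWith",
    "GeneratedBy": "wasGeneratedBy",
}


def rename_keys(node):
    """Return a copy of a JSON-like structure with the old keys renamed."""
    if isinstance(node, dict):
        return {KEY_RENAMES.get(key, key): rename_keys(value)
                for key, value in node.items()}
    if isinstance(node, list):
        return [rename_keys(item) for item in node]
    return node


# Illustrative record (identifiers are placeholders).
record = {"@id": "urn:activity", "AssociatedWith": "urn:agent"}
renamed = rename_keys(record)
```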

Minimal example

Example with wasAssociatedWith in key:

Example with AssociatedWith in key:

[Bug]: expected output fsl_sub

batch name and line

/usr/local/packages/fsl-5.0.8/bin/fsl_sub -T 10 -l logs -N feat0_init /usr/local/packages/fsl-5.0.8/bin/feat /storage/essicd/data/NIDM-Ex/BIDS_Data/RESULTS/TEST/nidmresults-examples/fsl_con_f_multiple_test.feat/design.fsf -D /storage/essicd/data/NIDM-Ex/BIDS_Data/RESULTS/TEST/nidmresults-examples/fsl_con_f_multiple_test.feat -I 1 -init

What happened?

I don't really know what is expected as output for the fsl_sub function. It seems to allow parallelization of tasks.

At the moment it is parsed in this way but it seems incorrect to me:

{
        "@id": "urn:652850d5-35a3-4111-9fe1-eff69559a0a9",
        "label": "feat_main_script_fsl_sub",
        "wasAssociatedWith": "urn:b06c78eb-7bf3-4c3f-a798-b0a8f8956d04",
        "command": "/usr/local/packages/fsl-5.0.8/bin/fsl_sub -T 10 -l logs -N feat0_init   /usr/local/packages/fsl-5.0.8/bin/feat /storage/essicd/data/NIDM-Ex/BIDS_Data/RESULTS/TEST/nidmresults-examples/fsl_con_f_multiple_test.feat/design.fsf -D /storage/essicd/data/NIDM-Ex/BIDS_Data/RESULTS/TEST/nidmresults-examples/fsl_con_f_multiple_test.feat -I 1 -init",
        "attributes": [
          [
            "-T",
            "10"
          ],
          [
            "-l",
            "logs"
          ],
          [
            "-N",
            "feat0_init"
          ],
          [
            "-D",
            "/storage/essicd/data/NIDM"
          ],
          [
            "-I",
            "1"
          ]
        ],
        "used": [
          "urn:e4c7a747-8eae-4635-bc71-7ea7d9e3a626"
        ],
        "prov:wasInfluencedBy": "urn:de2c15c3-3a37-4bcd-ab20-469c87f6cd23"
      }

Describe the expected ground-truth ?

If I understand correctly, fsl_sub has three attributes (-T, -l and -N) and one wrapped function (which also has its own attributes and entities). How could we represent it @cmaumet ?
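To make the question concrete, here is a minimal sketch that separates the fsl_sub options from the wrapped command. It assumes that -T, -l and -N each take exactly one value, as in the log line above; the function name split_fsl_sub is hypothetical.

```python
import shlex

# Assumption: fsl_sub options seen in the log above, each followed by one value.
FSL_SUB_OPTIONS = {"-T", "-l", "-N"}


def split_fsl_sub(command):
    """Split a fsl_sub call into (wrapper, attributes, wrapped command tokens)."""
    tokens = shlex.split(command)
    wrapper, rest = tokens[0], tokens[1:]
    attributes = []
    i = 0
    # Consume leading "option value" pairs; stop at the first other token.
    while i + 1 < len(rest) and rest[i] in FSL_SUB_OPTIONS:
        attributes.append([rest[i], rest[i + 1]])
        i += 2
    return wrapper, attributes, rest[i:]
```

Under this reading, -D, -I and -init belong to the wrapped feat command rather than to fsl_sub, and the wrapped command could be modelled as its own activity.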

What soft is concerned

FSL

Add back manual examples

Hi @cyril-data @hermann74 -- the manual examples were lost in the AFNI PR (see this comment). Can you revert the commit that introduced this change to add them back? Thanks!

FSL : fslroi

Is your feature request related to a problem? Please describe.
The logs of the fslroi function are not obvious. For example, we have the following log: /usr/local/packages/fsl-5.0.8/bin/fslroi prefiltered_func_data example_func 52 1.
prefiltered_func_data seems to be the input entity (= used).
example_func seems to be the generated entity but what about 52 and 1 ? These are attributes but what are their associated parameter names?

Currently these two parameters are not taken into account. fslroi is modelled like this in BIDS-prov:

and this is perhaps the best option? What do you think @cmaumet?
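For reference, the fslroi usage message lists three positional forms after the input and output images: tmin tsize; xmin xsize ymin ysize zmin zsize; or all eight values. Under that reading, 52 and 1 would be tmin and tsize. A minimal sketch, assuming this usage (to be checked against the installed FSL version) and using a hypothetical function name:

```python
import shlex

# Assumption: positional names taken from the fslroi usage message.
FSLROI_NAMES = {
    2: ["tmin", "tsize"],
    6: ["xmin", "xsize", "ymin", "ysize", "zmin", "zsize"],
    8: ["xmin", "xsize", "ymin", "ysize", "zmin", "zsize", "tmin", "tsize"],
}


def parse_fslroi(command):
    """Name the positional arguments of a fslroi call."""
    tokens = shlex.split(command)
    input_image, output_image, *values = tokens[1:]
    names = FSLROI_NAMES.get(len(values))
    if names is None:
        raise ValueError(f"unexpected number of fslroi values: {len(values)}")
    return {
        "used": input_image,
        "generated": output_image,
        "attributes": dict(zip(names, values)),
    }
```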

FSL : cp 2nd parameter used or generatedBy

How should the following lines be modelled in BIDS-prov?

mkdir .files

cp /usr/local/packages/fsl-5.0.8/doc/fsl.css .files

Here are some suggestions:

Note that in the second option the entity .files is generated by two activities, which BIDS-prov does not currently support.

Furthermore, I think it would be interesting to start from a report_log.html file and propose a model for it in BIDS-prov. This could help answer many of the problems we are starting to face, mainly for entities.

@cmaumet

[AFNI] Identification of existing entities.

Is your feature request related to a problem? Please describe.

For now, this is based on the label of the entity (usually: filename.extension). A new entity is created if it is used in an activity (input). Here are some problems concerning the naming:

function cat outcount.r*.1D outcount.rall.1D : the wildcard pattern in the input is not taken into account
some suffixes change, for example the tlrc suffix : sub01-T1w_ns+orig later becomes sub01-T1w_ns+tlrc
some names carry unusual trailing selectors or extensions :
- sub-01_task-tonecounting_bold.nii.gz'[0..$]
- all_runs.$subj+tlrc"[$ktrs]"
- X.nocensor.xmat.1D'[3]'
- pb00.$subj.r*.tcat+orig.HEAD
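One possible way to get more stable labels is to strip the trailing selectors and the .HEAD/.BRIK extension before comparing names. The sketch below is an assumption about what should count as "the same entity"; it does not expand wildcards or shell variables, and the function name is hypothetical.

```python
import re

# Trailing sub-brick selector such as '[0..$]', "[$ktrs]" or '[3]',
# with optional surrounding quotes.
_SELECTOR = re.compile(r"['\"]?\[[^\]]*\]['\"]?$")
# AFNI dataset file extensions.
_HEAD_BRIK = re.compile(r"\.(HEAD|BRIK)(\.gz)?$")


def normalise_afni_label(name):
    """Strip selectors and .HEAD/.BRIK so labels compare more reliably."""
    name = _SELECTOR.sub("", name)
    return _HEAD_BRIK.sub("", name)
```

The +orig / +tlrc view suffix is deliberately kept here, since datasets in different spaces are arguably different entities.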

FSL naming of activities

Is your feature request related to a problem? Please describe.
Currently activities in FSL are named with something like <NAME_OF_HEADING>_<COMMAND_NAME> for example:

"label": "initialisation_fslmaths"

in https://github.com/bids-standard/BEP028_BIDSprov/blob/610fd46a8bcb57989f7fac9d65ca2c31398a7750/examples/from_parsers/fsl/fsl_default_report_log.jsonld#L18

Describe the solution you'd like
Instead, it would be great if we could take an approach similar to what we have done in SPM:

always use the command alone as label, in the example above we would have

"label": "fslmaths"

provide developers with a way to give a mapping to have a more explicit name (similar to https://github.com/bids-standard/BEP028_BIDSprov/blob/ea7c8e56166ee696317e29d96b7f5a23b947d6f4/bids_prov/spm/spm_labels.json)
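Both steps could look roughly like this. The function name activity_label and the content of the mapping are illustrative assumptions; the real mapping would live in a file comparable to spm_labels.json.

```python
import os
import shlex


def activity_label(command, label_mapping=None):
    """Use the bare command name as label, or its mapped name if provided."""
    executable = os.path.basename(shlex.split(command)[0])
    return (label_mapping or {}).get(executable, executable)


# Step 1: command name only.
label = activity_label("/usr/local/packages/fsl-5.0.8/bin/fslmaths mask -dilF mask")

# Step 2: optional, developer-provided mapping (hypothetical content).
pretty = activity_label("/usr/local/fsl/bin/bet in out", {"bet": "Brain extraction"})
```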

Describe alternatives you've considered
No alternative considered but happy to hear your thoughts!

Additional context
Two separate pull requests would be great for this (one to use the command only and a second one to implement the name mapping)

All examples in a single parent directory

Is your feature request related to a problem? Please describe.
Currently we have examples in https://github.com/Inria-Empenn/BEP028_BIDSprov/tree/master/examples and in https://github.com/Inria-Empenn/BEP028_BIDSprov/tree/master/results

Describe the solution you'd like
It would be easier to have a single parent directory with all examples inside, something like:

|__ examples
|____ fsl_default
|____ spm_default
|____ fmriprep
|____ fsl
|____ spm

Describe alternatives you've considered
If this ends up being too complicated (e.g. if we need one folder for the GitHub action with nothing inside), we could alternatively do something like:

|__ examples
|____ fsl_default
|____ spm_default
|____ fmriprep
|____ from_parsers # This would be the equivalent of current "results" folder
|______ fsl
|______ spm

Additional context
Note that the fsl_default and spm_default directories are meant to be removed soon but I would keep them for now (until we are confident that the SPM and FSL exporters can be directly used to export those -- we have a few bugs to fix before this can be done :) ).

Generated [OPTIONAL]

The documentation defines the Generated option:

Key name	Description
@id	REQUIRED. Unique URIs (for example a UUID). Identifier for the activity.
Generated	OPTIONAL. UUID. Identifier of an entity generated by this activity (the corresponding Entity must be defined with its own Entity record)

Should this be added to the parser? @cmaumet

[Bug]: c1 and c2 filenames contain "xxx"

batch name and line

See https://github.com/bids-standard/BEP028_BIDSprov/blob/master/results/spm/nidmresults-examples_spm_2_t_test_batch.jsonld#L296 we have "prov:atLocation": "c1xxx.nii.gz",. The c1xxx should be replaced by the actual filename (that starts with c1).

What happened?

Entities corresponding to the c1 and c2 files generated by the segment activity currently have a prov:atLocation attribute with a value that contains xxx.

Describe the expected ground-truth ?

The expected ground-truth, something closer to: c1sub-01_T1w.nii.gz

What soft is concerned

No response

Use boutiques to infer inputs/outputs in FSL command lines

Is your feature request related to a problem? Please describe.
When reading the logs in FSL (e.g. this one), we have no way to automatically find out which option/parameter is an input and which option/parameter is an output.

Describe the solution you'd like
As discussed on January 9, we would like to rely on the Boutiques standard to find which files are inputs and outputs of a given activity (command from the logs) in FSL.

Describe alternatives you've considered
Alternative would be to use a yml file as was done for SPM (to specify that c1/c2 are outputs of segment).

Additional context
Boutiques standard : https://boutiques.github.io/
A Boutiques file example : https://zenodo.org/record/7435009#.Y7ve--LMLz8 (for FSL siena)

Command [REQUIRED]

The documentation specifies that a Command key-value pair is required:

2.1 Activity

Each Activity record has the following fields:

Key name	Description
@id	REQUIRED. Unique URIs (for example a UUID). Identifier for the activity.
Label	REQUIRED. String. Name of the tool used (e.g. “bet”).
Command	REQUIRED. String. Command used to run the tool.

Should we add one? If so, which one? @cmaumet

Namespace handling directly with the uuid module

Can we improve the use of the uuid module so that it handles namespaces directly instead of manually indicating the namespace urn each time?

Describe the solution you'd like
A complete management of uuids and namespaces by a particular module (maybe uuid)
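As a starting point, Python's standard uuid module can already produce a URN directly through the urn attribute of a UUID object. A minimal sketch (the helper name new_identifier is hypothetical):

```python
import uuid


def new_identifier():
    """Return a fresh identifier of the form 'urn:uuid:<uuid4>'."""
    return uuid.uuid4().urn


identifier = new_identifier()
```

Note that this yields the "urn:uuid:" prefix, which differs from identifiers written as "urn:" followed directly by the UUID; which form to adopt is left open here.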

Describe alternatives you've considered
If we add the urn namespace manually then we will have to declare it somewhere, in the context and in the RDF python library.

Additional context
Context : bids-standard#83 (comment)

Digest Attribute

FROM Specification

RECOMMENDED. Dict. For files, this would include checksums of files. It would take the form {"<checksum-name>": "value"}

Currently

for input entities (one file per entity): see code
why does the spec use a dictionary for several files?
checksum-name = "sha256"+entity[@id], is that right?

Example

"prov:Entity": [
      {
        "@id": "niiri:zzX6hqo35uwZqxZOOHjk",
        "label": "my_files_sub-01_task-tonecounting_bold.nii.gz",
        "prov:atLocation": "/my_files/sub-01_task-tonecounting_bold.nii.gz",
        "digest": { "sha256_niiri:zzX6hqo35uwZqxZOOHjk": 
        "e14eaba2b2f48a040d85da4abf9a1f62e27c710c16d770ea81b4904ee160c54e"
        }
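For comparison, a digest that follows the {"<checksum-name>": "value"} form from the specification, with the algorithm name alone as key, could be computed as below. This is a sketch using Python's hashlib; the function name sha256_digest is hypothetical, and whether the key should also contain the entity identifier is the open question above.

```python
import hashlib


def sha256_digest(path, chunk_size=1 << 20):
    """Return {"sha256": <hex digest>} for the file at path."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large images do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return {"sha256": hasher.hexdigest()}
```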

Problem:

the file cannot be accessed (see issue) to compute the checksum. For the example, a fake text file was created with a path relative to batch.m (input)
@cmaumet

[BUG] huge git history because of png pushed at every commit

Is your feature request related to a problem? Please describe.
This GitHub repository weighs more than 5 GB locally, in the .git/objects/pack directory.
The nidm-example PNGs (~60 MB) are pushed to the GitHub repository at every commit, so the git history keeps growing.

Describe the solution you'd like
Perhaps reuse artefact ....

Other solutions to investigate

[Bug]: [AFNI] problem with 3dTstat with an extra "-"

batch name and line

3dTstat -sos -prefix - gmean.errts.unit.1D\' > out.gcor.1D

see in nidm-example :
https://github.com/incf-nidash/nidmresults-examples/blob/dfbcca00d59e58ba1ecee02f9788d89cd9ecb243/afni_default/subject_results/group.DS0011/subj.sub_001/proc.sub_001#L312

What happened?

Usually -prefix is followed by an output name and not by the character "-". Here, an artificial entity with label "-" is created.

Describe the expected ground-truth ?

The expected ground-truth : gmean.errts.unit.1D should be the output ?
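If "-prefix -" is taken to mean "write to standard output" (an assumption to confirm against the AFNI documentation), the file after the shell redirection would be the output and gmean.errts.unit.1D the input. A sketch of that interpretation, with a hypothetical function name:

```python
import shlex


def output_of(command):
    """Return the output name of a command using -prefix, honouring '-'.

    Assumption: '-prefix -' means standard output, so the output is the
    target of the '>' redirection when there is one.
    """
    tokens = shlex.split(command)
    prefix = tokens[tokens.index("-prefix") + 1] if "-prefix" in tokens else None
    redirect = tokens[tokens.index(">") + 1] if ">" in tokens else None
    if prefix == "-":
        return redirect
    return prefix
```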

What soft is concerned

AFNI

[Bug]: fslmaths

batch name and line

For the command /usr/local/fsl/bin/fslmaths /Users/cmaumet/Data/fsl_practicals/fsl_course_data/fmri/fmri_fluency/fmri prefiltered_func_data -odt float

What happened?

A bug happened!

fslmaths should get as input fmri and output prefiltered_func_data but it is not the case because there is no output from fslmaths :

Describe the expected ground-truth ?

The expected ground-truth :

What soft is concerned

FSL

[Bug]: Some extra activities should be deleted

batch name and line

Checking this example, we have an activity with label stats___numTS=38352.

I was not sure what Command would correspond to this and in fact when looking in the corresponding log file for numTS I found:

Log directory is: stats paradigm.getDesignMatrix().Nrows()=104 paradigm.getDesignMatrix().Ncols()=2 sizeTS=104 numTS=38352 Calculating residuals... Completed Estimating residual autocorrelation... Calculating raw AutoCorrs... Completed mode = 9979.81 sig = 963 Spatially smoothing auto corr estimates ......... Completed Tukey M = 10 Tukey estimates... Completed Completed Prewhitening and Computing PEs... Percentage done:

This is a message displayed in the Terminal and should be ignored when parsing the log file (and therefore the corresponding activity should not be created). I think this is true in other places as well, see for example https://github.com/bids-standard/BEP028_BIDSprov/blob/master/results/fsl/nidmresults-examples_fsl_con_f_multiple_report_log.jsonld#L1641-L1646.

What happened?

Extra (erroneous) activities are currently created by the FSL parser.

Describe the expected ground-truth ?

Remove the erroneous activities :)

What soft is concerned

FSL

[Bug]: Remove project info

batch name and line

Hard-coded information about the project that is found at the top of each example should be deleted.

For example here: https://github.com/Inria-Empenn/BEP028_BIDSprov/blob/master/results/nidmresults-examples_spm_2_t_test_batch.jsonld#L5-L12

What happened?

This info

  "wasGeneratedBy": {
    "@id": "INRIA",
    "@type": "Project",
    "wasAssociatedWith": {
      "@id": "NIH",
      "@type": "Organization",
      "hadRole": "Funding"
    }

should be removed

Describe the expected ground-truth ?

The expected ground-truth :

What soft is concerned

SPM

[Bug]: One activity per command in the logs

batch name and line

The parsing of the log files in FSL seems to be broken. In practice we should have an Activity created for each command in the logs.

For example, in https://github.com/bids-standard/BEP028_BIDSprov/blob/master/results/fsl/nidmresults-examples_fsl_con_f_multiple_report_log.jsonld#L2870-L2872 , pngappend is an Entity,

But in the log we have:

/usr/local/packages/fsl-5.0.8/bin/pngappend sla.png + slb.png + slc.png + sld.png + sle.png + slf.png + slg.png + slh.png + sli.png + slj.png + slk.png + sll.png example_func2highres1.png ; /usr/local/packages/fsl-5.0.8/bin/slicer highres example_func2highres -s 2 -x 0.35 sla.png -x 0.45 slb.png -x 0.55 slc.png -x 0.65 sld.png -y 0.35 sle.png -y 0.45 slf.png -y 0.55 slg.png -y 0.65 slh.png -z 0.35 sli.png -z 0.45 slj.png -z 0.55 slk.png -z 0.65 sll.png ;

which means that pngappend is a Command and should instead be defined as an Activity.
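For log lines like the one above, a first step could be to split on ";" so that each command yields its own Activity. This is a deliberately naive sketch (it ignores quoting and escaped semicolons), and the function name is hypothetical:

```python
def split_commands(log_line):
    """Split a log line into individual commands separated by ';'."""
    return [part.strip() for part in log_line.split(";") if part.strip()]


line = ("/usr/local/packages/fsl-5.0.8/bin/pngappend sla.png + slb.png out.png ; "
        "/usr/local/packages/fsl-5.0.8/bin/slicer highres example_func2highres -s 2 ;")
commands = split_commands(line)
```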

This problem is likely to impact other Activities as well. I think that the main logic of the parsing of the log files might be in question here.

What soft is concerned

FSL

Remove hierarchies of Activities in FSL

Is your feature request related to a problem? Please describe.
Currently, the FSL parser proceeds by creating multiple levels of activities that are connected together using wasInfluencedBy relationships. See for example: https://github.com/bids-standard/BEP028_BIDSprov/blob/master/examples/from_parsers/fsl/fsl_default_report_log.jsonld#L16-L20.

While this might be something compatible with the BIDS-Prov spec, it is quite different from what we've done for SPM and it creates an extra level of complexity in the graphs.

Describe the solution you'd like
Is it feasible to comment out the creation of the extra Activities (i.e. those that do not correspond to a command in the logs)?

Describe alternatives you've considered
None

Parsing with </pre> tag

Describe the solution you'd like
Instead of giving markdown files as input (not provided initially), we could start directly from the HTML files. To do this, we need to be able to handle the tags. This would avoid the html -> md conversion.

Describe alternatives you've considered
Continue with the conversion to markdown which adds a step and does not seem essential. However, it does not alter the quality of the results.

[Bug]: Command field for FSL

batch name and line

https://github.com/bids-standard/BEP028_BIDSprov/blob/master/results/fsl/nidmresults-examples_fsl_con_f_multiple_report_log.md

What happened?

Currently there is no Command stored in the FSL Activities, see for example: https://github.com/bids-standard/BEP028_BIDSprov/blob/master/results/fsl/nidmresults-examples_fsl_con_f_multiple_report_log.jsonld#L21-L30

Describe the expected ground-truth ?

The expected ground-truth :
'Command': "/bin/cp /tmp/feat_wYxBhi.fsf design.fsf"

What soft is concerned

FSL

[Bug]: command cp -r

batch name and line

Example of line : cp -r /usr/local/packages/fsl-5.0.8/doc/images .files/images located here.

What happened?

I have a doubt about the desired output. Indeed, the -r (recursive) option copies one directory into another but what do we expect to get?

Currently, we have this:

      {
        "@id": "urn:351b3990-6953-4085-9e32-413d38973535",
        "label": "feat_main_script_cp",
        "wasAssociatedWith": "urn:c433fb2a-5b6e-4d88-b596-6270fa9a3263",
        "command": "cp -r /usr/local/packages/fsl-5.0.8/doc/images .files/images",
        "attributes": [
          [
            "-r",
            "/usr/local/packages/fsl"  (the rest is missing by the way, this is corrected in the branch `boutiques`)
          ]
        ],
        "used": [
          "urn:162fe375-82fe-4ca1-9fe4-58d7cb95bd65"
        ],
        "prov:wasInfluencedBy": "urn:42e2bb15-639b-45a4-b493-49be04391084"
      }

Describe the expected ground-truth ?

I would expect to see an r attribute with no value and two used entities which are directories. After research, it seems possible to calculate checksums on entire directories. What do you think @cmaumet?

      {
        "@id": "urn:dbcad5f7-deb2-4311-9296-213ea6711461",
        "label": "feat_main_script_cp",
        "wasAssociatedWith": "urn:b06c78eb-7bf3-4c3f-a798-b0a8f8956d04",
        "command": "cp -r /usr/local/packages/fsl-5.0.8/doc/images .files/images",
        "attributes": [
        [
          "r"
        ]
        ],
        "used": [
          "urn:67a8a94c-02ec-41f2-8030-75e64f0d3794",
          "urn:139944bb-63b0-4b7e-b44d-e9950c07121d"
        ],
        "prov:wasInfluencedBy": "urn:de2c15c3-3a37-4bcd-ab20-469c87f6cd23"
      }
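With argparse, declaring -r as a flag without value (action="store_true") would give this behaviour. A minimal sketch, in which the parser setup and the function name parse_cp are illustrative and whether the destination counts as used or generated is left undecided:

```python
import argparse
import shlex

# -r is declared as a boolean flag, so it no longer swallows the source path.
cp_parser = argparse.ArgumentParser(prog="cp", add_help=False)
cp_parser.add_argument("-r", action="store_true")
cp_parser.add_argument("source")
cp_parser.add_argument("destination")


def parse_cp(command):
    """Return the attributes and the two paths of a 'cp [-r] src dst' call."""
    args = cp_parser.parse_args(shlex.split(command)[1:])
    attributes = [["r"]] if args.r else []
    return {"attributes": attributes, "paths": [args.source, args.destination]}
```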

What soft is concerned

FSL

[AFNI] Taking into account the parameters

Is your feature request related to a problem? Please describe.
How should the parameters field of an Activity be filled correctly?
Example :

3dvolreg -verbose -zpad 1 -base pb01.$subj.r01.tshift+orig'[2]' \
             -1Dfile dfile.r$run.1D -prefix rm.epi.volreg.r$run     \
             -cubic                                                 \
             -1Dmatrix_save mat.r$run.vr.aff12.1D                   \
             pb01.$subj.r$run.tshift+orig

Describe the solution you'd like
Consider that all arguments that are neither input nor output are parameters ...
parameters = {'zpad': 1, 'cubic': True} ...
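This rule could be sketched as follows. The set of input/output options (IO_FLAGS) is an assumption made for this 3dvolreg example, and the heuristic "an option followed by a non-option token takes that token as value" would misread a boolean option placed right before a positional file.

```python
import shlex

# Assumption: options of this 3dvolreg call that name input or output files.
IO_FLAGS = {"-base", "-1Dfile", "-prefix", "-1Dmatrix_save"}


def _is_flag(token):
    """True for '-name' options, False for positionals and negative numbers."""
    if not token.startswith("-"):
        return False
    try:
        float(token)
    except ValueError:
        return True
    return False


def _convert(value):
    """Convert numeric strings to int or float, leave other values as str."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def extract_parameters(command, io_flags=IO_FLAGS):
    """Collect every option that is neither an input nor an output."""
    tokens = shlex.split(command)[1:]
    parameters = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        has_value = i + 1 < len(tokens) and not _is_flag(tokens[i + 1])
        if not _is_flag(token):
            i += 1                      # positional argument (a file)
        elif token in io_flags:
            i += 2 if has_value else 1  # input/output option, skipped
        elif has_value:
            parameters[token.lstrip("-")] = _convert(tokens[i + 1])
            i += 2
        else:
            parameters[token.lstrip("-")] = True
            i += 1
    return parameters
```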

[Bug]: spm_full_example001

batch name and line

nidm example : batch.m :

matlabbatch{4}.spm.stats.results.spmmat(1) = cfg_dep('Contrast Manager: SPM.mat File', substruct('.','val', '{}',{3}, '.','val', '{}',{1}, '.','val', '{}',{1}), substruct('.','spmmat'));
matlabbatch{4}.spm.stats.results.conspec.titlestr = '';
matlabbatch{4}.spm.stats.results.conspec.contrasts = Inf;
matlabbatch{4}.spm.stats.results.conspec.threshdesc = 'none';
matlabbatch{4}.spm.stats.results.conspec.thresh = 0.001;
matlabbatch{4}.spm.stats.results.conspec.extent = 0;
matlabbatch{4}.spm.stats.results.conspec.mask.none = 1;
matlabbatch{4}.spm.stats.results.units = 1;
matlabbatch{4}.spm.stats.results.print = 'pdf';
matlabbatch{4}.spm.stats.results.write.none = 1;

What happened?

The bug :

python -m bids_prov.spm_parser --input_file ../nidmresults-examples/spm_full_example001/batch.m

file= ../nidmresults-examples/spm_full_example001/batch.m
Traceback (most recent call last):
  File "/home/cregan/Documents/CODE/Projets/3_FWIN/BIDS-prov/FORK_BEP028_BIDSpro/launch_multiple_spm.py", line 66, in <module>
    main()
  File "/home/cregan/Documents/CODE/Projets/3_FWIN/BIDS-prov/FORK_BEP028_BIDSpro/launch_multiple_spm.py", line 52, in main
    spm_to_bids_prov(root + "/" + str(file), conf.CONTEXT_URL, output_file=output_jsonld,
  File "/home/cregan/Documents/CODE/Projets/3_FWIN/BIDS-prov/FORK_BEP028_BIDSpro/bids_prov/spm_parser.py", line 349, in spm_to_bids_prov
    records = get_records(tasks, verbose=verbose)
  File "/home/cregan/Documents/CODE/Projets/3_FWIN/BIDS-prov/FORK_BEP028_BIDSpro/bids_prov/spm_parser.py", line 321, in get_records
    if entity["@id"] not in entities_ids:
KeyError: '@id'

Describe the expected ground-truth ?

The expected ground-truth :

What soft is concerned

spm

FSL : fslmaths description

The fslmaths function has many uses that complicate its description in the associated file.

Here are some commands from 2 versions of FSL (5.0.8 and 5.0.10)

/usr/local/packages/fsl-5.0.8/bin/fslmaths /home/tommaullin/Documents/Data/ds011/sub-01/func/sub-01_task-tonecounting_bold prefiltered_func_data -odt float
/usr/local/packages/fsl-5.0.8/bin/fslmaths /storage/essicd/data/NIDM-Ex/BIDS_Data/RESULTS/EXAMPLES/ds011/SPM/PREPROCESSING/ANATOMICAL/sub-01_T1w_brain highres
/usr/local/packages/fsl-5.0.8/bin/fslmaths /storage/essicd/data/NIDM-Ex/BIDS_Data/RESULTS/EXAMPLES/ds011/SPM/PREPROCESSING/ANATOMICAL/sub-01_T1w_brain  highres_head
/usr/local/packages/fsl-5.0.8/bin/fslmaths /usr/local/packages/fsl-5.0.8/data/standard/MNI152_T1_2mm_brain standard
/usr/local/packages/fsl-5.0.8/bin/fslmaths prefiltered_func_data_mcf -Tmean mean_func
/usr/local/packages/fsl-5.0.8/bin/fslmaths prefiltered_func_data_mcf -mas mask prefiltered_func_data_bet
/usr/local/packages/fsl-5.0.8/bin/fslmaths prefiltered_func_data_bet -thr 74.4585571 -Tmin -bin mask -odt char
/usr/local/packages/fsl-5.0.8/bin/fslmaths mask -dilF mask
/usr/local/packages/fsl-5.0.8/bin/fslmaths prefiltered_func_data_thresh -Tmean mean_func
/usr/local/packages/fsl-5.0.8/bin/fslmaths prefiltered_func_data_smooth -mul 16.8225970571 prefiltered_func_data_intnorm
/usr/local/packages/fsl-5.0.8/bin/fslmaths prefiltered_func_data_intnorm -bptf 15.0 -1 -add tempMean prefiltered_func_data_tempfilt
/usr/local/packages/fsl-5.0.8/bin/fslmaths prefiltered_func_data_tempfilt filtered_func_data
/usr/local/packages/fsl-5.0.8/bin/fslmaths filtered_func_data -Tmean mean_func
/usr/local/packages/fsl-5.0.8/bin/fslmaths stats/zstat1 -mas mask thresh_zstat1
/usr/local/packages/fsl-5.0.10/bin/fslmaths bg_image -inm 1000 -Tmean bg_image -odt float
/usr/local/packages/fsl-5.0.10/bin/fslmaths ../mask -mul 14 -Tmean masksum -odt short
/usr/local/packages/fsl-5.0.10/bin/fslmaths masksum -thr 14 -add masksum masksum
/usr/local/packages/fsl-5.0.10/bin/fslmaths masksum -mul 0 maskunique
/usr/local/packages/fsl-5.0.10/bin/fslmaths /home/essicd/data/NIDM-Ex/BIDS_Data/RESULTS/EXAMPLES/ds011/FSL/LEVEL1/sub-01.feat/reg_standard/mask -mul -1 -add 1 -mul 1 -add maskunique maskunique
/usr/local/packages/fsl-5.0.10/bin/fslmaths masksum -thr 13 -uthr 13 -bin -mul maskunique maskunique
/usr/local/packages/fsl-5.0.10/bin/fslmaths mask -Tmin mask
/usr/local/packages/fsl-5.0.10/bin/fslmaths mean_func -Tmean mean_func
/usr/local/packages/fsl-5.0.10/bin/fslmaths cope1 -mas mask cope1

For the used entities, it seems that the first argument is always one of them.
However, for generatedBy entities, it is more complicated. Sometimes the output is the last argument, but sometimes the command ends with an option (e.g. -odt float) that seems to change the output type of the function.

The following pattern could perhaps be interesting:

If the command does not contain the -odt <value> option, the last argument is added to generatedBy; otherwise nothing is added. However, the description file would certainly not be sufficient to handle this case; part of it would have to be done manually.
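An alternative reading follows the fslmaths usage message, fslmaths [-dt <datatype>] <first_input> [operations and inputs] <output> [-odt <datatype>]: the output is the last argument once a trailing -odt <datatype> has been removed. A sketch under that assumption, with a hypothetical function name; additional inputs inside the operations (e.g. the image after -mas) are not extracted here.

```python
import shlex


def fslmaths_io(command):
    """Return (first_input, output) of a fslmaths call."""
    tokens = shlex.split(command)[1:]
    # A trailing '-odt <datatype>' only sets the output data type.
    if len(tokens) >= 2 and tokens[-2] == "-odt":
        tokens = tokens[:-2]
    # A leading '-dt <datatype>' only sets the internal data type.
    if tokens[:1] == ["-dt"]:
        tokens = tokens[2:]
    return tokens[0], tokens[-1]
```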

What do you think @cmaumet ?

Replace Entity labels with filename

Is your feature request related to a problem? Please describe.
In some of the entities in the SPM examples, the label is manually defined, e.g. https://github.com/bids-standard/BEP028_BIDSprov/blob/master/results/spm/nidmresults-examples_spm_2_t_test_batch.jsonld#L269-L271 we have "label": "Moved/Copied Files".

Describe the solution you'd like
Could we instead consistently have the filename as the label? Also for those, the prov:atLocation field is missing.

[Bug]: Replace all niiri by urn

batch name and line

In all identifiers for which niiri is used as a prefix.

For example here: https://github.com/Inria-Empenn/BEP028_BIDSprov/blob/master/results/nidmresults-examples_spm_2_t_test_batch.jsonld#L26

What happened?

the prefix niiri: is used for the identifier : "@id": "niiri:cfg_basicio.file_dir.file_ops.file_move._1gWIqVrGmQZ".

Describe the expected ground-truth ?

We have now decided (more context in bids-standard#69) that we no longer will use niiri as a prefix and therefore those should be replaced by urn.

What soft is concerned

SPM

Remove "nidmresults-examples" in examples filename

Is your feature request related to a problem? Please describe.
The filenames in the results folder are too long, so the extension is sometimes hidden (replaced by ...), which makes it difficult to browse the examples:

Describe the solution you'd like
Could we shorten the filenames to remove the nidmresults-examples prefix?

Describe alternatives you've considered
If there are other ideas to shorten filenames, those can be implemented at the same time :)

Additional context
This applies to both SPM and FSL examples

[Bug]: Reading a line beginning with anything other than a lowercase letter or /

batch name and line

This bug applies to all nidm examples.

For example, for the file fsl_con_f_multiple and for the step "Feat main script", the lines following the line containing "31622" (line 18) are not read.

What happened?

For each step (example: Feat main script), as soon as a line does not start with a lowercase letter [a-z] or a /, the remaining lines of the step are not read.

Describe the expected ground-truth ?

The following lines should also be read and analysed.

What soft is concerned

FSL

[Bug]: Use UUID as identifiers

batch name and line

In all identifiers.

For example here: https://github.com/Inria-Empenn/BEP028_BIDSprov/blob/master/results/nidmresults-examples_spm_2_t_test_batch.jsonld#L26

What happened?

The identifier is not a UUID : "@id": "niiri:cfg_basicio.file_dir.file_ops.file_move._1gWIqVrGmQZ".

Describe the expected ground-truth ?

The suffix (in the above example cfg_basicio.file_dir.file_ops.file_move._1gWIqVrGmQZ) should be replaced by a random identifier generated using the uuid library. This would lead to something like:

"@id": "niiri:08f5a59a47f33372cedc55d11fc67e1a"

(Note niiri will also be replaced by urn as described in #6)

What soft is concerned

No response

Provide more human-readable labels for activities

Is your feature request related to a problem? Please describe.
Currently the labels for the activities are automatically extracted from the keys in the matlabbatch, this is great but could be improved to have more human-readable labels.

Describe the solution you'd like
We could add a mapping between the keys in the matlabbatch and a human-readable name/label. This could be stored in a parameter file very similarly to what is done to add inputs to activities.
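A rough sketch of what this could look like, assuming the mapping is stored as JSON. The two keys are taken from existing batches, but the labels, the JSON layout and the function name `label_for` are only suggestions:

```python
import json

# Hypothetical content of the parameter file: matlabbatch key -> readable label.
LABELS_JSON = """
{
  "cfg_basicio.file_dir.file_ops.file_move": "Move files",
  "spm.spatial.preproc": "Segmentation"
}
"""


def label_for(key: str, labels: dict) -> str:
    """Return the human-readable label for a key, or the key itself if unmapped."""
    return labels.get(key, key)


labels = json.loads(LABELS_JSON)
print(label_for("spm.spatial.preproc", labels))  # mapped key
print(label_for("spm.stats.con", labels))        # unmapped key, returned as-is
```

Falling back to the raw key keeps the current behaviour for any activity that has not been mapped yet.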

What do you think?

Update contributors list

Hi @cyril-data @hermann74 and @thomasbtnfr! Your names should really be on the contributors list at: https://github.com/bids-standard/BEP028_BIDSprov#who-is-building-bep-028

Can you update the README to add your names?

In addition, it would be great to use emojis as in https://github.com/bids-standard/bids-starter-kit#contributors- . Can you add this as well?

Thanks!

[Bug]: Command field for SPM

batch name and line

Throughout SPM, we do not have the Command field for activities.

What happened?

The Command field from the BIDS-prov specification is missing.

Describe the expected ground-truth ?

We can add this Command field, but we risk having very long commands because an activity sometimes spans dozens of lines.

Example :

matlabbatch{7}.spm.spatial.preproc.channel.vols(1) = cfg_dep('GunZip Files: GunZipped Files', substruct('.','val', '{}',{4}, '.','val', '{}',{1}, '.','val', '{}',{1}, '.','val', '{}',{1}), substruct('()',{':'}));
matlabbatch{7}.spm.spatial.preproc.channel.biasreg = 0.001;
matlabbatch{7}.spm.spatial.preproc.channel.biasfwhm = 60;
matlabbatch{7}.spm.spatial.preproc.channel.write = [0 1];
matlabbatch{7}.spm.spatial.preproc.tissue(1).tpm = {'/storage/essicd/data/NIDM-Ex/spm12_update/tpm/TPM.nii,1'};
matlabbatch{7}.spm.spatial.preproc.tissue(1).ngaus = 1;
matlabbatch{7}.spm.spatial.preproc.tissue(1).native = [1 0];
matlabbatch{7}.spm.spatial.preproc.tissue(1).warped = [0 0];
matlabbatch{7}.spm.spatial.preproc.tissue(2).tpm = {'/storage/essicd/data/NIDM-Ex/spm12_update/tpm/TPM.nii,2'};
matlabbatch{7}.spm.spatial.preproc.tissue(2).ngaus = 1;
matlabbatch{7}.spm.spatial.preproc.tissue(2).native = [1 0];
matlabbatch{7}.spm.spatial.preproc.tissue(2).warped = [0 0];
matlabbatch{7}.spm.spatial.preproc.tissue(3).tpm = {'/storage/essicd/data/NIDM-Ex/spm12_update/tpm/TPM.nii,3'};
matlabbatch{7}.spm.spatial.preproc.tissue(3).ngaus = 2;
matlabbatch{7}.spm.spatial.preproc.tissue(3).native = [1 0];
matlabbatch{7}.spm.spatial.preproc.tissue(3).warped = [0 0];
matlabbatch{7}.spm.spatial.preproc.tissue(4).tpm = {'/storage/essicd/data/NIDM-Ex/spm12_update/tpm/TPM.nii,4'};
matlabbatch{7}.spm.spatial.preproc.tissue(4).ngaus = 3;
matlabbatch{7}.spm.spatial.preproc.tissue(4).native = [1 0];
matlabbatch{7}.spm.spatial.preproc.tissue(4).warped = [0 0];
matlabbatch{7}.spm.spatial.preproc.tissue(5).tpm = {'/storage/essicd/data/NIDM-Ex/spm12_update/tpm/TPM.nii,5'};
matlabbatch{7}.spm.spatial.preproc.tissue(5).ngaus = 4;
matlabbatch{7}.spm.spatial.preproc.tissue(5).native = [1 0];
matlabbatch{7}.spm.spatial.preproc.tissue(5).warped = [0 0];
matlabbatch{7}.spm.spatial.preproc.tissue(6).tpm = {'/storage/essicd/data/NIDM-Ex/spm12_update/tpm/TPM.nii,6'};
matlabbatch{7}.spm.spatial.preproc.tissue(6).ngaus = 2;
matlabbatch{7}.spm.spatial.preproc.tissue(6).native = [0 0];
matlabbatch{7}.spm.spatial.preproc.tissue(6).warped = [0 0];
matlabbatch{7}.spm.spatial.preproc.warp.mrf = 1;
matlabbatch{7}.spm.spatial.preproc.warp.cleanup = 1;
matlabbatch{7}.spm.spatial.preproc.warp.reg = [0 0.001 0.5 0.05 0.2];
matlabbatch{7}.spm.spatial.preproc.warp.affreg = 'mni';
matlabbatch{7}.spm.spatial.preproc.warp.fwhm = 0;
matlabbatch{7}.spm.spatial.preproc.warp.samp = 3;
matlabbatch{7}.spm.spatial.preproc.warp.write = [0 1];
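To make the size issue concrete, here is a minimal sketch of how the lines of one activity could be joined into a single Command string, assuming lines are grouped by their matlabbatch{N} index. The function name `commands_by_activity` is hypothetical and this is not the current parser:

```python
import re
from collections import defaultdict

# Captures the activity number N in "matlabbatch{N}."
BATCH_LINE = re.compile(r"^matlabbatch\{(\d+)\}\.")


def commands_by_activity(lines):
    """Map each matlabbatch index to its lines joined by newlines."""
    grouped = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        match = BATCH_LINE.match(line)
        if match:
            grouped[int(match.group(1))].append(line)
    return {number: "\n".join(group) for number, group in grouped.items()}


batch = [
    "matlabbatch{7}.spm.spatial.preproc.channel.biasreg = 0.001;",
    "matlabbatch{7}.spm.spatial.preproc.channel.biasfwhm = 60;",
]
print(commands_by_activity(batch)[7])
```

With the full example above, the Command of activity 7 would contain 35 lines.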

What do you think @cmaumet ?

What soft is concerned

SPM

[Bug]: Identifier for the agent should be a uuid

batch name and line

Currently the identifier of the agent is hard-coded to be exampleAgentID.

For example here: https://github.com/Inria-Empenn/BEP028_BIDSprov/blob/master/results/nidmresults-examples_spm_2_t_test_batch.jsonld#L17

This should be replaced by a UUID (as for any activity/entity).

What happened?

Identifier of the agent is currently: "@id": "exampleAgentID"...

Describe the expected ground-truth ?

... but should look like:

"@id": "urn:08f5a59a47f33372cedc55d11fc67e1a"

Note that beyond the definition, this identifier should then be reused each time an activity has a wasAssociatedWith relation to the agent.

For example in https://github.com/Inria-Empenn/BEP028_BIDSprov/blob/master/results/nidmresults-examples_spm_2_t_test_batch.jsonld#L31, instead of "wasAssociatedWith": "RRID:SCR_007037" we should have "wasAssociatedWith": "urn:08f5a59a47f33372cedc55d11fc67e1a".
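A minimal sketch of the intended behaviour: the agent identifier is generated once and then reused by every activity. The "@id" and "wasAssociatedWith" keys come from the examples above; the "label" key, the activity labels and the function names are placeholders for illustration only:

```python
import uuid


def new_id() -> str:
    """Return a random identifier of the form 'urn:<32 hex characters>'."""
    return f"urn:{uuid.uuid4().hex}"


def build_records(activity_labels):
    """Create one agent and one activity per label, all linked to that agent."""
    agent_id = new_id()  # generated once, then reused below
    agent = {"@id": agent_id}
    activities = [
        {"@id": new_id(), "label": label, "wasAssociatedWith": agent_id}
        for label in activity_labels
    ]
    return agent, activities


agent, activities = build_records(["activity one", "activity two"])
print(agent["@id"])
print([activity["wasAssociatedWith"] for activity in activities])
```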

What soft is concerned

SPM

[Bug]: Convert absolute path to relative path in matlab file ?

batch name and line

In bids_prov/tests/samples_test/batch_example_spm.m, line 6 :

matlabbatch{1}.cfg_basicio.file_dir.file_ops.file_move.files = {'/storage/essicd/data/NIDM-Ex/BIDS_Data/DATA/BIDS/ds011/sub-01/func/sub-01_task-tonecounting_bold.nii.gz'};

What happened?

Where should we cut the string:

/storage/essicd/data/NIDM-Ex/BIDS_Data/DATA/BIDS/ds011/sub-01/func/sub-01_task-tonecounting_bold.nii.gz

for the relative path conversion?

Describe the expected ground-truth ?

Unknown, this is the question. Two possible forms:

ds011/sub-01/anat/sub-01_T1w.nii.gz ?
sub-01/anat/sub-01_T1w.nii.gz ?
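Both options can be sketched with pathlib, assuming we cut at the first path component that starts with "sub-" and that the folder just above it is the dataset folder. The function name and the `keep_dataset_dir` flag are hypothetical:

```python
from pathlib import PurePosixPath


def relative_to_dataset(path: str, keep_dataset_dir: bool = False) -> str:
    """Cut an absolute path at the first 'sub-*' component.

    With keep_dataset_dir=True, the folder just above 'sub-*' is kept too.
    """
    parts = PurePosixPath(path).parts
    for index, part in enumerate(parts):
        if part.startswith("sub-"):
            start = max(index - 1, 0) if keep_dataset_dir else index
            return str(PurePosixPath(*parts[start:]))
    raise ValueError(f"no 'sub-' component found in {path!r}")


absolute = (
    "/storage/essicd/data/NIDM-Ex/BIDS_Data/DATA/BIDS/ds011/"
    "sub-01/func/sub-01_task-tonecounting_bold.nii.gz"
)
print(relative_to_dataset(absolute))
print(relative_to_dataset(absolute, keep_dataset_dir=True))
```

This heuristic would not work for files stored outside a subject folder, so the cut point still needs to be agreed on.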

What soft is concerned

SPM
@cmaumet