While evaluating legacy SOP classes, @afshinmessiah and I ran into the expected issues with invalid input data. At least in some cases, those errors are rather trivial, such as a VR mismatch between SH and CS.
This raised the discussion with @dclunie below. Even after this discussion, I personally think it would make more sense, and be more practical, to fix issues as they come up in the process of legacy conversion:
- this will be easier for users of the legacy conversion functionality to understand and use
- it may be easier to correct only the errors that matter for legacy conversion specifically, rather than develop a tool that tries to fix all errors (e.g., not all of the attributes will be used for legacy conversion)
- if we first patch the input datasets and then do the conversion, we would need to reference the ephemeral patched datasets if we want to do things right
@hackermd did you think about this?
From: David Clunie
Date: Thu, Feb 20, 2020 at 10:33 AM
Subject: Re: MF conversion and source dataset errors
To: Andrey Fedorov
Cc: Akbarzadeh, Afshin, Steve Pieper
Hi Andrey
I also copied Steve.
In short, probably option 2 (patch the source dataset, and then
do the conversion).
It depends a lot on exactly what the "errors" are, and what
you would do with any intermediate files.
E.g., if there is an error that a value is invalid for the VR,
(e.g., a bad character or too long), and the data element is
being copied (either into the top level data set of the new
object or into a functional group, e.g., per-frame unassigned),
then the choice is to "fix" it (remove bad character, truncate
too long string) before copying.
Alternatively, if it is an optional attribute, one could just
drop it (not copy it); but that risks losing something useful.
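The two choices David describes (fix the value, or drop the optional attribute) can be sketched in plain Python, without any DICOM toolkit. The function name and the drop-if-unfixable behavior are my own illustration; the constraints themselves come from the standard, where the CS (Code String) VR permits only uppercase letters, digits, space, and underscore, up to 16 characters:

```python
import re

# Constraints for the DICOM CS (Code String) VR per PS3.5 Section 6.2:
# uppercase letters, digits, space and underscore; at most 16 characters.
CS_MAX_LEN = 16
CS_INVALID = re.compile(r"[^A-Z0-9 _]")

def fix_cs_value(value, drop_if_unfixable=False):
    """Coerce a string toward a valid CS value: uppercase it, strip
    disallowed characters, and truncate to 16 characters.

    Returns None when drop_if_unfixable is set and nothing usable
    remains, mirroring the "just drop the optional attribute" option.
    """
    fixed = CS_INVALID.sub("", value.upper()).strip()[:CS_MAX_LEN]
    if not fixed and drop_if_unfixable:
        return None
    return fixed

# e.g. fix_cs_value("head-first supine!") -> "HEADFIRST SUPINE"
```

A real converter would apply this kind of coercion only to the elements it actually copies into the new object, per the point above about not fixing attributes that legacy conversion never touches.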
I don't always bother fixing these when converting in bulk, and
just propagate the errors, since trying to find and fix each special
case may be more work than I can justify.
But if one can make a fix, it would be nice to.
There is also an update to the standard that allows saving the
original bad values; see CP 1766 and Nonconforming Modified
Attributes Sequence:
ftp://medical.nema.org/medical/dicom/final/cp1766_ft_ExtendOriginalAttributesSequence.pdf
http://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.12.html#sect_C.12.1.1.9.2
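The idea behind that mechanism (keep the coerced value in the dataset, but preserve a record of the original) can be sketched as follows. This is a plain-dict stand-in, not the real Original Attributes Sequence / Nonconforming Modified Attributes Sequence encoding, and the "legacy-converter" identifier is hypothetical; "COERCE" is one of the defined terms for the reason for modification:

```python
from datetime import datetime, timezone

def record_and_fix(ds, keyword, fixed_value, log):
    """Replace ds[keyword] with fixed_value, appending a record of the
    original (nonconforming) value to log.

    `ds` is a plain dict standing in for a DICOM dataset; a real
    implementation would populate Original Attributes Sequence items
    (see PS3.3 C.12.1.1.9) instead of a Python list.
    """
    log.append({
        "AttributeModificationDateTime":
            datetime.now(timezone.utc).isoformat(),
        "ModifyingSystem": "legacy-converter",  # hypothetical identifier
        "ReasonForTheAttributeModification": "COERCE",
        "Attribute": keyword,
        "OriginalValue": ds[keyword],
    })
    ds[keyword] = fixed_value
```

Keeping such a log alongside the conversion would also address the concern above about losing something useful when a value is fixed or dropped.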
In terms of "when" to do the fix, if you are going to fix things,
I have done it both ways (and sometimes another way, which is
to propagate the errors into the multi-frame, and then fix
them up in a yet another separate final cleanup step).
I assume that when you say "patch the source dataset", you mean
fix a temporary copy, not "return a fixed copy to TCIA to replace
their bad stuff".
In which case, either approach (or a combination of both) seems
fine to me, since any intermediate files don't need to be persisted.
In the past, when creating "true" enhanced MF samples for CT and
MR (for the NEMA sponsored project), I actually used my "antodc"
tool in dicom3tools to "fix" and "enhance" the single frame objects,
by converting stuff from private attributes to standard attributes
(even if they weren't in the single frame IOD), and then handled
the "merging" into multi-frame (and factoring out of shared stuff)
in dcmulti.
This worked well because I separated most of the modality-specific
knowledge from the generic single to multi-frame conversion, as well
as providing a useful tool for making single frame images "better",
when I didn't need to make multi-frame ones.
This was long before I added the MultiframeImageFactory to the
PixelMed toolkit, and I have propagated very little if any of the
modality-specific stuff to that tool so far.
When/if I revisit the question of trying to create modality-
specific legacy converted or true enhanced multi-frame images
in PixelMed, I will very likely use the two step process of
first fixing the single frame headers, and then merging them
into a multi-frame, since I find that division of labor more
elegant.
It would also allow me to provide other input sources (e.g.,
NIfTI files with a BIDS metadata file) to feed the same
DICOM enhanced multi-frame pipeline. Though I have to admit
that I usually do that sort of thing with separate classes
with methods applied successively, rather than separate distinct
command line utilities.
BTW. For referential integrity updates (e.g., fixing all SRs
or SEGs that reference single frame images to reference the
new MF objects), I might make that yet another separate
step in the pipeline, especially if I could find other uses
for it.
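The core of such a referential-integrity step is a mapping from old single-frame SOP Instance UIDs to the new multi-frame instance plus a frame number. A minimal sketch, assuming a flat list of referenced UIDs as input (the real SR/SEG reference encoding is nested sequences, and the UIDs below are placeholders):

```python
def remap_references(referenced_uids, uid_map):
    """Rewrite references to converted single-frame instances so they
    point at the new multi-frame object and the frame within it.

    uid_map: old SOP Instance UID -> (new MF SOP Instance UID, frame number)
    """
    remapped = []
    for old_uid in referenced_uids:
        if old_uid in uid_map:
            new_uid, frame = uid_map[old_uid]
            remapped.append({"ReferencedSOPInstanceUID": new_uid,
                             "ReferencedFrameNumber": frame})
        else:
            # Leave references to unconverted objects untouched.
            remapped.append({"ReferencedSOPInstanceUID": old_uid})
    return remapped
```

Factoring this out as its own pipeline step, as suggested above, would let the same remapping serve any object type that references the converted instances.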
David
PS. I have attached a copy of antodc.cc from dicom3tools ... I
haven't used this tool for more than a decade, but it may give
you some insight into more complicated fixes I sometimes used
to perform, and how I extracted standard information from private
data elements.
PPS. In case they are informative, I have attached archives of the
Makefiles that I used for the CT and MR NEMA project ... these will
not execute without my source images, various tools and out of band
information, but they may give some (outdated) insight into the
process of handcrafting examples versus producing an operational
converter.
On 2/19/20 11:18 PM, Andrey Fedorov wrote:
Hi David,
As Afshin is working on the MF conversion task, we wanted to ask you a
"fundamental" question.
As you know, TCIA datasets may often have errors. What should be the
strategy for addressing those? Should we:
- carry forward those errors into MF representation, and just ignore
those while validating MF?
- patch the source dataset, and then do the conversion?
- patch the errors in the MF representation in the process of
conversion, and keep the originals intact?
I would probably prefer option 3.
AF