aidasoft / podio Goto Github PK
View Code? Open in Web Editor NEWPODIO
License: GNU General Public License v3.0
PODIO
License: GNU General Public License v3.0
As discussed in the context of EDM4HEP support for some metadata, or any information that is saved together with a collection would be useful. For example, this could be used to save a readout string (system:4,layer:5,module:18,x:-15,z:-15
) along with a dd4hep cellID. My first idea was to simply add a member to the collection template. @gaede , maybe you can comment how this is done in LCIO? Cheers
I have started to work on making the python code that fills the templates to generate the full classes a bit more readable. Ideally this would also result in better maintainability. While doing this I have stumbled upon some things for which I didn't really find anything in the documentation (or equally likely, I have missed it).
Some fields in the data model syntax are not really used, respectively after reading them in, they are not used for the class generation. E.g.:
const_declaration
in the ExtraCode
field of components (#115)ConstExtraCode
, TransientMembers
, Typedefs
for the datatypesOn the other hand it seems that const_declaration
and const_definition
should have been placed under the ConstExtraCode
but are now generally placed in ExtraCode
. It is not entirely clear from the documentation what the difference between declaration
and const_declaration
is or should be. As far as I have gathered the former goes only into the DataType
, while the latter additionally also goes into the ConstDataType
(?)
I am not sure what the original design idea was, but it seems to me that it is no longer rigorously followed.
I have also been playing with a python schema validation library, and from a few quick tests it looks to be feasible to apply to the podio datamodel files. In the gist I already do some very basic parsing of the members of the datatypes. I think to get the very basic errors that can happen in the yaml file this would be enough. It might even be possible to do a more thorough validation, but that would need to be tested. The main question here is, is it OK to pull in another dependency for this or should we aim to do this without?
There are quite a few calls to ROOT relying on global states like creating TTrees implicitly within the current gDirectory. Would it make sense to be more explicit in these calls?
The current travis configuration does not seem to work, it fails before trying the package:
The CI workflow for mac is currently always failing because of a
Fatal Python error: PyThreadState_Get: no current thread
occurring during the pyunittest
. From what I could gather up until now this is most likely related to some cross talk between different python versions on macOS. However, I haven't really been able to diagnose this any further.
Opened by @zoujh in EDM4hep: key4hep/EDM4hep#54
When getting a product from the EventStore
, e.g. via store.get<edm4hep::TrackCollection>("tracks")
, podio currently does not even issue any warning that the product is not available. Instead a default constructed object of the desired type is returned. The most trivial way this can happen is a simple typo. However, it might actually be the case that a user requires a product that is simply not present in the EventStore
.
I think podio should at least print a warning statement that the desired product is not found. Arguably, it should be possible for the user to check whether the object is present in the EventStore or not. (I know this is in principle possible with I just realized that this is actually easily possible by asking the returned collection whether it is valid via bool EventStore::get(const std::string&, const T*&)
, but I think this should also be possible using the more ergonomic EventStore::get(const std::string&)
).collection.isValid()
. The question then becomes do we want to force the user to check whether the collection is valid or not?
This would be an excellent use case for optional
, but that would require at least c++17, unless we want to pull in boost
.
When building podio libpodioDict.so
is not always properly linked into executables or other shared libraries, like libpodioRootIO.so
or the test executables. I am not sure whether it is a podio problem or something that is more closely related to some subtle cmake issue, as it e.g. occurs on Ubuntu 18.04, but not on Fedora 29 (or on lxplus).
I have prepared an Ubuntu docker image where the problem occurs and a Fedora docker image, where it does not.
To reproduce the problem in the Ubuntu container pull it and run it interactively, then do
bash build_podio.sh
This will build podio and run tests/write > /dev/null
as well as ldd tests/write
in the podio build directory. And result in output along the following lines
[...]
Error in <TTree::Branch>: The class requested (vector<podio::ObjectID>) for the branch "WithNamespaceRelationCopy#1" is an instance of an stl collection and does not have a compiled CollectionProxy. Please generate the dictionary for this collection (vector<podio::ObjectID>) to avoid to write corrupted data.
Error in <TTree::Branch>: The pointer specified for CollectionIDs is not of a class or type known to ROOT
linux-vdso.so.1 (0x00007ffdfc9e1000)
libTestDataModel.so => /home/hsf/podio/build/tests/libTestDataModel.so (0x00007f0b7867d000)
libpodioRootIO.so => /home/hsf/podio/build/src/libpodioRootIO.so (0x00007f0b78472000)
libpodio.so => /home/hsf/podio/build/src/libpodio.so (0x00007f0b78269000)
libCore.so.6.20 => /home/hsf/root/lib/libCore.so.6.20 (0x00007f0b77b69000)
[...]
Repeating the same steps in the Fedora container leads to the following output, where only ldd tests/write
is now present, since tests/write > /dev/null
runs without problems.
[...]
linux-vdso.so.1 (0x00007ffe1b542000)
libTestDataModel.so => /home/hsf/podio/build/tests/libTestDataModel.so (0x00007f925746c000)
libpodioRootIO.so => /home/hsf/podio/build/src/libpodioRootIO.so (0x00007f925745f000)
libpodioDict.so => /home/hsf/podio/build/src/libpodioDict.so (0x00007f9257454000)
libpodio.so => /home/hsf/podio/build/src/libpodio.so (0x00007f9257449000)
libTree.so.6.20 => /home/hsf/root/lib/libTree.so.6.20 (0x00007f92572b5000)
[...]
The most important difference is that now libpodioDict.so
is present.
To me it is currently unclear from where this difference arises, but it seems to be unrelated to the exact gcc
version that is used on Ubuntu and also on how root
has been built.
The only hint towards a problem can be seen when using cmake-3.10
on Ubuntu, where the following warning appears at the cmake
stage
You have called ADD_LIBRARY for library podioDict without any source files. This typically indicates a problem with your CMakeLists.txt file
This message does not appear with newer cmake versions (tested with 3.14 and 3.17). It is also inconsequential as the outcome remains unchanged.
We should setup some continuous integration for this repo.
This would involve preparing a few docker containers with a self-contained environment on the platforms we want to test and using Travis to clone, build and test the code.
The https://github.com/HSF/prmon repo shows how to do that in a pretty straight forward way.
When converting a type to its const version, e.g. ExampleMC
to ConstExampleMC
, the iterator ranges describing OneToManyRelations
break if there is no related object. This seems to be true for other types as well. At least it also occurs for the edm4hep::ReconstructedParticle
from EDM4hep
With the following code snippet it is possible to produce a segmentation violation in the second loop over the collection, where each element is converted to a ConstExampleMC
before printing the daughters.
podio::EventStore store;
auto& collection = store.create<ExampleMCCollection>("collection");
// Create a few elements
for (int i = 0; i < 4; ++i) {
auto elem = collection.create();
elem.PDG(i + 10000);
}
// Add an alement adding some of the previously created elements as daughters
auto motherElem = collection.create();
motherElem.PDG(100);
motherElem.adddaughters(collection[1]);
motherElem.adddaughters(collection[2]);
motherElem.adddaughters(collection[3]);
// Add a daughter to the first element in the collection to show that the
// problem is not related to the position in the collection
collection[0].adddaughters(collection[3]);
// This works without problems and everything is as expected
for (const auto& elem : collection) {
std::cout << "element: " << elem.id() << " -> PDG: " << elem.PDG()
<< " (" << elem.daughters_size() << " daughters)" << std::endl;
for (auto it = elem.daughters_begin(); it != elem.daughters_end(); ++it) {
std::cout << " daughter: " << it->id() << " -> PDG: " << it->PDG() << std::endl;
}
}
std::cout << std::endl << std::endl;
for (const auto& elem : collection) {
// Convert to a ConstExampleMC and see things break
ConstExampleMC constElem{elem};
std::cout << "const element: " << elem.id() << " -> PDG: " << constElem.PDG()
<< " (" << constElem.daughters_size() << " daughters)" << std::endl;
// As soon as there are no daughters, this iterator range is invalid and we
// get a segmentation violation
for (auto it = constElem.daughters_begin(); it != constElem.daughters_end(); ++it) {
std::cout << " daughter: " << it->id() << " -> PDG: " << it->PDG() << std::endl;
}
}
The first loop works and gives the desired output:
element: 10000000 -> PDG: 10000 (1 daughters)
daughter: 10000003 -> PDG: 10003
element: 10000001 -> PDG: 10001 (0 daughters)
element: 10000002 -> PDG: 10002 (0 daughters)
element: 10000003 -> PDG: 10003 (0 daughters)
element: 10000004 -> PDG: 100 (3 daughters)
daughter: 10000001 -> PDG: 10001
daughter: 10000002 -> PDG: 10002
daughter: 10000003 -> PDG: 10003
During the second loop, the first element gets printed as expected, but trying to loop over the second element (without any daughters) a segmentation violation occurs:
const element: 10000000 -> PDG: 10000 (1 daughters)
daughter: 10000003 -> PDG: 10003
const element: 10000001 -> PDG: 10001 (0 daughters)
*** Break *** segmentation violation
daughter:
The stacktrace is the following:
#0 0x00007f98b00a1c2a in __GI___wait4 (pid=941256, stat_loc=stat_loc
entry=0x7ffcbe3d29f8, options=options
entry=0, usage=usage
entry=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:27
#1 0x00007f98b00a1beb in __GI___waitpid (pid=<optimized out>, stat_loc=stat_loc
entry=0x7ffcbe3d29f8, options=options
entry=0) at waitpid.c:38
#2 0x00007f98b00110e7 in do_system (line=<optimized out>) at ../sysdeps/posix/system.c:172
#3 0x00007f98afe2a069 in TUnixSystem::Exec (shellcmd=<optimized out>, this=0x5596e81bf800) at /tmp/tmadlener/spack-stage/spack-stage-root-6.20.04-3mcg47i4cbc57dngsf3ab3lpsp63xlny/spack-src/core/unix/src/TUnixSystem.cxx:2107
#4 TUnixSystem::StackTrace (this=0x5596e81bf800) at /tmp/tmadlener/spack-stage/spack-stage-root-6.20.04-3mcg47i4cbc57dngsf3ab3lpsp63xlny/spack-src/core/unix/src/TUnixSystem.cxx:2397
#5 0x00007f98afe271fc in TUnixSystem::DispatchSignals (this=0x5596e81bf800, sig=kSigSegmentationViolation) at /tmp/tmadlener/spack-stage/spack-stage-root-6.20.04-3mcg47i4cbc57dngsf3ab3lpsp63xlny/spack-src/core/unix/src/TUnixSystem.cxx:3628
#6 <signal handler called>
#7 0x00007f98b046aece in ConstExampleMC::getObjectID (this=0x0) at /home/tmadlener/work/podio/tests/src/ExampleMCConst.cc:104
#8 0x00005596e75cebcb in ConstExampleMC::id (this=0x0) at /home/tmadlener/work/podio/tests/datamodel/ExampleMCConst.h:83
#9 main () at /home/tmadlener/work/podio/tests/invalid_iterator/invalid_iterator_repro.cpp:46
Apparently ConstExampleMC::id
gets called on an invalid object. However, this should not even happen, since daughters_begin
and daughters_end
should describe an empty range, and it should not be possible to enter the loop.
The major problem is that T
can be implicitly converted to ConstT
(e.g. at function boundaries), so that the explicit conversion I did here is not too far fetched to actually happen.
Extra code that is declared as const_declaration
is not generated for components.
Since there really is no difference between const_declaration
and declaration
for components, I think it is OK to only consider the declaration
for components. However, since then const_declaration
is most likely a user input error, the validation should get this.
Do I miss something, that would make a differentiation between const_declaration
and declaration
necessary for components?
It is not really visible with the 2000 events that are currently used for the tests, but simply increasing that number and monitoring the memory consumption of the corresponding process shows that it continuously increases with the running time of the process. I am not yet sure where / what memory is leaked, but it is not negligible. For me the memory footprint for tests/write
increases from about 120M at startup to 1150M when writing 1e6 events. Similarly, tests/read
starts off at around 220M and increases to 1000M at the end.
Repeating the tests with different readers / writers, e.g. the SIOReader
and SIOWriter
from #130, shows similar behavior, so it seems that the interplay between the readers / writers and the EventStore could be the culprit here.
[ 93%] Building CXX object tests/CMakeFiles/read-multiple.dir/read-multiple.cpp.o
/Users/mato/Development/LCGSoft/build/frameworks/podio-00.11/src/podio/00.11/tests/write.cpp:73:23: error: implicit instantiation of undefined template 'std::__1::basic_stringstream<char, std::__1::char_traits<char>, std::__1::allocator<char> >'
std::stringstream ss ; ss << " event_number_" << i ;
^
/Library/Developer/CommandLineTools/usr/bin/../include/c++/v1/iosfwd:139:32: note: template is declared here
class _LIBCPP_TEMPLATE_VIS basic_stringstream;
^
1 error generated.
make[5]: *** [tests/CMakeFiles/write.dir/write.cpp.o] Error 1
make[4]: *** [tests/CMakeFiles/write.dir/all] Error 2
make[4]: *** Waiting for unfinished jobs....
When reading from more than one file the reading of the meta data breaks after the first file has been read. The read-multiple
test doesn't actually cover this so it has gone undetected until now.
The reason seems to be that in
Lines 18 to 20 in 6a1e506
the call to m_chain->GetFile()
possibly still returns the "old" file and has not necessarily switched to the "new" file yet. For this to happen a call to m_chain->LoadTree(m_eventNumber)
needs to happen, which is happening only in readCollection
:
Line 84 in 6a1e506
This issue is easily solved for meta data that is expected to change with every event / run. However, it is a bit harder, also from a conceptual point of view, for meta data that are read at the beginning of the file which is done in openFiles
, but only for the first file. It is possible that the stored meta data in the next file are different, but currently this is not taken into account when switching files.
Looking at https://github.com/hegner/podio/blob/a55171677ccd2b2b6de15afd828a6aceee61bd4e/init.sh it very aggressively modifies the environment, setting up an old LCG release with the clang compiler on top on any Linux machine that has cvmfs installed.
I understand that it's useful not to try to compile in a broken environment, but surely one only should help after checking that no suitable environment yet exists, e.g.,
Although honestly I would be happier if we rather added just some advice on setting up the LCG environment if required rather than writing magic to do so. (It's odd that ROOT and yaml are explicit, in the README but not the compiler, right?)
Can we make containers non-copyable?
(a follow up on #109)
Since the past few weeks have revealed some smaller and larger issues in the current implementation of podio
, I have tried to compile a list of things that I think should be addressed. I have split it into a major and minor issues part that mainly reflect how much work I estimate would be needed and how easily they could be addressed. This should mainly serve as a starting point for discussion and more things might get discovered along the way, while others might become obsolete. Also some of this points might be considered personal opinion or taste, but these were at least things I stumbled upon when starting to work with podio
.
Issues that require a considerable (i.e. non-negligible) amount of work or redesign to parts of podio and are also partially related to each other so that they can not be easily addressed in isolation
EventStore::create()
returns a reference to the created collection. Especially with auto
this can be dangerous as omitting the &
will turn this into a copy which will not have a relation to the EventStore. Hence, no element of the collection created on this copy will actually be known, tracked and stored by the EventStore.
Conversely Collection::create()
returns a value type and trying to bind it to a reference will actually fail during compilation. This difference between collections and their elements has the potential for a lot of confusion, especially for less well versed newcomers in c++.
Additionally, since references are not easy to store in STL containers, it would be nice to have some sort of "value-type" wrapper similar to the elements of the collections.
EventStore::get()
always returns a const
reference to the collection. Collections can only be modified in the process in which they are created which is not really a bad thing to have. However, in some cases it should potentially be possible. This is related to the issue that EventStore::create()
returns a reference, which makes a const-cast necessary at the moment for the implementation of the DelphesEDM4HepConverter
(https://github.com/key4hep/EDM4hep/blob/delphes/plugins/delphes/DelphesEDM4HepConverter.h#L185).
It is impossible to read collections from file and write them to another file. Related to this, I don't think it is possible to attach a Reader and a Writer to the same EventStore simultaneously at the moment. This should be possible, since it is a rather common task in reconstruction to read in some collections and produce some higher level collections from them.
IReader
is incomplete (at least if it is meant to be public)The current API of IReader
seems to be mainly aimed at internal usage for low(er) level I/O operations rather than an actually public facing API.
Should we consider defining a "complete" (whatever complete means in this sense) API for e.g. IReader
and IEventStore
(or ICollectionProvider
and IMetaDataProvider
as it currently stands). This would then allow to more easily provide different but consistent solutions (e.g. I/O with ROOT
and HDF5
)
The current canonical (?) way of reading files is to use ROOTReader::endOfEvent()
to read the next event at the end of the current event (Additionally EventStore::clear()
has to be called alongside it). Quite commonly some conditions are checked at the beginning of an event and if they are not met the rest of the event is skipped. This means that endOfEvent
has to be called at every place such a skip can occur or otherwise no new event will ever be read. This could be potentially be achieved by a wrapper that uses RAII to read the next event once the old one goes out of scope, but currently that would probably very strongly couple the EventStore and the Reader.
Smaller issues / bugs that can be easily addressed without too much work and (mostly) independent of each other
Objects with OneToManyRelations
should allow to loop over these relations using range-based for-loops. The current implementation only provides relation_begin()
and relation_end()
iterator ranges.
Objects with OneToManyRelations to the same type that form a cycle (e.g. MCParticle daughter and mother) lead to an infinite loop in operator<<
When trying to invoke podio_class_generator.py
using python3.8 with the --clangformat
option the following error occurs
Traceback (most recent call last):
File "/home/tmadlener/work/podio/python/podio_class_generator.py", line 1189, in <module>
gen.process()
File "/home/tmadlener/work/podio/python/podio_class_generator.py", line 71, in process
self.process_components(self.reader.components)
File "/home/tmadlener/work/podio/python/podio_class_generator.py", line 79, in process_components
self.create_component(name, components["Members"])
File "/home/tmadlener/work/podio/python/podio_class_generator.py", line 948, in create_component
self.fill_templates("Component", substitutions)
File "/home/tmadlener/work/podio/python/podio_class_generator.py", line 1136, in fill_templates
self.write_file(filename, content)
File "/home/tmadlener/work/podio/python/podio_class_generator.py", line 1088, in write_file
content = cfproc.communicate(input=content)[0]
File "/usr/lib/python3.8/subprocess.py", line 1024, in communicate
stdout, stderr = self._communicate(input, endtime, timeout)
File "/usr/lib/python3.8/subprocess.py", line 1846, in _communicate
input_view = memoryview(self._input)
TypeError: memoryview: a bytes-like object is required, not 'str'
This seems to be one of those cases where python3 behaves differently from python2. However, I have to setup a python2 environment first to see if it actually works there to confirm that it is indeed only the difference between the two versions.
If that is confirmed I would try to make this work in both versions.
Hello,
It would great if PODIO could be made to read LCIO files. What are the major challenges to implementing this?
Is it as straightforward as creating an SIOReader and use the right LCIO data model?
Cheers,
Whit
The EventStore currently has two methods to access it's collections by name:
/// access a collection by name. returns true if successful
template<typename T>
bool get(const std::string& name, const T*& collection);
/// access a collection by name
/// returns a collection w/ setting isValid to true if successful
template<typename T>
const T& get(const std::string& name );
That's not really ideal when trying to create collections from an external datasource where it is not known how many / which collections there are at compile time. In this case, ideally I would create branches according to what the external datasource provides and keep track of the collections in a map: branchname -> collectionptr.
The const pointer cannot be used for this purpose, as I cannot call create
in the next step of filling the branches in the event loop. And dealing with maps involving references is quite painful - maybe it's possible but I haven't figured it out. I managed to implement the same thing with a vector of references, but would then have to do a lot more bookkeeping. Simply adding a getter for non-const pointers seems like the easiest way.
There are some issue with finding python on ubuntu and with certain cmake versions (check your pipeline, it finds the wrong python) see problem explanation and solutions
In cmake/podioMacros.cmake, an COMMAND_EXPAND_LISTS
option for add_custom_command
is used. This option is not valid until CMake 3.8. But in the main CMakeLists.txt, the minimum required CMake is 3.3.
I'm starting to try out podio for use in the SuperNEMO experiment. One immediate issue I encountered is that our current data model uses fixed width integers from the Standard Library cstdint
such as uint16_t
. Podio's class generator complains about the use of these (naturally!), but are there reasons podio could not support them?
I'd be happy to experiment with adding them and submit a PR if there's no blocker - it looks like ClassDefinitionValidator
would be the place to do this, but clearly an extra #include <cstdint>
s would be needed somewhere (like for std:string
?)
It is easily possible to enter an infinite loop (and a subsequent core dump) using the ostream operator if OneToManyRelations
are used like they are e.g. in ExampleMC
(only the important parts here):
ExampleMC:
OneToManyRelations:
- ExampleMC parents
- ExampleMC daughters
Properly setting mother/daughter relations in this case
ExampleMC mcp1;
ExampleMC mcp2;
mcp1.adddaughters(mcp2);
mcp2.adddaughters(mcp1);
will lead to an infinite loop in the ostream operator, since that is generated to be
std::ostream& operator<<( std::ostream& o,const ConstExampleMC& value ){
o << " id : " << value.id() << std::endl ;
o << " parents : " ;
for(unsigned i=0,N=value.parents_size(); i<N ; ++i)
o << value.parents(i) << " " ; // the problem is here
o << std::endl ;
o << " daughters ): " ;
for(unsigned i=0,N=value.daughters_size(); i<N ; ++i)
o << value.daughters(i) << " " ; // and also here
o << std::endl ;
return o ;
}
and is thus calling itself over and over again if such a circular relation is found.
As a result of #54 the EDM egenration does not work anymore for external clients (EDM4Hep).
The reason is the hardcoded template_dir path "../templates" that is no longer available in the install. @drbenmorgan can you have a look at this ?
Current workaround:
cd INSTALL_DIR
ln -s SOURCE_DIR/templates .
Line 6 in 9b97c14
OS version: e.g. SLC6 Mageia
Compiler version: e.g GCC 6.2 10.2
PODIO version: tag or commit ID, or GitHub branch 9c86507
Reproduced by: exact steps to reproduce the problem: checkout, setup environment (Geant, ROOT versions, iLCSoft release), cmake, build, run...
Using ROOT 6.22.0
Build and run make tests
The python regression tests are failing
The first issue is that the variable LD_LIBRARY_PATH is not set so that python cannot find the .so files that are build in the compile.
Once I set the environment variables, I'm getting this error....
======================================================================
ERROR: test_chain (test_EventStore.EventStoreTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/joe/local2/bench/git/podio/python/test_EventStore.py", line 91, in test_chain
numbers.append(evinfo[0].Number())
TypeError: 'CollectionBase' object is not subscriptable
======================================================================
ERROR: test_collections (test_EventStore.EventStoreTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/joe/local2/bench/git/podio/python/test_EventStore.py", line 33, in test_collections
self.assertTrue(len(evinfo) > 0)
ReferenceError: attempt to access a null-pointer
======================================================================
ERROR: test_hash (test_EventStore.EventStoreTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/joe/local2/bench/git/podio/python/test_EventStore.py", line 66, in test_hash
for cluster in clusters:
TypeError: 'CollectionBase' object is not iterable
======================================================================
ERROR: test_one_to_many (test_EventStore.EventStoreTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/joe/local2/bench/git/podio/python/test_EventStore.py", line 49, in test_one_to_many
for cluster in clusters:
TypeError: 'CollectionBase' object is not iterable
Currently we cannot write out event data previously read in from a file to another file.
This is a standard use case, e.g. in reconstruction or any other event data processing job.
See example ./tests/read_and_write.cpp (see #102 ).
There is a need to setup some rpath handling, something along these lines would be good
https://github.com/root-project/root/blob/master/cmake/modules/RootBuildOptions.cmake#L401-L422
same for EDHM4hep
Add a helper class vector_view
and code generation such that on can write range based for loops, e.g.
auto mcps = store.get<MCParticleCollection>("MCParticles") ;
for( auto& p : mcps ){
for( auto& d : p.getDaughters() ) {
printDaughter(d) ;
}
}
I do not meet any problems and I just want to ask a question.
It is said that podio would have good performance in multi-thread or parallel computing. Is there an example for multi-thread or parallel Input/Output now in the newest version of podio?
I would like to start a discussion about how to improve the interplay between the different entities that are involved in reading and writing the data as well as the ones that provide access to the contents of the events. I have broadly structured this into three topics, but they are all somewhat related, which is also why I have kept them in one issue here.
From the implementation of the EventStore
it seems like it takes ownership of the resources that are provided by, e.g. the ROOTReader
. I am not sure if this comment
Lines 163 to 167 in d4a0265
At the moment there are only two types of data:
I think in this case it makes sense that the EventStore
owns the data, respectively takes responsibility for calling the necessary clear
methods of the collections. In this case the collections need to make sure that they truly free all resources when asked to do so.
There are two things that are currently referred to as metadata. The metadata that is necessary to reconstruct the collections and the relations between them, and the "user level" metadata at the Event, Run or Collection level. The former is only the CollectionIDTable
that is necessary to correctly assign the IDs to the names of the collections, while the latter are currently stored as GenericParameters
.
I think for both the collection data and the metadata it makes sense that the EventStore owns the data and is responsible for properly cleaning them up if they are no longer used. Also in view of a possible multi threading capable version of the EventStore it makes sense that the EventStore receives all the necessary data from the reader and then handles them from there and doesn't need to query the reader ever again. This is not the case at the moment, where it is possible that some collections are lazily loaded throughout the event.
I am not entirely sure whether the writers could serve as a sort of sink, or if it is easier to simply pass a reference of the data to the writer and still let the EventStore clean up the data once they are written.
When reading from more than one file, or at a run change it might become necessary for the reader to signal such a change to the EventStore, respectively at least trigger the update of some of the metadata. #132 covers a bug in the reader, where even upon request from the EventStore potentially the wrong data are provided.
I think ideally the EventStore is completely agnostic to such changes and is simply able to provide the correct data to the user. However, I am not entirely sure whether this is actually possible in all cases. E.g. the CollectionIDTable
might change at a file change (for whatever reason).
As mentioned in #136 the current implementation of providing access to the Event-, Run- and CollectionMetaData is not const-correct. There were also some discussions already in #109 and in this comment that the current implementation of const-handling is not completely consistent overall.
Const-correctness (almost) mandates that once a collection has been passed to the EventStore after creation it should be non-mutable for the rest of it's lifetime. Maybe the interface should reflect this, in a way that the current registerCollection
becomes the canonical way of adding collections to the EventStore?
I think podio should support true reference collections. Currently something similar can be achieved by defining a helper data type, a la
DataType:
Members:
- int member // just some exemplary member
DataTypeReference:
Members: {}
OneToOneRelations:
- DataType referenced // reference to the actual datatype
However, this is slightly awkward in its usage, as there is always an additional indirection to get to the actual information
auto& referenceCollection = eventStore.get<DataTypeReferenceCollection>("refCollection");
auto ref = referenceCollection[0];
auto data = ref.getReferenced();
int i = data.getMember();
Also setting this is slightly cumbersome
auto& dataCollection = eventStore.create<DataTypeCollection>("dataCollection");
auto& referenceCollection = eventStore.create<DataTypeReferenceCollection>("refCollection");
auto data = dataCollection.create();
auto ref = referenceCollection.create();
ref.setReferenced(data);
Ideally, reference collections behave the same as normal collections when they are const
, i.e. reference collections do not allow to alter the objects they reference in any way, but can be used for read only access to them. So something along the lines:
DataType:
Members:
- int member // just some exemplary member
for which podio would generate a DataTypeCollection
and a DataTypeRefCollection
(naming is just a proposal here, also to distinguish it from the above example more clearly). So that in the end the usage would look something like this
// when writing
auto& dataCollection = eventStore.create<DataTypeCollection>("dataCollection");
auto& referenceCollection = eventStore.create<DataTypeRefCollection>("refCollection");
auto data = dataCollection.create();
referenceCollection.push_back(data);
// changes to data should still be properly propagated to the reference object
// when reading
auto& referenceCollection = eventStore.get<DataTypeRefCollection>("refCollection");
auto data = referenceCollection[0];
int i = data.getMember();
I am not yet completely sure about the internals, but I think something like this should be possible. From a persistency point of view it should in principle work like any other OneToOneRelation
and storing a vector<ObjectID>
should be enough to rebuild the reference collection after reading it in. One of the major changes that would be necessary is probably to change the layout of the Obj
classes from storing a Data
object to only having a pointer to a Data
object. I have already briefly mentioned this earlier in #129 (comment) . However, since then I haven't really checked what would actually be necessary to make that change work.
This is at the moment just an idea and an early draft of an implementation, so any input is very welcome.
Define a policy about enabling "on-demand" creation and addition of containers
(a follow up on #109)
I just tried out podio using the new LCG release. Using /cvmfs/sft.cern.ch/lcg/views/LCG_96/x86_64-centos7-gcc8-opt/setup.sh
I get
Start 1: generate-edm
1/6 Test #1: generate-edm .....................***Failed 0.13 sec
Start 2: write
2/6 Test #2: write ............................ Passed 3.97 sec
Start 3: read
3/6 Test #3: read .............................***Failed 2.62 sec
Start 4: read-multiple
4/6 Test #4: read-multiple ....................***Failed 2.39 sec
Start 5: pyunittest
5/6 Test #5: pyunittest .......................***Failed 0.18 sec
Start 6: unittest
6/6 Test #6: unittest ......................... Passed 0.04 sec
The errors are of the type
...
2: Error in <TTree::Branch>: The pointer specified for CollectionIDs is not of a class or type known to ROOT
...
3: Error in <TTree::SetBranchAddress>: unknown branch -> CollectionIDs
3: reading event 0
3:
3: *** Break *** segmentation violation
and for the pyunittest
5: ImportError: Start directory is not importable: '/usr/local/python'
5/6 Test #5: pyunittest .......................***Failed 0.09 sec
There are several python2 only constructs in the podio_class_generator
for example.
print statements
xrange
x86_64-centos7-gcc62-opt, LCG94
We recently discovered that the FCC software framework looks for PODIO with a rather out-dated FindPODIO.cmake
script (https://github.com/HEP-FCC/FCCSW/blob/master/cmake/FindPODIO.cmake) I tried to remove that and use the proper config and targets, but it seems there are some errors in the template. For now this line gives an error: https://github.com/AIDASoft/podio/blob/master/cmake/podioConfig.cmake.in#L24 It seems like this line is still a copy-and-paste remnant. I can provide a fix once I make sure podioConfig.cmake
works
I cannot build the latest podio on my macbook (see error message below).
I suspect it is due to the split of the root dependencies into a separate library in #69.
The dictionary depends on the PythonEventStore
that in turn depends on the ROOTReader
but the libpodioRootIO is not linked.
Naively making the following change:
target_link_libraries(podioDict PUBLIC podioRootIO ) #podio::podio ROOT::Core ROOT::Tree)
results in a cyclic dependency:
CMake Error: The inter-target dependency graph contains the following strongly connected component (cycle):
"podioDict" of type SHARED_LIBRARY
depends on "podioRootIO" (weak)
"podioRootIO" of type SHARED_LIBRARY
depends on "podioDict" (weak)
At least one of these targets is not a STATIC_LIBRARY. Cyclic dependencies are allowed only among static libraries.
CMake Generate step failed. Build files cannot be regenerated correctly.
@andre: Any idea how to fix this (and why it works on linux) ?
[ 6%] Linking CXX shared library libpodioDict.so
cd /Users/gaede/podio/podio/build/src && /usr/local/Cellar/cmake/3.15.1/bin/cmake -E cmake_link_script CMakeFiles/podioDict.dir/link.txt --verbose=1
/Library/Developer/CommandLineTools/usr/bin/c++ -std=c++14 -stdlib=libc++ -O2 -g -DNDEBUG -dynamiclib -Wl,-headerpad_max_install_names -o libpodioDict.so -install_name @rpath/libpodioDict.so CMakeFiles/podioDict.dir/podioDict.cxx.o -Wl,-rpath,/Users/gaede/podio/podio/build/src -Wl,-rpath,/data/ilcsoft/root/6.18.04/lib libpodio.so /data/ilcsoft/root/6.18.04/lib/libTree.so /data/ilcsoft/root/6.18.04/lib/libImt.so /data/ilcsoft/root/6.18.04/lib/libNet.so /data/ilcsoft/root/6.18.04/lib/libRIO.so /data/ilcsoft/root/6.18.04/lib/libThread.so /data/ilcsoft/root/6.18.04/lib/libCore.so
Undefined symbols for architecture x86_64:
"podio::ROOTReader::~ROOTReader()", referenced from:
ROOT::delete_podiocLcLPythonEventStore(void*) in podioDict.cxx.o
ROOT::deleteArray_podiocLcLPythonEventStore(void*) in podioDict.cxx.o
ROOT::destruct_podiocLcLPythonEventStore(void*) in podioDict.cxx.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
It currently seems to be impossible to use podio collections with the STL algorithms operating on iterator ranges, because the CollectionIterator
does not provide enough information, leading the compiler complaining about (greatly abbreviated to the crucial line):
/usr/include/c++/9/bits/stl_iterator_base_types.h:205:5: error: no type named ‘iterator_category’ in ‘struct std::iterator_traits<edm4hep::MCRecoParticleAssociationCollectionIterator>’
I think having a correctly implemented const-iterator is something we should aim for, to at least enable the usage of STL algorithms that can be used on ranges of const-iterators.
Hi,
Just starting a general discussion topic about using podio outside of HEP. Right now I'm working with a logistics company here in Hong Kong looking at using HEP software for things like algorithmic trading and financial data processing. The idea is to see if it would be possible to take software and frameworks like GAUDI and feed in stock trades rather than the detector events.
One of the problems in fintech is that there is a "tragedy of the commons" issue every bank and hedge fund uses their own event processing system. Once you've spent $$$$$ setting up a trading system, you don't want to share code. Code in the HEP community tends to be shared, because you need other groups to reproduce your results, and the "secret sauce" is the data and not the data reduction.
This is all very preliminary, and I don't know if it will turn out that the code is in fact the same, but that's research. However, haven't a high performance collection object that can handle low latency and multi-threading seems to be something that's a common component.
Some newbie questions:
What environment variables need to be set in order to get root loading to work?
Is there is simple text file read / write? I'd like to have a test system that just does a simple ascii/load save.
Any interest is setting up a chat room to deal with newbie questions? :-) :-) If so what's the preferred forum?
Looking through the code, the collections itself seems pretty simple, but the messy bits are the read/write with root.
Looking over the code, it seems that the read/write code is very tightly combined with the object code, and this causes some issues refactoring the code. For example, in trying to prevent bad copies, we are running into conflicts with ROOT, but those conflicts have to do with reading/writing.
So one thing that I propose is to separate out reading and writing code from the core components. In particular it should be possible to read/write objects and compile podio on a system without root. Alternatively it should be possible to use podio for format translation (i.e. read in the objects in ASCII format and output it into Root).
Would like some feedback as to what people think. If people think this is a good idea, then I'd like to work on some patches to separate out reading/writing from object handling, since a lot of the other issues that I've been looking at would be easier to handle if there was a clean separation between writing/reading and object handling
The recently added methods for meta data handling violate the concept of constness and thread-safety. Despite being declared as mutable internally, the return values of the meta data getters have to be const as well. Otherwise, multithreaded users of PODIO do not have a chance to implement concurrency.
Maybe the design of the meta data handling needs to be rethought?
When building podio on MacOS configured with -DCMAKE_CXX_STANDARD=17
I get the following compilation errors:
[ 98%] Built target write_ascii
In file included from /Users/mato/Development/LCGSoft/build/podio-v00-09-02/src/podio/v00-09-02/tests/unittest.cpp:7:
/Users/mato/Development/LCGSoft/build/podio-v00-09-02/src/podio/v00-09-02/tests/catch.hpp:3453:14: error: no template named 'auto_ptr' in namespace 'std'
std::auto_ptr<StreamBufBase> m_streamBuf
~~~~~^
/Users/mato/Development/LCGSoft/build/podio-v00-09-02/src/podio/v00-09-02/tests/catch.hpp:3601:14: error: no template named 'auto_ptr' in namespace 'std'
std::auto_ptr<IStream const> m_stream
~~~~~^
/Users/mato/Development/LCGSoft/build/podio-v00-09-02/src/podio/v00-09-02/tests/catch.hpp:3592:28: error: cannot initialize return object of type 'const Catch::IStream *' with an rvalue of type 'Catch::DebugOutStream *'
return new DebugOutStream()
^~~~~~~~~~~~~~~~~~~~
/Users/mato/Development/LCGSoft/build/podio-v00-09-02/src/podio/v00-09-02/tests/catch.hpp:5307:63: error: no matching function for call to 'getAllTestCasesSorted'
std::vector<TestCase> matchedTestCases = filterTests( getAllTestCasesSorted( config ), testSpec, config )
^~~~~~~~~~~~~~~~~~~~~
/Users/mato/Development/LCGSoft/build/podio-v00-09-02/src/podio/v00-09-02/tests/catch.hpp:628:34: note: candidate function not viable: no known conversion from 'const Catch::Config' to 'const Catch::IConfig' for 1st argument
std::vector<TestCase> const& getAllTestCasesSorted( IConfig const& config )
^
/Users/mato/Development/LCGSoft/build/podio-v00-09-02/src/podio/v00-09-02/tests/catch.hpp:5335:63: error: no matching function for call to 'getAllTestCasesSorted'
std::vector<TestCase> matchedTestCases = filterTests( getAllTestCasesSorted( config ), testSpec, config )
^~~~~~~~~~~~~~~~~~~~~
/Users/mato/Development/LCGSoft/build/podio-v00-09-02/src/podio/v00-09-02/tests/catch.hpp:628:34: note: candidate function not viable: no known conversion from 'const Catch::Config' to 'const Catch::IConfig' for 1st argument
std::vector<TestCase> const& getAllTestCasesSorted( IConfig const& config )
^
/Users/mato/Development/LCGSoft/build/podio-v00-09-02/src/podio/v00-09-02/tests/catch.hpp:5375:63: error: no matching function for call to 'getAllTestCasesSorted'
std::vector<TestCase> matchedTestCases = filterTests( getAllTestCasesSorted( config ), testSpec, config )
^~~~~~~~~~~~~~~~~~~~~
/Users/mato/Development/LCGSoft/build/podio-v00-09-02/src/podio/v00-09-02/tests/catch.hpp:628:34: note: candidate function not viable: no known conversion from 'const Catch::Config' to 'const Catch::IConfig' for 1st argument
std::vector<TestCase> const& getAllTestCasesSorted( IConfig const& config )
^
/Users/mato/Development/LCGSoft/build/podio-v00-09-02/src/podio/v00-09-02/tests/catch.hpp:6201:105: error: reference to type 'const Ptr<const Catch::IConfig>' could not bind to an rvalue of type 'Catch::Config *'
Ptr<IStreamingReporter> reporter = getRegistryHub().getReporterRegistry().create( reporterName, config.get() )
^~~~~~~~~~~~
/Users/mato/Development/LCGSoft/build/podio-v00-09-02/src/podio/v00-09-02/tests/catch.hpp:5278:96: note: passing argument to parameter 'config' here
virtual IStreamingReporter* create( std::string const& name, Ptr<IConfig const> const& config ) const = 0
^
/Users/mato/Development/LCGSoft/build/podio-v00-09-02/src/podio/v00-09-02/tests/catch.hpp:6233:28: error: no viable conversion from 'Catch::Config *' to 'Ptr<const Catch::IConfig>'
Ptr<IConfig const> iconfig = config.get()
^ ~~~~~~~~~~~~
/Users/mato/Development/LCGSoft/build/podio-v00-09-02/src/podio/v00-09-02/tests/catch.hpp:494:9: note: candidate constructor not viable: no known conversion from 'Catch::Config *' to 'const Catch::IConfig *' for 1st argument
Ptr( T* p ) : m_p( p ){
^
/Users/mato/Development/LCGSoft/build/podio-v00-09-02/src/podio/v00-09-02/tests/catch.hpp:498:9: note: candidate constructor not viable: no known conversion from 'Catch::Config *' to 'const Catch::Ptr<const Catch::IConfig> &' for 1st argument
Ptr( Ptr const& other ) : m_p( other.m_p ){
^
/Users/mato/Development/LCGSoft/build/podio-v00-09-02/src/podio/v00-09-02/tests/catch.hpp:6355:17: error: no matching function for call to 'seedRng'
seedRng( *m_config )
^~~~~~~
/Users/mato/Development/LCGSoft/build/podio-v00-09-02/src/podio/v00-09-02/tests/catch.hpp:405:10: note: candidate function not viable: no known conversion from 'Catch::Config' to 'const Catch::IConfig' for 1st argument
void seedRng( IConfig const& config )
^
/Users/mato/Development/LCGSoft/build/podio-v00-09-02/src/podio/v00-09-02/tests/catch.hpp:6358:21: error: no matching function for call to 'applyFilenamesAsTags'
applyFilenamesAsTags( *m_config )
^~~~~~~~~~~~~~~~~~~~
/Users/mato/Development/LCGSoft/build/podio-v00-09-02/src/podio/v00-09-02/tests/catch.hpp:6262:10: note: candidate function not viable: no known conversion from 'Catch::Config' to 'const Catch::IConfig' for 1st argument
void applyFilenamesAsTags( IConfig const& config ) {
^
/Users/mato/Development/LCGSoft/build/podio-v00-09-02/src/podio/v00-09-02/tests/catch.hpp:6431:26: error: no member named 'random_shuffle' in namespace 'std'
std::random_shuffle( sorted.begin(), sorted.end(), rng )
~~~~~^
/Users/mato/Development/LCGSoft/build/podio-v00-09-02/src/podio/v00-09-02/tests/catch.hpp:504:17: error: cannot initialize object parameter of type 'const Catch::SharedImpl<Catch::IConfig>' with an expression of type 'Catch::Config'
m_p->release()
^~~
/Users/mato/Development/LCGSoft/build/podio-v00-09-02/src/podio/v00-09-02/tests/catch.hpp:6289:9: note: in instantiation of member function 'Catch::Ptr<Catch::Config>::~Ptr' requested here
Session()
^
/Users/mato/Development/LCGSoft/build/podio-v00-09-02/src/podio/v00-09-02/tests/catch.hpp:508:17: error: cannot initialize object parameter of type 'const Catch::SharedImpl<Catch::IConfig>' with an expression of type 'Catch::Config'
m_p->release()
^~~
/Users/mato/Development/LCGSoft/build/podio-v00-09-02/src/podio/v00-09-02/tests/catch.hpp:6315:26: note: in instantiation of member function 'Catch::Ptr<Catch::Config>::reset' requested here
m_config.reset()
^
/Users/mato/Development/LCGSoft/build/podio-v00-09-02/src/podio/v00-09-02/tests/catch.hpp:496:17: error: cannot initialize object parameter of type 'const Catch::SharedImpl<Catch::IConfig>' with an expression of type 'Catch::Config'
m_p->addRef()
^~~
/Users/mato/Development/LCGSoft/build/podio-v00-09-02/src/podio/v00-09-02/tests/catch.hpp:512:17: note: in instantiation of member function 'Catch::Ptr<Catch::Config>::Ptr' requested here
Ptr temp( p )
^
/Users/mato/Development/LCGSoft/build/podio-v00-09-02/src/podio/v00-09-02/tests/catch.hpp:6383:26: note: in instantiation of member function 'Catch::Ptr<Catch::Config>::operator=' requested here
m_config = new Config( m_configData )
^
14 errors generated.
make[6]: *** [tests/CMakeFiles/unittest.dir/unittest.cpp.o] Error 1
make[5]: *** [tests/CMakeFiles/unittest.dir/all] Error 2
make[4]: *** [all] Error 2
After preparing the environment and compiling with init.sh
, I am trying to follow the instruction under the section titled Running on the README page.
So I am in the directory: ~/podio/build and I try the following:
../install/tests/write
this prints out information about particle 0 through particle 9 on the console.
Next, I ran
../install/tests/read
which gives the following error: bash: ../install/tests/read: No such file or directory
It seems to me that the file example.root is not created. To verify installation, I ran
make test
from the build directory and apparently I am passing all the tests, but I can't understand why I am getting the above mentioned error.
NOTE: this issue started out as the one that is described in this first description here, but later changed to a slightly different but related topic below
Currently the options for generating the datamodel used for the tests are:
Lines 3 to 8 in 6a1e506
Consequently, these are also the only things that are currently really covered by the tests. In #130 a fourth option, createSIOHandlers
, will be added, making the number of possible combinations even larger. While we probably do not need to exhaustively test all of the combinations, at least having every option tested properly once would be a nice thing to have, I think.
The question now is how to most easily achieve this. Two options that came to my mind are the following (probably by far not the only ones):
datamodel.yaml
files with different values for the different options. Since some of the options are potentially not possible with the current definition (e.g. exposePODMembers
would lead to name clashes currently) this could be necessary in any case.datamodel.yaml
for multiple tests and simply calling the generator with different flags to generate the corresponding c++ files for multiple different options. However, this would introduce functionality which is only here for easier testing and not something that is needed for the users of podio
.In both cases the tests
folder would probably need a bit of restructuring, to keep the different cases nicely separated. Additionally, the getSyntax
option would also require to rewrite some of the tests while using a different syntax to access the members. On the other hand that would be an option that could be considered to not need exhaustive testing apart form successfully compiling it.
At least for #130 the easiest solution would be to have option 2 available, since we could then pass the corresponding option from cmake
to the code generator and would automatically have the corresponding files generated. Currently, we can either generate with createSIOHandlers: False
and break the newly introduced tests or we can generate with True
, which will break the build if SIO_HANDLERS=OFF
in cmake
.
Another, somewhat related, question is whether we want to enable building the tests and the test datamodel by default. Only building the things that are in the end installed takes probably less than 20 % of the build time, while building the tests, and especially the datamodel takes rather long.
Collection::isValid
only returns true
if the collection has previously been read from a file as the m_isValid
member is defaulted to false
and only set to true
in Collection::prepareAfterRead
. Naively I would have expected that a newly created collection should also be valid, e.g.
podio::EventStore store;
auto& collection = store.create<ExampleCollection>("aCollection");
collection.isValid(); // should be true but currently returns false
In my opinion the validity of a collection should not depend on whether it has been read from file or whether it has previously been created without an intermediate I/O operation. For the user this should be completely transparent and it should be possible to not care from where the collection is provided as long as it is known to the EventStore
.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.