Coder Social home page Coder Social logo

Discussion on current issues about podio HOT 21 CLOSED

aidasoft avatar aidasoft commented on July 23, 2024
Discussion on current issues

from podio.

Comments (21)

hegner avatar hegner commented on July 23, 2024 2

Hi!

Could we split this issue into multiple ones? We now have parallel threads going on. I created #111 and #112 to move parts of it.

Thanks,
Benedikt

from podio.

joequant avatar joequant commented on July 23, 2024 1

Okay I think I know what's going on. I'm trying to read in root files that are corrupt when being written by write. When I run write, I get
'''
start processing
Error in TTree::Branch: The pointer specified for evtMD is not of a class known to ROOT and GenericParameters is not a known class
Error in TTree::Branch: The class requested (vectorpodio::ObjectID) for the branch "mcparticles#0" is an instance of an stl collection and does not have a compiled CollectionProxy. Please generate the dictionary for this collection (vectorpodio::ObjectID) to avoid to write corrupted data.
Error in TTree::Branch: The class requested (vectorpodio::ObjectID) for the branch "mcparticles#1" is an instance of an stl collection and does not have a compiled CollectionProxy. Please generate the dictionary for this collection (vectorpodio::ObjectID) to avoid to write corrupted data.
Error in TTree::Branch: The class requested (vectorpodio::ObjectID) for the branch "clusters#0" is an instance of an stl collection and does not have a compiled CollectionProxy. Please generate the dictionary for this collection (vectorpodio::ObjectID) to avoid to write corrupted data.
'''

So it looks like something is being misconfigured.

As far the other statement, what I meant was that something to look at is to make sure that as much of the processing is done by
something that's part of the compler rather than by any external script. Basically you want to minimize the amount of code that gets generated by an external script and have the compiler do as much of the processing as possible.

For example rather than have something like

class (stringsubstitution1)Class {
}

class (stringsubstitution2)Class {
}

You want something that looks more like

#define CLASS_TEMPLATE(x) (single code generated)

CLASS_TEMPLATE(substitution1)
CLASS_TEMPLATE(substitution2)

I haven't looked deeply into the code so I don't know if this is an issue, but it's something I've seen in other scientific packages
like IRAF. What happened with IRAF is that they had a lot of external code processing that generated code for f77, which turns out to be painful/impossible to convert to f95. So what I'm thinking about is to try to make things work so that it won't be a pain to port the system over to C++2040 and python9 (and yes, it's the nature of this type of code that it stays around for a long time.)

from podio.

hegner avatar hegner commented on July 23, 2024

thanks Thomas for this very nice summary. Let me comment on the first two, which are about design choices:

  • Returning a reference. The semantics of a reference are in principle the right ones. I agree though that it is way too easy to do the wrong thing on user side. So something to work on.
  • The constness of collections on return was a design decision based on experience in the LHC experiments. I know that the LCIO choice was different w/ the isMutable flag. Mutability is very bad in the long run when supporting multithreaded applications, and we should avoid it as much as possible.

The other comments are about yet missing or buggy implementations. I essentially agree with all your assessment. The IReader part though was indeed meant as purely internal. For exactly the use case you mentioned.

I am happy about all the feedback you give. As PODIO's design needs quite some feedback on usability. A few things are just still in sort of prototype state.

from podio.

tmadlener avatar tmadlener commented on July 23, 2024

Hi Benedikt,

  • The constness of collections on return was a design decision based on experience in the LHC experiments. I know that the LCIO choice was different w/ the isMutable flag. Mutability is very bad in the long run when supporting multithreaded applications, and we should avoid it as much as possible.

I agree completely with this. I think the necessity for "non-const" access to the collections from the EventStore are currently also strongly related to the fact that creation returns an actual reference instead of some sort of handle which makes it hard to store in an STL container (even though std::reference_wrapper exists). At least for the DelphesEDM4HepConverter the issue mainly arises because we want to store all collections in one map as podio::CollectionBase*.

Nevertheless, I think we should consider the possible drawbacks of not having mutability. As far as I can see one could always work around non mutable collections by essentially copying an existing collection and change that on the fly. However, the feasibility of that probably depends on how often one actually would have to do that and also on the collections for which this would need to be done.

from podio.

joequant avatar joequant commented on July 23, 2024

Would it be possible to deal with the reference issue for EventStore::create() by using boost::noncopyable or having an undefined copy constructor? This will trigger a compile error if some tries to copy a class.

from podio.

hegner avatar hegner commented on July 23, 2024

actually I tried to do a delete on the copy constructor:
https://github.com/AIDASoft/podio/blob/master/python/templates/Collection.h.template#L57

ROOT I/O did not appreciate this...

from podio.

vvolkl avatar vvolkl commented on July 23, 2024

registerCollection is indeed an obvious way to avoid the const-casts in usecases like the delphes-conversion: vvolkl/EDM4HEP@98a0523

from podio.

joequant avatar joequant commented on July 23, 2024

I can take a look at the copy constructor issue since I need to get myself familiar with both podio and root. Do you have some test code that I can look at to debug?

actually I tried to do a delete on the copy constructor:
https://github.com/AIDASoft/podio/blob/master/python/templates/Collection.h.template#L57

ROOT I/O did not appreciate this...

from podio.

hegner avatar hegner commented on July 23, 2024

Great! Maybe things changed since I tried last time.

from podio.

joequant avatar joequant commented on July 23, 2024

What was the thing that broke? When I set the copy constructor for EventStore to delete, it compiles and unit test runs fine. I'm getting segfaults in root read and write, but this may or may not be due to the fact that I have a weird root set up. What did you see when you tried it?

One problem that I have is that I don't have a very stable set up so I'm using new compilers, weird dependencies etc. etc. so having feedback from someone with a less crazy setup will help debug the issue.

On another topic how dependent is podio on root?

from podio.

joequant avatar joequant commented on July 23, 2024

I'm able to reproduce the root problem with Collections. Will take a look at this.

from podio.

joequant avatar joequant commented on July 23, 2024

I'll need about two or three weeks to figure out how to work through the problem, but I think I can figure out how to make podio and root play nice with each other.

One thing that I'd suggest we do is to try to reduce the number of pieces that require processing outside of the standard compiler toolchain. My experience with other scientific software (IRAF and ratfor) is that people like to create preprocessors and external systems that do clever things, but that these systems become extraordinarily difficult to maintain after a few years.

from podio.

joequant avatar joequant commented on July 23, 2024

i'm getting the following error running Read

'''
(base) [joe@big-apple tests (master)]$ ./read
Error in TTree::SetBranchAddress: unknown branch -> CollectionIDs
reading event 0
Error in TTree::SetBranchAddress: unknown branch -> evtMD
read UserEventName: - expected : event_number_0
terminate called after throwing an instance of 'std::runtime_error'
what(): Couldn't read event meta data parameters 'UserEventName'
'''

This might be due to the fact that I'm running a stripped down ROOT and I'm not including some object. While I'm working on podio, I'm also working on getting a minimal install of ROOT. So I might be not including some library, and if someone has ideas on what they might be, let me know.

from podio.

tmadlener avatar tmadlener commented on July 23, 2024
(base) [joe@big-apple tests (master)]$ ./read
Error in TTree::SetBranchAddress: unknown branch -> CollectionIDs
reading event 0
Error in TTree::SetBranchAddress: unknown branch -> evtMD
read UserEventName: - expected : event_number_0
terminate called after throwing an instance of 'std::runtime_error'
what(): Couldn't read event meta data parameters 'UserEventName'

Hi @joequant
that reads like the file (or TTree) you are trying to read does not contain the necessary branches. Can you check whether the branches are in fact present? E.g. using python

import ROOT
f = ROOT.TFile.Open('your_file.root')
f.ls() # to see which TTrees are available

# Get one tree and print its branches
t = f.Get('metadata') # This is where the CollectionIDs should be
print([b.GetName() for b in t.GetListOfBranches()])

It is also possible to do this in the root command prompt using c++, but it is a bit more cumbersome because you will have to do some casts explicitly that are done implicitly in pyROOT


One thing that I'd suggest we do is to try to reduce the number of pieces that require processing outside of the standard compiler toolchain.

Which parts are you referring to here? Are there things outside the root things that are necessary to generate the dictionaries that root uses for I/O? I haven't really checked I must confess, so maybe there are some things that I miss currently.

from podio.

andresailer avatar andresailer commented on July 23, 2024

@joequant Looks like the location for the "rootmap" file for the dictionary is not in the LD_LIBRARY_PATH?

Do you just run write or the tests via ctest?

from podio.

tmadlener avatar tmadlener commented on July 23, 2024

Moved comment to #112 to try and minimize the cross talk here.

from podio.

joequant avatar joequant commented on July 23, 2024

@joequant Looks like the location for the "rootmap" file for the dictionary is not in the LD_LIBRARY_PATH?

Do you just run write or the tests via ctest?

I just ran write. If it's because I ran the executables incorrectly, I can fix the code a bit to change the error messages.

from podio.

tmadlener avatar tmadlener commented on July 23, 2024

Hi!

Could we split this issue into multiple ones? We now have parallel threads going on. I created #111 and #112 to move parts of it.

Thanks,
Benedikt

Yes, very good idea. I will move my comment to #112

from podio.

tmadlener avatar tmadlener commented on July 23, 2024

For example rather than have something like

class (stringsubstitution1)Class {
}

class (stringsubstitution2)Class {
}

You want something that looks more like

#define CLASS_TEMPLATE(x) (single code generated)

CLASS_TEMPLATE(substitution1)
CLASS_TEMPLATE(substitution2)

The question is whether preprocessor macros are in the end easier to debug/update than a python script that generates the c++ code. I think the main problem with preprocessor macros is that it is not easily possible to have different behaviors for the different classes. The main thing here is the possibility to define relations among the different classes (e.g. OneToManyRelations and OneToOneRelations) which might be present or not. All of these will result in specific parts of the code to be included (and possibly repeated for several such relations), that is not necessary if such collections are not present.

In the end what most users see is the yaml file with the definition of the datamodel. It serves as the single entry point and from there the users shouldn't need to care too much how the code is generated from it.

At least me personally, I would rather debug/port a python script and template string files, then having to dig into deeply nested macros. But that is mainly also because I am way more familiar with python. I suspect that this is similar for large parts of the HEP community. However, I agree that the current implementation of the python generator could be made a bit more readable. But in general I think it is not using anything really fancy from the python world and works with what is available without having to pip install anything new. So at least from that point of view this should hopefully be pretty stable (unless python decides to do another 2to3 update).

Maybe @hegner also has some more insights into this topic.

from podio.

hegner avatar hegner commented on July 23, 2024

It was a conscious choice to use Python as code generation, as indeed multiple classes need to be updated in sync. The readability is an issue, as it just grew and grew...
What would be worth doing is adjusting the generated code to newer C++ features. I am sure a few tricks could be removed.

from podio.

tmadlener avatar tmadlener commented on July 23, 2024

Closing this since essentially all of them have been addressed.

from podio.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.