Action classes & parameter I/O (grid, CLOSED)

paboyle commented on August 18, 2024
Action classes & parameter I/O

from grid.

Comments (17)

paboyle commented on August 18, 2024

Yes -- here we are coming into the much delayed issue of parameter I/O, IDLs, protocol buffers,
XML, JSON, or whatever.

When constructing Action objects, or building a factory, we have to munge parameters from somewhere.

I've been delaying a decision on this, and also would prefer to bring my US colleagues into some reasonable level of agreement on what is decided.

"Lighter than Chroma" is definitely a design goal. The lighter the better. I see you are munging XML
in IroIro++ still. We should think carefully about the pattern and I would survey IroIro,
and possibly several other lattice codes for best practice ideas.

We need a goal of enabling "normal" people to add actions!

coppolachan commented on August 18, 2024

XML for IroIro++ was a design decision made several years ago, when I knew nothing about protocol buffers, JSON, YAML, or anything else.

Now I'm leaning towards either JSON or YAML. I still cannot see the advantage of protocol buffers; I should read and experiment more. The first two remain more human-readable to me.

As an example: when we define the HMC with multilevel actions, XML shows the levels very nicely, and so would JSON and YAML, in a cleaner way. In protocol buffers this can be cumbersome to implement, as I understand it. This would be a design request.

Let me write some simple test codes that use these three methods to parse data, and then we can choose the one that best fits our needs.

coppolachan commented on August 18, 2024

Note that an XML parser is needed in the code anyway, to handle the ILDG format.

paboyle commented on August 18, 2024

The advantage of protocol buffers is that they provide an interface definition language.
You don't write the parsing, and there is a constraint of a 1:1 mapping between the objects and the file contents.
The code is easier to develop because the parsing is keyed off the object definition, but
there is also a strict rule about what is in the file, rather than arbitrary decisions made in
the parsing code, of which Chroma at least has a lot.
We may need to organise a conference call to discuss issues like these; there are other views at Brookhaven and Columbia, and taking a few weeks to assemble some examples for comparison may make sense. The lattice conference may be a good time.

coppolachan commented on August 18, 2024

One drawback I see in protocol buffers is that they would introduce a major dependency on what people do at Google: not only the library, but also the tools that you need to install to generate the C++ files from the .proto files. Not so nice for portability, I think.
Also, protocol buffers are designed for messaging, not for config files; the text support is there, but it is not the primary concern of the Google people.
I read a bit more and I still cannot understand how to create the HMC tree for a variable number and type of multiply nested actions. It seems that the messaging system wants to know the structure in advance...

My experience is that QCD codes do not need complex parsers. My parser in IroIro is very simple, with very few functions. Writing a parser for a standard data serialization format, starting from one of the many available libraries, is quite simple.

I agree that we can have a discussion during the lattice conference. Better to prepare some examples in advance.

paboyle commented on August 18, 2024

http://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats

is interesting. Apache Avro has a JSON IDL. My enthusiasm is for an IDL that constrains a mapping between internal data types and external file formats, not for protocol buffers per se.

Might also be interesting to see what ROOT has done.

coppolachan commented on August 18, 2024

I know the page. The Avro IDL is interesting.
I understand the point. It would be a very nice feature, but I do not think it is strictly necessary. In 3 years I never had to change the parser lib, and adding a new input was as easy as writing one line of code, even for users who were not involved in the code development.

We should write down some of the essential requirements for the configuration files and their internal handling. I'll start:

  1. human readability
  2. simplicity (vague; needs to be defined more precisely with input from the final users)
  3. as few dependencies on external libs as possible (this is mine)
  4. constrain file formats with internal data types (let's discuss the feasibility of this in all the QCD cases)
  5. easy extension of the number/types of inputs for users not fully familiar with the code
  6. ...

Please correct/extend this list.

Also, there is no need for a full-fledged messaging system; we just need to read/write configuration/output files.

paboyle commented on August 18, 2024

Probably worth mentioning: in UKhadron I resorted to an ugly C preprocessor macro to set up
a common single definition of both parameter objects and their XML parsing/serialisation code.

It avoided the need for an IDL (as there wasn't any for XML at the time) and still enforced the
1:1 mapping between objects and their text file representations.

It was easy to maintain but ugly.
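
The single-definition idea described above can be illustrated with an X-macro. This is only a minimal sketch, not the UKhadron code: the `WilsonParams` struct, its fields, and the `to_xml` helper are hypothetical names chosen for illustration, and only serialisation (not parsing) is shown.

```cpp
#include <sstream>
#include <string>

// Hypothetical field list: the one place where parameters are declared.
#define WILSON_PARAM_FIELDS(X) \
  X(double, mass)              \
  X(double, csw)               \
  X(int, max_iter)

// Expands one field into a struct member.
#define DECLARE_MEMBER(type, name) type name{};
// Expands one field into "<name>value</name>" written to the stream `os`.
#define WRITE_MEMBER(type, name) os << "<" #name ">" << p.name << "</" #name ">";

// The struct members are generated from the field list...
struct WilsonParams {
  WILSON_PARAM_FIELDS(DECLARE_MEMBER)
};

// ...and so is the writer, so the two cannot drift apart.
inline std::string to_xml(const WilsonParams& p) {
  std::ostringstream os;
  os << "<WilsonParams>";
  WILSON_PARAM_FIELDS(WRITE_MEMBER)
  os << "</WilsonParams>";
  return os.str();
}
```

Adding a parameter then means adding one line to the field list; the struct and the writer both pick it up, which is the 1:1 mapping between object and file that an IDL would otherwise enforce.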

coppolachan commented on August 18, 2024

Having a common definition in one single place would be quite elegant. We tried to simplify things in this sense with parameter classes, related to the bigger class, that were also responsible for reading the parameters from the XML. Everything sits in a small, easily manageable portion of the code. But then the classes themselves need to be abstracted and composed, and this is the role of factories.
I still cannot see how to use an IDL for configuration files in this respect. IDLs are supposed to deal with messaging. I am puzzled by composability and 'nestability'.
In all the examples I have seen (PB, Thrift, Apache IDL) the IDL was used to generate the class code, but there was no example of the real message (also because the real message is created and used during execution, so support for text is less relevant and readability/simplicity is not a concern).
I must write some code myself to see how it works.
Do you have an idea of how it works with abstract classes?
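
As one possible illustration of the factory role mentioned above, the sketch below builds a multilevel list of actions from type names such as might appear in a parameter file. It is not taken from IroIro or Grid: `Action`, `ActionFactory`, `build_levels`, `default_factory` and the two action types are hypothetical, and real actions would of course carry parameters and compute forces.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical abstract action; only a name is exposed in this sketch.
struct Action {
  virtual ~Action() = default;
  virtual std::string name() const = 0;
};

struct WilsonGauge : Action {
  std::string name() const override { return "WilsonGauge"; }
};
struct TwoFlavour : Action {
  std::string name() const override { return "TwoFlavour"; }
};

// Registry mapping the type name found in the file to a creator function.
class ActionFactory {
public:
  using Creator = std::function<std::unique_ptr<Action>()>;

  void add(const std::string& type, Creator c) { creators_[type] = std::move(c); }

  std::unique_ptr<Action> create(const std::string& type) const {
    auto it = creators_.find(type);
    if (it == creators_.end()) throw std::invalid_argument("unknown action: " + type);
    return it->second();
  }

private:
  std::map<std::string, Creator> creators_;
};

// One integrator level holds any number of actions of any registered type.
using Level = std::vector<std::unique_ptr<Action>>;

// Builds the nested structure level by level; the number of levels and the
// actions per level are known only at run time.
inline std::vector<Level> build_levels(const ActionFactory& f,
                                       const std::vector<std::vector<std::string>>& spec) {
  std::vector<Level> levels;
  for (const auto& names : spec) {
    Level level;
    for (const auto& n : names) level.push_back(f.create(n));
    levels.push_back(std::move(level));
  }
  return levels;
}

// Registers the two example actions.
inline ActionFactory default_factory() {
  ActionFactory f;
  f.add("WilsonGauge", [] { return std::unique_ptr<Action>(new WilsonGauge); });
  f.add("TwoFlavour", [] { return std::unique_ptr<Action>(new TwoFlavour); });
  return f;
}
```

The code that assembles the HMC only sees `Action` pointers, so a new action type needs one `add` call and no change to the tree-building logic.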

paboyle commented on August 18, 2024

Not that I want to betray weakness, but I'm starting to come round to your use of pugixml with no
external dependency. I hate having to install additional tools (gmp, mpfr, libxml) in cross-compiled
environments, and self-contained is clean.

At least people know XML by now, though I agree YAML looks cleaner. I do very much want to limit
the extent to which the details of any given external markup language permeate the code. I find it
offensive that a "string" containing XML is considered a "polymorphic" parameter in Chroma, for example. That's no more polymorphic than a typecast through void *!

On another note, OpenQCD has a comment/ref, which I haven't chased up yet, claiming that a closed-form
solution of the rational Zolotarev approximation for the square root is known. I will look into it at some point; it is possible that the Remez algorithm, and with it the dependency on GMP and MPFR, can be avoided.

coppolachan commented on August 18, 2024

We can also be more advanced and eventually support both XML and YAML, leaving the choice to the user. For the features we need, the two are similar in structure, though not in readability.
Not because I chose it for IroIro, but I think pugixml is more than fine for reading QCD configuration files: small, few functions, fast enough. An XML parser must be there anyway for ILDG. For YAML, eventually, I would suggest looking at this
http://llvm.org/docs/doxygen/html/YAMLParser_8cpp_source.html
and writing our own version, 1 file. I think that the dependencies can be easily solved.
Or this
https://github.com/yaml/libyaml/tree/yaml-1.2
which is also quite small if we eliminate all the useless stuff. I think it can be integrated completely into the build process.
The slightly annoying thing about YAML is that nesting is created by whitespace, which is error-prone for the final user.
I do not like JSON so much because it cannot be commented (well, it can, with some tricks), and comments inside the config file are important in my view.

Then I can write an upper layer that masks the real markup language from the Grid code, assuming only that it is structured (i.e. has nesting and key-value-tags).
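
A minimal sketch of such an upper layer is given below, assuming only nesting and key-value pairs. The `Reader` interface, the toy `MapReader` backend, and the `IntegratorParams` struct with its keys are all hypothetical; a real backend would wrap an XML or YAML parser behind the same three calls.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical format-neutral interface: callers see only nesting and keys.
class Reader {
public:
  virtual ~Reader() = default;
  virtual void push(const std::string& section) = 0;  // enter a nested section
  virtual void pop() = 0;                              // leave the current section
  virtual std::string get(const std::string& key) const = 0;
};

// Toy backend that stores values under slash-separated paths.
class MapReader : public Reader {
public:
  explicit MapReader(std::map<std::string, std::string> data) : data_(std::move(data)) {}

  void push(const std::string& section) override { prefix_ += section + "/"; }

  void pop() override {
    if (prefix_.empty()) throw std::logic_error("pop without matching push");
    prefix_.pop_back();  // drop the trailing '/'
    const auto pos = prefix_.rfind('/');
    prefix_.erase(pos == std::string::npos ? 0 : pos + 1);  // drop the last section name
  }

  std::string get(const std::string& key) const override {
    const auto it = data_.find(prefix_ + key);
    if (it == data_.end()) throw std::out_of_range("missing key: " + prefix_ + key);
    return it->second;
  }

private:
  std::map<std::string, std::string> data_;
  std::string prefix_;
};

struct IntegratorParams {
  int md_steps = 0;
  double step_size = 0.0;
};

// Application code depends on Reader only, never on a concrete markup library.
inline IntegratorParams read_integrator(Reader& r) {
  IntegratorParams p;
  r.push("HMC");
  r.push("Integrator");
  p.md_steps = std::stoi(r.get("md_steps"));
  p.step_size = std::stod(r.get("step_size"));
  r.pop();
  r.pop();
  return p;
}
```

Swapping the markup language then means writing another `Reader` subclass; `read_integrator` and the rest of the application code stay untouched.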

Let me look at the OpenQCD claim, but people may like having more than the square root: some staggered-fermion HMC codes need different exponents. But it would be nice if we could get rid of GMP and MPFR.

james-simone commented on August 18, 2024

Hi,

Why not consider Lua as the configuration language? It's plain C, lightweight and embeddable. You gain the
advantages of having a full scripting language. With Lua, parameters can be input as native tables or, apparently,
via Lua libraries, as JSON, YAML and even XML.
Concerning the latter three data formats, JSON is factors of ten faster to parse than YAML. YAML, however, is a bit easier
on human eyes, but it is type deduction that slows down the parser. XML is verbose, difficult for humans to use, has many
features we will never need, is slow to parse, and the parser implementations in C are big and ghastly.

(In reply to Guido Cossu's message of 06/01/2015, 10:00 PM.)

paboyle commented on August 18, 2024

I did wonder about that and looked at the Lua pages a few times in the last week, but didn't get as far
as looking at examples. Will think about it.

Another thought that occurred to me was to define the parameters in a ubiquitous dynamically typed scripting language, such as Python or whatever, and use that language to generate the C++ headers and parsers. Effectively we would generate our own IDL by starting from a language that provides the RTTI
features C++ is missing, rather than the other way round.

[ A random dream is that if the data structures are sufficiently well defined to cross languages,
a la schemas etc., the construction of a GUI for parameter file content is easy: a drop-down list for which object you insert and, after selection, parameter fields for the object appear dynamically, reflecting any
polymorphism automatically.

Also... wouldn't it be nice to draw the correlation functions in something resembling JaxoDraw, and have
the code get controlled in the right way...]

mspraggs commented on August 18, 2024

I have some experience with using Python as a code generator. See for example:

https://github.com/mspraggs/pyQCD/blob/master/pyQCD/templates/core/types.hpp
https://github.com/mspraggs/pyQCD/blob/master/pyQCD/utils/codegen.py

Jinja2 in Python is a very easy-to-use templating package, and once you've got the header templates set up correctly it's very easy to generate new code. If you want to get more sophisticated there are also libclang bindings for Python, but I haven't tried these myself.

I think the Lua route is something that USQCD have worked on, no? They gave a talk at Lattice 2014 about QLua or some such.

If the "backend" part of Grid has sufficient encapsulation and provides a nice clean interface, then in principle this would give others the option of writing bindings or an interface of their own choosing. Of course some standard would need to be settled on within a particular collaboration or some such, but it's always nice to have the flexibility.

coppolachan commented on August 18, 2024

I have read about Lua for configuration files and I really like the idea.
It can be used at different levels, being simple enough for standard users while offering full scripting to the power users.
Some refs I found useful:
J. Osborn's lattice talk
https://indico.bnl.gov/getFile.py/access?contribId=403&sessionId=10&resId=0&materialId=slides&confId=736

http://www.ibm.com/developerworks/library/l-embed-lua/

http://stackoverflow.com/questions/15298253/why-lua-for-configuration-plugins

http://www.lua.org/pil/25.html

https://github.com/vstakhov/libucl (Universal Configuration Language)

http://webserver2.tecgraf.puc-rio.br/~celes/tolua/tolua-3.2.html (kind of IDL)

http://lua-users.org/wiki/BindingCodeToLua

(About the speed of parsers (JSON, YAML, XML, or whatever): this is not a problem, since the configuration files are a few hundred kB at most. There also exist small parsers that can be included completely in the code without external dependencies.)

About Python: I'd like to understand how it works from the final user's point of view. What does a configuration file look like?

I also like the idea of writing Grid so that it is unaware of the configuration interface, leaving coders the ability to write their own, or providing different choices.

chulwoo1 commented on August 18, 2024

Hello,

FWIW, I looked into the formats discussed above a few months ago, when we were starting to discuss the "new CPS", and I ended up liking JSON at the time. Currently my preference, I think, is somewhere between JSON and YAML. Both seem to have a good balance of human readability and 'writability'.
Being able to add comments is indeed a big plus, although some people may not like the default indentation scheme in YAML (it's a shame it doesn't allow tabs for indentation). YAML 1.2 can apparently parse JSON, so the difference is possibly somewhat moot, and cosmetic in any case.

I seem to remember that the Lua interpreter had to be modified (slightly, I think) for either FUEL or Qlua. Does anyone remember this? I'll try to find out.

paboyle commented on August 18, 2024

I'm closing this issue -- the issue on serialisation presents the resolution of a lot of the object I/O issues.
For scripting interfaces... let's defer the problem.

The action fragments are already running in an HMC, thanks to Guido (quenched) and my doing the fermion force. While I may revisit the HMC area to tidy up a little, I see no reason to keep this issue open any more, with the dangling aspects subsumed by the serialisation track.

Even the serialisation now seems to be only a question of implementing a virtual/abstract reader/writer,
and we will get all of XML, JSON, YAML for free using MacroMagic.h tricks.
