scikit-hep / pylhe

Lightweight Python interface to read Les Houches Event (LHE) files

Home Page: https://pypi.org/project/pylhe/

License: Apache License 2.0

Python 97.92% Dockerfile 2.08%

scikit-hep particle-physics lhe hep

pylhe's Introduction

pylhe: Python LHE interface

Small and thin Python interface to read Les Houches Event (LHE) files

Install

To install pylhe from PyPI, run

python -m pip install pylhe

The visualization capabilities require the external dependency of Graphviz.

Get started

The example below provides a simple overview. Full functionality can be inspected from the functions provided in the pylhe module.

import itertools

# You can use LHE files from scikit-hep-testdata
from skhep_testdata import data_path

import pylhe

lhe_file = data_path("pylhe-testlhef3.lhe")
events = pylhe.read_lhe_with_attributes(lhe_file)
print(f"Number of events: {pylhe.read_num_events(lhe_file)}")

# Get event 1 (the second event, since itertools.islice counts from 0)
event = next(itertools.islice(events, 1, 2))

# A DOT language graph of the event can be inspected as follows
print(event.graph.source)

# The graph is nicely displayed as SVG in Jupyter notebooks
event

# To save a DOT graph render the graph to a supported image format
# (refer to the Graphviz documentation for more)
event.graph.render(filename="test", format="png", cleanup=True)
event.graph.render(filename="test", format="pdf", cleanup=True)

Citation

The preferred BibTeX entry for citation of pylhe is

@software{pylhe,
  author = {Lukas Heinrich and Matthew Feickert and Eduardo Rodrigues},
  title = "{pylhe: v0.8.0}",
  version = {v0.8.0},
  doi = {10.5281/zenodo.1217031},
  url = {https://github.com/scikit-hep/pylhe},
}

Contributors

We hereby acknowledge the contributors that made this project possible (emoji key):

Matthew Feickert 🚧 🎨 💻 📖
Lukas 🚧 🎨 💻 📖
Eduardo Rodrigues 🚧 💻 📖
Johannes Schumann 💻
Henry Schreiner 💻
ariaradick 💻
Junghwan John Goh 💻
fuenfundachtzig 💻
Shantanu Gontia 💻
Tom Eichlersmith 💻
Alexander Puck Neuwirth 💻

This project follows the all-contributors specification.

pylhe's Issues

Python3

Hi, it appears that you may have updated this for Python 3, but I found an issue in __init__.py: line 19 should read "for k,v in kwargs.items():" rather than "for k,v in kwargs.iteritems():" in order to be compatible with Python 3. See the answer to this stack exchange question.
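
As a minimal sketch of the change described above: `dict.iteritems()` was removed in Python 3, while `dict.items()` exists in both Python 2 and 3. The `Particle` class below is a hypothetical stand-in that stores keyword arguments as attributes; it is not pylhe's actual code.

```python
class Particle:
    """Hypothetical stand-in for a class that stores keyword arguments as attributes."""

    def __init__(self, **kwargs):
        # dict.items() works on Python 2 and 3; dict.iteritems() is Python 2 only.
        for k, v in kwargs.items():
            setattr(self, k, v)


p = Particle(id=21, px=0.0)
print(p.id, p.px)
```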

Slowdown and memory increase with time

I'm using pylhe to loop over several LHE files, each containing 100K events. Running the snippet below on an lxplus machine (CentOS Linux release 7.9.2009), one can see that iterations become slower as time progresses, and eventually the job gets killed for using too much memory.

import pylhe
import time

afile = "/afs/cern.ch/work/b/bfontana/public/Singlet_TManualV3_all_M280p00_ST0p14_L463p05_K1p00_cmsgrid_final.lhe"
atime = time.time()
for ievt, evt in enumerate(pylhe.read_lhe(afile)):  # pylhe.read_lhe_with_attributes
    if ievt % 5000 == 0:
        print(time.time() - atime)
        atime = time.time()
        print(' - {} events processed'.format(ievt))

The significant slowdown occurs at iteration ~40K/50K. I would expect no memory increase given that we are dealing with a generator.
Is the above behavior expected? I'm using Python 3.5.6 (GCC 6.2.0).
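
One common cause of this pattern with `xml.etree.ElementTree.iterparse` is that parsed elements stay attached to the root element, so the in-memory tree grows with every event even though the caller only sees a generator. Whether that is what happens inside pylhe here is an assumption. The sketch below only illustrates the general technique of clearing the root after each `<event>` has been used; the helper name `iter_event_text` is hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def iter_event_text(fileobj):
    """Yield the text of each <event> element, freeing it afterwards."""
    context = ET.iterparse(fileobj, events=["start", "end"])
    _, root = next(context)  # the first "start" event is the root element
    for event, element in context:
        if event == "end" and element.tag == "event":
            yield element.text
            # Drop already-processed children so the tree does not keep growing
            root.clear()


sample = io.BytesIO(
    b"<LesHouchesEvents><event>a</event><event>b</event></LesHouchesEvents>"
)
texts = list(iter_event_text(sample))
print(texts)
```

With this pattern, memory use should stay roughly flat regardless of the number of events, because at most one event element is attached to the root at any time.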

Accessing values and info of a particle?

Hi,

I was wondering about how to access the values of a particle's ID and other attributes (in numerical values) such as Px and Py. Are there any functions or methods I could use to get their numerical values?

Thanks!
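
For orientation, each particle line in an LHE `<event>` block has 13 whitespace-separated columns. The sketch below parses one such line with the standard library only. The field names are an assumption chosen for readability; pylhe's own attribute names may differ, so check the `fieldnames` of its particle class before relying on them.

```python
# Column order of a particle line in an LHE event block; the names are illustrative.
FIELDNAMES = [
    "id", "status", "mother1", "mother2", "color1", "color2",
    "px", "py", "pz", "e", "m", "lifetime", "spin",
]


def parse_particle(line):
    """Return a dict mapping each field name to a float for one particle line."""
    return dict(zip(FIELDNAMES, map(float, line.split())))


line = (
    "5 1 1 2 501 0 0.41326091E+00 -.45455539E+01 0.23557499E+03 "
    "0.23566607E+03 0.47000000E+01 0.0000E+00 0.9000E+01"
)
particle = parse_particle(line)
print(int(particle["id"]), particle["px"], particle["py"])
```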

Test file for everyone

I would suggest that you add a test file to the notebook so that anyone can trivially get a feeling for the package and run it without needing anything else. Right now the notebook makes use of a private file. You could also put a test file in our package scikit-hep-testdata and use it from there …

BTW, most project packages have notebooks under notebooks/ rather than examples/. Could you consider changing the name?

Thanks.

Write LHE file function

Since powheg does not follow the lhe-3 standard strictly (e.g. for weights, see #220), I think pylhe would be a very nice tool for converting powheg lhe files to the lhe-3 standard. Since pylhe can already read those lhe files, one only needs to write out the lhe-data loaded by pylhe.
Does pylhe have a write_lhe function? I have not seen one.

I'll write it very soon, just wanted to check if this exists somewhere already or has been done before?

Read a list of LHE files with Pylhe ?

Is there a way to read a list of LHE files, like a TChain of ROOT files?
If not, would it be possible to implement this feature in pylhe?
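
Since the readers discussed on this page are generators, one workaround that needs no new pylhe feature is to chain the per-file generators with `itertools.chain.from_iterable`. In the sketch below, `read_events` is a hypothetical stand-in for a real reader such as `pylhe.read_lhe`, so that the example runs without any LHE files; `read_many` is likewise an illustrative name.

```python
import itertools


def read_events(path):
    """Hypothetical stand-in for a per-file event reader such as pylhe.read_lhe."""
    for i in range(2):
        yield f"{path}:event{i}"


def read_many(paths, reader=read_events):
    """Lazily iterate over the events of several files, one file at a time."""
    return itertools.chain.from_iterable(reader(p) for p in paths)


events = list(read_many(["a.lhe", "b.lhe"]))
print(events)
```

Because the inner generator expression is lazy, each file is only opened when the previous one has been exhausted.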

add CONTRIBUTING.md

Can't use readNumEvents with zipped LHE-files

pylhe/src/pylhe/__init__.py

Line 255 in 9b06dbd

element.tag == "event" for event, element in ET.iterparse(file, events=["end"])

Bug: `rwgt` overwritten by `weights`

https://github.com/scikit-hep/scikit-hep-testdata/blob/main/src/skhep_testdata/data/pylhe-testlhef3.lhe#L337

  <rwgt>
   <wgt id="1001"> 0.50109E+02 </wgt>
   <wgt id="1002"> 0.45746E+02 </wgt>
   <wgt id="1003"> 0.52581E+02 </wgt>
   <wgt id="1004"> 0.50109E+02 </wgt>
   <wgt id="1005"> 0.45746E+02 </wgt>
   <wgt id="1006"> 0.52581E+02 </wgt>
   <wgt id="1007"> 0.50109E+02 </wgt>
   <wgt id="1008"> 0.45746E+02 </wgt>
   <wgt id="1009"> 0.52581E+02 </wgt>
  </rwgt>
  <weights> 1.000e+00 0.204e+00 1.564e+00 </weights>

Unfortunately, #220 overwrites the values from rwgt by those of weights. I did not know madgraph uses both...

Ideally I think one would have both available as rwgt and weight attribute, but that would be a breaking change i.e. renaming current weights to rwgt. WDYT?
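
To make the proposal concrete, here is a minimal standard-library sketch that keeps the two blocks apart instead of letting one overwrite the other. The function name `parse_event_weights` and the returned keys are hypothetical, not pylhe's API.

```python
import xml.etree.ElementTree as ET

EVENT_XML = """<event>
  <rwgt>
   <wgt id="1001"> 0.50109E+02 </wgt>
   <wgt id="1002"> 0.45746E+02 </wgt>
  </rwgt>
  <weights> 1.000e+00 0.204e+00 1.564e+00 </weights>
</event>"""


def parse_event_weights(event_element):
    """Return rwgt weights keyed by id, and <weights> values as a list."""
    rwgt = {
        wgt.attrib["id"]: float(wgt.text)
        for wgt in event_element.findall("rwgt/wgt")
    }
    weights_element = event_element.find("weights")
    weights = (
        [float(w) for w in weights_element.text.split()]
        if weights_element is not None
        else []
    )
    return {"rwgt": rwgt, "weights": weights}


parsed = parse_event_weights(ET.fromstring(EVENT_XML))
print(parsed)
```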

python3

Do you plan to support python3?

Use gh-action-pypi-publish v1.7.0+ APIs

In pypa/gh-action-pypi-publish v1.7.0 inputs were changed to use kebab-case. Update to all inputs that were previously snake_case to use the new input format.

c.f. https://github.com/pypa/gh-action-pypi-publish/releases/tag/v1.7.0

Revise tests around scikit-hep-testdata v0.4.3+

Following up on discussion in Issue #76, the files added in scikit-hep/scikit-hep-testdata#56 are in scikit-hep-testdata v0.4.3+. The test should be restructured around this.

Use Particle package

Seems a fair suggestion if you need particle names/properties :-).

Instead of raising an error on missing weight id the index could be used

pylhe/src/pylhe/__init__.py

Line 271 in c5503c2

except KeyError:

eg. by setting

wg_id = index

Origin: https://github.com/scikit-hep/pylhe/pull/220/files/45dff93321a13ab64d3a87d40358567af4dd97ce#r1465950808
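
A minimal sketch of that fallback, using only the standard library. The helper name `weight_ids` is hypothetical and the surrounding pylhe code is not reproduced here; note that the fallback yields an integer while real ids are strings, which a real implementation might want to unify.

```python
import xml.etree.ElementTree as ET


def weight_ids(rwgt_element):
    """Return one id per <wgt> child, using the position when 'id' is missing."""
    ids = []
    for index, wgt in enumerate(rwgt_element):
        try:
            wg_id = wgt.attrib["id"]
        except KeyError:
            # Fall back to the position in the block instead of raising
            wg_id = index
        ids.append(wg_id)
    return ids


rwgt = ET.fromstring("<rwgt><wgt id='1001'>1.0</wgt><wgt>2.0</wgt></rwgt>")
ids = weight_ids(rwgt)
print(ids)
```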

Unweighted events produced from madgraph examples can't be read

This seems to be a problem for unweighted events:

$ file unweighted_events.lhe.gz 
unweighted_events.lhe.gz: gzip compressed data, was "tmp_0_unweighted_events.lhe", last modified: Tue Mar 30 07:22:36 2021, max compression, original size modulo 2^32 15580732
$ python
Python 3.8.6 (default, Jan  5 2021, 00:14:15) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pylhe
>>> pylhe.readLHEInit("unweighted_events.lhe.gz")
weightgroup must have attribute 'type'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/feickert/Code/GitHub/Scikit-HEP/pylhe/src/pylhe/__init__.py", line 159, in readLHEInit
    wg_type = child.attrib["type"]
KeyError: 'type'
>>> pylhe.readLHEInit("unweighted_events.lhe")
weightgroup must have attribute 'type'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/feickert/Code/GitHub/Scikit-HEP/pylhe/src/pylhe/__init__.py", line 159, in readLHEInit
    wg_type = child.attrib["type"]
KeyError: 'type'

Originally posted by @matthewfeickert in #74 (comment)
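
The traceback shows a hard lookup of `child.attrib["type"]`. One possible way to tolerate files without that attribute is sketched below, assuming the weight group carries a `name` attribute instead; that is an assumption about the file in question, and the sample XML is made up for illustration.

```python
import xml.etree.ElementTree as ET

init = ET.fromstring(
    "<initrwgt>"
    "<weightgroup name='scale' combine='envelope'><weight id='1'>a</weight></weightgroup>"
    "<weightgroup type='pdf'><weight id='2'>b</weight></weightgroup>"
    "</initrwgt>"
)

groups = {}
for position, child in enumerate(init.iter("weightgroup")):
    # Prefer 'type', then 'name', then a positional placeholder
    wg_type = (
        child.attrib.get("type")
        or child.attrib.get("name")
        or f"group{position}"
    )
    groups[wg_type] = [w.attrib["id"] for w in child.findall("weight")]
print(groups)
```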

investigate pyhepmc API

Both LHE and HepMC are event file formats and should have a similar look & feel. We should investigate the pyhepmc API to see if pylhe could adopt something similar.

https://github.com/scikit-hep/pyhepmc

Use all-contributors for better recognition of contributions

Let me know @matthewfeickert if you agree with the idea and I will update the README. Thanks.

Include ability to read gzip'ed input LHE file

As a quick suggestion for an improvement, it might be very handy to include the ability to open zipped lhe files without needing to unzip them beforehand.
This can be achieved by checking the file type and, if the file is compressed, using something like
https://stackoverflow.com/questions/10566558/python-read-lines-from-compressed-text-files

This feature can be then disabled if the right dependency is not there, but it looks to me a rather minimal dependency and a quite wide usage for this feature.

Thanks for considering it.
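
A minimal sketch of the file-type check, assuming gzip compression specifically. Python's `gzip` module is part of the standard library, so no extra dependency is needed for this case. The helper name `open_lhe` is hypothetical; it detects gzip data by its two-byte magic number rather than by file extension.

```python
import gzip
import os
import tempfile


def open_lhe(path):
    """Open a plain or gzip-compressed file for binary reading."""
    with open(path, "rb") as f:
        is_gzip = f.read(2) == b"\x1f\x8b"  # gzip magic number
    return gzip.open(path, "rb") if is_gzip else open(path, "rb")


# Round-trip check with a plain and a compressed copy of the same content.
content = b"<LesHouchesEvents></LesHouchesEvents>"
tmpdir = tempfile.mkdtemp()
plain_path = os.path.join(tmpdir, "sample.lhe")
gz_path = os.path.join(tmpdir, "sample.lhe.gz")
with open(plain_path, "wb") as f:
    f.write(content)
with gzip.open(gz_path, "wb") as f:
    f.write(content)

with open_lhe(plain_path) as f:
    plain_data = f.read()
with open_lhe(gz_path) as f:
    gz_data = f.read()
print(plain_data == gz_data == content)
```

The returned file object can be passed to an XML parser in place of a path.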

use scikit-hep.vector objects for momenta

I find myself often using the following syntax

momentum = vector.Array(ak.flatten(array_from_lhe[<some selection>]['vector']))

so that I can use vector's helper functions for calculating common variables.
I get the array_from_lhe by using pylhe.to_awkward(pylhe.read_lhe(<some file>)).

To me, from the outside, the existence of the Momentum4D record within the awkward array returned by pylhe.to_awkward seems to be extraneous especially since we have a module with a known awkward interface and designed for 4D vectors. Is replacing Momentum4D with vector's Vector4D something that is feasible? If this isn't feasible (or desirable) for some reason, this isn't too much boilerplate code to do what I want, I was just curious.

Drop Python 2.7 support

Following the discussion RE: how to properly drop support for a release in scikit-hep/pyhf#1075, it seems that, as only Python 3.6+ is tested in CI now, either Python 2.7 support should be dropped, or we should test Python 2.7 and state in the README how long Python 2.7 support will be provided. Either way we should add a python_requires to setup.cfg.

Thoughts @lukasheinrich?

add basic documentation via sphinx

Homogeneous treatment of parsing errors, with test coverage

Following up from #141 (comment):

The read_lhe type of functions and others, which parse the XML, do not all catch parsing errors.

We should discuss if parsing errors should raise errors or be dealt with smoothly, as now, with print statements. Any modification to the way these errors are dealt with should be done homogeneously across the various methods.
Tests should be added to cover the lines dealing with these parsing errors. That will require a special LHE file, if not several.
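
To illustrate the two behaviours under discussion, the sketch below wraps `ET.iterparse` in a generator with a `strict` switch: in strict mode a parse error propagates, otherwise it is reported through `warnings.warn` instead of `print`. The function name, the `strict` parameter, and the truncated sample are all hypothetical, not an agreed design.

```python
import io
import warnings
import xml.etree.ElementTree as ET


def iter_events_checked(fileobj, strict=True):
    """Yield <event> text; on malformed XML either raise or warn."""
    try:
        for _, element in ET.iterparse(fileobj, events=["end"]):
            if element.tag == "event":
                yield element.text
    except ET.ParseError as excep:
        if strict:
            raise
        warnings.warn(f"LHE parse error: {excep}")


# A deliberately truncated document: the second <event> is never closed.
TRUNCATED = b"<LesHouchesEvents><event>1</event><event>"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    lenient = list(iter_events_checked(io.BytesIO(TRUNCATED), strict=False))
print(lenient, len(caught))

try:
    list(iter_events_checked(io.BytesIO(TRUNCATED), strict=True))
except ET.ParseError as excep:
    print("strict mode raised:", excep)
```

A small in-memory sample like `TRUNCATED` could also serve as the "special LHE file" needed to cover these code paths in tests.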

PyPI API token set wrong

Hi @lukasheinrich after trying to release v0.0.6 the workflow hit an error that resulted in a

HTTPError: 403 Forbidden from https://upload.pypi.org/legacy/
Invalid or non-existent authentication information. See https://pypi.org/help/#invalid-auth for more information.

Given the "Getting 403 forbidden from TestPypi" Discussion on pypa/gh-action-pypi-publish I think the PyPI API token is set wrong. Can you regenerate one and set it as a new PYPI_PASSWORD GitHub secret (or give me access to the PyPI page so I can do it)?

Write `developer.md` to explain maintainers process

Following up on #157 (comment) write up notes that explain how to take care of releases and general maintenance.

LaTeX fonts for graph visualisations

Have particle labels back in italic as before, cf. the discussion at #131 (comment).

GitHub release and PyPI release mismatch

The current release on PyPI is v0.0.2 while the GitHub release is v0.0.4. @lukasheinrich Can you please cut a new release?

license?

Hello Lukas!

I copied this code into another repo. I hope that is ok, but I guess without an explicit license I am breaking the law somehow.

Want to add one so I am no longer a thief? ;-)

Chris

Consider graphviz as alternative to pydot and others

I use alternatives to get PDF/PNG/... files (if needed) in DecayLanguage: why not simply go for https://pypi.org/project/graphviz/ for dot files, which provides you with everything you need? To be honest, I started by using pydot (still in use in DecayLanguage), but this also seems unmaintained, whereas graphviz is equivalent and very well maintained, hence I'm going to make the switch asap.

Originally posted by @eduardo-rodrigues in #53 (comment)

Limit TestPyPI usage to release candidate testing

c.f. scikit-hep/pyhf#1727 for how

Add pylhe to Conda

This seems like a good idea in general. It is a requirement in order to get pylhe included in the scikit-hep metapackage, which is something many are keen on, see scikit-hep/scikit-hep#173.

awkward-array API

would be nice to be able to read this with awkward-array. Perhaps using some of the "behavioral interfaces" we've been discussing with @jpivarski

Add PyPI API token

Loosely related to Issue #14, it would be nice to be able to use GitHub Actions's CI systems to cut releases and distribute to PyPI. To do this, a PyPI API token needs to get registered for the project, which can only be done by someone with maintainer privileges. This was done successfully in pyhf PR 638.

Installing the library including all the requirements.

Hello everyone,

I was just installing the library using pip install pylhe. However, after running this command I noticed that awkward was not installed, despite the fact that it is listed in the install_requires variable in the setup.cfg file. Here is the output from running pip install pylhe:

Requirement already satisfied: pylhe in path_to_packages/site-packages (0.2.1)
Requirement already satisfied: networkx~=2.2 in path_to_packages/site-packages (from pylhe) (2.8)
Requirement already satisfied: tex2pix~=0.3 in path_to_packages/site-packages (from pylhe) (0.3.1)
Requirement already satisfied: particle~=0.14 in path_to_packages/site-packages (from pylhe) (0.20.1)
Requirement already satisfied: attrs>=19.2 in path_to_packages/site-packages (from particle~=0.14->pylhe) (21.4.0)
Requirement already satisfied: hepunits>=2.0.0 in path_to_packages/site-packages (from particle~=0.14->pylhe) (2.2.0)

I was able to install the library with all the requirements without problems by cloning the repo and running python setup.py install. If there is a different standard way to get a correct installation with all the requirements, I will be happy to hear it.

docs: Add list of pylhe citations

We should get docs up in general, but @lukasheinrich pointed out that we should probably be tracking pylhe citations as well.

This is what I have just from https://www.google.com/search?q=pylhe+site%3Aarxiv.org:

These should also get added to the Scikit-HEP page.

Add tests for KeyErrors

In #139 (comment) it was mentioned that it would be good to add tests for instances in which KeyErrors are raised. This would be good to have in general to bring up the coverage.

Error running master/examples/

Hello again,

I just wanted to point out that the examples appear to be outdated, or are not working for me because of a possible error. For example, when running https://github.com/scikit-hep/pylhe/blob/master/examples/awkward_example.ipynb, the code pylhe.register_awkward() gives me the following error:

AttributeError: module 'pylhe' has no attribute 'register_awkward'

Error while reading events with an additional string '#aMCatNLO'

Problem:
pylhe raises an error when I try to read an LHE file generated with aMC@NLO

>>> lhe = readLHE("events.lhe")
>>> for x in lhe:
...     print(x)
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/store/sw/anaconda3/envs/ds4hep/lib/python3.9/site-packages/pylhe/__init__.py", line 201, in readLHE
    particle_objs = [LHEParticle.fromstring(p) for p in particles]
  File "/store/sw/anaconda3/envs/ds4hep/lib/python3.9/site-packages/pylhe/__init__.py", line 201, in <listcomp>
    particle_objs = [LHEParticle.fromstring(p) for p in particles]
  File "/store/sw/anaconda3/envs/ds4hep/lib/python3.9/site-packages/pylhe/__init__.py", line 71, in fromstring
    return cls(**dict(zip(cls.fieldnames, map(float, string.split()))))
ValueError: could not convert string to float: '#aMCatNLO'

The example LHE event block can be seen:

  <event>
  4      0 0.15466170E+09 0.14090630E+02 0.75467716E-02 0.20024995E+00
       21 -1    0    0  501  502 0.00000000E+00 0.00000000E+00 0.34453202E+03 0.34453202E+03 0.00000000E+00 0.0000E+00 0.9000E+01
       21 -1    0    0  502  503 0.00000000E+00 0.00000000E+00 -.14406924E+00 0.14406924E+00 0.00000000E+00 0.0000E+00 0.9000E+01
        5  1    1    2  501    0 0.41326091E+00 -.45455539E+01 0.23557499E+03 0.23566607E+03 0.47000000E+01 0.0000E+00 0.9000E+01
       -5  1    1    2    0  503 -.41326091E+00 0.45455539E+01 0.10881297E+03 0.10901002E+03 0.47000000E+01 0.0000E+00 0.9000E+01
#aMCatNLO 1  5  2  0  0 0.00000000E+00 0.00000000E+00 9  0  0 0.10000000E+01 0.35498143E+00 0.24370745E+01 0.00000000E+00 0.00000000E+00
  <rwgt>
   <wgt id='1001'> 0.15466E+09 </wgt>
   <wgt id='1002'> 0.24435E+09 </wgt>
   <wgt id='1003'> 0.76688E+08 </wgt>
   <wgt id='1004'> 0.11073E+09 </wgt>
   <wgt id='1005'> 0.17493E+09 </wgt>
   <wgt id='1006'> 0.54902E+08 </wgt>
   <wgt id='1007'> 0.23857E+09 </wgt>
   <wgt id='1008'> 0.37692E+09 </wgt>
   <wgt id='1009'> 0.11829E+09 </wgt>
  </rwgt>
  </event>

My Suggestion:
Modify the read_lhe() function in the __init__.py

Minor fix to strip whitespace
Parse no more LHEParticles than the number of particles declared in the first line of the event data

def read_lhe(filepath):
    try:
        with _extract_fileobj(filepath) as fileobj:
            for event, element in ET.iterparse(fileobj, events=["end"]):
                if element.tag == "event":
                    data = element.text.strip().split("\n")
                    eventdata, particles = data[0], data[1:]
                    eventinfo = LHEEventInfo.fromstring(eventdata)
                    particles = particles[:int(eventinfo.nparticles)]
                    particle_objs = [LHEParticle.fromstring(p) for p in particles]
                    yield LHEEvent(eventinfo, particle_objs)
    except ET.ParseError as excep:
        print("WARNING. Parse Error:", excep)
        return
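
The effect of the slicing step can be checked without pylhe. The following sketch applies the same idea to the text of the event block quoted in this issue, using plain string handling; the variable names are illustrative, and the `<rwgt>` block is omitted because it is a child element rather than part of the event's leading text.

```python
event_text = """
  4      0 0.15466170E+09 0.14090630E+02 0.75467716E-02 0.20024995E+00
       21 -1    0    0  501  502 0.00000000E+00 0.00000000E+00 0.34453202E+03 0.34453202E+03 0.00000000E+00 0.0000E+00 0.9000E+01
       21 -1    0    0  502  503 0.00000000E+00 0.00000000E+00 -.14406924E+00 0.14406924E+00 0.00000000E+00 0.0000E+00 0.9000E+01
        5  1    1    2  501    0 0.41326091E+00 -.45455539E+01 0.23557499E+03 0.23566607E+03 0.47000000E+01 0.0000E+00 0.9000E+01
       -5  1    1    2    0  503 -.41326091E+00 0.45455539E+01 0.10881297E+03 0.10901002E+03 0.47000000E+01 0.0000E+00 0.9000E+01
#aMCatNLO 1  5  2  0  0 0.00000000E+00 0.00000000E+00 9  0  0 0.10000000E+01 0.35498143E+00 0.24370745E+01 0.00000000E+00 0.00000000E+00
"""

lines = event_text.strip().split("\n")
header, rest = lines[0], lines[1:]

# The first number of the header line is the particle count for this event.
nparticles = int(header.split()[0])

# Keep only that many lines, which drops the trailing '#aMCatNLO' line.
particle_lines = rest[:nparticles]
particles = [[float(x) for x in line.split()] for line in particle_lines]
print(nparticles, len(rest), len(particles))
```

Without the slice, the fifth line in `rest` would reach `float()` and raise the `ValueError` shown in the traceback.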

pypi release sources do not include tests

Reconsider tex2pix dependency

tex2pix is a core dependency of pylhe but the last release of v0.3.1 was in 2016, so it seems to be safely "unmaintained" at this point. The source code also doesn't seem to be publicly available on GitHub.

👋 @agbuckley, as the author of tex2pix (thanks! 🙇), can you comment on whether there are any plans to maintain tex2pix in the future? Or would you recommend that we look for an alternative that is more likely to be patched if issues are found? I can imagine that you already have more than enough on your plate, and being responsible for maintaining another codebase might very understandably not be high on the list. :)
