scikit-hep / pylhe

Lightweight Python interface to read Les Houches Event (LHE) files

Home Page: https://pypi.org/project/pylhe/

License: Apache License 2.0

Languages: Python 97.92%, Dockerfile 2.08%
Topics: scikit-hep, particle-physics, lhe, hep

pylhe's Introduction

pylhe: Python LHE interface

Small and thin Python interface to read Les Houches Event (LHE) files

Install

To install pylhe from PyPI you can just do

python -m pip install pylhe

The visualization capabilities require the external dependency of Graphviz.

Get started

The example below provides a simple overview. Full functionality can be inspected from the functions provided in the pylhe module.

import itertools

# You can use LHE files from scikit-hep-testdata
from skhep_testdata import data_path

import pylhe

lhe_file = data_path("pylhe-testlhef3.lhe")
events = pylhe.read_lhe_with_attributes(lhe_file)
print(f"Number of events: {pylhe.read_num_events(lhe_file)}")

# Get event 1
event = next(itertools.islice(events, 1, 2))

# A DOT language graph of the event can be inspected as follows
print(event.graph.source)

# The graph is nicely displayed as SVG in Jupyter notebooks
event

# To save a DOT graph render the graph to a supported image format
# (refer to the Graphviz documentation for more)
event.graph.render(filename="test", format="png", cleanup=True)
event.graph.render(filename="test", format="pdf", cleanup=True)

Citation

The preferred BibTeX entry for citation of pylhe is

@software{pylhe,
  author = {Lukas Heinrich and Matthew Feickert and Eduardo Rodrigues},
  title = "{pylhe: v0.8.0}",
  version = {v0.8.0},
  doi = {10.5281/zenodo.1217031},
  url = {https://github.com/scikit-hep/pylhe},
}

Contributors

We hereby acknowledge the contributors that made this project possible (emoji key):

Matthew Feickert 🚧 🎨 💻 📖
Lukas 🚧 🎨 💻 📖
Eduardo Rodrigues 🚧 💻 📖
Johannes Schumann 💻
Henry Schreiner 💻
ariaradick 💻
Junghwan John Goh 💻
fuenfundachtzig 💻
Shantanu Gontia 💻
Tom Eichlersmith 💻
Alexander Puck Neuwirth 💻

This project follows the all-contributors specification.

pylhe's Issues

Python3

Hi, it appears that you may have updated this for Python 3, but I found an issue in __init__.py: line 19 should read "for k,v in kwargs.items():" rather than "for k,v in kwargs.iteritems():" in order to be compatible with Python 3. See the answer to this stack exchange question.
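
For reference, a minimal sketch of the portable form. The function name `describe` is made up for illustration and is not part of pylhe; the only point is that `dict.items()` exists on both Python 2 and Python 3, while `dict.iteritems()` exists only on Python 2.

```python
def describe(**kwargs):
    """Format keyword arguments as 'key=value' strings (illustrative helper only)."""
    # dict.items() is available on Python 2 and 3;
    # dict.iteritems() was removed in Python 3 and raises AttributeError there.
    return ["{}={}".format(k, v) for k, v in sorted(kwargs.items())]


labels = describe(px=1.0, id=21)
```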

Slowdown and memory increase with time

I'm using pylhe for looping on several LHE files, each containing 100K events. Running the snippet below on a lxplus machine (CentOS Linux release 7.9.2009), one can see that iterations become slower as time progresses, and eventually the job gets killed due to too much memory being used.

import pylhe
import time

afile = "/afs/cern.ch/work/b/bfontana/public/Singlet_TManualV3_all_M280p00_ST0p14_L463p05_K1p00_cmsgrid_final.lhe"
atime = time.time()
for ievt, evt in enumerate(pylhe.read_lhe(afile)): #pylhe.read_lhe_with_attributes
    if ievt%5000==0:
        print(time.time() - atime)
        atime = time.time()
        print(' - {} events processed'.format(ievt))

The significant slowdown occurs at iteration ~40K/50K. I would expect no memory increase given that we are dealing with a generator.
Is the above behavior expected? I'm using Python 3.5.6 (GCC 6.2.0).
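
One possible explanation, offered as an assumption rather than a diagnosis: `xml.etree.ElementTree.iterparse` keeps building the document tree while it streams, so memory can grow even though the caller only sees a generator. A common mitigation is to clear each element once it has been consumed. The sketch below uses only the standard library; `iter_event_texts` is a hypothetical helper, not a pylhe function.

```python
import io
import xml.etree.ElementTree as ET


def iter_event_texts(fileobj):
    """Yield the text of each <event> element, then release its contents."""
    for _, element in ET.iterparse(fileobj, events=["end"]):
        if element.tag == "event":
            yield element.text
            # Runs when the consumer asks for the next item:
            # drops the text, attributes and children of the finished event.
            element.clear()


sample = io.BytesIO(
    b"<LesHouchesEvents><event>a</event><event>b</event></LesHouchesEvents>"
)
texts = list(iter_event_texts(sample))
```

Note that the cleared (now empty) elements stay attached to the root, so this reduces the growth rather than removing it entirely.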

Accessing values and info of a particle?

Hi,

I was wondering about how to access the values of a particle's ID and other attributes (in numerical values) such as Px and Py. Are there any functions or methods I could use to get their numerical values?

Thanks!
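
A traceback quoted further down this page shows particles being built with `zip(cls.fieldnames, map(float, string.split()))`, which suggests each column of a particle line ends up as a float under a field name. The standalone sketch below mimics that idea without importing pylhe. The field names are an assumption that follows the Les Houches column order; check the pylhe source for the names it actually uses.

```python
# Assumed field names, in LHE column order (not copied from pylhe).
FIELDNAMES = [
    "id", "status", "mother1", "mother2", "color1", "color2",
    "px", "py", "pz", "e", "m", "lifetime", "spin",
]


def parse_particle(line):
    """Map one whitespace-separated LHE particle line to named float values."""
    return dict(zip(FIELDNAMES, map(float, line.split())))


line = (
    "5  1    1    2  501    0 0.41326091E+00 -.45455539E+01 "
    "0.23557499E+03 0.23566607E+03 0.47000000E+01 0.0000E+00 0.9000E+01"
)
particle = parse_particle(line)
pdg_id = int(particle["id"])  # PDG IDs are integers, so cast back from float
px, py = particle["px"], particle["py"]
```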

Test file for everyone

I would suggest that you add a test file to the notebook so that anyone can trivially get a feeling for the package, running the thing with no need for anything else. Right now the notebook makes use of a private file. You could also get a test file in our package scikit-hep-testdata and use it from there …

BTW, most project packages have notebooks under notebooks/ rather than examples/. Could you consider changing the name?

Thanks.

Write LHE file function

Since powheg does not follow the lhe-3 standard strictly (for example in its weights, see #220), I think pylhe would be a very nice tool for converting powheg lhe files to the lhe-3 standard. Since pylhe can already read those lhe files, one only needs to write out the lhe data that pylhe has loaded.
Does pylhe have a write_lhe function? I have not seen one.

I'll write it very soon, just wanted to check if this exists somewhere already or has been done before?

Bug: `rwgt` overwritten by `weights`

https://github.com/scikit-hep/scikit-hep-testdata/blob/main/src/skhep_testdata/data/pylhe-testlhef3.lhe#L337

  <rwgt>
   <wgt id="1001"> 0.50109E+02 </wgt>
   <wgt id="1002"> 0.45746E+02 </wgt>
   <wgt id="1003"> 0.52581E+02 </wgt>
   <wgt id="1004"> 0.50109E+02 </wgt>
   <wgt id="1005"> 0.45746E+02 </wgt>
   <wgt id="1006"> 0.52581E+02 </wgt>
   <wgt id="1007"> 0.50109E+02 </wgt>
   <wgt id="1008"> 0.45746E+02 </wgt>
   <wgt id="1009"> 0.52581E+02 </wgt>
  </rwgt>
  <weights> 1.000e+00 0.204e+00 1.564e+00 </weights>

Unfortunately, #220 overwrites the values from rwgt by those of weights. I did not know madgraph uses both...

Ideally I think one would have both available as rwgt and weight attribute, but that would be a breaking change i.e. renaming current weights to rwgt. WDYT?
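
To make the proposal concrete, here is a rough sketch of keeping the two blocks apart when reading an event. It uses only the standard library, and `split_weights` is a hypothetical helper, not existing pylhe code. The XML is a shortened version of the snippet above.

```python
import xml.etree.ElementTree as ET

EVENT_XML = """<event>
  <rwgt>
   <wgt id="1001"> 0.50109E+02 </wgt>
   <wgt id="1002"> 0.45746E+02 </wgt>
  </rwgt>
  <weights> 1.000e+00 0.204e+00 1.564e+00 </weights>
</event>"""


def split_weights(event_element):
    """Return (rwgt, weights): a dict keyed by wgt id and a plain list."""
    # Named reweighting entries from <rwgt><wgt id="...">
    rwgt = {wgt.attrib["id"]: float(wgt.text) for wgt in event_element.iter("wgt")}
    # Positional entries from <weights>, empty if the tag is absent
    weights_text = event_element.findtext("weights", default="")
    weights = [float(w) for w in weights_text.split()]
    return rwgt, weights


rwgt, weights = split_weights(ET.fromstring(EVENT_XML))
```

Stored this way, neither block can overwrite the other, at the cost of the renaming mentioned above.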

python3

Do you plan to support python3?

Unweighted events produced from madgraph examples can't be read

This seems to be a problem for unweighted events:

$ file unweighted_events.lhe.gz 
unweighted_events.lhe.gz: gzip compressed data, was "tmp_0_unweighted_events.lhe", last modified: Tue Mar 30 07:22:36 2021, max compression, original size modulo 2^32 15580732
$ python
Python 3.8.6 (default, Jan  5 2021, 00:14:15) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pylhe
>>> pylhe.readLHEInit("unweighted_events.lhe.gz")
weightgroup must have attribute 'type'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/feickert/Code/GitHub/Scikit-HEP/pylhe/src/pylhe/__init__.py", line 159, in readLHEInit
    wg_type = child.attrib["type"]
KeyError: 'type'
>>> pylhe.readLHEInit("unweighted_events.lhe")
weightgroup must have attribute 'type'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/feickert/Code/GitHub/Scikit-HEP/pylhe/src/pylhe/__init__.py", line 159, in readLHEInit
    wg_type = child.attrib["type"]
KeyError: 'type'

Originally posted by @matthewfeickert in #74 (comment)
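
The KeyError comes from indexing `child.attrib["type"]` directly. A tolerant lookup could look like the sketch below. That these files label the group with a `name` attribute instead is an assumption on my part, and `weightgroup_label` is a hypothetical helper rather than pylhe code.

```python
import xml.etree.ElementTree as ET


def weightgroup_label(element, default="unknown"):
    """Return the weightgroup's 'type', else its 'name', else a placeholder."""
    # dict.get avoids the KeyError raised by element.attrib["type"]
    return element.attrib.get("type", element.attrib.get("name", default))


with_type = ET.fromstring('<weightgroup type="scale_variation" combine="envelope"/>')
with_name = ET.fromstring('<weightgroup name="Central scale variation" combine="envelope"/>')
bare = ET.fromstring("<weightgroup/>")

labels = [weightgroup_label(e) for e in (with_type, with_name, bare)]
```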

Include ability to read gzip'ed input LHE file

As a quick suggestion for an improvement, it would be very handy to be able to open gzipped LHE files without unzipping them beforehand.
This can be achieved by checking the file type and, for compressed files, using something like
https://stackoverflow.com/questions/10566558/python-read-lines-from-compressed-text-files

This feature can then be disabled if the right dependency is not there, but it looks to me like a rather minimal dependency for a feature with quite wide usage.

Thanks for considering it.
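
A minimal sketch of the idea, assuming detection by the two-byte gzip magic number rather than by file extension. The `gzip` module is part of the Python standard library, so no extra dependency is needed. `open_maybe_gzip` is a hypothetical name, not a pylhe function.

```python
import gzip
import os
import tempfile


def open_maybe_gzip(path):
    """Open path for binary reading, transparently decompressing gzip files."""
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == b"\x1f\x8b"  # gzip magic number
    return gzip.open(path, "rb") if is_gzip else open(path, "rb")


# Round trip on throwaway files: both variants should give the same bytes.
payload = b"<LesHouchesEvents></LesHouchesEvents>"
with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "events.lhe")
    zipped = os.path.join(tmp, "events.lhe.gz")
    with open(plain, "wb") as f:
        f.write(payload)
    with gzip.open(zipped, "wb") as f:
        f.write(payload)
    with open_maybe_gzip(plain) as f:
        plain_bytes = f.read()
    with open_maybe_gzip(zipped) as f:
        zipped_bytes = f.read()
```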

use scikit-hep.vector objects for momenta

I find myself often using the following syntax

momentum = vector.Array(ak.flatten(array_from_lhe[<some selection>]['vector']))

so that I can use vector's helper functions for calculating common variables.
I get the array_from_lhe by using pylhe.to_awkward(pylhe.read_lhe(<some file>)).

To me, from the outside, the existence of the Momentum4D record within the awkward array returned by pylhe.to_awkward seems to be extraneous especially since we have a module with a known awkward interface and designed for 4D vectors. Is replacing Momentum4D with vector's Vector4D something that is feasible? If this isn't feasible (or desirable) for some reason, this isn't too much boilerplate code to do what I want, I was just curious.

Drop Python 2.7 support

Following the discussion on how to properly drop support for a release in scikit-hep/pyhf#1075, it seems that, as only Python 3.6+ is tested in CI now, either Python 2.7 support should be dropped or we should test Python 2.7 and make a statement in the README about how long Python 2.7 support will be provided. Either way we should add a python_requires to setup.cfg.

Thoughts @lukasheinrich?

Homogeneous treatment of parsing errors, with test coverage

Following up from #141 (comment):

The read_lhe type of functions and others, which parse the XML, do not all catch parsing errors.

  1. We should discuss if parsing errors should raise errors or be dealt with smoothly, as now, with print statements. Any modification to the way these errors are dealt with should be done homogeneously across the various methods.
  2. Tests should be added to cover the lines dealing with these parsing errors. That will require a special LHE file, if not several.
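
To help the discussion in point 1, here is one way the two behaviours could sit behind a single switch. This is only a sketch: `LHEParseError`, `count_events` and the `strict` flag are all hypothetical and do not exist in pylhe.

```python
import io
import xml.etree.ElementTree as ET


class LHEParseError(Exception):
    """Hypothetical exception type for malformed LHE input (not in pylhe)."""


def count_events(fileobj, strict=True):
    """Count <event> elements; raise or warn on malformed XML depending on strict."""
    n = 0
    try:
        for _, element in ET.iterparse(fileobj, events=["end"]):
            if element.tag == "event":
                n += 1
    except ET.ParseError as excep:
        if strict:
            raise LHEParseError(f"malformed LHE input: {excep}") from excep
        # Lenient mode mirrors the current print-and-continue behaviour
        print("WARNING. Parse Error:", excep)
    return n


good = io.BytesIO(b"<LesHouchesEvents><event/><event/></LesHouchesEvents>")
n_good = count_events(good)
```

A truncated in-memory document such as `b"<a><event/><event>"` is enough to exercise the error branch in a test, so a special LHE file on disk may not be needed for every case.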

PyPI API token set wrong

Hi @lukasheinrich after trying to release v0.0.6 the workflow hit an error that resulted in a

HTTPError: 403 Forbidden from https://upload.pypi.org/legacy/
Invalid or non-existent authentication information. See https://pypi.org/help/#invalid-auth for more information.

Given the "Getting 403 forbidden from TestPypi" Discussion on pypa/gh-action-pypi-publish I think the PyPI API token is set wrong. Can you regenerate one and set it as a new PYPI_PASSWORD GitHub secret (or give me access to the PyPI page so I can do it)?

license?

Hello Lukas!

I copied this code into another repo. I hope that is ok, but I guess without an explicit license I am breaking the law somehow.

Want to add one so I am no longer a thief? ;-)

Chris

Consider graphviz as alternative to pydot and others

I use alternatives to get PDF/PNG/... files (if needed) in DecayLanguage: why not simply go for https://pypi.org/project/graphviz/ for dot files, which provides you with everything you need? To be honest I started by using pydot (still in use in DecayLanguage), but this also seems unmaintained whereas graphviz is equivalent and very well maintained, hence I'm going to make the switch asap.

Originally posted by @eduardo-rodrigues in #53 (comment)

awkward-array API

It would be nice to be able to read this with awkward-array, perhaps using some of the "behavioral interfaces" we've been discussing with @jpivarski.

Add PyPI API token

Loosely related to Issue #14, it would be nice to be able to use GitHub Actions's CI systems to cut releases and distribute to PyPI. To do this, a PyPI API token needs to get registered for the project, which can only be done by someone with maintainer privileges. This was done successfully in pyhf PR 638.

Installing the library including all the requirements.

Hello everyone,

I was just installing the library using pip install pylhe. However, after running this command I noticed that awkward was not installed, even though it is listed in install_requires in the setup.cfg file. Here is the output from running pip install pylhe:

Requirement already satisfied: pylhe in path_to_packages/site-packages (0.2.1)
Requirement already satisfied: networkx~=2.2 in path_to_packages/site-packages (from pylhe) (2.8)
Requirement already satisfied: tex2pix~=0.3 in path_to_packages/site-packages (from pylhe) (0.3.1)
Requirement already satisfied: particle~=0.14 in path_to_packages/site-packages (from pylhe) (0.20.1)
Requirement already satisfied: attrs>=19.2 in path_to_packages/site-packages (from particle~=0.14->pylhe) (21.4.0)
Requirement already satisfied: hepunits>=2.0.0 in path_to_packages/site-packages (from particle~=0.14->pylhe) (2.2.0)

I was able to install the library with all the requirements by cloning the repo and running python setup.py install. If there is a different standard way to get a correct installation with all the requirements, I would be happy to hear it.

docs: Add list of pylhe citations

Add tests for KeyErrors

In #139 (comment) it was mentioned that it would be good to add tests for instances in which KeyErrors are raised. This would be good to have in general to bring up the coverage.

Error while reading events with an additional string '#aMCatNLO'

Problem:
pylhe raises an error when I try to read an LHE file generated with aMC@NLO

>>> lhe = readLHE("events.lhe")
>>> for x in lhe:
...     print(x)
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/store/sw/anaconda3/envs/ds4hep/lib/python3.9/site-packages/pylhe/__init__.py", line 201, in readLHE
    particle_objs = [LHEParticle.fromstring(p) for p in particles]
  File "/store/sw/anaconda3/envs/ds4hep/lib/python3.9/site-packages/pylhe/__init__.py", line 201, in <listcomp>
    particle_objs = [LHEParticle.fromstring(p) for p in particles]
  File "/store/sw/anaconda3/envs/ds4hep/lib/python3.9/site-packages/pylhe/__init__.py", line 71, in fromstring
    return cls(**dict(zip(cls.fieldnames, map(float, string.split()))))
ValueError: could not convert string to float: '#aMCatNLO'

The example LHE event block can be seen:

  <event>
  4      0 0.15466170E+09 0.14090630E+02 0.75467716E-02 0.20024995E+00
       21 -1    0    0  501  502 0.00000000E+00 0.00000000E+00 0.34453202E+03 0.34453202E+03 0.00000000E+00 0.0000E+00 0.9000E+01
       21 -1    0    0  502  503 0.00000000E+00 0.00000000E+00 -.14406924E+00 0.14406924E+00 0.00000000E+00 0.0000E+00 0.9000E+01
        5  1    1    2  501    0 0.41326091E+00 -.45455539E+01 0.23557499E+03 0.23566607E+03 0.47000000E+01 0.0000E+00 0.9000E+01
       -5  1    1    2    0  503 -.41326091E+00 0.45455539E+01 0.10881297E+03 0.10901002E+03 0.47000000E+01 0.0000E+00 0.9000E+01
#aMCatNLO 1  5  2  0  0 0.00000000E+00 0.00000000E+00 9  0  0 0.10000000E+01 0.35498143E+00 0.24370745E+01 0.00000000E+00 0.00000000E+00
  <rwgt>
   <wgt id='1001'> 0.15466E+09 </wgt>
   <wgt id='1002'> 0.24435E+09 </wgt>
   <wgt id='1003'> 0.76688E+08 </wgt>
   <wgt id='1004'> 0.11073E+09 </wgt>
   <wgt id='1005'> 0.17493E+09 </wgt>
   <wgt id='1006'> 0.54902E+08 </wgt>
   <wgt id='1007'> 0.23857E+09 </wgt>
   <wgt id='1008'> 0.37692E+09 </wgt>
   <wgt id='1009'> 0.11829E+09 </wgt>
  </rwgt>
  </event>

My Suggestion:
Modify the read_lhe() function in the __init__.py

  • minor fix to strip whitespaces
  • parse LHEParticles not more than the number of particles defined in the 1st line of the event data

def read_lhe(filepath):
    try:
        with _extract_fileobj(filepath) as fileobj:
            for event, element in ET.iterparse(fileobj, events=["end"]):
                if element.tag == "event":
                    data = element.text.strip().split("\n")
                    eventdata, particles = data[0], data[1:]
                    eventinfo = LHEEventInfo.fromstring(eventdata)
                    particles = particles[:int(eventinfo.nparticles)]
                    particle_objs = [LHEParticle.fromstring(p) for p in particles]
                    yield LHEEvent(eventinfo, particle_objs)
    except ET.ParseError as excep:
        print("WARNING. Parse Error:", excep)
        return
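
The slicing step can be checked in isolation, without pylhe, on the event text from this report. The sketch below is a standalone illustration of the suggestion, not a tested patch; `split_event_block` is a hypothetical helper name.

```python
EVENT_TEXT = """
  4      0 0.15466170E+09 0.14090630E+02 0.75467716E-02 0.20024995E+00
       21 -1    0    0  501  502 0.00000000E+00 0.00000000E+00 0.34453202E+03 0.34453202E+03 0.00000000E+00 0.0000E+00 0.9000E+01
       21 -1    0    0  502  503 0.00000000E+00 0.00000000E+00 -.14406924E+00 0.14406924E+00 0.00000000E+00 0.0000E+00 0.9000E+01
        5  1    1    2  501    0 0.41326091E+00 -.45455539E+01 0.23557499E+03 0.23566607E+03 0.47000000E+01 0.0000E+00 0.9000E+01
       -5  1    1    2    0  503 -.41326091E+00 0.45455539E+01 0.10881297E+03 0.10901002E+03 0.47000000E+01 0.0000E+00 0.9000E+01
#aMCatNLO 1  5  2  0  0 0.00000000E+00 0.00000000E+00 9  0  0 0.10000000E+01 0.35498143E+00 0.24370745E+01 0.00000000E+00 0.00000000E+00
"""


def split_event_block(text):
    """Split event text into (header, particle lines, leftover lines)."""
    lines = [ln.strip() for ln in text.strip().split("\n")]
    header, rest = lines[0], lines[1:]
    # The first header column is the particle count for this event
    nparticles = int(float(header.split()[0]))
    return header, rest[:nparticles], rest[nparticles:]


header, particle_lines, extra_lines = split_event_block(EVENT_TEXT)
```

With this split, only `particle_lines` would be passed on to the float conversion, and the generator-specific `#aMCatNLO` line ends up in `extra_lines`.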

Reconsider tex2pix dependency

tex2pix is a core dependency of pylhe but the last release of v0.3.1 was in 2016, so it seems to be safely "unmaintained" at this point. The source code also doesn't seem to be publicly available on GitHub.

👋 @agbuckley, as the author of tex2pix (thanks! 🙇), can you comment on whether there are any plans to maintain tex2pix in the future? Or would you recommend that we look for an alternative that is more likely to be patched if issues are found? I can imagine that you already have more than enough on your plate, and being responsible for maintaining another codebase might very understandably not be high on the list. :)
