pm4py / pm4py-core

Public repository for the PM4Py (Process Mining for Python) project.

Home Page: https://pm4py.fit.fraunhofer.de

License: GNU General Public License v3.0

Python 56.82% Dockerfile 0.05% Jupyter Notebook 43.06% HTML 0.08%
process-mining machine-learning data-mining data-science python

pm4py-core's People

Contributors

awth13, chiaods, denisesato, dvdwndrdl, eduardogoulart1, fit-alessandro-berti, fit-daniel-schuster, fit-humam-kourani, fit-pm4py, fit-sebastiaan-van-zelst, fmannhardt, har-leen-kaur, henrikkirchmann, javert899, jdaniloc, jonahschueller, lsabi, lschade, madhubs08, marcopegoraro, ngynkvn, nicoelbert, pastra98, phi1-h, renovate-bot, renovate-bot-fit, s-j-v-zelst, t0asty, victorshima, xiaohuwang0921


pm4py-core's Issues

Inconsistent parameters `version` vs `variant`

Upfront, thanks for the great work and sorry for nitpicking, this is only to improve/understand the API.

I noted that the factory apply methods sometimes take a variant parameter and sometimes a version parameter. Is this distinction on purpose, and if so, what is the difference?

variants_filter doesn't return all variants with decreasingFactor set to 0.0

The variants_filter doesn't return all the variants when the decreasingFactor is set to 0.0. Here is the code I wrote to check this:

all_traces = variants_filter.apply_auto_filter(log, parameters={"decreasingFactor": 1.0})
all_traces = case_statistics.get_variant_statistics(all_traces)
all_traces

It returns all traces except the one with the lowest frequency.

Operating System : Windows 10
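To make the report concrete, here is a minimal, hypothetical re-implementation of variant counting (not pm4py's actual code): a variant is the ordered sequence of activity names in a trace, and "all variants" means every such sequence appears in the statistics, including the least frequent one.

```python
from collections import Counter

def variant_counts(log):
    """Map each variant (comma-joined activity names) to its trace count."""
    return Counter(
        ",".join(event["concept:name"] for event in trace) for trace in log
    )

# Toy log: two traces of variant "a,b" and one of variant "c".
toy_log = [
    [{"concept:name": "a"}, {"concept:name": "b"}],
    [{"concept:name": "a"}, {"concept:name": "b"}],
    [{"concept:name": "c"}],
]
counts = variant_counts(toy_log)
```

Under this reading, the filter described above drops the single-trace variant "c" even though a decreasingFactor of 0.0 suggests nothing should be filtered.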

Log: convert xes to csv

The exported CSV file, converted from an XES file that contains both start and end timestamps, ends up with only one (complete) timestamp column (tested on BPI Challenge 2017).

Performance of Inductive miner

I tried to run the well known sepsis case with the Inductive miner and I found that (averages of 10 tests)

xes_importer.import_log took 1.2 seconds
inductive_miner.apply took 65 seconds

ProM does this in approx. 1 second.
You say that you've seen better performance than other open-source tools.

So I was wondering what was going wrong?

(sidenote none of my threads were showing significant work being done)

By the way: I was also implementing some process discovery techniques for my thesis, so I would consider collaborating on this project if the runtime of the algorithms becomes manageable (approx. 1 second per run for a smaller log like Sepsis).

Alignment: TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'

Code to reproduce:

Python 3.7.2 (C:\Users\felixm\AppData\Local\Continuum\anaconda3\envs\r-reticulate\python.exe)
Reticulate 1.10.0.9004 REPL -- A Python interpreter in R.
>>> from pm4py.objects.log.importer.xes import factory as xes_importer
>>> log = xes_importer.import_log("C:/Process Mining/R/pm4py/../heur.xes")
>>> log
[{'attributes': {'concept:name': 'Case1.0'}, 'events': [{'concept:name': 'a', 'lifecycle:transition': 'complete', 'org:resource': 'UNDEFINED', 'time:timestamp': datetime.datetime(2010, 11, 4, 19, 57, 15, tzinfo=datetime.timezone.utc), 'concept:instance': '135', '.order': '135'}, '..', {'concept:name': 'e', 'lifecycle:transition': 'complete', 'org:resource': 'UNDEFINED', 'time:timestamp': datetime.datetime(2010, 11, 4, 19, 58, 15, tzinfo=datetime.timezone.utc), 'concept:instance': '136', '.order': '136'}]}, '....', {'attributes': {'concept:name': 'Case6.0'}, 'events': [{'concept:name': 'a', 'lifecycle:transition': 'complete', 'org:resource': 'UNDEFINED', 'time:timestamp': datetime.datetime(2010, 11, 4, 19, 57, 15, tzinfo=datetime.timezone.utc), 'concept:instance': '137', '.order': '137'}, '..', {'concept:name': 'e', 'lifecycle:transition': 'complete', 'org:resource': 'UNDEFINED', 'time:timestamp': datetime.datetime(2010, 11, 4, 19, 59, 15, tzinfo=datetime.timezone.utc), 'concept:instance': '139', '.order': '139'}]}]
>>> import pm4py
>>> from pm4py.algo.conformance.alignments import factory as align_factory
>>> 
>>> import os
>>> from pm4py.objects.petri.importer import pnml as pnml_importer
>>> 
>>> net, initial_marking, final_marking = pnml_importer.import_net("C:/Process Mining/R/pm4py/../heur.pnml")
>>> 
>>> alignments = align_factory.apply_log(log, net, initial_marking, final_marking)
TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'

This occurs with the following Petri net and log of the Heuristics Miner example:
heur.zip

This is the full stack trace:

  TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'

Detailed traceback: 
  File "C:\Users\felixm\AppData\Local\CONTIN~1\ANACON~1\envs\R-RETI~1\lib\site-packages\pm4py\algo\conformance\alignments\factory.py", line 35, in apply
    final_marking, parameters, version)
  File "C:\Users\felixm\AppData\Local\CONTIN~1\ANACON~1\envs\R-RETI~1\lib\site-packages\pm4py\algo\conformance\alignments\factory.py", line 147, in apply_log
    log))
  File "C:\Users\felixm\AppData\Local\CONTIN~1\ANACON~1\envs\R-RETI~1\lib\site-packages\pm4py\algo\conformance\alignments\factory.py", line 146, in <lambda>
    version=version),
  File "C:\Users\felixm\AppData\Local\CONTIN~1\ANACON~1\envs\R-RETI~1\lib\site-packages\pm4py\algo\conformance\alignments\factory.py", line 79, in apply_trace
    return VERSIONS[version](trace, petri_net, initial_marking, final_marking, parameters)
  File "C:\Users\felixm\AppData\Local\CONTIN~1\ANACON~1\envs\R-RETI~1\lib\site-packages\pm4py\algo\conformance\alignments\versions\state_equation_a_star.py", line 120, in apply
    alignments.utils.SKIP, ret_tuple_as_trans_desc=ret_tuple_as_trans_desc)
  File "C:\Users\felixm\AppData\Local\CONTIN~1\ANACON~1\envs\R-RETI~1\lib\site-packages\pm4py\algo\conformance\alignments\versions\state_equation_a_star.py", line 141, in apply_sync_prod
    ret_tuple_as_trans_desc=ret_tuple_as_trans_desc)
  File "C:\Users\felixm\AppData\Local\CONTIN~1\ANACON~1\envs\R-RETI~1\lib\site-packages\pm4py\algo\conformance\alignments\versions\state_equation_a_star.py", line 159, in __search
    tp = SearchTuple(curr.g + h, curr.g, h, curr.m, curr.p, curr.t, x, __trust_solution(x))

Could this be an issue with the LP solver?

Non-semantic transitions ids of the tracenet when doing alignments

This is related to #40: the identifiers used for the trace net, returned when using PARAM_ALIGNMENT_RESULT_IS_SYNC_PROD_AWARE (which works well, btw), are simply enumerated t1 ... tn.

I wonder whether this is a good choice, since the transition ids do not refer to anything the caller of the method recognises and do not match between traces. So t1 for the first trace may be a different activity than t1 for the second trace. Is it a requirement of the alignment that the transitions are sequentially numbered, or could this be replaced by actual event activity labels (maybe with suffixes to disambiguate duplicates)?

I am not sure what is best here, but simply raising this for discussion.
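The suffixing idea suggested above could be sketched as follows (a hypothetical helper, not pm4py code): use the activity labels as transition ids and append a counter only when a label repeats, so ids stay unique within one trace net but remain recognisable.

```python
def disambiguate(labels):
    """Turn a list of activity labels into unique, label-based transition ids."""
    seen = {}
    result = []
    for label in labels:
        count = seen.get(label, 0)
        seen[label] = count + 1
        # first occurrence keeps the plain label; repeats get a numeric suffix
        result.append(label if count == 0 else "%s_%d" % (label, count))
    return result
```

For a trace with activities a, b, a, a this yields ids a, b, a_1, a_2 instead of opaque t1 ... t4.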

Filtering on timeframe

Hi,

timestamp_filter.filter_traces_intersecting does not seem to produce the expected output, i.e. return the traces that have at least one event inside the specified time interval.

Consider the example:

xes_file_path = os.path.join('pm4py-source', 'tests', 'input_data', 'running-example.xes')
trace_log = xes_importer.import_log(xes_file_path)
trace_log = sorting.sort_timestamp(trace_log)
trace_log = timestamp_filter.filter_traces_intersecting(trace_log, "2011-01-05 00:00:00", "2011-01-10 23:59:59")

All cases in the example have events inside the time interval above and outside it on one side only (all events happen before 2011-01-11 or after 2011-01-05), except case id 3, where events happen on both sides of the interval as well as within it. Why is this trace not considered intersecting?

Tested on Python 3.6.5, Ubuntu 16.04, pm4py version 1.0.23 / commit e0f408a
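For reference, the semantics the reporter expects can be sketched as a simple predicate (an illustrative stand-in, not pm4py's implementation): a trace intersects the interval if at least one of its events falls inside it.

```python
from datetime import datetime, timezone

def trace_intersects(trace, start, end):
    """True if at least one event of the trace lies within [start, end]."""
    return any(start <= event["time:timestamp"] <= end for event in trace)

interval_start = datetime(2011, 1, 5, tzinfo=timezone.utc)
interval_end = datetime(2011, 1, 10, 23, 59, 59, tzinfo=timezone.utc)

inside = [{"time:timestamp": datetime(2011, 1, 7, tzinfo=timezone.utc)}]
outside = [{"time:timestamp": datetime(2011, 2, 1, tzinfo=timezone.utc)}]
```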

Issue in importing boolean from XES

An issue has been signaled by Williams Rizzi of FBK about importing booleans from XES.

It seems that Python's built-in bool function does not do the job correctly.
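The pitfall here is that bool() on any non-empty string is True, so bool("false") is True. A hypothetical safe parser (illustrative, not pm4py's actual fix) compares the string content instead:

```python
def parse_xes_bool(value):
    """Parse an XES boolean attribute value ("true"/"false") correctly."""
    return value.strip().lower() == "true"

wrong = bool("false")            # True -- the bug: non-empty string is truthy
right = parse_xes_bool("false")  # False -- content-based parsing
```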

bug in inductive miner

The inductive miner has a bug:

When given the log [<a>, <b,c>], it ignores <a> and produces ->(b,c).
The expected result is: x(a, ->(b,c))

Create PANDAS Log Object

There is some code that uses pandas as an underlying storage for event data.
This is mainly for performance reasons.

Somehow, there should be an encapsulation around this object, such that, in principle we can use it with any existing algorithm.

Alignment is missing names of invisible transitions

The alignments returned by PM4Py use the label of each transition instead of its identifier/name in the resulting list:
https://github.com/pm4py/pm4py-source/blob/afa8d9d1eb1a87ee57a2ba3a75d7d292963590ff/pm4py/algo/conformance/alignments/versions/state_equation_a_star.py#L185

This leads to invisible transitions appearing as None in the result, for example:

{'alignment': [('>>', None),
  ('Registration', 'Registration'),
  ('Triage and Assessment', 'Triage and Assessment'),
  ('X-Ray', 'X-Ray'),
  ('Discuss Results', 'Discuss Results'),
  ('Check-out', 'Check-out'),
  ('>>', None),
  ('Registration', 'Registration'),
  ('Triage and Assessment', 'Triage and Assessment'),
  ('X-Ray', 'X-Ray'),
  ('Discuss Results', 'Discuss Results'),
  ('Check-out', 'Check-out'),
  ('>>', None)],
 'cost': 3, 'visited_states': 15, 'queued_states': 43, 'traversed_arcs': 50, 'fitness': 1.0}

This makes it hard to use the alignment for visualization and/or analysis purposes. Also, when transitions have duplicate labels, some computation would be required to infer which one was executed.

Could you simply use the name instead of the label, since the label can then be looked up?
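The lookup the reporter has in mind could work like this (an illustrative sketch with a hypothetical stand-in type, not pm4py's actual Transition class): names stay unique and usable even when the label is None or duplicated.

```python
from collections import namedtuple

# Hypothetical stand-in for a Petri net transition with a name (id) and a label.
Transition = namedtuple("Transition", ["name", "label"])

transitions = [
    Transition("t1", "Registration"),
    Transition("skip_1", None),  # invisible transition: label is None
]

# If alignments reported names, the caller could recover labels on demand.
name_to_label = {t.name: t.label for t in transitions}
```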

Proposed Names Change: TraceLog -> EventLog, EventLog -> EventStream

After some discussion with some people "in the community".

I was requested to rename TraceLogs to EventLogs (as this is what they really are) and to rename EventLogs in PM4Py to something else.
In that sense, the EventLogs in PM4Py are equal to (finite) Event Streams in numerous academic articles.

I therefore propose to do the renaming/refactoring ASAP:
  • EventLog -> EventStream
  • TraceLog -> EventLog

Bug in Inductive Miner

I found another bug in the inductive miner.
On the simple log [<b,c,e,j>, <b,d,j>, <f,h,g,i,k>] (attached as a .txt; remove the .txt extension to get the .xes:
pm4py_imdf_issue.xes.txt
), the IM fails to recognize the xor construct between c, e and d.

I have attached a log that contains the data, the output of the IM in pm4py, and the output of the same miner (dfg based) from ProM.

Please fix (ASAP!), because this is a rather severe problem and should not be in the release.

pm4py_imdf_correct
pm4py_imdf_issue

B.Sc. Thesis Topics

  • Alignment Visualization
  • Alignments on DFG's (S. Leemans Approach)
  • Dotted Chart
  • SNA Metrics in PM4Py (according to original paper!)

Refactor Visualization Library of Transition Systems

The visualization library of transition systems needs to be refactored.
It is now tightly coupled to the discovery of transition systems, yet it should be independent.

States and edges should show the name as well as any possible decoration that is attached to them (if this is possible).

The source of the transition system, e.g., "view based" should not be of relevance.

Index column in CSV exporting

Currently, event logs are exported to CSV format with Pandas.

The export includes an index column that is not very useful, and causes problems with some CSV parsers (one of them is included in ProM).

A solution is to remove the index column from the exported CSV file.
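With Pandas this is a one-argument fix: DataFrame.to_csv accepts index=False, which drops the index column from the output. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({
    "case:concept:name": ["1", "1"],
    "concept:name": ["register request", "check ticket"],
})

with_index = df.to_csv()                 # first column is the unnamed index
without_index = df.to_csv(index=False)   # index column removed
```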

Provide a conda package

Thanks for the awesome work. I am working on an R port using the reticulate package, which provides a bridge between R and Python. At least on Windows, this relies on a conda environment into which I can install pm4py using pip, but I fear this might lead to version conflicts if people have already installed conda packages that are required by other packages.

Would it be possible to provide a conda package for pm4py?

Add option to deepcopy in log conversion

Currently the log conversion only creates a new "enclosing" log object (e.g. EventLog) around the existing source log object.
In a way, one could see this as taking an alternative view on the log that acts as a source of the conversion.

It might be useful to add an option to the conversion (maybe through the parameter object?) that allows us to apply python's deep copy functionality.
This avoids side effects of changing the elements of either the source- or target log of the conversion.
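The side effect in question is easy to demonstrate with plain Python (the dicts below stand in for events): a shallow "enclosing" copy shares the underlying event objects, while copy.deepcopy produces a fully independent log.

```python
import copy

source_log = [[{"concept:name": "a"}]]       # one trace with one event

shallow_view = list(source_log)              # new list, same trace/event objects
deep_copy = copy.deepcopy(source_log)        # fully independent copy

# Mutating the source leaks into the shallow view but not the deep copy.
source_log[0][0]["concept:name"] = "changed"
```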

Add timeout parameter to long running operations

A feature request I just came across while running some experiments in which the precision calculation took forever. Would it be possible to add a timeout parameter to potentially long-running operations?

Not sure how best to do this in Python, but usually the cleanest way is to check the elapsed time from time to time in a loop within the long-running operation.
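The pattern described above can be sketched as follows (a hypothetical example, not pm4py code): record a start time and check the budget once per loop iteration, raising when it is exceeded.

```python
import time

def long_running_op(items, timeout_seconds=None):
    """Process items, aborting with TimeoutError once the budget is exceeded."""
    start = time.monotonic()
    results = []
    for item in items:
        # periodic check, as suggested: cheap, and honored between work units
        if timeout_seconds is not None and time.monotonic() - start > timeout_seconds:
            raise TimeoutError("operation exceeded the configured timeout")
        results.append(item * 2)  # stand-in for one unit of real work
    return results
```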

Failed to install 'python-graphviz' with conda and Python 3.7

Trying to install pm4py according to the installation guide, I encounter the following error:

Collecting python-graphviz (from -r requirements.txt (line 9))
  Could not find a version that satisfies the requirement python-graphviz (from -r requirements.txt (line 9)) (from versions: )
No matching distribution found for python-graphviz (from -r requirements.txt (line 9))

Full trace is here:

PS C:\Process Mining\Python\pm4py> pip install -r requirements.txt
Collecting ciso8601 (from -r requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/61/d8/e17159b6037a26e3facdfeec8750dd17312284a915c5e0da79562b982035/ciso8601-2.0.1.tar.gz
Collecting cvxopt (from -r requirements.txt (line 2))
  Downloading https://files.pythonhosted.org/packages/df/06/1d498703387acd6593592a57cd8fee740e5e8e53a0feed453afc024b9b92/cvxopt-1.2.1-cp37-cp37m-win_amd64.whl (803kB)
    100% |████████████████████████████████| 808kB 1.6MB/s
Collecting dataclasses (from -r requirements.txt (line 3))
  Downloading https://files.pythonhosted.org/packages/26/2f/1095cdc2868052dd1e64520f7c0d5c8c550ad297e944e641dbf1ffbb9a5d/dataclasses-0.6-py3-none-any.whl
Collecting flask (from -r requirements.txt (line 4))
  Downloading https://files.pythonhosted.org/packages/7f/e7/08578774ed4536d3242b14dacb4696386634607af824ea997202cd0edb4b/Flask-1.0.2-py2.py3-none-any.whl (91kB)
    100% |████████████████████████████████| 92kB 2.0MB/s
Collecting flask-cors (from -r requirements.txt (line 5))
  Downloading https://files.pythonhosted.org/packages/d1/db/f3495569d5c3e2bdb9fb8a66c54503364abb6f35a9da2227cf5c9c50dc42/Flask_Cors-3.0.6-py2.py3-none-any.whl
Collecting lxml (from -r requirements.txt (line 6))
  Downloading https://files.pythonhosted.org/packages/56/95/ec75d37b4244c00a49584a29bce2adc11b81a6295eadf410b3bfd8f7fa7d/lxml-4.2.4-cp37-cp37m-win_amd64.whl (3.6MB)
    100% |████████████████████████████████| 3.6MB 1.6MB/s
Collecting graphviz (from -r requirements.txt (line 7))
  Downloading https://files.pythonhosted.org/packages/47/87/313cd4ea4f75472826acb74c57f94fc83e04ba93e4ccf35656f6b7f502e2/graphviz-0.9-py2.py3-none-any.whl
Collecting pandas (from -r requirements.txt (line 8))
  Downloading https://files.pythonhosted.org/packages/58/a8/03e5fe0edbc522e46cb27df2abfb4266814129253d8462f38bc704a76a2a/pandas-0.23.4-cp37-cp37m-win_amd64.whl (7.9MB)
    100% |████████████████████████████████| 7.9MB 1.7MB/s
Collecting python-graphviz (from -r requirements.txt (line 9))
  Could not find a version that satisfies the requirement python-graphviz (from -r requirements.txt (line 9)) (from versions: )
No matching distribution found for python-graphviz (from -r requirements.txt (line 9))

My environment is:

PS C:\Process Mining\Python\pm4py> pip list
Package           Version
----------------- ---------
astroid           2.0.4
certifi           2018.8.24
colorama          0.3.9
isort             4.3.4
lazy-object-proxy 1.3.1
mccabe            0.6.1
pip               18.0
pylint            2.1.1
setuptools        40.2.0
six               1.11.0
wheel             0.31.1
wincertstore      0.2
wrapt             1.10.11

and

PS C:\Process Mining\Python\pm4py> conda list
# packages in environment at C:\Users\felixm\AppData\Local\Continuum\anaconda3\envs\pm4py:
#
# Name                    Version                   Build  Channel
astroid                   2.0.4                     <pip>
certifi                   2018.8.24                py37_1
colorama                  0.3.9                     <pip>
isort                     4.3.4                     <pip>
lazy-object-proxy         1.3.1                     <pip>
mccabe                    0.6.1                     <pip>
pip                       10.0.1                   py37_0
pip                       18.0                      <pip>
pylint                    2.1.1                     <pip>
python                    3.7.0                hea74fb7_0
setuptools                40.2.0                   py37_0
six                       1.11.0                    <pip>
vc                        14                   h0510ff6_3
vs2015_runtime            14.0.25123                    3
wheel                     0.31.1                   py37_0
wincertstore              0.2                      py37_0
wrapt                     1.10.11                   <pip>

Looking at the available packages:
https://pypi.org/search/?q=graphviz
it seems that you want to use this one?
https://pypi.org/project/graphviz-python/

Check Overall pythonic compliance

Most of us have a Java background.

Some things might not yet be pythonic enough; e.g., we use CamelCase instead of the_pythonic_way_of_naming_things in the /tests folder.
We need to fix this prior to the release, as this is of major importance for the adoption of the library!

Visualization in IPython

Hey,

this is my first time working with GitHub, so I am not really sure of the best way to communicate:

Using IPython (in Spyder) I was not able to visualize the diagrams while testing the following example:

from pm4py.algo.discovery.alpha import factory as alpha_miner
from pm4py.objects.log.importer.xes import factory as xes_importer
from pm4py.visualization.petrinet import factory as vis_factory

log = xes_importer.import_log('') # I of course added the correct path
net, initial_marking, final_marking = alpha_miner.apply(log)
gviz = vis_factory.apply(net, initial_marking, final_marking)
vis_factory.view(gviz)

After doing some research I solved the problem by editing gview.py in pm4py-source/pm4py/visualization/common/ as follows:

def view(gviz):
    """
    View the diagram

    Parameters
    -----------
    gviz
        GraphViz diagram
    """
    is_ipynb = False

    try:
        get_ipython()
        is_ipynb = True
    except NameError:
        # we are not inside Jupyter/IPython, do nothing
        pass

    if is_ipynb:
        # added / edited lines
        from IPython.display import Image, display
        image = Image(open(gviz.render(), "rb").read())
        return display(image)
    else:
        return gviz.view()

The added lines lead to the correct visualization in the IPython console.

Inductive Miner generates flower

I used the Inductive Miner on a log that contains a loop, but it generated a flower model (screenshot: pm4py).
ProM's result does not (directly follows graph -> Process Tree -> Petri net; screenshot: prom).
Here is the log:
log.zip

Refactoring SNA

SNA needs to be refactored a bit to follow the general factory method design.

We should also consider implementing the "classical" metrics as described in the paper by v.d. Aalst et al. (for example, arbitrary-distance SNA with fall factor beta).

Tweaks in Alignment Code

Some possible tweaks in alignment code:

  • remove the f value (it is derived from g + h anyway)
  • fix __lt__ of SearchTuple
  • check whether the exact h-value differs from the approximate h-value; if not, there is no need to push onto the queue again

How to install the library ?

Hi,

I'm trying to find an alternative to bupaR in Python and I found this one, which seems quite complete, but I can't figure out how to install the library.
I downloaded the zip, but there is no setup.py, and installing from requirements.txt only installs the dependencies.
Any help?

Thanks

Alignments on generic accepting Petri net

The only version of the alignments implemented until now is contained in state_equation_a_star and resolves the marking equation to retrieve the heuristic for the A* algorithm.

In future, we shall integrate a version of the alignments that possibly works on any accepting Petri net.

Provide `get_dataframe_from_log` as helper function

I would like to convert an EventLog back to a Pandas DataFrame. Currently, I am using the method get_dataframe_from_log from pandas_csv_exp.py for this purpose.

I have two questions:
(1) Is this efficient? I saw that the method walks through the whole log to build a sequence from an EventLog, which seems to already be a sequence.

(2) Would it be possible to provide this method in some more generic helper/util package? It seems wrong to rely on the CSV exporter only to convert between EventLog and Pandas DataFrame.
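A generic helper of the kind requested could look roughly like this (an illustrative sketch with hypothetical names, not pm4py's actual get_dataframe_from_log): flatten the trace-based log into one row per event, tag each row with its case identifier, and build the DataFrame in a single pass.

```python
import pandas as pd

def log_to_dataframe(log, case_id_key="case:concept:name"):
    """Flatten a list-of-traces log into a one-row-per-event DataFrame."""
    rows = []
    for case_index, trace in enumerate(log):
        for event in trace:
            row = dict(event)          # copy so the log is not mutated
            row[case_id_key] = case_index
            rows.append(row)
    return pd.DataFrame(rows)

toy_log = [
    [{"concept:name": "a"}, {"concept:name": "b"}],
    [{"concept:name": "c"}],
]
df = log_to_dataframe(toy_log)
```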

Inductive Miner ignores empty traces

The inductive miner ignores empty traces.
For example, the simple log [<>, <a,b>] results in ->(a,b), yet it should result in x(tau, ->(a,b)) (double-checked in ProM).

Limit `str` representation of TraceLog (and possibly EventLog)

Currently, using the Python str function on a TraceLog prints its entire contents. Even for small logs this can be a large amount of data. The string representation is often used for debugging, and some tools cannot handle large amounts of text, or it leads to performance issues.

Would you consider returning a shortened representation of the event log? I don't know about Python, but in R the data frames have a nice property of printing the first 10 elements and the last 10 elements along with some information about the object.
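The R-style behaviour could be sketched like this (a hypothetical helper, not proposed pm4py API): show the first and last few traces plus the total count, falling back to the full representation for small logs.

```python
def short_log_str(traces, head=10, tail=10):
    """Abbreviated string: head/tail elements plus the total trace count."""
    if len(traces) <= head + tail:
        return repr(traces)
    shown = (
        ", ".join(map(repr, traces[:head]))
        + ", ..., "
        + ", ".join(map(repr, traces[-tail:]))
    )
    return "[%s] (%d traces total)" % (shown, len(traces))

# Integers stand in for traces here.
summary = short_log_str(list(range(100)))
```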

move sample from log object

The sample function should be removed from the log object.
Since it does not represent an operation that manipulates the object itself, nor a property of the event log, it should be placed in a separate file/module.

Log: sort function refactor

The sort function of the log should appropriately mirror the sort function of an arbitrary list.
Currently it integrates specific event-log knowledge and always assumes that the key is applied at event level (which is not necessarily the case).

The lambda function currently passed to the underlying sort function should be provided by the user.
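The proposed design amounts to a thin wrapper over Python's sorted() with a caller-supplied key (names below are illustrative, not the actual pm4py API):

```python
def sort_log(log, key, reverse=False):
    """Sort traces using a key function provided by the caller."""
    return sorted(log, key=key, reverse=reverse)

# e.g. sort traces by their first event's timestamp field (ints for brevity)
toy_log = [
    [{"time:timestamp": 5}],
    [{"time:timestamp": 1}],
    [{"time:timestamp": 3}],
]
by_start = sort_log(toy_log, key=lambda trace: trace[0]["time:timestamp"])
```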

Keep case identifier as attribute when converting to a TraceLog

It is my understanding from reading the source code that the attribute from which a TraceLog is built out of an EventLog cannot be retrieved after conversion. At least I did not see a way to retrieve it.

If you would keep it in some kind of attribute as part of the log object that would make it easier to convert between the R eventlog and the pm4py TraceLog. I guess it would also facilitate some other operations which rely on a trace being identifiable.

Compatibility with Pandas 0.24

Hi everyone,

Last night, Pandas 0.24.0 was released.

At the moment, our code has some issues with that version, so the requirements will be restricted to Pandas 0.23.4.

Sorry for this.

redesign process tree

The process tree class should also describe classes for the operators (not only for the tree itself).
We should try to fix this ASAP, i.e. before building too much on top of it.

Filtering on attribute values

Hi,

first of all, great work on building this process mining package! It's a great tool to play around with
and helps me understand the process mining concepts even better.

So I have followed the pm4py tutorial and wanted to report a behavior that I find suspicious. I am using Python 3.6.5 on Linux and the project's master branch (1.0.23 / e0f408a).

It's about filtering on attribute values. Consider the following snippet:

xes_file_path = os.path.join('pm4py-source', 'tests', 'input_data', 'running-example.xes')
trace_log = xes_importer.import_log(xes_file_path)
trace_log = sorting.sort_timestamp(trace_log)
trace_log = attributes_filter.apply_auto_filter(trace_log, parameters = {constants.PARAMETER_CONSTANT_ATTRIBUTE_KEY: "org:resource", "decreasingFactor": 0.6})

The resource count before applying the filter is
[('Sara', 12), ('Mike', 11), ('Pete', 7), ('Ellen', 7), ('Sean', 3), ('Sue', 2)]

After applying the filter, a trace containing the resources (Mike, Mike, Sean, Sara, Ellen) is trimmed to (Mike, Mike, Sean, Sara). On the whole, the resource count becomes
[('Mike', 10), ('Sara', 9), ('Pete', 7), ('Ellen', 5), ('Sean', 3), ('Sue', 2)]

I would have expected that traces would have been completely trimmed of Sara, Mike (11/12 > 0.6), Pete (7/11 > 0.6) and Ellen (7/7 > 0.6). How exactly does the trimming of traces based on resource frequency happen?

Import CSV

Hi,

I get the following error when I import a CSV file.
KeyError: 'concept:name'

My code:

from pm4py.objects.log.importer.csv import factory as csv_importer

event_log = csv_importer.import_log('grouped/'+file_name)

from pm4py.objects.log import transform

trace_log = transform.transform_event_log_to_trace_log(event_log,case_glue="caseID")

net, initial_marking, final_marking = inductive_miner.apply(trace_log)

gviz = vis_factory.apply(net, initial_marking, final_marking)
vis_factory.view(gviz)

It seems I should rename the activity column to 'concept:name'. Is there a way to avoid doing so?
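A hypothetical workaround sketch (column names other than concept:name are the reporter's own): rename the CSV's columns to the keys the importer expects before handing the data over.

```python
import pandas as pd

# Toy frame mimicking the reporter's CSV layout.
df = pd.DataFrame({"caseID": ["1", "1"], "activity": ["a", "b"]})

# Map the custom column names onto the conventional XES-style keys.
df = df.rename(columns={"activity": "concept:name",
                        "caseID": "case:concept:name"})
```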

Filtering on variants

Regarding filtering on variants, how does the decreasing factor play a role in the apply_auto_filter method?

For example:

xes_file_path = os.path.join('pm4py-source', 'tests', 'input_data', 'receipt.xes')
trace_log = xes_importer.import_log(xes_file_path)
trace_log = sorting.sort_timestamp(trace_log)
trace_log = variants_filter.apply_auto_filter(trace_log)
variants_count = case_statistics.get_variant_statistics(trace_log)
variants_count = sorted(variants_count, key = lambda x : x['count'], reverse=True)

Before filtering, the variant counts were:
713, 123, 116, ..., 12, 10, 10, ..., 1.

After filtering, the most common are indeed kept, as it says in the documentation:
713, 123, 116, ..., 12.

The decreasing factor is explained in the context of filtering out the most frequent items in a multiset. How is it applied here, when removing the less frequent items and why is the cutoff at 12 in this example? Thanks.

Inductive Miner Cut Maximization

The inductive miner uses cuts of maximal size 2 rather than maximizing a found non-maximal cut.

We need to fix this (fast) as we (expect to) not have perfect replay fitness guarantees because of this.
