pm4py / pm4py-core

Public repository for the PM4Py (Process Mining for Python) project.

Home Page: https://pm4py.fit.fraunhofer.de

License: GNU General Public License v3.0

Python 56.82% Dockerfile 0.05% Jupyter Notebook 43.06% HTML 0.08%
process-mining machine-learning data-mining data-science python

pm4py-core's People

Contributors

awth13, chiaods, denisesato, dvdwndrdl, eduardogoulart1, fit-alessandro-berti, fit-daniel-schuster, fit-humam-kourani, fit-pm4py, fit-sebastiaan-van-zelst, fmannhardt, har-leen-kaur, henrikkirchmann, javert899, jdaniloc, jonahschueller, lsabi, lschade, madhubs08, marcopegoraro, ngynkvn, nicoelbert, pastra98, phi1-h, renovate-bot, renovate-bot-fit, s-j-v-zelst, t0asty, victorshima, xiaohuwang0921


pm4py-core's Issues

Inconsistent parameters `version` vs `variant`

Upfront, thanks for the great work and sorry for nitpicking, this is only to improve/understand the API.

I noted that the factory apply methods sometimes take a variant parameter and sometimes a version parameter. Is this distinction on purpose, and if so, what is the difference?

variants_filter doesn't return all variants with decreasingFactor set to 0.0

The variants_filter doesn't return all the variants when the decreasingFactor is set to 0.0. Here is the code I wrote to check this:

all_traces = variants_filter.apply_auto_filter(log, parameters={"decreasingFactor": 1.0})
all_traces = case_statistics.get_variant_statistics(all_traces)
all_traces

It returns all traces except the one with the lowest frequency.

Operating System : Windows 10
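To make the report concrete, here is a minimal, hypothetical re-implementation of variant counting (not pm4py's actual code): a variant is the ordered sequence of activity names in a trace, and "all variants" means every such sequence appears in the statistics, including the least frequent one.

```python
from collections import Counter

def variant_counts(log):
    """Map each variant (comma-joined activity names) to its trace count."""
    return Counter(
        ",".join(event["concept:name"] for event in trace) for trace in log
    )

# Toy log: two traces of variant "a,b" and one of variant "c".
toy_log = [
    [{"concept:name": "a"}, {"concept:name": "b"}],
    [{"concept:name": "a"}, {"concept:name": "b"}],
    [{"concept:name": "c"}],
]
counts = variant_counts(toy_log)
```

Under this reading, the filter described above drops the single-trace variant "c" even though a decreasingFactor of 0.0 suggests nothing should be filtered.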

Log: convert xes to csv

The exported CSV file, converted from an XES file that contains both start and end timestamps, ends up with only one (complete) timestamp column (tested on BPI Challenge 2017).

Performance of Inductive miner

I tried to run the well known sepsis case with the Inductive miner and I found that (averages of 10 tests)

xes_importer.import_log took 1.2 seconds
inductive_miner.apply took 65 seconds

ProM does this in approx. 1 second.
You say that you've seen better performance than other open-source tools.

So I was wondering what was going wrong?

(sidenote none of my threads were showing significant work being done)

By the way: I was also implementing some process discovery techniques for my thesis, so I would consider collaborating on this project if the runtime of the algorithms becomes manageable (approx. 1 second per run for a smaller log like Sepsis).

Alignment: TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'

Code to reproduce:

Python 3.7.2 (C:\Users\felixm\AppData\Local\Continuum\anaconda3\envs\r-reticulate\python.exe)
Reticulate 1.10.0.9004 REPL -- A Python interpreter in R.
>>> from pm4py.objects.log.importer.xes import factory as xes_importer
>>> log = xes_importer.import_log("C:/Process Mining/R/pm4py/../heur.xes")
>>> log
[{'attributes': {'concept:name': 'Case1.0'}, 'events': [{'concept:name': 'a', 'lifecycle:transition': 'complete', 'org:resource': 'UNDEFINED', 'time:timestamp': datetime.datetime(2010, 11, 4, 19, 57, 15, tzinfo=datetime.timezone.utc), 'concept:instance': '135', '.order': '135'}, '..', {'concept:name': 'e', 'lifecycle:transition': 'complete', 'org:resource': 'UNDEFINED', 'time:timestamp': datetime.datetime(2010, 11, 4, 19, 58, 15, tzinfo=datetime.timezone.utc), 'concept:instance': '136', '.order': '136'}]}, '....', {'attributes': {'concept:name': 'Case6.0'}, 'events': [{'concept:name': 'a', 'lifecycle:transition': 'complete', 'org:resource': 'UNDEFINED', 'time:timestamp': datetime.datetime(2010, 11, 4, 19, 57, 15, tzinfo=datetime.timezone.utc), 'concept:instance': '137', '.order': '137'}, '..', {'concept:name': 'e', 'lifecycle:transition': 'complete', 'org:resource': 'UNDEFINED', 'time:timestamp': datetime.datetime(2010, 11, 4, 19, 59, 15, tzinfo=datetime.timezone.utc), 'concept:instance': '139', '.order': '139'}]}]
>>> import pm4py
>>> from pm4py.algo.conformance.alignments import factory as align_factory
>>> 
>>> import os
>>> from pm4py.objects.petri.importer import pnml as pnml_importer
>>> 
>>> net, initial_marking, final_marking = pnml_importer.import_net("C:/Process Mining/R/pm4py/../heur.pnml")
>>> 
>>> alignments = align_factory.apply_log(log, net, initial_marking, final_marking)
TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'

This occurs with the following Petri net and log of the Heuristics Miner example:
heur.zip

This is the full stack trace:

  TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'

Detailed traceback: 
  File "C:\Users\felixm\AppData\Local\CONTIN~1\ANACON~1\envs\R-RETI~1\lib\site-packages\pm4py\algo\conformance\alignments\factory.py", line 35, in apply
    final_marking, parameters, version)
  File "C:\Users\felixm\AppData\Local\CONTIN~1\ANACON~1\envs\R-RETI~1\lib\site-packages\pm4py\algo\conformance\alignments\factory.py", line 147, in apply_log
    log))
  File "C:\Users\felixm\AppData\Local\CONTIN~1\ANACON~1\envs\R-RETI~1\lib\site-packages\pm4py\algo\conformance\alignments\factory.py", line 146, in <lambda>
    version=version),
  File "C:\Users\felixm\AppData\Local\CONTIN~1\ANACON~1\envs\R-RETI~1\lib\site-packages\pm4py\algo\conformance\alignments\factory.py", line 79, in apply_trace
    return VERSIONS[version](trace, petri_net, initial_marking, final_marking, parameters)
  File "C:\Users\felixm\AppData\Local\CONTIN~1\ANACON~1\envs\R-RETI~1\lib\site-packages\pm4py\algo\conformance\alignments\versions\state_equation_a_star.py", line 120, in apply
    alignments.utils.SKIP, ret_tuple_as_trans_desc=ret_tuple_as_trans_desc)
  File "C:\Users\felixm\AppData\Local\CONTIN~1\ANACON~1\envs\R-RETI~1\lib\site-packages\pm4py\algo\conformance\alignments\versions\state_equation_a_star.py", line 141, in apply_sync_prod
    ret_tuple_as_trans_desc=ret_tuple_as_trans_desc)
  File "C:\Users\felixm\AppData\Local\CONTIN~1\ANACON~1\envs\R-RETI~1\lib\site-packages\pm4py\algo\conformance\alignments\versions\state_equation_a_star.py", line 159, in __search
    tp = SearchTuple(curr.g + h, curr.g, h, curr.m, curr.p, curr.t, x, __trust_solution(x))

Could this be an issue with the LP solver?

Non-semantic transitions ids of the tracenet when doing alignments

This is related to #40: the identifiers used for the trace net, returned when using PARAM_ALIGNMENT_RESULT_IS_SYNC_PROD_AWARE (which works well, btw), are simply enumerated t1 ... tn.

I wonder whether this is a good choice, since the transition ids do not refer to anything the caller of the method recognises and do not match between traces. So t1 for the first trace may be a different activity than t1 for the second trace. Is it a requirement of the alignment that the transitions are sequentially numbered, or could this be replaced by actual event activity labels (maybe with suffixes to disambiguate duplicates)?

I am not sure what is best here, but simply raising this for discussion.
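The suffixing idea suggested above could be sketched as follows (a hypothetical helper, not pm4py code): use the activity labels as transition ids and append a counter only when a label repeats, so ids stay unique within one trace net but remain recognisable.

```python
def disambiguate(labels):
    """Turn a list of activity labels into unique, label-based transition ids."""
    seen = {}
    result = []
    for label in labels:
        count = seen.get(label, 0)
        seen[label] = count + 1
        # first occurrence keeps the plain label; repeats get a numeric suffix
        result.append(label if count == 0 else "%s_%d" % (label, count))
    return result
```

For a trace with activities a, b, a, a this yields ids a, b, a_1, a_2 instead of opaque t1 ... t4.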

Filtering on timeframe

Hi,

timestamp_filter.filter_traces_intersecting does not seem to produce the expected output, i.e. return the traces that have at least one event inside the specified time interval.

Consider the example:

xes_file_path = os.path.join('pm4py-source', 'tests', 'input_data', 'running-example.xes')
trace_log = xes_importer.import_log(xes_file_path)
trace_log = sorting.sort_timestamp(trace_log)
trace_log = timestamp_filter.filter_traces_intersecting(trace_log, "2011-01-05 00:00:00", "2011-01-10 23:59:59")

All cases in the example have events inside the time interval above and outside it on one side only (all events happen before 2011-01-11 or after 2011-01-05), except case id 3, where events happen on both sides of the interval as well as within it. Why is this trace not considered intersecting?

Tested on Python 3.6.5, Ubuntu 16.04, pm4py version 1.0.23 / commit e0f408a
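For reference, the semantics the reporter expects can be sketched as a simple predicate (an illustrative stand-in, not pm4py's implementation): a trace intersects the interval if at least one of its events falls inside it.

```python
from datetime import datetime, timezone

def trace_intersects(trace, start, end):
    """True if at least one event of the trace lies within [start, end]."""
    return any(start <= event["time:timestamp"] <= end for event in trace)

interval_start = datetime(2011, 1, 5, tzinfo=timezone.utc)
interval_end = datetime(2011, 1, 10, 23, 59, 59, tzinfo=timezone.utc)

inside = [{"time:timestamp": datetime(2011, 1, 7, tzinfo=timezone.utc)}]
outside = [{"time:timestamp": datetime(2011, 2, 1, tzinfo=timezone.utc)}]
```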

Issue in importing boolean from XES

An issue has been signaled by Williams Rizzi of FBK about importing booleans from XES.

It seems that Python's built-in bool function does not do the job correctly.
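The pitfall here is that bool() on any non-empty string is True, so bool("false") is True. A hypothetical safe parser (illustrative, not pm4py's actual fix) compares the string content instead:

```python
def parse_xes_bool(value):
    """Parse an XES boolean attribute value ("true"/"false") correctly."""
    return value.strip().lower() == "true"

wrong = bool("false")            # True -- the bug: non-empty string is truthy
right = parse_xes_bool("false")  # False -- content-based parsing
```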

bug in inductive miner

The inductive miner has a bug:

When given the log [<a>, <b,c>], it ignores <a> and produces ->(b,c).
The expected result is: x(a, ->(b,c))

Create PANDAS Log Object

There is some code that uses pandas as an underlying storage for event data.
This is mainly for performance reasons.

Somehow, there should be an encapsulation around this object, such that, in principle we can use it with any existing algorithm.

Alignment is missing names of invisible transitions

The alignments returned by PM4Py use the label of each transition instead of its identifier/name in the resulting list:
https://github.com/pm4py/pm4py-source/blob/afa8d9d1eb1a87ee57a2ba3a75d7d292963590ff/pm4py/algo/conformance/alignments/versions/state_equation_a_star.py#L185

This leads to invisible transitions appearing as None in the result, for example:

{'alignment': [('>>', None),
  ('Registration', 'Registration'),
  ('Triage and Assessment', 'Triage and Assessment'),
  ('X-Ray', 'X-Ray'),
  ('Discuss Results', 'Discuss Results'),
  ('Check-out', 'Check-out'),
  ('>>', None),
  ('Registration', 'Registration'),
  ('Triage and Assessment', 'Triage and Assessment'),
  ('X-Ray', 'X-Ray'),
  ('Discuss Results', 'Discuss Results'),
  ('Check-out', 'Check-out'),
  ('>>', None)],
 'cost': 3, 'visited_states': 15, 'queued_states': 43, 'traversed_arcs': 50, 'fitness': 1.0}

This makes it hard to use the alignment for visualization and/or analysis purposes. Also, when transitions have duplicate labels, some computation would be required to infer which one was executed.

Could you simply use the name instead of the label, since the label can then be looked up?
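The lookup the reporter has in mind could work like this (an illustrative sketch with a hypothetical stand-in type, not pm4py's actual Transition class): names stay unique and usable even when the label is None or duplicated.

```python
from collections import namedtuple

# Hypothetical stand-in for a Petri net transition with a name (id) and a label.
Transition = namedtuple("Transition", ["name", "label"])

transitions = [
    Transition("t1", "Registration"),
    Transition("skip_1", None),  # invisible transition: label is None
]

# If alignments reported names, the caller could recover labels on demand.
name_to_label = {t.name: t.label for t in transitions}
```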

Proposed Names Change: TraceLog -> EventLog, EventLog -> EventStream

After some discussion with some people "in the community".

I was requested to rename TraceLogs to EventLogs (as this is what they really are) and to rename EventLogs in PM4Py to something else.
In that sense, the EventLogs in PM4Py are equal to (finite) Event Streams in numerous academic articles.

I therefore propose to do the renaming/refactoring ASAP:
  • EventLog -> EventStream
  • TraceLog -> EventLog

Bug in Inductive Miner

I found another bug in the inductive miner.
On the simple log [<b,c,e,j>, <b,d,j>, <f,h,g,i,k>] (attached as a .txt; remove the .txt extension to get the .xes:
pm4py_imdf_issue.xes.txt
), the IM fails to recognize the xor construct between c, e and d.

I have attached a log that contains the data, the output of the IM in pm4py, and the output of the same miner (dfg based) from ProM.

Please fix (ASAP!), because this is a rather severe problem and should not be in the release.

pm4py_imdf_correct
pm4py_imdf_issue

B.Sc. Thesis Topics

  • Alignment Visualization
  • Alignments on DFG's (S. Leemans Approach)
  • Dotted Chart
  • SNA Metrics in PM4Py (according to original paper!)

Refactor Visualization Library of Transition Systems

The visualization library of transition systems needs to be refactored.
It is now tightly coupled to the discovery of transition systems, yet it should be independent.

States and edges should show the name as well as any possible decoration that is attached to them (if this is possible).

The source of the transition system, e.g., "view based" should not be of relevance.

Index column in CSV exporting

Currently, event logs are exported to CSV format with Pandas.

The export includes an index column that is not very useful, and causes problems with some CSV parsers (one of them is included in ProM).

A solution is to remove the index column from the exported CSV file.
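With Pandas this is a one-argument fix: DataFrame.to_csv accepts index=False, which drops the index column from the output. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({
    "case:concept:name": ["1", "1"],
    "concept:name": ["register request", "check ticket"],
})

with_index = df.to_csv()                 # first column is the unnamed index
without_index = df.to_csv(index=False)   # index column removed
```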

Provide a conda package

Thanks for the awesome work. I am working on an R port using the reticulate package, which provides a bridge between R and Python. At least on Windows, this relies on a conda environment into which I can install pm4py using pip, but I fear this might lead to version conflicts if people have already installed conda packages that are required by other packages.

Would it be possible to provide a conda package for pm4py?

Add option to deepcopy in log conversion

Currently the log conversion only creates a new "enclosing" log object (e.g. EventLog) around the existing source log object.
In a way, one could see this as taking an alternative view on the log that acts as a source of the conversion.

It might be useful to add an option to the conversion (maybe through the parameter object?) that allows us to apply python's deep copy functionality.
This avoids side effects of changing the elements of either the source- or target log of the conversion.
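The side effect in question is easy to demonstrate with plain Python (the dicts below stand in for events): a shallow "enclosing" copy shares the underlying event objects, while copy.deepcopy produces a fully independent log.

```python
import copy

source_log = [[{"concept:name": "a"}]]       # one trace with one event

shallow_view = list(source_log)              # new list, same trace/event objects
deep_copy = copy.deepcopy(source_log)        # fully independent copy

# Mutating the source leaks into the shallow view but not the deep copy.
source_log[0][0]["concept:name"] = "changed"
```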

Add timeout parameter to long running operations

A feature request I just came across while running some experiments in which the precision calculation took forever. Would it be possible to add a timeout parameter to potentially long-running operations?

Not sure how best to do this in Python, but usually the cleanest way is to check the elapsed time from time to time in a loop within the long-running operation.
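The pattern described above can be sketched as follows (a hypothetical example, not pm4py code): record a start time and check the budget once per loop iteration, raising when it is exceeded.

```python
import time

def long_running_op(items, timeout_seconds=None):
    """Process items, aborting with TimeoutError once the budget is exceeded."""
    start = time.monotonic()
    results = []
    for item in items:
        # periodic check, as suggested: cheap, and honored between work units
        if timeout_seconds is not None and time.monotonic() - start > timeout_seconds:
            raise TimeoutError("operation exceeded the configured timeout")
        results.append(item * 2)  # stand-in for one unit of real work
    return results
```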

Failed to install 'python-graphviz' with conda and Python 3.7

Trying to install pm4py according to the installation guide, I encounter the following error:

Collecting python-graphviz (from -r requirements.txt (line 9))
  Could not find a version that satisfies the requirement python-graphviz (from -r requirements.txt (line 9)) (from versions: )
No matching distribution found for python-graphviz (from -r requirements.txt (line 9))

Full trace is here:

PS C:\Process Mining\Python\pm4py> pip install -r requirements.txt
Collecting ciso8601 (from -r requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/61/d8/e17159b6037a26e3facdfeec8750dd17312284a915c5e0da79562b982035/ciso8601-2.0.1.tar.gz
Collecting cvxopt (from -r requirements.txt (line 2))
  Downloading https://files.pythonhosted.org/packages/df/06/1d498703387acd6593592a57cd8fee740e5e8e53a0feed453afc024b9b92/cvxopt-1.2.1-cp37-cp37m-win_amd64.whl (803kB)
    100% |████████████████████████████████| 808kB 1.6MB/s
Collecting dataclasses (from -r requirements.txt (line 3))
  Downloading https://files.pythonhosted.org/packages/26/2f/1095cdc2868052dd1e64520f7c0d5c8c550ad297e944e641dbf1ffbb9a5d/dataclasses-0.6-py3-none-any.whl
Collecting flask (from -r requirements.txt (line 4))
  Downloading https://files.pythonhosted.org/packages/7f/e7/08578774ed4536d3242b14dacb4696386634607af824ea997202cd0edb4b/Flask-1.0.2-py2.py3-none-any.whl (91kB)
    100% |████████████████████████████████| 92kB 2.0MB/s
Collecting flask-cors (from -r requirements.txt (line 5))
  Downloading https://files.pythonhosted.org/packages/d1/db/f3495569d5c3e2bdb9fb8a66c54503364abb6f35a9da2227cf5c9c50dc42/Flask_Cors-3.0.6-py2.py3-none-any.whl
Collecting lxml (from -r requirements.txt (line 6))
  Downloading https://files.pythonhosted.org/packages/56/95/ec75d37b4244c00a49584a29bce2adc11b81a6295eadf410b3bfd8f7fa7d/lxml-4.2.4-cp37-cp37m-win_amd64.whl (3.6MB)
    100% |████████████████████████████████| 3.6MB 1.6MB/s
Collecting graphviz (from -r requirements.txt (line 7))
  Downloading https://files.pythonhosted.org/packages/47/87/313cd4ea4f75472826acb74c57f94fc83e04ba93e4ccf35656f6b7f502e2/graphviz-0.9-py2.py3-none-any.whl
Collecting pandas (from -r requirements.txt (line 8))
  Downloading https://files.pythonhosted.org/packages/58/a8/03e5fe0edbc522e46cb27df2abfb4266814129253d8462f38bc704a76a2a/pandas-0.23.4-cp37-cp37m-win_amd64.whl (7.9MB)
    100% |████████████████████████████████| 7.9MB 1.7MB/s
Collecting python-graphviz (from -r requirements.txt (line 9))
  Could not find a version that satisfies the requirement python-graphviz (from -r requirements.txt (line 9)) (from versions: )
No matching distribution found for python-graphviz (from -r requirements.txt (line 9))

My environment is:

PS C:\Process Mining\Python\pm4py> pip list
Package           Version
----------------- ---------
astroid           2.0.4
certifi           2018.8.24
colorama          0.3.9
isort             4.3.4
lazy-object-proxy 1.3.1
mccabe            0.6.1
pip               18.0
pylint            2.1.1
setuptools        40.2.0
six               1.11.0
wheel             0.31.1
wincertstore      0.2
wrapt             1.10.11

and

PS C:\Process Mining\Python\pm4py> conda list
# packages in environment at C:\Users\felixm\AppData\Local\Continuum\anaconda3\envs\pm4py:
#
# Name                    Version                   Build  Channel
astroid                   2.0.4                     <pip>
certifi                   2018.8.24                py37_1
colorama                  0.3.9                     <pip>
isort                     4.3.4                     <pip>
lazy-object-proxy         1.3.1                     <pip>
mccabe                    0.6.1                     <pip>
pip                       10.0.1                   py37_0
pip                       18.0                      <pip>
pylint                    2.1.1                     <pip>
python                    3.7.0                hea74fb7_0
setuptools                40.2.0                   py37_0
six                       1.11.0                    <pip>
vc                        14                   h0510ff6_3
vs2015_runtime            14.0.25123                    3
wheel                     0.31.1                   py37_0
wincertstore              0.2                      py37_0
wrapt                     1.10.11                   <pip>

Looking at the available packages:
https://pypi.org/search/?q=graphviz
it seems that you want to use this one?
https://pypi.org/project/graphviz-python/

Check Overall pythonic compliance

Most of us have a Java background.

Some things might not yet be pythonic enough; e.g., we use CamelCase instead of the_pythonic_way_of_naming_things in the /tests folder.
We need to fix this prior to the release, as this is of major importance for the adoption of the library!

Visualization in IPython

Hey,

this is my first time working with GitHub, so I am not really sure of the best way to communicate:

Using IPython (in Spyder) I was not able to visualize the diagrams while testing the following example:

from pm4py.algo.discovery.alpha import factory as alpha_miner
from pm4py.objects.log.importer.xes import factory as xes_importer
from pm4py.visualization.petrinet import factory as vis_factory

log = xes_importer.import_log('') # I of course added the correct path
net, initial_marking, final_marking = alpha_miner.apply(log)
gviz = vis_factory.apply(net, initial_marking, final_marking)
vis_factory.view(gviz)

After doing some research I solved the problem by editing gview.py in pm4py-source/pm4py/visualization/common/ as follows:

def view(gviz):
    """
    View the diagram

    Parameters
    -----------
    gviz
        GraphViz diagram
    """
    is_ipynb = False

    try:
        get_ipython()
        is_ipynb = True
    except NameError:
        # we are not inside Jupyter/IPython, do nothing
        pass

    if is_ipynb:
        # added / edited lines
        from IPython.display import Image, display
        image = Image(open(gviz.render(), "rb").read())
        return display(image)
    else:
        return gviz.view()

The added lines lead to the correct visualization in the IPython console.

Inductive Miner generates flower

I used the Inductive Miner on a log that contains a loop, but it generated a flower model (screenshot: pm4py).
ProM's result does not (directly follows graph -> Process Tree -> Petri net; screenshot: prom).
Here is the log:
log.zip

Refactoring SNA

SNA needs to be refactored a bit to follow the general factory method design.

We should also consider implementing the "classical" metrics as described in the paper by v.d. Aalst et al. (for example, arbitrary-distance SNA with fall factor beta).

Tweaks in Alignment Code

Some possible tweaks in alignment code:

  • remove the f value (it is derived from g + h anyway)
  • fix __lt__ of SearchTuple
  • check whether the exact h-value differs from the approximate h-value; if not, there is no need to push onto the queue again

How to install the library ?

Hi,

I'm trying to find an alternative to bupaR in Python and I found this one, which seems quite complete, but I can't figure out how to install the library.
I downloaded the zip, but there is no setup.py, and installing from requirements.txt only installs the dependencies.
Any help?

Thanks

Alignments on generic accepting Petri net

The only version of the alignments implemented until now is contained in state_equation_a_star and resolves the marking equation to retrieve the heuristic for the A* algorithm.

In future, we shall integrate a version of the alignments that possibly works on any accepting Petri net.

Provide `get_dataframe_from_log` as helper function

I would like to convert an EventLog back to a Pandas DataFrame. Currently, I am using the method get_dataframe_from_log from pandas_csv_exp.py for this purpose.

I have two questions:
(1) Is this efficient? I saw that the method walks through the whole log to build a sequence from an EventLog, which seems to already be a sequence.

(2) Would it be possible to provide this method in some more generic helper/util package? It seems wrong to rely on the CSV exporter only to convert between EventLog and Pandas DataFrame.
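A generic helper of the kind requested could look roughly like this (an illustrative sketch with hypothetical names, not pm4py's actual get_dataframe_from_log): flatten the trace-based log into one row per event, tag each row with its case identifier, and build the DataFrame in a single pass.

```python
import pandas as pd

def log_to_dataframe(log, case_id_key="case:concept:name"):
    """Flatten a list-of-traces log into a one-row-per-event DataFrame."""
    rows = []
    for case_index, trace in enumerate(log):
        for event in trace:
            row = dict(event)          # copy so the log is not mutated
            row[case_id_key] = case_index
            rows.append(row)
    return pd.DataFrame(rows)

toy_log = [
    [{"concept:name": "a"}, {"concept:name": "b"}],
    [{"concept:name": "c"}],
]
df = log_to_dataframe(toy_log)
```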

Inductive Miner ignores empty traces

The inductive miner ignores empty traces.
For example, the simple log [<>, <a,b>] results in ->(a,b), yet it should result in x(tau, ->(a,b)) (double-checked in ProM).

Limit `str` representation of TraceLog (and possibly EventLog)

Currently, using the Python str function on a TraceLog prints its entire contents. Even for small logs this can be a large amount of data. The string representation is often used for debugging, and some tools cannot handle large amounts of text, or it leads to performance issues.

Would you consider returning a shortened representation of the event log? I don't know about Python, but in R the data frames have a nice property of printing the first 10 elements and the last 10 elements along with some information about the object.
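The R-style behaviour could be sketched like this (a hypothetical helper, not proposed pm4py API): show the first and last few traces plus the total count, falling back to the full representation for small logs.

```python
def short_log_str(traces, head=10, tail=10):
    """Abbreviated string: head/tail elements plus the total trace count."""
    if len(traces) <= head + tail:
        return repr(traces)
    shown = (
        ", ".join(map(repr, traces[:head]))
        + ", ..., "
        + ", ".join(map(repr, traces[-tail:]))
    )
    return "[%s] (%d traces total)" % (shown, len(traces))

# Integers stand in for traces here.
summary = short_log_str(list(range(100)))
```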

move sample from log object

The sample function should be removed from the log object.
Since it does not represent an operation that manipulates the object itself, nor a property of the event log, it should be placed in a separate file/module.

Log: sort function refactor

The sort function of the log should appropriately mirror the sort function of an arbitrary list.
Currently it integrates specific event-log knowledge and always assumes that the key is applied at event level (which is not necessarily the case).

The lambda function currently passed to the underlying sort function should be provided by the user.
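The proposed design amounts to a thin wrapper over Python's sorted() with a caller-supplied key (names below are illustrative, not the actual pm4py API):

```python
def sort_log(log, key, reverse=False):
    """Sort traces using a key function provided by the caller."""
    return sorted(log, key=key, reverse=reverse)

# e.g. sort traces by their first event's timestamp field (ints for brevity)
toy_log = [
    [{"time:timestamp": 5}],
    [{"time:timestamp": 1}],
    [{"time:timestamp": 3}],
]
by_start = sort_log(toy_log, key=lambda trace: trace[0]["time:timestamp"])
```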

Keep case identifier as attribute when converting to a TraceLog

It is my understanding from reading the source code that the attribute from which a TraceLog is built out of an EventLog cannot be retrieved after conversion. At least I did not see a way to retrieve it.

If you would keep it in some kind of attribute as part of the log object that would make it easier to convert between the R eventlog and the pm4py TraceLog. I guess it would also facilitate some other operations which rely on a trace being identifiable.

Compatibility with Pandas 0.24

Hi everyone,

Last night, Pandas 0.24.0 was released.

At the moment, our code has some issues with that version, so the requirements will be restricted to Pandas 0.23.4.

Sorry for this.

redesign process tree

The process tree class should also describe classes for the operators (not only for the tree itself).
We should try to fix this ASAP, i.e. before building too much on top of it.

Filtering on attribute values

Hi,

first of all, great work on building this process mining package! It's a great tool to play around with
and helps me understand the process mining concepts even better.

So I have followed the pm4py tutorial and wanted to report a behavior that I find suspicious. I am using Python 3.6.5 on Linux and the project's master branch (1.0.23 / e0f408a).

It's about filtering on attribute values. Consider the following snippet:

xes_file_path = os.path.join('pm4py-source', 'tests', 'input_data', 'running-example.xes')
trace_log = xes_importer.import_log(xes_file_path)
trace_log = sorting.sort_timestamp(trace_log)
trace_log = attributes_filter.apply_auto_filter(trace_log, parameters = {constants.PARAMETER_CONSTANT_ATTRIBUTE_KEY: "org:resource", "decreasingFactor": 0.6})

The resource count before applying the filter is
[('Sara', 12), ('Mike', 11), ('Pete', 7), ('Ellen', 7), ('Sean', 3), ('Sue', 2)]

After applying the filter, a trace containing the resources (Mike, Mike, Sean, Sara, Ellen) is trimmed to (Mike, Mike, Sean, Sara). On the whole, the resource count becomes
[('Mike', 10), ('Sara', 9), ('Pete', 7), ('Ellen', 5), ('Sean', 3), ('Sue', 2)]

I would have expected that traces would have been completely trimmed of Sara, Mike (11/12 > 0.6), Pete (7/11 > 0.6) and Ellen (7/7 > 0.6). How exactly does the trimming of traces based on resource frequency happen?

Import CSV

Hi,

I get the following error when I import a CSV file.
KeyError: 'concept:name'

My code:

from pm4py.objects.log.importer.csv import factory as csv_importer

event_log = csv_importer.import_log('grouped/'+file_name)

from pm4py.objects.log import transform

trace_log = transform.transform_event_log_to_trace_log(event_log,case_glue="caseID")

net, initial_marking, final_marking = inductive_miner.apply(trace_log)

gviz = vis_factory.apply(net, initial_marking, final_marking)
vis_factory.view(gviz)

It seems I should rename the activity column to 'concept:name'. Is there a way to avoid doing so?
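A hypothetical workaround sketch (column names other than concept:name are the reporter's own): rename the CSV's columns to the keys the importer expects before handing the data over.

```python
import pandas as pd

# Toy frame mimicking the reporter's CSV layout.
df = pd.DataFrame({"caseID": ["1", "1"], "activity": ["a", "b"]})

# Map the custom column names onto the conventional XES-style keys.
df = df.rename(columns={"activity": "concept:name",
                        "caseID": "case:concept:name"})
```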

Filtering on variants

Regarding filtering on variants, how does the decreasing factor play a role in the apply_auto_filter method?

For example:

xes_file_path = os.path.join('pm4py-source', 'tests', 'input_data', 'receipt.xes')
trace_log = xes_importer.import_log(xes_file_path)
trace_log = sorting.sort_timestamp(trace_log)
trace_log = variants_filter.apply_auto_filter(trace_log)
variants_count = case_statistics.get_variant_statistics(trace_log)
variants_count = sorted(variants_count, key = lambda x : x['count'], reverse=True)

Before filtering, the variant counts were:
713, 123, 116, ..., 12, 10, 10, ..., 1.

After filtering, the most common are indeed kept, as it says in the documentation:
713, 123, 116, ..., 12.

The decreasing factor is explained in the context of filtering out the most frequent items in a multiset. How is it applied here, when removing the less frequent items and why is the cutoff at 12 in this example? Thanks.

Inductive Miner Cut Maximization

The inductive miner uses cuts of maximal size 2 rather than maximizing a found non-maximal cut.

We need to fix this (fast) as we (expect to) not have perfect replay fitness guarantees because of this.
