Performance of Inductive miner (pm4py-core) [8 comments, CLOSED]

pm4py avatar pm4py commented on August 28, 2024

from pm4py-core.

Comments (8)

Javert899 avatar Javert899 commented on August 28, 2024

Hi Luttik,

Your issue will be worked on within the next 2 weeks.

One confirmation, though: is the Sepsis log you are using the one with 1050 cases?

Luttik avatar Luttik commented on August 28, 2024

Thanks @Javert899, it is indeed the one with 1050 cases.

Javert899 avatar Javert899 commented on August 28, 2024

Thanks, and sorry for prompting you again.

I am on holiday and am running tests on this specific log with my Atom x5-z8700 powered netbook (one of the worst CPUs available, roughly 5 times slower than an i7-7550U). While there is always scope to gain performance, I could not replicate the terrible level of performance you described.

Indeed, running the following script:

from pm4py.objects.log.importer.xes import factory as xes_importer
from pm4py.algo.discovery.inductive import factory as inductive_miner
import time

aa = time.time()
log = xes_importer.apply("sepsis.xes", variant="nonstandard")
bb = time.time()
print("log imported ", (bb - aa))
for i in range(10):
    cc = time.time()
    net, initial_marking, final_marking = inductive_miner.apply(log)
    dd = time.time()
    print(i, "model calculated ", (dd - cc))

I get the following results:

log imported 2.201340436935425
0 model calculated 15.711833477020264
1 model calculated 11.852041006088257
2 model calculated 11.16152548789978
3 model calculated 12.034109592437744
4 model calculated 13.190895080566406
5 model calculated 11.990545749664307
6 model calculated 11.35164999961853
7 model calculated 12.128112316131592
8 model calculated 11.036775350570679
9 model calculated 10.43460464477539

That corresponds to roughly 2s average execution time on an i7-7550U.
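As a quick sanity check (a sketch, not part of the original thread), averaging the ten timings above and dividing by the ~5x CPU-speed ratio quoted earlier reproduces that ~2s figure:

```python
import statistics

# Per-iteration Inductive Miner timings (seconds) from the Atom x5-z8700 run above
timings = [15.711833, 11.852041, 11.161525, 12.034110, 13.190895,
           11.990546, 11.351650, 12.128112, 11.036775, 10.434605]

mean = statistics.mean(timings)
# ~5x is the rough Atom-to-i7 speed ratio quoted above, not a measured constant
estimated_i7 = mean / 5
print(f"Atom mean: {mean:.2f}s, estimated i7: {estimated_i7:.2f}s")
# → Atom mean: 12.09s, estimated i7: 2.42s
```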

Part of the performance disadvantage is explained by the fact that, in the current version, a reduction algorithm is applied at the end of the Inductive Miner run to check whether some of the hidden (skip/loop) transitions are actually useful. This behavior, which may be removed in future versions, can be disabled through the "enable_reduction" parameter, so the Inductive Miner can be applied with net, initial_marking, final_marking = inductive_miner.apply(log, parameters={"enable_reduction": False}), as in the following example script:

from pm4py.objects.log.importer.xes import factory as xes_importer
from pm4py.algo.discovery.inductive import factory as inductive_miner
import time

aa = time.time()
log = xes_importer.apply("sepsis.xes", variant="nonstandard")
bb = time.time()
print("log imported ", (bb - aa))
for i in range(10):
    cc = time.time()
    net, initial_marking, final_marking = inductive_miner.apply(log, parameters={"enable_reduction": False})
    dd = time.time()
    print(i, "model calculated ", (dd - cc))

This obtains the following performance on an Intel Atom x5-z8700:

log imported 2.0938966274261475
0 model calculated 0.25001978874206543
1 model calculated 0.26564526557922363
2 model calculated 0.234389066696167
3 model calculated 0.25001978874206543
4 model calculated 0.20313763618469238
5 model calculated 0.20313811302185059
6 model calculated 0.2031404972076416
7 model calculated 0.23941516876220703
8 model calculated 0.4732391834259033
9 model calculated 0.27600789070129395

So a good option could be to try that parameter; it currently defaults to True, but in future versions it may default to False (although the reduction algorithm is generally very fast).

Anyhow, your measurements showing performance 6 times slower than an Atom x5-z8700 suggest to me that something else is going wrong in your set-up. Could I ask you for some details about your configuration: the operating system, the Python version (Miniconda? Anaconda? 3.6?), the CPU, and the available RAM?

Luttik avatar Luttik commented on August 28, 2024

I've tested some more and found that the issue only exists inside the PyCharm debugger (which I tend to use by default, and which usually doesn't give a noticeable performance hit).

I've tested a bit more (using both Python 3.6.8 and 3.7.2).

Results from the command line using Python 3.6:

Without parameters={"enable_reduction": False}

log imported  0.2352139949798584
0 model calculated  1.8907153606414795
1 model calculated  1.8897316455841064
2 model calculated  1.9047293663024902
3 model calculated  1.890270471572876
4 model calculated  1.9163618087768555
5 model calculated  1.8657050132751465
6 model calculated  1.9346880912780762
7 model calculated  1.9047298431396484
8 model calculated  1.897731065750122
9 model calculated  1.9077517986297607

With parameters={"enable_reduction": False}

log imported  0.2252049446105957
0 model calculated  0.044039249420166016
1 model calculated  0.057051897048950195
2 model calculated  0.03903484344482422
3 model calculated  0.03903627395629883
4 model calculated  0.03903675079345703
5 model calculated  0.05304837226867676
6 model calculated  0.057051897048950195
7 model calculated  0.039035797119140625
8 model calculated  0.03803420066833496
9 model calculated  0.03903508186340332
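The debugger-only slowdown is consistent with how trace-based debuggers work: they install a Python trace function that fires on every call and line event. A generic illustration (not pm4py code) using sys.settrace to mimic the per-event hook a debugger installs:

```python
import sys
import time

def busy():
    # A pure-Python loop: every iteration generates line events for a tracer
    s = 0
    for i in range(100_000):
        s += i
    return s

t0 = time.perf_counter()
busy()
untraced = time.perf_counter() - t0

def tracer(frame, event, arg):
    # Returning the tracer keeps per-line tracing active inside each frame,
    # roughly what a debugger does to support breakpoints
    return tracer

sys.settrace(tracer)
t0 = time.perf_counter()
busy()
traced = time.perf_counter() - t0
sys.settrace(None)

print(f"untraced: {untraced:.4f}s, traced: {traced:.4f}s")
```

The traced run is typically an order of magnitude slower, which is why code paths that generate many events (or spawn many threads, each of which gets its own trace hook) suffer disproportionately under a debugger.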

Luttik avatar Luttik commented on August 28, 2024

Another noteworthy result is the timing when running under the debugger with parameters={"enable_reduction": False} (a ~400x performance increase):

log imported  0.45915865898132324
0 model calculated  0.15416908264160156
1 model calculated  0.1780402660369873
2 model calculated  0.1293489933013916
3 model calculated  0.19217395782470703
4 model calculated  0.17015457153320312
5 model calculated  0.14813494682312012
6 model calculated  0.16514968872070312
7 model calculated  0.1611461639404297
8 model calculated  0.13812541961669922
9 model calculated  0.14012670516967773

Javert899 avatar Javert899 commented on August 28, 2024

Thanks for your efforts.

The way threads are managed inside the reduction routine seems quite unfriendly to the PyCharm debugger, and it also carries a noticeable efficiency penalty.

Now things look clear, and a solution will be proposed in the next release (in 2 weeks' time).
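The cost pattern can be sketched generically (this is not the actual pm4py reduction code): spawning a short-lived thread per unit of work pays thread start-up, and under a debugger thread-attach, overhead each time, whereas running the work inline or on reused threads pays it once:

```python
import threading
import time

def tiny_task():
    # Stand-in for a small unit of work inside a reduction-style routine
    pass

# Variant 1: one short-lived thread per task. Each start()/join() pays
# thread-creation overhead, which debuggers amplify by hooking new threads.
t0 = time.perf_counter()
for _ in range(300):
    t = threading.Thread(target=tiny_task)
    t.start()
    t.join()
thread_per_task = time.perf_counter() - t0

# Variant 2: the same work executed inline, with no per-task thread cost.
t0 = time.perf_counter()
for _ in range(300):
    tiny_task()
inline = time.perf_counter() - t0

print(f"thread-per-task: {thread_per_task:.4f}s, inline: {inline:.6f}s")
```

Even without a debugger attached, the thread-per-task variant is orders of magnitude slower for tiny tasks, which matches the "efficiency penalty" mentioned above.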

Luttik avatar Luttik commented on August 28, 2024

Thank you!

Javert899 avatar Javert899 commented on August 28, 2024

Hi,

In the current release (1.0.3), which you can get through pip install -U pm4py, the performance of the reduction routine has been improved, especially when executed inside the PyCharm debugger, thanks to better thread management.

Performance without the PyCharm debugger (without reduction, "old" reduction, "new" reduction) on some logs, in seconds (rounded to 3 decimals):

Log | No reduction (s) | Old reduction (s) | New reduction (s)
--- | --- | --- | ---
01_running-example.xes.gz | 0.002 | 0.006 | 0.005
02_teleclaims.xes.gz | 0.082 | 0.133 | 0.134
03_repairExample.xes.gz | 0.046 | 0.237 | 0.076
04_reviewing.xes.gz | 0.021 | 0.400 | 0.121
05_bpic2017.xes.gz | 0.409 | 0.698 | 0.678
06_bpic2013_incidents.xes.gz | 0.209 | 2.172 | 1.182
07_bpic2013_problems.xes.gz | 0.014 | 0.168 | 0.094
08_receipt.xes.gz | 0.036 | 0.193 | 0.115
09_Billing.xes | 0.898 | 5.217 | 3.643
10_roadtraffic.xes | 1.106 | 3.347 | 3.360
11_SEPSIS.xes | 0.072 | 1.969 | 0.455

So, on the SEPSIS log, the reduction step outside the debugger is now more than 4 times faster.
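The per-log speedup of the new reduction over the old one can be read straight off the table (values rounded to the reported timings; SEPSIS is the log discussed in this thread):

```python
# Old vs new reduction times (seconds) for two logs from the table above
old = {"11_SEPSIS.xes": 1.969, "09_Billing.xes": 5.217}
new = {"11_SEPSIS.xes": 0.455, "09_Billing.xes": 3.643}

for log in old:
    speedup = old[log] / new[log]
    print(f"{log}: {speedup:.2f}x faster")
# → 11_SEPSIS.xes: 4.33x faster
# → 09_Billing.xes: 1.43x faster
```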

Inside the PyCharm debugger, the execution time on my workstation is now 2.6 seconds (so performance inside the debugger is MUCH better).

Hoping that this is useful for your purposes.
