gertjanssenswillen / processmapr
!! Repository moved to https://github.com/bupaverse/processmapR !! This repo is read-only from now on.
License: Other
When trying dotted_chart on a grouped event log, the grouping columns are missing in the plot data. For example:
sepsis %>% group_by(resource) %>% dotted_chart()
Error: At least one layer must contain all faceting variables: `resource`.
* Plot is missing `resource`
* Layer 1 is missing `resource`
I am using process_map within an R Markdown document. When there are too many cases, the code waits until the user confirms that they want to plot despite a possibly unreadable graph. Is it possible to pass the confirmation programmatically, e.g., using a parameter such as lot_of_traces_behaviour = c("ask", "N", "Y")?
Thanks in advance for the attention
myevnlog %>% process_map()
You are about to draw a process map with a lot of traces.
This might take a long time. Try to filter your event log. Are you sure you want to proceed?
Y/N: Y
Warning messages:
1: In bind_rows_(x, .id) :
binding factor and character vector, coercing into character vector
2: In bind_rows_(x, .id) :
binding character and factor vector, coercing into character vector
dotted_chart(x = "absolute", y = "start") or plotly_dotted_chart() work well if the start timestamps of the cases are all different. However, sometimes many cases start at the same time, if there is some batch behaviour. For these cases, the dotted_chart() function seems to arrange the cases according to their position in the log or even in reverse order. This can produce strange graphs, such as this one (see the area marked with the red ellipse):
If several cases start at the same time, the dotted chart with the "start"-argument should arrange them according to the timestamp of the second activities in the case, then the third activities, etc. The phenomenon is particularly relevant, if the timestamps are not very granular and only consist of dates.
Or do I somehow have to prearrange the cases in the log, before using the Dotted Chart?
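The tie-breaking rule proposed above can be sketched in base R. This is only an illustration: order_cases_by_timestamps, case_col and time_col are hypothetical names, not part of processmapR, and whether dotted_chart() would honour a pre-arranged case order is not confirmed here.

```r
# Hypothetical helper (not part of processmapR): order cases by their first
# timestamp, breaking ties with the second, third, ... timestamp of the case.
order_cases_by_timestamps <- function(log, case_col = "case", time_col = "timestamp") {
  # Sorted timestamps per case, as plain numbers
  per_case <- lapply(split(as.numeric(log[[time_col]]), log[[case_col]]), sort)
  max_len <- max(lengths(per_case))
  # Pad shorter cases with Inf so that, on a tie, they sort after longer cases
  padded <- do.call(rbind, lapply(per_case, function(ts) {
    c(ts, rep(Inf, max_len - length(ts)))
  }))
  # Lexicographic order over the columns (1st event, 2nd event, ...)
  ord <- do.call(order, unname(as.list(as.data.frame(padded))))
  names(per_case)[ord]
}

# Toy log with date-only timestamps: cases A and B start on the same day
log <- data.frame(
  case = c("A", "A", "B", "B", "C", "C"),
  timestamp = as.Date(c("2020-01-01", "2020-01-05",
                        "2020-01-01", "2020-01-03",
                        "2019-12-31", "2020-01-10"))
)
case_order <- order_cases_by_timestamps(log)
print(case_order)
```

In the toy log, cases A and B tie on their first date, so the second event decides their relative position.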
Hi Gert,
Thank you for all these fundamental improvements of the bupaR package.
I have a problem with the process_map() function: when I try to produce a process map, RStudio's Viewer reports this error: "Error: syntax in row 37 near" (". I found that this occurs when the size of my data increases (for example, by moving filter_activity_frequency from 0.1 to 0.2).
Do you have some advice?
Thank you
I wanted to officially document what I saw here in case anyone else sees a similar issue. As soon as I execute devtools::install_github("gertjanssenswillen/processmapr", dependencies=TRUE), I end up with the map essentially upside down (start is still on top and end on bottom). Below is the code I attempted to execute:
data.map2 <- data %>%
process_map(type_nodes = frequency(value = "absolute_case"), type_edges = performance(FUN = median, units = "mins"))
Along with the following error:
Error in data.frame(id = 1:n, from = from, to = to, rel = rel, stringsAsFactors = FALSE) :
arguments imply differing number of rows: 2, 0
In addition: Warning messages:
1: In bind_rows_(x, .id) :
binding factor and character vector, coercing into character vector
2: In bind_rows_(x, .id) :
binding character and factor vector, coercing into character vector
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf
Running the command patients %>% process_map() results in the following error:
Error in set_global_graph_attrs(., attr = "rankdir", value = "LR", attr_type = "graph") : could not find function "set_global_graph_attrs"
I'm using the latest packages of bupaR (0.3.2) and DiagrammeR (1.0.0).
According to DiagrammeR issue #277 it appears that this function has been replaced by add_global_graph_attrs.
EDIT: this issue also occurs with function resource_map()
Thanks for your continuing development of bupaR!
Here's a suggestion for the plotly_dotted_chart() function.
Log %>% plotly_dotted_chart() seems to only work without arguments.
If I enter plotly_dotted_chart(x = "relative", sort = "duration") I get an error, saying there's an unused argument.
Suggestion 1: Make plotly_dotted_chart available for relative dotted charts as well. This would be very useful, since the user can then get the Case-ID from long-running cases directly from the graph and investigate specific cases further in the raw data.
Suggestion 2: Create a function plotly_trace_explorer. This would be useful too, since the trace_explorer graph is sometimes hard to read if there are many traces, due to the shortening of activity names. The tooltips from plotly would alleviate this.
I'm using the CRAN-versions (v 0.4.1 from bupaR and v 0.3.2 from processmapR).
Hi Gert,
as you may already know, I am using bupaR for a project. I noticed some inconsistencies between the process_map() function and the precedence_matrix(type = "absolute") function: the arcs in process_map do not seem to match the information shown by precedence_matrix.
Is it possible?
Thanks
Hi, Gert. I'm using your useful package and I have an issue with processmapR.
The precedence_matrix function returns the following error:
Error in mutate_impl(.data, dots) :
Not compatible with requested type: [type=character; target=integer]
Thanks.
There is a prompt when one asks for generation of a process map with more than 750 traces. The problem is that this prompt stops knitr from doing its work. Please remove it from the function.
The following code causes the problem:
if(n_traces(eventlog) > 750) {
message("You are about to draw a process map with a lot of traces.
This might take a long time. Try to filter your event log. Are you sure you want to proceed?")
answer <- readline("Y/N: ")
if(answer != "Y")
break()
}
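One way such a check could avoid blocking knitr is to prompt only in interactive sessions and to let the caller override the answer. The sketch below is not the package's code: confirm_large_map and its behaviour argument are hypothetical names, and it assumes the document is knitted in a non-interactive R session (which is typical, but not guaranteed when rendering from the console).

```r
# Hypothetical guard, not the actual processmapR implementation.
# behaviour = "ask": prompt only in interactive sessions
# behaviour = "Y":   always proceed
# behaviour = "N":   never proceed when the threshold is exceeded
confirm_large_map <- function(n_traces, threshold = 750,
                              behaviour = c("ask", "Y", "N")) {
  behaviour <- match.arg(behaviour)
  if (n_traces <= threshold) return(TRUE)
  if (behaviour == "Y") return(TRUE)
  if (behaviour == "N") return(FALSE)
  if (!interactive()) {
    # Non-interactive sessions (e.g. Rscript) cannot answer a prompt:
    # warn and proceed instead of waiting for input
    warning("Drawing a process map with many traces; this might take a long time.")
    return(TRUE)
  }
  message("You are about to draw a process map with a lot of traces. ",
          "This might take a long time. Try to filter your event log.")
  answer <- readline("Y/N: ")
  identical(toupper(trimws(answer)), "Y")
}

# Usage: a caller would return early instead of calling break()
proceed_small <- confirm_large_map(10)
proceed_forced <- confirm_large_map(1000, behaviour = "Y")
proceed_refused <- confirm_large_map(1000, behaviour = "N")
```

Returning a logical and letting the caller decide also sidesteps the break() call in the snippet above, which is only valid inside a loop.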
The process_map output doesn't appear to lay out the graph correctly, e.g.
event_log %>%
process_map(
rankdir = "RL"
)
produces a graph where the nodes seem to be randomly all over the place, and not in order from left to right for the most frequent path.
I understand that this is probably caused by DiagrammeR, and maybe I could modify the DiagrammeR graph object myself to obtain a more intuitive rankdir. Have you seen this before?
Hello, I have a couple of questions regarding how best to handle activities that occur at the same time as other activities or overlap with other activities.
I'm trying to convert a series of customer and staff activities that can occur as part of a case, but often they can be created and/or completed at the same time.
All in all it's quite a complex process with a variety of activities and users (over 100 different activity types), but when filtering by activity frequency and then trying to visualise the performance process maps, I get a large number of negative durations on the edges between activities, which I don't know how to handle.
My questions are:
I've replicated a similar issue in one of my repos using the loan application event log data set, which is available here: jessevent/loan-app-process. Funnily enough, it also causes processanimateR tokens to traverse backwards and float off of edges to different activities, which is actually how I first identified I was experiencing something odd.
This is essentially the format/code I'm using to transform my activity instances into the event log.
example_log_4 %>%
mutate(activity_instance = 1:nrow(.)) %>%
gather(status, timestamp, schedule, start, complete) %>%
filter(!is.na(timestamp)) %>%
eventlog(
case_id = "patient",
activity_id = "activity",
activity_instance_id = "activity_instance",
lifecycle_id = "status",
timestamp = "timestamp",
resource_id = "resource"
)
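To illustrate how overlapping activity instances can lead to negative edge durations, here is a small base R sketch. It assumes that the time on an edge is measured from the end of one activity instance to the start of the next one in the same case; that is an assumption for illustration, not a confirmed description of how process_map computes performance. The data and column names are made up.

```r
# Made-up activity instances for one case; B starts before A has completed
instances <- data.frame(
  case = c(1, 1, 1),
  activity = c("A", "B", "C"),
  start_time = as.POSIXct(c("2020-01-01 09:00:00", "2020-01-01 09:30:00",
                            "2020-01-01 11:00:00"), tz = "UTC"),
  end_time = as.POSIXct(c("2020-01-01 10:00:00", "2020-01-01 10:30:00",
                          "2020-01-01 12:00:00"), tz = "UTC")
)

# Order instances within each case by their start time
instances <- instances[order(instances$case, instances$start_time), ]
n <- nrow(instances)

# Assumed edge duration: start of the next instance minus end of the previous one
same_case <- instances$case[-1] == instances$case[-n]
gap_mins <- as.numeric(difftime(instances$start_time[-1],
                                instances$end_time[-n], units = "mins"))

edges <- data.frame(
  from = instances$activity[-n],
  to = instances$activity[-1],
  gap_mins = gap_mins
)[same_case, ]
print(edges)
```

Under this assumption the A to B edge gets a negative value because B begins while A is still running, whereas the B to C edge stays positive.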
Thanks so much for any assistance. The whole bupaR framework is an outstanding piece of work and an amazing achievement. Personally, I've spent a long time looking for a framework like this and am quite excited about the progress and the future to come!
When supplying an event log with a (possibly) unrelated column named 'time', process_map fails with an error:
> process_map(log)
Error in summarise_impl(.data, dots) :
Column `time` must have a unique name
> traceback()
13: stop(list(message = "Column `time` must have a unique name",
call = summarise_impl(.data, dots), cppstack = list(file = "",
line = -1L, stack = "C++ stack not available on this system")))
12: summarise_impl(.data, dots)
11: summarise.tbl_df(., start_time = min(time), end_time = max(time),
min_order = min(.order))
10: summarize(., start_time = min(time), end_time = max(time), min_order = min(.order))
9: function_list[[k]](value)
8: withVisible(function_list[[k]](value))
7: freduce(value, `_function_list`)
6: `_fseq`(`_lhs`)
5: eval(quote(`_fseq`(`_lhs`)), env, env)
4: eval(quote(`_fseq`(`_lhs`)), env, env)
3: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
2: grouped_log %>% summarize(start_time = min(time), end_time = max(time),
min_order = min(.order)) at process_map.R#91
1: process_map(log)
Code to reproduce:
x <- data.frame(case = c(1), time = c("foobar"), timestamp = c(as.POSIXct(Sys.time())), activity = c("test"), activity_instance_id = c(1), resource = c("bar"), lifecyle = "complete")
log <- eventlog(x, case_id = "case", timestamp = "timestamp", activity_id = "activity", activity_instance_id = "activity_instance_id", resource_id = "resource", lifecycle_id = "lifecyle")
process_map(log)
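Until this is fixed, a possible workaround is to rename the clashing column before building the event log. The sketch below uses base R only; rename_reserved is a hypothetical helper, and the assumption that only `time` needs renaming is based solely on the error message above.

```r
# Hypothetical helper: rename columns that clash with names used internally.
# The default `reserved` value is an assumption taken from the error message.
rename_reserved <- function(df, reserved = c("time"), suffix = "_original") {
  clash <- names(df) %in% reserved
  names(df)[clash] <- paste0(names(df)[clash], suffix)
  df
}

x <- data.frame(case = 1, time = "foobar", timestamp = Sys.time(),
                activity = "test", activity_instance_id = 1,
                resource = "bar", lifecyle = "complete")
x_safe <- rename_reserved(x)
print(names(x_safe))

# x_safe can then be passed to eventlog() exactly as in the reproduction above.
```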
Session info:
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=Norwegian Bokmål_Norway.1252 LC_CTYPE=Norwegian Bokmål_Norway.1252 LC_MONETARY=Norwegian Bokmål_Norway.1252 LC_NUMERIC=C
[5] LC_TIME=Norwegian Bokmål_Norway.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2.2 petrinetR_0.2.0 processmonitR_0.1.0 xesreadR_0.2.2 processmapR_0.3.2.9000 eventdataR_0.2.0 edeaR_0.8.1
[8] bupaR_0.4.1 forcats_0.3.0 stringr_1.3.1 dplyr_0.7.6 purrr_0.2.5 readr_1.1.1 tidyr_0.8.1
[15] tibble_1.4.2 ggplot2_3.0.0 tidyverse_1.2.1
loaded via a namespace (and not attached):
[1] Rcpp_0.12.18 lubridate_1.7.4 lattice_0.20-35 visNetwork_2.0.4 utf8_1.1.4 assertthat_0.2.0 digest_0.6.16 mime_0.5
[9] R6_2.2.2 cellranger_1.1.0 plyr_1.8.4 backports_1.1.2 httr_1.3.1 pillar_1.3.0 rlang_0.2.2 lazyeval_0.2.1
[17] readxl_1.1.0 shinyTime_0.2.1 rstudioapi_0.7 data.table_1.11.4 miniUI_0.1.1.1 DiagrammeR_1.0.0 downloader_0.4 htmlwidgets_1.2
[25] igraph_1.2.2 munsell_0.5.0 shiny_1.1.0 broom_0.5.0 compiler_3.5.1 influenceR_0.1.0 rgexf_0.15.3 httpuv_1.4.5
[33] modelr_0.1.2 pkgconfig_2.0.2 htmltools_0.3.6 tidyselect_0.2.4 gridExtra_2.3 XML_3.98-1.16 fansi_0.3.0 viridisLite_0.3.0
[41] crayon_1.3.4 withr_2.1.2 later_0.7.3 grid_3.5.1 nlme_3.1-137 jsonlite_1.5 xtable_1.8-2 gtable_0.2.0
[49] magrittr_1.5 scales_1.0.0 cli_1.0.0 stringi_1.2.4 viridis_0.5.1 promises_1.0.1 ggthemes_4.0.1 xml2_1.2.0
[57] brew_1.0-6 RColorBrewer_1.1-2 tools_3.5.1 glue_1.3.0 hms_0.4.2 Rook_1.1-1 yaml_2.2.0 colorspace_1.3-2
[65] rvest_0.3.2 plotly_4.8.0 bindr_0.1.1 haven_1.1.2
Is it possible to also integrate a quantile metric into performance()?
As shown in the example below, I'm hoping to get a quantile-based performance evaluation from process_map or other graphs.
For example the classic 1st and 3rd quartiles or, better, any preferred quantile.
For example, the number 0.32 for the 32nd percentile.
Even better if it were available for both nodes and edges.
patients %>%
process_map(performance(quantile, 0.32, "days"))
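In the meantime, a fixed quantile can be wrapped in a single-argument function. The sketch below is base R only; quantile_fun is a hypothetical helper, and whether performance() accepts an arbitrary summary function in place of mean or median is an assumption that is not verified here.

```r
# Hypothetical function factory: returns a summary function for one quantile
quantile_fun <- function(prob) {
  stopifnot(is.numeric(prob), length(prob) == 1, prob >= 0, prob <= 1)
  function(x, na.rm = TRUE, ...) {
    # names = FALSE gives a plain number instead of a named vector
    stats::quantile(x, probs = prob, na.rm = na.rm, names = FALSE)
  }
}

q32 <- quantile_fun(0.32)

# Stand-alone check on a plain numeric vector
durations <- c(1, 2, 3, 4, 5)
q32_value <- q32(durations)
print(q32_value)

# Possible use, assuming performance() passes FUN a numeric vector of durations:
# patients %>% process_map(performance(FUN = q32, units = "days"))
```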
Wishing you the best
@gertjanssenswillen
Hi,
I tried the recently added fixed_node_pos parameter, and it worked fine with the patient example in another issue. But when I tried it with my data, which has self-edges and multiple edges going to multiple nodes, everything gets stacked and the edge values get hidden.
Is there any way to control the edges? Maybe adding more curvature to the edges would fix this.
Edit: this is how it is by default
But even when trying another kind of layout, it still draws straight edges:
The precedence_matrix() is sometimes a better alternative than a process_map() in case of a "spaghetti process" with many variants. It would therefore be useful to have a performance view for the precedence diagrams too.
This would have the same syntax as the process_map function, e.g.
precedence_matrix(performance(median, "days"))
and its other options.
Hi, after installing the new version of processmapR from Github I got the following error message (see below), I have just used this code to test:
library(bupaR)
patients %>%
process_map(type = frequency("relative"))
Error in FUN(X[[i]], ...): object '.order' not found
Traceback:
1. patients %>% process_map(type = frequency("relative"))
2. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
3. eval(quote(`_fseq`(`_lhs`)), env, env)
4. eval(quote(`_fseq`(`_lhs`)), env, env)
5. `_fseq`(`_lhs`)
6. freduce(value, `_function_list`)
7. withVisible(function_list[[k]](value))
8. function_list[[k]](value)
9. process_map(., type = frequency("relative"))
10. eventlog %>% as.data.frame() %>% droplevels %>% select(act = !(!activity_id_(eventlog)),
. aid = !(!activity_instance_id_(eventlog)), case = !(!case_id_(eventlog)),
. time = !(!timestamp_(eventlog)), .order) %>% group_by(act,
. aid, case) %>% summarize(start_time = min(time), end_time = max(time),
. min_order = min(.order))
11. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
12. eval(quote(`_fseq`(`_lhs`)), env, env)
13. eval(quote(`_fseq`(`_lhs`)), env, env)
14. `_fseq`(`_lhs`)
15. freduce(value, `_function_list`)
16. function_list[[i]](value)
17. select(., act = !(!activity_id_(eventlog)), aid = !(!activity_instance_id_(eventlog)),
. case = !(!case_id_(eventlog)), time = !(!timestamp_(eventlog)),
. .order)
18. select.data.frame(., act = !(!activity_id_(eventlog)), aid = !(!activity_instance_id_(eventlog)),
. case = !(!case_id_(eventlog)), time = !(!timestamp_(eventlog)),
. .order)
19. select_vars(names(.data), !(!(!quos(...))))
20. map_if(ind_list, !is_helper, eval_tidy, data = names_list)
21. map(.x[matches], .f, ...)
22. lapply(.x, .f, ...)
23. FUN(X[[i]], ...)
Hi,
This is an awesome package! Thank you! I have a very large process map and cannot see the individual nodes. Is there a way to increase their size? I have tried to use visNetwork and end up with a cool ball of spaghetti. Any suggestions would be greatly appreciated! Thank you!
Sincerely,
tom
It would be nice to have a secondary metric in the process_map function to show both frequencies and waiting times on the edges. The secondary metric could have a slightly smaller font size (see Disco).
process_map(performance(sum, "days")) is an alternative to show the severity of waiting times, but a secondary metric would provide even more context.
Hi,
I am wondering whether it is possible to add two more interesting features to the process_map and trace_explorer functions.
1. process_map
It could be useful to extract the raw data behind the plot in a form usable for further manipulation, for example process_map(raw.data = T), returning a data.frame with the whole data structure behind the plot (edge and node values).
2. trace_explorer
Here, a raw.data parameter has already been provided. But instead of returning just the first occurrence of every trace, I'm looking for a full raw data.frame.
I mean that now, if two patients A and B, following the user guide, produce the same trace, for example 1-2-3, only the trace of patient A is returned. Is it possible to extend raw.data to all the patients?
Wishing you the best,
Antonio
The edeaR resource metric levels are hyphen separated, for example "resource-activity".
precedence_matrix() types are underscore separated, for example, "relative_antecedent".
This should be made consistent.
When the type is absolute, the scale label should read Absolute Frequency but it actually reads Relative Frequency.
See https://github.com/gertjanssenswillen/processmapR/blob/master/R/precedence_matrix.plot.R#L29
Hi Gert,
I'm using the process_map() function. I don't understand why the activity label is sometimes printed in white and sometimes in black when the node in the process map is colored white/pink.
The problem is that in some cases the activity label is impossible to read.
Thanks.
Hi!
process_map gives a warning concerning a deprecation in rlang:
m2 %>%
+ processmapR::process_map(type = processmapR::frequency("relative_case"))
Warning message:
Prefixing `UQ()` with the rlang namespace is deprecated as of rlang 0.3.0.
Please use the non-prefixed form or `!!` instead.
# Bad:
rlang::expr(mean(rlang::UQ(var) * 100))
# Ok:
rlang::expr(mean(UQ(var) * 100))
# Good:
rlang::expr(mean(!!var * 100))
This warning is displayed once per session.
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rTRNG_4.20-1 anytime_0.3.4 data.table_1.12.2 tictoc_1.0 datapasta_3.0.0
[6] tidylog_0.1.0 forcats_0.4.0 stringr_1.4.0 dplyr_0.8.3 purrr_0.3.2
[11] readr_1.3.1 tidyr_0.8.3 tibble_2.1.3 ggplot2_3.2.0 tidyverse_1.2.1
loaded via a namespace (and not attached):
[1] nlme_3.1-140 bitops_1.0-6 matrixStats_0.54.0 lubridate_1.7.4
[5] RColorBrewer_1.1-2 httr_1.4.0 ggsci_2.9 profvis_0.3.6
[9] tools_3.6.1 backports_1.1.4 R6_2.4.0 lazyeval_0.2.2
[13] colorspace_1.4-1 withr_2.1.2 tidyselect_0.2.5 gridExtra_2.3
[17] compiler_3.6.1 cli_1.1.0 rvest_0.3.4 xml2_1.2.0
[21] influenceR_0.1.0 plotly_4.9.0 scales_1.0.0 checkmate_1.9.4
[25] RApiDatetime_0.0.4 digest_0.6.20 pkgconfig_2.0.2 htmltools_0.3.6
[29] htmlwidgets_1.3 rlang_0.4.0 ggthemes_4.2.0 readxl_1.3.1
[33] rstudioapi_0.10 pryr_0.1.4 shiny_1.3.2 visNetwork_2.0.7
[37] generics_0.0.2 zoo_1.8-6 jsonlite_1.6 rgexf_0.15.3
[41] RCurl_1.95-4.12 magrittr_1.5 rapportools_1.0 Rcpp_1.0.1
[45] munsell_0.5.0 viridis_0.5.1 eventdataR_0.2.0 yaml_2.2.0
[49] stringi_1.4.3 plyr_1.8.4 shinyTime_1.0.0 grid_3.6.1
[53] parallel_3.6.1 promises_1.0.1 edeaR_0.8.2 crayon_1.3.4
[57] bupaR_0.4.2 miniUI_0.1.1.1 lattice_0.20-38 haven_2.1.1
[61] pander_0.6.3 summarytools_0.9.3 hms_0.5.0 magick_2.0
[65] zeallot_0.1.0 pillar_1.4.2 tcltk_3.6.1 igraph_1.2.4.1
[69] processmapR_0.3.3 codetools_0.2-16 XML_3.98-1.20 glue_1.3.1
[73] downloader_0.4 RcppParallel_4.4.3 modelr_0.1.4 vctrs_0.2.0
[77] httpuv_1.5.1 cellranger_1.1.0 gtable_0.3.0 assertthat_0.2.1
[81] mime_0.7 skimr_1.0.7 xtable_1.8-4 broom_0.5.2
[85] later_0.8.0 viridisLite_0.3.0 Rook_1.1-1 DiagrammeR_1.0.1
[89] brew_1.0-6
Hi,
I was wondering which mining algorithm you use to plot your process_map.
And is there any way to change it?
Thanks
Hello everyone,
I have a problem when using visNetwork as the output format.
All the structure and colors get lost when switching to this kind of export format.
Even the numbers on the edges get lost.
Do I have to rebuild the design when switching to visNetwork, or is this a bug?
Thanks for your help!
Greetings,
Niklas
Hey Guys,
after updating DiagrammeR to version 1.0.0 I have a problem creating a process map with the processmapR package, which is trying to use one of the functions it contained:
Log2 %>%
process_map()
Error in set_global_graph_attrs(., attr = "rankdir", value = "LR", attr_type = "graph") :
could not find function "set_global_graph_attrs"
I get this error for event_logs with nrow > 97.
Error: syntax error in line 29 near '"'
What additional information do you need from me?