gertjanssenswillen / processmapr

!! repository moved to https://github.com/bupaverse/processmapR !! This repo is read-only from now on.

License: Other

R 100.00%

processmapr's Issues

Custom attribute fails with NA's

dotted_chart.grouped_eventlog does not work

When trying dotted_chart on a grouped event log, the grouping columns are missing in the plot data. For example:

sepsis %>% group_by(resource) %>% dotted_chart()
Error: At least one layer must contain all faceting variables: `resource`.
* Plot is missing `resource`
* Layer 1 is missing `resource`

pass render=TRUE

I am using process_map within an R Markdown document. When there are too many cases, the code waits until the user confirms that they want to plot, even though the graph may be unreadable. Is it possible to pass the confirmation programmatically, e.g., using a parameter lot_of_traces_behaviour = c("ask", "N", "Y")?

Thanks in advance for the attention

myevnlog %>% process_map()
You are about to draw a process map with a lot of traces.
        This might take a long time. Try to filter your event log. Are you sure you want to proceed?
Y/N: Y
Warning messages:
1: In bind_rows_(x, .id) :
  binding factor and character vector, coercing into character vector
2: In bind_rows_(x, .id) :
  binding character and factor vector, coercing into character vector

Sort Dotted Chart on all activities

dotted_chart(x = "absolute", y = "start") or plotly_dotted_chart() work well, if the start timestamps of the cases are all different. However, sometimes many cases start at the same time, if there's some batch behaviour. For these cases, the dotted_chart() function seems to arrange the cases according to their position in the log or even in reverse order. This can produce strange graphs, such as this one (see the area marked with the red ellipse):

If several cases start at the same time, the dotted chart with the "start"-argument should arrange them according to the timestamp of the second activities in the case, then the third activities, etc. The phenomenon is particularly relevant, if the timestamps are not very granular and only consist of dates.

Or do I somehow have to prearrange the cases in the log, before using the Dotted Chart?
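The tie-breaking described above can be sketched in base R: order cases by their first timestamp, then by the second, then the third, and so on. This is a minimal illustration on made-up data; the helper name case_order_by_all_events is hypothetical and not part of processmapR, and whether dotted_chart() honours a pre-arranged order is not confirmed here.

```r
# Toy event data (made up): all three cases start on the same date.
events <- data.frame(
  case = c("A", "A", "A", "B", "B", "C", "C", "C"),
  timestamp = as.Date(c("2020-01-01", "2020-01-05", "2020-01-06",
                        "2020-01-01", "2020-01-03",
                        "2020-01-01", "2020-01-03", "2020-01-04")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns case ids ordered by 1st, 2nd, 3rd, ... timestamp.
case_order_by_all_events <- function(events) {
  # Sorted numeric timestamps per case
  per_case <- lapply(split(as.numeric(events$timestamp), events$case), sort)
  max_len <- max(lengths(per_case))
  # Pad shorter cases with NA so every case has the same number of columns
  padded <- do.call(rbind, lapply(per_case, function(ts) {
    c(ts, rep(NA_real_, max_len - length(ts)))
  }))
  # One sort key per event position; cases that run out of events sort first
  keys <- lapply(seq_len(max_len), function(j) padded[, j])
  rownames(padded)[do.call(order, c(keys, list(na.last = FALSE)))]
}

case_order <- case_order_by_all_events(events)
print(case_order)
```

Here cases B and C tie on their first two timestamps, so the third position decides; B has no third event and is placed first because of na.last = FALSE.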

process_map() issue

Hi Gert,

Thank you for all these fundamental improvements of the bupaR package.

I have a problem with the process_map() function: when I try to produce a process map, RStudio's Viewer reports this error "Error: syntax in row 37 near" ("." I found that this occurs when the size of my data increases (for example, by moving filter_activity_frequency from 0.1 to 0.2).

Do you have some advice?

Thank you

Process Map ends up backwards

Wanted to officially document what I saw here in case anyone else sees a similar issue. The minute I execute devtools::install_github("gertjanssenswillen/processmapr", dependencies=TRUE), I end up with the map essentially upside down (start is still on top and end on bottom). Below is the code I attempted to execute:

data.map2 <- data %>%
  process_map(type_nodes = frequency(value = "absolute_case"), type_edges = performance(FUN = median, units = "mins"))

Along with the following error:
Error in data.frame(id = 1:n, from = from, to = to, rel = rel, stringsAsFactors = FALSE) :
  arguments imply differing number of rows: 2, 0
In addition: Warning messages:
1: In bind_rows_(x, .id) :
  binding factor and character vector, coercing into character vector
2: In bind_rows_(x, .id) :
  binding character and factor vector, coercing into character vector
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf

could not find function "set_global_graph_attrs"

Running the command patients %>% process_map() results in the following error:

Error in set_global_graph_attrs(., attr = "rankdir", value = "LR", attr_type = "graph") : could not find function "set_global_graph_attrs"

I'm using the latest packages of bupaR (0.3.2) and DiagrammeR (1.0.0).

According to DiagrammeR issue #277 it appears that this function has been replaced by add_global_graph_attrs.

EDIT: this issue also occurs with function resource_map()

plotly_dotted_chart for relative view

Thanks for your continuing development of bupaR!

Here's a suggestion for the plotly_dotted_chart() function.

Log %>% plotly_dotted_chart() seems to only work without arguments.

If I enter plotly_dotted_chart(x = "relative", sort = "duration") I get an error, saying there's an unused argument.

Suggestion 1: Make plotly_dotted_chart available for relative dotted charts as well. This would be very useful, since the user can then get the Case-ID from long-running cases directly from the graph and investigate specific cases further in the raw data.

Suggestion 2: Create a function plotly_trace_explorer. This would be useful too, since the trace_explorer graph is sometimes hard to read if there are many traces, due to the shortening of activity names. The tooltips from plotly would alleviate this.

I'm using the CRAN-versions (v 0.4.1 from bupaR and v 0.3.2 from processmapR).

process_map()

Hi Gert,

As you may already have gathered, I am using bupaR for a project. I noticed some inconsistencies between the process_map() function and the precedence_matrix(type = "absolute") function: the arcs in process_map don't seem to match the information shown by precedence_matrix.

Is it possible?

Thanks
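One way to check such a discrepancy is to count directly-follows pairs by hand on a small log and compare them with both outputs. The sketch below uses base R and made-up data; it assumes events are already ordered within each case, and it ignores any artificial start or end nodes that the package functions may add.

```r
# Toy log (made up): two cases with activities in execution order.
log <- data.frame(
  case = c(1, 1, 1, 2, 2),
  activity = c("a", "b", "c", "a", "c"),
  position = c(1, 2, 3, 1, 2),
  stringsAsFactors = FALSE
)
log <- log[order(log$case, log$position), ]

# For each case, pair every activity with the one that directly follows it.
pairs <- do.call(rbind, lapply(split(log$activity, log$case), function(acts) {
  data.frame(antecedent = head(acts, -1),
             consequent = tail(acts, -1),
             stringsAsFactors = FALSE)
}))

# Absolute directly-follows counts
counts <- table(antecedent = pairs$antecedent, consequent = pairs$consequent)
print(counts)
```

If the hand-computed counts agree with one output but not the other, that narrows down where the mismatch comes from.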

precedence_matrix

Hi, Gert. I'm using your useful package and I have an issue with processmapR.

The precedence_matrix function returns the following error:
Error in mutate_impl(.data, dots) :
Not compatible with requested type: [type=character; target=integer]

Thanks.

trace_explorer: Nr of traces attribute as alternative to coverage

Please remove it from process_map

There is a prompt when one asks for generation of a process map with more than 750 traces. The problem is that this prompt stops knitr from doing its work. Please remove it from the function.

The following code causes the problem:

if(n_traces(eventlog) > 750) {
message("You are about to draw a process map with a lot of traces.
This might take a long time. Try to filter your event log. Are you sure you want to proceed?")
answer <- readline("Y/N: ")

if(answer != "Y")
	break()

}
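A possible alternative to removing the prompt entirely is to ask only in interactive sessions, so that knitr and other non-interactive runs are never blocked. The following is a minimal sketch, not the package's actual code; the function name should_draw_map and its ask argument are hypothetical. It returns FALSE instead of calling break(), since break is only valid inside a loop.

```r
# Hypothetical helper, not part of processmapR.
# Returns TRUE if drawing should proceed, FALSE otherwise.
should_draw_map <- function(n_traces, threshold = 750, ask = interactive()) {
  if (n_traces <= threshold) {
    return(TRUE)
  }
  if (!ask) {
    # Non-interactive (e.g. knitr): inform, but never block on user input
    message("Drawing a process map with ", n_traces,
            " traces. This might take a long time.")
    return(TRUE)
  }
  message("You are about to draw a process map with a lot of traces. ",
          "Are you sure you want to proceed?")
  answer <- readline("Y/N: ")
  identical(toupper(trimws(answer)), "Y")
}

# Simulating a non-interactive run with many traces:
proceed <- should_draw_map(1000, ask = FALSE)
print(proceed)
```

Making the behaviour an argument keeps the confirmation available at the console while letting scripts and reports opt out explicitly.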

process_map output does not have nodes in rankdir order

Process_map output doesn't appear to lay out the graph correctly, e.g.

event_log %>%
process_map(
rankdir = "RL"
)

produces a graph where the nodes seem to be randomly all over the place, and not in order from left-to-right for the most frequent path.

I understand that this is probably caused by DiagrammeR, and maybe I could modify the DiagrammeR graph object myself to obtain a more intuitive rankdir. Have you seen this before?

Negative durations for overlapping/parallel activity periods

Hello, I have a couple of questions regarding how best to handle activities that occur at the same time as other activities or overlap with other activities.

I'm trying to convert a series of customer and staff activities that can occur as part of a case, but often they can be created and/or completed at the same time.

Case gets created when customer fills out details in an online form
Based on details customer provides in the form, 3 activities get created for the customer to provide bank statements, proof of address and recent payslips with the same creation timestamp.
Customer can complete these separately or all at once, and on customer activity completion it triggers the creation of staff activities to verify the documents.
As part of this verification the staff can then create a new activity requesting further documents, or book an appointment with the customer
The staff can complete all the activities and grant the application which results in all the staff activities having same completion timestamp

All in all it's quite a complex process with a variety of activities and users (over 100 different activity types), but when filtering on the frequency of the activities and then trying to visualise the performance process maps, I get a large number of negative durations on the edges between activities, which I don't know how to handle.

My questions are:

Am I just stupidly doing something wrong when I generate the event log?
How do you recommend I approach a scenario like this?
Is there anything I could do to ensure I always have a positive duration between edges?

I've replicated a similar issue in one of my repos using the loan application event log data set, which is available here jessevent/loan-app-process. Funnily enough, it also causes processanimateR tokens to traverse backwards and float off of edges to different activities, which is actually how I first identified I was experiencing something odd.

This is essentially the format/code I'm using to transform my activity instances into the event log.

example_log_4 %>%
    mutate(activity_instance = 1:nrow(.)) %>%
    gather(status, timestamp, schedule, start, complete)  %>%
    filter(!is.na(timestamp)) %>%
    eventlog(
        case_id = "patient",
        activity_id = "activity",
        activity_instance_id = "activity_instance",
        lifecycle_id = "status",
        timestamp = "timestamp",
        resource_id = "resource"
    )

Thanks so much for any assistance, the whole bupaR framework is an outstanding piece of work and an amazing achievement. Personally I've spent a long time looking for a framework like this and am quite excited about the progress and the future to come!
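To see where negative durations could come from, it can help to compute the gaps by hand on a few activity instances. The sketch below uses base R and made-up data, and assumes (this is an assumption, not confirmed behaviour of process_map) that an edge duration is measured from the completion of one activity instance to the start of the next one in the case. Under that assumption, overlapping activities produce a negative gap.

```r
# Toy activity instances (made up): two customer activities created at the
# same time that overlap, followed by a staff activity.
instances <- data.frame(
  case = c("c1", "c1", "c1"),
  activity = c("Provide bank statements", "Provide proof of address",
               "Verify documents"),
  start = as.POSIXct(c("2019-01-01 09:00:00", "2019-01-01 09:00:00",
                       "2019-01-01 10:00:00"), tz = "UTC"),
  complete = as.POSIXct(c("2019-01-01 10:00:00", "2019-01-01 09:30:00",
                          "2019-01-01 11:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Gap in minutes between each instance's completion and the next start.
gap_to_next <- function(df) {
  df <- df[order(df$start, df$complete), ]
  starts <- as.numeric(df$start)
  completes <- as.numeric(df$complete)
  df$gap_mins <- c(tail(starts, -1) - head(completes, -1), NA) / 60
  df
}

gaps <- do.call(rbind, lapply(split(instances, instances$case), gap_to_next))

# Instances whose successor started before they completed
overlapping <- gaps[!is.na(gaps$gap_mins) & gaps$gap_mins < 0, ]
print(overlapping[, c("case", "activity", "gap_mins")])
```

Listing the overlapping instances this way shows which activity pairs are truly parallel, which is a starting point for deciding how to model them.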

process_map fails on data frame containing a column named 'time'

When supplying an event log with a (possibly unrelated) column named 'time', process_map fails with an error:

> process_map(log)
Error in summarise_impl(.data, dots) : 
  Column `time` must have a unique name
> traceback()
13: stop(list(message = "Column `time` must have a unique name", 
        call = summarise_impl(.data, dots), cppstack = list(file = "", 
            line = -1L, stack = "C++ stack not available on this system")))
12: summarise_impl(.data, dots)
11: summarise.tbl_df(., start_time = min(time), end_time = max(time), 
        min_order = min(.order))
10: summarize(., start_time = min(time), end_time = max(time), min_order = min(.order))
9: function_list[[k]](value)
8: withVisible(function_list[[k]](value))
7: freduce(value, `_function_list`)
6: `_fseq`(`_lhs`)
5: eval(quote(`_fseq`(`_lhs`)), env, env)
4: eval(quote(`_fseq`(`_lhs`)), env, env)
3: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
2: grouped_log %>% summarize(start_time = min(time), end_time = max(time), 
       min_order = min(.order)) at process_map.R#91
1: process_map(log)

Code to reproduce:

x <- data.frame(case = c(1), time = c("foobar"), timestamp = c(as.POSIXct(Sys.time())), activity = c("test"), activity_instance_id = c(1), resource = c("bar"), lifecyle = "complete")
log <- eventlog(x, case_id = "case", timestamp = "timestamp", activity_id = "activity", activity_instance_id = "activity_instance_id", resource_id = "resource", lifecycle_id = "lifecyle")
process_map(log)
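Until this is fixed, a possible workaround is to rename the clashing column before building the event log. This is a minimal sketch in base R; the helper rename_clashing and the "_original" suffix are hypothetical, and it assumes the clash is with an internal column called time, as the traceback suggests.

```r
# Data frame with an unrelated column named 'time' (made-up data)
x <- data.frame(case = 1, time = "foobar", activity = "test",
                stringsAsFactors = FALSE)

# Hypothetical helper: append a suffix to any column whose name is reserved.
rename_clashing <- function(df, reserved = "time", suffix = "_original") {
  clash <- names(df) %in% reserved
  names(df)[clash] <- paste0(names(df)[clash], suffix)
  df
}

x_safe <- rename_clashing(x)
print(names(x_safe))
```

The renamed data frame can then be passed to eventlog() as in the reproduction code above.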

Session info:

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Norwegian Bokmål_Norway.1252  LC_CTYPE=Norwegian Bokmål_Norway.1252    LC_MONETARY=Norwegian Bokmål_Norway.1252 LC_NUMERIC=C                            
[5] LC_TIME=Norwegian Bokmål_Norway.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] bindrcpp_0.2.2         petrinetR_0.2.0        processmonitR_0.1.0    xesreadR_0.2.2         processmapR_0.3.2.9000 eventdataR_0.2.0       edeaR_0.8.1           
 [8] bupaR_0.4.1            forcats_0.3.0          stringr_1.3.1          dplyr_0.7.6            purrr_0.2.5            readr_1.1.1            tidyr_0.8.1           
[15] tibble_1.4.2           ggplot2_3.0.0          tidyverse_1.2.1       

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.18       lubridate_1.7.4    lattice_0.20-35    visNetwork_2.0.4   utf8_1.1.4         assertthat_0.2.0   digest_0.6.16      mime_0.5          
 [9] R6_2.2.2           cellranger_1.1.0   plyr_1.8.4         backports_1.1.2    httr_1.3.1         pillar_1.3.0       rlang_0.2.2        lazyeval_0.2.1    
[17] readxl_1.1.0       shinyTime_0.2.1    rstudioapi_0.7     data.table_1.11.4  miniUI_0.1.1.1     DiagrammeR_1.0.0   downloader_0.4     htmlwidgets_1.2   
[25] igraph_1.2.2       munsell_0.5.0      shiny_1.1.0        broom_0.5.0        compiler_3.5.1     influenceR_0.1.0   rgexf_0.15.3       httpuv_1.4.5      
[33] modelr_0.1.2       pkgconfig_2.0.2    htmltools_0.3.6    tidyselect_0.2.4   gridExtra_2.3      XML_3.98-1.16      fansi_0.3.0        viridisLite_0.3.0 
[41] crayon_1.3.4       withr_2.1.2        later_0.7.3        grid_3.5.1         nlme_3.1-137       jsonlite_1.5       xtable_1.8-2       gtable_0.2.0      
[49] magrittr_1.5       scales_1.0.0       cli_1.0.0          stringi_1.2.4      viridis_0.5.1      promises_1.0.1     ggthemes_4.0.1     xml2_1.2.0        
[57] brew_1.0-6         RColorBrewer_1.1-2 tools_3.5.1        glue_1.3.0         hms_0.4.2          Rook_1.1-1         yaml_2.2.0         colorspace_1.3-2  
[65] rvest_0.3.2        plotly_4.8.0       bindr_0.1.1        haven_1.1.2

add QUANTILE performance

Is it possible to also integrate a QUANTILE metric into performance()?

As shown in the example below, I'm hoping to get a quantile performance evaluation from process_map or other graphs.
The classic 1st or 3rd quartile, for example, or better, any preferred quantile.
For example, 0.32 for the 32nd percentile.
Even better if available for both nodes and edges.

patients %>%
process_map(performance(quantile, 0.32, "days"))

Wishing you the best
@gertjanssenswillen
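In the meantime, a possible workaround is to wrap quantile() in a function that returns a single number and pass that as the summary function. The sketch below defines such a wrapper in base R; the name q32 is hypothetical, and the commented usage assumes that performance() accepts an arbitrary function as FUN, which is suggested by calls such as performance(median, "days") elsewhere in these issues but is not verified here.

```r
# Hypothetical wrapper: the 0.32 quantile as a single unnamed number.
q32 <- function(x, ...) {
  unname(stats::quantile(x, probs = 0.32, na.rm = TRUE))
}

# Quick check on plain numbers
example_value <- q32(0:100)
print(example_value)

# Assumed usage (not run here; requires bupaR and processmapR):
# patients %>% process_map(performance(q32, "days"))
```

A small factory such as function(p) function(x, ...) unname(quantile(x, p, na.rm = TRUE)) would generalise this to any preferred quantile.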

Edges are stacked when using fixed_node_pos

Hi,

I tried the recently added fixed_node_pos parameter, and it worked fine with the patients example from another issue. But when I tried it with my data, which has self-loops and multiple edges going to multiple nodes, everything gets stacked and the edge values get hidden behind.
Is there any way to control the edges? Maybe adding more curvature to the edges would fix this.

Edit: this is how it looks by default

But even when trying another kind of layout, it still makes straight edges:

precedence_matrix with performance views

The precedence_matrix() is sometimes a better alternative to a process_map() in the case of a "spaghetti process" with many variants. It would therefore be useful to have a performance view for the precedence diagrams too.

This would have the same syntax as the process_map function, e.g.

precedence_matrix(performance(median, "days")) and its other options.

process_map() new problem after installing processmapR from Github

Hi, after installing the new version of processmapR from Github I got the following error message (see below), I have just used this code to test:

library(bupaR)
patients %>%
    process_map(type = frequency("relative"))

Error in FUN(X[[i]], ...): object '.order' not found
Traceback:

1. patients %>% process_map(type = frequency("relative"))
2. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
3. eval(quote(`_fseq`(`_lhs`)), env, env)
4. eval(quote(`_fseq`(`_lhs`)), env, env)
5. `_fseq`(`_lhs`)
6. freduce(value, `_function_list`)
7. withVisible(function_list[[k]](value))
8. function_list[[k]](value)
9. process_map(., type = frequency("relative"))
10. eventlog %>% as.data.frame() %>% droplevels %>% select(act = !(!activity_id_(eventlog)), 
  .     aid = !(!activity_instance_id_(eventlog)), case = !(!case_id_(eventlog)), 
  .     time = !(!timestamp_(eventlog)), .order) %>% group_by(act, 
  .     aid, case) %>% summarize(start_time = min(time), end_time = max(time), 
  .     min_order = min(.order))
11. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
12. eval(quote(`_fseq`(`_lhs`)), env, env)
13. eval(quote(`_fseq`(`_lhs`)), env, env)
14. `_fseq`(`_lhs`)
15. freduce(value, `_function_list`)
16. function_list[[i]](value)
17. select(., act = !(!activity_id_(eventlog)), aid = !(!activity_instance_id_(eventlog)), 
  .     case = !(!case_id_(eventlog)), time = !(!timestamp_(eventlog)), 
  .     .order)
18. select.data.frame(., act = !(!activity_id_(eventlog)), aid = !(!activity_instance_id_(eventlog)), 
  .     case = !(!case_id_(eventlog)), time = !(!timestamp_(eventlog)), 
  .     .order)
19. select_vars(names(.data), !(!(!quos(...))))
20. map_if(ind_list, !is_helper, eval_tidy, data = names_list)
21. map(.x[matches], .f, ...)
22. lapply(.x, .f, ...)
23. FUN(X[[i]], ...)

Enlarging nodes in process map

Hi,

This is an awesome package! Thank you! I have a very large process map and cannot see the individual nodes. Is there a way to increase their size? I have tried to use visNetwork and end up with a cool ball of spaghetti. Any suggestions would be greatly appreciated! Thank you!

Sincerely,

tom

The Start Node Wants to be on Top

The placement of the "Start" node isn't ideal when event logs have a fair number of trace variants.
Is there a way to coerce the Start node to the top?

Secondary metric for the process_map

It would be nice to have a secondary metric in the process_map function to show both frequencies and waiting times on the edges. The secondary metric could have a slightly smaller font size (see Disco).

process_map(performance(sum, "days")) is an alternative to show the severity of waiting times, but a secondary metric would provide even more context.

Extended RAW DATA parameter

Hi,

I'm wondering if it is possible to add two more interesting features to the process_map and trace_explorer functions.

1. process_map
It could be useful to extract the raw data behind the plot, in a raw form usable for further manipulation. For example, process_map(raw.data = T) could return a data.frame with the whole data structure behind the plot (edge and node values).

2. trace_explorer
Here, a raw.data parameter is already provided. But instead of returning just the first occurrence of every trace, I'm looking for a full raw data.frame.
I mean that currently, if two patients A and B (following the user guide) follow the same trace, for example 1-2-3, only the trace for patient A is returned. Is it possible to extend raw.data to all patients?

Best wishes,
Antonio

precedence_matrix() types should be hyphen-separated

The edeaR resource metric levels are hyphen separated, for example "resource-activity".

precedence_matrix() types are underscore separated, for example, "relative_antecedent".

This should be made consistent.

Incorrect scale label on precedence matrix plot when type is absolute

When the type is absolute, the scale label should read Absolute Frequency but it actually reads Relative Frequency.

See https://github.com/gertjanssenswillen/processmapR/blob/master/R/precedence_matrix.plot.R#L29

process_map()

Hi Gert,

I'm using the process_map() function. I don't understand why the activity label is sometimes printed in white and sometimes in black when the node in the process map is coloured white/pink.

The problem is that the activity label is impossible to read in some cases.

Thanks.

Prefixing `UQ()` with the rlang namespace is deprecated as of rlang 0.3.0

Hi!
process_map gives a warning related to a change in rlang:

m2 %>%
+   processmapR::process_map(type = processmapR::frequency("relative_case"))

Warning message:
Prefixing `UQ()` with the rlang namespace is deprecated as of rlang 0.3.0.
Please use the non-prefixed form or `!!` instead.
  # Bad:
  rlang::expr(mean(rlang::UQ(var) * 100))
  # Ok:
  rlang::expr(mean(UQ(var) * 100))
  # Good:
  rlang::expr(mean(!!var * 100))
This warning is displayed once per session.

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rTRNG_4.20-1      anytime_0.3.4     data.table_1.12.2 tictoc_1.0        datapasta_3.0.0  
 [6] tidylog_0.1.0     forcats_0.4.0     stringr_1.4.0     dplyr_0.8.3       purrr_0.3.2      
[11] readr_1.3.1       tidyr_0.8.3       tibble_2.1.3      ggplot2_3.2.0     tidyverse_1.2.1  

loaded via a namespace (and not attached):
 [1] nlme_3.1-140       bitops_1.0-6       matrixStats_0.54.0 lubridate_1.7.4   
 [5] RColorBrewer_1.1-2 httr_1.4.0         ggsci_2.9          profvis_0.3.6     
 [9] tools_3.6.1        backports_1.1.4    R6_2.4.0           lazyeval_0.2.2    
[13] colorspace_1.4-1   withr_2.1.2        tidyselect_0.2.5   gridExtra_2.3     
[17] compiler_3.6.1     cli_1.1.0          rvest_0.3.4        xml2_1.2.0        
[21] influenceR_0.1.0   plotly_4.9.0       scales_1.0.0       checkmate_1.9.4   
[25] RApiDatetime_0.0.4 digest_0.6.20      pkgconfig_2.0.2    htmltools_0.3.6   
[29] htmlwidgets_1.3    rlang_0.4.0        ggthemes_4.2.0     readxl_1.3.1      
[33] rstudioapi_0.10    pryr_0.1.4         shiny_1.3.2        visNetwork_2.0.7  
[37] generics_0.0.2     zoo_1.8-6          jsonlite_1.6       rgexf_0.15.3      
[41] RCurl_1.95-4.12    magrittr_1.5       rapportools_1.0    Rcpp_1.0.1        
[45] munsell_0.5.0      viridis_0.5.1      eventdataR_0.2.0   yaml_2.2.0        
[49] stringi_1.4.3      plyr_1.8.4         shinyTime_1.0.0    grid_3.6.1        
[53] parallel_3.6.1     promises_1.0.1     edeaR_0.8.2        crayon_1.3.4      
[57] bupaR_0.4.2        miniUI_0.1.1.1     lattice_0.20-38    haven_2.1.1       
[61] pander_0.6.3       summarytools_0.9.3 hms_0.5.0          magick_2.0        
[65] zeallot_0.1.0      pillar_1.4.2       tcltk_3.6.1        igraph_1.2.4.1    
[69] processmapR_0.3.3  codetools_0.2-16   XML_3.98-1.20      glue_1.3.1        
[73] downloader_0.4     RcppParallel_4.4.3 modelr_0.1.4       vctrs_0.2.0       
[77] httpuv_1.5.1       cellranger_1.1.0   gtable_0.3.0       assertthat_0.2.1  
[81] mime_0.7           skimr_1.0.7        xtable_1.8-4       broom_0.5.2       
[85] later_0.8.0        viridisLite_0.3.0  Rook_1.1-1         DiagrammeR_1.0.1  
[89] brew_1.0-6

Which mining algorithm is used ?

Hi,

I was wondering which mining algorithm you use to plot your process_map.
And is there any way to change it?

Thanks

Problems by using visNetwork as output format

Hello everyone,

I have a problem when using visNetwork as the output format.
All the structure and colours are lost when switching to this kind of export format.
Even the numbers on the edges are lost.

Do I have to rebuild the design when switching to visNetwork, or is this a bug?
Thanks for your help!

Greetings,
Niklas

could not find function "set_global_graph_attrs"

Hey Guys,

After updating DiagrammeR to version 1.0.0, I can no longer create a process map with the processmapR package, which tries to use one of DiagrammeR's functions:

Log2 %>%
process_map()
Error in set_global_graph_attrs(., attr = "rankdir", value = "LR", attr_type = "graph") :
could not find function "set_global_graph_attrs"

process_map

Hi, Gert. I found that the process_map function doesn't work well when type = performance().

Nodes are numbers instead of "actions" and the starting node seems to be a final node.

Error: syntax error in line 29 near '"'

I get this error for event_logs with nrow > 97.

Error: syntax error in line 29 near '"'

What additional information do you need from me?
