gertjanssenswillen / edear
!! repository moved to https://github.com/bupaverse/edeaR !! This repo is read-only from now on.
License: Other
Hi.
I'm trying to filter my event log to discard those cases longer than 10 weeks, and I found this situation.
> eventlog_ %>% filter_throughput_time(interval = c(10, NA), units = "week")
Error in mutate_impl(.data, dots) :
Evaluation error: invalid units specified.
Otherwise, if I use a different time unit, it works seamlessly.
> eventlog %>% filter_throughput_time(interval = c(10, NA), units = "days")
Event log consisting of:
4480 events
52 traces
684 cases
12 activities
4480 activity instances
# A tibble: 4,480 x 14
[...]
Is there anything I'm doing wrong?
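One possible explanation, not confirmed anywhere in this thread: the error text is the same one base R raises when a difftime is converted to a unit name it does not know, and base R spells the unit "weeks" (plural). Whether filter_throughput_time() in this version passes units straight through to difftime is an assumption. The sketch below uses base R only and shows the difference between the two spellings.

```r
# Base R only: no edeaR call is made here.
ten_weeks <- as.difftime(70, units = "days")
units(ten_weeks) <- "weeks"      # plural spelling is accepted
print(ten_weeks)

# The singular spelling is rejected by the difftime unit conversion.
singular_error <- tryCatch(
  {
    x <- as.difftime(70, units = "days")
    units(x) <- "week"
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(singular_error)
```

If the plural spelling does not help, expressing the threshold in days (10 weeks = 70 days, so interval = c(70, NA) with units = "days") relies only on the unit that already works above.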
Hi, I would like to suggest a new parameter option for the filter_activity_presence() function.
Currently you can only keep cases that pass through all activities in a list, or through at least one activity in the list (all or one_of).
Adding a third option that keeps only the cases passing exclusively through the listed activities would be very helpful.
With that option, filter_activity_presence(c("A","B","C","D")) would keep a process passing through "A","B","A","C","D" and would not show a process passing through "A","B","C","D","E".
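Until such an option exists, the requested behaviour can be approximated by hand. The following is a minimal base R sketch on a plain data frame, not on an edeaR eventlog object; the column names case_id and activity and the toy data are assumptions made for illustration.

```r
# Toy data: a plain data frame with hypothetical column names.
events <- data.frame(
  case_id  = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  activity = c("A", "B", "A", "C", "D", "A", "B", "C", "D", "E"),
  stringsAsFactors = FALSE
)
allowed <- c("A", "B", "C", "D")

# For each case, TRUE when every one of its activities is in `allowed`.
only_allowed <- tapply(events$activity, events$case_id,
                       function(a) all(a %in% allowed))

# Keep the events of the cases that passed the check.
keep_cases <- names(only_allowed)[only_allowed]
result <- events[as.character(events$case_id) %in% keep_cases, ]
print(result)
```

Case 1 repeats "A" but stays within the list, so it is kept; case 2 also visits "E", so it is dropped.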
The filter_precedence() function puzzles me quite a bit (even after reading the available documentation).
When using this function before creating a process_map(), I would expect to see the same number of events as is listed in the precedence_matrix(). However, that's not the case.
According to patients %>% precedence_matrix() %>% plot(), there are 492 cases/events between Discuss Results and Check-out.
However, running patients %>% filter_precedence("Discuss Results", "Check-out") %>% case_labels() shows only 3 cases.
After further analysis, it appears that one must use the argument filter_method = "none" to achieve the expected outcome: getting only those cases that have these precedence activities in their trace. Omitting this argument produces an outcome that I found unintended and unexpected (it actually shows the opposite: cases that do not match the provided filter arguments).
So my question is: could this be a bug or am I misunderstanding the purpose of this function?
Thx!
Hi Gert,
happy new year.
I'm working with bupaR and I noticed some inconsistencies.
One of my issues last year was about inconsistent information between the functions process_map() and precedence_matrix().
That now works well, but the information generated by the function start_activity() is not consistent yet.
Attached are some images showing the problem (activities with letters: A, B, C).
Thanks for all.
edeaR/R/filter_endpoints_percentile.R
Line 22 in 695c0fd
This does not provide a list of cases anymore. Maybe changes in dplyr or somewhere else in edeaR?
The throughput_time function shows some strange behaviour. I would assume that the following three code examples should produce the same output. However, the resulting quartiles and the mean are all very different; only the Min. and the Max. agree.
sepsis %>% throughput_time(level = "case") %>% summary()
sepsis %>% throughput_time(level = "case", append = TRUE) %>%
select(throughput_time_case, force_df = TRUE) %>% summary()
sepsis %>% throughput_time(level = "log") %>% summary()
Could there be a bug or do I misunderstand the function?
As an example of inconsistency, number_of_repetitions() returns values ordered by resource whereas resource_frequency() returns values ordered by count.
library(edeaR)
data(sepsis, package = "eventdataR")
number_of_repetitions(sepsis, level = "resource")
Using default type: all
## # resource_metric [26 × 3]
## first_resource absolute relative
## <fct> <dbl> <dbl>
## 1 ? 0 0
## 2 A 0 0
## 3 B 1536 0.189
## 4 C 3 0.00285
## 5 D 0 0
## 6 E 0 0
## 7 F 16 0.0741
## 8 G 67 0.453
## 9 H 6 0.109
## 10 I 12 0.0952
resource_frequency(sepsis, level = "resource")
# A tibble: 26 x 3
## resource absolute relative
## <fct> <int> <dbl>
## 1 B 8111 0.533
## 2 A 3462 0.228
## 3 C 1053 0.0692
## 4 E 782 0.0514
## 5 ? 294 0.0193
## 6 F 216 0.0142
## 7 L 213 0.0140
## 8 O 186 0.0122
## 9 G 148 0.00973
## 10 I 126 0.00828
## # ... with 16 more rows
I think it makes sense to have every resource metric return values in the same order. You could take the approach of dplyr::count() and have a sort argument that determines whether or not to sort the rows by count.
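To make the suggestion concrete, here is a small base R sketch of a sort switch in the spirit of dplyr::count(). The function resource_counts() is hypothetical and is not part of edeaR; it only illustrates how one argument could give every metric the same default order.

```r
# Hypothetical helper, not an edeaR function: counts per resource with an
# optional sort by descending count, mimicking dplyr::count(sort = TRUE).
resource_counts <- function(resources, sort = FALSE) {
  counts <- as.data.frame(table(resource = resources),
                          responseName = "absolute",
                          stringsAsFactors = FALSE)
  counts$relative <- counts$absolute / sum(counts$absolute)
  if (sort) {
    counts <- counts[order(-counts$absolute), ]
  }
  rownames(counts) <- NULL
  counts
}

res <- c("B", "A", "B", "C", "B", "A")
by_name  <- resource_counts(res)               # default: ordered by resource
by_count <- resource_counts(res, sort = TRUE)  # ordered by count
print(by_name)
print(by_count)
```

With sort = FALSE as the default, all metrics would return rows in resource order, and callers who prefer the frequency ordering can opt in.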
Hi, I found this error when using level = "trace" with trace_length:
ciao=edeaR::trace_length(eventlog = evl, level="trace")
Error in eval(lhs, parent, parent) :
argument "eventlog" is missing, with no default
However, I did supply the "eventlog" parameter.
What I expected: when trimming to a specific time period, events that fall only partly within the period are themselves trimmed, so that they stay in the result.
What I got: Events that are only partly in the trimmed period are discarded.
Why is this a problem?: We use trim mainly to slice a larger period into even parts so that we can measure the total processing time per part. This is only possible when events are also sliced and their processing time is attributed to the right part. That is why we need a sharper knife, one that also cuts the raisins in the cake.
Note: for the other filter options: "contained", "intersecting", "start", "complete" there is no problem.
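The slicing described above can be sketched in base R as interval clipping. This is only an illustration of the requested behaviour, not how edeaR's trim filter works; clip_to_window() is a hypothetical helper and times are plain numbers (for example hours) to keep the example simple.

```r
# Hypothetical helper: keep every activity that overlaps the window and cut
# its start and end down to the window boundaries.
clip_to_window <- function(start, complete, from, to) {
  overlaps <- start < to & complete > from
  data.frame(
    start    = pmax(start[overlaps], from),
    complete = pmin(complete[overlaps], to)
  )
}

# Three activities; the window runs from 10 to 24.
clipped <- clip_to_window(
  start    = c(0, 8, 20),
  complete = c(6, 14, 30),
  from = 10, to = 24
)
clipped$processing_time <- clipped$complete - clipped$start
print(clipped)
```

The first activity lies entirely outside the window and is dropped; the other two are cut at the boundaries, so only the time spent inside the window is counted for this part.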
It seems like the eventlog function is missing from the edeaR package, is it? I don't see it in the package or in the documentation; there is only an eventlog_from_xes function. The eventlog function is referenced in the vignette on data preprocessing, in the section on importing from csv.
Thanks so much for this helpful R package! How would you suggest cleaning up events based on the lifecycle_id variable?
For example suppose I have an activity that should always have a "start" and "complete" event. However my event log is a bit messy and occasionally a case has only a "start" or only a "complete" event but not both. How would you suggest I filter activities to ensure that every activity has a "start" and "complete" event?
This seems similar to what the filter_precedence function does, but I want to filter events based on the ordering of the lifecycle_id within each activity.
I can create a reproducible example if that would be helpful.
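As a starting point, here is a minimal base R sketch on a plain data frame rather than an eventlog object. It assumes that events belonging to the same activity instance share an identifier (the column activity_instance_id here is hypothetical), and it only checks that both lifecycle values are present, not the order in which they occur.

```r
# Toy data with hypothetical column names.
events <- data.frame(
  activity_instance_id = c(1, 1, 2, 3, 3, 4),
  activity     = c("Triage", "Triage", "Triage", "X-Ray", "X-Ray", "X-Ray"),
  lifecycle_id = c("start", "complete", "start", "start", "complete", "complete"),
  stringsAsFactors = FALSE
)

# TRUE for activity instances that have both a "start" and a "complete" event.
has_both <- tapply(events$lifecycle_id, events$activity_instance_id,
                   function(l) all(c("start", "complete") %in% l))

# Keep only the events of those instances.
complete_ids <- names(has_both)[has_both]
cleaned <- events[as.character(events$activity_instance_id) %in% complete_ids, ]
print(cleaned)
```

Instances 2 (only "start") and 4 (only "complete") are removed. Checking the ordering as well would require comparing timestamps within each instance.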
This is currently not supported. It returns a function, which is confusing.
library(edeaR)
data(sepsis, package = "eventdataR")
n_reps <- number_of_repetitions(sepsis, level = "resource")
## Using default type: all
plot(n_reps)
## function (...)
## tags$p(...)
## <bytecode: 0x1022c4c50>
## <environment: namespace:htmltools>
This ought to show a bar plot of the absolute number of repetitions by resource (to match the behavior when level = "activity").
Hi, thanks for the bupaR package, it is very useful!
I am having an issue using start_activity() and activity_frequency() functions:
where evl is an eventlog object. The output is:
Any suggestion?
Thanks.
Have a look at patients %>% start_activities(level = "case"). The output is as expected; it's just that the column label is a bit confusing (showing end_activity).
Hello Gert,
I'm using activity_frequency(level = "activity") and n_events().
What I see is that n_events() reports twice the number of events compared to activity_frequency(level = "activity") (perhaps due to the start/full state).
Thank you.