edeaR's People

Contributors

bbrewington, fmannhardt, gertjanssenswillen, marijkeswennen


edeaR's Issues

`filter_throughput_time` fails when using `week` units

Hi.

I'm trying to filter my event log to discard cases longer than 10 weeks, and I ran into the following error.

> eventlog_ %>% filter_throughput_time(interval = c(10, NA), units = "week")
Error in mutate_impl(.data, dots) : 
  Evaluation error: invalid units specified.

However, if I use a different time unit, it works seamlessly.

> eventlog %>% filter_throughput_time(interval = c(10, NA), units = "days")
Event log consisting of:
4480 events
52 traces
684 cases
12 activities
4480 activity instances

# A tibble: 4,480 x 14
[...]

Is there anything I'm doing wrong?
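The error message suggests the problem originates in base R's difftime machinery, which only accepts plural unit names ("secs", "mins", "hours", "days", "weeks"). A minimal base R sketch, independent of edeaR, reproduces the same message:

```r
# Base R difftime units must be plural; a singular unit name triggers the
# same "invalid units specified" error reported in the issue.
d <- as.difftime(70, units = "days")

units(d) <- "weeks"      # accepted: d is now 10 weeks
as.numeric(d)            # 10

try(units(d) <- "week")  # Error: invalid units specified
```

If this is indeed the cause, passing units = "weeks" instead of units = "week" should make the original filter_throughput_time() call work.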

Add method "only" for filter_activity_presence()

Hi, I would like to suggest a new parameter option for the filter_activity_presence() function.
Currently you can only keep cases that contain every activity from a list, or at least one activity from it ("all" or "one_of").
Adding a third option that keeps only cases passing exclusively through the listed activities would be very helpful.
Using
filter_activity_presence(c("A", "B", "C", "D"))
would then keep a case passing through "A", "B", "A", "C", "D" but not a case passing through "A", "B", "C", "D", "E".
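The proposed semantics can be sketched in plain base R (passes_only is a hypothetical helper, not part of edeaR): a case passes when every activity in its trace appears in the allowed list.

```r
# Keep a case only when its trace contains no activities outside the
# allowed set (sketch of the suggested "only" method).
passes_only <- function(trace, allowed) all(trace %in% allowed)

allowed <- c("A", "B", "C", "D")
passes_only(c("A", "B", "A", "C", "D"), allowed)  # TRUE: only listed activities
passes_only(c("A", "B", "C", "D", "E"), allowed)  # FALSE: "E" is not in the list
```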

filter_precedence()

The filter_precedence() function puzzles me quite a bit (even after reading the available documentation).

When using this function, before creating a process_map(), I would expect to see the same number of events as is listed in the precedence_matrix(). However, that's not the case.

According to patients %>% precedence_matrix() %>% plot(), there are 492 cases/events between Discuss Results and Check-out.
However, running patients %>% filter_precedence("Discuss Results", "Check-out") %>% case_labels() shows only 3 cases.

After further analysis it appears that one must pass the argument filter_method = "none" to achieve the expected outcome: getting only those cases which do have this precedence included in their trace. Omitting the argument produces, for me, an unintended and unexpected result which actually shows the opposite: the cases that do not contain the provided precedence.

So my question is: could this be a bug or am I misunderstanding the purpose of this function?
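For reference, the check I would expect the filter to perform can be written in a few lines of base R (directly_follows is a hypothetical helper, not edeaR code): does activity a directly precede activity b anywhere in the trace?

```r
# Does activity a appear immediately before activity b in the trace?
directly_follows <- function(trace, a, b) {
  idx <- which(trace == a)
  any(trace[idx + 1] == b, na.rm = TRUE)
}

directly_follows(c("Registration", "Discuss Results", "Check-out"),
                 "Discuss Results", "Check-out")  # TRUE
```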

Thanks!

Different activity orderings in case of identical timestamps

Hi Gert,

happy new year.

I'm working with bupaR and I've noticed some inconsistencies.

One of my issues last year was about inconsistent information between the process_map() and precedence_matrix() functions.

That now works well, but the information generated by start_activities() is still not consistent with them.

Attached are some images illustrating the problem (activities labelled A, B, C).

Thanks for everything.

[Attached screenshots: process map, precedence matrix, start activities]

throughput_time with strange output. Bug?

The throughput_time function shows some strange behaviour. I would assume that the following three code examples produce the same output. However, all resulting quartiles and the mean are very different; only the Min. and the Max. agree.

sepsis %>% throughput_time(level = "case") %>% summary()


sepsis %>% throughput_time(level = "case", append = TRUE) %>% 
    select(throughput_time_case, force_df = TRUE) %>% summary()


sepsis %>% throughput_time(level = "log") %>% summary()

Could there be a bug or do I misunderstand the function?
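A plausible explanation (my reading, not verified against the edeaR source): if level = "log" already returns precomputed summary statistics of the case-level values, then calling summary() on that result computes statistics of statistics. A base R toy with made-up durations shows why only Min and Max would then agree:

```r
# Summarising raw values vs. summarising their summary: the quartiles and
# mean differ, but the minimum and maximum are preserved.
cases <- c(1, 2, 2, 3, 10, 50)       # hypothetical case throughput times
summary(cases)                        # stats of the raw values
summary(as.numeric(summary(cases)))   # stats of the stats
```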

Be consistent about row ordering for resource metrics

As an example of inconsistency, number_of_repetitions() returns values ordered by resource whereas resource_frequency() returns values ordered by count.

library(edeaR)
data(sepsis, package = "eventdataR")
number_of_repetitions(sepsis, level = "resource")
Using default type: all
## # resource_metric [26 x 3]
##    first_resource absolute relative
##    <fct>             <dbl>    <dbl>
##  1 ?                     0  0
##  2 A                     0  0
##  3 B                  1536  0.189
##  4 C                     3  0.00285
##  5 D                     0  0
##  6 E                     0  0
##  7 F                    16  0.0741
##  8 G                    67  0.453
##  9 H                     6  0.109
## 10 I                    12  0.0952
resource_frequency(sepsis, level = "resource")
## # A tibble: 26 x 3
##    resource absolute relative
##    <fct>       <int>    <dbl>
##  1 B            8111  0.533
##  2 A            3462  0.228
##  3 C            1053  0.0692
##  4 E             782  0.0514
##  5 ?             294  0.0193
##  6 F             216  0.0142
##  7 L             213  0.0140
##  8 O             186  0.0122
##  9 G             148  0.00973
## 10 I             126  0.00828
## # ... with 16 more rows

I think it makes sense for every resource metric to return values in the same order. You could take the approach of dplyr::count() and add a sort argument that determines whether or not the rows are sorted by count.
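The suggested dplyr::count()-style interface could look like this sketch (metric_table and its columns are hypothetical, not edeaR code): stable resource order by default, count order only on request.

```r
# Default: rows in stable order by resource; sort = TRUE opts in to
# ordering by descending count, mirroring dplyr::count()'s sort argument.
metric_table <- function(df, sort = FALSE) {
  out <- df[order(df$resource), ]
  if (sort) out <- out[order(-out$absolute), ]
  out
}

df <- data.frame(resource = c("B", "A", "C"), absolute = c(8111, 3462, 1053))
metric_table(df)$resource               # "A" "B" "C"
metric_table(df, sort = TRUE)$resource  # "B" "A" "C"
```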

Error when calling trace_length() with level = "trace"

Hi, I found this error using level = "trace" with trace_length():

ciao=edeaR::trace_length(eventlog = evl, level="trace")
Error in eval(lhs, parent, parent) :
argument "eventlog" is missing, with no default

I did, however, supply the "eventlog" argument.

Sharper knives for filter_time_period in combination with trim

What I expected: when trimming to a specific time period, events that are only partly inside the period are themselves trimmed, so that they stay in the result.

What I got: events that are only partly inside the trimmed period are discarded.

Why is this a problem? We use trim mainly to slice a larger period into even parts so that we can measure the total processing time per part. This is only possible when events are also sliced, so that processing time is attributed to the right part. That is why we need the sharper knife which also cuts the raisins in the cake.

Note: the other filter options ("contained", "intersecting", "start", "complete") pose no problem.
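The requested behaviour amounts to clamping an event to the window rather than discarding it on partial overlap. A minimal sketch (trim_event is a hypothetical helper, not the edeaR implementation):

```r
# Clamp an event's start/end to the trim window; drop it only when it does
# not overlap the window at all.
trim_event <- function(start, end, win_start, win_end) {
  if (end < win_start || start > win_end) return(NULL)  # no overlap: drop
  c(start = max(start, win_start), end = min(end, win_end))
}

trim_event(5, 15, 10, 20)  # partly inside: kept as start = 10, end = 15
trim_event(1, 4, 10, 20)   # fully outside the window: NULL
```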

eventlog function

It seems like the eventlog function is missing from the edeaR package, is it? I don't see it in the package or in the documentation; there is only an eventlog_from_xes function. The eventlog function is referenced in the data preprocessing vignette, in the section on importing from CSV.

Filter events based on lifecycle_id logic

Thanks so much for this helpful R package! How would you suggest cleaning up events based on the lifecycle_id variable?

For example suppose I have an activity that should always have a "start" and "complete" event. However my event log is a bit messy and occasionally a case has only a "start" or only a "complete" event but not both. How would you suggest I filter activities to ensure that every activity has a "start" and "complete" event?

This seems similar to what the filter_precedence function does, but I want to filter events based on the ordering of the lifecycle_id within each activity.

I can create a reproducible example if that would be helpful.
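One way to express the requested cleanup, sketched in plain base R (the column names are hypothetical, not an edeaR API): keep only activity instances that have both a "start" and a "complete" event.

```r
# Instances present in both the "start" and "complete" subsets are complete
# pairs; everything else is dropped.
log <- data.frame(
  instance  = c(1, 1, 2, 3, 3),
  lifecycle = c("start", "complete", "start", "start", "complete")
)

paired  <- intersect(log$instance[log$lifecycle == "start"],
                     log$instance[log$lifecycle == "complete"])
cleaned <- log[log$instance %in% paired, ]
cleaned$instance  # 1 1 3 3: instance 2 has no "complete" event and is dropped
```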

Add a plot for number_of_repetitions when type is all and level is resource

This is currently not supported. It returns a function, which is confusing.

library(edeaR)
data(sepsis, package = "eventdataR")
n_reps <- number_of_repetitions(sepsis, level = "resource")
## Using default type: all
plot(n_reps)
## function (...) 
##   tags$p(...)
## <bytecode: 0x1022c4c50>
##   <environment: namespace:htmltools>

This ought to show a bar plot of the absolute number of repetitions by resource (to match the behavior when level = "activity").

start_activities and activity_frequency

Hi, thanks for the bupaR package, it is very useful!
I am having an issue using the start_activities() and activity_frequency() functions:

  1. evl %>% start_activities(level = "case")
  2. evl %>% activity_frequency("case")

where evl is an eventlog object. The output is:

  1. a data frame with only one row;
  2. a call that never stops running.

Any suggestion?

Thanks.

activity_frequency(level = "activity") vs n_events()

Hello Gert,

I'm using activity_frequency(level = "activity") and n_events().

What I see is that n_events() reports twice the number of events compared to activity_frequency(level = "activity") (perhaps due to the start/complete lifecycle states).
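If the lifecycle is indeed the cause, a toy example with made-up data shows the factor of two: with a start/complete lifecycle every activity instance contributes two events, so an event count is twice the activity-instance count.

```r
# Each activity instance has a "start" and a "complete" event, so there are
# twice as many events as activity instances.
log <- data.frame(
  instance  = c(1, 1, 2, 2, 3, 3),
  lifecycle = rep(c("start", "complete"), 3)
)
nrow(log)                     # 6 events
length(unique(log$instance))  # 3 activity instances
```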

Thank you.
