Nirdizati Training UI

This project was made as a Bachelor's thesis at the University of Tartu.

This project is a part of a bigger system called Nirdizati.

Source code for Nirdizati can be found here.

What is Nirdizati?

Nirdizati is an open-source web-based predictive process monitoring engine for running business processes. The dashboard is updated periodically based on incoming streams of events. However, unlike classical monitoring dashboards, Nirdizati does not only show the current state of business process executions, but also predicts their future state (e.g. when each case will finish). On the backend, Nirdizati uses predictive models pre-trained on data about historical process executions.

More information on this can be found here.

What is the goal of this project?

The Nirdizati Training component lets users upload their own logs in .XES or .CSV format, analyze them, construct models using different parameters, and then deploy them into the Nirdizati Runtime component.

The goal of this project is to remake the UI of the Nirdizati Training component. As a result, the Nirdizati Training UI will become a more user-friendly and intuitive system.

About this project

This project contains the UI of the predictive monitoring web application that can be found here.

Setting up

Prerequisites

Currently, the build process relies on a plugin that compiles SASS to CSS. Once this plugin is installed in your local Maven repository, you can configure the application.

Configuration

The main configuration file is config.xml, found in the resources folder of the project. It contains various settings that can be changed when running the application. The most notable of these are the directories that will be used by the application (found under the directories node). These should be configured to point to existing paths on the filesystem; otherwise the application will not be able to start.
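As an illustration, the directories section could look roughly like this. The node and directory names below are hypothetical; check the actual config.xml in the resources folder for the exact schema.

```xml
<!-- Hypothetical excerpt of config.xml: node names are illustrative only. -->
<configuration>
  <directories>
    <logDirectory>/var/nirdizati/logs</logDirectory>
    <trainingDirectory>/var/nirdizati/training_params</trainingDirectory>
    <resultDirectory>/var/nirdizati/results</resultDirectory>
  </directories>
</configuration>
```

Each of these paths must exist before the application is started.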

Note that the project has only been built with Java 8; building is not tested against Java 9 and beyond.

Building

The application is built with the Maven build system. Once Maven is installed on your system, you can package the project into a war package by running mvn package inside the root directory of the project. Please note that the application relies on the ZK EE repository, which requires an access key.

Deploying

Once the application is built, it can be deployed to a regular Java servlet container. We run it on Tomcat 8.5; other servlet containers are not tested.

Student project contest info

Poster repository can be found here.

Contributors

dependabot[bot], raboczi, verenich, yesanton, zukkari


Issues

Implement bar chart for prefix bucketing

Prefix bucketing is special in the sense that it produces 15 result files. This means the user has to be able to switch between those 15 files, since they all contain different results.

We need to think of a way to present this to the user.

Make it possible to exclude some attributes from predictions

When choosing a column type (static vs dynamic, categorical vs numeric), enable another, fifth option: "Do not use this column". If selected, the column will not appear in dataset_params.json and therefore will not be used for predictions. This may be desirable if a dataset contains values that are not known at runtime. I already exclude some columns based on hardcoded keywords, but in general there may be other columns with other names.

Implement status tracker

Currently there is a Jobs page that displays the status of jobs.

A more modern approach would be either a sidebar menu that displays the status of simulations, or a notification area that asynchronously notifies the user when a simulation has completed.

Better data for validation charts

  • Remove outliers from the detailed results
  • Remove values where nr_events==1 (too low accuracy anyway)
  • Replace R² with RMSE; possibly add other statistics
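For reference, RMSE can be computed in a few lines. This is a generic sketch, not the project's actual validation code; the real computation lives in the Python training scripts.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    assert len(actual) == len(predicted) and actual
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# RMSE penalises large errors more heavily than MAE does, which is one
# reason it can be preferable to R² for validation charts.
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # sqrt(4/3) ≈ 1.1547
```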

Be more strict with file names

Currently, when the user uploads a file, we don't check whether its name contains any unusual symbols. The user can set the file name to ../../xxx.xxx, which means the file will be created two directories above the specified directory. This is a known bug (path traversal) that should be fixed.
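One possible shape of the fix, sketched in Python (the real check would go into the upload handler): keep only the last path component of the uploaded name and whitelist the allowed characters.

```python
import os
import re

def safe_filename(name: str) -> str:
    """Reject path traversal and normalise an uploaded file name.

    A sketch of the kind of check the issue asks for, not the actual
    implementation.
    """
    # Take only the last path component, so "../../xxx.xxx" becomes "xxx.xxx".
    base = os.path.basename(name.replace("\\", "/"))
    # Allow letters, digits, dots, dashes and underscores; replace the rest.
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", base)
    if cleaned in ("", ".", ".."):
        raise ValueError("invalid file name: %r" % name)
    return cleaned
```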

Add ID for jobs

For the script side:
The ID will be passed as a parameter to the script; based on this ID the training file should be found, and the produced validation files should also start with this ID, e.g. ID_file.csv

For UI side:

  • Create an ID generator (probably a hash will be used for the ID)
  • Add the ID as a parameter for the job
  • Use the ID when fetching logs
  • Use the ID when calling the script
  • Some kind of mechanism to clean the training parameters directory and the result directory
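A hash-based generator could look like the sketch below. All names here are hypothetical: the issue only says a hash will probably be used, so the exact inputs to the hash and the ID length are assumptions.

```python
import hashlib
import time

def job_id(log_name: str, params: str) -> str:
    """Derive a short, filesystem-safe job ID by hashing the log name,
    the training parameters and the submission time (all assumed inputs)."""
    digest = hashlib.sha1(f"{log_name}|{params}|{time.time()}".encode()).hexdigest()
    return digest[:12]

def result_file(job: str, name: str) -> str:
    # Produced validation files start with the job ID, e.g. ID_file.csv.
    return f"{job}_{name}"
```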

Clustering does not generate a grid when selected alone.

When a hyperparameter that is not a learner is selected, and this parameter has properties, then no grid is generated for it. How should this be handled? Grids are only generated for learners; other parameters that have properties, such as clustering, are added as an additional row to those grids.

If no learner is selected but clustering is, how should this be handled? Should we then generate grids for all learners, e.g. with one row, if clustering is checked?

Add hyperparameters to learners

Add hyperparameters to learners.

These will be set by default, but the user will be able to change them based on his preferences.

Based on those settings, a JSON file will be generated and used when training the model.

Fields should be added to the training view based on the selected learner method.

Regression method == Random forest -> n_estimators + max_features

Regression method == Gradient boosting -> max_features + gbm_learning_rate

Implement log overview page

On this page the user will be able to review uploaded logs.

The page consists of a menu in which the user will be able to:

  • Choose a log to overview
  • See graphs based on:
    • Active traces
    • Active resources
    • Event occurrences
  • Group all graphs into a single view

A task on implementing the graph displays will come later.

Page preview can be found here

Date parsing

Implement date parsing in Kotlin, since the regular Java approach used before does not work.
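A common approach is to try a list of candidate formats in order. The sketch below is in Python for brevity; the same try-each-format idea translates directly to Kotlin with java.time.DateTimeFormatter. The format list is illustrative, not the project's actual set.

```python
from datetime import datetime

# Candidate timestamp formats commonly seen in event logs; extend as needed.
FORMATS = [
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%dT%H:%M:%S",
    "%d.%m.%Y %H:%M",
]

def parse_timestamp(value: str) -> datetime:
    """Try each known format in turn and return the first that matches."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised timestamp: {value!r}")
```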

Add model visualisation

Add model visualization with Chart.js library.

It was agreed that three plots will be used:

  • Scatter plot (true value vs predicted value; implement hexagonal binning if possible)
  • Line chart (number of events vs MAE; the data should be in days, not seconds, so divide the result by 86,400)
  • Histogram (feature importance: name vs importance, horizontal)
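The seconds-to-days conversion for the line chart is a one-liner; sketched here in Python (the actual transformation would happen wherever the chart data is prepared):

```python
SECONDS_PER_DAY = 86_400

def mae_seconds_to_days(points):
    """Convert (nr_events, mae_in_seconds) pairs to (nr_events, mae_in_days)."""
    return [(n, mae / SECONDS_PER_DAY) for n, mae in points]
```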

Implement validation view

This view would provide the user with information about completed runs.

This page should also let the user export trained models into the runtime component.

A separate issue will be created for model importing, since I feel this will be quite a task: we need a format suitable for the runtime component to be able to use the model.

Prototype for this page can be found here

Learn why job hangs

Currently, when executing the script, the job hangs without any output. We need to find out why that happens.

Investigate failure with XGBoost

Whenever a job with XGBoost is run, the following error is produced:

/home/stanislav/anaconda3/lib/python3.6/site-packages/pandas/core/groupby.py:4281: FutureWarning: using a dict with renaming is deprecated and will be removed in a future version
(53792, 27)
(13491, 27)
  return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
Bucketing prefixes...
Fitting pipeline for bucket 1...
Traceback (most recent call last):
  File "train.py", line 165, in <module>
    pipelines[bucket].fit(dt_train_bucket, train_y)
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/sklearn/pipeline.py", line 248, in fit
    Xt, fit_params = self._fit(X, y, **fit_params)
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/sklearn/pipeline.py", line 213, in _fit
    **fit_params_steps[name])
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/memory.py", line 362, in __call__
    return self.func(*args, **kwargs)
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/sklearn/pipeline.py", line 581, in _fit_transform_one
    res = transformer.fit_transform(X, y, **fit_params)
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/sklearn/pipeline.py", line 739, in fit_transform
    for name, trans, weight in self._iter())
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 779, in __call__
    while self.dispatch_one_batch(iterator):
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 625, in dispatch_one_batch
    self._dispatch(tasks)
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 588, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 111, in apply_async
    result = ImmediateResult(func)
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 332, in __init__
    self.results = batch()
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 131, in <listcomp>
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/sklearn/pipeline.py", line 581, in _fit_transform_one
    res = transformer.fit_transform(X, y, **fit_params)
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/sklearn/base.py", line 520, in fit_transform
    return self.fit(X, y, **fit_params).transform(X)
  File "/home/stanislav/git/nirdizati-training-ui/PredictiveMethods/transformers/AggregateTransformer.py", line 28, in transform
    dt_numeric = X.groupby(self.case_id_col)[self.num_cols].agg({'mean':np.mean, 'max':np.max, 'min':np.min, 'sum':np.sum, 'std':np.std})
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/pandas/core/groupby.py", line 4281, in aggregate
    return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/pandas/core/groupby.py", line 3714, in aggregate
    result, how = self._aggregate(arg, _level=_level, *args, **kwargs)
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/pandas/core/base.py", line 461, in _aggregate
    result = _agg(arg, lambda fname,
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/pandas/core/base.py", line 429, in _agg
    result[fname] = func(fname, agg_how)
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/pandas/core/base.py", line 462, in <lambda>
    agg_how: _agg_1dim(self._selection, agg_how))
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/pandas/core/base.py", line 410, in _agg_1dim
    raise SpecificationError("nested dictionary is ambiguous "
pandas.core.base.SpecificationError: nested dictionary is ambiguous in aggregation
<2017-12-22 22:23:01,872> <DEBUG> <SimulationJob.class:85> <Script finished running...>
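The SpecificationError comes from the dict-renaming form of groupby aggregation, which pandas deprecated (see the FutureWarning at the top of the log) and later removed. A likely fix, sketched below on toy data, is to pass a list of aggregation functions instead and flatten the resulting columns; the actual change would go into AggregateTransformer.py.

```python
import pandas as pd

df = pd.DataFrame({
    "case_id": ["a", "a", "b"],
    "amount": [1.0, 3.0, 5.0],
})

# Old (removed) form that raises SpecificationError on newer pandas:
#   df.groupby("case_id")["amount"].agg({"mean": np.mean, "max": np.max, ...})
# New form: pass a list of function names; pandas names columns after them.
agg = df.groupby("case_id")[["amount"]].agg(["mean", "max", "min", "sum", "std"])

# Flatten the MultiIndex columns into "amount_mean", "amount_max", ...
agg.columns = ["_".join(col) for col in agg.columns]
```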

Implement tooltips

Implement tooltips so that when hovering over something, the user is told what this or that button does.

This will lower the learning curve for using this tool.

Add proper validation for user input

Currently, validation for hyperparameter fields is mostly absent.

When the user enters an incorrect value, the value resets to 0.0 or 0 depending on the field.
Ideally, if the user leaves a field empty, the validator would say that the field cannot be empty and disable the construct model button until all required fields are filled.

Improve logging

Improve the logging functionality so that it is easier to debug when bugs occur.

Implement basic training view

Implement a view that gives the user a quick way to run a simulation.

The default settings currently used are:

  • Encoding: frequency
  • Bucketing method: single
  • Regression method: random forest

Prototype for this page can be found here

This page should provide the following functionality for the user:

  • Select a log for analysis
  • Select a prediction type
  • Provide the possibility to switch to advanced settings

Replace close button with 'Visualize model' button

In the job tracker, the close button should be moved to the upper right corner of the screen as an X sign.

The old button should be replaced with a 'Visualize model' button that changes the page content to the 'Validation' view in read-only mode, with the clicked job's attributes as the active choice.

Documentation

The code should be documented.

This would make the application more maintainable and help identify unneeded code snippets or places where the code should be changed.

Work on look and feel

Work on the look and feel of the application. The application should have an attractive and modern look, not like a website from the 90s (e.g. www.ordi.ee).

We should probably look into SASS for ZK here (ZUSS). This would provide a better, more modern way to write CSS.

ZK also supports transition animations, which would give the application a smoother look.

Implement landing page

This will be the page the user lands on when connecting to the application for the first time.
It will let the user either upload a log or continue with an existing one, and redirect the user to the appropriate page.

Automatic generation of dataset configs

Currently, these files are created manually. To get a step closer to self-service, we need to generate them automatically from the users' logs, either XES or CSV. Example

Two dimensions:

  • static vs dynamic attributes - depending on whether they change throughout the case
  • numeric vs categorical

label_num_cols and label_cat_cols indicate the columns that need to be predicted
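The two dimensions above can be inferred mechanically from the log. The sketch below is only an illustration of that classification logic; the output key names are assumptions, not the real dataset_params.json schema.

```python
def classify_columns(rows, case_id_col):
    """Split columns along the two dimensions the issue describes:
    static vs dynamic (does the value change within a case?) and
    numeric vs categorical (do all values parse as numbers?).

    `rows` is a list of dicts, e.g. from csv.DictReader. The output
    keys are hypothetical, not the actual config schema.
    """
    cols = [c for c in rows[0] if c != case_id_col]
    config = {"static_num": [], "static_cat": [], "dynamic_num": [], "dynamic_cat": []}
    for col in cols:
        # Collect the distinct values this column takes within each case.
        per_case = {}
        for r in rows:
            per_case.setdefault(r[case_id_col], set()).add(r[col])
        static = all(len(v) == 1 for v in per_case.values())
        try:
            [float(r[col]) for r in rows]
            numeric = True
        except ValueError:
            numeric = False
        key = ("static_" if static else "dynamic_") + ("num" if numeric else "cat")
        config[key].append(col)
    return config
```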

Use hyperoptimized parameters if available

When a user selects a training config (or it is pre-selected), we need to check whether we have optimal learner hyperparameters for that config in optimal_params/{log_name_wo_extension}.json. If so, these parameters should be displayed in the box and passed to training_params/{log_name_wo_extension}.json. Otherwise, this file should be created from the default parameters.
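The flow described above can be sketched as follows. The file layout comes from the issue; the default parameter values and function name are illustrative assumptions.

```python
import json
import os

DEFAULT_PARAMS = {"n_estimators": 300, "max_features": "sqrt"}  # illustrative defaults

def load_training_params(log_name_wo_extension, optimal_dir="optimal_params",
                         training_dir="training_params"):
    """Use hyperoptimized parameters if a per-log file exists, otherwise
    fall back to defaults; write the result to training_params/<log>.json."""
    optimal_path = os.path.join(optimal_dir, log_name_wo_extension + ".json")
    if os.path.exists(optimal_path):
        with open(optimal_path) as f:
            params = json.load(f)
    else:
        params = dict(DEFAULT_PARAMS)
    os.makedirs(training_dir, exist_ok=True)
    with open(os.path.join(training_dir, log_name_wo_extension + ".json"), "w") as f:
        json.dump(params, f, indent=2)
    return params
```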

Implement advanced settings view

This should provide the same functionality as #7, and in addition:

  • User must be able to specify encoding
  • User must be able to specify bucketing method
  • User must be able to select regression method

A more detailed specification can be found here and may change over time.

I will try to make this section as abstract as possible so that it is easy to add options. It will probably be an XML file with values that can be added or removed as functionality is added.

Unit/Integration tests

Write unit tests to cover as much code as possible.

Create Selenium integration tests that cover the already implemented use cases.

Rework train grid

Rework the train grid using the new GridGenerator. The current grid is really buggy and needs to be reworked.

Get acquainted with ZK

Work through the documentation found here. This gives a quick overview of the framework and its main components.

Implement script runner

Since @verenich proposed that we use his Python scripts, I will need to implement a script runner that runs Python scripts in a shell with user-supplied parameters and logs.

We also need a results fetcher that collects predictions and converts them into a format suitable for the user interface.

This will be the component that connects the UI and the training backend.

Replace modal window in job tracker

The modal window in the job tracker should be replaced with an interactive layout change:

When the user clicks a job in the tracker, the grid content should show the job metadata with 'Visualize model' and 'Deploy to runtime' buttons. When the user clicks the grid content again, the job tracker shows the jobs as before.

Think of a better way to manage predictive data

  • The dataset reference ("tag") should probably be removed, so:
  • Instead of keeping all training configs in a single training_params.json, consider one JSON file per dataset, with the name possibly matching the name of the log. The same goes for dataset_params.json

Implement upload page

This page will let the user upload his logs.
How this page should look can be seen here

On this page user will be able to:

  • Choose a log for uploading
  • Upload his log in .XES format
  • Be given instructions on how to convert his log to .XES format

After a successful upload, the user will be prompted to continue to the log overview in the appropriate view.

Rework job running using coroutines

Kotlin allows the use of coroutines. These can be used for async tasks, which means the whole Worker class can be omitted and replaced with a few lines of coroutine code.

Fix issues discussed in meeting

  1. move buttons closer together (upload and continue)
  2. move timestamp col to activity and caseid columns window
  3. add more possibilities to parse date
  4. threshold > 20
  5. resource should be always dynamic categorical
  6. upload log modal make into a single window
  7. do not show activity column in select columns grid
  8. Dynamic categorical -> event attribute - categorical, numeric
  9. Static -> case attribute -> categorical, numeric
  10. Advanced mode -> Multiple models
  11. Remove evertyhing from basic -> still use params
  12. Do not show log if no json is found
  13. Job tracker should scale better on smaller screens
  14. Move buttons straight to tracker
  15. Move log file to tracker
  16. Actual vs predicted divide x by 86400
  17. Mean average error -> mean absolute error
  18. Optimize data and chart generation
