Nirdizati Training UI

This project was made as a Bachelor's thesis at the University of Tartu.

This project is a part of a bigger system called Nirdizati.

Source code for Nirdizati can be found here.

What is Nirdizati?

Nirdizati is an open-source web-based predictive process monitoring engine for running business processes. The dashboard is updated periodically based on incoming streams of events. However, unlike classical monitoring dashboards, Nirdizati does not only show the current state of business process executions, but also predicts their future state (e.g. when each case will finish). On the backend, Nirdizati uses predictive models pre-trained on data about historical process executions.

More information on this can be found here.

What is the goal of this project?

The Nirdizati Training component lets users upload their own logs in .XES or .CSV format, analyze them, construct models using different parameters, and then deploy them into the Nirdizati Runtime component.

The goal of this project is to remake the UI of the Nirdizati Training component. As a result, the Nirdizati Training UI will become a more user-friendly and intuitive system.

About this project

This project contains the UI of the predictive monitoring web application that can be found here.

Setting up

Prerequisites

Currently, the build process relies on a plugin that compiles SASS to CSS. Once this plugin is installed in your local Maven repository, you can configure the application.

Configuration

The main configuration file is config.xml, found in the resources folder of the project. It contains various settings that can be changed when running the application. The most notable of these are the directories that will be used by the application (found under the directories node). These should be configured to point to existing paths on the filesystem; otherwise the application will not be able to start.
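As an illustration, the directories section could look roughly like this. The node and directory names below are hypothetical; check the actual config.xml in the resources folder for the exact schema.

```xml
<!-- Hypothetical excerpt of config.xml: node names are illustrative only. -->
<configuration>
  <directories>
    <logDirectory>/var/nirdizati/logs</logDirectory>
    <trainingDirectory>/var/nirdizati/training_params</trainingDirectory>
    <resultDirectory>/var/nirdizati/results</resultDirectory>
  </directories>
</configuration>
```

Each of these paths must exist before the application is started.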

Note that the project has only been built with Java 8; building is not tested against Java 9 and beyond.

Building

The application is built with the Maven build system. Once Maven is installed on your system, you can package the project into a war package by running mvn package inside the root directory of the project. Please note that the application relies on the ZK EE repository, which requires an access key.

Deploying

Once the application is built, it can be deployed to a regular Java servlet container. We run it on Tomcat 8.5; other servlet containers are not tested.

Student project contest info

Poster repository can be found here.

Contributors

dependabot[bot], raboczi, verenich, yesanton, zukkari


Issues

Implement bar chart for prefix bucketing

Prefix bucketing is special in the sense that it produces 15 result files. This means the user has to be able to switch between those 15 files, since they all contain different results.

We need to think of a way to present this to the user.

Make it possible to exclude some attributes from predictions

When choosing a column type (static vs dynamic, categorical vs numeric), enable another, fifth option: "Do not use this column". If selected, the column will not appear in dataset_params.json and therefore will not be used for predictions. This may be desirable if a dataset contains values that are not known at runtime. I already exclude some columns based on hardcoded keywords, but in general there may be other columns with other names.

Implement status tracker

Currently there is a Jobs page that displays the status of jobs.

A more modern approach would be either a sidebar menu that displays the status of simulations, or a notification area that asynchronously notifies the user when a simulation has completed.

Better data for validation charts

  • Remove outliers from the detailed results
  • Remove values where nr_events==1 (too low accuracy anyway)
  • Replace R² with RMSE; possibly add other statistics
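For reference, RMSE can be computed in a few lines. This is a generic sketch, not the project's actual validation code; the real computation lives in the Python training scripts.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    assert len(actual) == len(predicted) and actual
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# RMSE penalises large errors more heavily than MAE does, which is one
# reason it can be preferable to R² for validation charts.
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # sqrt(4/3) ≈ 1.1547
```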

Be more strict with file names

Currently, when the user uploads a file, we don't check whether its name contains any unusual symbols. The user can set the file name to ../../xxx.xxx, which means the file will be created two directories above the specified directory. This is a known bug (path traversal) that should be fixed.
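One possible shape of the fix, sketched in Python (the real check would go into the upload handler): keep only the last path component of the uploaded name and whitelist the allowed characters.

```python
import os
import re

def safe_filename(name: str) -> str:
    """Reject path traversal and normalise an uploaded file name.

    A sketch of the kind of check the issue asks for, not the actual
    implementation.
    """
    # Take only the last path component, so "../../xxx.xxx" becomes "xxx.xxx".
    base = os.path.basename(name.replace("\\", "/"))
    # Allow letters, digits, dots, dashes and underscores; replace the rest.
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", base)
    if cleaned in ("", ".", ".."):
        raise ValueError("invalid file name: %r" % name)
    return cleaned
```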

Add ID for jobs

For the script side:
The ID will be passed as a parameter to the script; based on this ID the training file should be found, and the produced validation files should also start with this ID, e.g. ID_file.csv

For UI side:

  • Create an ID generator (probably a hash will be used for the ID)
  • Add the ID as a parameter for the job
  • Use the ID when fetching logs
  • Use the ID when calling the script
  • Some kind of mechanism to clean the training parameters directory and the result directory
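A hash-based generator could look like the sketch below. All names here are hypothetical: the issue only says a hash will probably be used, so the exact inputs to the hash and the ID length are assumptions.

```python
import hashlib
import time

def job_id(log_name: str, params: str) -> str:
    """Derive a short, filesystem-safe job ID by hashing the log name,
    the training parameters and the submission time (all assumed inputs)."""
    digest = hashlib.sha1(f"{log_name}|{params}|{time.time()}".encode()).hexdigest()
    return digest[:12]

def result_file(job: str, name: str) -> str:
    # Produced validation files start with the job ID, e.g. ID_file.csv.
    return f"{job}_{name}"
```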

Clustering does not generate a grid when selected alone.

When a hyperparameter that is not a learner is selected, and this parameter has properties, then no grid is generated for it. How should this be handled? Grids are only generated for learners; other parameters that have properties, such as clustering, are added as an additional row to those grids.

If no learner is selected but clustering is, how should this be handled? Should we then generate grids for all learners, e.g. with one row, if clustering is checked?

Add hyperparameters to learners

Add hyperparameters to learners.

These will be set by default, but the user will be able to change them based on his preferences.

Based on those settings, a JSON file will be generated and used when training the model.

Fields should be added to the training view based on the selected learner method.

Regression method == Random forest -> n_estimators + max_features

Regression method == Gradient boosting -> max_features + gbm_learning_rate

Implement log overview page

On this page the user will be able to review uploaded logs.

The page consists of a menu in which the user will be able to:

  • Choose a log to overview
  • See graphs based on:
    • Active traces
    • Active resources
    • Event occurrences
  • Group all graphs into a single view

A task on implementing the graph displays will come later.

Page preview can be found here

Date parsing

Implement date parsing in Kotlin, since the regular Java approach used before does not work.
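A common approach is to try a list of candidate formats in order. The sketch below is in Python for brevity; the same try-each-format idea translates directly to Kotlin with java.time.DateTimeFormatter. The format list is illustrative, not the project's actual set.

```python
from datetime import datetime

# Candidate timestamp formats commonly seen in event logs; extend as needed.
FORMATS = [
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%dT%H:%M:%S",
    "%d.%m.%Y %H:%M",
]

def parse_timestamp(value: str) -> datetime:
    """Try each known format in turn and return the first that matches."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised timestamp: {value!r}")
```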

Add model visualisation

Add model visualization with Chart.js library.

It was agreed that three plots will be used:

  • Scatter plot (true value vs predicted value; implement hexagonal binning if possible)
  • Line chart (number of events vs MAE; the data should be in days, not seconds, so divide the result by 86,400)
  • Histogram (feature importance: name vs importance, horizontal)
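The seconds-to-days conversion for the line chart is a one-liner; sketched here in Python (the actual transformation would happen wherever the chart data is prepared):

```python
SECONDS_PER_DAY = 86_400

def mae_seconds_to_days(points):
    """Convert (nr_events, mae_in_seconds) pairs to (nr_events, mae_in_days)."""
    return [(n, mae / SECONDS_PER_DAY) for n, mae in points]
```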

Implement validation view

This view would provide the user with information about completed runs.

This page should also let the user export trained models into the runtime component.

A separate issue will be created for model importing, since I feel this will be quite a task: we need a format suitable for the runtime component to be able to use the model.

Prototype for this page can be found here

Learn why job hangs

Currently, when executing the script, the job hangs without any output. We need to find out why that happens.

Investigate failure with XGBoost

Whenever a job with XGBoost is run, the following error is produced:

/home/stanislav/anaconda3/lib/python3.6/site-packages/pandas/core/groupby.py:4281: FutureWarning: using a dict with renaming is deprecated and will be removed in a future version
(53792, 27)
(13491, 27)
  return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
Bucketing prefixes...
Fitting pipeline for bucket 1...
Traceback (most recent call last):
  File "train.py", line 165, in <module>
    pipelines[bucket].fit(dt_train_bucket, train_y)
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/sklearn/pipeline.py", line 248, in fit
    Xt, fit_params = self._fit(X, y, **fit_params)
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/sklearn/pipeline.py", line 213, in _fit
    **fit_params_steps[name])
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/memory.py", line 362, in __call__
    return self.func(*args, **kwargs)
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/sklearn/pipeline.py", line 581, in _fit_transform_one
    res = transformer.fit_transform(X, y, **fit_params)
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/sklearn/pipeline.py", line 739, in fit_transform
    for name, trans, weight in self._iter())
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 779, in __call__
    while self.dispatch_one_batch(iterator):
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 625, in dispatch_one_batch
    self._dispatch(tasks)
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 588, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 111, in apply_async
    result = ImmediateResult(func)
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 332, in __init__
    self.results = batch()
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 131, in <listcomp>
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/sklearn/pipeline.py", line 581, in _fit_transform_one
    res = transformer.fit_transform(X, y, **fit_params)
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/sklearn/base.py", line 520, in fit_transform
    return self.fit(X, y, **fit_params).transform(X)
  File "/home/stanislav/git/nirdizati-training-ui/PredictiveMethods/transformers/AggregateTransformer.py", line 28, in transform
    dt_numeric = X.groupby(self.case_id_col)[self.num_cols].agg({'mean':np.mean, 'max':np.max, 'min':np.min, 'sum':np.sum, 'std':np.std})
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/pandas/core/groupby.py", line 4281, in aggregate
    return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/pandas/core/groupby.py", line 3714, in aggregate
    result, how = self._aggregate(arg, _level=_level, *args, **kwargs)
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/pandas/core/base.py", line 461, in _aggregate
    result = _agg(arg, lambda fname,
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/pandas/core/base.py", line 429, in _agg
    result[fname] = func(fname, agg_how)
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/pandas/core/base.py", line 462, in <lambda>
    agg_how: _agg_1dim(self._selection, agg_how))
  File "/home/stanislav/anaconda3/lib/python3.6/site-packages/pandas/core/base.py", line 410, in _agg_1dim
    raise SpecificationError("nested dictionary is ambiguous "
pandas.core.base.SpecificationError: nested dictionary is ambiguous in aggregation
<2017-12-22 22:23:01,872> <DEBUG> <SimulationJob.class:85> <Script finished running...>
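The SpecificationError comes from the dict-renaming form of groupby aggregation, which pandas deprecated (see the FutureWarning at the top of the log) and later removed. A likely fix, sketched below on toy data, is to pass a list of aggregation functions instead and flatten the resulting columns; the actual change would go into AggregateTransformer.py.

```python
import pandas as pd

df = pd.DataFrame({
    "case_id": ["a", "a", "b"],
    "amount": [1.0, 3.0, 5.0],
})

# Old (removed) form that raises SpecificationError on newer pandas:
#   df.groupby("case_id")["amount"].agg({"mean": np.mean, "max": np.max, ...})
# New form: pass a list of function names; pandas names columns after them.
agg = df.groupby("case_id")[["amount"]].agg(["mean", "max", "min", "sum", "std"])

# Flatten the MultiIndex columns into "amount_mean", "amount_max", ...
agg.columns = ["_".join(col) for col in agg.columns]
```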

Implement tooltips

Implement tooltips so that when hovering over something, the user is told what this or that button does.

This will lower the learning curve for using this tool.

Add proper validation for user input

Currently, validation for hyperparameter fields is mostly absent.

When the user enters an incorrect value, the value resets to 0.0 or 0 depending on the field.
Ideally, if the user leaves a field empty, the validator would say that the field cannot be empty and disable the construct model button until all required fields are filled.

Improve logging

Improve the logging functionality so that it is easier to debug when bugs occur.

Implement basic training view

Implement a view that gives the user a quick way to run a simulation.

The default settings currently used are:

  • Encoding: frequency
  • Bucketing method: single
  • Regression method: random forest

Prototype for this page can be found here

This page should provide the following functionality for the user:

  • Select a log for analysis
  • Select a prediction type
  • Provide the possibility to switch to advanced settings

Replace close button with 'Visualize model' button

In the job tracker, the close button should be moved to the upper right corner of the screen as an X sign.

The old button should be replaced with a 'Visualize model' button that changes the page content to the 'Validation' view in read-only mode, with the clicked job's attributes as the active choice.

Documentation

The code should be documented.

This would make the application more maintainable and help identify unneeded code snippets or places where the code should be changed.

Work on look and feel

Work on the look and feel of the application. The application should have an attractive and modern look, not like a website from the 90s (e.g. www.ordi.ee).

We should probably look into SASS for ZK here (ZUSS). This would provide a better, more modern way to write CSS.

ZK also supports transition animations, which would give the application a smoother look.

Implement landing page

This will be the page the user lands on when connecting to the application for the first time.
It will let the user either upload a log or continue with an existing one, and redirect the user to the appropriate page.

Automatic generation of dataset configs

Currently, these files are created manually. To get a step closer to self-service, we need to generate them automatically from the users' logs, either XES or CSV. Example

Two dimensions:

  • static vs dynamic attributes - depending on whether they change throughout the case
  • numeric vs categorical

label_num_cols and label_cat_cols indicate the columns that need to be predicted
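The two dimensions above can be inferred mechanically from the log. The sketch below is only an illustration of that classification logic; the output key names are assumptions, not the real dataset_params.json schema.

```python
def classify_columns(rows, case_id_col):
    """Split columns along the two dimensions the issue describes:
    static vs dynamic (does the value change within a case?) and
    numeric vs categorical (do all values parse as numbers?).

    `rows` is a list of dicts, e.g. from csv.DictReader. The output
    keys are hypothetical, not the actual config schema.
    """
    cols = [c for c in rows[0] if c != case_id_col]
    config = {"static_num": [], "static_cat": [], "dynamic_num": [], "dynamic_cat": []}
    for col in cols:
        # Collect the distinct values this column takes within each case.
        per_case = {}
        for r in rows:
            per_case.setdefault(r[case_id_col], set()).add(r[col])
        static = all(len(v) == 1 for v in per_case.values())
        try:
            [float(r[col]) for r in rows]
            numeric = True
        except ValueError:
            numeric = False
        key = ("static_" if static else "dynamic_") + ("num" if numeric else "cat")
        config[key].append(col)
    return config
```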

Use hyperoptimized parameters if available

When a user selects a training config (or it is pre-selected), we need to check whether we have optimal learner hyperparameters for that config in optimal_params/{log_name_wo_extension}.json. If so, these parameters should be displayed in the box and passed to training_params/{log_name_wo_extension}.json. Otherwise, this file should be created from the default parameters.
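The flow described above can be sketched as follows. The file layout comes from the issue; the default parameter values and function name are illustrative assumptions.

```python
import json
import os

DEFAULT_PARAMS = {"n_estimators": 300, "max_features": "sqrt"}  # illustrative defaults

def load_training_params(log_name_wo_extension, optimal_dir="optimal_params",
                         training_dir="training_params"):
    """Use hyperoptimized parameters if a per-log file exists, otherwise
    fall back to defaults; write the result to training_params/<log>.json."""
    optimal_path = os.path.join(optimal_dir, log_name_wo_extension + ".json")
    if os.path.exists(optimal_path):
        with open(optimal_path) as f:
            params = json.load(f)
    else:
        params = dict(DEFAULT_PARAMS)
    os.makedirs(training_dir, exist_ok=True)
    with open(os.path.join(training_dir, log_name_wo_extension + ".json"), "w") as f:
        json.dump(params, f, indent=2)
    return params
```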

Implement advanced settings view

This should provide the same functionality as #7, and in addition:

  • User must be able to specify encoding
  • User must be able to specify bucketing method
  • User must be able to select regression method

A more detailed specification can be found here and may change over time.

I will try to make this section as abstract as possible so that it is easy to add options. It will probably be an XML file with values that can be added or removed as functionality is added.

Unit/Integration tests

Write unit tests to cover as much code as possible.

Create Selenium integration tests that cover the already implemented use cases.

Rework train grid

Rework the train grid using the new GridGenerator. The current grid is really buggy and needs to be reworked.

Get acquainted with ZK

Work through the documentation found here. This gives a quick overview of the framework and its main components.

Implement script runner

Since @verenich proposed that we use his Python scripts, I will need to implement a script runner that runs Python scripts in a shell with user-supplied parameters and logs.

We also need a results fetcher that collects predictions and converts them into a format suitable for the user interface.

This will be the component that connects the UI and the training backend.

Replace modal window in job tracker

The modal window in the job tracker should be replaced with an interactive layout change:

When the user clicks a job in the tracker, the grid content should show the job metadata with 'Visualize model' and 'Deploy to runtime' buttons. When the user clicks the grid content again, the job tracker shows the jobs as before.

Think of a better way to manage predictive data

  • The dataset reference ("tag") should probably be removed, so:
  • Instead of keeping all training configs in a single training_params.json, consider one JSON file per dataset, with the name possibly matching the name of the log. The same goes for dataset_params.json

Implement upload page

This page will let the user upload his logs.
How this page should look can be seen here

On this page user will be able to:

  • Choose a log for uploading
  • Upload his log in .XES format
  • Be given instructions on how to convert his log to .XES format

After a successful upload, the user will be prompted to continue to the log overview in the appropriate view.

Rework job running using coroutines

Kotlin allows the use of coroutines. These can be used for async tasks, which means the whole Worker class can be omitted and replaced with a few lines of coroutine code.

Fix issues discussed in meeting

  1. move buttons closer together (upload and continue)
  2. move timestamp col to activity and caseid columns window
  3. add more possibilities to parse date
  4. threshold > 20
  5. resource should be always dynamic categorical
  6. upload log modal make into a single window
  7. do not show activity column in select columns grid
  8. Dynamic categorical -> event attribute - categorical, numeric
  9. Static -> case attribute -> categorical, numeric
  10. Advanced mode -> Multiple models
  11. Remove evertyhing from basic -> still use params
  12. Do not show log if no json is found
  13. Job tracker should scale better on smaller screens
  14. Move buttons straight to tracker
  15. Move log file to tracker
  16. Actual vs predicted divide x by 86400
  17. Mean average error -> mean absolute error
  18. Optimize data and chart generation
