August 3, 2017

Nirdizati Training

Authors: Stanislav Mõškovski, Ilya Verenich, Simon Raboczi, Marlon Dumas, Marcello La Rosa

The Training plugin allows users to build predictive models for several types of predictions. It can predict generic process properties, such as the remaining time until case completion, the most likely next activity, and whether a case will take longer than a user-defined time threshold. It can also build models that predict log-specific case properties, for example the total application cost in an insurance claims handling process.

Note. Before you import your log into Apromore, it must be pre-processed to add all the required attributes. Read more here.

  • At a minimum, a user only needs to select the event log and the variable to be predicted.

  • Experienced users may switch to the advanced mode to fine-tune training configuration and even train multiple models at once.

  • Once the necessary models have been built, the tool assesses their accuracy with respect to multiple evaluation metrics using a held-out validation set.

  • Trained models are saved in the Apromore database and can be pushed to the Runtime component to make predictions for ongoing cases.

A screencast of this plugin can be found here.

For non-Apromore users, a stand-alone version of the plugin can be accessed at https://training.nirdizati.org


Architecture

When a user uploads a log, the tool extracts and categorizes its data attributes. To construct feature vectors from business process traces, each attribute must be categorized along two dimensions: as either a static case attribute or a dynamic event attribute, and as either numeric or categorical. Both steps are performed automatically when the log is uploaded, but the user can override the automatic attribute definitions.
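The categorization described above can be illustrated with a minimal sketch. It assumes the log is a list of event dictionaries keyed by a `case_id` field; the function name `categorize_attributes` and the heuristics (an attribute is static if its value never changes within a case, numeric if all its values are numbers) are assumptions for illustration, not Nirdizati's actual rules.

```python
from collections import defaultdict


def categorize_attributes(events, case_key="case_id"):
    """Guess scope (static/dynamic) and type (numeric/categorical) per attribute."""
    # Collect, per attribute, the set of values observed within each case.
    values_per_case = defaultdict(lambda: defaultdict(set))
    for event in events:
        for name, value in event.items():
            if name != case_key:
                values_per_case[name][event[case_key]].add(value)

    result = {}
    for name, per_case in values_per_case.items():
        # Static: the value never changes within any single case.
        is_static = all(len(vals) == 1 for vals in per_case.values())
        all_values = set().union(*per_case.values())
        # Numeric: every observed value is an int or a float (bool excluded).
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in all_values
        )
        result[name] = (
            "static" if is_static else "dynamic",
            "numeric" if is_numeric else "categorical",
        )
    return result


# Hypothetical three-event log covering two cases.
log = [
    {"case_id": "c1", "activity": "Register", "amount": 100, "cost": 5.0},
    {"case_id": "c1", "activity": "Approve", "amount": 100, "cost": 7.5},
    {"case_id": "c2", "activity": "Register", "amount": 250, "cost": 5.0},
]
schema = categorize_attributes(log)
```

A user override then amounts to replacing an entry, for example `schema["amount"] = ("static", "categorical")`.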

The log is then internally split into a training set and a validation set. The former is used to train the model, while the latter is used to evaluate its predictive power. Next, every trace of the business process must be represented as a fixed-size feature vector in order to train a predictive model. To this end, we support four encoding techniques proposed in related work: last state encoding, frequency (aggregation) encoding, combined encoding and lossless index-based encoding.
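To make the encodings more concrete, here is a minimal sketch of three of them applied to the activity names of a single prefix. The activity alphabet, the function names and the zero-padding to `max_len` are assumptions of this sketch; real encoders would also cover other event and case attributes, and combined encoding is not shown.

```python
from collections import Counter

ACTIVITIES = ["Register", "Check", "Approve"]  # hypothetical activity alphabet


def one_hot(activity):
    """One-hot vector of an activity over the fixed alphabet."""
    return [1 if activity == a else 0 for a in ACTIVITIES]


def last_state_encoding(prefix):
    """Keep only the most recent event of the prefix."""
    return one_hot(prefix[-1])


def frequency_encoding(prefix):
    """Count how often each activity occurred; ordering is discarded."""
    counts = Counter(prefix)
    return [counts[a] for a in ACTIVITIES]


def index_based_encoding(prefix, max_len):
    """One one-hot slot per position, zero-padded up to max_len."""
    vector = []
    for i in range(max_len):
        if i < len(prefix):
            vector += one_hot(prefix[i])
        else:
            vector += [0] * len(ACTIVITIES)
    return vector


prefix = ["Register", "Check", "Check"]
last_state = last_state_encoding(prefix)              # [0, 1, 0]
frequency = frequency_encoding(prefix)                # [1, 2, 0]
index_based = index_based_encoding(prefix, max_len=4)
```

The index-based vector keeps the full order of events, which is why it is called lossless, at the price of a vector whose length grows with the prefix length.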

While some existing predictive process monitoring approaches train a single classifier on the whole event log, others employ a multi-classifier approach: they divide the prefix traces in the historical log into several buckets and fit a separate classifier for each bucket. At run-time, the most suitable bucket for the ongoing case is determined and the respective classifier is applied to make a prediction. We support four types of bucketing: zero bucketing (i.e. fitting a single classifier), state-based bucketing, clustering-based bucketing and prefix length-based bucketing.
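As an illustration of the multi-classifier idea, the following sketch implements prefix length-based bucketing only. The function names are hypothetical, and the fallback to the longest available bucket for unseen prefix lengths is an assumption of this sketch rather than documented tool behavior.

```python
from collections import defaultdict


def extract_prefixes(trace, min_len=1):
    """Return every prefix of a completed trace, shortest first."""
    return [trace[:n] for n in range(min_len, len(trace) + 1)]


def bucket_by_prefix_length(traces):
    """Group all prefixes of the historical traces by their length."""
    buckets = defaultdict(list)
    for trace in traces:
        for prefix in extract_prefixes(trace):
            buckets[len(prefix)].append(prefix)
    return dict(buckets)


def select_bucket(buckets, ongoing_case):
    """At run-time, pick the bucket that matches the ongoing case's length.

    Cases longer than anything seen in training fall back to the
    longest available bucket (an assumption of this sketch).
    """
    return min(len(ongoing_case), max(buckets))


historical = [["A", "B", "C"], ["A", "C"]]
buckets = bucket_by_prefix_length(historical)
chosen = select_bucket(buckets, ["A", "B"])
```

One classifier would then be fitted per key of `buckets`, and `select_bucket` decides which of them is applied to an ongoing case. Zero bucketing corresponds to placing all prefixes into a single bucket.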

For each bucket of feature vectors, we train a predictive model using one of four supported machine learning techniques: decision tree, random forest, gradient boosting and extreme gradient boosting (XGBoost). For each technique, a user may manually enter the values of the most important hyperparameters. For example, when fitting a gradient boosting model, a user may choose the number of weak learners (trees), the number of features to be used for each split and the learning rate.
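The gradient boosting hyperparameters mentioned above map naturally onto the constructor arguments of scikit-learn's `GradientBoostingClassifier`. The sketch below assumes scikit-learn is installed and uses made-up toy data; it illustrates the three settings and is not a description of how the tool itself is implemented.

```python
from sklearn.ensemble import GradientBoostingClassifier

# Toy feature vectors for one bucket (e.g. frequency-encoded prefixes)
# and a binary label such as "case exceeds the time threshold".
X = [[1, 0, 0], [1, 1, 0], [1, 2, 0], [1, 1, 1], [2, 0, 0], [2, 1, 1]]
y = [0, 0, 1, 1, 0, 1]

model = GradientBoostingClassifier(
    n_estimators=50,    # number of weak learners (trees)
    max_features=2,     # number of features considered for each split
    learning_rate=0.1,  # shrinkage applied to each tree's contribution
    random_state=0,     # fixed seed so the sketch is reproducible
)
model.fit(X, y)
predictions = model.predict(X)
```

A lower learning rate usually needs more trees to reach the same fit, so these two values are typically tuned together.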

The predictive power of the trained model(s) can be evaluated on a held-out validation set. By default, the user sees the accuracy averaged over all partial traces of a given length, i.e. after a given number of events have completed.
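A per-prefix-length evaluation of this kind can be sketched as follows for remaining time prediction, using mean absolute error (MAE) as the metric. The choice of MAE, the function name and the sample values are assumptions for illustration only.

```python
from collections import defaultdict


def mae_by_prefix_length(records):
    """Average absolute error per prefix length.

    records: iterable of (prefix_length, actual, predicted) tuples.
    """
    errors = defaultdict(list)
    for prefix_length, actual, predicted in records:
        errors[prefix_length].append(abs(actual - predicted))
    return {n: sum(errs) / len(errs) for n, errs in sorted(errors.items())}


# Hypothetical remaining-time predictions (in hours) on validation prefixes.
validation = [
    (1, 10.0, 14.0),
    (1, 8.0, 6.0),
    (2, 6.0, 7.0),
    (2, 4.0, 4.0),
]
mae = mae_by_prefix_length(validation)  # {1: 3.0, 2: 0.5}
```

Plotting such a dictionary against prefix length shows how quickly predictions improve as more events of a case become known.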


Supported predictions

Currently, the following prediction types are available in Nirdizati Training:

  • Remaining time prediction
  • Next activity prediction
  • Outcome prediction
  • Various other “static” case attributes, e.g. total cost
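To show what these targets look like for a single training example, the sketch below derives the first three labels from a completed trace and a prefix length. The trace layout (a list of activity and timestamp pairs), the function name and the sample case are hypothetical.

```python
from datetime import datetime, timedelta


def prediction_targets(trace, prefix_len, threshold):
    """Derive training labels for the prefix of the first prefix_len events.

    trace: list of (activity, timestamp) tuples of a completed case.
    """
    last_seen = trace[prefix_len - 1][1]
    case_start, case_end = trace[0][1], trace[-1][1]
    return {
        "remaining_time": case_end - last_seen,
        # None marks the end of the case (no further activity).
        "next_activity": trace[prefix_len][0] if prefix_len < len(trace) else None,
        # Outcome here: does the whole case take longer than the threshold?
        "slow_case": (case_end - case_start) > threshold,
    }


trace = [
    ("Register", datetime(2017, 8, 1, 9, 0)),
    ("Check", datetime(2017, 8, 1, 12, 0)),
    ("Approve", datetime(2017, 8, 2, 10, 0)),
]
targets = prediction_targets(trace, prefix_len=2, threshold=timedelta(hours=24))
```

For this case, after two events the remaining time is 22 hours, the next activity is "Approve", and the case counts as slow because its total duration of 25 hours exceeds the 24-hour threshold. A static case attribute such as total cost would simply be read from the completed case and attached to each of its prefixes.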