Dealing With Concept Drifts in Process Mining
Although most business processes change over time, contemporary process mining techniques tend to analyze these processes as if they were in a steady state. Processes may change suddenly or gradually. The drift may be periodic (e.g., because of seasonal influences) or one-of-a-kind (e.g., the effects of new legislation). For process management, it is crucial to discover and understand such concept drifts in processes. This paper presents a generic framework and specific techniques to detect when a process changes and to localize the parts of the process that have changed. Different features are proposed to characterize relationships among activities. These features are used to discover differences between successive populations. The approach has been implemented as a plug-in of the ProM process mining framework and has been evaluated using both simulated event data exhibiting controlled concept drifts and real-life event data from a Dutch municipality.
- When the process is stable and enough example traces have been recorded in the event log, it is possible to discover a high-quality process model that can be used for performance analysis, compliance checking, and prediction.
- Unfortunately, most processes are not in steady-state. In today’s dynamic marketplace, it is increasingly necessary for enterprises to streamline their processes so as to reduce costs and to improve performance.
DISADVANTAGES OF EXISTING SYSTEM:
- characterization in an offline setting.
- Change point detection: To detect concept drift in processes, i.e., to detect that a process change has taken place.
- Change localization and characterization.
- Change process discovery: Having identified, localized, and characterized the changes, it is necessary to put all of these in perspective.
- In this paper, we have introduced the topic of concept drift in process mining, i.e., analyzing process changes based on event logs.
- We proposed feature sets and techniques to effectively detect the changes in event logs and identify the regions of change in a process.
ADVANTAGES OF PROPOSED SYSTEM:
- Heterogeneity of cases arising because of process changes can be effectively dealt with by detecting concept drifts.
- Detecting concept drifts helps to support or improve operational processes and to obtain accurate insight into process executions at any instant of time.
- Feature extraction and selection
- Generate populations
- Compare populations
- Interactive visualization
- Analyze changes
Feature extraction and selection:
This step pertains to defining the characteristics of the traces in an event log. In this paper, we have defined four features that characterize the control-flow perspective of process instances in an event log. Depending on the focus of analysis, we may define additional features; e.g., if we are interested in analyzing changes in the organizational/resource perspective, we may consider features derived from social networks as a means of characterizing the event log. In addition to feature extraction, this step also involves feature selection. Feature selection is important when the number of features extracted is large.
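As a rough illustration of what a control-flow feature can look like, the sketch below computes one simplified feature per trace: how many occurrences of activity a are eventually followed by activity b. This is an assumed stand-in for illustration, not necessarily one of the four features defined in the paper. The class name `FollowsFeatureSketch`, the method names, and the activity labels are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a simplified "follows" feature, not the paper's exact definition.
class FollowsFeatureSketch {

    /** Counts the occurrences of a in the trace that are followed, at any later position, by b. */
    static int followsCount(List<String> trace, String a, String b) {
        int count = 0;
        boolean bSeenLater = false;
        // Scan right to left so we always know whether b occurs after position i.
        for (int i = trace.size() - 1; i >= 0; i--) {
            String activity = trace.get(i);
            if (activity.equals(a) && bSeenLater) {
                count++;
            }
            if (activity.equals(b)) {
                bSeenLater = true;
            }
        }
        return count;
    }

    /** Turns an event log (one trace per case, in order) into one feature value per trace. */
    static double[] toStream(List<List<String>> log, String a, String b) {
        double[] stream = new double[log.size()];
        for (int i = 0; i < log.size(); i++) {
            stream[i] = followsCount(log.get(i), a, b);
        }
        return stream;
    }

    public static void main(String[] args) {
        // Toy log with made-up activity names.
        List<List<String>> log = Arrays.asList(
                Arrays.asList("register", "check", "approve"),
                Arrays.asList("register", "approve", "check"));
        System.out.println(Arrays.toString(toStream(log, "check", "approve")));
    }
}
```

Computing one such value per trace for each activity pair of interest yields the feature values on which the later steps operate.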
Generate populations:
An event log can be transformed into a data stream based on the features selected in the previous step. This step deals with defining the sample populations for studying the changes in the characteristics of traces. Different criteria/scenarios may be considered for generating these populations from the data stream. We have considered non-overlapping, continuous, and fixed-size windows for defining the populations. We may also consider, for example, non-continuous windows (there is a gap between two populations), adaptive windows (windows can be of different lengths), and so on, which are more appropriate for dealing with gradual and recurring drifts.
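The fixed-size, non-overlapping, continuous windowing described above can be sketched as follows. The class and method names are hypothetical, and dropping trailing values that do not fill a whole window is an assumption made here for simplicity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of fixed-size, non-overlapping, continuous windows.
class WindowSketch {

    /**
     * Splits a feature stream into consecutive windows of the given size.
     * Assumption: trailing values that do not fill a complete window are dropped.
     */
    static List<double[]> fixedWindows(double[] stream, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("window size must be positive");
        }
        List<double[]> windows = new ArrayList<>();
        for (int start = 0; start + size <= stream.length; start += size) {
            windows.add(Arrays.copyOfRange(stream, start, start + size));
        }
        return windows;
    }

    public static void main(String[] args) {
        double[] stream = {1, 0, 1, 1, 0, 0, 1};
        for (double[] window : fixedWindows(stream, 3)) {
            System.out.println(Arrays.toString(window));
        }
    }
}
```

Each pair of successive windows then serves as the two populations to be compared. Non-continuous or adaptive windows would need a different loop (a gap between windows, or a size that changes over time).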
Compare populations:
Once the sample populations are generated, the next step is to analyze these populations for any change in characteristics. In this paper, we advocate the use of statistical hypothesis tests for comparing populations. The null hypothesis in statistical tests states that distributions (or means, or standard deviations) of the two sample populations are equal. Depending on desired assumptions and the focus of analysis, different statistical tests can be used.
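To make the comparison step concrete, here is a minimal sketch of one possible test, the two-sample Kolmogorov-Smirnov test, whose null hypothesis is that both populations come from the same distribution. The text above does not prescribe a specific test, so this choice is for illustration only. The p-value uses a standard asymptotic series approximation with a small-sample correction, which is itself an assumption and may be inaccurate for very small populations. All names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch: two-sample Kolmogorov-Smirnov test with an asymptotic p-value.
class KsTestSketch {

    /** Largest absolute difference between the two empirical distribution functions. */
    static double ksStatistic(double[] x, double[] y) {
        if (x.length == 0 || y.length == 0) {
            throw new IllegalArgumentException("both populations must be non-empty");
        }
        double[] a = x.clone();
        double[] b = y.clone();
        Arrays.sort(a);
        Arrays.sort(b);
        int i = 0, j = 0;
        double d = 0.0;
        while (i < a.length && j < b.length) {
            double v = Math.min(a[i], b[j]);
            // Advance past all values equal to v in both samples so ties are handled together.
            while (i < a.length && a[i] <= v) i++;
            while (j < b.length && b[j] <= v) j++;
            d = Math.max(d, Math.abs((double) i / a.length - (double) j / b.length));
        }
        return d;
    }

    /** Approximate significance probability for statistic d with sample sizes n and m. */
    static double ksPValue(double d, int n, int m) {
        double ne = (double) n * m / (n + m);
        double lambda = (Math.sqrt(ne) + 0.12 + 0.11 / Math.sqrt(ne)) * d;
        if (lambda < 0.2) {
            return 1.0; // the series below does not converge near zero; p is effectively 1 here
        }
        double sum = 0.0;
        double sign = 1.0;
        for (int k = 1; k <= 100; k++) {
            double term = sign * Math.exp(-2.0 * k * k * lambda * lambda);
            sum += term;
            if (Math.abs(term) < 1e-12) break;
            sign = -sign;
        }
        return Math.min(1.0, Math.max(0.0, 2.0 * sum));
    }

    public static void main(String[] args) {
        double[] before = {1, 1, 2, 1, 2, 1, 1, 2};
        double[] after = {4, 5, 4, 5, 5, 4, 4, 5};
        double d = ksStatistic(before, after);
        System.out.println("D = " + d + ", p = " + ksPValue(d, before.length, after.length));
    }
}
```

A small p-value suggests rejecting the null hypothesis, i.e., the two populations of trace characteristics differ. For production use, an established statistics library would be a safer choice than this hand-rolled approximation.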
Interactive visualization:
The results of comparative studies on the populations of trace characteristics can be intuitively presented to an analyst. For example, the significance probabilities of the hypothesis tests can be visualized as a drift plot. Troughs in such a drift plot signify a change in the significance probability, thereby implying a change in the characteristics of traces.
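Reading troughs off a drift plot can also be approximated programmatically. The sketch below takes a series of significance probabilities (one per comparison of successive populations) and reports the indices of local minima that fall below a threshold. The threshold `alpha`, the class name, and the rule for picking one index per flat trough are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: locating troughs in a series of significance probabilities.
class DriftPlotSketch {

    /**
     * Returns indices i where pValues[i] is below alpha and is a local minimum.
     * For a flat trough, only the last index of the plateau is reported.
     */
    static List<Integer> troughs(double[] pValues, double alpha) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < pValues.length; i++) {
            boolean notAboveLeft = i == 0 || pValues[i] <= pValues[i - 1];
            boolean belowRight = i == pValues.length - 1 || pValues[i] < pValues[i + 1];
            if (pValues[i] < alpha && notAboveLeft && belowRight) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up p-values for illustration only.
        double[] pValues = {0.9, 0.8, 0.02, 0.7, 0.9, 0.03, 0.01, 0.6};
        System.out.println("Candidate change points at comparisons: " + troughs(pValues, 0.05));
    }
}
```

Each reported index refers to a comparison between two successive populations, so mapping it back to a position in the event log depends on the window size used when the populations were generated.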
Analyze changes:
Visualization techniques such as the drift plot can assist in identifying the change points. Having identified that a change has taken place, this step deals with techniques that assist an analyst in characterizing and localizing the change and in discovering the change process.
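One simple way to approach change localization, sketched here under stated assumptions, is to compare how often each pair of activities occurs in direct succession before and after a detected change point and to report the pairs whose frequency shifts the most. This is a simplified heuristic for illustration, not the localization technique of the paper. The class name, the `x>y` key format, and the `minDelta` threshold are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: localizing a change via shifts in directly-follows frequencies.
class LocalizationSketch {

    /** Fraction of traces in which x is directly followed by y at least once, keyed as "x>y". */
    static Map<String, Double> followsRate(List<List<String>> traces) {
        Map<String, Double> rate = new TreeMap<>();
        if (traces.isEmpty()) {
            return rate;
        }
        for (List<String> trace : traces) {
            Set<String> seen = new HashSet<>();
            for (int i = 0; i + 1 < trace.size(); i++) {
                seen.add(trace.get(i) + ">" + trace.get(i + 1));
            }
            for (String pair : seen) {
                rate.merge(pair, 1.0, Double::sum);
            }
        }
        rate.replaceAll((pair, count) -> count / traces.size());
        return rate;
    }

    /** Pairs whose rate changes by at least minDelta across the change point (after minus before). */
    static Map<String, Double> changedPairs(List<List<String>> log, int changePoint, double minDelta) {
        Map<String, Double> before = followsRate(log.subList(0, changePoint));
        Map<String, Double> after = followsRate(log.subList(changePoint, log.size()));
        Set<String> pairs = new TreeSet<>(before.keySet());
        pairs.addAll(after.keySet());
        Map<String, Double> delta = new TreeMap<>();
        for (String pair : pairs) {
            double d = after.getOrDefault(pair, 0.0) - before.getOrDefault(pair, 0.0);
            if (Math.abs(d) >= minDelta) {
                delta.put(pair, d);
            }
        }
        return delta;
    }

    public static void main(String[] args) {
        // Toy log: the order of b and c swaps after the second trace.
        List<List<String>> log = List.of(
                List.of("a", "b", "c"), List.of("a", "b", "c"),
                List.of("a", "c", "b"), List.of("a", "c", "b"));
        System.out.println(changedPairs(log, 2, 0.5));
    }
}
```

Pairs with a large positive or negative shift point to the region of the process where behaviour changed. A statistical test per pair, rather than a fixed threshold, would be a more careful variant.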
- System : Pentium IV 2.4 GHz.
- Hard Disk : 40 GB.
- Floppy Drive : 1.44 MB.
- Monitor : 15-inch VGA Colour.
- Mouse : Logitech.
- RAM : 512 MB.
- Operating system : Windows XP/7.
- Coding Language : Java/J2EE
- IDE : NetBeans 7.4
- Database : MySQL
- R. P. Jagadeesh Chandra Bose, Wil M. P. van der Aalst, Indrė Žliobaitė, and Mykola Pechenizkiy, "Dealing With Concept Drifts in Process Mining", IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 1, January 2014.