{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "# Conformance Checking\n", "*by: Sebastiaan J. van Zelst*" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "tags": [] }, "source": [ "In this tutorial, we'll be focusing on *conformance checking*.\n", "The idea behind conformance checking is simple: compute to what degree a given process model conforms to the execution of a process, as recorded in the event data.\n", "We are going to use the same process model as we have seen before, i.e., based on our [running example event log](data/running_example.csv):" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "![Running example BPMN-based process model describing the behavior of the simple process that we use in this tutorial](img/bpmn_running_example.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "tags": [] }, "source": [ "However, to check conformance w.r.t. the model, we're going to use a slightly [different event log](data/running_example_broken.csv).\n", "In this tutorial, we'll consider two types of techniques: *token-based-replay* and *alignments*.\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "## Token-Based-Replay" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "tags": [] }, "source": [ "In order to understand token-based-replay, we first need to cover a bit of Petri net theory.\n", "Let's use the Petri net based on the clean [running example event log](data/running_example.csv) as an example." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "fragment" }, "tags": [] }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import pandas as pd\n", "import pm4py\n", "df = pm4py.format_dataframe(pd.read_csv('data/running_example.csv', sep=';'), case_id='case_id', activity_key='activity',\n", "                            timestamp_key='timestamp')\n", "pn, im, fm = pm4py.discover_petri_net_inductive(df)\n", "pm4py.view_petri_net(pn, im, fm)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "tags": [] }, "source": [ "### Places and Transitions\n", "Observe that the Petri net consists of two different types of nodes: circles and rectangles.\n", "We refer to the circles as *places* and we refer to the rectangles as *transitions*.\n", "Furthermore, notice that a place can only be connected (by means of an arc) to a transition.\n", "Similarly, a transition can only be connected (by means of an arc) to a place.\n", "Hence, *places never connect directly to places* and *transitions never connect directly to transitions*." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "tags": [] }, "source": [ "### Tokens, Enabledness and Transition Firing\n", "There is one place in the model containing a black 'dot'.\n", "This dot is referred to as a *token*.\n", "For convenience, let's call the place containing the token 'source'.\n", "A transition can consume and produce tokens, referred to as *firing a transition*.\n", "A transition is allowed to fire if all of its 'incoming places' contain at least one token.\n", "Any transition for which this property holds is referred to as an *enabled transition*.\n", "In the example net, only the 'source' place contains a token.\n", "Consequently, the only transition that has a token in all of its 'incoming places' is the transition *register request*, i.e., it is enabled.\n", "If we decide to fire an enabled transition, it consumes one token from each of its 'incoming places' and it produces a token in each of its 'outgoing places'.\n", "For example, if we *fire* the *register request* transition, it consumes the token in the source place and it produces a fresh token in its outgoing place (i.e., the place connected to it by means of an outgoing arc).\n", "\n", "*It is extremely important to note that there is no relationship between token production and consumption, i.e., tokens that are consumed cease to exist, tokens that are produced are always \"fresh tokens\"*.\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "tags": [] }, "source": [ "### Token-Based-Replay - The Basics\n", "When we use token-based-replay, we are effectively mimicking behavior observed in the event log in the context of a given process model.\n", "\n", "Let's assume that in the event log, we observe the trace: \n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" }, "tags": [] }, "source": [ "$\\langle \\text{register request, examine casually, check ticket, 
decide, reject request} \\rangle$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "tags": [] }, "source": [ "The token-based-replay algorithm simply mimics the trace in the model and keeps track of the number of tokens it needs to produce and consume to *replay* the trace.\n", "For example, the first activity in the trace, i.e., *register request*, can be directly mimicked by consuming the token in the source place.\n", "To subsequently fire the *examine casually* transition, we need the token produced by firing the *register request* transition.\n", "That token first needs to be consumed by the *black* transition (referred to as an invisible transition) that connects to the output place of the *register request* transition.\n", "Said transition will produce two fresh tokens (observe that it has two outgoing places), one of which can subsequently be consumed by the *examine casually* transition.\n", "Essentially, the token-based-replay algorithm keeps repeating this rationale until it has mimicked the complete trace.\n", "\n", "In the previous example, the trace can be completely mimicked (or *replayed*) by the model.\n", "However, consider the trace:" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" }, "tags": [] }, "source": [ "$\\langle \\text{register request, examine casually, check ticket, reject request} \\rangle$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "tags": [] }, "source": [ "When analyzing the trace, we observe that a decision is missing.\n", "The token-based-replay algorithm can detect this: because the *decide* transition is absent from the trace, mimicking it leaves tokens remaining in the input places of the *decide* transition and, similarly, leaves tokens missing in the input place of the *reject request* transition.\n", "\n", "For a given event log, the token-based-replay 
algorithm simply mimics every trace in the event log and keeps track of the number of detected problems (missing and remaining tokens when mimicking the behavior).\n", "It subsequently compares the detected number of problems with the total amount of 'correct behavior' and produces a 'conformity score' (often referred to as a 'fitness' score) between $0$ and $1$.\n", "If the score is $1$, no problems were detected.\n", "If the score is $0$, no normal behavior was detected." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [] }, "source": [ "### Token-Based-Replay in pm4py" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "fragment" }, "tags": [] }, "outputs": [ { "data": { "application/json": { "ascii": false, "bar_format": null, "colour": null, "elapsed": 0.00988912582397461, "initial": 0, "n": 0, "ncols": null, "nrows": 15, "postfix": null, "prefix": "replaying log with TBR, completed variants :: ", "rate": null, "total": 6, "unit": "it", "unit_divisor": 1000, "unit_scale": false }, "application/vnd.jupyter.widget-view+json": { "model_id": "fa4996a01e5c444896cb5920ac3f460b", "version_major": 2, "version_minor": 0 }, "text/plain": [ "replaying log with TBR, completed variants :: 0%| | 0/6 [00:00>', 'register request'),\n", " ('>>', None),\n", " ('examine thoroughly', 'examine thoroughly'),\n", " ('check ticket', 'check ticket'),\n", " ('decide', 'decide'),\n", " ('>>', None),\n", " ('reject request', 'reject request')],\n", " 'cost': 10002,\n", " 'visited_states': 7,\n", " 'queued_states': 22,\n", " 'traversed_arcs': 22,\n", " 'lp_solved': 1,\n", " 'fitness': 0.8888888888888888,\n", " 'bwc': 90002},\n", " {'alignment': [('register request', 'register request'),\n", " ('>>', None),\n", " ('check ticket', 'check ticket'),\n", " ('examine casually', 'examine casually'),\n", " ('>>', 'decide'),\n", " ('>>', 
None),\n", " ('pay compensation', 'pay compensation')],\n", " 'cost': 10002,\n", " 'visited_states': 7,\n", " 'queued_states': 23,\n", " 'traversed_arcs': 23,\n", " 'lp_solved': 2,\n", " 'fitness': 0.8888888888888888,\n", " 'bwc': 90002},\n", " {'alignment': [('register request', 'register request'),\n", " ('>>', None),\n", " ('>>', 'examine casually'),\n", " ('check ticket', 'check ticket'),\n", " ('decide', 'decide'),\n", " ('reinitiate request', '>>'),\n", " ('>>', None),\n", " ('pay compensation', 'pay compensation')],\n", " 'cost': 20002,\n", " 'visited_states': 8,\n", " 'queued_states': 27,\n", " 'traversed_arcs': 27,\n", " 'lp_solved': 5,\n", " 'fitness': 0.8,\n", " 'bwc': 100002},\n", " {'alignment': [('register request', 'register request'),\n", " ('>>', None),\n", " ('check ticket', 'check ticket'),\n", " ('examine thoroughly', 'examine thoroughly'),\n", " ('decide', 'decide'),\n", " ('>>', None),\n", " ('reject request', 'reject request')],\n", " 'cost': 2,\n", " 'visited_states': 7,\n", " 'queued_states': 24,\n", " 'traversed_arcs': 24,\n", " 'lp_solved': 1,\n", " 'fitness': 1.0,\n", " 'bwc': 100002},\n", " {'alignment': [('register request', 'register request'),\n", " ('>>', None),\n", " ('examine casually', 'examine casually'),\n", " ('check ticket', 'check ticket'),\n", " ('decide', 'decide'),\n", " ('reinitiate the request for real', '>>'),\n", " ('>>', 'reinitiate request'),\n", " ('>>', None),\n", " ('check ticket', 'check ticket'),\n", " ('examine casually', 'examine casually'),\n", " ('decide', 'decide'),\n", " ('>>', 'reinitiate request'),\n", " ('>>', None),\n", " ('examine casually', 'examine casually'),\n", " ('check ticket', 'check ticket'),\n", " ('decide', 'decide'),\n", " ('>>', None),\n", " ('reject request', 'reject request')],\n", " 'cost': 30004,\n", " 'visited_states': 18,\n", " 'queued_states': 59,\n", " 'traversed_arcs': 59,\n", " 'lp_solved': 19,\n", " 'fitness': 0.8235294117647058,\n", " 'bwc': 170002},\n", " {'alignment': 
[('register request', 'register request'),\n", " ('>>', None),\n", " ('>>', 'examine casually'),\n", " ('check ticket', 'check ticket'),\n", " ('>>', 'decide'),\n", " ('decide something', '>>'),\n", " ('>>', None),\n", " ('pay compensation', 'pay compensation')],\n", " 'cost': 30002,\n", " 'visited_states': 8,\n", " 'queued_states': 25,\n", " 'traversed_arcs': 25,\n", " 'lp_solved': 2,\n", " 'fitness': 0.6666666666666667,\n", " 'bwc': 90002}]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pn, im, fm = pm4py.discover_petri_net_inductive(df)\n", "pm4py.conformance_diagnostics_alignments(df_problems, pn, im, fm)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "tags": [] }, "source": [ "Like token-based-replay, alignments can also be used to quantify 'fitness':" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [] }, "outputs": [ { "data": { "application/json": { "ascii": false, "bar_format": null, "colour": null, "elapsed": 0.011507749557495117, "initial": 0, "n": 0, "ncols": null, "nrows": 15, "postfix": null, "prefix": "aligning log, completed variants :: ", "rate": null, "total": 6, "unit": "it", "unit_divisor": 1000, "unit_scale": false }, "application/vnd.jupyter.widget-view+json": { "model_id": "85b88bb2093547bcb6586352566c96a0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "aligning log, completed variants :: 0%| | 0/6 [00:00