{
"cells": [
{
"cell_type": "markdown",
"id": "98045d10-2877-4635-8792-530bcfa1b0fc",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"# Process Mining in Python: Basics and Integrations to Other Data Science Libraries\n",
"## ICPM'22 ML4PM\n",
"###### 2022-10-24; Sebastiaan J. van Zelst (with credits to Alessandro Berti)"
]
},
{
"cell_type": "markdown",
"id": "69e3a68b-fece-485d-92e9-f80888f8ae93",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"## Process Mining\n",
"\n",
"
| | org:group | concept:instance | org:resource | concept:name | time:timestamp | lifecycle:transition | case:startdate | case:responsible | case:enddate_planned | case:department | case:group | case:concept:name | case:deadline | case:channel | case:enddate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Group 1 | task-42933 | Resource21 | Confirmation of receipt | 2011-10-11 11:45:40.276000+00:00 | complete | 2011-10-11 13:42:22.688000+02:00 | Resource21 | 2011-12-06 13:41:31.788000+01:00 | General | Group 2 | case-10011 | 2011-12-06 13:41:31.788000+01:00 | Internet | NaN |
| 1 | Group 4 | task-42935 | Resource10 | T02 Check confirmation of receipt | 2011-10-12 06:26:25.398000+00:00 | complete | 2011-10-11 13:42:22.688000+02:00 | Resource21 | 2011-12-06 13:41:31.788000+01:00 | General | Group 2 | case-10011 | 2011-12-06 13:41:31.788000+01:00 | Internet | NaN |
| 2 | Group 1 | task-42957 | Resource21 | T03 Adjust confirmation of receipt | 2011-11-24 14:36:51.302000+00:00 | complete | 2011-10-11 13:42:22.688000+02:00 | Resource21 | 2011-12-06 13:41:31.788000+01:00 | General | Group 2 | case-10011 | 2011-12-06 13:41:31.788000+01:00 | Internet | NaN |
| 3 | Group 4 | task-47958 | Resource21 | T02 Check confirmation of receipt | 2011-11-24 14:37:16.553000+00:00 | complete | 2011-10-11 13:42:22.688000+02:00 | Resource21 | 2011-12-06 13:41:31.788000+01:00 | General | Group 2 | case-10011 | 2011-12-06 13:41:31.788000+01:00 | Internet | NaN |
| 4 | EMPTY | task-43021 | Resource30 | Confirmation of receipt | 2011-10-18 11:46:39.679000+00:00 | complete | 2011-10-11 01:06:40.020000+02:00 | Resource04 | 2011-12-06 01:06:40.010000+01:00 | General | Group 5 | case-10017 | 2011-12-06 01:06:40+01:00 | Internet | 2011-10-18 13:56:55.943000+02:00 |
| | org:group | concept:instance | org:resource | concept:name | time:timestamp | lifecycle:transition | case:startdate | case:responsible | case:enddate_planned | case:department | case:group | case:concept:name | case:deadline | case:channel | case:enddate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | EMPTY | task-43021 | Resource30 | Confirmation of receipt | 2011-10-18 11:46:39.679000+00:00 | complete | 2011-10-11 01:06:40.020000+02:00 | Resource04 | 2011-12-06 01:06:40.010000+01:00 | General | Group 5 | case-10017 | 2011-12-06 01:06:40+01:00 | Internet | 2011-10-18 13:56:55.943000+02:00 |
| 5 | Group 1 | task-43672 | Resource30 | T06 Determine necessity of stop advice | 2011-10-18 11:47:06.950000+00:00 | complete | 2011-10-11 01:06:40.020000+02:00 | Resource04 | 2011-12-06 01:06:40.010000+01:00 | General | Group 5 | case-10017 | 2011-12-06 01:06:40+01:00 | Internet | 2011-10-18 13:56:55.943000+02:00 |
| 6 | Group 4 | task-43671 | Resource30 | T02 Check confirmation of receipt | 2011-10-18 11:47:26.235000+00:00 | complete | 2011-10-11 01:06:40.020000+02:00 | Resource04 | 2011-12-06 01:06:40.010000+01:00 | General | Group 5 | case-10017 | 2011-12-06 01:06:40+01:00 | Internet | 2011-10-18 13:56:55.943000+02:00 |
| 7 | Group 1 | task-43674 | Resource30 | T03 Adjust confirmation of receipt | 2011-10-18 11:47:41.811000+00:00 | complete | 2011-10-11 01:06:40.020000+02:00 | Resource04 | 2011-12-06 01:06:40.010000+01:00 | General | Group 5 | case-10017 | 2011-12-06 01:06:40+01:00 | Internet | 2011-10-18 13:56:55.943000+02:00 |
| 8 | Group 4 | task-43675 | Resource30 | T02 Check confirmation of receipt | 2011-10-18 11:47:57.979000+00:00 | complete | 2011-10-11 01:06:40.020000+02:00 | Resource04 | 2011-12-06 01:06:40.010000+01:00 | General | Group 5 | case-10017 | 2011-12-06 01:06:40+01:00 | Internet | 2011-10-18 13:56:55.943000+02:00 |
| 9 | Group 1 | task-43673 | Resource30 | T10 Determine necessity to stop indication | 2011-10-18 11:48:15.357000+00:00 | complete | 2011-10-11 01:06:40.020000+02:00 | Resource04 | 2011-12-06 01:06:40.010000+01:00 | General | Group 5 | case-10017 | 2011-12-06 01:06:40+01:00 | Internet | 2011-10-18 13:56:55.943000+02:00 |
| 10 | Group 1 | task-43676 | Resource30 | T03 Adjust confirmation of receipt | 2011-10-18 11:48:30.632000+00:00 | complete | 2011-10-11 01:06:40.020000+02:00 | Resource04 | 2011-12-06 01:06:40.010000+01:00 | General | Group 5 | case-10017 | 2011-12-06 01:06:40+01:00 | Internet | 2011-10-18 13:56:55.943000+02:00 |
| 11 | Group 4 | task-43679 | Resource30 | T02 Check confirmation of receipt | 2011-10-18 11:51:01.525000+00:00 | complete | 2011-10-11 01:06:40.020000+02:00 | Resource04 | 2011-12-06 01:06:40.010000+01:00 | General | Group 5 | case-10017 | 2011-12-06 01:06:40+01:00 | Internet | 2011-10-18 13:56:55.943000+02:00 |
| 12 | Group 1 | task-43686 | admin2 | T03 Adjust confirmation of receipt | 2011-10-18 11:56:57.603000+00:00 | complete | 2011-10-11 01:06:40.020000+02:00 | Resource04 | 2011-12-06 01:06:40.010000+01:00 | General | Group 5 | case-10017 | 2011-12-06 01:06:40+01:00 | Internet | 2011-10-18 13:56:55.943000+02:00 |
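
The event tables above carry a case identifier (`case:concept:name`), an activity label (`concept:name`), and a timestamp (`time:timestamp`) on every row. As a minimal sketch of how such rows are typically turned into per-case traces, the standard-library snippet below groups a handful of rows copied from the tables and orders each group by time. The helper name `build_traces` is hypothetical and is not part of any library used in the notebook.

```python
from collections import defaultdict
from datetime import datetime

# (case id, activity, timestamp) triples copied from the tables above,
# deliberately shuffled to show that ordering comes from the timestamp.
events = [
    ("case-10011", "T02 Check confirmation of receipt", "2011-10-12 06:26:25.398000+00:00"),
    ("case-10011", "Confirmation of receipt", "2011-10-11 11:45:40.276000+00:00"),
    ("case-10017", "Confirmation of receipt", "2011-10-18 11:46:39.679000+00:00"),
    ("case-10011", "T03 Adjust confirmation of receipt", "2011-11-24 14:36:51.302000+00:00"),
    ("case-10017", "T06 Determine necessity of stop advice", "2011-10-18 11:47:06.950000+00:00"),
]


def build_traces(rows):
    """Group events by case id and return each case's activities in time order."""
    by_case = defaultdict(list)
    for case_id, activity, timestamp in rows:
        by_case[case_id].append((datetime.fromisoformat(timestamp), activity))
    return {
        case_id: [activity for _, activity in sorted(case_events)]
        for case_id, case_events in by_case.items()
    }


traces = build_traces(events)
for case_id, trace in traces.items():
    print(case_id, "->", trace)
```

Each resulting list is one trace, that is, the ordered sequence of activities observed for a single case.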
| | trace:case:channel@UNDEFINED | trace:case:responsible@UNDEFINED | trace:case:group@UNDEFINED | trace:case:department@UNDEFINED | event:concept:name@Confirmation of receipt | event:concept:name@T02 Check confirmation of receipt | event:concept:name@T03 Adjust confirmation of receipt | event:concept:name@T04 Determine confirmation of receipt | event:concept:name@T05 Print and send confirmation of receipt | event:concept:name@T06 Determine necessity of stop advice | ... | succession:org:resource@admin1#Resource33 | succession:org:resource@admin1#Resource35 | succession:org:resource@admin1#admin1 | succession:org:resource@admin1#admin2 | succession:org:resource@admin2#TEST | succession:org:resource@admin2#admin2 | succession:org:resource@admin3#Resource18 | succession:org:resource@admin3#admin1 | succession:org:resource@test#Resource26 | succession:org:resource@test#test |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1429 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1430 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1431 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1432 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1433 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

1434 rows × 466 columns
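
The `event:concept:name@...` columns in the feature table above read like per-case indicators: 1 if the activity occurs somewhere in the case, 0 otherwise. That reading is an assumption based on the column names and on the first two rows, which are consistent with the activities listed for case-10011 and case-10017 in the event tables. A minimal standard-library sketch under that assumption follows; the helper name `activity_features` is hypothetical.

```python
# Activities per case, taken from the event tables earlier in the notebook.
traces = {
    "case-10011": [
        "Confirmation of receipt",
        "T02 Check confirmation of receipt",
        "T03 Adjust confirmation of receipt",
        "T02 Check confirmation of receipt",
    ],
    "case-10017": [
        "Confirmation of receipt",
        "T06 Determine necessity of stop advice",
        "T02 Check confirmation of receipt",
        "T03 Adjust confirmation of receipt",
    ],
}

activities = [
    "Confirmation of receipt",
    "T02 Check confirmation of receipt",
    "T03 Adjust confirmation of receipt",
    "T06 Determine necessity of stop advice",
]


def activity_features(traces, activities):
    """Return one row per case with a 0/1 indicator for each activity."""
    return {
        case_id: {
            f"event:concept:name@{activity}": int(activity in trace)
            for activity in activities
        }
        for case_id, trace in traces.items()
    }


features = activity_features(traces, activities)
for case_id, row in features.items():
    print(case_id, row)
```

Repeated activities (such as the second `T02` in case-10011) do not change the indicator, which is why such an encoding discards ordering and frequency information.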
| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 1.708022 | -0.098254 | 0.510475 | -0.202320 | -0.087188 |
| 1 | 1.864585 | 0.224361 | -0.652616 | 0.290643 | 0.361289 |
| 2 | -0.587899 | -0.536829 | -0.071282 | -0.350070 | -0.075767 |
| 3 | -0.587899 | -0.536829 | -0.071282 | -0.350070 | -0.075767 |
| 4 | -0.630247 | -0.552485 | -0.177900 | -0.420901 | 0.000310 |
| ... | ... | ... | ... | ... | ... |
| 1429 | -0.587899 | -0.536829 | -0.071282 | -0.350070 | -0.075767 |
| 1430 | -0.587899 | -0.536829 | -0.071282 | -0.350070 | -0.075767 |
| 1431 | -0.587899 | -0.536829 | -0.071282 | -0.350070 | -0.075767 |
| 1432 | -0.546977 | -0.470138 | -0.076281 | -0.281238 | -0.064669 |
| 1433 | -0.564049 | -0.462921 | -0.057423 | -0.353011 | 0.009570 |

1434 rows × 5 columns
| | 0 | 1 | 2 | 3 | 4 | scores |
|---|---|---|---|---|---|---|
| 812 | 1.751406 | -0.331821 | 0.794158 | 0.156213 | 0.504005 | -0.117487 |
| 40 | 1.413179 | -0.406050 | 0.839251 | 0.761958 | 0.555376 | -0.108282 |
| 1164 | 1.752707 | -0.389016 | 0.781480 | 0.208615 | 0.375662 | -0.108206 |
| 1093 | 1.752707 | -0.389016 | 0.781480 | 0.208615 | 0.375662 | -0.108206 |
| 317 | -0.096639 | 1.483958 | -0.754600 | 1.159447 | -0.805189 | -0.107012 |
| ... | ... | ... | ... | ... | ... | ... |
| 1087 | -0.558603 | -0.457019 | -0.059373 | -0.369335 | -0.042482 | 0.149078 |
| 1268 | -0.558603 | -0.457019 | -0.059373 | -0.369335 | -0.042482 | 0.149078 |
| 1068 | -0.558603 | -0.457019 | -0.059373 | -0.369335 | -0.042482 | 0.149078 |
| 1130 | -0.558603 | -0.457019 | -0.059373 | -0.369335 | -0.042482 | 0.149078 |
| 1203 | -0.558603 | -0.457019 | -0.059373 | -0.369335 | -0.042482 | 0.149078 |

1434 rows × 6 columns
| | org:group_EMPTY | org:group_Group 4 |
|---|---|---|
| 0 | 0 | 1 |
| 1 | 0 | 1 |
| 2 | 0 | 1 |
| 3 | 1 | 0 |
| 4 | 0 | 1 |
| 5 | 0 | 1 |
| 6 | 0 | 1 |
| 7 | 0 | 1 |
| 8 | 0 | 1 |
| 9 | 0 | 1 |
| 10 | 1 | 0 |
| 11 | 0 | 1 |
| 12 | 0 | 1 |
| 13 | 0 | 1 |
| 14 | 0 | 1 |
| 15 | 0 | 1 |
| 16 | 0 | 1 |
| 17 | 0 | 1 |
| 18 | 1 | 0 |
| 19 | 1 | 0 |
| 20 | 1 | 0 |
| 21 | 0 | 1 |
| 22 | 0 | 1 |
| 23 | 0 | 1 |
| 24 | 0 | 1 |
| 25 | 0 | 1 |
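
The two indicator columns above follow the `<column>_<value>` naming pattern of a one-hot encoding of `org:group`. pandas provides `pandas.get_dummies` for this kind of encoding; the standard-library sketch below only illustrates the idea and is not necessarily how the table was produced. The helper name `one_hot` is hypothetical, and the input list reproduces the first five rows of the table.

```python
def one_hot(values, prefix):
    """Encode a list of categorical values as rows of 0/1 indicator columns."""
    categories = sorted(set(values))
    return [
        {f"{prefix}_{category}": int(value == category) for category in categories}
        for value in values
    ]


# Values matching rows 0 to 4 of the table above.
groups = ["Group 4", "Group 4", "Group 4", "EMPTY", "Group 4"]
encoded = one_hot(groups, "org:group")

for index, row in enumerate(encoded):
    print(index, row)
```

Exactly one indicator is set per row, so the encoded columns can be fed to estimators that expect numeric input.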
KNeighborsRegressor(n_neighbors=3)
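
With its default settings (uniform weights, Euclidean distance), scikit-learn's `KNeighborsRegressor(n_neighbors=3)` predicts the mean target of the three training points closest to the query. The standard-library sketch below illustrates that rule only; it is not the library's implementation, the helper name `knn_predict` is hypothetical, and the training data is made up for illustration rather than taken from the notebook.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Average the targets of the k training points nearest to the query."""
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    nearest_targets = [y for _, y in by_distance[:k]]
    return sum(nearest_targets) / len(nearest_targets)


# Illustrative data (not from the notebook): three nearby points and one outlier.
train_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_y = [10.0, 20.0, 30.0, 100.0]

prediction = knn_predict(train_X, train_y, (0.2, 0.2))
print(prediction)  # mean of 10.0, 20.0 and 30.0
```

The distant point at `(5.0, 5.0)` does not influence the prediction because it is not among the three nearest neighbours.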