Merge branch 'main' into demo-icpm24
* main: (72 commits)
Test huggingface CI
Update README.md
Update README.md
Update README.md
Triggers CI
Update README.md
Adds preprint citation
Update .conda.yml
Updates conda
Updates conda
Updates conda yml
Erases windows package
Update .conda.yml
Update setup.py
Updates paper figs notebooks
Update README.md
Update README.md
Update README.md
Update README.md
Removes unused libs
...
- .github/workflows/test_gedi.yml +3 -6
- README.md +41 -15
- config_files/pipeline_steps/benchmark.json +4 -0
- data/test/grid_experiments/rt10v.csv +0 -12
- data/validation/genELexperiment1_04_02.json +1 -1
- data/validation/genELexperiment2_07_04.json +1 -1
- data/validation/test_benchmark.csv +3 -0
- gedi/generator.py +1 -0
- notebooks/experiment_grid_2obj_configfiles_fabric.ipynb +1184 -0
- notebooks/gedi_fig6_benchmark_boxplots.ipynb +0 -0
- notebooks/gedi_figs4and5_representativeness.ipynb +0 -0
- notebooks/gedi_figs7and8_benchmarking_statisticalTests.ipynb +36 -28
- setup.py +0 -1
.github/workflows/test_gedi.yml
CHANGED
@@ -36,7 +36,6 @@ jobs:
|
|
36 |
- name: Compare output
|
37 |
run: diff data/validation/test_feat.csv data/test_feat.csv
|
38 |
|
39 |
-
|
40 |
test_generation:
|
41 |
runs-on: ubuntu-latest
|
42 |
|
@@ -72,7 +71,7 @@ jobs:
|
|
72 |
diff data/validation/genELexperiment2_07_04.json output/features/grid_feat/2_enself_rt20v/genELexperiment2_07_04.json
|
73 |
|
74 |
- name: Compare output 3
|
75 |
-
run:
|
76 |
diff data/validation/genELexperiment3_04_nan.json output/features/grid_feat/2_enself_rt20v/genELexperiment3_04_nan.json
|
77 |
|
78 |
- name: Compare output 4
|
@@ -109,7 +108,6 @@ jobs:
|
|
109 |
- name: Compare output
|
110 |
run: diff data/validation/test_benchmark.csv output/benchmark/test_benchmark.csv
|
111 |
|
112 |
-
|
113 |
test_augmentation:
|
114 |
runs-on: ubuntu-latest
|
115 |
|
@@ -156,7 +154,6 @@ jobs:
|
|
156 |
|
157 |
- name: Run test
|
158 |
run:
|
159 |
-
|
160 |
python main.py -a config_files/pipeline_steps/evaluation_plotter.json
|
161 |
|
162 |
test_integration:
|
@@ -244,5 +241,5 @@ jobs:
|
|
244 |
python main.py -a config_files/test/test_abbrv_generation.json
|
245 |
|
246 |
- name: Compare output
|
247 |
-
run:
|
248 |
-
diff data/validation/2_ense_rmcv_feat.csv output/test/igedi_table_1/2_ense_rmcv_feat.csv
|
|
|
36 |
- name: Compare output
|
37 |
run: diff data/validation/test_feat.csv data/test_feat.csv
|
38 |
|
|
|
39 |
test_generation:
|
40 |
runs-on: ubuntu-latest
|
41 |
|
|
|
71 |
diff data/validation/genELexperiment2_07_04.json output/features/grid_feat/2_enself_rt20v/genELexperiment2_07_04.json
|
72 |
|
73 |
- name: Compare output 3
|
74 |
+
run:
|
75 |
diff data/validation/genELexperiment3_04_nan.json output/features/grid_feat/2_enself_rt20v/genELexperiment3_04_nan.json
|
76 |
|
77 |
- name: Compare output 4
|
|
|
108 |
- name: Compare output
|
109 |
run: diff data/validation/test_benchmark.csv output/benchmark/test_benchmark.csv
|
110 |
|
|
|
111 |
test_augmentation:
|
112 |
runs-on: ubuntu-latest
|
113 |
|
|
|
154 |
|
155 |
- name: Run test
|
156 |
run:
|
|
|
157 |
python main.py -a config_files/pipeline_steps/evaluation_plotter.json
|
158 |
|
159 |
test_integration:
|
|
|
241 |
python main.py -a config_files/test/test_abbrv_generation.json
|
242 |
|
243 |
- name: Compare output
|
244 |
+
run:
|
245 |
+
diff data/validation/2_ense_rmcv_feat.csv output/test/igedi_table_1/2_ense_rmcv_feat.csv
|
README.md
CHANGED
@@ -12,10 +12,10 @@ license: mit
|
|
12 |
|
13 |
<p>
|
14 |
<img src="gedi/utils/logo.png" alt="Logo" width="100" align="left" />
|
15 |
-
<h1 style="display: inline;">
|
16 |
</p>
|
17 |
|
18 |
-
**i**nteractive **G**enerating **E**vent **D**ata with **I**ntentional Features for Benchmarking Process Mining<br />
|
19 |
This repository contains the codebase for the interactive web application tool (iGEDI) as well as for the [GEDI paper](https://mcml.ai/publications/gedi.pdf) accepted at the BPM'24 conference.
|
20 |
|
21 |
## Table of Contents
|
@@ -87,7 +87,6 @@ The JSON file consists of the following key-value pairs:
|
|
87 |
- font_size: label font size of the output plot
|
88 |
- boxplot_width: width of the violinplot/boxplot
|
89 |
|
90 |
-
|
91 |
### Generation
|
92 |
---
|
93 |
After having extracted meta features from the files, the next step is to generate event log data accordingly. Generally, there are two settings on how the targets are defined: i) meta feature targets are defined by the meta features from the real event log data; ii) a configuration space is defined which resembles the feasible meta features space.
|
@@ -389,7 +388,7 @@ python main.py -a config_files/experiment_real_targets.json
|
|
389 |
To execute the experiments with grid targets, a single [configuration](config_files/grid_2obj) can be selected or all [grid objectives](data/grid_2obj) can be run with one command using the following script. This script will output the [generated event logs (GenED)](data/event_logs/GenED), alongside their respectively measured [feature values](data/GenED_feat.csv) and [benchmark metrics values](data/GenED_bench.csv).
|
390 |
```
|
391 |
conda activate gedi
|
392 |
-
python execute_grid_experiments.py config_files/
|
393 |
```
|
394 |
We employ the [experiment_grid_2obj_configfiles_fabric.ipynb](notebooks/experiment_grid_2obj_configfiles_fabric.ipynb) to create all necessary [configuration](config_files/grid_2obj) and [objective](data/grid_2obj) files for this experiment.
|
395 |
For more details about these config_files, please refer to [Feature Extraction](#feature-extraction), [Generation](#generation), and [Benchmark](#benchmark).
|
@@ -401,6 +400,7 @@ streamlit run utils/config_fabric.py # To tunnel to local machine add: --server.
|
|
401 |
ssh -N -f -L 9000:localhost:8501 <user@remote_machine.com>
|
402 |
open "http://localhost:9000/"
|
403 |
```
|
|
|
404 |
### Visualizations
|
405 |
To run the visualizations, we employ [jupyter notebooks](https://jupyter.org/install) and [add the installed environment to the jupyter notebook](https://medium.com/@nrk25693/how-to-add-your-conda-environment-to-your-jupyter-notebook-in-just-4-steps-abeab8b8d084). We then start all visualizations by running e.g.: `jupyter notebook`. In the following, we describe the `.ipynb`-files in the folder `\notebooks` to reproduce the figures from our paper.
|
406 |
|
@@ -418,17 +418,43 @@ This notebook is used to answer the question if there is a statistically signifi
|
|
418 |
Likewise to the evaluation on the statistical tests in notebook `gedi_figs7and8_benchmarking_statisticalTests.ipynb`, this notebook is used to compute the differences between two correlation matrices $\Delta C = C_1 - C_2$. This logic is employed to evaluate and visualize the distance of two correlation matrices. Furthermore, we show how significant scores are retained from the correlations being evaluated on real-world datasets compared to synthesized event log datasets with real-world targets. In Fig. 9 and 10 in the paper, the results of the notebook are shown.
|
419 |
|
420 |
## Citation
|
421 |
-
The `GEDI` framework is taken directly from the original paper by [Maldonado](mailto:[email protected]), Frey, Tavares, Rehwald and Seidl
|
422 |
-
|
423 |
-
```
|
424 |
-
@
|
425 |
-
|
426 |
-
|
427 |
-
|
428 |
-
|
429 |
year = {2024},
|
430 |
-
|
431 |
-
doi = {},
|
432 |
-
eprinttype = {website},
|
433 |
}
|
434 |
```
|
|
|
12 |
|
13 |
<p>
|
14 |
<img src="gedi/utils/logo.png" alt="Logo" width="100" align="left" />
|
15 |
+
<h1 style="display: inline;">(i)GEDI</h1>
|
16 |
</p>
|
17 |
|
18 |
+
(**i**nteractive) **G**enerating **E**vent **D**ata with **I**ntentional Features for Benchmarking Process Mining<br />
|
19 |
This repository contains the codebase for the interactive web application tool (iGEDI) as well as for the [GEDI paper](https://mcml.ai/publications/gedi.pdf) accepted at the BPM'24 conference.
|
20 |
|
21 |
## Table of Contents
|
|
|
87 |
- font_size: label font size of the output plot
|
88 |
- boxplot_width: width of the violinplot/boxplot
|
89 |
|
|
|
90 |
### Generation
|
91 |
---
|
92 |
After having extracted meta features from the files, the next step is to generate event log data accordingly. Generally, there are two settings on how the targets are defined: i) meta feature targets are defined by the meta features from the real event log data; ii) a configuration space is defined which resembles the feasible meta features space.
|
|
|
388 |
To execute the experiments with grid targets, a single [configuration](config_files/grid_2obj) can be selected or all [grid objectives](data/grid_2obj) can be run with one command using the following script. This script will output the [generated event logs (GenED)](data/event_logs/GenED), alongside their respectively measured [feature values](data/GenED_feat.csv) and [benchmark metrics values](data/GenED_bench.csv).
|
389 |
```
|
390 |
conda activate gedi
|
391 |
+
python gedi/utils/execute_grid_experiments.py config_files/test
|
392 |
```
|
393 |
We employ the [experiment_grid_2obj_configfiles_fabric.ipynb](notebooks/experiment_grid_2obj_configfiles_fabric.ipynb) to create all necessary [configuration](config_files/grid_2obj) and [objective](data/grid_2obj) files for this experiment.
|
394 |
For more details about these config_files, please refer to [Feature Extraction](#feature-extraction), [Generation](#generation), and [Benchmark](#benchmark).
|
|
|
400 |
ssh -N -f -L 9000:localhost:8501 <user@remote_machine.com>
|
401 |
open "http://localhost:9000/"
|
402 |
```
|
403 |
+
|
404 |
### Visualizations
|
405 |
To run the visualizations, we employ [jupyter notebooks](https://jupyter.org/install) and [add the installed environment to the jupyter notebook](https://medium.com/@nrk25693/how-to-add-your-conda-environment-to-your-jupyter-notebook-in-just-4-steps-abeab8b8d084). We then start all visualizations by running e.g.: `jupyter notebook`. In the following, we describe the `.ipynb`-files in the folder `\notebooks` to reproduce the figures from our paper.
|
406 |
|
|
|
418 |
Likewise to the evaluation on the statistical tests in notebook `gedi_figs7and8_benchmarking_statisticalTests.ipynb`, this notebook is used to compute the differences between two correlation matrices $\Delta C = C_1 - C_2$. This logic is employed to evaluate and visualize the distance of two correlation matrices. Furthermore, we show how significant scores are retained from the correlations being evaluated on real-world datasets compared to synthesized event log datasets with real-world targets. In Fig. 9 and 10 in the paper, the results of the notebook are shown.
|
419 |
|
420 |
## Citation
|
421 |
+
The `GEDI` framework is taken directly from the original paper by [Maldonado](mailto:[email protected]), Frey, Tavares, Rehwald and Seidl on BPM'24.
|
422 |
+
|
423 |
+
```
|
424 |
+
@InProceedings{maldonado2024gedi,
|
425 |
+
author="Maldonado, Andrea
|
426 |
+
and Frey, Christian M. M.
|
427 |
+
and Tavares, Gabriel Marques
|
428 |
+
and Rehwald, Nikolina
|
429 |
+
and Seidl, Thomas",
|
430 |
+
editor="Marrella, Andrea
|
431 |
+
and Resinas, Manuel
|
432 |
+
and Jans, Mieke
|
433 |
+
and Rosemann, Michael",
|
434 |
+
title="GEDI: Generating Event Data with Intentional Features for Benchmarking Process Mining",
|
435 |
+
booktitle="Business Process Management",
|
436 |
+
year="2024",
|
437 |
+
publisher="Springer Nature Switzerland",
|
438 |
+
address="Cham",
|
439 |
+
pages="221--237",
|
440 |
+
abstract="Process mining solutions include enhancing performance, conserving resources, and alleviating bottlenecks in organizational contexts. However, as in other data mining fields, success hinges on data quality and availability. Existing analyses for process mining solutions lack diverse and ample data for rigorous testing, hindering insights' generalization. To address this, we propose Generating Event Data with Intentional features, a framework producing event data sets satisfying specific meta-features. Considering the meta-feature space that defines feasible event logs, we observe that existing real-world datasets describe only local areas within the overall space. Hence, our framework aims at providing the capability to generate an event data benchmark, which covers unexplored regions. Therefore, our approach leverages a discretization of the meta-feature space to steer generated data towards regions, where a combination of meta-features is not met yet by existing benchmark datasets. Providing a comprehensive data pool enriches process mining analyses, enables methods to capture a wider range of real-world scenarios, and improves evaluation quality. Moreover, it empowers analysts to uncover correlations between meta-features and evaluation metrics, enhancing explainability and solution effectiveness. Experiments demonstrate GEDI's ability to produce a benchmark of intentional event data sets and robust analyses for process mining tasks.",
|
441 |
+
isbn="978-3-031-70396-6"
|
442 |
+
}
|
443 |
+
```
|
444 |
+
|
445 |
+
Furthermore, the `iGEDI` web application is taken directly from the original paper by [Maldonado](mailto:[email protected]), Aryasomayajula, Frey, and Seidl and is *to appear on Demos@ICPM'24*.
|
446 |
+
```
|
447 |
+
@inproceedings{maldonado2024igedi,
|
448 |
+
author = {Andrea Maldonado and
|
449 |
+
Sai Anirudh Aryasomayajula and
|
450 |
+
Christian M. M. Frey and
|
451 |
+
Thomas Seidl},
|
452 |
+
editor = {Jochen De Weerdt, Giovanni Meroni, Han van der Aa, and Karolin Winter},
|
453 |
+
title = {iGEDI: interactive Generating Event Data with Intentional Features},
|
454 |
+
booktitle = {ICPM 2024 Tool Demonstration Track, October 14-18, 2024, Kongens Lyngby, Denmark},
|
455 |
+
series = {{CEUR} Workshop Proceedings},
|
456 |
+
publisher = {CEUR-WS.org},
|
457 |
year = {2024},
|
458 |
+
bibsource = {dblp computer science bibliography, https://dblp.org}
|
|
|
|
|
459 |
}
|
460 |
```
|
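A minimal sketch of the correlation-difference logic that the README hunk above describes for the last notebook, $\Delta C = C_1 - C_2$. The helper names (`pearson`, `corr_matrix`, `corr_diff`, `frobenius`) and the toy feature columns are assumptions for illustration and are not taken from the repository; summarizing the distance with the Frobenius norm is likewise an assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def corr_matrix(columns):
    """Correlation matrix (list of rows) for a dict of name -> values."""
    names = list(columns)
    return [[pearson(columns[r], columns[c]) for c in names] for r in names]


def corr_diff(c1, c2):
    """Element-wise difference Delta C = C_1 - C_2."""
    return [[a - b for a, b in zip(row1, row2)] for row1, row2 in zip(c1, c2)]


def frobenius(matrix):
    """Frobenius norm, used here as one possible distance summary."""
    return math.sqrt(sum(v * v for row in matrix for v in row))


# Hypothetical toy data: two features measured on "real" and "generated" logs.
real = {"feature_a": [1, 2, 3, 4], "feature_b": [2, 4, 6, 8]}
generated = {"feature_a": [1, 2, 3, 4], "feature_b": [8, 6, 4, 2]}

delta = corr_diff(corr_matrix(real), corr_matrix(generated))
distance = frobenius(delta)
```

Entries of `delta` close to zero indicate that a pairwise correlation is similar in both datasets; large entries mark feature pairs whose relationship differs.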
config_files/pipeline_steps/benchmark.json
CHANGED
@@ -4,6 +4,10 @@
|
|
4 |
"benchmark_test": "discovery",
|
5 |
"input_path":"data/test",
|
6 |
"output_path":"output",
|
|
|
7 |
"miners" : ["ind", "heu", "imf", "ilp"]
|
|
|
|
|
|
|
8 |
}
|
9 |
]
|
|
|
4 |
"benchmark_test": "discovery",
|
5 |
"input_path":"data/test",
|
6 |
"output_path":"output",
|
7 |
+
<<<<<<<< HEAD:config_files/pipeline_steps/benchmark.json
|
8 |
"miners" : ["ind", "heu", "imf", "ilp"]
|
9 |
+
========
|
10 |
+
"miners" : ["inductive", "heu", "imf", "ilp"]
|
11 |
+
>>>>>>>> main:config_files/algorithm/pipeline_steps/benchmark.json
|
12 |
}
|
13 |
]
|
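The new side of this hunk contains conflict markers (`<<<<<<<<`, `========`, `>>>>>>>>`) inside a JSON file, which a JSON parser will reject. A minimal sketch of a pre-load check follows; `find_conflict_markers`, `load_config`, and the sample text are hypothetical and not part of the repository.

```python
import json
import re

# Matches lines that start with a run of 7 or more '<', '=' or '>' characters.
CONFLICT_MARKER = re.compile(r"^\s*(<{7,}|={7,}|>{7,})")


def find_conflict_markers(text):
    """Return 1-based line numbers that look like merge conflict markers."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if CONFLICT_MARKER.match(line)
    ]


def load_config(text):
    """Parse JSON config text, failing early on unresolved conflicts."""
    hits = find_conflict_markers(text)
    if hits:
        raise ValueError(f"unresolved merge conflict markers on lines {hits}")
    return json.loads(text)


# Hypothetical sample mirroring the shape of the hunk above.
conflicted = """[
  {
    "benchmark_test": "discovery",
<<<<<<<< HEAD:config_files/pipeline_steps/benchmark.json
    "miners" : ["ind", "heu", "imf", "ilp"]
========
    "miners" : ["inductive", "heu", "imf", "ilp"]
>>>>>>>> main:config_files/algorithm/pipeline_steps/benchmark.json
  }
]"""

marker_lines = find_conflict_markers(conflicted)
```

Running such a check in CI before the pipeline reads its config would surface the problem with a line number instead of a generic JSON decode error.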
data/test/grid_experiments/rt10v.csv
DELETED
@@ -1,12 +0,0 @@
|
|
1 |
-
task,ratio_top_10_variants
|
2 |
-
task_1,0.0
|
3 |
-
task_2,0.1
|
4 |
-
task_3,0.2
|
5 |
-
task_4,0.3
|
6 |
-
task_5,0.4
|
7 |
-
task_6,0.5
|
8 |
-
task_7,0.6
|
9 |
-
task_8,0.7
|
10 |
-
task_9,0.8
|
11 |
-
task_10,0.9
|
12 |
-
task_11,1.0
|
data/validation/genELexperiment1_04_02.json
CHANGED
@@ -1 +1 @@
|
|
1 |
-
{"ratio_top_20_variants": 0.20017714791851196, "epa_normalized_sequence_entropy_linear_forgetting": 0.052097205658647734, "log": "genELexperiment1_04_02", "target_similarity": 0.7418932364693804}
|
|
|
1 |
+
{"ratio_top_20_variants": 0.20017714791851196, "epa_normalized_sequence_entropy_linear_forgetting": 0.052097205658647734, "log": "genELexperiment1_04_02", "target_similarity": 0.7418932364693804}
|
data/validation/genELexperiment2_07_04.json
CHANGED
@@ -1 +1 @@
|
|
1 |
-
{"ratio_top_20_variants": 0.38863337713534823, "epa_normalized_sequence_entropy_linear_forgetting": 0.052097205658647734, "log": "genELexperiment2_07_04", "target_similarity": 0.6067951985524301}
|
|
|
1 |
+
{"ratio_top_20_variants": 0.38863337713534823, "epa_normalized_sequence_entropy_linear_forgetting": 0.052097205658647734, "log": "genELexperiment2_07_04", "target_similarity": 0.6067951985524301}
|
data/validation/test_benchmark.csv
CHANGED
@@ -0,0 +1,3 @@
|
|
1 |
+
log,fitness_inductive,precision_inductive,fscore_inductive,size_inductive,pnsize_inductive,cfc_inductive,fitness_heu,precision_heu,fscore_heu,size_heu,pnsize_heu,cfc_heu,fitness_imf,precision_imf,fscore_imf,size_imf,pnsize_imf,cfc_imf,fitness_ilp,precision_ilp,fscore_ilp,size_ilp,pnsize_ilp,cfc_ilp
|
2 |
+
gen_el_169,0.9998052420892378,0.6662312989788649,0.7996241723917423,34,24,22,0.9383563249832565,0.5979149389882715,0.7304143193451293,22,14,13,0.9358843752091403,0.6513022517490741,0.7680805654451066,28,18,16,0.9999637006454563,0.432690150325331,0.6040181215566763,27,7,9
|
3 |
+
gen_el_168,0.9997678338833808,0.6033523537803138,0.7525477883058467,61,34,20,0.48155419290534085,0.9449078138718174,0.6379760800037585,60,35,32,0.9479094601490539,0.5169524053224155,0.669037930473001,67,38,24,0.9999513902099882,0.4283471743974073,0.5997714527549697,93,30,28
|
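The `fscore_*` columns in the rows above appear to be the harmonic mean of the matching fitness and precision columns. This is an observation from the numbers, not something the commit states; the helper name `f_score` is hypothetical. A short check with values copied from the `gen_el_169` row:

```python
def f_score(fitness, precision):
    """Harmonic mean of fitness and precision."""
    return 2 * fitness * precision / (fitness + precision)


# Values copied from the gen_el_169 row (inductive miner columns).
fitness = 0.9998052420892378
precision = 0.6662312989788649
reported_fscore = 0.7996241723917423

computed = f_score(fitness, precision)
difference = abs(computed - reported_fscore)
```

The same relation can be checked for the other miners by swapping in the `*_heu`, `*_imf`, or `*_ilp` values of a row.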
gedi/generator.py
CHANGED
@@ -152,6 +152,7 @@ class GenerateEventLogs():
|
|
152 |
|
153 |
self.params = params.get(GENERATOR_PARAMS)
|
154 |
experiment = self.params.get(EXPERIMENT)
|
|
|
155 |
if experiment is not None:
|
156 |
tasks, output_path = get_tasks(experiment, self.output_path)
|
157 |
columns_to_rename = {col: column_mappings()[col] for col in tasks.columns if col in column_mappings()}
|
|
|
152 |
|
153 |
self.params = params.get(GENERATOR_PARAMS)
|
154 |
experiment = self.params.get(EXPERIMENT)
|
155 |
+
|
156 |
if experiment is not None:
|
157 |
tasks, output_path = get_tasks(experiment, self.output_path)
|
158 |
columns_to_rename = {col: column_mappings()[col] for col in tasks.columns if col in column_mappings()}
|
notebooks/experiment_grid_2obj_configfiles_fabric.ipynb
ADDED
@@ -0,0 +1,1184 @@
|
|
1 |
+
{
|
2 |
+
"cells": [
|
3 |
+
{
|
4 |
+
"cell_type": "markdown",
|
5 |
+
"id": "08ee6ee0",
|
6 |
+
"metadata": {},
|
7 |
+
"source": [
|
8 |
+
"## Grid Objectives\n",
|
9 |
+
"Iterating between min and max for each column\n",
|
10 |
+
"\n",
|
11 |
+
"### Glossary\n",
|
12 |
+
"- **task**: Refers to the set of values (row) and corresponding keys to be aimed at sequentially.\n",
|
13 |
+
"- **objective**: Refers to one key (column) and respective value to be aimed at simultaneously during a task.\n",
|
14 |
+
"- **experiment**: Refers to one file containing a multiple of objectives and tasks for a fixed number of each, respectively. "
|
15 |
+
]
|
16 |
+
},
|
17 |
+
{
|
18 |
+
"cell_type": "code",
|
19 |
+
"execution_count": 1,
|
20 |
+
"id": "e5aa7223",
|
21 |
+
"metadata": {},
|
22 |
+
"outputs": [],
|
23 |
+
"source": [
|
24 |
+
"import itertools\n",
|
25 |
+
"import json\n",
|
26 |
+
"import numpy as np\n",
|
27 |
+
"import os\n",
|
28 |
+
"import pandas as pd"
|
29 |
+
]
|
30 |
+
},
|
31 |
+
{
|
32 |
+
"cell_type": "code",
|
33 |
+
"execution_count": 2,
|
34 |
+
"id": "472fd031",
|
35 |
+
"metadata": {},
|
36 |
+
"outputs": [],
|
37 |
+
"source": [
|
38 |
+
"#Features between 0 and 1: \n",
|
39 |
+
"normalized_feature_names = ['ratio_variants_per_number_of_traces', 'trace_len_hist1', 'trace_len_hist2',\n",
|
40 |
+
" 'trace_len_hist3', 'trace_len_hist4', 'trace_len_hist5', 'trace_len_hist7',\n",
|
41 |
+
" 'trace_len_hist8', 'trace_len_hist9', 'ratio_most_common_variant', \n",
|
42 |
+
" 'ratio_top_1_variants', 'ratio_top_5_variants', 'ratio_top_10_variants', \n",
|
43 |
+
" 'ratio_top_20_variants', 'ratio_top_50_variants', 'ratio_top_75_variants', \n",
|
44 |
+
" 'epa_normalized_variant_entropy', 'epa_normalized_sequence_entropy', \n",
|
45 |
+
" 'epa_normalized_sequence_entropy_linear_forgetting', 'epa_normalized_sequence_entropy_exponential_forgetting']\n",
|
46 |
+
"\n",
|
47 |
+
"normalized_feature_names = ['ratio_variants_per_number_of_traces', 'ratio_most_common_variant', \n",
|
48 |
+
" 'ratio_top_10_variants', 'epa_normalized_variant_entropy', 'epa_normalized_sequence_entropy', \n",
|
49 |
+
" 'epa_normalized_sequence_entropy_linear_forgetting', 'epa_normalized_sequence_entropy_exponential_forgetting']\n",
|
50 |
+
"\n",
|
51 |
+
"def abbrev_obj_keys(obj_keys):\n",
|
52 |
+
" abbreviated_keys = []\n",
|
53 |
+
" for obj_key in obj_keys:\n",
|
54 |
+
" key_slices = obj_key.split(\"_\")\n",
|
55 |
+
" chars = []\n",
|
56 |
+
" for key_slice in key_slices:\n",
|
57 |
+
" for idx, single_char in enumerate(key_slice):\n",
|
58 |
+
" if idx == 0 or single_char.isdigit():\n",
|
59 |
+
" chars.append(single_char)\n",
|
60 |
+
" abbreviated_key = ''.join(chars)\n",
|
61 |
+
" abbreviated_keys.append(abbreviated_key)\n",
|
62 |
+
" return '_'.join(abbreviated_keys) "
|
63 |
+
]
|
64 |
+
},
|
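To make the naming scheme concrete, the sketch below reproduces `abbrev_obj_keys` from the cell above and relates it to the file names and counts printed in the following cell's output (21 objective pairs, 121 tasks). The grid construction (`grid_values`, 11 evenly spaced targets per objective) is an assumption inferred from the deleted `rt10v.csv` and the printed `121`; the notebook's own grid code is not shown in this part of the diff.

```python
import itertools


def abbrev_obj_keys(obj_keys):
    """Abbreviate each key to the first character of every '_'-separated
    slice plus any digits, then join the abbreviations with '_'."""
    abbreviated_keys = []
    for obj_key in obj_keys:
        chars = []
        for key_slice in obj_key.split("_"):
            for idx, single_char in enumerate(key_slice):
                if idx == 0 or single_char.isdigit():
                    chars.append(single_char)
        abbreviated_keys.append("".join(chars))
    return "_".join(abbreviated_keys)


features = [
    "ratio_variants_per_number_of_traces",
    "ratio_most_common_variant",
    "ratio_top_10_variants",
    "epa_normalized_variant_entropy",
    "epa_normalized_sequence_entropy",
    "epa_normalized_sequence_entropy_linear_forgetting",
    "epa_normalized_sequence_entropy_exponential_forgetting",
]

# All unordered pairs of objectives: C(7, 2) = 21.
pairs = list(itertools.combinations(sorted(features), 2))

# Assumption: 11 evenly spaced target values per objective (0.0, 0.1, ..., 1.0).
grid_values = [i / 10 for i in range(11)]
tasks = list(itertools.product(grid_values, repeat=2))

name = abbrev_obj_keys(
    ["epa_normalized_sequence_entropy_linear_forgetting", "ratio_top_10_variants"]
)
```

Here `name` is the suffix seen in `grid_2objectives_enself_rt10v.csv`, and each element of `tasks` is one pair of target values for a two-objective experiment.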
65 |
+
{
|
66 |
+
"cell_type": "code",
|
67 |
+
"execution_count": 3,
|
68 |
+
"id": "2be119c8",
|
69 |
+
"metadata": {},
|
70 |
+
"outputs": [
|
71 |
+
{
|
72 |
+
"name": "stdout",
|
73 |
+
"output_type": "stream",
|
74 |
+
"text": [
|
75 |
+
"21 [('epa_normalized_sequence_entropy_linear_forgetting', 'ratio_top_10_variants'), ('epa_normalized_sequence_entropy_exponential_forgetting', 'epa_normalized_variant_entropy'), ('epa_normalized_variant_entropy', 'ratio_variants_per_number_of_traces'), ('epa_normalized_sequence_entropy_linear_forgetting', 'ratio_most_common_variant'), ('epa_normalized_sequence_entropy', 'ratio_variants_per_number_of_traces'), ('epa_normalized_sequence_entropy_exponential_forgetting', 'ratio_top_10_variants'), ('epa_normalized_sequence_entropy_exponential_forgetting', 'epa_normalized_sequence_entropy_linear_forgetting'), ('epa_normalized_sequence_entropy', 'epa_normalized_variant_entropy'), ('epa_normalized_sequence_entropy_exponential_forgetting', 'ratio_most_common_variant'), ('ratio_top_10_variants', 'ratio_variants_per_number_of_traces'), ('epa_normalized_sequence_entropy', 'ratio_top_10_variants'), ('epa_normalized_variant_entropy', 'ratio_top_10_variants'), ('epa_normalized_sequence_entropy', 'epa_normalized_sequence_entropy_linear_forgetting'), ('ratio_most_common_variant', 'ratio_variants_per_number_of_traces'), ('epa_normalized_variant_entropy', 'ratio_most_common_variant'), ('epa_normalized_sequence_entropy', 'ratio_most_common_variant'), ('epa_normalized_sequence_entropy_linear_forgetting', 'ratio_variants_per_number_of_traces'), ('epa_normalized_sequence_entropy', 'epa_normalized_sequence_entropy_exponential_forgetting'), ('epa_normalized_sequence_entropy_linear_forgetting', 'epa_normalized_variant_entropy'), ('epa_normalized_sequence_entropy_exponential_forgetting', 'ratio_variants_per_number_of_traces'), ('ratio_most_common_variant', 'ratio_top_10_variants')]\n",
|
76 |
+
"121\n",
|
77 |
+
"Saved experiment in ../data/grid_2obj/grid_2objectives_enself_rt10v.csv\n",
|
78 |
+
"Saved experiment config in ../config_files/algorithm/grid_2obj/generator_grid_2objectives_enself_rt10v.json\n",
|
79 |
+
"Saved experiment in ../data/grid_2obj/grid_2objectives_enseef_enve.csv\n",
|
80 |
+
"Saved experiment config in ../config_files/algorithm/grid_2obj/generator_grid_2objectives_enseef_enve.json\n",
|
81 |
+
"Saved experiment in ../data/grid_2obj/grid_2objectives_enve_rvpnot.csv\n",
|
82 |
+
"Saved experiment config in ../config_files/algorithm/grid_2obj/generator_grid_2objectives_enve_rvpnot.json\n",
|
83 |
+
"Saved experiment in ../data/grid_2obj/grid_2objectives_enself_rmcv.csv\n",
|
84 |
+
"Saved experiment config in ../config_files/algorithm/grid_2obj/generator_grid_2objectives_enself_rmcv.json\n",
|
85 |
+
"Saved experiment in ../data/grid_2obj/grid_2objectives_ense_rvpnot.csv\n",
|
86 |
+
"Saved experiment config in ../config_files/algorithm/grid_2obj/generator_grid_2objectives_ense_rvpnot.json\n",
|
87 |
+
"Saved experiment in ../data/grid_2obj/grid_2objectives_enseef_rt10v.csv\n",
|
88 |
+
"Saved experiment config in ../config_files/algorithm/grid_2obj/generator_grid_2objectives_enseef_rt10v.json\n",
|
89 |
+
"Saved experiment in ../data/grid_2obj/grid_2objectives_enseef_enself.csv\n",
|
90 |
+
"Saved experiment config in ../config_files/algorithm/grid_2obj/generator_grid_2objectives_enseef_enself.json\n",
|
91 |
+
"Saved experiment in ../data/grid_2obj/grid_2objectives_ense_enve.csv\n",
|
92 |
+
"Saved experiment config in ../config_files/algorithm/grid_2obj/generator_grid_2objectives_ense_enve.json\n",
|
93 |
+
"Saved experiment in ../data/grid_2obj/grid_2objectives_enseef_rmcv.csv\n",
|
94 |
+
"Saved experiment config in ../config_files/algorithm/grid_2obj/generator_grid_2objectives_enseef_rmcv.json\n",
|
95 |
+
"Saved experiment in ../data/grid_2obj/grid_2objectives_rt10v_rvpnot.csv\n",
|
96 |
+
"Saved experiment config in ../config_files/algorithm/grid_2obj/generator_grid_2objectives_rt10v_rvpnot.json\n",
|
97 |
+
"Saved experiment in ../data/grid_2obj/grid_2objectives_ense_rt10v.csv\n",
|
98 |
+
"Saved experiment config in ../config_files/algorithm/grid_2obj/generator_grid_2objectives_ense_rt10v.json\n",
|
99 |
+
"Saved experiment in ../data/grid_2obj/grid_2objectives_enve_rt10v.csv\n",
|
100 |
+
"Saved experiment config in ../config_files/algorithm/grid_2obj/generator_grid_2objectives_enve_rt10v.json\n",
|
101 |
+
"Saved experiment in ../data/grid_2obj/grid_2objectives_ense_enself.csv\n",
|
102 |
+
"Saved experiment config in ../config_files/algorithm/grid_2obj/generator_grid_2objectives_ense_enself.json\n",
|
103 |
+
"Saved experiment in ../data/grid_2obj/grid_2objectives_rmcv_rvpnot.csv\n",
|
104 |
+
"Saved experiment config in ../config_files/algorithm/grid_2obj/generator_grid_2objectives_rmcv_rvpnot.json\n",
|
105 |
+
"Saved experiment in ../data/grid_2obj/grid_2objectives_enve_rmcv.csv\n",
|
106 |
+
"Saved experiment config in ../config_files/algorithm/grid_2obj/generator_grid_2objectives_enve_rmcv.json\n",
|
107 |
+
"Saved experiment in ../data/grid_2obj/grid_2objectives_ense_rmcv.csv\n",
|
108 |
+
"Saved experiment config in ../config_files/algorithm/grid_2obj/generator_grid_2objectives_ense_rmcv.json\n",
|
109 |
+
"Saved experiment in ../data/grid_2obj/grid_2objectives_enself_rvpnot.csv\n",
|
110 |
+
"Saved experiment config in ../config_files/algorithm/grid_2obj/generator_grid_2objectives_enself_rvpnot.json\n",
|
111 |
+
"Saved experiment in ../data/grid_2obj/grid_2objectives_ense_enseef.csv\n",
|
112 |
+
"Saved experiment config in ../config_files/algorithm/grid_2obj/generator_grid_2objectives_ense_enseef.json\n",
|
113 |
+
"Saved experiment in ../data/grid_2obj/grid_2objectives_enself_enve.csv\n",
|
114 |
+
"Saved experiment config in ../config_files/algorithm/grid_2obj/generator_grid_2objectives_enself_enve.json\n",
|
115 |
+
"Saved experiment in ../data/grid_2obj/grid_2objectives_enseef_rvpnot.csv\n",
|
116 |
+
"Saved experiment config in ../config_files/algorithm/grid_2obj/generator_grid_2objectives_enseef_rvpnot.json\n",
|
117 |
+
"Saved experiment in ../data/grid_2obj/grid_2objectives_rmcv_rt10v.csv\n",
|
118 |
+
"Saved experiment config in ../config_files/algorithm/grid_2obj/generator_grid_2objectives_rmcv_rt10v.json\n",
|
119 |
+
"None\n"
|
120 |
+
]
|
121 |
+
}
|
122 |
+
],
|
123 |
+
"source": [
|
124 |
+
"def write_generator_experiment(experiment_path, objectives=[\"ratio_top_20_variants\", \"epa_normalized_sequence_entropy_linear_forgetting\"]):\n",
|
125 |
+
" first_dir = os.path.split(experiment_path[3:])[-1].replace(\".csv\",\"\")\n",
|
126 |
+
" second_dir = first_dir.replace(\"grid_\",\"\").replace(\"objectives\",\"\")\n",
|
127 |
+
"\n",
|
128 |
+
" experiment = [\n",
|
129 |
+
" {\n",
|
130 |
+
" 'pipeline_step': 'event_logs_generation',\n",
|
131 |
+
" 'output_path':'output/generated/grid_2obj',\n",
|
132 |
+
" 'generator_params': {\n",
|
133 |
+
" \"experiment\": {\"input_path\": experiment_path[3:],\n",
|
134 |
+
" \"objectives\": objectives},\n",
|
135 |
+
" 'config_space': {\n",
|
136 |
+
" 'mode': [5, 20],\n",
|
137 |
+
" 'sequence': [0.01, 1],\n",
|
138 |
+
" 'choice': [0.01, 1],\n",
|
139 |
+
" 'parallel': [0.01, 1],\n",
|
140 |
+
" 'loop': [0.01, 1],\n",
|
141 |
+
" 'silent': [0.01, 1],\n",
|
142 |
+
" 'lt_dependency': [0.01, 1],\n",
|
143 |
+
" 'num_traces': [10, 10001],\n",
|
144 |
+
" 'duplicate': [0],\n",
|
145 |
+
" 'or': [0]\n",
|
146 |
+
" },\n",
|
147 |
+
" 'n_trials': 200\n",
|
148 |
+
" }\n",
|
149 |
+
" },\n",
|
150 |
+
" {\n",
|
151 |
+
" 'pipeline_step': 'feature_extraction',\n",
|
152 |
+
" 'input_path': os.path.join('output','features', 'generated', 'grid_2obj', first_dir, second_dir),\n",
|
153 |
+
" \"feature_params\": {\"feature_set\":[\"ratio_variants_per_number_of_traces\",\"ratio_most_common_variant\",\"ratio_top_10_variants\",\"epa_normalized_variant_entropy\",\"epa_normalized_sequence_entropy\",\"epa_normalized_sequence_entropy_linear_forgetting\",\"epa_normalized_sequence_entropy_exponential_forgetting\"]},\n",
|
154 |
+
" 'output_path': 'output/plots',\n",
|
155 |
+
" 'real_eventlog_path': 'data/BaselineED_feat.csv',\n",
|
156 |
+
" 'plot_type': 'boxplot'\n",
|
157 |
+
" },\n",
|
158 |
+
" {\n",
|
159 |
+
" \"pipeline_step\": \"benchmark_test\",\n",
|
160 |
+
" \"benchmark_test\": \"discovery\",\n",
|
161 |
+
" \"input_path\": os.path.join('output', 'generated', 'grid_2obj', first_dir, second_dir),\n",
|
162 |
+
" \"output_path\":\"output\",\n",
|
163 |
+
" \"miners\" : [\"heu\", \"imf\", \"ilp\"]\n",
|
164 |
+
" }\n",
|
165 |
+
" ]\n",
|
166 |
+
"\n",
|
167 |
+
" #print(\"EXPERIMENT:\", experiment[1]['input_path'])\n",
|
168 |
+
" output_path = os.path.join('..', 'config_files','algorithm','grid_2obj')\n",
|
169 |
+
" os.makedirs(output_path, exist_ok=True)\n",
|
170 |
+
" output_path = os.path.join(output_path, f'generator_{os.path.split(experiment_path)[-1].split(\".\")[0]}.json') \n",
|
171 |
+
" with open(output_path, 'w') as f:\n",
|
172 |
+
" json.dump(experiment, f, ensure_ascii=False)\n",
|
173 |
+
" print(f\"Saved experiment config in {output_path}\")\n",
|
174 |
+
" \n",
|
175 |
+
" return experiment\n",
|
176 |
+
"\n",
|
177 |
+
"def create_objectives_grid(objectives, n_para_obj=2):\n",
|
178 |
+
" parameters_o = \"objectives, \"\n",
|
179 |
+
" if n_para_obj==1:\n",
|
180 |
+
" experiments = [[exp] for exp in objectives]\n",
|
181 |
+
" else:\n",
|
182 |
+
" experiments = eval(f\"[exp for exp in list(itertools.product({(parameters_o*n_para_obj)[:-2]})) if exp[0]!=exp[1]]\")\n",
|
183 |
+
" experiments = list(set([tuple(sorted(exp)) for exp in experiments]))\n",
|
184 |
+
" print(len(experiments), experiments)\n",
|
185 |
+
" \n",
|
186 |
+
" parameters = \"np.around(np.arange(0, 1.1,0.1),2), \"\n",
|
187 |
+
" tasks = eval(f\"list(itertools.product({(parameters*n_para_obj)[:-2]}))\")\n",
|
188 |
+
" tasks = [(f'task_{i+1}',)+task for i, task in enumerate(tasks)]\n",
|
189 |
+
" print(len(tasks))\n",
|
190 |
+
" for exp in experiments:\n",
|
191 |
+
" df = pd.DataFrame(data=tasks, columns=[\"task\", *exp])\n",
|
192 |
+
" experiment_path = os.path.join('..','data', 'grid_2obj')\n",
|
193 |
+
" os.makedirs(experiment_path, exist_ok=True)\n",
|
194 |
+
" experiment_path = os.path.join(experiment_path, f\"grid_{len(df.columns)-1}objectives_{abbrev_obj_keys(exp)}.csv\") \n",
|
195 |
+
" df.to_csv(experiment_path, index=False)\n",
|
196 |
+
" print(f\"Saved experiment in {experiment_path}\")\n",
|
197 |
+
" write_generator_experiment(experiment_path, objectives=exp)\n",
|
198 |
+
" #df.to_csv(f\"../data/grid_{}objectives_{abbrev_obj_keys(objectives.tolist())}.csv\" ,index=False)\n",
|
199 |
+
" \n",
|
200 |
+
"exp_test = create_objectives_grid(normalized_feature_names, n_para_obj=2) \n",
|
201 |
+
"print(exp_test)"
|
202 |
+
]
|
203 |
+
},
|
204 |
+
{
|
205 |
+
"cell_type": "markdown",
|
206 |
+
"id": "56ab613b",
|
207 |
+
"metadata": {},
|
208 |
+
"source": [
|
209 |
+
"### Helper prototypes"
|
210 |
+
]
|
211 |
+
},
|
212 |
+
{
|
213 |
+
"cell_type": "code",
|
214 |
+
"execution_count": 4,
|
215 |
+
"id": "dfd1a302",
|
216 |
+
"metadata": {},
|
217 |
+
"outputs": [],
|
218 |
+
"source": [
|
219 |
+
"df = pd.DataFrame(columns=[\"log\",\"ratio_top_20_variants\", \"epa_normalized_sequence_entropy_linear_forgetting\"]) "
|
220 |
+
]
|
221 |
+
},
|
222 |
+
{
|
223 |
+
"cell_type": "code",
|
224 |
+
"execution_count": 5,
|
225 |
+
"id": "218946b7",
|
226 |
+
"metadata": {},
|
227 |
+
"outputs": [],
|
228 |
+
"source": [
|
229 |
+
"k=0\n",
|
230 |
+
"for i in np.arange(0, 1.1,0.2):\n",
|
231 |
+
" for j in np.arange(0,0.55,0.1):\n",
|
232 |
+
" k+=1\n",
|
233 |
+
" new_entry = pd.Series({'log':f\"objective_{k}\", \"ratio_top_20_variants\":round(i,1),\n",
|
234 |
+
" \"epa_normalized_sequence_entropy_linear_forgetting\":round(j,1)})\n",
|
235 |
+
" df = pd.concat([\n",
|
236 |
+
" df, \n",
|
237 |
+
" pd.DataFrame([new_entry], columns=new_entry.index)]\n",
|
238 |
+
" ).reset_index(drop=True)\n",
|
239 |
+
" "
|
240 |
+
]
|
241 |
+
},
|
242 |
+
{
|
243 |
+
"cell_type": "code",
|
244 |
+
"execution_count": 6,
|
245 |
+
"id": "b1e3bb5a",
|
246 |
+
"metadata": {},
|
247 |
+
"outputs": [],
|
248 |
+
"source": [
|
249 |
+
"df.to_csv(\"../data/grid_objectives.csv\" ,index=False)"
|
250 |
+
]
|
251 |
+
},
|
252 |
+
{
|
253 |
+
"cell_type": "markdown",
|
254 |
+
"id": "c12bc19d",
|
255 |
+
"metadata": {},
|
256 |
+
"source": [
|
257 |
+
"## Objectives from real logs\n",
|
258 |
+
"(Feature selection)"
|
259 |
+
]
|
260 |
+
},
|
261 |
+
{
|
262 |
+
"cell_type": "code",
|
263 |
+
"execution_count": 7,
|
264 |
+
"id": "39ac74bb",
|
265 |
+
"metadata": {},
|
266 |
+
"outputs": [
|
267 |
+
{
|
268 |
+
"name": "stdout",
|
269 |
+
"output_type": "stream",
|
270 |
+
"text": [
|
271 |
+
"(26, 8)\n",
|
272 |
+
"26 Event-Logs: ['BPIC12' 'BPIC13cp' 'BPIC13inc' 'BPIC13op' 'BPIC14dc_p' 'BPIC14di_p'\n",
|
273 |
+
" 'BPIC14dia_p' 'BPIC15f1' 'BPIC15f2' 'BPIC15f3' 'BPIC15f4' 'BPIC15f5'\n",
|
274 |
+
" 'BPIC16c_p' 'BPIC16wm_p' 'BPIC17' 'BPIC17ol' 'BPIC19' 'BPIC20a' 'BPIC20b'\n",
|
275 |
+
" 'BPIC20c' 'BPIC20d' 'BPIC20e' 'HD' 'RTFMP' 'RWABOCSL' 'SEPSIS']\n"
|
276 |
+
]
|
277 |
+
},
|
278 |
+
{
|
279 |
+
"data": {
|
280 |
+
"text/html": [
|
281 |
+
"<div>\n",
|
282 |
+
"<style scoped>\n",
|
283 |
+
" .dataframe tbody tr th:only-of-type {\n",
|
284 |
+
" vertical-align: middle;\n",
|
285 |
+
" }\n",
|
286 |
+
"\n",
|
287 |
+
" .dataframe tbody tr th {\n",
|
288 |
+
" vertical-align: top;\n",
|
289 |
+
" }\n",
|
290 |
+
"\n",
|
291 |
+
" .dataframe thead th {\n",
|
292 |
+
" text-align: right;\n",
|
293 |
+
" }\n",
|
294 |
+
"</style>\n",
|
295 |
+
"<table border=\"1\" class=\"dataframe\">\n",
|
296 |
+
" <thead>\n",
|
297 |
+
" <tr style=\"text-align: right;\">\n",
|
298 |
+
" <th></th>\n",
|
299 |
+
" <th>log</th>\n",
|
300 |
+
" <th>ratio_variants_per_number_of_traces</th>\n",
|
301 |
+
" <th>ratio_most_common_variant</th>\n",
|
302 |
+
" <th>ratio_top_10_variants</th>\n",
|
303 |
+
" <th>epa_normalized_variant_entropy</th>\n",
|
304 |
+
" <th>epa_normalized_sequence_entropy</th>\n",
|
305 |
+
" <th>epa_normalized_sequence_entropy_linear_forgetting</th>\n",
|
306 |
+
" <th>epa_normalized_sequence_entropy_exponential_forgetting</th>\n",
|
307 |
+
" </tr>\n",
|
308 |
+
" </thead>\n",
|
309 |
+
" <tbody>\n",
|
310 |
+
" <tr>\n",
|
311 |
+
" <th>0</th>\n",
|
312 |
+
" <td>BPIC16wm_p</td>\n",
|
313 |
+
" <td>0.002882</td>\n",
|
314 |
+
" <td>0.295803</td>\n",
|
315 |
+
" <td>0.714106</td>\n",
|
316 |
+
" <td>0.000000</td>\n",
|
317 |
+
" <td>0.000000</td>\n",
|
318 |
+
" <td>0.000000</td>\n",
|
319 |
+
" <td>0.000000</td>\n",
|
320 |
+
" </tr>\n",
|
321 |
+
" <tr>\n",
|
322 |
+
" <th>1</th>\n",
|
323 |
+
" <td>BPIC15f5</td>\n",
|
324 |
+
" <td>0.997405</td>\n",
|
325 |
+
" <td>0.001730</td>\n",
|
326 |
+
" <td>0.102076</td>\n",
|
327 |
+
" <td>0.648702</td>\n",
|
328 |
+
" <td>0.603260</td>\n",
|
329 |
+
" <td>0.342410</td>\n",
|
330 |
+
" <td>0.404580</td>\n",
|
331 |
+
" </tr>\n",
|
332 |
+
" <tr>\n",
|
333 |
+
" <th>2</th>\n",
|
334 |
+
" <td>BPIC15f1</td>\n",
|
335 |
+
" <td>0.975813</td>\n",
|
336 |
+
" <td>0.006672</td>\n",
|
337 |
+
" <td>0.121768</td>\n",
|
338 |
+
" <td>0.652855</td>\n",
|
339 |
+
" <td>0.610294</td>\n",
|
340 |
+
" <td>0.270241</td>\n",
|
341 |
+
" <td>0.363928</td>\n",
|
342 |
+
" </tr>\n",
|
343 |
+
" <tr>\n",
|
344 |
+
" <th>3</th>\n",
|
345 |
+
" <td>BPIC19</td>\n",
|
346 |
+
" <td>0.047562</td>\n",
|
347 |
+
" <td>0.199758</td>\n",
|
348 |
+
" <td>0.946368</td>\n",
|
349 |
+
" <td>0.645530</td>\n",
|
350 |
+
" <td>0.328029</td>\n",
|
351 |
+
" <td>0.320185</td>\n",
|
352 |
+
" <td>0.320282</td>\n",
|
353 |
+
" </tr>\n",
|
354 |
+
" <tr>\n",
|
355 |
+
" <th>4</th>\n",
|
356 |
+
" <td>BPIC14dia_p</td>\n",
|
357 |
+
" <td>0.496847</td>\n",
|
358 |
+
" <td>0.037455</td>\n",
|
359 |
+
" <td>0.552836</td>\n",
|
360 |
+
" <td>0.774743</td>\n",
|
361 |
+
" <td>0.608350</td>\n",
|
362 |
+
" <td>0.305614</td>\n",
|
363 |
+
" <td>0.377416</td>\n",
|
364 |
+
" </tr>\n",
|
365 |
+
" </tbody>\n",
|
366 |
+
"</table>\n",
|
367 |
+
"</div>"
|
368 |
+
],
|
369 |
+
"text/plain": [
|
370 |
+
" log ratio_variants_per_number_of_traces \n",
|
371 |
+
"0 BPIC16wm_p 0.002882 \\\n",
|
372 |
+
"1 BPIC15f5 0.997405 \n",
|
373 |
+
"2 BPIC15f1 0.975813 \n",
|
374 |
+
"3 BPIC19 0.047562 \n",
|
375 |
+
"4 BPIC14dia_p 0.496847 \n",
|
376 |
+
"\n",
|
377 |
+
" ratio_most_common_variant ratio_top_10_variants \n",
|
378 |
+
"0 0.295803 0.714106 \\\n",
|
379 |
+
"1 0.001730 0.102076 \n",
|
380 |
+
"2 0.006672 0.121768 \n",
|
381 |
+
"3 0.199758 0.946368 \n",
|
382 |
+
"4 0.037455 0.552836 \n",
|
383 |
+
"\n",
|
384 |
+
" epa_normalized_variant_entropy epa_normalized_sequence_entropy \n",
|
385 |
+
"0 0.000000 0.000000 \\\n",
|
386 |
+
"1 0.648702 0.603260 \n",
|
387 |
+
"2 0.652855 0.610294 \n",
|
388 |
+
"3 0.645530 0.328029 \n",
|
389 |
+
"4 0.774743 0.608350 \n",
|
390 |
+
"\n",
|
391 |
+
" epa_normalized_sequence_entropy_linear_forgetting \n",
|
392 |
+
"0 0.000000 \\\n",
|
393 |
+
"1 0.342410 \n",
|
394 |
+
"2 0.270241 \n",
|
395 |
+
"3 0.320185 \n",
|
396 |
+
"4 0.305614 \n",
|
397 |
+
"\n",
|
398 |
+
" epa_normalized_sequence_entropy_exponential_forgetting \n",
|
399 |
+
"0 0.000000 \n",
|
400 |
+
"1 0.404580 \n",
|
401 |
+
"2 0.363928 \n",
|
402 |
+
"3 0.320282 \n",
|
403 |
+
"4 0.377416 "
|
404 |
+
]
|
405 |
+
},
|
406 |
+
"execution_count": 7,
|
407 |
+
"metadata": {},
|
408 |
+
"output_type": "execute_result"
|
409 |
+
}
|
410 |
+
],
|
411 |
+
"source": [
|
412 |
+
"bpic_features = pd.read_csv(\"../data/BaselineED_feat.csv\", index_col=None)\n",
|
413 |
+
"#bpic_features = pd.read_csv(\"../gedi/output/features/real_event_logs.csv\", index_col=None)\n",
|
414 |
+
"\n",
|
415 |
+
"#bpic_features = bpic_features.drop(['Unnamed: 0'], axis=1)\n",
|
416 |
+
"print(bpic_features.shape)\n",
|
417 |
+
"print(len(bpic_features), \" Event-Logs: \", bpic_features.sort_values('log')['log'].unique())\n",
|
418 |
+
"\n",
|
419 |
+
"#bpic_features.rename(columns={\"variant_entropy\":\"epa_variant_entropy\", \"normalized_variant_entropy\":\"epa_normalized_variant_entropy\", \"sequence_entropy\":\"epa_sequence_entropy\", \"normalized_sequence_entropy\":\"epa_normalized_sequence_entropy\", \"sequence_entropy_linear_forgetting\":\"epa_sequence_entropy_linear_forgetting\", \"normalized_sequence_entropy_linear_forgetting\":\"epa_normalized_sequence_entropy_linear_forgetting\", \"sequence_entropy_exponential_forgetting\":\"epa_sequence_entropy_exponential_forgetting\", \"normalized_sequence_entropy_exponential_forgetting\":\"epa_normalized_sequence_entropy_exponential_forgetting\"},\n",
|
420 |
+
"# errors=\"raise\", inplace=True)\n",
|
421 |
+
"\n",
|
422 |
+
"bpic_features.head()\n",
|
423 |
+
"#bpic_features.to_csv(\"../data/BaselineED_feat.csv\", index=False)"
|
424 |
+
]
|
425 |
+
},
|
426 |
+
{
|
427 |
+
"cell_type": "code",
|
428 |
+
"execution_count": 8,
|
429 |
+
"id": "ef0df0b9",
|
430 |
+
"metadata": {},
|
431 |
+
"outputs": [
|
432 |
+
{
|
433 |
+
"name": "stdout",
|
434 |
+
"output_type": "stream",
|
435 |
+
"text": [
|
436 |
+
"['ratio_variants_per_number_of_traces', 'ratio_most_common_variant', 'ratio_top_10_variants', 'epa_normalized_variant_entropy', 'epa_normalized_sequence_entropy', 'epa_normalized_sequence_entropy_linear_forgetting', 'epa_normalized_sequence_entropy_exponential_forgetting']\n"
|
437 |
+
]
|
438 |
+
},
|
439 |
+
{
|
440 |
+
"data": {
|
441 |
+
"text/html": [
|
442 |
+
"<div>\n",
|
443 |
+
"<style scoped>\n",
|
444 |
+
" .dataframe tbody tr th:only-of-type {\n",
|
445 |
+
" vertical-align: middle;\n",
|
446 |
+
" }\n",
|
447 |
+
"\n",
|
448 |
+
" .dataframe tbody tr th {\n",
|
449 |
+
" vertical-align: top;\n",
|
450 |
+
" }\n",
|
451 |
+
"\n",
|
452 |
+
" .dataframe thead th {\n",
|
453 |
+
" text-align: right;\n",
|
454 |
+
" }\n",
|
455 |
+
"</style>\n",
|
456 |
+
"<table border=\"1\" class=\"dataframe\">\n",
|
457 |
+
" <thead>\n",
|
458 |
+
" <tr style=\"text-align: right;\">\n",
|
459 |
+
" <th></th>\n",
|
460 |
+
" <th>log</th>\n",
|
461 |
+
" <th>ratio_variants_per_number_of_traces</th>\n",
|
462 |
+
" <th>ratio_most_common_variant</th>\n",
|
463 |
+
" <th>ratio_top_10_variants</th>\n",
|
464 |
+
" <th>epa_normalized_variant_entropy</th>\n",
|
465 |
+
" <th>epa_normalized_sequence_entropy</th>\n",
|
466 |
+
" <th>epa_normalized_sequence_entropy_linear_forgetting</th>\n",
|
467 |
+
" <th>epa_normalized_sequence_entropy_exponential_forgetting</th>\n",
|
468 |
+
" </tr>\n",
|
469 |
+
" </thead>\n",
|
470 |
+
" <tbody>\n",
|
471 |
+
" <tr>\n",
|
472 |
+
" <th>0</th>\n",
|
473 |
+
" <td>BPIC16wm_p</td>\n",
|
474 |
+
" <td>0.002882</td>\n",
|
475 |
+
" <td>0.295803</td>\n",
|
476 |
+
" <td>0.714106</td>\n",
|
477 |
+
" <td>0.000000</td>\n",
|
478 |
+
" <td>0.000000</td>\n",
|
479 |
+
" <td>0.000000</td>\n",
|
480 |
+
" <td>0.000000</td>\n",
|
481 |
+
" </tr>\n",
|
482 |
+
" <tr>\n",
|
483 |
+
" <th>1</th>\n",
|
484 |
+
" <td>BPIC15f5</td>\n",
|
485 |
+
" <td>0.997405</td>\n",
|
486 |
+
" <td>0.001730</td>\n",
|
487 |
+
" <td>0.102076</td>\n",
|
488 |
+
" <td>0.648702</td>\n",
|
489 |
+
" <td>0.603260</td>\n",
|
490 |
+
" <td>0.342410</td>\n",
|
491 |
+
" <td>0.404580</td>\n",
|
492 |
+
" </tr>\n",
|
493 |
+
" <tr>\n",
|
494 |
+
" <th>2</th>\n",
|
495 |
+
" <td>BPIC15f1</td>\n",
|
496 |
+
" <td>0.975813</td>\n",
|
497 |
+
" <td>0.006672</td>\n",
|
498 |
+
" <td>0.121768</td>\n",
|
499 |
+
" <td>0.652855</td>\n",
|
500 |
+
" <td>0.610294</td>\n",
|
501 |
+
" <td>0.270241</td>\n",
|
502 |
+
" <td>0.363928</td>\n",
|
503 |
+
" </tr>\n",
|
504 |
+
" <tr>\n",
|
505 |
+
" <th>3</th>\n",
|
506 |
+
" <td>BPIC19</td>\n",
|
507 |
+
" <td>0.047562</td>\n",
|
508 |
+
" <td>0.199758</td>\n",
|
509 |
+
" <td>0.946368</td>\n",
|
510 |
+
" <td>0.645530</td>\n",
|
511 |
+
" <td>0.328029</td>\n",
|
512 |
+
" <td>0.320185</td>\n",
|
513 |
+
" <td>0.320282</td>\n",
|
514 |
+
" </tr>\n",
|
515 |
+
" <tr>\n",
|
516 |
+
" <th>4</th>\n",
|
517 |
+
" <td>BPIC14dia_p</td>\n",
|
518 |
+
" <td>0.496847</td>\n",
|
519 |
+
" <td>0.037455</td>\n",
|
520 |
+
" <td>0.552836</td>\n",
|
521 |
+
" <td>0.774743</td>\n",
|
522 |
+
" <td>0.608350</td>\n",
|
523 |
+
" <td>0.305614</td>\n",
|
524 |
+
" <td>0.377416</td>\n",
|
525 |
+
" </tr>\n",
|
526 |
+
" <tr>\n",
|
527 |
+
" <th>5</th>\n",
|
528 |
+
" <td>BPIC15f2</td>\n",
|
529 |
+
" <td>0.995192</td>\n",
|
530 |
+
" <td>0.002404</td>\n",
|
531 |
+
" <td>0.103365</td>\n",
|
532 |
+
" <td>0.627973</td>\n",
|
533 |
+
" <td>0.602371</td>\n",
|
534 |
+
" <td>0.317217</td>\n",
|
535 |
+
" <td>0.390473</td>\n",
|
536 |
+
" </tr>\n",
|
537 |
+
" <tr>\n",
|
538 |
+
" <th>6</th>\n",
|
539 |
+
" <td>BPIC15f3</td>\n",
|
540 |
+
" <td>0.957417</td>\n",
|
541 |
+
" <td>0.010646</td>\n",
|
542 |
+
" <td>0.137686</td>\n",
|
543 |
+
" <td>0.661781</td>\n",
|
544 |
+
" <td>0.605676</td>\n",
|
545 |
+
" <td>0.341521</td>\n",
|
546 |
+
" <td>0.404934</td>\n",
|
547 |
+
" </tr>\n",
|
548 |
+
" <tr>\n",
|
549 |
+
" <th>7</th>\n",
|
550 |
+
" <td>BPIC13cp</td>\n",
|
551 |
+
" <td>0.123067</td>\n",
|
552 |
+
" <td>0.331540</td>\n",
|
553 |
+
" <td>0.840619</td>\n",
|
554 |
+
" <td>0.705383</td>\n",
|
555 |
+
" <td>0.310940</td>\n",
|
556 |
+
" <td>0.286515</td>\n",
|
557 |
+
" <td>0.288383</td>\n",
|
558 |
+
" </tr>\n",
|
559 |
+
" <tr>\n",
|
560 |
+
" <th>8</th>\n",
|
561 |
+
" <td>BPIC14dc_p</td>\n",
|
562 |
+
" <td>0.048444</td>\n",
|
563 |
+
" <td>0.074944</td>\n",
|
564 |
+
" <td>0.765056</td>\n",
|
565 |
+
" <td>0.470758</td>\n",
|
566 |
+
" <td>0.419266</td>\n",
|
567 |
+
" <td>0.312599</td>\n",
|
568 |
+
" <td>0.326719</td>\n",
|
569 |
+
" </tr>\n",
|
570 |
+
" <tr>\n",
|
571 |
+
" <th>9</th>\n",
|
572 |
+
" <td>BPIC20a</td>\n",
|
573 |
+
" <td>0.009429</td>\n",
|
574 |
+
" <td>0.439810</td>\n",
|
575 |
+
" <td>0.950095</td>\n",
|
576 |
+
" <td>0.696474</td>\n",
|
577 |
+
" <td>0.164758</td>\n",
|
578 |
+
" <td>0.085439</td>\n",
|
579 |
+
" <td>0.104389</td>\n",
|
580 |
+
" </tr>\n",
|
581 |
+
" <tr>\n",
|
582 |
+
" <th>10</th>\n",
|
583 |
+
" <td>BPIC14di_p</td>\n",
|
584 |
+
" <td>0.000041</td>\n",
|
585 |
+
" <td>0.787081</td>\n",
|
586 |
+
" <td>0.000000</td>\n",
|
587 |
+
" <td>1.000000</td>\n",
|
588 |
+
" <td>0.044018</td>\n",
|
589 |
+
" <td>0.033322</td>\n",
|
590 |
+
" <td>0.034685</td>\n",
|
591 |
+
" </tr>\n",
|
592 |
+
" <tr>\n",
|
593 |
+
" <th>11</th>\n",
|
594 |
+
" <td>BPIC17ol</td>\n",
|
595 |
+
" <td>0.000372</td>\n",
|
596 |
+
" <td>0.380626</td>\n",
|
597 |
+
" <td>0.380626</td>\n",
|
598 |
+
" <td>0.813479</td>\n",
|
599 |
+
" <td>0.105130</td>\n",
|
600 |
+
" <td>0.052672</td>\n",
|
601 |
+
" <td>0.066000</td>\n",
|
602 |
+
" </tr>\n",
|
603 |
+
" <tr>\n",
|
604 |
+
" <th>12</th>\n",
|
605 |
+
" <td>BPIC13op</td>\n",
|
606 |
+
" <td>0.131868</td>\n",
|
607 |
+
" <td>0.217338</td>\n",
|
608 |
+
" <td>0.769231</td>\n",
|
609 |
+
" <td>0.702960</td>\n",
|
610 |
+
" <td>0.276771</td>\n",
|
611 |
+
" <td>0.262094</td>\n",
|
612 |
+
" <td>0.263029</td>\n",
|
613 |
+
" </tr>\n",
|
614 |
+
" <tr>\n",
|
615 |
+
" <th>13</th>\n",
|
616 |
+
" <td>RTFMP</td>\n",
|
617 |
+
" <td>0.001536</td>\n",
|
618 |
+
" <td>0.375620</td>\n",
|
619 |
+
" <td>0.993104</td>\n",
|
620 |
+
" <td>0.769353</td>\n",
|
621 |
+
" <td>0.111932</td>\n",
|
622 |
+
" <td>0.052586</td>\n",
|
623 |
+
" <td>0.068442</td>\n",
|
624 |
+
" </tr>\n",
|
625 |
+
" <tr>\n",
|
626 |
+
" <th>14</th>\n",
|
627 |
+
" <td>BPIC20d</td>\n",
|
628 |
+
" <td>0.096236</td>\n",
|
629 |
+
" <td>0.271081</td>\n",
|
630 |
+
" <td>0.822773</td>\n",
|
631 |
+
" <td>0.723785</td>\n",
|
632 |
+
" <td>0.317044</td>\n",
|
633 |
+
" <td>0.184879</td>\n",
|
634 |
+
" <td>0.214387</td>\n",
|
635 |
+
" </tr>\n",
|
636 |
+
" <tr>\n",
|
637 |
+
" <th>15</th>\n",
|
638 |
+
" <td>BPIC12</td>\n",
|
639 |
+
" <td>0.333614</td>\n",
|
640 |
+
" <td>0.262016</td>\n",
|
641 |
+
" <td>0.686254</td>\n",
|
642 |
+
" <td>0.708280</td>\n",
|
643 |
+
" <td>0.423074</td>\n",
|
644 |
+
" <td>0.226133</td>\n",
|
645 |
+
" <td>0.275551</td>\n",
|
646 |
+
" </tr>\n",
|
647 |
+
" <tr>\n",
|
648 |
+
" <th>16</th>\n",
|
649 |
+
" <td>RWABOCSL</td>\n",
|
650 |
+
" <td>0.080893</td>\n",
|
651 |
+
" <td>0.497211</td>\n",
|
652 |
+
" <td>0.887029</td>\n",
|
653 |
+
" <td>0.689363</td>\n",
|
654 |
+
" <td>0.235532</td>\n",
|
655 |
+
" <td>0.100603</td>\n",
|
656 |
+
" <td>0.138113</td>\n",
|
657 |
+
" </tr>\n",
|
658 |
+
" <tr>\n",
|
659 |
+
" <th>17</th>\n",
|
660 |
+
" <td>BPIC20e</td>\n",
|
661 |
+
" <td>0.012925</td>\n",
|
662 |
+
" <td>0.437264</td>\n",
|
663 |
+
" <td>0.933488</td>\n",
|
664 |
+
" <td>0.703735</td>\n",
|
665 |
+
" <td>0.189048</td>\n",
|
666 |
+
" <td>0.097572</td>\n",
|
667 |
+
" <td>0.118744</td>\n",
|
668 |
+
" </tr>\n",
|
669 |
+
" <tr>\n",
|
670 |
+
" <th>18</th>\n",
|
671 |
+
" <td>BPIC16c_p</td>\n",
|
672 |
+
" <td>0.438053</td>\n",
|
673 |
+
" <td>0.101770</td>\n",
|
674 |
+
" <td>0.424779</td>\n",
|
675 |
+
" <td>0.899497</td>\n",
|
676 |
+
" <td>0.683796</td>\n",
|
677 |
+
" <td>0.404685</td>\n",
|
678 |
+
" <td>0.470116</td>\n",
|
679 |
+
" </tr>\n",
|
680 |
+
" <tr>\n",
|
681 |
+
" <th>19</th>\n",
|
682 |
+
" <td>BPIC13inc</td>\n",
|
683 |
+
" <td>0.200026</td>\n",
|
684 |
+
" <td>0.232195</td>\n",
|
685 |
+
" <td>0.794414</td>\n",
|
686 |
+
" <td>0.717846</td>\n",
|
687 |
+
" <td>0.404651</td>\n",
|
688 |
+
" <td>0.391097</td>\n",
|
689 |
+
" <td>0.391625</td>\n",
|
690 |
+
" </tr>\n",
|
691 |
+
" <tr>\n",
|
692 |
+
" <th>20</th>\n",
|
693 |
+
" <td>BPIC15f4</td>\n",
|
694 |
+
" <td>0.996201</td>\n",
|
695 |
+
" <td>0.002849</td>\n",
|
696 |
+
" <td>0.102564</td>\n",
|
697 |
+
" <td>0.652985</td>\n",
|
698 |
+
" <td>0.603866</td>\n",
|
699 |
+
" <td>0.355927</td>\n",
|
700 |
+
" <td>0.412835</td>\n",
|
701 |
+
" </tr>\n",
|
702 |
+
" <tr>\n",
|
703 |
+
" <th>21</th>\n",
|
704 |
+
" <td>BPIC17</td>\n",
|
705 |
+
" <td>0.505570</td>\n",
|
706 |
+
" <td>0.033514</td>\n",
|
707 |
+
" <td>0.531340</td>\n",
|
708 |
+
" <td>0.741706</td>\n",
|
709 |
+
" <td>0.461565</td>\n",
|
710 |
+
" <td>0.231922</td>\n",
|
711 |
+
" <td>0.290464</td>\n",
|
712 |
+
" </tr>\n",
|
713 |
+
" <tr>\n",
|
714 |
+
" <th>22</th>\n",
|
715 |
+
" <td>BPIC20c</td>\n",
|
716 |
+
" <td>0.209200</td>\n",
|
717 |
+
" <td>0.135315</td>\n",
|
718 |
+
" <td>0.757537</td>\n",
|
719 |
+
" <td>0.733653</td>\n",
|
720 |
+
" <td>0.420150</td>\n",
|
721 |
+
" <td>0.137287</td>\n",
|
722 |
+
" <td>0.215490</td>\n",
|
723 |
+
" </tr>\n",
|
724 |
+
" <tr>\n",
|
725 |
+
" <th>23</th>\n",
|
726 |
+
" <td>BPIC20b</td>\n",
|
727 |
+
" <td>0.116762</td>\n",
|
728 |
+
" <td>0.212281</td>\n",
|
729 |
+
" <td>0.811289</td>\n",
|
730 |
+
" <td>0.758268</td>\n",
|
731 |
+
" <td>0.339380</td>\n",
|
732 |
+
" <td>0.145611</td>\n",
|
733 |
+
" <td>0.193753</td>\n",
|
734 |
+
" </tr>\n",
|
735 |
+
" <tr>\n",
|
736 |
+
" <th>24</th>\n",
|
737 |
+
" <td>HD</td>\n",
|
738 |
+
" <td>0.049345</td>\n",
|
739 |
+
" <td>0.516594</td>\n",
|
740 |
+
" <td>0.906332</td>\n",
|
741 |
+
" <td>0.799120</td>\n",
|
742 |
+
" <td>0.254066</td>\n",
|
743 |
+
" <td>0.118478</td>\n",
|
744 |
+
" <td>0.154576</td>\n",
|
745 |
+
" </tr>\n",
|
746 |
+
" <tr>\n",
|
747 |
+
" <th>25</th>\n",
|
748 |
+
" <td>SEPSIS</td>\n",
|
749 |
+
" <td>0.805714</td>\n",
|
750 |
+
" <td>0.033333</td>\n",
|
751 |
+
" <td>0.274286</td>\n",
|
752 |
+
" <td>0.695759</td>\n",
|
753 |
+
" <td>0.522343</td>\n",
|
754 |
+
" <td>0.219365</td>\n",
|
755 |
+
" <td>0.299505</td>\n",
|
756 |
+
" </tr>\n",
|
757 |
+
" </tbody>\n",
|
758 |
+
"</table>\n",
|
759 |
+
"</div>"
|
760 |
+
],
|
761 |
+
"text/plain": [
|
762 |
+
" log ratio_variants_per_number_of_traces \n",
|
763 |
+
"0 BPIC16wm_p 0.002882 \\\n",
|
764 |
+
"1 BPIC15f5 0.997405 \n",
|
765 |
+
"2 BPIC15f1 0.975813 \n",
|
766 |
+
"3 BPIC19 0.047562 \n",
|
767 |
+
"4 BPIC14dia_p 0.496847 \n",
|
768 |
+
"5 BPIC15f2 0.995192 \n",
|
769 |
+
"6 BPIC15f3 0.957417 \n",
|
770 |
+
"7 BPIC13cp 0.123067 \n",
|
771 |
+
"8 BPIC14dc_p 0.048444 \n",
|
772 |
+
"9 BPIC20a 0.009429 \n",
|
773 |
+
"10 BPIC14di_p 0.000041 \n",
|
774 |
+
"11 BPIC17ol 0.000372 \n",
|
775 |
+
"12 BPIC13op 0.131868 \n",
|
776 |
+
"13 RTFMP 0.001536 \n",
|
777 |
+
"14 BPIC20d 0.096236 \n",
|
778 |
+
"15 BPIC12 0.333614 \n",
|
779 |
+
"16 RWABOCSL 0.080893 \n",
|
780 |
+
"17 BPIC20e 0.012925 \n",
|
781 |
+
"18 BPIC16c_p 0.438053 \n",
|
782 |
+
"19 BPIC13inc 0.200026 \n",
|
783 |
+
"20 BPIC15f4 0.996201 \n",
|
784 |
+
"21 BPIC17 0.505570 \n",
|
785 |
+
"22 BPIC20c 0.209200 \n",
|
786 |
+
"23 BPIC20b 0.116762 \n",
|
787 |
+
"24 HD 0.049345 \n",
|
788 |
+
"25 SEPSIS 0.805714 \n",
|
789 |
+
"\n",
|
790 |
+
" ratio_most_common_variant ratio_top_10_variants \n",
|
791 |
+
"0 0.295803 0.714106 \\\n",
|
792 |
+
"1 0.001730 0.102076 \n",
|
793 |
+
"2 0.006672 0.121768 \n",
|
794 |
+
"3 0.199758 0.946368 \n",
|
795 |
+
"4 0.037455 0.552836 \n",
|
796 |
+
"5 0.002404 0.103365 \n",
|
797 |
+
"6 0.010646 0.137686 \n",
|
798 |
+
"7 0.331540 0.840619 \n",
|
799 |
+
"8 0.074944 0.765056 \n",
|
800 |
+
"9 0.439810 0.950095 \n",
|
801 |
+
"10 0.787081 0.000000 \n",
|
802 |
+
"11 0.380626 0.380626 \n",
|
803 |
+
"12 0.217338 0.769231 \n",
|
804 |
+
"13 0.375620 0.993104 \n",
|
805 |
+
"14 0.271081 0.822773 \n",
|
806 |
+
"15 0.262016 0.686254 \n",
|
807 |
+
"16 0.497211 0.887029 \n",
|
808 |
+
"17 0.437264 0.933488 \n",
|
809 |
+
"18 0.101770 0.424779 \n",
|
810 |
+
"19 0.232195 0.794414 \n",
|
811 |
+
"20 0.002849 0.102564 \n",
|
812 |
+
"21 0.033514 0.531340 \n",
|
813 |
+
"22 0.135315 0.757537 \n",
|
814 |
+
"23 0.212281 0.811289 \n",
|
815 |
+
"24 0.516594 0.906332 \n",
|
816 |
+
"25 0.033333 0.274286 \n",
|
817 |
+
"\n",
|
818 |
+
" epa_normalized_variant_entropy epa_normalized_sequence_entropy \n",
|
819 |
+
"0 0.000000 0.000000 \\\n",
|
820 |
+
"1 0.648702 0.603260 \n",
|
821 |
+
"2 0.652855 0.610294 \n",
|
822 |
+
"3 0.645530 0.328029 \n",
|
823 |
+
"4 0.774743 0.608350 \n",
|
824 |
+
"5 0.627973 0.602371 \n",
|
825 |
+
"6 0.661781 0.605676 \n",
|
826 |
+
"7 0.705383 0.310940 \n",
|
827 |
+
"8 0.470758 0.419266 \n",
|
828 |
+
"9 0.696474 0.164758 \n",
|
829 |
+
"10 1.000000 0.044018 \n",
|
830 |
+
"11 0.813479 0.105130 \n",
|
831 |
+
"12 0.702960 0.276771 \n",
|
832 |
+
"13 0.769353 0.111932 \n",
|
833 |
+
"14 0.723785 0.317044 \n",
|
834 |
+
"15 0.708280 0.423074 \n",
|
835 |
+
"16 0.689363 0.235532 \n",
|
836 |
+
"17 0.703735 0.189048 \n",
|
837 |
+
"18 0.899497 0.683796 \n",
|
838 |
+
"19 0.717846 0.404651 \n",
|
839 |
+
"20 0.652985 0.603866 \n",
|
840 |
+
"21 0.741706 0.461565 \n",
|
841 |
+
"22 0.733653 0.420150 \n",
|
842 |
+
"23 0.758268 0.339380 \n",
|
843 |
+
"24 0.799120 0.254066 \n",
|
844 |
+
"25 0.695759 0.522343 \n",
|
845 |
+
"\n",
|
846 |
+
" epa_normalized_sequence_entropy_linear_forgetting \n",
|
847 |
+
"0 0.000000 \\\n",
|
848 |
+
"1 0.342410 \n",
|
849 |
+
"2 0.270241 \n",
|
850 |
+
"3 0.320185 \n",
|
851 |
+
"4 0.305614 \n",
|
852 |
+
"5 0.317217 \n",
|
853 |
+
"6 0.341521 \n",
|
854 |
+
"7 0.286515 \n",
|
855 |
+
"8 0.312599 \n",
|
856 |
+
"9 0.085439 \n",
|
857 |
+
"10 0.033322 \n",
|
858 |
+
"11 0.052672 \n",
|
859 |
+
"12 0.262094 \n",
|
860 |
+
"13 0.052586 \n",
|
861 |
+
"14 0.184879 \n",
|
862 |
+
"15 0.226133 \n",
|
863 |
+
"16 0.100603 \n",
|
864 |
+
"17 0.097572 \n",
|
865 |
+
"18 0.404685 \n",
|
866 |
+
"19 0.391097 \n",
|
867 |
+
"20 0.355927 \n",
|
868 |
+
"21 0.231922 \n",
|
869 |
+
"22 0.137287 \n",
|
870 |
+
"23 0.145611 \n",
|
871 |
+
"24 0.118478 \n",
|
872 |
+
"25 0.219365 \n",
|
873 |
+
"\n",
|
874 |
+
" epa_normalized_sequence_entropy_exponential_forgetting \n",
|
875 |
+
"0 0.000000 \n",
|
876 |
+
"1 0.404580 \n",
|
877 |
+
"2 0.363928 \n",
|
878 |
+
"3 0.320282 \n",
|
879 |
+
"4 0.377416 \n",
|
880 |
+
"5 0.390473 \n",
|
881 |
+
"6 0.404934 \n",
|
882 |
+
"7 0.288383 \n",
|
883 |
+
"8 0.326719 \n",
|
884 |
+
"9 0.104389 \n",
|
885 |
+
"10 0.034685 \n",
|
886 |
+
"11 0.066000 \n",
|
887 |
+
"12 0.263029 \n",
|
888 |
+
"13 0.068442 \n",
|
889 |
+
"14 0.214387 \n",
|
890 |
+
"15 0.275551 \n",
|
891 |
+
"16 0.138113 \n",
|
892 |
+
"17 0.118744 \n",
|
893 |
+
"18 0.470116 \n",
|
894 |
+
"19 0.391625 \n",
|
895 |
+
"20 0.412835 \n",
|
896 |
+
"21 0.290464 \n",
|
897 |
+
"22 0.215490 \n",
|
898 |
+
"23 0.193753 \n",
|
899 |
+
"24 0.154576 \n",
|
900 |
+
"25 0.299505 "
|
901 |
+
]
|
902 |
+
},
|
903 |
+
"execution_count": 8,
|
904 |
+
"metadata": {},
|
905 |
+
"output_type": "execute_result"
|
906 |
+
}
|
907 |
+
],
|
908 |
+
"source": [
|
909 |
+
"bpic_stats = bpic_features.describe().transpose()\n",
|
910 |
+
"normalized_feature_names = bpic_stats[(bpic_stats['min']>=0)&(bpic_stats['max']<=1)].index.to_list() \n",
|
911 |
+
"normalized_feature_names = ['ratio_variants_per_number_of_traces', 'ratio_most_common_variant', \n",
|
912 |
+
" 'ratio_top_10_variants', 'epa_normalized_variant_entropy', 'epa_normalized_sequence_entropy', \n",
|
913 |
+
" 'epa_normalized_sequence_entropy_linear_forgetting', 'epa_normalized_sequence_entropy_exponential_forgetting']\n",
|
914 |
+
"print(normalized_feature_names)\n",
|
915 |
+
"bpic_features[['log']+normalized_feature_names]"
|
916 |
+
]
|
917 |
+
},
|
918 |
+
{
|
919 |
+
"cell_type": "code",
|
920 |
+
"execution_count": 9,
|
921 |
+
"id": "44909860",
|
922 |
+
"metadata": {},
|
923 |
+
"outputs": [
|
924 |
+
{
|
925 |
+
"name": "stdout",
|
926 |
+
"output_type": "stream",
|
927 |
+
"text": [
|
928 |
+
"21\n",
|
929 |
+
"Saved experiment config in ../config_files/algorithm/BaselineED_feat/generator_2_enself_rt10v.json\n",
|
930 |
+
"Saved experiment config in ../config_files/algorithm/BaselineED_feat/generator_2_enseef_enve.json\n",
|
931 |
+
"Saved experiment config in ../config_files/algorithm/BaselineED_feat/generator_2_enve_rvpnot.json\n",
|
932 |
+
"Saved experiment config in ../config_files/algorithm/BaselineED_feat/generator_2_enself_rmcv.json\n",
|
933 |
+
"Saved experiment config in ../config_files/algorithm/BaselineED_feat/generator_2_ense_rvpnot.json\n",
|
934 |
+
"Saved experiment config in ../config_files/algorithm/BaselineED_feat/generator_2_enseef_rt10v.json\n",
|
935 |
+
"Saved experiment config in ../config_files/algorithm/BaselineED_feat/generator_2_enseef_enself.json\n",
|
936 |
+
"Saved experiment config in ../config_files/algorithm/BaselineED_feat/generator_2_ense_enve.json\n",
|
937 |
+
"Saved experiment config in ../config_files/algorithm/BaselineED_feat/generator_2_enseef_rmcv.json\n",
|
938 |
+
"Saved experiment config in ../config_files/algorithm/BaselineED_feat/generator_2_rt10v_rvpnot.json\n",
|
939 |
+
"Saved experiment config in ../config_files/algorithm/BaselineED_feat/generator_2_ense_rt10v.json\n",
|
940 |
+
"Saved experiment config in ../config_files/algorithm/BaselineED_feat/generator_2_enve_rt10v.json\n",
|
941 |
+
"Saved experiment config in ../config_files/algorithm/BaselineED_feat/generator_2_ense_enself.json\n",
|
942 |
+
"Saved experiment config in ../config_files/algorithm/BaselineED_feat/generator_2_rmcv_rvpnot.json\n",
|
943 |
+
"Saved experiment config in ../config_files/algorithm/BaselineED_feat/generator_2_enve_rmcv.json\n",
|
944 |
+
"Saved experiment config in ../config_files/algorithm/BaselineED_feat/generator_2_ense_rmcv.json\n",
|
945 |
+
"Saved experiment config in ../config_files/algorithm/BaselineED_feat/generator_2_enself_rvpnot.json\n",
|
946 |
+
"Saved experiment config in ../config_files/algorithm/BaselineED_feat/generator_2_ense_enseef.json\n",
|
947 |
+
"Saved experiment config in ../config_files/algorithm/BaselineED_feat/generator_2_enself_enve.json\n",
|
948 |
+
"Saved experiment config in ../config_files/algorithm/BaselineED_feat/generator_2_enseef_rvpnot.json\n",
|
949 |
+
"Saved experiment config in ../config_files/algorithm/BaselineED_feat/generator_2_rmcv_rt10v.json\n",
|
950 |
+
"None\n"
|
951 |
+
]
|
952 |
+
}
|
953 |
+
],
|
954 |
+
"source": [
|
955 |
+
"#Features between 0 and 1: \n",
|
956 |
+
"def write_generator_bpic_experiment(objectives, n_para_obj=2):\n",
|
957 |
+
" parameters_o = \"objectives, \"\n",
|
958 |
+
" experiments = eval(f\"[exp for exp in list(itertools.product({(parameters_o*n_para_obj)[:-2]})) if exp[0]!=exp[1]]\")\n",
|
959 |
+
" experiments = list(set([tuple(sorted(exp)) for exp in experiments]))\n",
|
960 |
+
" for exp in experiments:\n",
|
961 |
+
" experiment_path = os.path.join('..','data', 'BaselineED_feat')\n",
|
962 |
+
" os.makedirs(experiment_path, exist_ok=True)\n",
|
963 |
+
" experiment_path = os.path.join(experiment_path, f\"{len(exp)}_{abbrev_obj_keys(exp)}.csv\") \n",
|
964 |
+
"\n",
|
965 |
+
"\n",
|
966 |
+
" first_dir = os.path.split(experiment_path[3:])[-1].replace(\".csv\",\"\")\n",
|
967 |
+
" second_dir = first_dir.replace(\"grid_\",\"\").replace(\"objectives\",\"\")\n",
|
968 |
+
"\n",
|
969 |
+
" experiment = [\n",
|
970 |
+
" {\n",
|
971 |
+
" 'pipeline_step': 'event_logs_generation',\n",
|
972 |
+
" 'output_path':'output/generated',\n",
|
973 |
+
" 'generator_params': {\n",
|
974 |
+
" \"experiment\": {\"input_path\": \"data/BaselineED_feat.csv\",\n",
|
975 |
+
" \"objectives\": exp},\n",
|
976 |
+
" 'config_space': {\n",
|
977 |
+
" 'mode': [5, 20],\n",
|
978 |
+
" 'sequence': [0.01, 1],\n",
|
979 |
+
" 'choice': [0.01, 1],\n",
|
980 |
+
" 'parallel': [0.01, 1],\n",
|
981 |
+
" 'loop': [0.01, 1],\n",
|
982 |
+
" 'silent': [0.01, 1],\n",
|
983 |
+
" 'lt_dependency': [0.01, 1],\n",
|
984 |
+
" 'num_traces': [10, 10001],\n",
|
985 |
+
" 'duplicate': [0],\n",
|
986 |
+
" 'or': [0]\n",
|
987 |
+
" },\n",
|
988 |
+
" 'n_trials': 200\n",
|
989 |
+
" }\n",
|
990 |
+
" },\n",
|
991 |
+
" {\n",
|
992 |
+
" 'pipeline_step': 'feature_extraction',\n",
|
993 |
+
"            # 'input_path': os.path.join('output', 'features', 'generated', 'BaselineED_feat', first_dir),  # dead entry: overridden by the duplicate 'input_path' key on the next line\n",
|
994 |
+
" 'input_path': os.path.join('output', 'generated', 'BaselineED_feat', first_dir),\n",
|
995 |
+
"            # 'feature_params': {'feature_set':['simple_stats', 'trace_length', 'trace_variant', 'activities', 'start_activities', 'end_activities', 'eventropies', 'epa_based']},  # dead entry: overridden by the duplicate 'feature_params' key on the next line\n",
|
996 |
+
" 'feature_params': {\"feature_set\":[\"ratio_variants_per_number_of_traces\",\"ratio_most_common_variant\",\"ratio_top_10_variants\",\"epa_normalized_variant_entropy\",\"epa_normalized_sequence_entropy\",\"epa_normalized_sequence_entropy_linear_forgetting\",\"epa_normalized_sequence_entropy_exponential_forgetting\"]},\n",
|
997 |
+
" 'output_path': 'output/plots',\n",
|
998 |
+
" 'real_eventlog_path': 'data/BaselineED_feat.csv',\n",
|
999 |
+
" 'plot_type': 'boxplot'\n",
|
1000 |
+
" },\n",
|
1001 |
+
" {\n",
|
1002 |
+
" \"pipeline_step\": \"benchmark_test\",\n",
|
1003 |
+
" \"benchmark_test\": \"discovery\",\n",
|
1004 |
+
" \"input_path\": os.path.join('output', 'generated', 'BaselineED_feat', first_dir),\n",
|
1005 |
+
" \"output_path\":\"output\",\n",
|
1006 |
+
" \"miners\" : [\"heu\", \"imf\", \"ilp\"]\n",
|
1007 |
+
" }\n",
|
1008 |
+
" ]\n",
|
1009 |
+
"\n",
|
1010 |
+
" output_path = os.path.join('..', 'config_files','algorithm','BaselineED_feat')\n",
|
1011 |
+
" os.makedirs(output_path, exist_ok=True)\n",
|
1012 |
+
" output_path = os.path.join(output_path, f'generator_{os.path.split(experiment_path)[-1].split(\".\")[0]}.json') \n",
|
1013 |
+
"\n",
|
1014 |
+
" with open(output_path, 'w') as f:\n",
|
1015 |
+
" json.dump(experiment, f, ensure_ascii=False)\n",
|
1016 |
+
" print(f\"Saved experiment config in {output_path}\")\n",
|
1017 |
+
" return experiment\n",
|
1018 |
+
"\n",
|
1019 |
+
"\n",
|
1020 |
+
"def create_objectives_grid(objectives, n_para_obj=2):\n",
|
1021 |
+
" parameters_o = \"objectives, \"\n",
|
1022 |
+
" experiments = eval(f\"[exp for exp in list(itertools.product({(parameters_o*n_para_obj)[:-2]})) if exp[0]!=exp[1]]\")\n",
|
1023 |
+
" experiments = list(set([tuple(sorted(exp)) for exp in experiments]))\n",
|
1024 |
+
" print(len(experiments))\n",
|
1025 |
+
" \n",
|
1026 |
+
" for exp in experiments:\n",
|
1027 |
+
" write_generator_bpic_experiment(objectives=exp)\n",
|
1028 |
+
" \n",
|
1029 |
+
"exp_test = create_objectives_grid(normalized_feature_names, n_para_obj=2) \n",
|
1030 |
+
"print(exp_test)"
|
1031 |
+
]
|
1032 |
+
},
|
1033 |
+
{
|
1034 |
+
"cell_type": "markdown",
|
1035 |
+
"id": "b07e9753",
|
1036 |
+
"metadata": {},
|
1037 |
+
"source": [
|
1038 |
+
"## Single objective from real logs\n",
|
1039 |
+
"(Feature selection)"
|
1040 |
+
]
|
1041 |
+
},
|
1042 |
+
{
|
1043 |
+
"cell_type": "code",
|
1044 |
+
"execution_count": 10,
|
1045 |
+
"id": "d759a677",
|
1046 |
+
"metadata": {},
|
1047 |
+
"outputs": [
|
1048 |
+
{
|
1049 |
+
"name": "stdout",
|
1050 |
+
"output_type": "stream",
|
1051 |
+
"text": [
|
1052 |
+
"7 experiments: [('epa_normalized_sequence_entropy_exponential_forgetting',), ('ratio_variants_per_number_of_traces',), ('ratio_most_common_variant',), ('epa_normalized_sequence_entropy',), ('ratio_top_10_variants',), ('epa_normalized_sequence_entropy_linear_forgetting',), ('epa_normalized_variant_entropy',)]\n",
|
1053 |
+
"11\n",
|
1054 |
+
"Saved experiment in ../data/grid_experiments/grid_1objectives_enseef.csv\n",
|
1055 |
+
"Saved experiment config in ../config_files/algorithm/grid_experiments/generator_grid_1objectives_enseef.json\n",
|
1056 |
+
"Saved experiment in ../data/grid_experiments/grid_1objectives_rvpnot.csv\n",
|
1057 |
+
"Saved experiment config in ../config_files/algorithm/grid_experiments/generator_grid_1objectives_rvpnot.json\n",
|
1058 |
+
"Saved experiment in ../data/grid_experiments/grid_1objectives_rmcv.csv\n",
|
1059 |
+
"Saved experiment config in ../config_files/algorithm/grid_experiments/generator_grid_1objectives_rmcv.json\n",
|
1060 |
+
"Saved experiment in ../data/grid_experiments/grid_1objectives_ense.csv\n",
|
1061 |
+
"Saved experiment config in ../config_files/algorithm/grid_experiments/generator_grid_1objectives_ense.json\n",
|
1062 |
+
"Saved experiment in ../data/grid_experiments/grid_1objectives_rt10v.csv\n",
|
1063 |
+
"Saved experiment config in ../config_files/algorithm/grid_experiments/generator_grid_1objectives_rt10v.json\n",
|
1064 |
+
"Saved experiment in ../data/grid_experiments/grid_1objectives_enself.csv\n",
|
1065 |
+
"Saved experiment config in ../config_files/algorithm/grid_experiments/generator_grid_1objectives_enself.json\n",
|
1066 |
+
"Saved experiment in ../data/grid_experiments/grid_1objectives_enve.csv\n",
|
1067 |
+
"Saved experiment config in ../config_files/algorithm/grid_experiments/generator_grid_1objectives_enve.json\n",
|
1068 |
+
"None\n"
|
1069 |
+
]
|
1070 |
+
}
|
1071 |
+
],
|
1072 |
+
"source": [
|
1073 |
+
"def write_single_objective_experiment(experiment_path, objectives=[\"ratio_top_20_variants\", \"epa_normalized_sequence_entropy_linear_forgetting\"]):\n",
|
1074 |
+
" first_dir = os.path.split(experiment_path[3:])[-1].replace(\".csv\",\"\")\n",
|
1075 |
+
" second_dir = first_dir.replace(\"grid_\",\"\").replace(\"objectives\",\"\")\n",
|
1076 |
+
"\n",
|
1077 |
+
" experiment = [\n",
|
1078 |
+
" {\n",
|
1079 |
+
" 'pipeline_step': 'event_logs_generation',\n",
|
1080 |
+
" 'output_path':os.path.join('output','generated', 'grid_1obj'),\n",
|
1081 |
+
" 'generator_params': {\n",
|
1082 |
+
" \"experiment\": {\"input_path\": experiment_path[3:],\n",
|
1083 |
+
" \"objectives\": objectives},\n",
|
1084 |
+
" 'config_space': {\n",
|
1085 |
+
" 'mode': [5, 20],\n",
|
1086 |
+
" 'sequence': [0.01, 1],\n",
|
1087 |
+
" 'choice': [0.01, 1],\n",
|
1088 |
+
" 'parallel': [0.01, 1],\n",
|
1089 |
+
" 'loop': [0.01, 1],\n",
|
1090 |
+
" 'silent': [0.01, 1],\n",
|
1091 |
+
" 'lt_dependency': [0.01, 1],\n",
|
1092 |
+
" 'num_traces': [10, 10001],\n",
|
1093 |
+
" 'duplicate': [0],\n",
|
1094 |
+
" 'or': [0]\n",
|
1095 |
+
" },\n",
|
1096 |
+
" 'n_trials': 200\n",
|
1097 |
+
" }\n",
|
1098 |
+
" },\n",
|
1099 |
+
" {\n",
|
1100 |
+
" 'pipeline_step': 'feature_extraction',\n",
|
1101 |
+
" 'input_path': os.path.join('output','features', 'generated', 'grid_1obj', first_dir, second_dir),\n",
|
1102 |
+
"            # 'feature_params': {'feature_set':['simple_stats', 'trace_length', 'trace_variant', 'activities', 'start_activities', 'end_activities', 'eventropies', 'epa_based']},  # dead entry: overridden by the duplicate 'feature_params' key on the next line\n",
|
1103 |
+
" 'feature_params': {\"feature_set\":[\"ratio_variants_per_number_of_traces\",\"ratio_most_common_variant\",\"ratio_top_10_variants\",\"epa_normalized_variant_entropy\",\"epa_normalized_sequence_entropy\",\"epa_normalized_sequence_entropy_linear_forgetting\",\"epa_normalized_sequence_entropy_exponential_forgetting\"]},\n",
|
1104 |
+
" 'output_path': 'output/plots',\n",
|
1105 |
+
" 'real_eventlog_path': 'data/BaselineED_feat.csv',\n",
|
1106 |
+
" 'plot_type': 'boxplot'\n",
|
1107 |
+
" },\n",
|
1108 |
+
" {\n",
|
1109 |
+
" \"pipeline_step\": \"benchmark_test\",\n",
|
1110 |
+
" \"benchmark_test\": \"discovery\",\n",
|
1111 |
+
" \"input_path\": os.path.join('output', 'generated', 'grid_1obj', first_dir, second_dir),\n",
|
1112 |
+
" \"output_path\":\"output\",\n",
|
1113 |
+
" \"miners\" : [\"heu\", \"imf\", \"ilp\"]\n",
|
1114 |
+
" }\n",
|
1115 |
+
" ]\n",
|
1116 |
+
"\n",
|
1117 |
+
" #print(\"EXPERIMENT:\", experiment)\n",
|
1118 |
+
" output_path = os.path.join('..', 'config_files','algorithm','grid_experiments')\n",
|
1119 |
+
" os.makedirs(output_path, exist_ok=True)\n",
|
1120 |
+
" output_path = os.path.join(output_path, f'generator_{os.path.split(experiment_path)[-1].split(\".\")[0]}.json') \n",
|
1121 |
+
" with open(output_path, 'w') as f:\n",
|
1122 |
+
" json.dump(experiment, f, ensure_ascii=False)\n",
|
1123 |
+
" print(f\"Saved experiment config in {output_path}\")\n",
|
1124 |
+
" \n",
|
1125 |
+
" return experiment\n",
|
1126 |
+
"\n",
|
1127 |
+
"def create_objectives_grid(objectives, n_para_obj=2):\n",
|
1128 |
+
" parameters_o = \"objectives, \"\n",
|
1129 |
+
" if n_para_obj==1:\n",
|
1130 |
+
" experiments = [[exp] for exp in objectives]\n",
|
1131 |
+
" else:\n",
|
1132 |
+
" experiments = eval(f\"[exp for exp in list(itertools.product({(parameters_o*n_para_obj)[:-2]})) if exp[0]!=exp[1]]\")\n",
|
1133 |
+
" experiments = list(set([tuple(sorted(exp)) for exp in experiments]))\n",
|
1134 |
+
" print(len(experiments), \"experiments: \", experiments)\n",
|
1135 |
+
" \n",
|
1136 |
+
" parameters = \"np.around(np.arange(0, 1.1,0.1),2), \"\n",
|
1137 |
+
" tasks = eval(f\"list(itertools.product({(parameters*n_para_obj)[:-2]}))\")\n",
|
1138 |
+
" tasks = [(f'task_{i+1}',)+task for i, task in enumerate(tasks)]\n",
|
1139 |
+
" print(len(tasks))\n",
|
1140 |
+
" for exp in experiments:\n",
|
1141 |
+
" df = pd.DataFrame(data=tasks, columns=[\"task\", *exp])\n",
|
1142 |
+
" experiment_path = os.path.join('..','data', 'grid_experiments')\n",
|
1143 |
+
" os.makedirs(experiment_path, exist_ok=True)\n",
|
1144 |
+
" experiment_path = os.path.join(experiment_path, f\"grid_{len(df.columns)-1}objectives_{abbrev_obj_keys(exp)}.csv\") \n",
|
1145 |
+
" df.to_csv(experiment_path, index=False)\n",
|
1146 |
+
" print(f\"Saved experiment in {experiment_path}\")\n",
|
1147 |
+
" write_single_objective_experiment(experiment_path, objectives=exp)\n",
|
1148 |
+
" #df.to_csv(f\"../data/grid_{}objectives_{abbrev_obj_keys(objectives.tolist())}.csv\" ,index=False)\n",
|
1149 |
+
" \n",
|
1150 |
+
"exp_test = create_objectives_grid(normalized_feature_names, n_para_obj=1) \n",
|
1151 |
+
"print(exp_test)"
|
1152 |
+
]
|
1153 |
+
},
|
1154 |
+
{
|
1155 |
+
"cell_type": "code",
|
1156 |
+
"execution_count": null,
|
1157 |
+
"id": "f9886f44",
|
1158 |
+
"metadata": {},
|
1159 |
+
"outputs": [],
|
1160 |
+
"source": []
|
1161 |
+
}
|
1162 |
+
],
|
1163 |
+
"metadata": {
|
1164 |
+
"kernelspec": {
|
1165 |
+
"display_name": "Python 3 (ipykernel)",
|
1166 |
+
"language": "python",
|
1167 |
+
"name": "python3"
|
1168 |
+
},
|
1169 |
+
"language_info": {
|
1170 |
+
"codemirror_mode": {
|
1171 |
+
"name": "ipython",
|
1172 |
+
"version": 3
|
1173 |
+
},
|
1174 |
+
"file_extension": ".py",
|
1175 |
+
"mimetype": "text/x-python",
|
1176 |
+
"name": "python",
|
1177 |
+
"nbconvert_exporter": "python",
|
1178 |
+
"pygments_lexer": "ipython3",
|
1179 |
+
"version": "3.9.12"
|
1180 |
+
}
|
1181 |
+
},
|
1182 |
+
"nbformat": 4,
|
1183 |
+
"nbformat_minor": 5
|
1184 |
+
}
|
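Both `create_objectives_grid` cells in the notebook above assemble a source string and pass it to `eval` in order to enumerate objective tuples and the task grid. The same enumeration can be sketched without `eval`, as below. This is a minimal sketch, not the notebook's code: the helper names `objective_combinations` and `grid_tasks` and the `steps` parameter are hypothetical, and the grid uses plain Python floats rather than `np.around(np.arange(0, 1.1, 0.1), 2)`. One behavioural difference is assumed acceptable: results come back in sorted order instead of set order. For `n_para_obj > 2` the notebook's filter only compares the first two entries, whereas `itertools.combinations` requires all entries to be distinct.

```python
import itertools


def objective_combinations(objectives, n_para_obj=2):
    """Return sorted tuples of n_para_obj distinct objectives, without duplicates."""
    unique = sorted(set(objectives))
    # combinations() yields each unordered selection exactly once,
    # which replaces the product + filter + set(sorted(...)) steps.
    return list(itertools.combinations(unique, n_para_obj))


def grid_tasks(n_para_obj=2, steps=11):
    """Return (task_name, v1, ..., vn) rows over an evenly spaced grid on [0, 1]."""
    values = [round(i / (steps - 1), 2) for i in range(steps)]
    rows = itertools.product(values, repeat=n_para_obj)
    return [(f"task_{i + 1}",) + row for i, row in enumerate(rows)]


if __name__ == "__main__":
    names = ["ratio_top_10_variants", "ratio_most_common_variant"]
    print(objective_combinations(names, n_para_obj=2))
    print(len(grid_tasks(n_para_obj=1)), "tasks")
```

With `n_para_obj=1` each objective becomes a one-element tuple and the grid has one row per step, which matches the shape of the single-objective output recorded in the notebook.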
notebooks/gedi_fig6_benchmark_boxplots.ipynb
CHANGED
The diff for this file is too large to render.
|
|
notebooks/gedi_figs4and5_representativeness.ipynb
CHANGED
The diff for this file is too large to render.
|
|
notebooks/gedi_figs7and8_benchmarking_statisticalTests.ipynb
CHANGED
@@ -1,5 +1,21 @@
|
|
1 |
{
|
2 |
"cells": [
|
3 |
{
|
4 |
"cell_type": "code",
|
5 |
"execution_count": 8,
|
@@ -64,6 +80,14 @@
|
|
64 |
" return data"
|
65 |
]
|
66 |
},
|
67 |
{
|
68 |
"cell_type": "code",
|
69 |
"execution_count": 11,
|
@@ -110,7 +134,7 @@
|
|
110 |
"id": "07370d54",
|
111 |
"metadata": {},
|
112 |
"source": [
|
113 |
-
"
|
114 |
]
|
115 |
},
|
116 |
{
|
@@ -192,6 +216,14 @@
|
|
192 |
"#df_tmp = statistical_test(DATA_SOURCE+\"_feat\", \"Gen\"+DATA_SOURCE+\"_bench\", TEST, IMPUTE)"
|
193 |
]
|
194 |
},
|
195 |
{
|
196 |
"cell_type": "code",
|
197 |
"execution_count": 62,
|
@@ -466,37 +498,13 @@
|
|
466 |
" plot_stat_test(masked_results, data_source+\"_feat\", data_source+\"_bench\", test, IMPUTE, cbar=cbar, ylabels=ylabels)\n",
|
467 |
" plt.clf()"
|
468 |
]
|
469 |
-
},
|
470 |
-
{
|
471 |
-
"cell_type": "code",
|
472 |
-
"execution_count": null,
|
473 |
-
"id": "52c58c64",
|
474 |
-
"metadata": {},
|
475 |
-
"outputs": [],
|
476 |
-
"source": []
|
477 |
-
},
|
478 |
-
{
|
479 |
-
"cell_type": "code",
|
480 |
-
"execution_count": null,
|
481 |
-
"id": "3717a694",
|
482 |
-
"metadata": {},
|
483 |
-
"outputs": [],
|
484 |
-
"source": []
|
485 |
-
},
|
486 |
-
{
|
487 |
-
"cell_type": "code",
|
488 |
-
"execution_count": null,
|
489 |
-
"id": "c6afe4d9",
|
490 |
-
"metadata": {},
|
491 |
-
"outputs": [],
|
492 |
-
"source": []
|
493 |
}
|
494 |
],
|
495 |
"metadata": {
|
496 |
"kernelspec": {
|
497 |
-
"display_name": "
|
498 |
"language": "python",
|
499 |
-
"name": "
|
500 |
},
|
501 |
"language_info": {
|
502 |
"codemirror_mode": {
|
@@ -508,7 +516,7 @@
|
|
508 |
"name": "python",
|
509 |
"nbconvert_exporter": "python",
|
510 |
"pygments_lexer": "ipython3",
|
511 |
-
"version": "3.9.
|
512 |
}
|
513 |
},
|
514 |
"nbformat": 4,
|
|
|
1 |
{
|
2 |
"cells": [
|
3 |
+
{
|
4 |
+
"cell_type": "markdown",
|
5 |
+
"id": "32241302-7f73-4756-b8a5-27f752de0dea",
|
6 |
+
"metadata": {},
|
7 |
+
"source": [
|
8 |
+
"# Plot - Statistical Tests"
|
9 |
+
]
|
10 |
+
},
|
11 |
+
{
|
12 |
+
"cell_type": "markdown",
|
13 |
+
"id": "51cee5d6-2d4c-4bdd-bdbf-4b3a3b76e6d6",
|
14 |
+
"metadata": {},
|
15 |
+
"source": [
|
16 |
+
"#### Load Data"
|
17 |
+
]
|
18 |
+
},
|
19 |
{
|
20 |
"cell_type": "code",
|
21 |
"execution_count": 8,
|
|
|
80 |
" return data"
|
81 |
]
|
82 |
},
|
83 |
+
{
|
84 |
+
"cell_type": "markdown",
|
85 |
+
"id": "f0d6e731-5f46-4747-82f8-a2f308d150ee",
|
86 |
+
"metadata": {},
|
87 |
+
"source": [
|
88 |
+
"#### Data Preprocessing"
|
89 |
+
]
|
90 |
+
},
|
91 |
{
|
92 |
"cell_type": "code",
|
93 |
"execution_count": 11,
|
|
|
134 |
"id": "07370d54",
|
135 |
"metadata": {},
|
136 |
"source": [
|
137 |
+
"#### Statistical test: Is there a statistically significant relation between feature similarity and performance metrics?"
|
138 |
]
|
139 |
},
|
140 |
{
|
|
|
216 |
"#df_tmp = statistical_test(DATA_SOURCE+\"_feat\", \"Gen\"+DATA_SOURCE+\"_bench\", TEST, IMPUTE)"
|
217 |
]
|
218 |
},
|
219 |
+
{
|
220 |
+
"cell_type": "markdown",
|
221 |
+
"id": "5e6ecc81-c14d-4859-ab04-49bbf458f7eb",
|
222 |
+
"metadata": {},
|
223 |
+
"source": [
|
224 |
+
"#### Plot - Statistical test of features vs. metrics"
|
225 |
+
]
|
226 |
+
},
|
227 |
{
|
228 |
"cell_type": "code",
|
229 |
"execution_count": 62,
|
|
|
498 |
" plot_stat_test(masked_results, data_source+\"_feat\", data_source+\"_bench\", test, IMPUTE, cbar=cbar, ylabels=ylabels)\n",
|
499 |
" plt.clf()"
|
500 |
]
|
501 |
}
|
502 |
],
|
503 |
"metadata": {
|
504 |
"kernelspec": {
|
505 |
+
"display_name": "Python 3 (ipykernel)",
|
506 |
"language": "python",
|
507 |
+
"name": "python3"
|
508 |
},
|
509 |
"language_info": {
|
510 |
"codemirror_mode": {
|
|
|
516 |
"name": "python",
|
517 |
"nbconvert_exporter": "python",
|
518 |
"pygments_lexer": "ipython3",
|
519 |
+
"version": "3.9.19"
|
520 |
}
|
521 |
},
|
522 |
"nbformat": 4,
|
setup.py
CHANGED
@@ -88,4 +88,3 @@ setup(
|
|
88 |
'Programming Language :: Python :: 3.9',
|
89 |
],
|
90 |
)
|
91 |
-
|
|
|
88 |
'Programming Language :: Python :: 3.9',
|
89 |
],
|
90 |
)
|
|