Andrea Maldonado committed on
Commit
353129b
·
1 Parent(s): 9013f63

Release cr

README.md CHANGED
@@ -17,18 +17,12 @@ license: mit
17
 
18
  **i**nteractive **G**enerating **E**vent **D**ata with **I**ntentional Features for Benchmarking Process Mining<br />
19
  This repository contains the codebase for the interactive web application tool (iGEDI) as well as for the [GEDI paper](https://mcml.ai/publications/gedi.pdf) accepted at the BPM'24 conference.
20
- Our documentation also includes both frameworks. From [General Usage](#general-usage) and beyond, documentation refers especially to reproducibility of the [GEDI paper](https://mcml.ai/publications/gedi.pdf).
21
-
22
- A video tutorial on how to use this tool can be found [here](https://youtu.be/9iQhaYwyQ9E).
23
-
24
 
25
  ## Table of Contents
26
 
27
  - [Interactive Web Application (iGEDI)](#interactive-web-application)
 
28
  - [Installation](#installation)
29
- - [as PyPi Package](#install-as-pypi-package)
30
- - [of iGEDI](#install-igedi)
31
- - [as local repository](#install-as-local-repository)
32
  - [General Usage](#general-usage)
33
  - [Experiments](#experiments)
34
  - [Citation](#citation)
@@ -37,8 +31,7 @@ A video tutorial on how to use this tool can be found [here](https://youtu.be/9i
37
Our [interactive web application](https://huggingface.co/spaces/andreamalhera/gedi) (iGEDI) guides you through the specification process and runs GEDI for you. You can directly download the resulting generated logs or the configuration file to run GEDI locally.
38
  ![Interface Screenshot](gedi/utils/iGEDI_interface.png)
39
 
40
- ## Installation
41
- ### Requirements
42
  - [Miniconda](https://docs.conda.io/en/latest/miniconda.html)
43
  - Graphviz on your OS e.g.
44
  For MacOS:
@@ -50,30 +43,13 @@ brew install swig
50
  ```console
51
  conda install pyrfr swig
52
  ```
53
- ### Install as PyPi package
54
- To directly use GEDI methods via `import`, install directly from [PyPi](https://pypi.org/project/gedi/) with
55
- ```shell
56
- pip install gedi
57
- ```
58
- and run:
59
- ```shell
60
- python -c "from gedi import gedi; gedi('config_files/pipeline_steps/generation.json')"
61
- ```
62
- ### Install iGEDI
63
- Our [interactive GEDI (iGEDI)](https://huggingface.co/spaces/andreamalhera/gedi) can be employed to create all necessary [configuration files](config_files) to reproduce our experiments.
64
- Users can directly use our [web application service](https://huggingface.co/spaces/andreamalhera/gedi) or locally start the following dashboard:
65
- ```
66
- streamlit run utils/config_fabric.py # To tunnel to local machine add: --server.port 8501 --server.headless true
67
-
68
- # In local machine (only in case you are tunneling):
69
- ssh -N -f -L 9000:localhost:8501 <user@remote_machine.com>
70
- open "http://localhost:9000/"
71
- ```
72
 
73
- ### Install as local repository
74
  ```console
75
- conda env create -f .conda.yml
76
- from gedi import gedi; gedi('config_files/test/experiment_test.json')
77
  ```
78
  The last step should take only a few minutes to run.
79
 
@@ -85,8 +61,9 @@ Our pipeline offers several pipeline steps, which can be run sequentially or par
85
  - [Evaluation Plotter](https://github.com/lmu-dbs/gedi/blob/16-documentation-update-readme/README.md#evaluation-plotting)
86
 
87
  To run different steps of the GEDI pipeline, please adapt the `.json` accordingly.
88
- ```python
89
- from gedi import gedi; gedi('config_files/pipeline_steps/<pipeline-step>.json')
 
90
  ```
91
  For reference of possible keys and values for each step, please see `config_files/test/experiment_test.json`.
92
To run the whole pipeline, please create a new `.json` file specifying all steps you want to run, together with the desired keys and values for each step.
@@ -95,8 +72,9 @@ To reproduce results from our paper, please refer to [Experiments](#experiments)
95
  ### Feature Extraction
96
  ---
97
  To extract the features on the event-log level and use them for hyperparameter optimization, we employ the following script:
98
- ```python
99
- from gedi import gedi; gedi('config_files/pipeline_steps/feature_extraction.json')
 
100
  ```
101
  The JSON file consists of the following key-value pairs:
102
 
@@ -116,8 +94,9 @@ After having extracted meta features from the files, the next step is to generat
116
 
117
The command to execute the generation step is given by an exemplary `generation.json` file:
118
 
119
- ```python
120
- from gedi import gedi; gedi('config_files/pipeline_steps/generation.json')
 
121
  ```
122
 
123
  In the `generation.json`, we have the following key-value pairs:
@@ -144,11 +123,228 @@ In the `generation.json`, we have the following key-value pairs:
144
 
145
- plot_reference_feature: defines the feature used on the x-axis of the output plots, i.e., each feature defined in the 'objectives' of the 'experiment' is plotted against the reference feature defined in this value
147
  ### Benchmark
148
The benchmarking step defines the downstream task used to evaluate the quality of the synthesized event log datasets against the metrics of real-world datasets. The command to execute a benchmarking run is shown in the following script:
149
 
150
- ```python
151
- from gedi import gedi; gedi('config_files/pipeline_steps/benchmark.json')
 
152
  ```
153
 
154
  In the `benchmark.json`, we have the following key-value pairs:
@@ -164,8 +360,9 @@ In the `benchmark.json`, we have the following key-value pairs:
164
The evaluation plotting step is used solely for visualization. Some examples of how the plotter can be used are shown in the following script:
165
 
166
 
167
- ```python
168
- from gedi import gedi; gedi('config_files/pipeline_steps/evaluation_plotter.json')
 
169
  ```
170
 
171
  Generally, in the `evaluation_plotter.json`, we have the following key-value pairs:
@@ -183,8 +380,9 @@ We present two settings for generating intentional event logs, using [real targe
183
  ### Generating data with real targets
184
To execute the experiments with real targets, we employ the [experiment_real_targets.json](config_files/experiment_real_targets.json). The script's pipeline will output the [generated event logs (GenBaselineED)](data/event_logs/GenBaselineED), which optimize their feature values towards [real-world event data features](data/BaselineED_feat.csv), alongside their respective measured [feature values](data/GenBaselineED_feat.csv) and [benchmark metric values](data/GenBaselineED_bench.csv).
185
 
186
- ```python
187
- from gedi import gedi; gedi('config_files/experiment_real_targets.json')
 
188
  ```
189
 
190
  ### Generating data with grid targets
@@ -195,10 +393,15 @@ python execute_grid_experiments.py config_files/grid_2obj
195
  ```
196
  We employ the [experiment_grid_2obj_configfiles_fabric.ipynb](notebooks/experiment_grid_2obj_configfiles_fabric.ipynb) to create all necessary [configuration](config_files/grid_2obj) and [objective](data/grid_2obj) files for this experiment.
197
  For more details about these config_files, please refer to [Feature Extraction](#feature-extraction), [Generation](#generation), and [Benchmark](#benchmark).
198
- To create configuration files for grid objectives interactively, you can use [iGEDI](https://huggingface.co/spaces/andreamalhera/gedi).
 
 
199
 
 
 
 
 
200
  ### Visualizations
201
- Visualizations correspond to the [GEDI paper](https://mcml.ai/publications/gedi.pdf).
202
To run the visualizations, we employ [jupyter notebooks](https://jupyter.org/install) and [add the installed environment to the jupyter notebook](https://medium.com/@nrk25693/how-to-add-your-conda-environment-to-your-jupyter-notebook-in-just-4-steps-abeab8b8d084). We then start all visualizations by running e.g.: `jupyter notebook`. In the following, we describe the `.ipynb`-files in the folder `notebooks/` to reproduce the figures from our paper.
203
 
204
  #### [Fig. 4 and fig. 5 Representativeness](notebooks/gedi_figs4and5_representativeness.ipynb)
@@ -218,23 +421,14 @@ Likewise to the evaluation on the statistical tests in notebook `gedi_figs7and8_
218
The `GEDI` framework is taken directly from the original paper by [Maldonado](mailto:[email protected]), Frey, Tavares, Rehwald and Seidl and is *to appear at BPM'24*.
219
 
220
  ```bibtex
221
- @InProceedings{10.1007/978-3-031-70396-6_13,
222
- author="Maldonado, Andrea
223
- and Frey, Christian M. M.
224
- and Tavares, Gabriel Marques
225
- and Rehwald, Nikolina
226
- and Seidl, Thomas",
227
- editor="Marrella, Andrea
228
- and Resinas, Manuel
229
- and Jans, Mieke
230
- and Rosemann, Michael",
231
- title="GEDI: Generating Event Data with Intentional Features for Benchmarking Process Mining",
232
- booktitle="Business Process Management",
233
- year="2024",
234
- publisher="Springer Nature Switzerland",
235
- address="Cham",
236
- pages="221--237",
237
- abstract="Process mining solutions include enhancing performance, conserving resources, and alleviating bottlenecks in organizational contexts. However, as in other data mining fields, success hinges on data quality and availability. Existing analyses for process mining solutions lack diverse and ample data for rigorous testing, hindering insights' generalization. To address this, we propose Generating Event Data with Intentional features, a framework producing event data sets satisfying specific meta-features. Considering the meta-feature space that defines feasible event logs, we observe that existing real-world datasets describe only local areas within the overall space. Hence, our framework aims at providing the capability to generate an event data benchmark, which covers unexplored regions. Therefore, our approach leverages a discretization of the meta-feature space to steer generated data towards regions, where a combination of meta-features is not met yet by existing benchmark datasets. Providing a comprehensive data pool enriches process mining analyses, enables methods to capture a wider range of real-world scenarios, and improves evaluation quality. Moreover, it empowers analysts to uncover correlations between meta-features and evaluation metrics, enhancing explainability and solution effectiveness. Experiments demonstrate GEDI's ability to produce a benchmark of intentional event data sets and robust analyses for process mining tasks.",
238
- isbn="978-3-031-70396-6"
239
  }
240
  ```
 
17
 
18
  **i**nteractive **G**enerating **E**vent **D**ata with **I**ntentional Features for Benchmarking Process Mining<br />
19
  This repository contains the codebase for the interactive web application tool (iGEDI) as well as for the [GEDI paper](https://mcml.ai/publications/gedi.pdf) accepted at the BPM'24 conference.
 
 
 
 
20
 
21
  ## Table of Contents
22
 
23
  - [Interactive Web Application (iGEDI)](#interactive-web-application)
24
+ - [Requirements](#requirements)
25
  - [Installation](#installation)
 
 
 
26
  - [General Usage](#general-usage)
27
  - [Experiments](#experiments)
28
  - [Citation](#citation)
 
31
Our [interactive web application](https://huggingface.co/spaces/andreamalhera/gedi) (iGEDI) guides you through the specification process and runs GEDI for you. You can directly download the resulting generated logs or the configuration file to run GEDI locally.
32
  ![Interface Screenshot](gedi/utils/iGEDI_interface.png)
33
 
34
+ ## Requirements
 
35
  - [Miniconda](https://docs.conda.io/en/latest/miniconda.html)
36
  - Graphviz on your OS e.g.
37
  For MacOS:
 
43
  ```console
44
  conda install pyrfr swig
45
  ```
46
+ ## Installation
47
+ - `conda env create -f .conda.yml`
48
 
49
+ ### Startup
50
  ```console
51
+ conda activate gedi
52
+ python main.py -a config_files/test/experiment_test.json
53
  ```
54
  The last step should take only a few minutes to run.
55
 
 
61
  - [Evaluation Plotter](https://github.com/lmu-dbs/gedi/blob/16-documentation-update-readme/README.md#evaluation-plotting)
62
 
63
  To run different steps of the GEDI pipeline, please adapt the `.json` accordingly.
64
+ ```console
65
+ conda activate gedi
66
+ python main.py -a config_files/pipeline_steps/<pipeline-step>.json
67
  ```
68
  For reference of possible keys and values for each step, please see `config_files/test/experiment_test.json`.
69
To run the whole pipeline, please create a new `.json` file specifying all steps you want to run, together with the desired keys and values for each step (see the sketch below).
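For illustration, the sketch below shows what such a multi-step configuration could look like, modeled on `config_files/test/test_abbrv_generation.json`; the output paths, feature subset, and trial count are placeholders rather than a tested setup.

```json
[
  {"pipeline_step": "event_logs_generation",
   "output_path": "output/my_experiment",
   "generator_params": {
     "experiment": {"input_path": "data/test/igedi_table_1.csv",
                    "objectives": ["rmcv", "ense"]},
     "config_space": {"mode": [5, 20], "sequence": [0.01, 1], "choice": [0.01, 1],
                      "parallel": [0.01, 1], "loop": [0.01, 1], "silent": [0.01, 1],
                      "lt_dependency": [0.01, 1], "num_traces": [10, 10001],
                      "duplicate": [0], "or": [0]},
     "n_trials": 2}},
  {"pipeline_step": "feature_extraction",
   "input_path": "output/my_experiment",
   "feature_params": {"feature_set": ["trace_length", "trace_variant", "epa_based"]},
   "output_path": "output/plots",
   "real_eventlog_path": "data/test/2_bpic_features.csv",
   "plot_type": "boxplot"}
]
```
Each entry describes one pipeline step; the steps are executed in the order in which they appear in the list.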
 
72
  ### Feature Extraction
73
  ---
74
  To extract the features on the event-log level and use them for hyperparameter optimization, we employ the following script:
75
+ ```console
76
+ conda activate gedi
77
+ python main.py -a config_files/pipeline_steps/feature_extraction.json
78
  ```
79
  The JSON file consists of the following key-value pairs:
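As a rough sketch (not the shipped file), a feature-extraction configuration following this schema could look like the snippet below, adapted from `config_files/test/test_abbrv_generation.json`; the input path is a placeholder for a folder of `.xes` logs or a features `.csv`.

```json
[{"pipeline_step": "feature_extraction",
  "input_path": "data/event_logs/my_logs",
  "feature_params": {"feature_set": ["simple_stats", "trace_length", "trace_variant",
                                     "activities", "start_activities", "end_activities",
                                     "eventropies", "epa_based"]},
  "output_path": "output/plots",
  "real_eventlog_path": "data/test/2_bpic_features.csv",
  "plot_type": "boxplot"}]
```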
80
 
 
94
 
95
The command to execute the generation step is given by an exemplary `generation.json` file:
96
 
97
+ ```console
98
+ conda activate gedi
99
+ python main.py -a config_files/pipeline_steps/generation.json
100
  ```
101
 
102
  In the `generation.json`, we have the following key-value pairs:
 
123
 
124
- plot_reference_feature: defines the feature used on the x-axis of the output plots, i.e., each feature defined in the 'objectives' of the 'experiment' is plotted against the reference feature defined in this value
125
 
126
+ When manually defining targets for the features in the config space, the following table shows, for reference, the range of each feature in the real-world event log data (BPICs); a configuration sketch follows the table:
127
+ <div style="overflow-x:auto;">
128
+ <table border="1" class="dataframe">
129
+ <thead>
130
+ <tr style="text-align: right;">
131
+ <th></th>
132
+ <th>n_traces</th>
133
+ <th>n_unique_traces</th>
134
+ <th>ratio_variants_per_number_of_traces</th>
135
+ <th>trace_len_min</th>
136
+ <th>trace_len_max</th>
137
+ <th>trace_len_mean</th>
138
+ <th>trace_len_median</th>
139
+ <th>trace_len_mode</th>
140
+ <th>trace_len_std</th>
141
+ <th>trace_len_variance</th>
142
+ <th>trace_len_q1</th>
143
+ <th>trace_len_q3</th>
144
+ <th>trace_len_iqr</th>
145
+ <th>trace_len_geometric_mean</th>
146
+ <th>trace_len_geometric_std</th>
147
+ <th>trace_len_harmonic_mean</th>
148
+ <th>trace_len_skewness</th>
149
+ <th>trace_len_kurtosis</th>
150
+ <th>trace_len_coefficient_variation</th>
151
+ <th>trace_len_entropy</th>
152
+ <th>trace_len_hist1</th>
153
+ <th>trace_len_hist2</th>
154
+ <th>trace_len_hist3</th>
155
+ <th>trace_len_hist4</th>
156
+ <th>trace_len_hist5</th>
157
+ <th>trace_len_hist6</th>
158
+ <th>trace_len_hist7</th>
159
+ <th>trace_len_hist8</th>
160
+ <th>trace_len_hist9</th>
161
+ <th>trace_len_hist10</th>
162
+ <th>trace_len_skewness_hist</th>
163
+ <th>trace_len_kurtosis_hist</th>
164
+ <th>ratio_most_common_variant</th>
165
+ <th>ratio_top_1_variants</th>
166
+ <th>ratio_top_5_variants</th>
167
+ <th>ratio_top_10_variants</th>
168
+ <th>ratio_top_20_variants</th>
169
+ <th>ratio_top_50_variants</th>
170
+ <th>ratio_top_75_variants</th>
171
+ <th>mean_variant_occurrence</th>
172
+ <th>std_variant_occurrence</th>
173
+ <th>skewness_variant_occurrence</th>
174
+ <th>kurtosis_variant_occurrence</th>
175
+ <th>n_unique_activities</th>
176
+ <th>activities_min</th>
177
+ <th>activities_max</th>
178
+ <th>activities_mean</th>
179
+ <th>activities_median</th>
180
+ <th>activities_std</th>
181
+ <th>activities_variance</th>
182
+ <th>activities_q1</th>
183
+ <th>activities_q3</th>
184
+ <th>activities_iqr</th>
185
+ <th>activities_skewness</th>
186
+ <th>activities_kurtosis</th>
187
+ <th>n_unique_start_activities</th>
188
+ <th>start_activities_min</th>
189
+ <th>start_activities_max</th>
190
+ <th>start_activities_mean</th>
191
+ <th>start_activities_median</th>
192
+ <th>start_activities_std</th>
193
+ <th>start_activities_variance</th>
194
+ <th>start_activities_q1</th>
195
+ <th>start_activities_q3</th>
196
+ <th>start_activities_iqr</th>
197
+ <th>start_activities_skewness</th>
198
+ <th>start_activities_kurtosis</th>
199
+ <th>n_unique_end_activities</th>
200
+ <th>end_activities_min</th>
201
+ <th>end_activities_max</th>
202
+ <th>end_activities_mean</th>
203
+ <th>end_activities_median</th>
204
+ <th>end_activities_std</th>
205
+ <th>end_activities_variance</th>
206
+ <th>end_activities_q1</th>
207
+ <th>end_activities_q3</th>
208
+ <th>end_activities_iqr</th>
209
+ <th>end_activities_skewness</th>
210
+ <th>end_activities_kurtosis</th>
211
+ <th>eventropy_trace</th>
212
+ <th>eventropy_prefix</th>
213
+ <th>eventropy_global_block</th>
214
+ <th>eventropy_lempel_ziv</th>
215
+ <th>eventropy_k_block_diff_1</th>
216
+ <th>eventropy_k_block_diff_3</th>
217
+ <th>eventropy_k_block_diff_5</th>
218
+ <th>eventropy_k_block_ratio_1</th>
219
+ <th>eventropy_k_block_ratio_3</th>
220
+ <th>eventropy_k_block_ratio_5</th>
221
+ <th>eventropy_knn_3</th>
222
+ <th>eventropy_knn_5</th>
223
+ <th>eventropy_knn_7</th>
224
+ <th>epa_variant_entropy</th>
225
+ <th>epa_normalized_variant_entropy</th>
226
+ <th>epa_sequence_entropy</th>
227
+ <th>epa_normalized_sequence_entropy</th>
228
+ <th>epa_sequence_entropy_linear_forgetting</th>
229
+ <th>epa_normalized_sequence_entropy_linear_forgetting</th>
230
+ <th>epa_sequence_entropy_exponential_forgetting</th>
231
+ <th>epa_normalized_sequence_entropy_exponential_forgetting</th>
232
+ </tr>
233
+ </thead>
234
+ <tbody>
235
+ <tr>
236
+ <td>[ min, max ]</td>
237
+ <td>[ 226.0, 251734.0 ]</td>
238
+ <td>[ 6.0, 28457.0 ]</td>
239
+ <td>[ 0.0, 1.0 ]</td>
240
+ <td>[ 1.0, 24.0 ]</td>
241
+ <td>[ 1.0, 2973.0 ]</td>
242
+ <td>[ 1.0, 131.49 ]</td>
243
+ <td>[ 1.0, 55.0 ]</td>
244
+ <td>[ 1.0, 61.0 ]</td>
245
+ <td>[ 0.0, 202.53 ]</td>
246
+ <td>[ 0.0, 41017.89 ]</td>
247
+ <td>[ 1.0, 44.0 ]</td>
248
+ <td>[ 1.0, 169.0 ]</td>
249
+ <td>[ 0.0, 161.0 ]</td>
250
+ <td>[ 1.0, 53.78 ]</td>
251
+ <td>[ 1.0, 5.65 ]</td>
252
+ <td>[ 1.0, 51.65 ]</td>
253
+ <td>[ -0.58, 111.97 ]</td>
254
+ <td>[ -0.97, 14006.75 ]</td>
255
+ <td>[ 0.0, 4.74 ]</td>
256
+ <td>[ 5.33, 12.04 ]</td>
257
+ <td>[ 0.0, 1.99 ]</td>
258
+ <td>[ 0.0, 0.42 ]</td>
259
+ <td>[ 0.0, 0.4 ]</td>
260
+ <td>[ 0.0, 0.19 ]</td>
261
+ <td>[ 0.0, 0.14 ]</td>
262
+ <td>[ 0.0, 10.0 ]</td>
263
+ <td>[ 0.0, 0.02 ]</td>
264
+ <td>[ 0.0, 0.04 ]</td>
265
+ <td>[ 0.0, 0.0 ]</td>
266
+ <td>[ 0.0, 2.7 ]</td>
267
+ <td>[ -0.58, 111.97 ]</td>
268
+ <td>[ -0.97, 14006.75 ]</td>
269
+ <td>[ 0.0, 0.79 ]</td>
270
+ <td>[ 0.0, 0.87 ]</td>
271
+ <td>[ 0.0, 0.98 ]</td>
272
+ <td>[ 0.0, 0.99 ]</td>
273
+ <td>[ 0.2, 1.0 ]</td>
274
+ <td>[ 0.5, 1.0 ]</td>
275
+ <td>[ 0.75, 1.0 ]</td>
276
+ <td>[ 1.0, 24500.67 ]</td>
277
+ <td>[ 0.04, 42344.04 ]</td>
278
+ <td>[ 1.54, 64.77 ]</td>
279
+ <td>[ 0.66, 5083.46 ]</td>
280
+ <td>[ 1.0, 1152.0 ]</td>
281
+ <td>[ 1.0, 66058.0 ]</td>
282
+ <td>[ 34.0, 466141.0 ]</td>
283
+ <td>[ 4.13, 66058.0 ]</td>
284
+ <td>[ 2.0, 66058.0 ]</td>
285
+ <td>[ 0.0, 120522.25 ]</td>
286
+ <td>[ 0.0, 14525612122.34 ]</td>
287
+ <td>[ 1.0, 66058.0 ]</td>
288
+ <td>[ 4.0, 79860.0 ]</td>
289
+ <td>[ 0.0, 77290.0 ]</td>
290
+ <td>[ -0.06, 15.21 ]</td>
291
+ <td>[ -1.5, 315.84 ]</td>
292
+ <td>[ 1.0, 809.0 ]</td>
293
+ <td>[ 1.0, 150370.0 ]</td>
294
+ <td>[ 27.0, 199867.0 ]</td>
295
+ <td>[ 3.7, 150370.0 ]</td>
296
+ <td>[ 1.0, 150370.0 ]</td>
297
+ <td>[ 0.0, 65387.49 ]</td>
298
+ <td>[ 0.0, 4275524278.19 ]</td>
299
+ <td>[ 1.0, 150370.0 ]</td>
300
+ <td>[ 4.0, 150370.0 ]</td>
301
+ <td>[ 0.0, 23387.25 ]</td>
302
+ <td>[ 0.0, 9.3 ]</td>
303
+ <td>[ -2.0, 101.82 ]</td>
304
+ <td>[ 1.0, 757.0 ]</td>
305
+ <td>[ 1.0, 16653.0 ]</td>
306
+ <td>[ 28.0, 181328.0 ]</td>
307
+ <td>[ 3.53, 24500.67 ]</td>
308
+ <td>[ 1.0, 16653.0 ]</td>
309
+ <td>[ 0.0, 42344.04 ]</td>
310
+ <td>[ 0.0, 1793017566.89 ]</td>
311
+ <td>[ 1.0, 16653.0 ]</td>
312
+ <td>[ 3.0, 39876.0 ]</td>
313
+ <td>[ 0.0, 39766.0 ]</td>
314
+ <td>[ -0.7, 13.82 ]</td>
315
+ <td>[ -2.0, 255.39 ]</td>
316
+ <td>[ 0.0, 13.36 ]</td>
317
+ <td>[ 0.0, 16.77 ]</td>
318
+ <td>[ 0.0, 24.71 ]</td>
319
+ <td>[ 0.0, 685.0 ]</td>
320
+ <td>[ -328.0, 962.0 ]</td>
321
+ <td>[ 0.0, 871.0 ]</td>
322
+ <td>[ 0.0, 881.0 ]</td>
323
+ <td>[ 0.0, 935.0 ]</td>
324
+ <td>[ 0.0, 7.11 ]</td>
325
+ <td>[ 0.0, 7.11 ]</td>
326
+ <td>[ 0.0, 8.93 ]</td>
327
+ <td>[ 0.0, 648.0 ]</td>
328
+ <td>[ 0.0, 618.0 ]</td>
329
+ <td>[ 0.0, 11563842.15 ]</td>
330
+ <td>[ 0.0, 0.9 ]</td>
331
+ <td>[ 0.0, 21146257.12 ]</td>
332
+ <td>[ 0.0, 0.76 ]</td>
333
+ <td>[ 0.0, 14140225.9 ]</td>
334
+ <td>[ 0.0, 0.42 ]</td>
335
+ <td>[ 0.0, 15576076.83 ]</td>
336
+ <td>[ 0.0, 0.51 ]</td>
337
+ </tr>
338
+ </tbody>
339
+ </table>
340
+ </div>
341
+
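As an illustrative sketch (not a shipped configuration), a generation entry with manually chosen targets could look as follows. The target file `data/my_manual_targets.csv` is a placeholder: it would contain one row per desired log with columns for the chosen objectives, analogous to `data/test/igedi_table_1.csv` (`log,rmcv,ense`), where `rmcv` abbreviates `ratio_most_common_variant` and `ense` abbreviates `epa_normalized_sequence_entropy` (see `utils/column_mappings.py`), and the target values should stay within the ranges listed above. The output path and trial count are likewise illustrative.

```json
{"pipeline_step": "event_logs_generation",
 "output_path": "output/manual_targets",
 "generator_params": {
   "experiment": {"input_path": "data/my_manual_targets.csv",
                  "objectives": ["rmcv", "ense"]},
   "config_space": {"mode": [5, 20], "sequence": [0.01, 1], "choice": [0.01, 1],
                    "parallel": [0.01, 1], "loop": [0.01, 1], "silent": [0.01, 1],
                    "lt_dependency": [0.01, 1], "num_traces": [10, 10001],
                    "duplicate": [0], "or": [0]},
   "n_trials": 20}}
```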
342
  ### Benchmark
343
The benchmarking step defines the downstream task used to evaluate the quality of the synthesized event log datasets against the metrics of real-world datasets. The command to execute a benchmarking run is shown in the following script:
344
 
345
+ ```console
346
+ conda activate gedi
347
+ python main.py -a config_files/pipeline_steps/benchmark.json
348
  ```
349
 
350
  In the `benchmark.json`, we have the following key-value pairs:
 
360
The evaluation plotting step is used solely for visualization. Some examples of how the plotter can be used are shown in the following script:
361
 
362
 
363
+ ```console
364
+ conda activate gedi
365
+ python main.py -a config_files/pipeline_steps/evaluation_plotter.json
366
  ```
367
 
368
  Generally, in the `evaluation_plotter.json`, we have the following key-value pairs:
 
380
  ### Generating data with real targets
381
To execute the experiments with real targets, we employ the [experiment_real_targets.json](config_files/experiment_real_targets.json). The script's pipeline will output the [generated event logs (GenBaselineED)](data/event_logs/GenBaselineED), which optimize their feature values towards [real-world event data features](data/BaselineED_feat.csv), alongside their respective measured [feature values](data/GenBaselineED_feat.csv) and [benchmark metric values](data/GenBaselineED_bench.csv).
382
 
383
+ ```console
384
+ conda activate gedi
385
+ python main.py -a config_files/experiment_real_targets.json
386
  ```
387
 
388
  ### Generating data with grid targets
 
393
  ```
394
  We employ the [experiment_grid_2obj_configfiles_fabric.ipynb](notebooks/experiment_grid_2obj_configfiles_fabric.ipynb) to create all necessary [configuration](config_files/grid_2obj) and [objective](data/grid_2obj) files for this experiment.
395
  For more details about these config_files, please refer to [Feature Extraction](#feature-extraction), [Generation](#generation), and [Benchmark](#benchmark).
396
+ To create configuration files for grid objectives interactively, you can start the following dashboard:
397
+ ```
398
+ streamlit run utils/config_fabric.py # To tunnel to local machine add: --server.port 8501 --server.headless true
399
 
400
+ # In local machine (only in case you are tunneling):
401
+ ssh -N -f -L 9000:localhost:8501 <user@remote_machine.com>
402
+ open "http://localhost:9000/"
403
+ ```
404
  ### Visualizations
 
405
To run the visualizations, we employ [jupyter notebooks](https://jupyter.org/install) and [add the installed environment to the jupyter notebook](https://medium.com/@nrk25693/how-to-add-your-conda-environment-to-your-jupyter-notebook-in-just-4-steps-abeab8b8d084). We then start all visualizations by running e.g.: `jupyter notebook`. In the following, we describe the `.ipynb`-files in the folder `notebooks/` to reproduce the figures from our paper.
406
 
407
  #### [Fig. 4 and fig. 5 Representativeness](notebooks/gedi_figs4and5_representativeness.ipynb)
 
421
The `GEDI` framework is taken directly from the original paper by [Maldonado](mailto:[email protected]), Frey, Tavares, Rehwald and Seidl and is *to appear at BPM'24*.
422
 
423
  ```bibtex
424
+ @article{maldonado2024gedi,
425
+ author = {Maldonado, Andrea and Frey, {Christian M. M.} and Tavares, {Gabriel M.} and Rehwald, Nikolina and Seidl, Thomas},
426
+ title = {{GEDI:} Generating Event Data with Intentional Features for Benchmarking Process Mining},
427
+ journal = {To be published in BPM 2024. Krakow, Poland, Sep 01-06},
428
+ volume = {},
429
+ year = {2024},
430
+ url = {https://mcml.ai/publications/gedi.pdf},
431
+ doi = {},
432
+ eprinttype = {website},
433
  }
434
  ```
config_files/test/test_abbrv_generation.json ADDED
@@ -0,0 +1,16 @@
1
+ [{"pipeline_step": "event_logs_generation",
2
+ "output_path": "output/test",
3
+ "generator_params": {"experiment":
4
+ {"input_path": "data/test/igedi_table_1.csv",
5
+ "objectives": ["rmcv","ense"]},
6
+ "config_space": {"mode": [5, 20], "sequence": [0.01, 1],
7
+ "choice": [0.01, 1], "parallel": [0.01, 1], "loop": [0.01, 1],
8
+ "silent": [0.01, 1], "lt_dependency": [0.01, 1],
9
+ "num_traces": [10, 10001], "duplicate": [0],
10
+ "or": [0]}, "n_trials": 2}},
11
+ {"pipeline_step": "feature_extraction",
12
+ "input_path": "output/test/igedi_table_1/2_ense_rmcv",
13
+ "feature_params": {"feature_set": ["simple_stats", "trace_length", "trace_variant",
14
+ "activities", "start_activities", "end_activities", "eventropies", "epa_based"]},
15
+ "output_path": "output/plots", "real_eventlog_path": "data/test/2_bpic_features.csv",
16
+ "plot_type": "boxplot"}]
data/test/igedi_table_1.csv ADDED
@@ -0,0 +1,4 @@
1
+ log,rmcv,ense
2
+ BPIC15f4,0.003,0.604
3
+ RTFMP,0.376,0.112
4
+ HD,0.517,0.254
data/validation/2_ense_rmcv_feat.csv ADDED
@@ -0,0 +1,4 @@
1
+ log,n_traces,n_unique_traces,trace_len_coefficient_variation,trace_len_entropy,trace_len_geometric_mean,trace_len_geometric_std,trace_len_harmonic_mean,trace_len_hist1,trace_len_hist10,trace_len_hist2,trace_len_hist3,trace_len_hist4,trace_len_hist5,trace_len_hist6,trace_len_hist7,trace_len_hist8,trace_len_hist9,trace_len_iqr,trace_len_kurtosis,trace_len_kurtosis_hist,trace_len_max,trace_len_mean,trace_len_median,trace_len_min,trace_len_mode,trace_len_q1,trace_len_q3,trace_len_skewness,trace_len_skewness_hist,trace_len_std,trace_len_variance,kurtosis_variant_occurrence,mean_variant_occurrence,ratio_most_common_variant,ratio_top_10_variants,ratio_top_1_variants,ratio_top_20_variants,ratio_top_50_variants,ratio_top_5_variants,ratio_top_75_variants,skewness_variant_occurrence,std_variant_occurrence,activities_iqr,activities_kurtosis,activities_max,activities_mean,activities_median,activities_min,activities_q1,activities_q3,activities_skewness,activities_std,activities_variance,n_unique_activities,n_unique_start_activities,start_activities_iqr,start_activities_kurtosis,start_activities_max,start_activities_mean,start_activities_median,start_activities_min,start_activities_q1,start_activities_q3,start_activities_skewness,start_activities_std,start_activities_variance,end_activities_iqr,end_activities_kurtosis,end_activities_max,end_activities_mean,end_activities_median,end_activities_min,end_activities_q1,end_activities_q3,end_activities_skewness,end_activities_std,end_activities_variance,n_unique_end_activities,eventropy_global_block,eventropy_global_block_flattened,eventropy_k_block_diff_1,eventropy_k_block_diff_3,eventropy_k_block_diff_5,eventropy_k_block_ratio_1,eventropy_k_block_ratio_3,eventropy_k_block_ratio_5,eventropy_knn_3,eventropy_knn_5,eventropy_knn_7,eventropy_lempel_ziv,eventropy_lempel_ziv_flattened,eventropy_prefix,eventropy_prefix_flattened,eventropy_trace,epa_variant_entropy,epa_normalized_variant_entropy,epa_sequence_entropy,epa_normalized_sequence_entropy,epa_sequence_entropy_linear_forgetting,epa_normalized_sequence_entropy_linear_forgetting,epa_sequence_entropy_exponential_forgetting,epa_normalized_sequence_entropy_exponential_forgetting,ratio_variants_per_number_of_traces
2
+ genELBPIC15f4_0604_0003,8616,4031,1.0086445672512825,8.700230419287818,8.516920996327995,2.1832133718212567,6.58111248846037,0.05713165933282198,1.682074468800883e-05,0.009932649738269211,0.0033136867035377378,0.0012279143622246447,0.0005214430853282738,0.00017661781922409254,0.0001093348404720574,3.364148937601766e-05,0.0,9.0,11.77613857723645,4.64306597180025,141,11.964136490250697,7.0,3,3,5.0,14.0,2.836323931248485,2.5294876299887217,12.067561272744191,145.6260350714354,1651.5545366193303,2.137434879682461,0.09099350046425256,0.5789229340761374,0.40401578458681525,0.6256963788300836,0.766016713091922,0.5258820798514392,0.883008356545961,36.276105773051086,15.574023282690577,2184.5,1.9085746306932307,34121,12885.375,8627.0,8584,8616.0,10800.5,1.8663249384138656,8507.416043333898,72376127.734375,8,2,2111.0,-2.0,6419,4308.0,4308.0,2197,3252.5,5363.5,0.0,2111.0,4456321.0,768.0,0.0021026107788850723,4895,1723.2,832.0,495,813.0,1581.0,1.331337855426617,1625.5283940922102,2642342.56,5,15.897,16.276,2.756,1.525,1.375,2.756,2.016,1.775,6.564,6.07,5.761,1.405,1.786,12.139,13.493,9.703,365917.06171394786,0.7166786736830569,651595.1462643282,0.5475971681938718,62016.045914910814,0.05211796208164211,266396.7627350506,0.22387845232743814,0.46785051067780875
3
+ genELHD_0254_0517,6822,565,1.1300022933733087,8.390788875278787,1.9006921917027269,2.263915758458681,1.4763543408149593,0.28822871537617945,0.00010858116985352402,0.04077222927999826,0.02383356678284851,0.006080545511797346,0.005591930247456488,0.002823110416191621,0.0017915893025831464,0.0006514870191211442,0.0004886152643408582,2.0,9.718268017319556,4.770965470001153,28,2.8346525945470535,1.0,1,1,1.0,3.0,2.765986310146101,2.5637920433464965,3.2031639327547703,10.260259180101007,226.4931382842208,12.07433628318584,0.24860744649662855,0.9079448841981823,0.6807387862796834,0.9321313397830548,0.9585165640574611,0.8717384931105248,0.9791849897390794,14.639488482439702,105.6342402074512,1283.0,8.118508585327676,6848,1137.5294117647059,472.0,208,413.0,1696.0,2.9234849385484285,1541.823981624173,2377221.1903114184,17,10,294.25,2.299363631971671,3383,682.2,217.0,101,121.75,416.0,1.9301655015244086,1008.2924972447232,1016653.7600000001,334.5,2.8813625853874614,3383,620.1818181818181,157.0,79,104.5,439.0,2.0614116860983223,981.5564465945092,963453.0578512397,11,9.069,10.932,3.265,0.908,0.67,3.265,1.808,1.456,4.81,4.359,4.05,0.696,2.01,6.995,10.12,4.469,16958.33766640406,0.7450438396474315,70379.87102533762,0.36874603139171797,9719.481922433943,0.050923940806750986,30545.050254490514,0.16003675334882345,0.08282028730577544
4
+ genELRTFMP_0112_0376,6822,565,1.1300022933733087,8.390788875278787,1.9006921917027269,2.263915758458681,1.4763543408149593,0.28822871537617945,0.00010858116985352402,0.04077222927999826,0.02383356678284851,0.006080545511797346,0.005591930247456488,0.002823110416191621,0.0017915893025831464,0.0006514870191211442,0.0004886152643408582,2.0,9.718268017319556,4.770965470001153,28,2.8346525945470535,1.0,1,1,1.0,3.0,2.765986310146101,2.5637920433464965,3.2031639327547703,10.260259180101007,226.4931382842208,12.07433628318584,0.24860744649662855,0.9079448841981823,0.6807387862796834,0.9321313397830548,0.9585165640574611,0.8717384931105248,0.9791849897390794,14.639488482439702,105.6342402074512,1283.0,8.118508585327676,6848,1137.5294117647059,472.0,208,413.0,1696.0,2.9234849385484285,1541.823981624173,2377221.1903114184,17,10,294.25,2.299363631971671,3383,682.2,217.0,101,121.75,416.0,1.9301655015244086,1008.2924972447232,1016653.7600000001,334.5,2.8813625853874614,3383,620.1818181818181,157.0,79,104.5,439.0,2.0614116860983223,981.5564465945092,963453.0578512397,11,9.069,10.932,3.265,0.908,0.67,3.265,1.808,1.456,4.81,4.359,4.05,0.696,2.01,6.995,10.12,4.469,16958.33766640406,0.7450438396474315,70379.87102533762,0.36874603139171797,9719.481922433943,0.050923940806750986,30545.050254490514,0.16003675334882345,0.08282028730577544
gedi/__init__.py CHANGED
@@ -1,3 +1,7 @@
1
- from .run import gedi
 
 
 
 
2
 
3
- __all__=['gedi']
 
1
+ from .generator import GenerateEventLogs
2
+ from .features import EventLogFeatures
3
+ from .augmentation import InstanceAugmentator
4
+ from .benchmark import BenchmarkTest
5
+ from .plotter import BenchmarkPlotter, FeaturesPlotter, AugmentationPlotter, GenerationPlotter
6
 
7
+ __all__=[ 'GenerateEventLogs', 'EventLogFeatures', 'FeatureAnalyser', 'InstanceAugmentator', 'BenchmarkTest', 'BenchmarkPlotter', 'FeaturesPlotter', 'AugmentationPlotter', 'GenerationPlotter']
gedi/features.py CHANGED
@@ -10,7 +10,7 @@ from pathlib import Path
10
  from utils.param_keys import INPUT_PATH
11
  from utils.param_keys.features import FEATURE_PARAMS, FEATURE_SET
12
  from gedi.utils.io_helpers import dump_features_json
13
-
14
  def get_sortby_parameter(elem):
15
  number = int(elem.rsplit(".")[0].rsplit("_", 1)[1])
16
  return number
@@ -63,6 +63,8 @@ class EventLogFeatures(EventLogFile):
63
 
64
  if str(self.filename).endswith('csv'): # Returns dataframe from loaded metafeatures file
65
  self.feat = pd.read_csv(self.filepath)
 
 
66
  print(f"SUCCESS: EventLogFeatures loaded features from {self.filepath}")
67
  elif isinstance(self.filename, list): # Computes metafeatures for list of .xes files
68
  combined_features=pd.DataFrame()
 
10
  from utils.param_keys import INPUT_PATH
11
  from utils.param_keys.features import FEATURE_PARAMS, FEATURE_SET
12
  from gedi.utils.io_helpers import dump_features_json
13
+ from utils.column_mappings import column_mappings
14
  def get_sortby_parameter(elem):
15
  number = int(elem.rsplit(".")[0].rsplit("_", 1)[1])
16
  return number
 
63
 
64
  if str(self.filename).endswith('csv'): # Returns dataframe from loaded metafeatures file
65
  self.feat = pd.read_csv(self.filepath)
66
+ columns_to_rename = {col: column_mappings()[col] for col in self.feat.columns if col in column_mappings()}
67
+ self.feat.rename(columns=columns_to_rename, inplace=True)
68
  print(f"SUCCESS: EventLogFeatures loaded features from {self.filepath}")
69
  elif isinstance(self.filename, list): # Computes metafeatures for list of .xes files
70
  combined_features=pd.DataFrame()
gedi/generator.py CHANGED
@@ -21,6 +21,7 @@ from utils.param_keys import OUTPUT_PATH, INPUT_PATH
21
  from utils.param_keys.generator import GENERATOR_PARAMS, EXPERIMENT, CONFIG_SPACE, N_TRIALS
22
  from gedi.utils.io_helpers import get_output_key_value_location, dump_features_json, compute_similarity
23
  from gedi.utils.io_helpers import read_csvs
 
24
  import xml.etree.ElementTree as ET
25
  import re
26
  from xml.dom import minidom
@@ -153,6 +154,8 @@ class GenerateEventLogs():
153
  experiment = self.params.get(EXPERIMENT)
154
  if experiment is not None:
155
  tasks, output_path = get_tasks(experiment, self.output_path)
 
 
156
  self.output_path = output_path
157
 
158
  if 'ratio_variants_per_number_of_traces' in tasks.columns:#HOTFIX
 
21
  from utils.param_keys.generator import GENERATOR_PARAMS, EXPERIMENT, CONFIG_SPACE, N_TRIALS
22
  from gedi.utils.io_helpers import get_output_key_value_location, dump_features_json, compute_similarity
23
  from gedi.utils.io_helpers import read_csvs
24
+ from utils.column_mappings import column_mappings
25
  import xml.etree.ElementTree as ET
26
  import re
27
  from xml.dom import minidom
 
154
  experiment = self.params.get(EXPERIMENT)
155
  if experiment is not None:
156
  tasks, output_path = get_tasks(experiment, self.output_path)
157
+ columns_to_rename = {col: column_mappings()[col] for col in tasks.columns if col in column_mappings()}
158
+ tasks = tasks.rename(columns=columns_to_rename)
159
  self.output_path = output_path
160
 
161
  if 'ratio_variants_per_number_of_traces' in tasks.columns:#HOTFIX
gedi/run.py DELETED
@@ -1,53 +0,0 @@
1
- import config
2
- import pandas as pd
3
- from datetime import datetime as dt
4
- from gedi.generator import GenerateEventLogs
5
- from gedi.features import EventLogFeatures
6
- from gedi.augmentation import InstanceAugmentator
7
- from gedi.benchmark import BenchmarkTest
8
- from gedi.plotter import BenchmarkPlotter, FeaturesPlotter, AugmentationPlotter, GenerationPlotter
9
- from utils.default_argparse import ArgParser
10
- from utils.param_keys import *
11
-
12
- def run(kwargs:dict, model_params_list: list, filename_list:list):
13
- """
14
- This function chooses the running option for the program.
15
- @param kwargs: dict
16
- contains the running parameters and the event-log file information
17
- @param model_params_list: list
18
- contains a list of model parameters, which are used to analyse this different models.
19
- @param filename_list: list
20
- contains the list of the filenames to load multiple event-logs
21
- @return:
22
- """
23
- params = kwargs[PARAMS]
24
- ft = EventLogFeatures(None)
25
- augmented_ft = InstanceAugmentator()
26
- gen = pd.DataFrame(columns=['log'])
27
-
28
- for model_params in model_params_list:
29
- if model_params.get(PIPELINE_STEP) == 'instance_augmentation':
30
- augmented_ft = InstanceAugmentator(aug_params=model_params, samples=ft.feat)
31
- AugmentationPlotter(augmented_ft, model_params)
32
- elif model_params.get(PIPELINE_STEP) == 'event_logs_generation':
33
- gen = pd.DataFrame(GenerateEventLogs(model_params).log_config)
34
- #gen = pd.read_csv("output/features/generated/grid_2objectives_enseef_enve/2_enseef_enve_feat.csv")
35
- #GenerationPlotter(gen, model_params, output_path="output/plots")
36
- elif model_params.get(PIPELINE_STEP) == 'benchmark_test':
37
- benchmark = BenchmarkTest(model_params, event_logs=gen['log'])
38
- # BenchmarkPlotter(benchmark.features, output_path="output/plots")
39
- elif model_params.get(PIPELINE_STEP) == 'feature_extraction':
40
- ft = EventLogFeatures(**kwargs, logs=gen['log'], ft_params=model_params)
41
- FeaturesPlotter(ft.feat, model_params)
42
- elif model_params.get(PIPELINE_STEP) == "evaluation_plotter":
43
- GenerationPlotter(gen, model_params, output_path=model_params['output_path'], input_path=model_params['input_path'])
44
-
45
- def gedi(config_path):
46
- """
47
- This function runs the GEDI pipeline.
48
- @param config_path: str
49
- contains the path to the config file
50
- @return:
51
- """
52
- model_params_list = config.get_model_params_list(config_path)
53
- run({'params':""}, model_params_list, [])
execute_grid_experiments.py → gedi/utils/execute_grid_experiments.py RENAMED
@@ -3,7 +3,7 @@ import os
3
  import sys
4
 
5
  from datetime import datetime as dt
6
- from gedi.utils.io_helpers import sort_files
7
  from tqdm import tqdm
8
 
9
  #TODO: Pass i properly
 
3
  import sys
4
 
5
  from datetime import datetime as dt
6
+ from io_helpers import sort_files
7
  from tqdm import tqdm
8
 
9
  #TODO: Pass i properly
main.py CHANGED
@@ -1,12 +1,54 @@
1
  import config
 
2
  from datetime import datetime as dt
3
- from gedi.run import gedi, run
 
 
 
 
4
  from utils.default_argparse import ArgParser
5
  from utils.param_keys import *
6

7
  if __name__=='__main__':
8
  start_gedi = dt.now()
9
  print(f'INFO: GEDI starting {start_gedi}')
 
10
  args = ArgParser().parse('GEDI main')
11
- gedi(args.alg_params_json)
 
 
12
  print(f'SUCCESS: GEDI took {dt.now()-start_gedi} sec.')
 
1
  import config
2
+ import pandas as pd
3
  from datetime import datetime as dt
4
+ from gedi.generator import GenerateEventLogs
5
+ from gedi.features import EventLogFeatures
6
+ from gedi.augmentation import InstanceAugmentator
7
+ from gedi.benchmark import BenchmarkTest
8
+ from gedi.plotter import BenchmarkPlotter, FeaturesPlotter, AugmentationPlotter, GenerationPlotter
9
  from utils.default_argparse import ArgParser
10
  from utils.param_keys import *
11
 
12
+ def run(kwargs:dict, model_params_list: list, filename_list:list):
13
+ """
14
+ This function chooses the running option for the program.
15
+ @param kwargs: dict
16
+ contains the running parameters and the event-log file information
17
+ @param model_params_list: list
18
+ contains a list of model parameters, which are used to analyse the different models.
19
+ @param filename_list: list
20
+ contains the list of the filenames to load multiple event-logs
21
+ @return:
22
+ """
23
+ params = kwargs[PARAMS]
24
+ ft = EventLogFeatures(None)
25
+ augmented_ft = InstanceAugmentator()
26
+ gen = pd.DataFrame(columns=['log'])
27
+
28
+ for model_params in model_params_list:
29
+ if model_params.get(PIPELINE_STEP) == 'instance_augmentation':
30
+ augmented_ft = InstanceAugmentator(aug_params=model_params, samples=ft.feat)
31
+ AugmentationPlotter(augmented_ft, model_params)
32
+ elif model_params.get(PIPELINE_STEP) == 'event_logs_generation':
33
+ gen = pd.DataFrame(GenerateEventLogs(model_params).log_config)
34
+ #gen = pd.read_csv("output/features/generated/grid_2objectives_enseef_enve/2_enseef_enve_feat.csv")
35
+ #GenerationPlotter(gen, model_params, output_path="output/plots")
36
+ elif model_params.get(PIPELINE_STEP) == 'benchmark_test':
37
+ benchmark = BenchmarkTest(model_params, event_logs=gen['log'])
38
+ # BenchmarkPlotter(benchmark.features, output_path="output/plots")
39
+ elif model_params.get(PIPELINE_STEP) == 'feature_extraction':
40
+ ft = EventLogFeatures(**kwargs, logs=gen['log'], ft_params=model_params)
41
+ FeaturesPlotter(ft.feat, model_params)
42
+ elif model_params.get(PIPELINE_STEP) == "evaluation_plotter":
43
+ GenerationPlotter(gen, model_params, output_path=model_params['output_path'], input_path=model_params['input_path'])
44
+
45
+
46
  if __name__=='__main__':
47
  start_gedi = dt.now()
48
  print(f'INFO: GEDI starting {start_gedi}')
49
+
50
  args = ArgParser().parse('GEDI main')
51
+ model_params_list = config.get_model_params_list(args.alg_params_json)
52
+ run({'params':""}, model_params_list, [])
53
+
54
  print(f'SUCCESS: GEDI took {dt.now()-start_gedi} sec.')
setup.py CHANGED
@@ -4,7 +4,7 @@ import os
4
  with open("README.md", "r") as fh:
5
  long_description = fh.read()
6
 
7
- version_string = os.environ.get("VERSION_PLACEHOLDER", "0.0.5")
8
  print(version_string)
9
  version = version_string
10
 
@@ -25,59 +25,14 @@ setup(
25
  'Levenshtein==0.23.0',
26
  'matplotlib==3.8.4',
27
  'numpy==1.26.4',
 
28
  'pm4py==2.7.2',
29
  'scikit-learn==1.2.2',
30
- 'scipy==1.10.1',
31
  'seaborn==0.13.2',
32
  'smac==2.0.2',
33
  'tqdm==4.65.0',
34
- 'streamlit-toggle-switch>=1.0.2',
35
- 'click==8.1.7',
36
- 'cloudpickle==3.0.0',
37
- 'configspace==0.7.1',
38
- 'cvxopt==1.3.2',
39
- 'dask==2024.2.1',
40
- 'dask-jobqueue==0.8.5',
41
- 'deprecation==2.1.0',
42
- 'distributed==2024.2.1',
43
- 'emcee==3.1.4',
44
- 'feeed == 1.2.0',
45
- 'fsspec==2024.2.0',
46
- 'imbalanced-learn==0.12.0',
47
- 'imblearn==0.0',
48
- 'importlib-metadata==7.0.1',
49
- 'intervaltree==3.1.0',
50
- 'jinja2==3.1.3',
51
- 'levenshtein==0.23.0',
52
- 'locket==1.0.0',
53
- 'lxml==5.1.0',
54
- 'markupsafe==2.1.5',
55
- 'more-itertools==10.2.0',
56
- 'msgpack==1.0.8',
57
- 'networkx==3.2.1',
58
- 'numpy==1.26.4',
59
- 'pandas>=2.0.0',
60
- 'partd==1.4.1',
61
- 'pm4py==2.7.2',
62
- 'psutil==5.9.8',
63
- 'pydotplus==2.0.2',
64
- 'pynisher==1.0.10',
65
- 'pyrfr==0.9.0',
66
- 'pyyaml==6.0.1',
67
- 'rapidfuzz==3.6.1',
68
- 'regex==2023.12.25',
69
- 'scikit-learn==1.2.2',
70
- 'scipy==1.10.1',
71
- 'seaborn==0.13.2',
72
- 'smac==2.0.2',
73
- 'sortedcontainers==2.4.0',
74
- 'stringdist==1.0.9',
75
- 'tblib==3.0.0',
76
- 'toolz==0.12.1',
77
- 'tqdm==4.65.0',
78
- 'typing-extensions==4.10.0',
79
- 'urllib3==2.2.1',
80
- 'zict==3.0.0'
81
  ],
82
  packages = ['gedi'],
83
  classifiers=[
@@ -87,4 +42,4 @@ setup(
87
  'License :: OSI Approved :: MIT License', # Again, pick a license
88
  'Programming Language :: Python :: 3.9',
89
  ],
90
- )
 
4
  with open("README.md", "r") as fh:
5
  long_description = fh.read()
6
 
7
+ version_string = os.environ.get("VERSION_PLACEHOLDER", "1.0.0")
8
  print(version_string)
9
  version = version_string
10
 
 
25
  'Levenshtein==0.23.0',
26
  'matplotlib==3.8.4',
27
  'numpy==1.26.4',
28
+ 'pandas==2.2.2',
29
  'pm4py==2.7.2',
30
  'scikit-learn==1.2.2',
31
+ 'scipy==1.13.0',
32
  'seaborn==0.13.2',
33
  'smac==2.0.2',
34
  'tqdm==4.65.0',
35
+ 'streamlit-toggle-switch>=1.0.2'
36
  ],
37
  packages = ['gedi'],
38
  classifiers=[
 
42
  'License :: OSI Approved :: MIT License', # Again, pick a license
43
  'Programming Language :: Python :: 3.9',
44
  ],
45
+ )
utils/column_mappings.py ADDED
@@ -0,0 +1,16 @@
1
+ def column_mappings():
2
+
3
+ column_names_short = {
4
+ 'rutpt': 'ratio_unique_traces_per_trace',
5
+ 'rmcv': 'ratio_most_common_variant',
6
+ 'tlcv': 'trace_len_coefficient_variation',
7
+ 'mvo': 'mean_variant_occurrence',
8
+ 'enve': 'epa_normalized_variant_entropy',
9
+ 'ense': 'epa_normalized_sequence_entropy',
10
+ 'eself': 'epa_sequence_entropy_linear_forgetting',
11
+ 'enself': 'epa_normalized_sequence_entropy_linear_forgetting',
12
+ 'eseef': 'epa_sequence_entropy_exponential_forgetting',
13
+ 'enseef': 'epa_normalized_sequence_entropy_exponential_forgetting'
14
+ }
15
+
16
+ return column_names_short
utils/config_fabric.py CHANGED
@@ -13,6 +13,7 @@ import time
13
  import shutil
14
  import zipfile
15
  import io
 
16
 
17
  st.set_page_config(layout='wide')
18
  INPUT_XES="output/inputlog_temp.xes"
@@ -174,6 +175,10 @@ def set_generator_experiments(generator_params):
174
  df = pd.read_csv(uploaded_file)
175
  if len(df.columns) <= 1:
176
raise pd.errors.ParserError("Please select a file with at least two columns (e.g. log, feature) and use ',' as a delimiter.")
 
 
 
 
177
  sel_features = st.multiselect("Selected features", list(df.columns), list(df.columns)[-1])
178
  if sel_features:
179
  df = df[sel_features]
 
13
  import shutil
14
  import zipfile
15
  import io
16
+ from column_mappings import column_mappings
17
 
18
  st.set_page_config(layout='wide')
19
  INPUT_XES="output/inputlog_temp.xes"
 
175
  df = pd.read_csv(uploaded_file)
176
  if len(df.columns) <= 1:
177
raise pd.errors.ParserError("Please select a file with at least two columns (e.g. log, feature) and use ',' as a delimiter.")
178
+ columns_to_rename = {col: column_mappings()[col] for col in df.columns if col in column_mappings()}
179
+
180
+ # Rename the matching columns
181
+ df.rename(columns=columns_to_rename, inplace=True)
182
  sel_features = st.multiselect("Selected features", list(df.columns), list(df.columns)[-1])
183
  if sel_features:
184
  df = df[sel_features]