Commit 1dbd766 by christianmaxmike (1 parent: 718ccd7)
further stuff of readme.md
README.md CHANGED
@@ -29,12 +29,12 @@ conda install pyrfr swig
|
|
29 |
conda activate gedi
|
30 |
python main.py -o config_files/options/baseline.json -a config_files/algorithm/experiment_test.json
|
31 |
```
|
32 |
-
## Usage
|
33 |
Our pipeline offers several pipeline steps, which can be run sequentially or partially:
|
34 |
-
-
|
35 |
-
- generation
|
36 |
-
- benchmark
|
37 |
-
- evaluation_plotter
|
38 |
|
39 |
We also include two notebooks, which output experimental results as in our paper.
|
40 |
|
@@ -46,15 +46,106 @@ python main.py -o config_files/options/baseline.json -a config_files/algorithm/<
|
|
46 |
For reference of possible keys and values for each step, please see `config_files/algorithm/experiment_test.json`.
|
47 |
To run the whole pipeline, please create a new `.json` file that lists all steps you want to run and specifies the desired keys and values for each step.
|
48 |
|
49 |
-
|
50 |
-
|
51 |
|
52 |
```console
|
53 |
conda activate gedi
|
54 |
-
python main.py -o config_files/options/baseline.json -a config_files/algorithm/experiment_real_targets.json
|
55 |
```
|
56 |
|
57 |
-
|
58 |
In the following, we describe the notebooks (`.ipynb`) in the folder `\notebooks`, which reproduce the illustrations from our paper.
|
59 |
|
60 |
|
|
|
29 |
conda activate gedi
|
30 |
python main.py -o config_files/options/baseline.json -a config_files/algorithm/experiment_test.json
|
31 |
```
|
32 |
+
## General Usage
|
33 |
Our pipeline offers several pipeline steps, which can be run sequentially or partially:
|
34 |
+
- [Feature Extraction](#feature-extraction)
|
35 |
+
- [Generation](#generation)
|
36 |
+
- [Benchmark](#benchmark)
|
37 |
+
- [Evaluation Plotting](#evaluation-plotting)
|
38 |
|
39 |
We also include two notebooks, which output experimental results as in our paper.
|
40 |
|
|
|
46 |
For reference of possible keys and values for each step, please see `config_files/algorithm/experiment_test.json`.
|
47 |
To run the whole pipeline, please create a new `.json` file that lists all steps you want to run and specifies the desired keys and values for each step.
|
48 |
|
49 |
+
### Feature Extraction
|
50 |
+
---
|
51 |
+
To extract the meta features used for hyperparameter optimization, we run the following script:
|
52 |
+
```console
|
53 |
+
conda activate gedi
|
54 |
+
python main.py -o config_files/options/baseline.json -a config_files/algorithm/feature_extraction.json
|
55 |
+
```
|
56 |
+
The JSON file consists of the following key-value pairs:
|
57 |
+
|
58 |
+
- pipeline_step: denotes the current step in the pipeline (here: feature_extraction)
|
59 |
+
- input_path: folder to the input files
|
60 |
+
- feature params: defines a dictionary whose inner dictionary contains the key-value pair 'feature_set', a list of features to extract from the reference files. The list of valid features can be looked up in the FEEED extractor
|
61 |
+
- output_path: defines the path where plots are saved
|
62 |
+
- real_eventlog_path: defines the file with the meta features extracted from the real event logs
|
63 |
+
- plot_type: defines the style of the output plotting (possible values: violinplot, boxplot)
|
64 |
+
- font_size: label font size of the output plot
|
65 |
+
- boxplot_width: width of the violinplot/boxplot
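
As a rough illustration of the keys listed above, the following Python sketch writes a feature extraction config and assembles the matching command line. It is a sketch under assumptions, not the project's own tooling: the top-level list of step dictionaries, the `feature_params` spelling, the paths, and the feature name are all hypothetical, and the helper names `write_config` and `build_command` are invented here. Please check the real key names against `config_files/algorithm/experiment_test.json`.

```python
import json
from pathlib import Path

# Keys follow the list above; every value is a placeholder, not a GEDI default.
feature_extraction_step = {
    "pipeline_step": "feature_extraction",
    "input_path": "data/event_logs",        # hypothetical input folder
    "feature_params": {                      # key spelling is an assumption
        "feature_set": ["trace_length"],     # placeholder feature name
    },
    "output_path": "output/plots",           # hypothetical output folder
    "plot_type": "boxplot",                  # documented values: violinplot, boxplot
    "font_size": 12,
}


def write_config(steps, path):
    """Write a list of step dictionaries to a JSON file and return its path."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(steps, indent=2))
    return path


def build_command(algorithm_config,
                  options_config="config_files/options/baseline.json"):
    """Return the argument list for the main.py call shown in this README."""
    return ["python", "main.py", "-o", options_config, "-a", str(algorithm_config)]
```

The remaining plotting keys from the list are omitted for brevity. The returned argument list can be passed to `subprocess.run` from inside the activated `gedi` environment.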
|
66 |
+
|
67 |
+
|
68 |
+
### Generation
|
69 |
+
---
|
70 |
+
After the meta features have been extracted from the files, the next step is to generate event log data accordingly. There are two settings for defining the targets: i) the meta feature targets are taken from the meta features of the real event log data; ii) a configuration space is defined which resembles the feasible meta feature space.
|
71 |
+
|
72 |
+
The generation step is run with an example `generation.json` file:
|
73 |
+
|
74 |
+
```console
|
75 |
+
conda activate gedi
|
76 |
+
python main.py -o config_files/options/baseline.json -a config_files/algorithm/generation.json
|
77 |
+
```
|
78 |
+
|
79 |
+
In the `generation.json`, we have the following key-value pairs:
|
80 |
+
|
81 |
+
* pipeline_step: denotes the current step in the pipeline (here: event_logs_generation)
|
82 |
+
* output_path: defines the output folder
|
83 |
+
* generator_params: defines the configuration of the generator itself. For the generator itself, we can set values for the general 'experiment', 'config_space', 'n_trials', and a specific 'plot_reference_feature' being used for plotting
|
84 |
+
|
85 |
+
- experiment: defines the path to the input file which contains the meta features which are used for the optimization step. The 'objectives' defines the specific meta features which are used as optimization criteria.
|
86 |
+
- config_space: here, we define the configuration of the generator module (here: process tree generator). The process tree generator accepts input parameters that define characteristics of the generated data (a more thorough overview of the params can be found [here](https://github.com/tjouck/PTandLogGenerator)):
|
87 |
+
|
88 |
+
- mode: most frequent number of visible activities
|
89 |
+
- sequence: probability to add a sequence operator to tree
|
90 |
+
- choice: probability to add a choice operator to tree
|
91 |
+
- parallel: probability to add a parallel operator to tree
|
92 |
+
- loop: probability to add a loop operator to tree
|
93 |
+
- silent: probability to add silent activity to a choice or loop operator
|
94 |
+
- lt_dependency: probability to add a random dependency to the tree
|
95 |
+
- num_traces: the number of traces in the event log
|
96 |
+
- duplicate: probability to duplicate an activity label
|
97 |
+
- or: probability to add an or operator to tree
|
98 |
+
|
99 |
+
- n_trials: the maximum number of trials for the hyperparameter optimization to find a feasible solution to the specific configuration being used as target
|
100 |
+
|
101 |
+
- plot_reference_feature: defines the feature which is used on the x-axis on the output plots, i.e., each feature defined in the 'objectives' of the 'experiment' is plotted against the reference feature being defined in this value
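
To make the nesting of `generator_params` described above easier to see, here is a minimal Python sketch of a generation step. Only the key names come from the list above; the `input_path` key inside `experiment`, the two-element range format for `config_space` values, and all concrete values are assumptions made for illustration. The helper `unknown_config_space_keys` is hypothetical and simply flags parameters that are not in the documented list.

```python
# Parameter names documented above for the process tree generator.
DOCUMENTED_CONFIG_SPACE_KEYS = {
    "mode", "sequence", "choice", "parallel", "loop",
    "silent", "lt_dependency", "num_traces", "duplicate", "or",
}

generation_step = {
    "pipeline_step": "event_logs_generation",
    "output_path": "output/generated",            # hypothetical folder
    "generator_params": {
        "experiment": {
            "input_path": "data/features.csv",    # hypothetical key and file
            "objectives": ["feature_a", "feature_b"],  # placeholder names
        },
        "config_space": {                          # value format is assumed
            "mode": [5, 20],
            "sequence": [0.01, 1],
            "choice": [0.01, 1],
            "parallel": [0.01, 1],
            "loop": [0.01, 1],
            "num_traces": [100, 1000],
        },
        "n_trials": 50,
        "plot_reference_feature": "feature_a",
    },
}


def unknown_config_space_keys(step):
    """Return config_space keys that are not in the documented parameter list."""
    space = step["generator_params"]["config_space"]
    return sorted(set(space) - DOCUMENTED_CONFIG_SPACE_KEYS)
```

A check like this catches misspelled parameter names before a long optimization run starts.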
|
102 |
+
|
103 |
+
|
104 |
+
#### Supplementary: Generating data with real targets
|
105 |
+
To run the experiments with real targets, we use the example config file `config_files/algorithm/experiment_real_targets.json`. The pipeline outputs the generated event logs, whose meta feature values are optimized towards the meta features of real-world benchmark datasets. It also writes the respective feature values, as well as the benchmark values, to the `\output` folder.
|
106 |
|
107 |
```console
|
108 |
conda activate gedi
|
109 |
+
python main.py -o config_files/options/baseline.json -a config_files/algorithm/experiment_real_targets.json
|
110 |
```
|
111 |
|
112 |
+
|
113 |
+
|
114 |
+
### Benchmark
|
115 |
+
The benchmarking step defines the downstream task used to evaluate the quality of the synthesized event log datasets against the metrics of real-world datasets. The command to run a benchmark is shown in the following script:
|
116 |
+
|
117 |
+
```console
|
118 |
+
conda activate gedi
|
119 |
+
python main.py -o config_files/options/baseline.json -a config_files/algorithm/benchmark.json
|
120 |
+
```
|
121 |
+
|
122 |
+
In the `benchmark.json`, we have the following key-value pairs:
|
123 |
+
|
124 |
+
* pipeline_step: denotes the current step in the pipeline (here: benchmark_test)
|
125 |
+
* benchmark_test: defines the downstream task. Currently (in v 1.0), only 'discovery' for process discovery is implemented
|
126 |
+
* input_path: defines the input folder where the synthesized event log data are stored
|
127 |
+
* output_path: defines the output folder
|
128 |
+
* miners: defines the miners for the downstream task 'discovery' which are used in the benchmarking. In v 1.0 the miners 'inductive' for inductive miner, 'heuristics' for heuristics miner, 'imf' for inductive miner infrequent, as well as 'ilp' for integer linear programming are implemented
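
The documented values for `benchmark_test` and `miners` can be checked before launching a run. The sketch below is a hypothetical helper, not part of the pipeline; it only encodes the values named in the list above for v 1.0, and the paths in the usage example are placeholders.

```python
# Values documented above for v 1.0.
IMPLEMENTED_TASKS = {"discovery"}
IMPLEMENTED_MINERS = {"inductive", "heuristics", "imf", "ilp"}


def check_benchmark_step(step):
    """Return a list of human-readable problems found in a benchmark step."""
    problems = []
    task = step.get("benchmark_test")
    if task not in IMPLEMENTED_TASKS:
        problems.append(f"unsupported benchmark_test: {task!r}")
    unknown = sorted(set(step.get("miners", [])) - IMPLEMENTED_MINERS)
    if unknown:
        problems.append(f"unknown miners: {unknown}")
    return problems


benchmark_step = {
    "pipeline_step": "benchmark_test",
    "benchmark_test": "discovery",
    "input_path": "output/generated",   # hypothetical folder
    "output_path": "output/benchmark",  # hypothetical folder
    "miners": ["inductive", "heuristics", "imf", "ilp"],
}
```

An empty list from `check_benchmark_step(benchmark_step)` means that only documented values are used.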
|
129 |
+
|
130 |
+
|
131 |
+
### Evaluation Plotting
|
132 |
+
The evaluation plotting step is used purely for visualization. The following example script shows how the plotter can be run:
|
133 |
+
|
134 |
+
|
135 |
+
```console
|
136 |
+
conda activate gedi
|
137 |
+
python main.py -o config_files/options/baseline.json -a config_files/algorithm/evaluation_plotter.json
|
138 |
+
```
|
139 |
+
|
140 |
+
Generally, in the `evaluation_plotter.json`, we have the following key-value pairs:
|
141 |
+
|
142 |
+
* pipeline_step: denotes the current step in the pipeline (here: evaluation_plotter)
|
143 |
+
* input_path: defines the input file or the input folder which is considered for the visualizations. If a single file is specified, only the meta features in that file are considered, whereas if a folder is specified, the framework iterates over all files and uses them for plotting
|
144 |
+
* plot_reference_feature: defines the feature which is used on the x-axis on the output plots, i.e., each feature defined in the input file is plotted against the reference feature being defined in this value
|
145 |
+
* targets: defines the target values which are also used as reference. As with input_path, the targets can be specified by a single file or by a folder
|
146 |
+
* output_path: defines where to store the plots
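
The file-or-folder behaviour described for `input_path` and `targets` can be pictured with a short Python sketch. This is an assumption about how such a lookup could work, not the framework's actual implementation, and `resolve_inputs` is a hypothetical name.

```python
from pathlib import Path


def resolve_inputs(input_path):
    """Return the files to plot: the file itself, or all files in a folder."""
    path = Path(input_path)
    if path.is_file():
        return [path]
    if path.is_dir():
        # Sorted for a stable order; subfolders are ignored in this sketch.
        return sorted(p for p in path.iterdir() if p.is_file())
    raise FileNotFoundError(f"input_path does not exist: {input_path}")
```

The same helper would apply to `targets`, since it accepts a single file or a folder in the same way.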
|
147 |
+
|
148 |
+
## Further results plotting
|
149 |
In the following, we describe the notebooks (`.ipynb`) in the folder `\notebooks`, which reproduce the illustrations from our paper.
|
150 |
|
151 |
|