christianmaxmike committed
Commit 1dbd766 · Parent: 718ccd7

further stuff of readme.md

Files changed (1):
  1. README.md +100 -9
README.md CHANGED
@@ -29,12 +29,12 @@ conda install pyrfr swig
  conda activate gedi
  python main.py -o config_files/options/baseline.json -a config_files/algorithm/experiment_test.json
  ```
- ## Usage
+ ## General Usage
  Our pipeline offers several steps, which can be run sequentially or partially:
- - feature_extraction
- - generation
- - benchmark
- - evaluation_plotter
+ - [Feature Extraction](#feature-extraction)
+ - [Generation](#generation)
+ - [Benchmark](#benchmark)
+ - [Evaluation Plotting](#evaluation-plotting)

  We also include two notebooks, which output experimental results as in our paper.
 
@@ -46,15 +46,106 @@ python main.py -o config_files/options/baseline.json -a config_files/algorithm/experiment_test.json
  For reference of possible keys and values for each step, please see `config_files/algorithm/experiment_test.json`.
  To run the whole pipeline, please create a new `.json` file specifying all steps you want to run and the desired keys and values for each step.

- ## Evaluation real targets
- In order to execute the experiments with real targets,we employ the config file `config_files/algorithm/experiment_real_targets.json`. The script's pipeline will output the generated event logs with meta features values being optimized towards meta features of real-world benchmark datasets. Furthermore, it will output the respective feature values in the `\output`folder as well as the benchmark values.
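To illustrate the "whole pipeline" configuration mentioned above: one plausible shape, assuming the steps are collected in a JSON array — check `config_files/algorithm/experiment_test.json` for the actual schema; every path and value below is a made-up placeholder:

```json
[
  {
    "pipeline_step": "feature_extraction",
    "input_path": "path/to/event_logs",
    "feature_params": {"feature_set": ["n_traces"]},
    "output_path": "path/to/output"
  },
  {
    "pipeline_step": "event_logs_generation",
    "output_path": "path/to/generated_logs",
    "generator_params": {"n_trials": 50}
  },
  {
    "pipeline_step": "benchmark_test",
    "benchmark_test": "discovery",
    "input_path": "path/to/generated_logs",
    "output_path": "path/to/benchmark",
    "miners": ["inductive", "heuristics"]
  }
]
```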
 
+ ### Feature Extraction
+ ---
+ In order to extract the meta features used for hyperparameter optimization, we employ the following script:
+ ```console
+ conda activate gedi
+ python main.py -o config_files/options/baseline.json -a config_files/algorithm/feature_extraction.json
+ ```
+ The `.json` file consists of the following key-value pairs (a sketch follows this list):
+
+ - pipeline_step: denotes the current step in the pipeline (here: feature_extraction)
+ - input_path: folder containing the input files
+ - feature_params: defines a dictionary whose inner dictionary consists of a key-value pair 'feature_set' with a list of features to be extracted from the reference files. A list of valid features can be looked up in the FEEED extractor
+ - output_path: defines the path where plots are saved to
+ - real_eventlog_path: defines the file with the meta features extracted from the real event logs
+ - plot_type: defines the style of the output plot (possible values: violinplot, boxplot)
+ - font_size: label font size of the output plot
+ - boxplot_width: width of the violinplot/boxplot
+
+
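For illustration, a minimal sketch of what such a `feature_extraction.json` could look like, assembled from the keys above — all paths and values are placeholders, and the feature names merely stand in for valid FEEED features:

```json
{
  "pipeline_step": "feature_extraction",
  "input_path": "path/to/event_logs",
  "feature_params": {"feature_set": ["n_traces", "n_unique_activities"]},
  "output_path": "path/to/plots",
  "real_eventlog_path": "path/to/real_eventlog_features.csv",
  "plot_type": "boxplot",
  "font_size": 24,
  "boxplot_width": 10
}
```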
+ ### Generation
+ ---
+ After having extracted meta features from the files, the next step is to generate event log data accordingly. Generally, there are two settings for how the targets are defined: i) meta feature targets are defined by the meta features from the real event log data; ii) a configuration space is defined which resembles the feasible meta feature space.
+
+ The command to execute the generation step, given an exemplary `generation.json` file:
+
+ ```console
+ conda activate gedi
+ python main.py -o config_files/options/baseline.json -a config_files/algorithm/generation.json
+ ```
+
+ In the `generation.json`, we have the following key-value pairs (a full sketch follows this list):
+
+ * pipeline_step: denotes the current step in the pipeline (here: event_logs_generation)
+ * output_path: defines the output folder
+ * generator_params: defines the configuration of the generator itself. For the generator, we can set values for the general 'experiment', 'config_space', 'n_trials', and a specific 'plot_reference_feature' used for plotting
+
+   - experiment: defines the path to the input file which contains the meta features used for the optimization step. The 'objectives' define the specific meta features used as optimization criteria.
+   - config_space: here, we define the configuration of the generator module (here: the process tree generator). The process tree generator can process input information which defines characteristics of the generated data (a more thorough overview of the params can be found [here](https://github.com/tjouck/PTandLogGenerator)):
+
+     - mode: most frequent number of visible activities
+     - sequence: probability of adding a sequence operator to the tree
+     - choice: probability of adding a choice operator to the tree
+     - parallel: probability of adding a parallel operator to the tree
+     - loop: probability of adding a loop operator to the tree
+     - silent: probability of adding a silent activity to a choice or loop operator
+     - lt_dependency: probability of adding a random dependency to the tree
+     - num_traces: the number of traces in the event log
+     - duplicate: probability of duplicating an activity label
+     - or: probability of adding an or operator to the tree
+
+   - n_trials: the maximum number of trials for the hyperparameter optimization to find a feasible solution for the specific configuration used as target
+
+   - plot_reference_feature: defines the feature used on the x-axis of the output plots, i.e., each feature defined in the 'objectives' of the 'experiment' is plotted against the reference feature defined in this value
+
+
+ #### Supplementary: Generating data with real targets
+ In order to execute the experiments with real targets, we employ the example config file `config_files/algorithm/experiment_real_targets.json`. The script's pipeline will output the generated event logs, whose meta feature values are optimized towards the meta features of real-world benchmark datasets. Furthermore, it will output the respective feature values as well as the benchmark values in the `\output` folder.

  ```console
  conda activate gedi
- python main.py -o config_files/options/baseline.json -a config_files/algorithm/experiment_real_targets.json.json
+ python main.py -o config_files/options/baseline.json -a config_files/algorithm/experiment_real_targets.json
  ```

- ## Result plotting
+
+
+ ### Benchmark
+ The benchmark defines the downstream task which is used for evaluating the goodness of the synthesized event log datasets against the metrics of real-world datasets. The command to execute a benchmark is shown in the following script:
+
+ ```console
+ conda activate gedi
+ python main.py -o config_files/options/baseline.json -a config_files/algorithm/benchmark.json
+ ```
+
+ In the `benchmark.json`, we have the following key-value pairs (a sketch follows this list):
+
+ * pipeline_step: denotes the current step in the pipeline (here: benchmark_test)
+ * benchmark_test: defines the downstream task. Currently (in v1.0), only 'discovery' for process discovery is implemented
+ * input_path: defines the input folder where the synthesized event log data are stored
+ * output_path: defines the output folder
+ * miners: defines the miners used in the benchmark for the downstream task 'discovery'. In v1.0, the miners 'inductive' for the inductive miner, 'heuristics' for the heuristics miner, 'imf' for the inductive miner infrequent, as well as 'ilp' for integer linear programming are implemented
+
+
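As a sketch, a `benchmark.json` assembling the keys above could look like this — the miner identifiers are the ones listed above, while the paths are placeholders:

```json
{
  "pipeline_step": "benchmark_test",
  "benchmark_test": "discovery",
  "input_path": "path/to/generated_logs",
  "output_path": "path/to/benchmark",
  "miners": ["inductive", "heuristics", "imf", "ilp"]
}
```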
+ ### Evaluation Plotting
+ The evaluation plotting step is used purely for visualization. An example of how the plotter can be used is shown in the following script:
+
+ ```console
+ conda activate gedi
+ python main.py -o config_files/options/baseline.json -a config_files/algorithm/evaluation_plotter.json
+ ```
+
+ Generally, in the `evaluation_plotter.json`, we have the following key-value pairs (a sketch follows this list):
+
+ * pipeline_step: denotes the current step in the pipeline (here: evaluation_plotter)
+ * input_path: defines the input file or folder considered for the visualizations. If a single file is specified, only the meta features in that file are considered, whereas if a folder is specified, the framework iterates over all files and uses them for plotting
+ * plot_reference_feature: defines the feature used on the x-axis of the output plots, i.e., each feature defined in the input file is plotted against the reference feature defined in this value
+ * targets: defines the target values which are also used as reference. Like the input_path, the targets can be specified as a single file or a folder
+ * output_path: defines where to store the plots
+
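Likewise, a rough sketch of an `evaluation_plotter.json` — paths and the reference feature name are placeholders:

```json
{
  "pipeline_step": "evaluation_plotter",
  "input_path": "path/to/extracted_features",
  "plot_reference_feature": "feature_a",
  "targets": "path/to/real_eventlog_features.csv",
  "output_path": "path/to/plots"
}
```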
+ ## Further results plotting
  In the following, we describe the `.ipynb` notebooks in the folder `\notebooks` used to reproduce the illustrations from our paper.

151