MilesCranmer committed
Commit 4584576
1 Parent(s): 147e8d5

Fix doc formatting

Files changed (1):
  1. pysr/sr.py +60 -60
pysr/sr.py CHANGED
@@ -131,118 +131,118 @@ def pysr(X, y, weights=None,
     `binary_operators`, `unary_operators` to your requirements.
 
     :param X: np.ndarray or pandas.DataFrame, 2D array. Rows are examples,
-        columns are features. If pandas DataFrame, the columns are used
-        for variable names (so make sure they don't contain spaces).
+        columns are features. If pandas DataFrame, the columns are used
+        for variable names (so make sure they don't contain spaces).
     :param y: np.ndarray, 1D array (rows are examples) or 2D array (rows
-        are examples, columns are outputs). Putting in a 2D array will
-        trigger a search for equations for each feature of y.
+        are examples, columns are outputs). Putting in a 2D array will
+        trigger a search for equations for each feature of y.
     :param weights: np.ndarray, same shape as y. Each element is how to
-        weight the mean-square-error loss for that particular element
-        of y.
+        weight the mean-square-error loss for that particular element
+        of y.
     :param binary_operators: list, List of strings giving the binary operators
-        in Julia's Base. Default is ["+", "-", "*", "/",].
+        in Julia's Base. Default is ["+", "-", "*", "/",].
     :param unary_operators: list, Same but for operators taking a single scalar.
-        Default is [].
+        Default is [].
     :param procs: int, Number of processes (=number of populations running).
     :param loss: str, String of Julia code specifying the loss function.
-        Can either be a loss from LossFunctions.jl, or your own
-        loss written as a function. Examples of custom written losses
-        include: `myloss(x, y) = abs(x-y)` for non-weighted, or
-        `myloss(x, y, w) = w*abs(x-y)` for weighted.
-        Among the included losses, these are as follows. Regression:
-        `LPDistLoss{P}()`, `L1DistLoss()`, `L2DistLoss()` (mean square),
-        `LogitDistLoss()`, `HuberLoss(d)`, `L1EpsilonInsLoss(ϵ)`,
-        `L2EpsilonInsLoss(ϵ)`, `PeriodicLoss(c)`, `QuantileLoss(τ)`.
-        Classification: `ZeroOneLoss()`, `PerceptronLoss()`, `L1HingeLoss()`,
-        `SmoothedL1HingeLoss(γ)`, `ModifiedHuberLoss()`, `L2MarginLoss()`,
-        `ExpLoss()`, `SigmoidLoss()`, `DWDMarginLoss(q)`.
+        Can either be a loss from LossFunctions.jl, or your own
+        loss written as a function. Examples of custom written losses
+        include: `myloss(x, y) = abs(x-y)` for non-weighted, or
+        `myloss(x, y, w) = w*abs(x-y)` for weighted.
+        Among the included losses, these are as follows. Regression:
+        `LPDistLoss{P}()`, `L1DistLoss()`, `L2DistLoss()` (mean square),
+        `LogitDistLoss()`, `HuberLoss(d)`, `L1EpsilonInsLoss(ϵ)`,
+        `L2EpsilonInsLoss(ϵ)`, `PeriodicLoss(c)`, `QuantileLoss(τ)`.
+        Classification: `ZeroOneLoss()`, `PerceptronLoss()`, `L1HingeLoss()`,
+        `SmoothedL1HingeLoss(γ)`, `ModifiedHuberLoss()`, `L2MarginLoss()`,
+        `ExpLoss()`, `SigmoidLoss()`, `DWDMarginLoss(q)`.
     :param populations: int, Number of populations running.
     :param niterations: int, Number of iterations of the algorithm to run. The best
-        equations are printed, and migrate between populations, at the
-        end of each.
+        equations are printed, and migrate between populations, at the
+        end of each.
     :param ncyclesperiteration: int, Number of total mutations to run, per 10
-        samples of the population, per iteration.
+        samples of the population, per iteration.
     :param alpha: float, Initial temperature.
     :param annealing: bool, Whether to use annealing. You should (and it is default).
     :param fractionReplaced: float, How much of population to replace with migrating
-        equations from other populations.
+        equations from other populations.
     :param fractionReplacedHof: float, How much of population to replace with migrating
-        equations from hall of fame.
+        equations from hall of fame.
     :param npop: int, Number of individuals in each population
     :param parsimony: float, Multiplicative factor for how much to punish complexity.
     :param migration: bool, Whether to migrate.
     :param hofMigration: bool, Whether to have the hall of fame migrate.
     :param shouldOptimizeConstants: bool, Whether to numerically optimize
-        constants (Nelder-Mead/Newton) at the end of each iteration.
+        constants (Nelder-Mead/Newton) at the end of each iteration.
     :param topn: int, How many top individuals migrate from each population.
     :param perturbationFactor: float, Constants are perturbed by a max
-        factor of (perturbationFactor*T + 1). Either multiplied by this
-        or divided by this.
+        factor of (perturbationFactor*T + 1). Either multiplied by this
+        or divided by this.
     :param weightAddNode: float, Relative likelihood for mutation to add a node
     :param weightInsertNode: float, Relative likelihood for mutation to insert a node
     :param weightDeleteNode: float, Relative likelihood for mutation to delete a node
     :param weightDoNothing: float, Relative likelihood for mutation to leave the individual
     :param weightMutateConstant: float, Relative likelihood for mutation to change
-        the constant slightly in a random direction.
+        the constant slightly in a random direction.
     :param weightMutateOperator: float, Relative likelihood for mutation to swap
-        an operator.
+        an operator.
     :param weightRandomize: float, Relative likelihood for mutation to completely
-        delete and then randomly generate the equation
+        delete and then randomly generate the equation
     :param weightSimplify: float, Relative likelihood for mutation to simplify
-        constant parts by evaluation
+        constant parts by evaluation
     :param timeout: float, Time in seconds to timeout search
     :param equation_file: str, Where to save the files (.csv separated by |)
     :param verbosity: int, What verbosity level to use. 0 means minimal print statements.
     :param progress: bool, Whether to use a progress bar instead of printing to stdout.
     :param maxsize: int, Max size of an equation.
     :param maxdepth: int, Max depth of an equation. You can use both maxsize and maxdepth.
-        maxdepth is by default set to = maxsize, which means that it is redundant.
+        maxdepth is by default set to = maxsize, which means that it is redundant.
     :param fast_cycle: bool, (experimental) - batch over population subsamples. This
-        is a slightly different algorithm than regularized evolution, but does cycles
-        15% faster. May be algorithmically less efficient.
+        is a slightly different algorithm than regularized evolution, but does cycles
+        15% faster. May be algorithmically less efficient.
     :param variable_names: list, a list of names for the variables, other
-        than "x0", "x1", etc.
+        than "x0", "x1", etc.
     :param batching: bool, whether to compare population members on small batches
-        during evolution. Still uses full dataset for comparing against
-        hall of fame.
+        during evolution. Still uses full dataset for comparing against
+        hall of fame.
     :param batchSize: int, the amount of data to use if doing batching.
     :param select_k_features: (None, int), whether to run feature selection in
-        Python using random forests, before passing to the symbolic regression
-        code. None means no feature selection; an int means select that many
-        features.
+        Python using random forests, before passing to the symbolic regression
+        code. None means no feature selection; an int means select that many
+        features.
     :param warmupMaxsizeBy: float, whether to slowly increase max size from
-        a small number up to the maxsize (if greater than 0).
-        If greater than 0, says the fraction of training time at which
-        the current maxsize will reach the user-passed maxsize.
+        a small number up to the maxsize (if greater than 0).
+        If greater than 0, says the fraction of training time at which
+        the current maxsize will reach the user-passed maxsize.
     :param constraints: dict of int (unary) or 2-tuples (binary),
-        this enforces maxsize constraints on the individual
-        arguments of operators. E.g., `'pow': (-1, 1)`
-        says that power laws can have any complexity left argument, but only
-        1 complexity exponent. Use this to force more interpretable solutions.
+        this enforces maxsize constraints on the individual
+        arguments of operators. E.g., `'pow': (-1, 1)`
+        says that power laws can have any complexity left argument, but only
+        1 complexity exponent. Use this to force more interpretable solutions.
     :param useFrequency: bool, whether to measure the frequency of complexities,
-        and use that instead of parsimony to explore equation space. Will
-        naturally find equations of all complexities.
+        and use that instead of parsimony to explore equation space. Will
+        naturally find equations of all complexities.
     :param julia_optimization: int, Optimization level (0, 1, 2, 3)
     :param tempdir: str or None, directory for the temporary files
     :param delete_tempfiles: bool, whether to delete the temporary files after finishing
     :param julia_project: str or None, a Julia environment location containing
-        a Project.toml (and potentially the source code for SymbolicRegression.jl).
-        Default gives the Python package directory, where a Project.toml file
-        should be present from the install.
+        a Project.toml (and potentially the source code for SymbolicRegression.jl).
+        Default gives the Python package directory, where a Project.toml file
+        should be present from the install.
     :param user_input: Whether to ask for user input or not for installing (to
-        be used for automated scripts). Will choose to install when asked.
+        be used for automated scripts). Will choose to install when asked.
     :param update: Whether to automatically update Julia packages.
     :param temp_equation_file: Whether to put the hall of fame file in
-        the temp directory. Deletion is then controlled with the
-        delete_tempfiles argument.
+        the temp directory. Deletion is then controlled with the
+        delete_tempfiles argument.
     :param output_jax_format: Whether to create a 'jax_format' column in the output,
-        containing jax-callable functions and the default parameters in a jax array.
+        containing jax-callable functions and the default parameters in a jax array.
     :param output_torch_format: Whether to create a 'torch_format' column in the output,
-        containing a torch module with trainable parameters.
+        containing a torch module with trainable parameters.
     :returns: pd.DataFrame or list, Results dataframe,
-        giving complexity, MSE, and equations (as strings), as well as functional
-        forms. If list, each element corresponds to a dataframe of equations
-        for each output.
+        giving complexity, MSE, and equations (as strings), as well as functional
+        forms. If list, each element corresponds to a dataframe of equations
+        for each output.
     """
     if binary_operators is None:
         binary_operators = '+ * - /'.split(' ')
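The custom loss strings documented in the `loss` parameter above are Julia code that runs inside SymbolicRegression.jl, but their element-wise semantics can be sketched in plain Python. This is a hypothetical illustration of what `myloss(x, y, w) = w*abs(x-y)` computes; `weighted_abs_loss` is not part of pysr itself:

```python
def weighted_abs_loss(prediction, target, weight):
    """Python sketch of the docstring's weighted Julia loss
    `myloss(x, y, w) = w*abs(x-y)`, applied element-wise."""
    return [w * abs(x - y) for x, y, w in zip(prediction, target, weight)]

# Each weight scales the absolute error of the matching element of y,
# mirroring how the `weights` array modulates the fitted loss.
per_element = weighted_abs_loss([1.0, 2.0, 3.0], [1.5, 2.0, 2.0], [2.0, 1.0, 0.5])
```

The search then minimizes the aggregate of these per-element values, so a weight of 0 effectively ignores that data point.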
 
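To make the `constraints` format described in the diff concrete, here is a hypothetical dict in the documented shape. The `'pow'` entry comes from the docstring's own example; the comment semantics (left/right argument order, -1 meaning unconstrained) follow the docstring, not the commit itself:

```python
# Hypothetical constraints dict in the documented format: a binary operator
# maps to a 2-tuple (max complexity of left argument, max complexity of right
# argument), with -1 meaning unconstrained; a unary operator would map to a
# single int.
constraints = {
    "pow": (-1, 1),  # any left argument, but the exponent stays at complexity 1
}
```

A dict like this would be passed as `pysr(..., constraints=constraints)` to force more interpretable power laws.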