MilesCranmer committed
Commit: beecd14
Parent: 51f7688

Proper pydoc markdown format

Files changed (1):
  1. pysr/sr.py +89 -83
pysr/sr.py CHANGED
@@ -130,118 +130,124 @@ def pysr(X, y, weights=None,
     equations, but you should adjust `niterations`,
     `binary_operators`, `unary_operators` to your requirements.
 
-    :param X: np.ndarray or pandas.DataFrame, 2D array. Rows are examples, \
-    columns are features. If pandas DataFrame, the columns are used \
+    # Arguments
+
+    X (np.ndarray/pandas.DataFrame): 2D array. Rows are examples,
+    columns are features. If pandas DataFrame, the columns are used
     for variable names (so make sure they don't contain spaces).
-    :param y: np.ndarray, 1D array (rows are examples) or 2D array (rows \
-    are examples, columns are outputs). Putting in a 2D array will \
+    y (np.ndarray): 1D array (rows are examples) or 2D array (rows
+    are examples, columns are outputs). Putting in a 2D array will
     trigger a search for equations for each feature of y.
-    :param weights: np.ndarray, same shape as y. Each element is how to \
-    weight the mean-square-error loss for that particular element \
+    weights (np.ndarray): same shape as y. Each element is how to
+    weight the mean-square-error loss for that particular element
     of y.
-    :param binary_operators: list, List of strings giving the binary operators \
+    binary_operators (list): List of strings giving the binary operators
     in Julia's Base. Default is ["+", "-", "*", "/",].
-    :param unary_operators: list, Same but for operators taking a single scalar. \
+    unary_operators (list): Same but for operators taking a single scalar.
     Default is [].
-    :param procs: int, Number of processes (=number of populations running).
-    :param loss: str, String of Julia code specifying the loss function. \
-    Can either be a loss from LossFunctions.jl, or your own \
-    loss written as a function. Examples of custom written losses \
-    include: `myloss(x, y) = abs(x-y)` for non-weighted, or \
-    `myloss(x, y, w) = w*abs(x-y)` for weighted. \
-    Among the included losses, these are as follows. Regression: \
-    `LPDistLoss{P}()`, `L1DistLoss()`, `L2DistLoss()` (mean square), \
-    `LogitDistLoss()`, `HuberLoss(d)`, `L1EpsilonInsLoss(ϵ)`, \
-    `L2EpsilonInsLoss(ϵ)`, `PeriodicLoss(c)`, `QuantileLoss(τ)`. \
-    Classification: `ZeroOneLoss()`, `PerceptronLoss()`, `L1HingeLoss()`, \
-    `SmoothedL1HingeLoss(γ)`, `ModifiedHuberLoss()`, `L2MarginLoss()`, \
+    procs (int): Number of processes (=number of populations running).
+    loss (str): String of Julia code specifying the loss function.
+    Can either be a loss from LossFunctions.jl, or your own
+    loss written as a function. Examples of custom written losses
+    include: `myloss(x, y) = abs(x-y)` for non-weighted, or
+    `myloss(x, y, w) = w*abs(x-y)` for weighted.
+    Among the included losses, these are as follows. Regression:
+    `LPDistLoss{P}()`, `L1DistLoss()`, `L2DistLoss()` (mean square),
+    `LogitDistLoss()`, `HuberLoss(d)`, `L1EpsilonInsLoss(ϵ)`,
+    `L2EpsilonInsLoss(ϵ)`, `PeriodicLoss(c)`, `QuantileLoss(τ)`.
+    Classification: `ZeroOneLoss()`, `PerceptronLoss()`, `L1HingeLoss()`,
+    `SmoothedL1HingeLoss(γ)`, `ModifiedHuberLoss()`, `L2MarginLoss()`,
     `ExpLoss()`, `SigmoidLoss()`, `DWDMarginLoss(q)`.
-    :param populations: int, Number of populations running.
-    :param niterations: int, Number of iterations of the algorithm to run. The best \
-    equations are printed, and migrate between populations, at the \
+    populations (int): Number of populations running.
+    niterations (int): Number of iterations of the algorithm to run. The best
+    equations are printed, and migrate between populations, at the
     end of each.
-    :param ncyclesperiteration: int, Number of total mutations to run, per 10 \
+    ncyclesperiteration (int): Number of total mutations to run, per 10
     samples of the population, per iteration.
-    :param alpha: float, Initial temperature.
-    :param annealing: bool, Whether to use annealing. You should (and it is default).
-    :param fractionReplaced: float, How much of population to replace with migrating \
+    alpha (float): Initial temperature.
+    annealing (bool): Whether to use annealing. You should (and it is default).
+    fractionReplaced (float): How much of population to replace with migrating
     equations from other populations.
-    :param fractionReplacedHof: float, How much of population to replace with migrating \
+    fractionReplacedHof (float): How much of population to replace with migrating
     equations from hall of fame.
-    :param npop: int, Number of individuals in each population
-    :param parsimony: float, Multiplicative factor for how much to punish complexity.
-    :param migration: bool, Whether to migrate.
-    :param hofMigration: bool, Whether to have the hall of fame migrate.
-    :param shouldOptimizeConstants: bool, Whether to numerically optimize \
+    npop (int): Number of individuals in each population
+    parsimony (float): Multiplicative factor for how much to punish complexity.
+    migration (bool): Whether to migrate.
+    hofMigration (bool): Whether to have the hall of fame migrate.
+    shouldOptimizeConstants (bool): Whether to numerically optimize
     constants (Nelder-Mead/Newton) at the end of each iteration.
-    :param topn: int, How many top individuals migrate from each population.
-    :param perturbationFactor: float, Constants are perturbed by a max \
-    factor of (perturbationFactor*T + 1). Either multiplied by this \
+    topn (int): How many top individuals migrate from each population.
+    perturbationFactor (float): Constants are perturbed by a max
+    factor of (perturbationFactor*T + 1). Either multiplied by this
     or divided by this.
-    :param weightAddNode: float, Relative likelihood for mutation to add a node
-    :param weightInsertNode: float, Relative likelihood for mutation to insert a node
-    :param weightDeleteNode: float, Relative likelihood for mutation to delete a node
-    :param weightDoNothing: float, Relative likelihood for mutation to leave the individual
-    :param weightMutateConstant: float, Relative likelihood for mutation to change \
+    weightAddNode (float): Relative likelihood for mutation to add a node
+    weightInsertNode (float): Relative likelihood for mutation to insert a node
+    weightDeleteNode (float): Relative likelihood for mutation to delete a node
+    weightDoNothing (float): Relative likelihood for mutation to leave the individual
+    weightMutateConstant (float): Relative likelihood for mutation to change
     the constant slightly in a random direction.
-    :param weightMutateOperator: float, Relative likelihood for mutation to swap \
+    weightMutateOperator (float): Relative likelihood for mutation to swap
     an operator.
-    :param weightRandomize: float, Relative likelihood for mutation to completely \
+    weightRandomize (float): Relative likelihood for mutation to completely
     delete and then randomly generate the equation
-    :param weightSimplify: float, Relative likelihood for mutation to simplify \
+    weightSimplify (float): Relative likelihood for mutation to simplify
     constant parts by evaluation
-    :param timeout: float, Time in seconds to timeout search
-    :param equation_file: str, Where to save the files (.csv separated by |)
-    :param verbosity: int, What verbosity level to use. 0 means minimal print statements.
-    :param progress: bool, Whether to use a progress bar instead of printing to stdout.
-    :param maxsize: int, Max size of an equation.
-    :param maxdepth: int, Max depth of an equation. You can use both maxsize and maxdepth. \
+    timeout (float): Time in seconds to timeout search
+    equation_file (str): Where to save the files (.csv separated by |)
+    verbosity (int): What verbosity level to use. 0 means minimal print statements.
+    progress (bool): Whether to use a progress bar instead of printing to stdout.
+    maxsize (int): Max size of an equation.
+    maxdepth (int): Max depth of an equation. You can use both maxsize and maxdepth.
     maxdepth is by default set to = maxsize, which means that it is redundant.
-    :param fast_cycle: bool, (experimental) - batch over population subsamples. This \
-    is a slightly different algorithm than regularized evolution, but does cycles \
+    fast_cycle (bool): (experimental) - batch over population subsamples. This
+    is a slightly different algorithm than regularized evolution, but does cycles
     15% faster. May be algorithmically less efficient.
-    :param variable_names: list, a list of names for the variables, other \
+    variable_names (list): a list of names for the variables, other
     than "x0", "x1", etc.
-    :param batching: bool, whether to compare population members on small batches \
-    during evolution. Still uses full dataset for comparing against \
+    batching (bool): whether to compare population members on small batches
+    during evolution. Still uses full dataset for comparing against
     hall of fame.
-    :param batchSize: int, the amount of data to use if doing batching.
-    :param select_k_features: (None, int), whether to run feature selection in \
-    Python using random forests, before passing to the symbolic regression \
-    code. None means no feature selection; an int means select that many \
+    batchSize (int): the amount of data to use if doing batching.
+    select_k_features (None/int): whether to run feature selection in
+    Python using random forests, before passing to the symbolic regression
+    code. None means no feature selection; an int means select that many
     features.
-    :param warmupMaxsizeBy: float, whether to slowly increase max size from \
-    a small number up to the maxsize (if greater than 0). \
-    If greater than 0, says the fraction of training time at which \
+    warmupMaxsizeBy (float): whether to slowly increase max size from
+    a small number up to the maxsize (if greater than 0).
+    If greater than 0, says the fraction of training time at which
     the current maxsize will reach the user-passed maxsize.
-    :param constraints: dict of int (unary) or 2-tuples (binary), \
-    this enforces maxsize constraints on the individual \
-    arguments of operators. E.g., `'pow': (-1, 1)` \
-    says that power laws can have any complexity left argument, but only \
+    constraints (dict): Dictionary of `int` (unary operators)
+    or tuples of two `int`s (binary),
+    this enforces maxsize constraints on the individual
+    arguments of operators. e.g., `'pow': (-1, 1)`
+    says that power laws can have any complexity left argument, but only
     1 complexity exponent. Use this to force more interpretable solutions.
-    :param useFrequency: bool, whether to measure the frequency of complexities, \
-    and use that instead of parsimony to explore equation space. Will \
+    useFrequency (bool): whether to measure the frequency of complexities,
+    and use that instead of parsimony to explore equation space. Will
     naturally find equations of all complexities.
-    :param julia_optimization: int, Optimization level (0, 1, 2, 3)
-    :param tempdir: str or None, directory for the temporary files
-    :param delete_tempfiles: bool, whether to delete the temporary files after finishing
-    :param julia_project: str or None, a Julia environment location containing \
-    a Project.toml (and potentially the source code for SymbolicRegression.jl). \
-    Default gives the Python package directory, where a Project.toml file \
+    julia_optimization (int): Optimization level (0, 1, 2, 3)
+    tempdir (str/None): directory for the temporary files
+    delete_tempfiles (bool): whether to delete the temporary files after finishing
+    julia_project (str/None): a Julia environment location containing
+    a Project.toml (and potentially the source code for SymbolicRegression.jl).
+    Default gives the Python package directory, where a Project.toml file
     should be present from the install.
-    :param user_input: Whether to ask for user input or not for installing (to \
+    user_input (bool): Whether to ask for user input or not for installing (to
     be used for automated scripts). Will choose to install when asked.
-    :param update: Whether to automatically update Julia packages.
-    :param temp_equation_file: Whether to put the hall of fame file in \
-    the temp directory. Deletion is then controlled with the \
+    update (bool): Whether to automatically update Julia packages.
+    temp_equation_file (bool): Whether to put the hall of fame file in
+    the temp directory. Deletion is then controlled with the
     delete_tempfiles argument.
-    :param output_jax_format: Whether to create a 'jax_format' column in the output, \
+    output_jax_format (bool): Whether to create a 'jax_format' column in the output,
     containing jax-callable functions and the default parameters in a jax array.
-    :param output_torch_format: Whether to create a 'torch_format' column in the output, \
+    output_torch_format (bool): Whether to create a 'torch_format' column in the output,
     containing a torch module with trainable parameters.
-    :returns: pd.DataFrame or list, Results dataframe, \
-    giving complexity, MSE, and equations (as strings), as well as functional \
-    forms. If list, each element corresponds to a dataframe of equations \
+
+    # Returns
+
+    equations (pd.DataFrame/list): Results dataframe,
+    giving complexity, MSE, and equations (as strings), as well as functional
+    forms. If list, each element corresponds to a dataframe of equations
     for each output.
     """
     if binary_operators is None:
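
For orientation, the arguments documented in the new `# Arguments` section correspond to a call like the following. This is a minimal sketch against the functional `pysr` API this docstring describes; the toy data, the operator choices, and `niterations=5` are illustrative, not part of the commit:

```python
import numpy as np
from pysr import pysr

# Toy dataset: rows are examples, columns are features.
X = 2 * np.random.randn(100, 5)
y = 2 * np.cos(X[:, 3]) + X[:, 0] ** 2 - 2

# binary_operators defaults to ["+", "-", "*", "/"] and
# unary_operators defaults to []; both are spelled out here.
equations = pysr(
    X,
    y,
    niterations=5,
    binary_operators=["+", "-", "*", "/"],
    unary_operators=["cos", "exp"],
)

# Per the "# Returns" section: for a 1D y, a pd.DataFrame of
# complexity, MSE, and equation strings.
print(equations)
```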
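
Likewise, the `'pow': (-1, 1)` example from the `constraints` entry would be wired up roughly as follows (a sketch; it assumes `pow` is also listed among the binary operators):

```python
from pysr import pysr

equations = pysr(
    X,
    y,
    binary_operators=["+", "*", "pow"],
    # -1: the left argument (the base) may have any complexity;
    #  1: the right argument (the exponent) is capped at complexity 1,
    # forcing simple, interpretable power laws.
    constraints={"pow": (-1, 1)},
)
```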
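
Finally, passing a 2D `y` triggers one search per output column, and the return value becomes a list of DataFrames, as the `# Returns` entry notes. A sketch; the column stacking and variable names are illustrative:

```python
import numpy as np
from pysr import pysr

X = 2 * np.random.randn(100, 2)
y_multi = np.stack([np.sin(X[:, 0]), X[:, 1] ** 2], axis=1)  # shape (100, 2)

equation_dfs = pysr(
    X,
    y_multi,
    variable_names=["theta", "v"],  # instead of the default "x0", "x1"
)

# One DataFrame per output column.
for i, frame in enumerate(equation_dfs):
    print(f"output {i}:")
    print(frame)
```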