MilesCranmer committed · 408a63c
Parent(s): cdd291e

Clean up main docstrings

pysr/sr.py (+185 -123) CHANGED
@@ -230,57 +230,65 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
 
     Parameters
     ----------
+    model_selection : str
         Model selection criterion when selecting a final expression from
         the list of best expressions at each complexity.
+        Can be `'accuracy'`, `'best'`, or `'score'`. Default is `'best'`.
+        `'accuracy'` selects the candidate model with the lowest loss
+        (highest accuracy).
+        `'score'` selects the candidate model with the highest score.
+        Score is defined as the negated derivative of the log-loss with
+        respect to complexity - if an expression has a much better
+        loss at a slightly higher complexity, it is preferred.
+        `'best'` selects the candidate model with the highest score
+        among expressions with a loss better than at least 1.5x the
+        most accurate model.
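For instance, a minimal sketch of these criteria in use (the data and operator settings here are illustrative, not part of the diff):

```python
import numpy as np
from pysr import PySRRegressor

X = np.random.randn(100, 2)
y = 2.5 * np.cos(X[:, 1]) + X[:, 0] ** 2

# `model_selection="best"` trades off loss against complexity;
# `"accuracy"` would instead pick the lowest-loss candidate,
# and `"score"` the candidate with the highest score.
model = PySRRegressor(model_selection="best", niterations=40)
model.fit(X, y)
print(model)  # Shows the selected expression.
```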
+    binary_operators : list[str]
         List of strings for binary operators used in the search.
         See the [operators page](https://astroautomata.com/PySR/operators/)
         for more details.
+        Default is `["+", "-", "*", "/"]`.
+    unary_operators : list[str]
         Operators which only take a single scalar as input.
         For example, `"cos"` or `"exp"`.
+        Default is `None`.
+    niterations : int
         Number of iterations of the algorithm to run. The best
         equations are printed and migrate between populations at the
         end of each iteration.
+        Default is `40`.
+    populations : int
         Number of populations running.
+        Default is `15`.
+    population_size : int
         Number of individuals in each population.
+        Default is `33`.
+    max_evals : int
         Limits the total number of evaluations of expressions to
+        this number. Default is `None`.
+    maxsize : int
+        Max complexity of an equation. Default is `20`.
+    maxdepth : int
         Max depth of an equation. You can use both `maxsize` and
         `maxdepth`. `maxdepth` is by default not used.
+        Default is `None`.
+    warmup_maxsize_by : float
         Whether to slowly increase max size from a small number up to
         the maxsize (if greater than 0). If greater than 0, says the
         fraction of training time at which the current maxsize will
         reach the user-passed maxsize.
+        Default is `0.0`.
+    timeout_in_seconds : float
         Make the search return early once this many seconds have passed.
+        Default is `None`.
+    constraints : dict[str, int | tuple[int,int]]
         Dictionary of int (unary) or 2-tuples (binary); this enforces
         maxsize constraints on the individual arguments of operators.
         E.g., `'pow': (-1, 1)` says that power laws can have any
         complexity left argument, but only 1 complexity in the right
         argument. Use this to force more interpretable solutions.
+        Default is `None`.
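A short sketch of `constraints` in practice, per the description above (the operator choices are illustrative):

```python
from pysr import PySRRegressor

# Left argument of `pow` may have any complexity (-1), but the right
# argument is capped at complexity 1; the single argument of the
# unary `exp` is capped at complexity 9.
model = PySRRegressor(
    binary_operators=["+", "*", "pow"],
    unary_operators=["exp"],
    constraints={"pow": (-1, 1), "exp": 9},
)
```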
+    nested_constraints : dict[str, dict]
         Specifies how many times a combination of operators can be
         nested. For example, `{"sin": {"cos": 0}, "cos": {"cos": 2}}`
         specifies that `cos` may never appear within a `sin`, but `sin`

@@ -296,7 +304,8 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
         operators, you only need to provide a single number: both
         arguments are treated the same way, and the max of each
         argument is constrained.
+        Default is `None`.
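For example, the nesting rule quoted above would be passed as:

```python
from pysr import PySRRegressor

# `cos` may never appear inside `sin`; `cos` may be nested
# within itself at most twice.
model = PySRRegressor(
    unary_operators=["sin", "cos"],
    nested_constraints={"sin": {"cos": 0}, "cos": {"cos": 2}},
)
```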
+    loss : str
         String of Julia code specifying the loss function. Can either
         be a loss from LossFunctions.jl, or your own loss written as a
         function. Examples of custom written losses include:

@@ -311,7 +320,8 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
         `L1HingeLoss()`, `SmoothedL1HingeLoss(γ)`,
         `ModifiedHuberLoss()`, `L2MarginLoss()`, `ExpLoss()`,
         `SigmoidLoss()`, `DWDMarginLoss(q)`.
+        Default is `"L2DistLoss()"`.
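As a sketch, either a LossFunctions.jl loss or a hand-written Julia function can be passed (the custom weighted loss below follows PySR's loss-function conventions but is an assumption, not text from this diff):

```python
from pysr import PySRRegressor

# A built-in distance loss from LossFunctions.jl:
model = PySRRegressor(loss="L1DistLoss()")

# Or a custom Julia function; when weights are given to `fit`,
# the loss receives them as a third argument.
model = PySRRegressor(loss="myloss(x, y, w) = w * abs(x - y)")
```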
+    complexity_of_operators : dict[str, float]
         If you would like to use a complexity other than 1 for an
         operator, specify the complexity here. For example,
         `{"sin": 2, "+": 1}` would give a complexity of 2 for each use

@@ -319,184 +329,231 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
         the `+` operator (which is the default). You may specify real
         numbers for a complexity, and the total complexity of a tree
         will be rounded to the nearest integer after computing.
+        Default is `None`.
+    complexity_of_constants : float
+        Complexity of constants. Default is `1`.
+    complexity_of_variables : float
+        Complexity of variables. Default is `1`.
+    parsimony : float
         Multiplicative factor for how much to punish complexity.
+        Default is `0.0032`.
+    use_frequency : bool
         Whether to measure the frequency of complexities, and use that
         instead of parsimony to explore equation space. Will naturally
         find equations of all complexities.
+        Default is `True`.
+    use_frequency_in_tournament : bool
         Whether to use the frequency mentioned above in the tournament,
         rather than just the simulated annealing.
+        Default is `True`.
+    alpha : float
         Initial temperature for simulated annealing
         (requires `annealing` to be `True`).
+        Default is `0.1`.
+    annealing : bool
+        Whether to use annealing. Default is `False`.
+    early_stop_condition : float | str
         Stop the search early if this loss is reached. You may also
         pass a string containing a Julia function which
         takes a loss and complexity as input, for example:
         `"f(loss, complexity) = (loss < 0.1) && (complexity < 10)"`.
+        Default is `None`.
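Both forms of early stopping, as described above:

```python
from pysr import PySRRegressor

# Stop once any expression reaches a loss below 1e-4:
model = PySRRegressor(early_stop_condition=1e-4)

# Or require both low loss and low complexity:
model = PySRRegressor(
    early_stop_condition="f(loss, complexity) = (loss < 0.1) && (complexity < 10)"
)
```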
+    ncyclesperiteration : int
         Number of total mutations to run, per 10 samples of the
         population, per iteration.
+        Default is `550`.
+    fraction_replaced : float
         How much of population to replace with migrating equations from
         other populations.
+        Default is `0.000364`.
+    fraction_replaced_hof : float
         How much of population to replace with migrating equations from
+        hall of fame. Default is `0.035`.
+    weight_add_node : float
         Relative likelihood for mutation to add a node.
+        Default is `0.79`.
+    weight_insert_node : float
         Relative likelihood for mutation to insert a node.
+        Default is `5.1`.
+    weight_delete_node : float
         Relative likelihood for mutation to delete a node.
+        Default is `1.7`.
+    weight_do_nothing : float
         Relative likelihood for mutation to leave the individual.
+        Default is `0.21`.
+    weight_mutate_constant : float
         Relative likelihood for mutation to change the constant slightly
         in a random direction.
+        Default is `0.048`.
+    weight_mutate_operator : float
         Relative likelihood for mutation to swap an operator.
+        Default is `0.47`.
+    weight_randomize : float
         Relative likelihood for mutation to completely delete and then
         randomly generate the equation.
+        Default is `0.00023`.
+    weight_simplify : float
         Relative likelihood for mutation to simplify constant parts by evaluation.
+        Default is `0.0020`.
+    crossover_probability : float
         Absolute probability of crossover-type genetic operation, instead of a mutation.
+        Default is `0.066`.
+    skip_mutation_failures : bool
         Whether to skip mutation and crossover failures, rather than
         simply re-sampling the current member.
+        Default is `True`.
+    migration : bool
+        Whether to migrate. Default is `True`.
+    hof_migration : bool
+        Whether to have the hall of fame migrate. Default is `True`.
+    topn : int
         How many top individuals migrate from each population.
+        Default is `12`.
+    should_optimize_constants : bool
         Whether to numerically optimize constants (Nelder-Mead/Newton)
+        at the end of each iteration. Default is `True`.
+    optimizer_algorithm : str
         Optimization scheme to use for optimizing constants. Can currently
         be `NelderMead` or `BFGS`.
+        Default is `"BFGS"`.
+    optimizer_nrestarts : int
         Number of times to restart the constants optimization process with
         different initial conditions.
+        Default is `2`.
+    optimize_probability : float
         Probability of optimizing the constants during a single iteration of
         the evolutionary algorithm.
+        Default is `0.14`.
+    optimizer_iterations : int
         Number of iterations that the constants optimizer can take.
+        Default is `8`.
+    perturbation_factor : float
         Constants are perturbed by a max factor of
         (perturbation_factor*T + 1). Either multiplied by this or
         divided by this.
+        Default is `0.076`.
+    tournament_selection_n : int
         Number of expressions to consider in each tournament.
+        Default is `10`.
+    tournament_selection_p : float
         Probability of selecting the best expression in each
         tournament. The probability will decay as p*(1-p)^n for other
         expressions, sorted by loss.
+        Default is `0.86`.
+    procs : int
         Number of processes (=number of populations running).
+        Default is `cpu_count()`.
+    multithreading : bool
         Use multithreading instead of distributed backend.
+        Using procs=0 will turn off both. Default is `True`.
+    cluster_manager : str
         For distributed computing, this sets the job queue system. Set
         to one of "slurm", "pbs", "lsf", "sge", "qrsh", "scyld", or
         "htc". If set to one of these, PySR will run in distributed
         mode, and use `procs` to figure out how many processes to launch.
+        Default is `None`.
+    batching : bool
         Whether to compare population members on small batches during
         evolution. Still uses full dataset for comparing against hall
+        of fame. Default is `False`.
+    batch_size : int
+        The amount of data to use if doing batching. Default is `50`.
+    fast_cycle : bool
         Batch over population subsamples. This is a slightly different
         algorithm than regularized evolution, but does cycles 15%
         faster. May be algorithmically less efficient.
+        Default is `False`.
+    precision : int
+        What precision to use for the data. By default this is `32`
+        (float32), but you can select `64` or `16` as well, giving
+        you 64 or 16 bits of floating point precision, respectively.
+        Default is `32`.
+    random_state : int, Numpy RandomState instance or None
         Pass an int for reproducible results across multiple function calls.
         See :term:`Glossary <random_state>`.
+        Default is `None`.
+    deterministic : bool
         Make a PySR search give the same result every run.
         To use this, you must turn off parallelism
         (with `procs`=0, `multithreading`=False),
         and set `random_state` to a fixed seed.
+        Default is `False`.
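Putting the reproducibility requirements together, a fully deterministic configuration looks like:

```python
from pysr import PySRRegressor

# Parallelism off (procs=0, multithreading=False) plus a fixed
# seed, as required for `deterministic=True`.
model = PySRRegressor(
    procs=0,
    multithreading=False,
    deterministic=True,
    random_state=0,
)
```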
+    warm_start : bool
         Tells fit to continue from where the last call to fit finished.
         If false, each call to fit will be fresh, overwriting previous results.
+        Default is `False`.
+    verbosity : int
         What verbosity level to use. 0 means minimal print statements.
+        Default is `1e9`.
+    update_verbosity : int
         What verbosity level to use for package updates.
         Will take value of `verbosity` if not given.
+        Default is `None`.
+    progress : bool
         Whether to use a progress bar instead of printing to stdout.
+        Default is `True`.
+    equation_file : str
         Where to save the files (.csv extension).
+        Default is `None`.
+    temp_equation_file : bool
         Whether to put the hall of fame file in the temp directory.
         Deletion is then controlled with the `delete_tempfiles`
         parameter.
+        Default is `False`.
+    tempdir : str
+        Directory for the temporary files. Default is `None`.
+    delete_tempfiles : bool
         Whether to delete the temporary files after finishing.
+        Default is `True`.
+    julia_project : str
         A Julia environment location containing a Project.toml
         (and potentially the source code for SymbolicRegression.jl).
         Default gives the Python package directory, where a
         Project.toml file should be present from the install.
+    update : bool
         Whether to automatically update Julia packages.
+        Default is `True`.
+    output_jax_format : bool
         Whether to create a 'jax_format' column in the output,
         containing jax-callable functions and the default parameters in
         a jax array.
+        Default is `False`.
+    output_torch_format : bool
         Whether to create a 'torch_format' column in the output,
         containing a torch module with trainable parameters.
+        Default is `False`.
+    extra_sympy_mappings : dict[str, Callable]
         Provides mappings between custom `binary_operators` or
         `unary_operators` defined in julia strings, to those same
         operators defined in sympy.
         E.g., if `unary_operators=["inv(x)=1/x"]`, then for the fitted
         model to be exported to sympy, `extra_sympy_mappings`
         would be `{"inv": lambda x: 1/x}`.
+        Default is `None`.
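The `inv` example above, written out:

```python
from pysr import PySRRegressor

# Custom Julia operator, plus its sympy equivalent so the fitted
# model can be exported:
model = PySRRegressor(
    unary_operators=["inv(x) = 1/x"],
    extra_sympy_mappings={"inv": lambda x: 1 / x},
)
```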
+    extra_jax_mappings : dict[Callable, str]
         Similar to `extra_sympy_mappings` but for model export
         to jax. The dictionary maps sympy functions to jax functions.
         For example: `extra_jax_mappings={sympy.sin: "jnp.sin"}` maps
         the `sympy.sin` function to the equivalent jax expression `jnp.sin`.
+        Default is `None`.
+    extra_torch_mappings : dict[Callable, Callable]
         The same as `extra_jax_mappings` but for model export
         to pytorch. Note that the dictionary keys should be callable
         pytorch expressions.
+        For example: `extra_torch_mappings={sympy.sin: torch.sin}`.
+        Default is `None`.
+    denoise : bool
         Whether to use a Gaussian Process to denoise the data before
         inputting to PySR. Can help PySR fit noisy data.
+        Default is `False`.
+    select_k_features : int
         Whether to run feature selection in Python using random forests,
         before passing to the symbolic regression code. None means no
         feature selection; an int means select that many features.
+        Default is `None`.
+    **kwargs : dict
         Supports deprecated keyword arguments. Other arguments will
         result in an error.

     Attributes
     ----------
     equations_ : pandas.DataFrame | list[pandas.DataFrame]
@@ -793,9 +850,10 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
     selection_mask : list[bool]
         If using select_k_features, you must pass `model.selection_mask_` here.
         Not needed if loading from a pickle file.
+    nout : int
         Number of outputs of the model.
         Not needed if loading from a pickle file.
+        Default is `1`.
     **pysr_kwargs : dict
         Any other keyword arguments to initialize the PySRRegressor object.
         These will overwrite those stored in the pickle file.
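A minimal sketch of loading a saved search with these arguments (the filename here is hypothetical):

```python
from pysr import PySRRegressor

# Restore a model from a saved equation file; `nout` and other
# keyword arguments override what is stored alongside it.
model = PySRRegressor.from_file("hall_of_fame.csv", nout=1)
```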
@@ -999,7 +1057,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
 
     Parameters
     ----------
+    index : int | list[int]
         If you wish to select a particular equation from `self.equations_`,
         give the row number here. This overrides the `model_selection`
         parameter. If there are multiple output features, then pass
@@ -1171,9 +1229,9 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
     y : ndarray | pandas.DataFrame
         Target values of shape `(n_samples,)` or `(n_samples, n_targets)`.
         Will be cast to `X`'s dtype if necessary.
+    Xresampled : ndarray | pandas.DataFrame
+        Resampled training data used for denoising,
+        of shape `(n_resampled, n_features)`.
     weights : ndarray | pandas.DataFrame
         Weight array of the same shape as `y`.
         Each element is how to weight the mean-square-error loss
@@ -1252,15 +1310,15 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
     y : ndarray | pandas.DataFrame
         Target values of shape (n_samples,) or (n_samples, n_targets).
         Will be cast to X's dtype if necessary.
+    Xresampled : ndarray | pandas.DataFrame
         Resampled training data, of shape `(n_resampled, n_features)`,
         used for denoising.
     variable_names : list[str]
         Names of each variable in the training dataset, `X`.
         Of length `n_features`.
+    random_state : int | np.RandomState
         Pass an int for reproducible results across multiple function calls.
+        See :term:`Glossary <random_state>`. Default is `None`.
 
     Returns
     -------
@@ -1578,17 +1636,17 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
     y : ndarray | pandas.DataFrame
         Target values of shape (n_samples,) or (n_samples, n_targets).
         Will be cast to X's dtype if necessary.
+    Xresampled : ndarray | pandas.DataFrame
         Resampled training data, of shape (n_resampled, n_features),
         to generate denoised data on. This
         will be used as the training data, rather than `X`.
+    weights : ndarray | pandas.DataFrame
         Weight array of the same shape as `y`.
         Each element is how to weight the mean-square-error loss
         for that particular element of `y`. Alternatively,
         if a custom `loss` was set, it can be used
         in arbitrary ways.
+    variable_names : list[str]
         A list of names for the variables, rather than "x0", "x1", etc.
         If `X` is a pandas dataframe, the column names will be used
         instead of `variable_names`. Cannot contain spaces or special
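A sketch of `fit` with per-sample weights and named variables, per the parameters above (the data is illustrative):

```python
import numpy as np
from pysr import PySRRegressor

X = np.random.randn(200, 2)
y = 2.0 * np.cos(X[:, 0]) + X[:, 1] ** 2
w = np.ones_like(y)  # same shape as `y`; weights the loss per sample

model = PySRRegressor()
model.fit(X, y, weights=w, variable_names=["alpha", "beta"])
```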
@@ -1695,8 +1753,9 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
 
     Parameters
     ----------
+    checkpoint_file : str
         Path to checkpoint hall of fame file to be loaded.
+        The default will use the set `equation_file_`.
     """
     if checkpoint_file:
         self.equation_file_ = checkpoint_file
@@ -1716,7 +1775,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
     X : ndarray | pandas.DataFrame
         Training data of shape `(n_samples, n_features)`.
 
+    index : int | list[int]
         If you want to compute the output of an expression using a
         particular row of `self.equations_`, you may specify the index here.
         For multiple output equations, you must pass a list of indices
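For example (assuming a fitted `model` and single-output data):

```python
# Use the expression chosen by `model_selection`:
y_pred = model.predict(X)

# Or evaluate the equation stored at row 2 of `model.equations_`:
y_alt = model.predict(X, index=2)
```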
@@ -1784,7 +1843,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
 
     Parameters
     ----------
+    index : int | list[int]
         If you wish to select a particular equation from
         `self.equations_`, give the index number here. This overrides
         the `model_selection` parameter. If there are multiple output
@@ -1808,15 +1867,16 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
 
     Parameters
     ----------
+    index : int | list[int]
         If you wish to select a particular equation from
         `self.equations_`, give the index number here. This overrides
         the `model_selection` parameter. If there are multiple output
         features, then pass a list of indices with the order the same
         as the output feature.
+    precision : int
         The number of significant figures shown in the LaTeX
         representation.
+        Default is `3`.
 
     Returns
     -------
@@ -1843,7 +1903,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
 
     Parameters
     ----------
+    index : int | list[int]
         If you wish to select a particular equation from
         `self.equations_`, give the index number here. This overrides
         the `model_selection` parameter. If there are multiple output
@@ -1874,7 +1934,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
 
     Parameters
     ----------
+    index : int | list[int]
         If you wish to select a particular equation from
         `self.equations_`, give the index number here. This overrides
         the `model_selection` parameter. If there are multiple output
@@ -2094,16 +2154,18 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
 
     Parameters
     ----------
+    indices : list[int] | list[list[int]]
         If you wish to select a particular subset of equations from
         `self.equations_`, give the row numbers here. By default,
         all equations will be used. If there are multiple output
         features, then pass a list of lists.
+    precision : int
         The number of significant figures shown in the LaTeX
         representations.
+        Default is `3`.
+    columns : list[str]
         Which columns to include in the table.
+        Default is `["equation", "complexity", "loss", "score"]`.
 
     Returns
     -------
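A usage sketch of `latex_table` with the parameters from this final hunk (assuming a fitted `model`):

```python
# Full table at the default 3 significant figures:
print(model.latex_table())

# A subset of rows, higher precision, fewer columns:
print(
    model.latex_table(
        indices=[1, 3],
        precision=5,
        columns=["equation", "loss"],
    )
)
```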