Clean up main docstrings
pysr/ +185 -123
@@ -230,57 +230,65 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
model_selection : str
Model selection criterion when selecting a final expression from
the list of best expressions at each complexity.
Can be 'accuracy'
binary_operators : list[str], default=["+", "-", "*", "/"]
List of strings for binary operators used in the search.
See the [operators page](
for more details.
Operators which only take a single scalar as input.
For example, `"cos"` or `"exp"`.
Number of iterations of the algorithm to run. The best
equations are printed and migrate between populations at the
end of each iteration.
Number of populations running.
Number of individuals in each population.
Limits the total number of evaluations of expressions to
this number.
maxsize : int
Max complexity of an equation.
maxdepth : int
Max depth of an equation. You can use both `maxsize` and
`maxdepth`. `maxdepth` is by default not used.
Whether to slowly increase the max size from a small number up to
`maxsize` (if greater than 0). If greater than 0, this is the
fraction of training time at which the current max size will
reach the user-passed `maxsize`.
Make the search return early once this many seconds have passed.
Dictionary mapping operators to an int (unary) or a 2-tuple (binary)
that enforces maxsize constraints on the individual arguments of operators.
E.g., `'pow': (-1, 1)` says that power laws can have a left
argument of any complexity, but a right argument of complexity
at most 1. Use this to force more interpretable solutions.
Specifies how many times a combination of operators can be
nested. For example, `{"sin": {"cos": 0}, "cos": {"cos": 2}}`
specifies that `cos` may never appear within a `sin`, but `sin`
@@ -296,7 +304,8 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
operators, you only need to provide a single number: both
arguments are treated the same way, and the max of each
argument is constrained.
String of Julia code specifying the loss function. Can either
be a loss from LossFunctions.jl, or your own loss written as a
function. Examples of custom written losses include:
@@ -311,7 +320,8 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
`L1HingeLoss()`, `SmoothedL1HingeLoss(γ)`,
`ModifiedHuberLoss()`, `L2MarginLoss()`, `ExpLoss()`,
`SigmoidLoss()`, `DWDMarginLoss(q)`.
If you would like to use a complexity other than 1 for an
operator, specify the complexity here. For example,
`{"sin": 2, "+": 1}` would give a complexity of 2 for each use
@@ -319,184 +329,231 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
the `+` operator (which is the default). You may specify real
numbers for a complexity, and the total complexity of a tree
will be rounded to the nearest integer after computing.
Multiplicative factor for how much to punish complexity.
Whether to measure the frequency of complexities, and use that
instead of parsimony to explore equation space. Will naturally
find equations of all complexities.
Whether to use the frequency mentioned above in the tournament,
rather than just the simulated annealing.
Initial temperature for simulated annealing
(requires `annealing` to be `True`).
Stop the search early if this loss is reached. You may also
pass a string containing a Julia function which
takes a loss and complexity as input, for example:
`"f(loss, complexity) = (loss < 0.1) && (complexity < 10)"`.
Number of total mutations to run, per 10 samples of the
population, per iteration.
How much of the population to replace with migrating equations from
other populations.
How much of the population to replace with migrating equations from
hall of fame.
weight_add_node : float
Relative likelihood for mutation to add a node.
Relative likelihood for mutation to insert a node.
Relative likelihood for mutation to delete a node.
Relative likelihood for mutation to leave the individual unchanged.
Relative likelihood for mutation to change the constant slightly
in a random direction.
Relative likelihood for mutation to swap an operator.
Relative likelihood for mutation to completely delete and then
randomly generate the equation.
Relative likelihood for mutation to simplify constant parts by evaluation.
Absolute probability of crossover-type genetic operation, instead of a mutation.
Whether to skip mutation and crossover failures, rather than
simply re-sampling the current member.
How many top individuals migrate from each population.
Whether to numerically optimize constants (Nelder-Mead/Newton)
at the end of each iteration.
optimizer_algorithm : str
Optimization scheme to use for optimizing constants. Can currently
be `NelderMead` or `BFGS`.
Number of times to restart the constants optimization process with
different initial conditions.
Probability of optimizing the constants during a single iteration of
the evolutionary algorithm.
Number of iterations that the constants optimizer can take.
Constants are perturbed by a max factor of
`(perturbation_factor*T + 1)`. Either multiplied by this or
divided by this.
Number of expressions to consider in each tournament.
Probability of selecting the best expression in each
tournament. The probability will decay as `p*(1-p)^n` for other
expressions, sorted by loss.
Number of processes (=number of populations running).
Use multithreading instead of distributed backend.
Using procs=0 will turn off both.
For distributed computing, this sets the job queue system. Set
to one of "slurm", "pbs", "lsf", "sge", "qrsh", "scyld", or
"htc". If set to one of these, PySR will run in distributed
mode, and use `procs` to figure out how many processes to launch.
Whether to compare population members on small batches during
evolution. Still uses the full dataset for comparing against the hall
of fame.
batch_size : int
The amount of data to use if doing batching.
fast_cycle : bool
Batch over population subsamples. This is a slightly different
algorithm than regularized evolution, but runs cycles 15%
faster. May be algorithmically less efficient.
Pass an int for reproducible results across multiple function calls.
See :term:`Glossary <random_state>`.
Make a PySR search give the same result every run.
To use this, you must turn off parallelism
(with `procs=0`, `multithreading=False`),
and set `random_state` to a fixed seed.
Tells fit to continue from where the last call to fit finished.
If `False`, each call to fit will be fresh, overwriting previous results.
What verbosity level to use. 0 means minimal print statements.
What verbosity level to use for package updates.
Will take value of `verbosity` if not given.
Whether to use a progress bar instead of printing to stdout.
Where to save the files (.csv extension).
Whether to put the hall of fame file in the temp directory.
Deletion is then controlled with the `delete_tempfiles`
parameter.
Whether to delete the temporary files after finishing.
A Julia environment location containing a Project.toml
(and potentially the source code for SymbolicRegression.jl).
Default gives the Python package directory, where a
Project.toml file should be present from the install.
update : bool
Whether to automatically update Julia packages.
Whether to create a 'jax_format' column in the output,
containing jax-callable functions and the default parameters in
a jax array.
Whether to create a 'torch_format' column in the output,
containing a torch module with trainable parameters.
Provides mappings from custom `binary_operators` or
`unary_operators` defined as Julia strings to those same
operators defined in sympy.
E.g., if `unary_operators=["inv(x)=1/x"]`, then for the fitted
model to be exported to sympy, `extra_sympy_mappings`
would be `{"inv": lambda x: 1/x}`.
Similar to `extra_sympy_mappings` but for model export
to jax. The dictionary maps sympy functions to jax functions.
For example: `extra_jax_mappings={sympy.sin: "jnp.sin"}` maps
the `sympy.sin` function to the equivalent jax expression `jnp.sin`.
The same as `extra_jax_mappings` but for model export
to pytorch. Note that the dictionary values should be callable
pytorch expressions.
For example: `extra_torch_mappings={sympy.sin: torch.sin}
Whether to use a Gaussian Process to denoise the data before
inputting to PySR. Can help PySR fit noisy data.
Whether to run feature selection in Python using random forests,
before passing to the symbolic regression code. None means no
feature selection; an int means select that many features.
Supports deprecated keyword arguments. Other arguments will
result in an error.
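The tournament-selection text above (`tournament_selection_n`, `tournament_selection_p`) says the best expression in a tournament wins with probability p, and lower-ranked ones win with probability decaying as `p*(1-p)^n`. Below is a minimal pure-Python sketch of that rule. It assumes that n counts ranks from 0 (the best) and that the leftover tail probability goes to the last-ranked candidate. Both function names are hypothetical, and SymbolicRegression.jl's real implementation may differ.

```python
import random


def tournament_weights(n_candidates: int, p: float) -> list[float]:
    """Selection probability for each rank, best (rank 0) first.

    Rank k gets p * (1 - p) ** k. The leftover tail mass (1 - p) ** n is
    added to the last rank so the weights sum to 1 (an assumption of this
    sketch, not necessarily what SymbolicRegression.jl does).
    """
    if n_candidates < 1 or not 0.0 < p <= 1.0:
        raise ValueError("need n_candidates >= 1 and 0 < p <= 1")
    weights = [p * (1.0 - p) ** k for k in range(n_candidates)]
    weights[-1] += 1.0 - sum(weights)
    return weights


def tournament_select(losses: list[float], p: float, rng: random.Random) -> int:
    """Rank the sampled candidates by loss and draw a winner by rank weight.

    Returns the winner's index into `losses`.
    """
    ranked = sorted(range(len(losses)), key=lambda i: losses[i])
    return rng.choices(ranked, weights=tournament_weights(len(ranked), p), k=1)[0]


# With the documented defaults (n = 10, p = 0.86), rank 0 gets weight 0.86.
print([round(w, 4) for w in tournament_weights(10, 0.86)])
```

A larger p concentrates selection on the current best expression. A smaller p lets worse-ranked expressions win more often, which keeps more diversity in the population.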
equations_ : pandas.DataFrame | list[pandas.DataFrame]
@@ -793,9 +850,10 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
selection_mask : list[bool]
If using `select_k_features`, you must pass `model.selection_mask_` here.
Not needed if loading from a pickle file.
nout : int
Number of outputs of the model.
Not needed if loading from a pickle file.
**pysr_kwargs : dict
Any other keyword arguments to initialize the PySRRegressor object.
These will overwrite those stored in the pickle file.
@@ -999,7 +1057,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
index : int | list[int]
If you wish to select a particular equation from `self.equations_`,
give the row number here. This overrides the `model_selection`
parameter. If there are multiple output features, then pass
@@ -1171,9 +1229,9 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
y : ndarray | pandas.DataFrame
Target values of shape `(n_samples,)` or `(n_samples, n_targets)`.
Will be cast to `X`'s dtype if necessary.
Xresampled : ndarray | pandas.DataFrame
weights : ndarray | pandas.DataFrame
Weight array of the same shape as `y`.
Each element is how to weight the mean-square-error loss
@@ -1252,15 +1310,15 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
y : ndarray | pandas.DataFrame
Target values of shape `(n_samples,)` or `(n_samples, n_targets)`.
Will be cast to `X`'s dtype if necessary.
Xresampled : ndarray | pandas.DataFrame
Resampled training data, of shape `(n_resampled, n_features)`,
used for denoising.
variable_names : list[str]
Names of each variable in the training dataset, `X`.
Of length `n_features`.
random_state : int
Pass an int for reproducible results across multiple function calls.
See :term:`Glossary <random_state>`.
@@ -1578,17 +1636,17 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
y : ndarray | pandas.DataFrame
Target values of shape `(n_samples,)` or `(n_samples, n_targets)`.
Will be cast to `X`'s dtype if necessary.
Xresampled : ndarray | pandas.DataFrame
Resampled training data, of shape `(n_resampled, n_features)`,
on which to generate denoised data. This
will be used as the training data, rather than `X`.
weights : ndarray | pandas.DataFrame
Weight array of the same shape as `y`.
Each element is how to weight the mean-square-error loss
for that particular element of `y`. Alternatively,
if a custom `loss` was set, it can be used
in arbitrary ways.
variable_names : list[str]
A list of names for the variables, rather than "x0", "x1", etc.
If `X` is a pandas dataframe, the column names will be used
instead of `variable_names`. Cannot contain spaces or special
@@ -1695,8 +1753,9 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
checkpoint_file : str
Path to checkpoint hall of fame file to be loaded.
if checkpoint_file:
    self.equation_file_ = checkpoint_file
@@ -1716,7 +1775,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
X : ndarray | pandas.DataFrame
Training data of shape `(n_samples, n_features)`.
index : int | list[int]
If you want to compute the output of an expression using a
particular row of `self.equations_`, you may specify the index here.
For multiple output equations, you must pass a list of indices
@@ -1784,7 +1843,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
index : int | list[int]
If you wish to select a particular equation from
`self.equations_`, give the index number here. This overrides
the `model_selection` parameter. If there are multiple output
@@ -1808,15 +1867,16 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
index : int | list[int]
If you wish to select a particular equation from
`self.equations_`, give the index number here. This overrides
the `model_selection` parameter. If there are multiple output
features, then pass a list of indices in the same order
as the output features.
precision : int
The number of significant figures shown in the LaTeX
representation.
@@ -1843,7 +1903,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
index : int | list[int]
If you wish to select a particular equation from
`self.equations_`, give the index number here. This overrides
the `model_selection` parameter. If there are multiple output
@@ -1874,7 +1934,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
index : int | list[int]
If you wish to select a particular equation from
`self.equations_`, give the index number here. This overrides
the `model_selection` parameter. If there are multiple output
@@ -2094,16 +2154,18 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
indices : list[int] | list[list[int]]
If you wish to select a particular subset of equations from
`self.equations_`, give the row numbers here. By default,
all equations will be used. If there are multiple output
features, then pass a list of lists.
precision : int
The number of significant figures shown in the LaTeX
representation.
Which columns to include in the table.
@@ -230,57 +230,65 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
model_selection : str
Model selection criterion when selecting a final expression from
the list of best expressions at each complexity.
Can be `'accuracy'`, `'best'`, or `'score'`. Default is `'best'`.
`'accuracy'` selects the candidate model with the lowest loss
(highest accuracy).
`'score'` selects the candidate model with the highest score.
Score is defined as the negated derivative of the log-loss with
respect to complexity; if an expression has a much better
loss at a slightly higher complexity, it is preferred.
`'best'` selects the candidate model with the highest score
among expressions whose loss is at most 1.5x that of the
most accurate model.
binary_operators : list[str]
List of strings for binary operators used in the search.
See the [operators page](
for more details.
Default is `["+", "-", "*", "/"]`.
unary_operators : list[str]
Operators which only take a single scalar as input.
For example, `"cos"` or `"exp"`.
Default is `None`.
niterations : int
Number of iterations of the algorithm to run. The best
equations are printed and migrate between populations at the
end of each iteration.
Default is `40`.
populations : int
Number of populations running.
Default is `15`.
population_size : int
Number of individuals in each population.
Default is `33`.
max_evals : int
Limits the total number of evaluations of expressions to
this number. Default is `None`.
maxsize : int
Max complexity of an equation. Default is `20`.
maxdepth : int
Max depth of an equation. You can use both `maxsize` and
`maxdepth`. `maxdepth` is by default not used.
Default is `None`.
warmup_maxsize_by : float
Whether to slowly increase the max size from a small number up to
`maxsize` (if greater than 0). If greater than 0, this is the
fraction of training time at which the current max size will
reach the user-passed `maxsize`.
Default is `0.0`.
timeout_in_seconds : float
Make the search return early once this many seconds have passed.
Default is `None`.
constraints : dict[str, int | tuple[int,int]]
Dictionary mapping operators to an int (unary) or a 2-tuple (binary)
that enforces maxsize constraints on the individual arguments of operators.
E.g., `'pow': (-1, 1)` says that power laws can have a left
argument of any complexity, but a right argument of complexity
at most 1. Use this to force more interpretable solutions.
Default is `None`.
nested_constraints : dict[str, dict]
Specifies how many times a combination of operators can be
nested. For example, `{"sin": {"cos": 0}, "cos": {"cos": 2}}`
specifies that `cos` may never appear within a `sin`, but `sin`
@@ -296,7 +304,8 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
operators, you only need to provide a single number: both
arguments are treated the same way, and the max of each
argument is constrained.
Default is `None`.
loss : str
String of Julia code specifying the loss function. Can either
be a loss from LossFunctions.jl, or your own loss written as a
function. Examples of custom written losses include:
@@ -311,7 +320,8 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
`L1HingeLoss()`, `SmoothedL1HingeLoss(γ)`,
`ModifiedHuberLoss()`, `L2MarginLoss()`, `ExpLoss()`,
`SigmoidLoss()`, `DWDMarginLoss(q)`.
Default is `"L2DistLoss()"`.
complexity_of_operators : dict[str, float]
If you would like to use a complexity other than 1 for an
operator, specify the complexity here. For example,
`{"sin": 2, "+": 1}` would give a complexity of 2 for each use
@@ -319,184 +329,231 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
the `+` operator (which is the default). You may specify real
numbers for a complexity, and the total complexity of a tree
will be rounded to the nearest integer after computing.
Default is `None`.
complexity_of_constants : float
Complexity of constants. Default is `1`.
complexity_of_variables : float
Complexity of variables. Default is `1`.
parsimony : float
Multiplicative factor for how much to punish complexity.
Default is `0.0032`.
use_frequency : bool
Whether to measure the frequency of complexities, and use that
instead of parsimony to explore equation space. Will naturally
find equations of all complexities.
Default is `True`.
use_frequency_in_tournament : bool
Whether to use the frequency mentioned above in the tournament,
rather than just the simulated annealing.
Default is `True`.
alpha : float
Initial temperature for simulated annealing
(requires `annealing` to be `True`).
Default is `0.1`.
annealing : bool
Whether to use annealing. Default is `False`.
early_stop_condition : float | str
Stop the search early if this loss is reached. You may also
pass a string containing a Julia function which
takes a loss and complexity as input, for example:
`"f(loss, complexity) = (loss < 0.1) && (complexity < 10)"`.
Default is `None`.
ncyclesperiteration : int
Number of total mutations to run, per 10 samples of the
population, per iteration.
Default is `550`.
fraction_replaced : float
How much of the population to replace with migrating equations from
other populations.
Default is `0.000364`.
fraction_replaced_hof : float
How much of the population to replace with migrating equations from
the hall of fame. Default is `0.035`.
weight_add_node : float
Relative likelihood for mutation to add a node.
Default is `0.79`.
weight_insert_node : float
Relative likelihood for mutation to insert a node.
Default is `5.1`.
weight_delete_node : float
Relative likelihood for mutation to delete a node.
Default is `1.7`.
weight_do_nothing : float
Relative likelihood for mutation to leave the individual unchanged.
Default is `0.21`.
weight_mutate_constant : float
Relative likelihood for mutation to change the constant slightly
in a random direction.
Default is `0.048`.
weight_mutate_operator : float
Relative likelihood for mutation to swap an operator.
Default is `0.47`.
weight_randomize : float
Relative likelihood for mutation to completely delete and then
randomly generate the equation.
Default is `0.00023`.
weight_simplify : float
Relative likelihood for mutation to simplify constant parts by evaluation.
Default is `0.0020`.
crossover_probability : float
Absolute probability of crossover-type genetic operation, instead of a mutation.
Default is `0.066`.
skip_mutation_failures : bool
Whether to skip mutation and crossover failures, rather than
simply re-sampling the current member.
Default is `True`.
migration : bool
Whether to migrate. Default is `True`.
hof_migration : bool
Whether to have the hall of fame migrate. Default is `True`.
topn : int
How many top individuals migrate from each population.
Default is `12`.
should_optimize_constants : bool
Whether to numerically optimize constants (Nelder-Mead/Newton)
at the end of each iteration. Default is `True`.
optimizer_algorithm : str
Optimization scheme to use for optimizing constants. Can currently
be `NelderMead` or `BFGS`.
Default is `"BFGS"`.
optimizer_nrestarts : int
Number of times to restart the constants optimization process with
different initial conditions.
Default is `2`.
optimize_probability : float
Probability of optimizing the constants during a single iteration of
the evolutionary algorithm.
Default is `0.14`.
optimizer_iterations : int
Number of iterations that the constants optimizer can take.
Default is `8`.
perturbation_factor : float
Constants are perturbed by a max factor of
`(perturbation_factor*T + 1)`. Either multiplied by this or
divided by this.
Default is `0.076`.
tournament_selection_n : int
Number of expressions to consider in each tournament.
Default is `10`.
tournament_selection_p : float
Probability of selecting the best expression in each
tournament. The probability will decay as `p*(1-p)^n` for other
expressions, sorted by loss.
Default is `0.86`.
procs : int
Number of processes (=number of populations running).
Default is `cpu_count()`.
multithreading : bool
Use multithreading instead of distributed backend.
Using `procs=0` will turn off both. Default is `True`.
cluster_manager : str
For distributed computing, this sets the job queue system. Set
to one of "slurm", "pbs", "lsf", "sge", "qrsh", "scyld", or
"htc". If set to one of these, PySR will run in distributed
mode, and use `procs` to figure out how many processes to launch.
Default is `None`.
batching : bool
Whether to compare population members on small batches during
evolution. Still uses the full dataset for comparing against the hall
of fame. Default is `False`.
batch_size : int
The amount of data to use if doing batching. Default is `50`.
fast_cycle : bool
Batch over population subsamples. This is a slightly different
algorithm than regularized evolution, but runs cycles 15%
faster. May be algorithmically less efficient.
Default is `False`.
precision : int
What precision to use for the data: `32` (float32), `64` (float64),
or `16` (float16) bits of floating point precision.
Default is `32`.
random_state : int, NumPy RandomState instance or None
Pass an int for reproducible results across multiple function calls.
See :term:`Glossary <random_state>`.
Default is `None`.
deterministic : bool
Make a PySR search give the same result every run.
To use this, you must turn off parallelism
(with `procs=0`, `multithreading=False`),
and set `random_state` to a fixed seed.
Default is `False`.
warm_start : bool
Tells fit to continue from where the last call to fit finished.
If `False`, each call to fit will be fresh, overwriting previous results.
Default is `False`.
verbosity : int
What verbosity level to use. 0 means minimal print statements.
Default is `1e9`.
update_verbosity : int
What verbosity level to use for package updates.
Will take value of `verbosity` if not given.
Default is `None`.
progress : bool
Whether to use a progress bar instead of printing to stdout.
Default is `True`.
equation_file : str
Where to save the files (.csv extension).
Default is `None`.
temp_equation_file : bool
Whether to put the hall of fame file in the temp directory.
Deletion is then controlled with the `delete_tempfiles`
parameter.
Default is `False`.
tempdir : str
Directory for the temporary files. Default is `None`.
delete_tempfiles : bool
Whether to delete the temporary files after finishing.
Default is `True`.
julia_project : str
A Julia environment location containing a Project.toml
(and potentially the source code for SymbolicRegression.jl).
Default gives the Python package directory, where a
Project.toml file should be present from the install.
update : bool
Whether to automatically update Julia packages.
Default is `True`.
output_jax_format : bool
Whether to create a 'jax_format' column in the output,
containing jax-callable functions and the default parameters in
a jax array.
Default is `False`.
output_torch_format : bool
Whether to create a 'torch_format' column in the output,
containing a torch module with trainable parameters.
Default is `False`.
extra_sympy_mappings : dict[str, Callable]
Provides mappings from custom `binary_operators` or
`unary_operators` defined as Julia strings to those same
operators defined in sympy.
E.g., if `unary_operators=["inv(x)=1/x"]`, then for the fitted
model to be exported to sympy, `extra_sympy_mappings`
would be `{"inv": lambda x: 1/x}`.
Default is `None`.
extra_jax_mappings : dict[Callable, str]
Similar to `extra_sympy_mappings` but for model export
to jax. The dictionary maps sympy functions to jax functions.
For example: `extra_jax_mappings={sympy.sin: "jnp.sin"}` maps
the `sympy.sin` function to the equivalent jax expression `jnp.sin`.
Default is `None`.
extra_torch_mappings : dict[Callable, Callable]
The same as `extra_jax_mappings` but for model export
to pytorch. Note that the dictionary values should be callable
pytorch expressions.
For example: `extra_torch_mappings={sympy.sin: torch.sin}`.
Default is `None`.
denoise : bool
Whether to use a Gaussian Process to denoise the data before
inputting to PySR. Can help PySR fit noisy data.
Default is `False`.
select_k_features : int
Whether to run feature selection in Python using random forests,
before passing to the symbolic regression code. None means no
feature selection; an int means select that many features.
Default is `None`.
**kwargs : dict
Supports deprecated keyword arguments. Other arguments will
result in an error.
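The three `model_selection` criteria described at the top of this docstring can be sketched in plain Python over a list of `(complexity, loss)` candidates. This minimal sketch rests on two assumptions. First, the score is computed as the negated finite-difference slope of log-loss between consecutive complexities, with the first candidate scored 0. Second, `'best'` keeps only candidates whose loss is at most 1.5x the minimum. The helper names `scores` and `select` are hypothetical and are not PySR's API.

```python
import math


def scores(candidates: list[tuple[int, float]]) -> list[float]:
    """Negated slope of log(loss) with respect to complexity.

    `candidates` holds (complexity, loss) pairs sorted by increasing
    complexity; the first candidate has no predecessor and scores 0.
    """
    out = [0.0]
    for (c0, l0), (c1, l1) in zip(candidates, candidates[1:]):
        out.append(-(math.log(l1) - math.log(l0)) / (c1 - c0))
    return out


def select(candidates: list[tuple[int, float]], model_selection: str = "best") -> int:
    """Return the index of the candidate chosen by `model_selection`."""
    losses = [loss for _, loss in candidates]
    s = scores(candidates)
    indices = range(len(candidates))
    if model_selection == "accuracy":
        return min(indices, key=lambda i: losses[i])
    if model_selection == "score":
        return max(indices, key=lambda i: s[i])
    if model_selection == "best":
        threshold = 1.5 * min(losses)
        eligible = [i for i in indices if losses[i] <= threshold]
        return max(eligible, key=lambda i: s[i])
    raise ValueError(f"unknown model_selection: {model_selection!r}")


cands = [(1, 10.0), (3, 2.0), (5, 1.0), (7, 0.95)]
for criterion in ("accuracy", "score", "best"):
    print(criterion, select(cands, criterion))  # 3, 1, 2 respectively
```

With these toy candidates the three criteria disagree:
- `'accuracy'` picks the lowest loss.
- `'score'` picks the steepest drop in log-loss.
- `'best'` picks the steepest drop among candidates within 1.5x of the lowest loss.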
equations_ : pandas.DataFrame | list[pandas.DataFrame]
@@ -793,9 +850,10 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
selection_mask : list[bool]
If using `select_k_features`, you must pass `model.selection_mask_` here.
Not needed if loading from a pickle file.
nout : int
Number of outputs of the model.
Not needed if loading from a pickle file.
Default is `1`.
**pysr_kwargs : dict
Any other keyword arguments to initialize the PySRRegressor object.
These will overwrite those stored in the pickle file.
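The `selection_mask` entry above matters because, with `select_k_features`, expressions are fit on a reduced set of feature columns. New inputs must therefore be reduced in the same way before an equation can be evaluated. Below is a minimal sketch of applying such a boolean mask. It assumes a plain list-of-lists row layout (PySR itself works with NumPy arrays), and the helper `apply_selection_mask` is hypothetical.

```python
def apply_selection_mask(X: list[list[float]], mask: list[bool]) -> list[list[float]]:
    """Keep only the columns of X whose entry in `mask` is True."""
    if any(len(row) != len(mask) for row in X):
        raise ValueError("each row of X must have one entry per mask element")
    return [[value for value, keep in zip(row, mask) if keep] for row in X]


X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
mask = [True, False, True]  # illustrative stand-in for `model.selection_mask_`
print(apply_selection_mask(X, mask))  # [[1.0, 3.0], [4.0, 6.0]]
```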
@@ -999,7 +1057,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
index : int | list[int]
If you wish to select a particular equation from `self.equations_`,
give the row number here. This overrides the `model_selection`
parameter. If there are multiple output features, then pass
@@ -1171,9 +1229,9 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
y : ndarray | pandas.DataFrame
Target values of shape `(n_samples,)` or `(n_samples, n_targets)`.
Will be cast to `X`'s dtype if necessary.
Xresampled : ndarray | pandas.DataFrame
Resampled training data used for denoising,
of shape `(n_resampled, n_features)`.
weights : ndarray | pandas.DataFrame
Weight array of the same shape as `y`.
Each element is how to weight the mean-square-error loss
@@ -1252,15 +1310,15 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
y : ndarray | pandas.DataFrame
Target values of shape `(n_samples,)` or `(n_samples, n_targets)`.
Will be cast to `X`'s dtype if necessary.
Xresampled : ndarray | pandas.DataFrame
Resampled training data, of shape `(n_resampled, n_features)`,
used for denoising.
variable_names : list[str]
Names of each variable in the training dataset, `X`.
Of length `n_features`.
random_state : int | np.random.RandomState
Pass an int for reproducible results across multiple function calls.
See :term:`Glossary <random_state>`. Default is `None`.
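The `weights` entries in these docstrings describe each element of `weights` as the weight on the squared error of the matching element of `y`. Below is a minimal sketch of one common form of weighted mean-squared error. It assumes normalization by the sum of the weights. These docstrings do not say how SymbolicRegression.jl normalizes internally, and the function name is hypothetical.

```python
def weighted_mse(y_true: list[float], y_pred: list[float], weights: list[float]) -> float:
    """Sum of w_i * (y_pred_i - y_true_i)**2, divided by the sum of the weights."""
    if not (len(y_true) == len(y_pred) == len(weights)):
        raise ValueError("y_true, y_pred and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * (p - t) ** 2 for t, p, w in zip(y_true, y_pred, weights)) / total_weight


print(weighted_mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0], [1.0, 1.0, 2.0]))  # 2.0
```

Under this form, a weight of 0 drops a sample from the fit, and doubling a weight counts that sample twice as heavily.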
@@ -1578,17 +1636,17 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
y : ndarray | pandas.DataFrame
Target values of shape `(n_samples,)` or `(n_samples, n_targets)`.
Will be cast to `X`'s dtype if necessary.
Xresampled : ndarray | pandas.DataFrame
Resampled training data, of shape `(n_resampled, n_features)`,
on which to generate denoised data. This
will be used as the training data, rather than `X`.
weights : ndarray | pandas.DataFrame
Weight array of the same shape as `y`.
Each element is how to weight the mean-square-error loss
for that particular element of `y`. Alternatively,
if a custom `loss` was set, it can be used
in arbitrary ways.
variable_names : list[str]
A list of names for the variables, rather than "x0", "x1", etc.
If `X` is a pandas dataframe, the column names will be used
instead of `variable_names`. Cannot contain spaces or special
@@ -1695,8 +1753,9 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
checkpoint_file : str
Path to checkpoint hall of fame file to be loaded.
Defaults to the already-set `equation_file_`.
if checkpoint_file:
    self.equation_file_ = checkpoint_file
@@ -1716,7 +1775,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
X : ndarray | pandas.DataFrame
Training data of shape `(n_samples, n_features)`.
index : int | list[int]
If you want to compute the output of an expression using a
particular row of `self.equations_`, you may specify the index here.
For multiple output equations, you must pass a list of indices
@@ -1784,7 +1843,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
index : int | list[int]
If you wish to select a particular equation from
`self.equations_`, give the index number here. This overrides
the `model_selection` parameter. If there are multiple output
@@ -1808,15 +1867,16 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
index : int | list[int]
If you wish to select a particular equation from
`self.equations_`, give the index number here. This overrides
the `model_selection` parameter. If there are multiple output
features, then pass a list of indices in the same order
as the output features.
precision : int
The number of significant figures shown in the LaTeX
representation.
Default is `3`.
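The LaTeX `precision` argument counts significant figures (digits from the first nonzero digit), not decimal places. Python's `g` format specifier illustrates the idea below. This is only an analogy, and PySR's own LaTeX export may format numbers differently. The helper `round_sig` is hypothetical.

```python
def round_sig(x: float, precision: int = 3) -> str:
    """Format x with `precision` significant figures (3 mirrors the documented default)."""
    return f"{x:.{precision}g}"


for value in (3.14159265, 0.000123456, 2.5):
    print(value, "->", round_sig(value))  # 3.14, 0.000123, 2.5
```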
@@ -1843,7 +1903,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
index : int | list[int]
If you wish to select a particular equation from
`self.equations_`, give the index number here. This overrides
the `model_selection` parameter. If there are multiple output
@@ -1874,7 +1934,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
index : int | list[int]
If you wish to select a particular equation from
`self.equations_`, give the index number here. This overrides
the `model_selection` parameter. If there are multiple output
@@ -2094,16 +2154,18 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
indices : list[int] | list[list[int]]
If you wish to select a particular subset of equations from
`self.equations_`, give the row numbers here. By default,
all equations will be used. If there are multiple output
features, then pass a list of lists.
precision : int
The number of significant figures shown in the LaTeX
representation.
Default is `3`.
columns : list[str]
Which columns to include in the table.
Default is `["equation", "complexity", "loss", "score"]`.
2171 |