MilesCranmer committed
Commit 408a63c · 1 Parent(s): cdd291e

Clean up main docstrings

Files changed (1):
  1. pysr/sr.py +185 -123
pysr/sr.py CHANGED
@@ -230,57 +230,65 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
 
     Parameters
     ----------
-    model_selection : str, default="best"
+    model_selection : str
         Model selection criterion when selecting a final expression from
         the list of best expressions at each complexity.
-        Can be 'accuracy', 'best', or 'score'.
-
-        - `"accuracy"` selects the candidate model with the lowest loss
-          (highest accuracy).
-        - `"score"` selects the candidate model with the highest score.
-          Score is defined as the negated derivative of the log-loss with
-          respect to complexity - if an expression has a much better
-          loss at a slightly higher complexity, it is preferred.
-        - `"best"` selects the candidate model with the highest score
-          among expressions with a loss better than at least 1.5x the
-          most accurate model.
-    binary_operators : list[str], default=["+", "-", "*", "/"]
+        Can be `'accuracy'`, `'best'`, or `'score'`. Default is `'best'`.
+        `'accuracy'` selects the candidate model with the lowest loss
+        (highest accuracy).
+        `'score'` selects the candidate model with the highest score.
+        Score is defined as the negated derivative of the log-loss with
+        respect to complexity - if an expression has a much better
+        loss at a slightly higher complexity, it is preferred.
+        `'best'` selects the candidate model with the highest score
+        among expressions with a loss better than at least 1.5x the
+        most accurate model.
+    binary_operators : list[str]
         List of strings for binary operators used in the search.
         See the [operators page](https://astroautomata.com/PySR/operators/)
         for more details.
-    unary_operators : list[str], default=None
+        Default is `["+", "-", "*", "/"]`.
+    unary_operators : list[str]
         Operators which only take a single scalar as input.
         For example, `"cos"` or `"exp"`.
-    niterations : int, default=40
+        Default is `None`.
+    niterations : int
         Number of iterations of the algorithm to run. The best
         equations are printed and migrate between populations at the
         end of each iteration.
-    populations : int, default=15
+        Default is `40`.
+    populations : int
         Number of populations running.
-    population_size : int, default=33
+        Default is `15`.
+    population_size : int
         Number of individuals in each population.
-    max_evals : int, default=None
+        Default is `33`.
+    max_evals : int
         Limits the total number of evaluations of expressions to
-        this number.
-    maxsize : int, default=20
-        Max complexity of an equation.
-    maxdepth : int, default=None
+        this number. Default is `None`.
+    maxsize : int
+        Max complexity of an equation. Default is `20`.
+    maxdepth : int
         Max depth of an equation. You can use both `maxsize` and
         `maxdepth`. `maxdepth` is by default not used.
-    warmup_maxsize_by : float, default=0.0
+        Default is `None`.
+    warmup_maxsize_by : float
         Whether to slowly increase max size from a small number up to
         the maxsize (if greater than 0). If greater than 0, says the
         fraction of training time at which the current maxsize will
         reach the user-passed maxsize.
-    timeout_in_seconds : float, default=None
+        Default is `0.0`.
+    timeout_in_seconds : float
         Make the search return early once this many seconds have passed.
-    constraints : dict[str, int | tuple[int,int]], default=None
+        Default is `None`.
+    constraints : dict[str, int | tuple[int,int]]
         Dictionary of int (unary) or 2-tuples (binary), this enforces
         maxsize constraints on the individual arguments of operators.
         E.g., `'pow': (-1, 1)` says that power laws can have any
         complexity left argument, but only 1 complexity in the right
         argument. Use this to force more interpretable solutions.
-    nested_constraints : dict[str, dict], default=None
+        Default is `None`.
+    nested_constraints : dict[str, dict]
         Specifies how many times a combination of operators can be
         nested. For example, `{"sin": {"cos": 0}, "cos": {"cos": 2}}`
         specifies that `cos` may never appear within a `sin`, but `sin`
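To make the options documented in this hunk concrete, here is a minimal sketch (not part of the commit) of a search configured with them; the toy data, operator choices, and values are purely illustrative:

```python
import numpy as np
from pysr import PySRRegressor

# Toy data: y = 2*cos(x0) + x1^2 (illustrative only).
rng = np.random.RandomState(0)
X = rng.randn(100, 2)
y = 2 * np.cos(X[:, 0]) + X[:, 1] ** 2

model = PySRRegressor(
    model_selection="best",                  # best score among low-loss candidates
    binary_operators=["+", "-", "*", "/"],
    unary_operators=["cos", "exp"],
    niterations=40,
    maxsize=20,                              # cap on equation complexity
    nested_constraints={"cos": {"cos": 0}},  # forbid cos nested inside cos
)
model.fit(X, y)
print(model)  # shows the hall of fame and the selected equation
```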
@@ -296,7 +304,8 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
         operators, you only need to provide a single number: both
         arguments are treated the same way, and the max of each
         argument is constrained.
-    loss : str, default="L2DistLoss()"
+        Default is `None`.
+    loss : str
         String of Julia code specifying the loss function. Can either
         be a loss from LossFunctions.jl, or your own loss written as a
         function. Examples of custom written losses include:
@@ -311,7 +320,8 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
         `L1HingeLoss()`, `SmoothedL1HingeLoss(γ)`,
         `ModifiedHuberLoss()`, `L2MarginLoss()`, `ExpLoss()`,
         `SigmoidLoss()`, `DWDMarginLoss(q)`.
-    complexity_of_operators : dict[str, float], default=None
+        Default is `"L2DistLoss()"`.
+    complexity_of_operators : dict[str, float]
         If you would like to use a complexity other than 1 for an
         operator, specify the complexity here. For example,
         `{"sin": 2, "+": 1}` would give a complexity of 2 for each use
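Since `loss` is a string of Julia code, a custom loss can be written inline. A small sketch (not from the commit; `myloss` is a hypothetical name, and the three-argument form assumes `weights` are passed to `fit`):

```python
from pysr import PySRRegressor

model = PySRRegressor(
    # Hypothetical custom Julia loss: weighted absolute error.
    loss="myloss(x, y, w) = w * abs(x - y)",
    # Count each use of `sin` as complexity 2 (example from the docstring).
    complexity_of_operators={"sin": 2, "+": 1},
)
```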
@@ -319,184 +329,231 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
         the `+` operator (which is the default). You may specify real
         numbers for a complexity, and the total complexity of a tree
         will be rounded to the nearest integer after computing.
-    complexity_of_constants : float, default=1
-        Complexity of constants.
-    complexity_of_variables : float, default=1
-        Complexity of variables.
-    parsimony : float, default=0.0032
+        Default is `None`.
+    complexity_of_constants : float
+        Complexity of constants. Default is `1`.
+    complexity_of_variables : float
+        Complexity of variables. Default is `1`.
+    parsimony : float
         Multiplicative factor for how much to punish complexity.
-    use_frequency : bool, default=True
+        Default is `0.0032`.
+    use_frequency : bool
         Whether to measure the frequency of complexities, and use that
         instead of parsimony to explore equation space. Will naturally
         find equations of all complexities.
-    use_frequency_in_tournament : bool, default=True
+        Default is `True`.
+    use_frequency_in_tournament : bool
         Whether to use the frequency mentioned above in the tournament,
         rather than just the simulated annealing.
-    alpha : float, default=0.1
+        Default is `True`.
+    alpha : float
         Initial temperature for simulated annealing
         (requires `annealing` to be `True`).
-    annealing : bool, default=False
-        Whether to use annealing.
-    early_stop_condition : float | str, default=None
+        Default is `0.1`.
+    annealing : bool
+        Whether to use annealing. Default is `False`.
+    early_stop_condition : float | str
         Stop the search early if this loss is reached. You may also
         pass a string containing a Julia function which
         takes a loss and complexity as input, for example:
         `"f(loss, complexity) = (loss < 0.1) && (complexity < 10)"`.
-    ncyclesperiteration : int, default=550
+        Default is `None`.
+    ncyclesperiteration : int
         Number of total mutations to run, per 10 samples of the
         population, per iteration.
-    fraction_replaced : float, default=0.000364
+        Default is `550`.
+    fraction_replaced : float
         How much of population to replace with migrating equations from
         other populations.
-    fraction_replaced_hof : float, default=0.035
+        Default is `0.000364`.
+    fraction_replaced_hof : float
         How much of population to replace with migrating equations from
-        hall of fame.
-    weight_add_node : float, default=0.79
+        hall of fame. Default is `0.035`.
+    weight_add_node : float
         Relative likelihood for mutation to add a node.
-    weight_insert_node : float, default=5.1
+        Default is `0.79`.
+    weight_insert_node : float
         Relative likelihood for mutation to insert a node.
-    weight_delete_node : float, default=1.7
+        Default is `5.1`.
+    weight_delete_node : float
         Relative likelihood for mutation to delete a node.
-    weight_do_nothing : float, default=0.21
+        Default is `1.7`.
+    weight_do_nothing : float
         Relative likelihood for mutation to leave the individual.
-    weight_mutate_constant : float, default=0.048
+        Default is `0.21`.
+    weight_mutate_constant : float
         Relative likelihood for mutation to change the constant slightly
         in a random direction.
-    weight_mutate_operator : float, default=0.47
+        Default is `0.048`.
+    weight_mutate_operator : float
         Relative likelihood for mutation to swap an operator.
-    weight_randomize : float, default=0.00023
+        Default is `0.47`.
+    weight_randomize : float
         Relative likelihood for mutation to completely delete and then
         randomly generate the equation.
-    weight_simplify : float, default=0.0020
+        Default is `0.00023`.
+    weight_simplify : float
         Relative likelihood for mutation to simplify constant parts by evaluation.
-    crossover_probability : float, default=0.066
+        Default is `0.0020`.
+    crossover_probability : float
         Absolute probability of crossover-type genetic operation, instead of a mutation.
-    skip_mutation_failures : bool, default=True
+        Default is `0.066`.
+    skip_mutation_failures : bool
         Whether to skip mutation and crossover failures, rather than
         simply re-sampling the current member.
-    migration : bool, default=True
-        Whether to migrate.
-    hof_migration : bool, default=True
-        Whether to have the hall of fame migrate.
-    topn : int, default=12
+        Default is `True`.
+    migration : bool
+        Whether to migrate. Default is `True`.
+    hof_migration : bool
+        Whether to have the hall of fame migrate. Default is `True`.
+    topn : int
         How many top individuals migrate from each population.
-    should_optimize_constants : bool, default=True
+        Default is `12`.
+    should_optimize_constants : bool
         Whether to numerically optimize constants (Nelder-Mead/Newton)
-        at the end of each iteration.
-    optimizer_algorithm : str, default="BFGS"
+        at the end of each iteration. Default is `True`.
+    optimizer_algorithm : str
         Optimization scheme to use for optimizing constants. Can currently
         be `NelderMead` or `BFGS`.
-    optimizer_nrestarts : int, default=2
+        Default is `"BFGS"`.
+    optimizer_nrestarts : int
         Number of times to restart the constants optimization process with
         different initial conditions.
-    optimize_probability : float, default=0.14
+        Default is `2`.
+    optimize_probability : float
         Probability of optimizing the constants during a single iteration of
         the evolutionary algorithm.
-    optimizer_iterations : int, default=8
+        Default is `0.14`.
+    optimizer_iterations : int
         Number of iterations that the constants optimizer can take.
-    perturbation_factor : float, default=0.076
+        Default is `8`.
+    perturbation_factor : float
         Constants are perturbed by a max factor of
         (perturbation_factor*T + 1). Either multiplied by this or
         divided by this.
-    tournament_selection_n : int, default=10
+        Default is `0.076`.
+    tournament_selection_n : int
         Number of expressions to consider in each tournament.
-    tournament_selection_p : float, default=0.86
+        Default is `10`.
+    tournament_selection_p : float
         Probability of selecting the best expression in each
         tournament. The probability will decay as p*(1-p)^n for other
         expressions, sorted by loss.
-    procs : int, default=multiprocessing.cpu_count()
+        Default is `0.86`.
+    procs : int
         Number of processes (=number of populations running).
-    multithreading : bool, default=True
+        Default is `cpu_count()`.
+    multithreading : bool
         Use multithreading instead of distributed backend.
-        Using procs=0 will turn off both.
-    cluster_manager : str, default=None
+        Using procs=0 will turn off both. Default is `True`.
+    cluster_manager : str
         For distributed computing, this sets the job queue system. Set
         to one of "slurm", "pbs", "lsf", "sge", "qrsh", "scyld", or
         "htc". If set to one of these, PySR will run in distributed
         mode, and use `procs` to figure out how many processes to launch.
-    batching : bool, default=False
+        Default is `None`.
+    batching : bool
         Whether to compare population members on small batches during
         evolution. Still uses full dataset for comparing against hall
-        of fame.
-    batch_size : int, default=50
-        The amount of data to use if doing batching.
-    fast_cycle : bool, default=False (experimental)
+        of fame. Default is `False`.
+    batch_size : int
+        The amount of data to use if doing batching. Default is `50`.
+    fast_cycle : bool
         Batch over population subsamples. This is a slightly different
         algorithm than regularized evolution, but does cycles 15%
         faster. May be algorithmically less efficient.
-    precision : int, default=32
-        What precision to use for the data. By default this is 32
-        (float32), but you can select 64 or 16 as well.
-    random_state : int, Numpy RandomState instance or None, default=None
+        Default is `False`.
+    precision : int
+        What precision to use for the data. By default this is `32`
+        (float32), but you can select `64` or `16` as well, giving
+        you 64 or 16 bits of floating point precision, respectively.
+        Default is `32`.
+    random_state : int, Numpy RandomState instance or None
         Pass an int for reproducible results across multiple function calls.
         See :term:`Glossary <random_state>`.
-    deterministic : bool, default=False
+        Default is `None`.
+    deterministic : bool
         Make a PySR search give the same result every run.
         To use this, you must turn off parallelism
         (with `procs`=0, `multithreading`=False),
         and set `random_state` to a fixed seed.
-    warm_start : bool, default=False
+        Default is `False`.
+    warm_start : bool
         Tells fit to continue from where the last call to fit finished.
         If false, each call to fit will be fresh, overwriting previous results.
-    verbosity : int, default=1e9
+        Default is `False`.
+    verbosity : int
         What verbosity level to use. 0 means minimal print statements.
-    update_verbosity : int, default=None
+        Default is `1e9`.
+    update_verbosity : int
         What verbosity level to use for package updates.
         Will take value of `verbosity` if not given.
-    progress : bool, default=True
+        Default is `None`.
+    progress : bool
         Whether to use a progress bar instead of printing to stdout.
-    equation_file : str, default=None
+        Default is `True`.
+    equation_file : str
         Where to save the files (.csv extension).
-    temp_equation_file : bool, default=False
+        Default is `None`.
+    temp_equation_file : bool
         Whether to put the hall of fame file in the temp directory.
         Deletion is then controlled with the `delete_tempfiles`
         parameter.
-    tempdir : str, default=None
-        directory for the temporary files.
-    delete_tempfiles : bool, default=True
+        Default is `False`.
+    tempdir : str
+        Directory for the temporary files. Default is `None`.
+    delete_tempfiles : bool
         Whether to delete the temporary files after finishing.
-    julia_project : str, default=None
+        Default is `True`.
+    julia_project : str
         A Julia environment location containing a Project.toml
         (and potentially the source code for SymbolicRegression.jl).
         Default gives the Python package directory, where a
         Project.toml file should be present from the install.
-    update: bool, default=True
+    update : bool
         Whether to automatically update Julia packages.
-    output_jax_format : bool, default=False
+        Default is `True`.
+    output_jax_format : bool
         Whether to create a 'jax_format' column in the output,
         containing jax-callable functions and the default parameters in
         a jax array.
-    output_torch_format : bool, default=False
+        Default is `False`.
+    output_torch_format : bool
         Whether to create a 'torch_format' column in the output,
         containing a torch module with trainable parameters.
-    extra_sympy_mappings : dict[str, Callable], default=None
+        Default is `False`.
+    extra_sympy_mappings : dict[str, Callable]
         Provides mappings between custom `binary_operators` or
         `unary_operators` defined in julia strings, to those same
         operators defined in sympy.
         E.g., if `unary_operators=["inv(x)=1/x"]`, then for the fitted
         model to be exported to sympy, `extra_sympy_mappings`
         would be `{"inv": lambda x: 1/x}`.
-    extra_jax_mappings : dict[Callable, str], default=None
+        Default is `None`.
+    extra_jax_mappings : dict[Callable, str]
         Similar to `extra_sympy_mappings` but for model export
         to jax. The dictionary maps sympy functions to jax functions.
         For example: `extra_jax_mappings={sympy.sin: "jnp.sin"}` maps
         the `sympy.sin` function to the equivalent jax expression `jnp.sin`.
-    extra_torch_mappings : dict[Callable, Callable], default=None
+        Default is `None`.
+    extra_torch_mappings : dict[Callable, Callable]
         The same as `extra_jax_mappings` but for model export
         to pytorch. Note that the dictionary keys should be callable
         pytorch expressions.
-        For example: `extra_torch_mappings={sympy.sin: torch.sin}`
-    denoise : bool, default=False
+        For example: `extra_torch_mappings={sympy.sin: torch.sin}`.
+        Default is `None`.
+    denoise : bool
         Whether to use a Gaussian Process to denoise the data before
         inputting to PySR. Can help PySR fit noisy data.
-    select_k_features : int, default=None
+        Default is `False`.
+    select_k_features : int
         Whether to run feature selection in Python using random forests,
         before passing to the symbolic regression code. None means no
         feature selection; an int means select that many features.
-    **kwargs : dict, default=None
+        Default is `None`.
+    **kwargs : dict
         Supports deprecated keyword arguments. Other arguments will
         result in an error.
-
 
     Attributes
     ----------
     equations_ : pandas.DataFrame | list[pandas.DataFrame]
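Several of the parameters above interact at export time. A sketch (not part of the commit) that combines the docstring's `inv(x)=1/x` example with its sympy mapping, plus denoising and feature selection; data and settings are illustrative:

```python
import numpy as np
from pysr import PySRRegressor

rng = np.random.RandomState(0)
X = rng.randn(100, 3)
y = 1.0 / (1.0 + X[:, 0] ** 2) + 0.05 * rng.randn(100)

model = PySRRegressor(
    unary_operators=["inv(x)=1/x"],                 # custom operator defined in Julia
    extra_sympy_mappings={"inv": lambda x: 1 / x},  # how sympy reads it back
    denoise=True,                                   # GP denoising of the targets
    select_k_features=2,                            # random-forest feature selection
)
model.fit(X, y)
print(model.sympy())  # sympy export uses the mapping above
```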
@@ -793,9 +850,10 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
         selection_mask : list[bool]
             If using select_k_features, you must pass `model.selection_mask_` here.
             Not needed if loading from a pickle file.
-        nout : int, default=1
+        nout : int
             Number of outputs of the model.
             Not needed if loading from a pickle file.
+            Default is `1`.
         **pysr_kwargs : dict
             Any other keyword arguments to initialize the PySRRegressor object.
             These will overwrite those stored in the pickle file.
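These parameters describe reloading a saved model. A sketch, assuming this docstring belongs to the `PySRRegressor.from_file` classmethod (the file path here is hypothetical):

```python
from pysr import PySRRegressor

# Reload a fitted model from its saved equation file; `nout` and the
# other arguments are only needed when no pickle file is available.
model = PySRRegressor.from_file("hall_of_fame.csv", nout=1)
```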
@@ -999,7 +1057,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
 
         Parameters
         ----------
-        index : int | list[int], default=None
+        index : int | list[int]
             If you wish to select a particular equation from `self.equations_`,
             give the row number here. This overrides the `model_selection`
             parameter. If there are multiple output features, then pass
@@ -1171,9 +1229,9 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
         y : ndarray | pandas.DataFrame
             Target values of shape `(n_samples,)` or `(n_samples, n_targets)`.
             Will be cast to `X`'s dtype if necessary.
-        Xresampled : ndarray | pandas.DataFrame of shape
-            (n_resampled, n_features), default=None
-            Resampled training data used for denoising.
+        Xresampled : ndarray | pandas.DataFrame
+            Resampled training data used for denoising,
+            of shape `(n_resampled, n_features)`.
         weights : ndarray | pandas.DataFrame
             Weight array of the same shape as `y`.
             Each element is how to weight the mean-square-error loss
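A sketch of the weighted fit described here (toy data; weight values are illustrative):

```python
import numpy as np
from pysr import PySRRegressor

rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y = X[:, 0] ** 2 - X[:, 1]

# Same shape as `y`: up-weight the second half of the samples.
weights = np.where(np.arange(len(y)) < 100, 1.0, 2.0)

model = PySRRegressor(niterations=5)
model.fit(X, y, weights=weights)
```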
@@ -1252,15 +1310,15 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
         y : ndarray | pandas.DataFrame
             Target values of shape (n_samples,) or (n_samples, n_targets).
             Will be cast to X's dtype if necessary.
-        Xresampled : ndarray | pandas.DataFrame, default=None
+        Xresampled : ndarray | pandas.DataFrame
             Resampled training data, of shape `(n_resampled, n_features)`,
             used for denoising.
         variable_names : list[str]
             Names of each variable in the training dataset, `X`.
             Of length `n_features`.
-        random_state : int, Numpy RandomState instance or None, default=None
+        random_state : int | np.RandomState
             Pass an int for reproducible results across multiple function calls.
-            See :term:`Glossary <random_state>`.
+            See :term:`Glossary <random_state>`. Default is `None`.
 
         Returns
         -------
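A sketch of how `Xresampled` and `random_state` fit together in a denoising run (toy data; per the class docstring, `deterministic=True` also requires parallelism to be off):

```python
import numpy as np
from pysr import PySRRegressor

rng = np.random.RandomState(0)
X = rng.randn(300, 2)
y = np.cos(X[:, 0]) + 0.1 * rng.randn(300)  # noisy targets

# Points at which the Gaussian process's denoised targets are resampled.
Xresampled = rng.randn(500, 2)

model = PySRRegressor(
    denoise=True,
    random_state=0,
    deterministic=True,
    procs=0,
    multithreading=False,
)
model.fit(X, y, Xresampled=Xresampled)
```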
@@ -1578,17 +1636,17 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
         y : ndarray | pandas.DataFrame
             Target values of shape (n_samples,) or (n_samples, n_targets).
             Will be cast to X's dtype if necessary.
-        Xresampled : ndarray | pandas.DataFrame, default=None
+        Xresampled : ndarray | pandas.DataFrame
             Resampled training data, of shape (n_resampled, n_features),
             to generate denoised data on. This
             will be used as the training data, rather than `X`.
-        weights : ndarray | pandas.DataFrame, default=None
+        weights : ndarray | pandas.DataFrame
             Weight array of the same shape as `y`.
             Each element is how to weight the mean-square-error loss
             for that particular element of `y`. Alternatively,
             if a custom `loss` was set, it can be used
             in arbitrary ways.
-        variable_names : list[str], default=None
+        variable_names : list[str]
             A list of names for the variables, rather than "x0", "x1", etc.
             If `X` is a pandas dataframe, the column names will be used
             instead of `variable_names`. Cannot contain spaces or special
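A short sketch of the `variable_names` behavior (names are illustrative):

```python
import numpy as np
from pysr import PySRRegressor

X = np.random.randn(100, 2)
y = 2.5 * X[:, 0] - X[:, 1]

# Name the features instead of the default "x0", "x1"; a pandas
# DataFrame's column names would take precedence over this list.
model = PySRRegressor(niterations=5)
model.fit(X, y, variable_names=["temperature", "pressure"])
```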
@@ -1695,8 +1753,9 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
 
         Parameters
         ----------
-        checkpoint_file : str, default=None
+        checkpoint_file : str
             Path to checkpoint hall of fame file to be loaded.
+            The default will use the set `equation_file_`.
         """
         if checkpoint_file:
             self.equation_file_ = checkpoint_file
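Given the body shown (`self.equation_file_ = checkpoint_file`), this appears to be the `refresh` method; a hedged sketch with a hypothetical path and a fitted `model`:

```python
# Re-read the hall of fame from a previously saved checkpoint file.
model.refresh(checkpoint_file="hall_of_fame_2022.csv")
```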
@@ -1716,7 +1775,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
         X : ndarray | pandas.DataFrame
             Training data of shape `(n_samples, n_features)`.
 
-        index : int | list[int], default=None
+        index : int | list[int]
             If you want to compute the output of an expression using a
             particular row of `self.equations_`, you may specify the index here.
             For multiple output equations, you must pass a list of indices
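A sketch of predicting with a manually chosen equation (assumes a fitted `model` and matching `X`, as in the earlier sketches):

```python
# Use the equation in row 2 of `model.equations_` instead of the one
# chosen automatically by `model_selection`.
y_pred = model.predict(X, index=2)
```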
@@ -1784,7 +1843,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
 
         Parameters
         ----------
-        index : int | list[int], default=None
+        index : int | list[int]
             If you wish to select a particular equation from
             `self.equations_`, give the index number here. This overrides
             the `model_selection` parameter. If there are multiple output
@@ -1808,15 +1867,16 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
 
         Parameters
         ----------
-        index : int | list[int], default=None
+        index : int | list[int]
             If you wish to select a particular equation from
             `self.equations_`, give the index number here. This overrides
             the `model_selection` parameter. If there are multiple output
             features, then pass a list of indices with the order the same
             as the output feature.
-        precision : int, default=3
+        precision : int
             The number of significant figures shown in the LaTeX
             representation.
+            Default is `3`.
 
         Returns
         -------
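A sketch of the LaTeX export with a non-default precision (assumes a fitted `model`):

```python
# LaTeX string for the selected equation, with 5 significant figures.
print(model.latex(precision=5))
```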
@@ -1843,7 +1903,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
 
         Parameters
         ----------
-        index : int | list[int], default=None
+        index : int | list[int]
             If you wish to select a particular equation from
             `self.equations_`, give the index number here. This overrides
             the `model_selection` parameter. If there are multiple output
@@ -1874,7 +1934,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
 
         Parameters
         ----------
-        index : int | list[int], default=None
+        index : int | list[int]
             If you wish to select a particular equation from
             `self.equations_`, give the index number here. This overrides
             the `model_selection` parameter. If there are multiple output
@@ -2094,16 +2154,18 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
 
         Parameters
         ----------
-        indices : list[int] | list[list[int]], default=None
+        indices : list[int] | list[list[int]]
             If you wish to select a particular subset of equations from
             `self.equations_`, give the row numbers here. By default,
             all equations will be used. If there are multiple output
             features, then pass a list of lists.
-        precision : int, default=3
+        precision : int
             The number of significant figures shown in the LaTeX
             representations.
-        columns : list[str], default=["equation", "complexity", "loss", "score"]
+            Default is `3`.
+        columns : list[str]
             Which columns to include in the table.
+            Default is `["equation", "complexity", "loss", "score"]`.
 
         Returns
         -------
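These parameters suggest a LaTeX table export; a sketch, assuming the method is `latex_table` and a fitted single-output `model` (indices and columns are illustrative):

```python
# Render a subset of the discovered equations as a LaTeX table.
print(model.latex_table(
    indices=[1, 3, 5],
    precision=3,
    columns=["equation", "complexity", "loss"],
))
```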
 