MilesCranmer committed
Commit 4584576
1 Parent(s): 147e8d5

Fix doc formatting

Files changed (1):
  1. pysr/sr.py +60 -60
pysr/sr.py CHANGED
@@ -131,118 +131,118 @@ def pysr(X, y, weights=None,
     `binary_operators`, `unary_operators` to your requirements.
 
     :param X: np.ndarray or pandas.DataFrame, 2D array. Rows are examples,
-        columns are features. If pandas DataFrame, the columns are used
-        for variable names (so make sure they don't contain spaces).
+        columns are features. If pandas DataFrame, the columns are used
+        for variable names (so make sure they don't contain spaces).
     :param y: np.ndarray, 1D array (rows are examples) or 2D array (rows
-        are examples, columns are outputs). Putting in a 2D array will
-        trigger a search for equations for each feature of y.
+        are examples, columns are outputs). Putting in a 2D array will
+        trigger a search for equations for each feature of y.
     :param weights: np.ndarray, same shape as y. Each element is how to
-        weight the mean-square-error loss for that particular element
-        of y.
+        weight the mean-square-error loss for that particular element
+        of y.
     :param binary_operators: list, List of strings giving the binary operators
-        in Julia's Base. Default is ["+", "-", "*", "/",].
+        in Julia's Base. Default is ["+", "-", "*", "/",].
     :param unary_operators: list, Same but for operators taking a single scalar.
-        Default is [].
+        Default is [].
     :param procs: int, Number of processes (=number of populations running).
     :param loss: str, String of Julia code specifying the loss function.
-        Can either be a loss from LossFunctions.jl, or your own
-        loss written as a function. Examples of custom written losses
-        include: `myloss(x, y) = abs(x-y)` for non-weighted, or
-        `myloss(x, y, w) = w*abs(x-y)` for weighted.
-        Among the included losses, these are as follows. Regression:
-        `LPDistLoss{P}()`, `L1DistLoss()`, `L2DistLoss()` (mean square),
-        `LogitDistLoss()`, `HuberLoss(d)`, `L1EpsilonInsLoss(ϵ)`,
-        `L2EpsilonInsLoss(ϵ)`, `PeriodicLoss(c)`, `QuantileLoss(τ)`.
-        Classification: `ZeroOneLoss()`, `PerceptronLoss()`, `L1HingeLoss()`,
-        `SmoothedL1HingeLoss(γ)`, `ModifiedHuberLoss()`, `L2MarginLoss()`,
-        `ExpLoss()`, `SigmoidLoss()`, `DWDMarginLoss(q)`.
+        Can either be a loss from LossFunctions.jl, or your own
+        loss written as a function. Examples of custom written losses
+        include: `myloss(x, y) = abs(x-y)` for non-weighted, or
+        `myloss(x, y, w) = w*abs(x-y)` for weighted.
+        Among the included losses, these are as follows. Regression:
+        `LPDistLoss{P}()`, `L1DistLoss()`, `L2DistLoss()` (mean square),
+        `LogitDistLoss()`, `HuberLoss(d)`, `L1EpsilonInsLoss(ϵ)`,
+        `L2EpsilonInsLoss(ϵ)`, `PeriodicLoss(c)`, `QuantileLoss(τ)`.
+        Classification: `ZeroOneLoss()`, `PerceptronLoss()`, `L1HingeLoss()`,
+        `SmoothedL1HingeLoss(γ)`, `ModifiedHuberLoss()`, `L2MarginLoss()`,
+        `ExpLoss()`, `SigmoidLoss()`, `DWDMarginLoss(q)`.
     :param populations: int, Number of populations running.
     :param niterations: int, Number of iterations of the algorithm to run. The best
-        equations are printed, and migrate between populations, at the
-        end of each.
+        equations are printed, and migrate between populations, at the
+        end of each.
     :param ncyclesperiteration: int, Number of total mutations to run, per 10
-        samples of the population, per iteration.
+        samples of the population, per iteration.
     :param alpha: float, Initial temperature.
     :param annealing: bool, Whether to use annealing. You should (and it is default).
     :param fractionReplaced: float, How much of population to replace with migrating
-        equations from other populations.
+        equations from other populations.
     :param fractionReplacedHof: float, How much of population to replace with migrating
-        equations from hall of fame.
+        equations from hall of fame.
     :param npop: int, Number of individuals in each population
     :param parsimony: float, Multiplicative factor for how much to punish complexity.
     :param migration: bool, Whether to migrate.
     :param hofMigration: bool, Whether to have the hall of fame migrate.
     :param shouldOptimizeConstants: bool, Whether to numerically optimize
-        constants (Nelder-Mead/Newton) at the end of each iteration.
+        constants (Nelder-Mead/Newton) at the end of each iteration.
     :param topn: int, How many top individuals migrate from each population.
     :param perturbationFactor: float, Constants are perturbed by a max
-        factor of (perturbationFactor*T + 1). Either multiplied by this
-        or divided by this.
+        factor of (perturbationFactor*T + 1). Either multiplied by this
+        or divided by this.
     :param weightAddNode: float, Relative likelihood for mutation to add a node
     :param weightInsertNode: float, Relative likelihood for mutation to insert a node
     :param weightDeleteNode: float, Relative likelihood for mutation to delete a node
     :param weightDoNothing: float, Relative likelihood for mutation to leave the individual
     :param weightMutateConstant: float, Relative likelihood for mutation to change
-        the constant slightly in a random direction.
+        the constant slightly in a random direction.
     :param weightMutateOperator: float, Relative likelihood for mutation to swap
-        an operator.
+        an operator.
     :param weightRandomize: float, Relative likelihood for mutation to completely
-        delete and then randomly generate the equation
+        delete and then randomly generate the equation
     :param weightSimplify: float, Relative likelihood for mutation to simplify
-        constant parts by evaluation
+        constant parts by evaluation
     :param timeout: float, Time in seconds to timeout search
     :param equation_file: str, Where to save the files (.csv separated by |)
     :param verbosity: int, What verbosity level to use. 0 means minimal print statements.
     :param progress: bool, Whether to use a progress bar instead of printing to stdout.
     :param maxsize: int, Max size of an equation.
     :param maxdepth: int, Max depth of an equation. You can use both maxsize and maxdepth.
-        maxdepth is by default set to = maxsize, which means that it is redundant.
+        maxdepth is by default set to = maxsize, which means that it is redundant.
     :param fast_cycle: bool, (experimental) - batch over population subsamples. This
-        is a slightly different algorithm than regularized evolution, but does cycles
-        15% faster. May be algorithmically less efficient.
+        is a slightly different algorithm than regularized evolution, but does cycles
+        15% faster. May be algorithmically less efficient.
     :param variable_names: list, a list of names for the variables, other
-        than "x0", "x1", etc.
+        than "x0", "x1", etc.
     :param batching: bool, whether to compare population members on small batches
-        during evolution. Still uses full dataset for comparing against
-        hall of fame.
+        during evolution. Still uses full dataset for comparing against
+        hall of fame.
     :param batchSize: int, the amount of data to use if doing batching.
     :param select_k_features: (None, int), whether to run feature selection in
-        Python using random forests, before passing to the symbolic regression
-        code. None means no feature selection; an int means select that many
-        features.
+        Python using random forests, before passing to the symbolic regression
+        code. None means no feature selection; an int means select that many
+        features.
     :param warmupMaxsizeBy: float, whether to slowly increase max size from
-        a small number up to the maxsize (if greater than 0).
-        If greater than 0, says the fraction of training time at which
-        the current maxsize will reach the user-passed maxsize.
+        a small number up to the maxsize (if greater than 0).
+        If greater than 0, says the fraction of training time at which
+        the current maxsize will reach the user-passed maxsize.
     :param constraints: dict of int (unary) or 2-tuples (binary),
-        this enforces maxsize constraints on the individual
-        arguments of operators. E.g., `'pow': (-1, 1)`
-        says that power laws can have any complexity left argument, but only
-        1 complexity exponent. Use this to force more interpretable solutions.
+        this enforces maxsize constraints on the individual
+        arguments of operators. E.g., `'pow': (-1, 1)`
+        says that power laws can have any complexity left argument, but only
+        1 complexity exponent. Use this to force more interpretable solutions.
     :param useFrequency: bool, whether to measure the frequency of complexities,
-        and use that instead of parsimony to explore equation space. Will
-        naturally find equations of all complexities.
+        and use that instead of parsimony to explore equation space. Will
+        naturally find equations of all complexities.
     :param julia_optimization: int, Optimization level (0, 1, 2, 3)
     :param tempdir: str or None, directory for the temporary files
     :param delete_tempfiles: bool, whether to delete the temporary files after finishing
     :param julia_project: str or None, a Julia environment location containing
-        a Project.toml (and potentially the source code for SymbolicRegression.jl).
-        Default gives the Python package directory, where a Project.toml file
-        should be present from the install.
+        a Project.toml (and potentially the source code for SymbolicRegression.jl).
+        Default gives the Python package directory, where a Project.toml file
+        should be present from the install.
     :param user_input: Whether to ask for user input or not for installing (to
-        be used for automated scripts). Will choose to install when asked.
+        be used for automated scripts). Will choose to install when asked.
     :param update: Whether to automatically update Julia packages.
     :param temp_equation_file: Whether to put the hall of fame file in
-        the temp directory. Deletion is then controlled with the
-        delete_tempfiles argument.
+        the temp directory. Deletion is then controlled with the
+        delete_tempfiles argument.
     :param output_jax_format: Whether to create a 'jax_format' column in the output,
-        containing jax-callable functions and the default parameters in a jax array.
+        containing jax-callable functions and the default parameters in a jax array.
     :param output_torch_format: Whether to create a 'torch_format' column in the output,
-        containing a torch module with trainable parameters.
+        containing a torch module with trainable parameters.
     :returns: pd.DataFrame or list, Results dataframe,
-        giving complexity, MSE, and equations (as strings), as well as functional
-        forms. If list, each element corresponds to a dataframe of equations
-        for each output.
+        giving complexity, MSE, and equations (as strings), as well as functional
+        forms. If list, each element corresponds to a dataframe of equations
+        for each output.
     """
     if binary_operators is None:
         binary_operators = '+ * - /'.split(' ')
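The custom loss strings documented in the `loss` parameter above are Julia code that runs inside SymbolicRegression.jl, but their element-wise semantics can be sketched in plain Python. This is a hypothetical illustration of what `myloss(x, y, w) = w*abs(x-y)` computes; `weighted_abs_loss` is not part of pysr itself:

```python
def weighted_abs_loss(prediction, target, weight):
    """Python sketch of the docstring's weighted Julia loss
    `myloss(x, y, w) = w*abs(x-y)`, applied element-wise."""
    return [w * abs(x - y) for x, y, w in zip(prediction, target, weight)]

# Each weight scales the absolute error of the matching element of y,
# mirroring how the `weights` array modulates the fitted loss.
per_element = weighted_abs_loss([1.0, 2.0, 3.0], [1.5, 2.0, 2.0], [2.0, 1.0, 0.5])
```

The search then minimizes the aggregate of these per-element values, so a weight of 0 effectively ignores that data point.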
 
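To make the `constraints` format described in the diff concrete, here is a hypothetical dict in the documented shape. The `'pow'` entry comes from the docstring's own example; the comment semantics (left/right argument order, -1 meaning unconstrained) follow the docstring, not the commit itself:

```python
# Hypothetical constraints dict in the documented format: a binary operator
# maps to a 2-tuple (max complexity of left argument, max complexity of right
# argument), with -1 meaning unconstrained; a unary operator would map to a
# single int.
constraints = {
    "pow": (-1, 1),  # any left argument, but the exponent stays at complexity 1
}
```

A dict like this would be passed as `pysr(..., constraints=constraints)` to force more interpretable power laws.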