MilesCranmer committed
Commit d90120f
1 Parent(s): a1e153e

Fix formatting warnings

docs/api-advanced.md CHANGED
@@ -1,6 +1,7 @@
 # Internal Reference
 
 ## Julia Interface
+
 ::: pysr.julia_helpers
     options:
         members:
@@ -34,4 +35,4 @@
     options:
        members:
           - sympy2torch
-       heading_level: 3
+       heading_level: 3
docs/api.md CHANGED
@@ -13,4 +13,4 @@
           - latex_table
           - refresh
       show_root_members_full_path: true
-      heading_level: 2
+      heading_level: 2
docs/examples.md CHANGED
@@ -1,16 +1,15 @@
 # Toy Examples with Code
 
-### Preamble
+## Preamble
 
 ```python
 import numpy as np
 from pysr import *
 ```
 
-
 ## 1. Simple search
 
-Here's a simple example where we
+Here's a simple example where we
 find the expression `2 cos(x3) + x0^2 - 2`.
 
 ```python
@@ -40,6 +39,7 @@ print(model)
 
 Here, we do the same thing, but with multiple expressions at once,
 each requiring a different feature.
+
 ```python
 X = 2 * np.random.randn(100, 5)
 y = 1 / X[:, [0, 1, 2]]
@@ -60,22 +60,26 @@ function:
 model.set_params(extra_sympy_mappings={"inv": lambda x: 1/x})
 model.sympy()
 ```
+
 If you look at the lists of expressions before and after, you will
 see that the sympy format now has replaced `inv` with `1/`.
 We can again look at the equation chosen:
+
 ```python
 print(model)
 ```
 
 For now, let's consider the expressions for output 0.
 We can see the LaTeX version of this with:
+
 ```python
 model.latex()[0]
 ```
-or output 1 with `model.latex()[1]`.
 
+or output 1 with `model.latex()[1]`.
 
 Let's plot the prediction against the truth:
+
 ```python
 from matplotlib import pyplot as plt
 plt.scatter(y[:, 0], model(X)[:, 0])
@@ -83,9 +87,10 @@ plt.xlabel('Truth')
 plt.ylabel('Prediction')
 plt.show()
 ```
+
 Which gives us:
 
-![](https://github.com/MilesCranmer/PySR/raw/master/docs/images/example_plot.png)
+![Truth vs Prediction](images/example_plot.png)
 
 ## 5. Feature selection
 
@@ -104,12 +109,14 @@ the most important 5 features.
 
 Here is an example. Let's say we have 30 input features and 300 data points, but only 2
 of those features are actually used:
+
 ```python
 X = np.random.randn(300, 30)
 y = X[:, 3]**2 - X[:, 19]**2 + 1.5
 ```
 
 Let's create a model with the feature selection argument set up:
+
 ```python
 model = PySRRegressor(
     binary_operators=["+", "-", "*", "/"],
@@ -117,15 +124,19 @@ model = PySRRegressor(
     select_k_features=5,
 )
 ```
+
 Now let's fit this:
+
 ```python
 model.fit(X, y)
 ```
 
 Before the Julia backend is launched, you can see the string:
-```
+
+```text
 Using features ['x3', 'x5', 'x7', 'x19', 'x21']
 ```
+
 which indicates that the feature selection (powered by a gradient-boosting tree)
 has successfully selected the relevant two features.
 
@@ -152,6 +163,7 @@ set the parameter `denoise=True`. This will fit a Gaussian process (containing a
 to the input dataset, and predict new targets (which are assumed to be denoised) from that Gaussian process.
 
 For example:
+
 ```python
 X = np.random.randn(100, 5)
 noise = np.random.randn(100) * 0.1
@@ -159,6 +171,7 @@ y = np.exp(X[:, 0]) + X[:, 1] + X[:, 2] + noise
 ```
 
 Let's create and fit a model with the denoising argument set up:
+
 ```python
 model = PySRRegressor(
     binary_operators=["+", "-", "*", "/"],
@@ -168,9 +181,10 @@ model = PySRRegressor(
 model.fit(X, y)
 print(model)
 ```
+
 If all goes well, you should find that it predicts the correct input equation, without the noise term!
 
 ## 7. Additional features
 
 For the many other features available in PySR, please
-read the [Options section](options.md).
+read the [Options section](options.md).
docs/generate_papers.py CHANGED
@@ -1,3 +1,4 @@
+"""This script generates the papers.md file from the papers.yml file."""
 import yaml
 from pathlib import Path
 
docs/options.md CHANGED
@@ -43,8 +43,9 @@ the equation selection with the arrow shown in the `pick` column.
 
 ## Operators
 
-A list of operators can be found on the operators page.
+A list of operators can be found on the [operators page](operators.md).
 One can define custom operators in Julia by passing a string:
+
 ```python
 PySRRegressor(niterations=100,
     binary_operators=["mult", "plus", "special(x, y) = x^2 + y"],
@@ -107,6 +108,7 @@ on each core.
 Here, we assign weights to each row of data
 using inverse uncertainty squared. We also use 10 processes for the search
 instead of the default.
+
 ```python
 sigma = ...
 weights = 1/sigma**2
@@ -126,8 +128,8 @@ One can warm up the maxsize from a small number to encourage
 PySR to start simple, by using the `warmupMaxsize` argument.
 This specifies that maxsize increases every `warmupMaxsize`.
 
-
 ## Batching
+
 One can turn on mini-batching, with the `batching` flag,
 and control the batch size with `batch_size`. This will make
 evolution faster for large datasets. Equations are still evaluated
@@ -151,11 +153,11 @@ There is a "maxsize" parameter to PySR, but there is also an operator-level
 constraints={'pow': (-1, 1), 'mult': (3, 3), 'cos': 5}
 ```
 
-What this says is that: a power law x^y can have an expression of arbitrary (-1) complexity in the x, but only complexity 1 (e.g., a constant or variable) in the y. So (x0 + 3)^5.5 is allowed, but 5.5^(x0 + 3) is not.
+What this says is that: a power law $x^y$ can have an expression of arbitrary (-1) complexity in the x, but only complexity 1 (e.g., a constant or variable) in the y. So $(x_0 + 3)^{5.5}$ is allowed, but $5.5^{x_0 + 3}$ is not.
 I find this helps a lot for getting more interpretable equations.
 The other terms say that each multiplication can only have sub-expressions
-of up to complexity 3 (e.g., 5.0 + x2) in each side, and cosine can only operate on
-expressions of complexity 5 (e.g., 5.0 + x2 exp(x3)).
+of up to complexity 3 (e.g., $5.0 + x_2$) in each side, and cosine can only operate on
+expressions of complexity 5 (e.g., $5.0 + x_2 exp(x_3)$).
 
 ## Custom complexity
 
@@ -182,12 +184,12 @@ You can optionally pass a pandas dataframe to the callable function,
 if you called `.fit` on a pandas dataframe as well.
 
 There are also some helper functions for doing this quickly.
+
 - `model.latex()` will generate a TeX formatted output of your equation.
 - `model.sympy()` will return the SymPy representation.
 - `model.jax()` will return a callable JAX function combined with parameters (see below)
 - `model.pytorch()` will return a PyTorch model (see below).
 
-
 ## Exporting to numpy, pytorch, and jax
 
 By default, the dataframe of equations will contain columns
@@ -214,21 +216,25 @@ a PyTorch module which runs the equation, using PyTorch functions,
 over `X` (as a PyTorch tensor). This is differentiable, and the
 parameters of this PyTorch module correspond to the learned parameters
 in the equation, and are trainable.
+
 ```python
 torch_model = model.pytorch()
 torch_model(X)
 ```
+
 **Warning: If you are using custom operators, you must define `extra_torch_mappings` or `extra_jax_mappings` (both are `dict` of callables) to provide an equivalent definition of the functions.** (At any time you can set these parameters or any others with `model.set_params`.)
 
 For JAX, you can equivalently call `model.jax()`
 This will return a dictionary containing a `'callable'` (a JAX function),
 and `'parameters'` (a list of parameters in the equation).
 You can execute this function with:
+
 ```python
 jax_model = model.jax()
 jax_model['callable'](X, jax_model['parameters'])
 ```
-Since the parameter list is a jax array, this therefore lets you also
+
+Since the parameter list is a jax array, this therefore lets you also
 train the parameters within JAX (and is differentiable).
 
 ## `loss`
@@ -243,29 +249,40 @@ page for SymbolicRegression.jl.
 Here are some additional examples:
 
 abs(x-y) loss
+
 ```python
 PySRRegressor(..., loss="f(x, y) = abs(x - y)^1.5")
 ```
+
 Note that the function name doesn't matter:
+
 ```python
 PySRRegressor(..., loss="loss(x, y) = abs(x * y)")
 ```
+
 With weights:
+
 ```python
 model = PySRRegressor(..., loss="myloss(x, y, w) = w * abs(x - y)")
 model.fit(..., weights=weights)
 ```
+
 Weights can be used in arbitrary ways:
+
 ```python
 model = PySRRegressor(..., weights=weights, loss="myloss(x, y, w) = abs(x - y)^2/w^2")
 model.fit(..., weights=weights)
 ```
+
 Built-in loss (faster) (see [losses](https://astroautomata.com/SymbolicRegression.jl/dev/losses/)).
 This one computes the L3 norm:
+
 ```python
 PySRRegressor(..., loss="LPDistLoss{3}()")
 ```
+
 Can also uses these losses for weighted (weighted-average):
+
 ```python
 model = PySRRegressor(..., weights=weights, loss="LPDistLoss{3}()")
 model.fit(..., weights=weights)
@@ -278,12 +295,14 @@ when you call `model.fit`, once before the search starts,
 and again after the search finishes. The filename will
 have the same base name as the input file, but with a `.pkl` extension.
 You can load the saved model state with:
+
 ```python
 model = PySRRegressor.from_file(pickle_filename)
 ```
+
 If you have a long-running job and would like to load the model
 before completion, you can also do this. In this case, the model
 loading will use the `csv` file to load the equations, since the
 `csv` file is continually updated during the search. Once
 the search completes, the model including its equations will
-be saved to the pickle file, overwriting the existing version.
+be saved to the pickle file, overwriting the existing version.