MilesCranmer committed on
Commit
fadf4db
1 Parent(s): 0fba777

Update README to use scikit-learn API

Files changed (2)
  1. README.md +65 -42
  2. example.py +10 -12
README.md CHANGED
@@ -73,71 +73,94 @@ Most common issues at this stage are solved
73
  by [tweaking the Julia package server](https://github.com/MilesCranmer/PySR/issues/27).
74
  to use up-to-date packages.
75
 
76
- ## Docker
77
-
78
- You can also test out PySR in Docker, without
79
- installing it locally, by running the following command in
80
- the root directory of this repo:
81
- ```bash
82
- docker build --pull --rm -f "Dockerfile" -t pysr "."
83
- ```
84
- This builds an image called `pysr`. You can then run this with:
85
- ```bash
86
- docker run -it --rm -v "$PWD:/data" pysr ipython
87
- ```
88
- which will link the current directory to the container's `/data` directory
89
- and then launch ipython.
90
-
91
  # Quickstart
92
 
93
- Here is some demo code (also found in `example.py`)
 
94
  ```python
95
  import numpy as np
96
- from pysr import pysr, best
97
 
98
- # Dataset
99
  X = 2 * np.random.randn(100, 5)
100
- y = 2 * np.cos(X[:, 3]) + X[:, 0] ** 2 - 2
101
 
102
- # Learn equations
103
- equations = pysr(
104
- X,
105
- y,
 
106
  niterations=5,
 
107
  binary_operators=["+", "*"],
108
  unary_operators=[
109
  "cos",
110
  "exp",
111
- "sin", # Pre-defined library of operators (see docs)
112
- "inv(x) = 1/x", # Define your own operator! (Julia syntax)
113
  ],
 
114
  )
115
-
116
- ...# (you can use ctl-c to exit early)
117
-
118
- print(best(equations))
119
  ```
 
120
 
121
- which gives:
122
-
123
  ```python
124
- x0**2 + 2.000016*cos(x3) - 1.9999845
125
  ```
126
 
127
- The second and additional calls of `pysr` will be significantly
128
- faster in startup time, since the first call to Julia will compile
129
- and cache functions from the symbolic regression backend.
130
 
131
- One can also use `best_tex` to get the LaTeX form,
132
- or `best_callable` to get a function you can call.
133
- This uses a score which balances complexity and error;
134
- however, one can see the full list of equations with:
135
  ```python
136
- print(equations)
137
  ```
138
- This is a pandas table, with additional columns:
139
 
140
- - `MSE` - the mean square error of the formula
141
  - `score` - a metric akin to Occam's razor; you should use this to help select the "true" equation.
142
  - `sympy_format` - sympy equation.
143
  - `lambda_format` - a lambda function for that equation, that you can pass values through.
 
73
  by [tweaking the Julia package server](https://github.com/MilesCranmer/PySR/issues/27).
74
  to use up-to-date packages.
75
 
76
  # Quickstart
77
 
78
+ Let's create a PySR example. First, let's import
79
+ numpy to generate some test data:
80
  ```python
81
  import numpy as np
 
82
 
 
83
  X = 2 * np.random.randn(100, 5)
84
+ y = 2.5382 * np.cos(X[:, 3]) + X[:, 0] ** 2 - 0.5
85
+ ```
86
+ We have created a dataset with 100 datapoints, with 5 features each.
87
+ The relation we wish to model is $2.5382 \cos(x_3) + x_0^2 - 0.5$.
88
 
89
+ Now, let's create a PySR model and train it.
90
+ PySR's main interface is in the style of scikit-learn:
91
+ ```python
92
+ from pysr import PySRRegressor
93
+ model = PySRRegressor(
94
  niterations=5,
95
+ populations=8,
96
  binary_operators=["+", "*"],
97
  unary_operators=[
98
  "cos",
99
  "exp",
100
+ "sin",
 
101
  ],
102
+ model_selection="best",
103
  )
104
  ```
105
+ This will set up the model for 5 iterations of the search code, which contains hundreds of thousands of mutations and equation evaluations.
106
 
107
+ Let's train this model on our dataset:
 
108
  ```python
109
+ model.fit(X, y)
110
  ```
111
+ Internally, this launches a Julia process which will do a multithreaded search for equations to fit the dataset.
112
+
113
+ Equations will be printed during training, and once you are satisfied, you may
114
+ quit early by hitting 'q' and then \<enter\>.
115
 
116
+ After the model has been fit, you can run `model.predict(X)`
117
+ to see the predictions on a given dataset.
 
118
 
119
+ You may run:
120
  ```python
121
+ print(model)
122
  ```
123
+ to print the learned equations, which for the above should be close to:
124
+ ```python
125
+ PySRRegressor.equations = [
126
+ pick score Equation MSE Complexity
127
+ 0 0.000000 3.598587 3.044337e+01 1
128
+ 1 1.074135 (x0 * x0) 3.552313e+00 3
129
+ 2 0.023611 (-0.40477127 + (x0 * x0)) 3.388464e+00 5
130
+ 3 0.855682 ((x0 * x0) + cos(x3)) 1.440074e+00 6
131
+ 4 0.876831 ((x0 * x0) + (2.5026207 * cos(x3))) 2.493328e-01 8
132
+ 5 >>>> 10.687394 ((-0.5000114 + (x0 * x0)) + (2.5382013 * cos(x... 1.299652e-10 10
133
+ 6 2.573098 ((-0.50000024 + (x0 * x0)) + (2.5382 * sin(1.5... 7.565937e-13 12
134
+ ]
135
+ ```
136
+ This arrow in the `pick` column indicates which equation is currently selected by your
137
+ `model_selection` strategy for prediction.
138
+ (You may change `model_selection` after `.fit(X, y)` as well.)
139
+
140
+ `model.equations` is a pandas DataFrame containing all equations, including callable format
141
+ (`lambda_format`),
142
+ SymPy format (`sympy_format`), and even JAX and PyTorch format
143
+ (both of which are differentiable).
144
+
145
+
146
+ ### Notes
147
 
 
148
  - `score` - a metric akin to Occam's razor; you should use this to help select the "true" equation.
149
  - `sympy_format` - sympy equation.
150
  - `lambda_format` - a lambda function for that equation, that you can pass values through.
151
+
152
+
153
+ # Docker
154
+
155
+ You can also test out PySR in Docker, without
156
+ installing it locally, by running the following command in
157
+ the root directory of this repo:
158
+ ```bash
159
+ docker build --pull --rm -f "Dockerfile" -t pysr "."
160
+ ```
161
+ This builds an image called `pysr`. You can then run this with:
162
+ ```bash
163
+ docker run -it --rm -v "$PWD:/data" pysr ipython
164
+ ```
165
+ which will link the current directory to the container's `/data` directory
166
+ and then launch ipython.
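
The new README describes `score` only as "a metric akin to Occam's razor". As a minimal sketch, the values in the example table are consistent with the score being the drop in log(MSE) per unit of added complexity between consecutive rows. That definition is an assumption inferred from the printed numbers, not something the diff states. The `rows` list and the `scores` helper below are illustrative names and are not part of PySR.

```python
import math

# (Complexity, MSE) pairs copied from the example table in the README diff.
rows = [
    (1, 3.044337e+01),
    (3, 3.552313e+00),
    (5, 3.388464e+00),
    (6, 1.440074e+00),
    (8, 2.493328e-01),
    (10, 1.299652e-10),
    (12, 7.565937e-13),
]


def scores(rows):
    """Assumed definition: score = -(change in log MSE) / (change in complexity).

    The first row has no predecessor, so its score is set to 0.
    """
    out = [0.0]
    for (c_prev, mse_prev), (c, mse) in zip(rows, rows[1:]):
        out.append(-(math.log(mse) - math.log(mse_prev)) / (c - c_prev))
    return out


computed = scores(rows)
for (c, mse), s in zip(rows, computed):
    print(f"complexity={c:2d}  MSE={mse:.6e}  score={s:.6f}")

# Index of the row with the largest score.
best_index = max(range(len(computed)), key=computed.__getitem__)
print("row with the highest score:", best_index)
```

Under this assumption, the row with the largest score is the one the table marks with `>>>>`: a large score means a small increase in complexity bought a large reduction in error.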
example.py CHANGED
@@ -1,23 +1,21 @@
1
  import numpy as np
2
- from pysr import PySRRegressor
3
 
4
- # Dataset
5
- X = 3 * np.random.randn(100, 5)
6
- y = 3 * np.cos(X[:, 3]) + X[:, 0] ** 2 - 2
7
 
8
- # Learn equations
9
  model = PySRRegressor(
10
- niterations=6,
11
- binary_operators=["plus", "mult"],
 
12
  unary_operators=[
13
  "cos",
14
  "exp",
15
- "sin", # Pre-defined library of operators (see https://pysr.readthedocs.io/en/latest/docs/operators/)
16
- "inv(x) = 2/x",
17
  ],
18
- loss="loss(x, y) = abs(x - y)", # Custom loss function
19
- ) # Define your own operator! (Julia syntax)
20
 
21
  model.fit(X, y)
22
 
23
- print(model)
 
1
  import numpy as np
 
2
 
3
+ X = 2 * np.random.randn(100, 5)
4
+ y = 2.5382 * np.cos(X[:, 3]) + X[:, 0] ** 2 - 0.5
 
5
 
6
+ from pysr import PySRRegressor
7
  model = PySRRegressor(
8
+ niterations=5,
9
+ populations=8,
10
+ binary_operators=["+", "*"],
11
  unary_operators=[
12
  "cos",
13
  "exp",
14
+ "sin",
 
15
  ],
16
+ model_selection="best",
17
+ )
18
 
19
  model.fit(X, y)
20
 
21
+ print(model)
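
To illustrate what the `MSE` column of the printed table measures, here is a NumPy-only sketch that scores three of the candidate equations from the README example against the same synthetic relation used in `example.py`. It does not call PySR. The lambdas are hand-written stand-ins for the equations (not PySR's `lambda_format` objects), and the seeded generator is an assumption added for repeatability, whereas the script above uses unseeded `np.random.randn`.

```python
import numpy as np

# Seeded generator so the run is repeatable (an addition for this sketch).
rng = np.random.default_rng(0)
X = 2 * rng.standard_normal((100, 5))
y = 2.5382 * np.cos(X[:, 3]) + X[:, 0] ** 2 - 0.5

# Hand-written versions of three equations from the README table,
# ordered by increasing complexity.
candidates = {
    "(x0 * x0)": lambda X: X[:, 0] * X[:, 0],
    "((x0 * x0) + cos(x3))": lambda X: X[:, 0] * X[:, 0] + np.cos(X[:, 3]),
    "((x0 * x0) + (2.5026207 * cos(x3)))": (
        lambda X: X[:, 0] * X[:, 0] + 2.5026207 * np.cos(X[:, 3])
    ),
}


def mse(pred, target):
    """Mean squared error between predictions and targets."""
    return float(np.mean((pred - target) ** 2))


errors = {name: mse(f(X), y) for name, f in candidates.items()}
for name, err in errors.items():
    print(f"{name:40s} MSE = {err:.4f}")
```

The exact values depend on the random draw, so they will not match the README table digit for digit, but the error should fall as the equations get closer to the true relation.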