Merge pull request #1 from MilesCranmer/master
- .gitignore +1 -0
- .travis.yml +23 -11
- README.md +9 -4
- TODO.md +7 -2
- hyperparamopt.py → benchmarks/hyperparamopt.py +18 -34
- docs/options.md +1 -1
- julia/sr.jl +19 -8
- pysr/sr.py +54 -26
- setup.py +1 -1
- test/travis.sh +0 -5
.gitignore
CHANGED
@@ -1,6 +1,7 @@
|
|
1 |
.dataset*.jl
|
2 |
.hyperparams*.jl
|
3 |
*.csv
|
|
|
4 |
performance*txt
|
5 |
*.out
|
6 |
trials*
|
|
|
1 |
.dataset*.jl
|
2 |
.hyperparams*.jl
|
3 |
*.csv
|
4 |
+
*.bkup
|
5 |
performance*txt
|
6 |
*.out
|
7 |
trials*
|
.travis.yml
CHANGED
@@ -1,20 +1,32 @@
|
|
1 |
language: julia
|
2 |
-
os: linux
|
3 |
-
dist: bionic
|
4 |
-
|
5 |
julia:
|
6 |
- 1
|
7 |
|
8 |
-
|
9 |
-
|
10 |
-
|
11 |
-
|
12 |
-
|
13 |
|
|
|
14 |
|
15 |
before_script:
|
16 |
-
-
|
17 |
|
18 |
script:
|
19 |
-
-
|
20 |
-
-
|
|
|
|
|
|
1 |
language: julia
|
|
|
|
|
|
|
2 |
julia:
|
3 |
- 1
|
4 |
|
5 |
+
jobs:
|
6 |
+
include:
|
7 |
+
- name: "Linux"
|
8 |
+
os: linux
|
9 |
+
dist: bionic
|
10 |
+
before_install: sudo apt-get -y install python3-pip python3-setuptools
|
11 |
+
env: PY=python3 SETUPPREFIX="--user"
|
12 |
+
- name: "macOS"
|
13 |
+
os: osx
|
14 |
+
before_install: python3 --version; pip3 --version; sw_vers
|
15 |
+
env: PY=python3
|
16 |
+
- name: "Windows"
|
17 |
+
os: windows
|
18 |
+
before_install:
|
19 |
+
- choco install python --version 3.8.0
|
20 |
+
- python -m pip install --upgrade pip
|
21 |
+
env: PATH=/c/Python38:/c/Python38/Scripts:$PATH PY=python
|
22 |
|
23 |
+
install: pip3 install --upgrade pip
|
24 |
|
25 |
before_script:
|
26 |
+
- julia --color=yes -e 'using Pkg; pkg"add Optim; add SpecialFunctions; precompile;"'
|
27 |
|
28 |
script:
|
29 |
+
- pip3 install numpy pandas
|
30 |
+
- $PY setup.py install $SETUPPREFIX
|
31 |
+
- PATH=$HOME/.local/bin:$PATH $PY test/test.py
|
32 |
+
|
README.md
CHANGED
@@ -1,5 +1,7 @@
|
|
1 |
# [PySR.jl](https://github.com/MilesCranmer/PySR)
|
2 |
|
|
|
|
|
3 |
[![Documentation Status](https://readthedocs.org/projects/pysr/badge/?version=latest)](https://pysr.readthedocs.io/en/latest/?badge=latest)
|
4 |
[![PyPI version](https://badge.fury.io/py/pysr.svg)](https://badge.fury.io/py/pysr)
|
5 |
[![Build Status](https://travis-ci.com/MilesCranmer/PySR.svg?branch=master)](https://travis-ci.com/MilesCranmer/PySR)
|
@@ -47,10 +49,11 @@ then instructions for [mac](https://julialang.org/downloads/platform/#macos)
|
|
47 |
and [linux](https://julialang.org/downloads/platform/#linux_and_freebsd).
|
48 |
(Don't use the `conda-forge` version; it doesn't seem to work properly.)
|
49 |
Then, at the command line,
|
50 |
-
install the `Optim` and `SpecialFunctions`
|
|
|
51 |
|
52 |
```bash
|
53 |
-
julia -e '
|
54 |
```
|
55 |
|
56 |
For python, you need to have Python 3, numpy, sympy, and pandas installed.
|
@@ -73,8 +76,10 @@ y = 2*np.cos(X[:, 3]) + X[:, 0]**2 - 2
|
|
73 |
|
74 |
# Learn equations
|
75 |
equations = pysr(X, y, niterations=5,
|
76 |
-
|
77 |
-
|
|
|
|
|
78 |
|
79 |
...# (you can use ctl-c to exit early)
|
80 |
|
|
|
1 |
# [PySR.jl](https://github.com/MilesCranmer/PySR)
|
2 |
|
3 |
+
(pronounced like *py* as in python, and then *sur* as in surface)
|
4 |
+
|
5 |
[![Documentation Status](https://readthedocs.org/projects/pysr/badge/?version=latest)](https://pysr.readthedocs.io/en/latest/?badge=latest)
|
6 |
[![PyPI version](https://badge.fury.io/py/pysr.svg)](https://badge.fury.io/py/pysr)
|
7 |
[![Build Status](https://travis-ci.com/MilesCranmer/PySR.svg?branch=master)](https://travis-ci.com/MilesCranmer/PySR)
|
|
|
49 |
and [linux](https://julialang.org/downloads/platform/#linux_and_freebsd).
|
50 |
(Don't use the `conda-forge` version; it doesn't seem to work properly.)
|
51 |
Then, at the command line,
|
52 |
+
install and precompile the `Optim` and `SpecialFunctions`
|
53 |
+
packages via:
|
54 |
|
55 |
```bash
|
56 |
+
julia -e 'using Pkg; pkg"add Optim; add SpecialFunctions; precompile;"'
|
57 |
```
|
58 |
|
59 |
For python, you need to have Python 3, numpy, sympy, and pandas installed.
|
|
|
76 |
|
77 |
# Learn equations
|
78 |
equations = pysr(X, y, niterations=5,
|
79 |
+
binary_operators=["plus", "mult"],
|
80 |
+
unary_operators=[
|
81 |
+
"cos", "exp", "sin", #Pre-defined library of operators (see https://pysr.readthedocs.io/en/latest/docs/operators/)
|
82 |
+
"inv(x) = 1/x"]) # Define your own operator! (Julia syntax)
|
83 |
|
84 |
...# (you can use ctl-c to exit early)
|
85 |
|
TODO.md
CHANGED
@@ -58,19 +58,23 @@
|
|
58 |
- [x] Consider printing output sorted by score, not by complexity.
|
59 |
- [x] Increase max complexity slowly over time up to the actual max.
|
60 |
- [x] Record density over complexity. Favor equations that have a density we have not explored yet. Want the final density to be evenly distributed.
|
|
|
|
|
61 |
- [ ] Sort these todo lists by priority
|
62 |
|
63 |
## Feature ideas
|
64 |
|
65 |
-
- [ ]
|
|
|
|
|
66 |
- [ ] Cross-validation
|
67 |
-
- [ ] Sympy printing
|
68 |
- [ ] Hierarchical model, so can re-use functional forms. Output of one equation goes into second equation?
|
69 |
- [ ] Add function to plot equations
|
70 |
- [ ] Refresh screen rather than dumping to stdout?
|
71 |
- [ ] Add ability to save state from python
|
72 |
- [ ] Additional degree operators?
|
73 |
- [ ] Multi targets (vector ops). Idea 1: Node struct contains argument for which registers it is applied to. Then, can work with multiple components simultaneously. Though this may be tricky to get right. Idea 2: each op is defined by input/output space. Some operators are flexible, and the spaces should be adjusted automatically. Otherwise, only consider ops that make a tree possible. But will need additional ops here to get it to work. Idea 3: define each equation in 2 parts: one part that is shared between all outputs, and one that is different between all outputs. Maybe this could be an array of nodes corresponding to each output. And those nodes would define their functions.
|
|
|
74 |
- [ ] Tree crossover? I.e., can take as input a part of the same equation, so long as it is the same level or below?
|
75 |
- [ ] Create flexible way of providing "simplification recipes." I.e., plus(plus(T, C), C) => plus(T, +(C, C)). The user could pass these.
|
76 |
- [ ] Consider allowing multi-threading turned off, for faster testing (cache issue on travis). Or could simply fix the caching issue there.
|
@@ -100,6 +104,7 @@
|
|
100 |
|
101 |
- [ ] How hard is it to turn the recursive array evaluation into a for loop?
|
102 |
- [ ] Try defining a binary tree as an array, rather than a linked list. See https://stackoverflow.com/a/6384714/2689923
|
|
|
103 |
- [ ] Add true multi-node processing, with MPI, or just file sharing. Multiple populations per core.
|
104 |
- Ongoing in cluster branch
|
105 |
- [ ] Performance: try inling things?
|
|
|
58 |
- [x] Consider printing output sorted by score, not by complexity.
|
59 |
- [x] Increase max complexity slowly over time up to the actual max.
|
60 |
- [x] Record density over complexity. Favor equations that have a density we have not explored yet. Want the final density to be evenly distributed.
|
61 |
+
- [x] Do printing from Python side. Then we can do simplification and pretty-printing.
|
62 |
+
- [x] Sympy printing
|
63 |
- [ ] Sort these todo lists by priority
|
64 |
|
65 |
## Feature ideas
|
66 |
|
67 |
+
- [ ] Other default losses (e.g., abs, other likelihoods, or just allow user to pass this as a string).
|
68 |
+
- [ ] Other dtypes available
|
69 |
+
- [ ] NDSA-II
|
70 |
- [ ] Cross-validation
|
|
|
71 |
- [ ] Hierarchical model, so can re-use functional forms. Output of one equation goes into second equation?
|
72 |
- [ ] Add function to plot equations
|
73 |
- [ ] Refresh screen rather than dumping to stdout?
|
74 |
- [ ] Add ability to save state from python
|
75 |
- [ ] Additional degree operators?
|
76 |
- [ ] Multi targets (vector ops). Idea 1: Node struct contains argument for which registers it is applied to. Then, can work with multiple components simultaneously. Though this may be tricky to get right. Idea 2: each op is defined by input/output space. Some operators are flexible, and the spaces should be adjusted automatically. Otherwise, only consider ops that make a tree possible. But will need additional ops here to get it to work. Idea 3: define each equation in 2 parts: one part that is shared between all outputs, and one that is different between all outputs. Maybe this could be an array of nodes corresponding to each output. And those nodes would define their functions.
|
77 |
+
- Much easier option: simply flatten the output vector, and set the index as another input feature. The equation learned will be a single equation containing indices as a feature.
|
78 |
- [ ] Tree crossover? I.e., can take as input a part of the same equation, so long as it is the same level or below?
|
79 |
- [ ] Create flexible way of providing "simplification recipes." I.e., plus(plus(T, C), C) => plus(T, +(C, C)). The user could pass these.
|
80 |
- [ ] Consider allowing multi-threading turned off, for faster testing (cache issue on travis). Or could simply fix the caching issue there.
|
|
|
104 |
|
105 |
- [ ] How hard is it to turn the recursive array evaluation into a for loop?
|
106 |
- [ ] Try defining a binary tree as an array, rather than a linked list. See https://stackoverflow.com/a/6384714/2689923
|
107 |
+
- in array branch
|
108 |
- [ ] Add true multi-node processing, with MPI, or just file sharing. Multiple populations per core.
|
109 |
- Ongoing in cluster branch
|
110 |
- [ ] Performance: try inling things?
|
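The "much easier option" for multi-target fitting noted in the TODO above (flatten the output vector and pass the component index as an extra input feature) can be sketched as follows. This is a minimal illustration, not code from the repository: the helper name `flatten_multi_target` and the toy arrays are assumptions made for the example.

```python
import numpy as np


def flatten_multi_target(X, Y):
    """Hypothetical helper: turn (n, d) inputs and (n, k) targets into a
    single-target problem with the output index as an extra feature."""
    X = np.asarray(X)
    Y = np.asarray(Y)
    n, k = Y.shape
    # Repeat every input row once per output component: shape (n*k, d).
    X_rep = np.repeat(X, k, axis=0)
    # Output-component index for each repeated row: 0, 1, ..., k-1, 0, 1, ...
    idx = np.tile(np.arange(k), n).reshape(-1, 1)
    X_flat = np.hstack([X_rep, idx])
    # Row-major flattening matches the repeat/tile ordering above.
    y_flat = Y.reshape(-1)
    return X_flat, y_flat


# Toy data: 3 samples, 2 input features, 2 output components.
X = np.arange(6.0).reshape(3, 2)
Y = np.array([[10.0, 11.0], [20.0, 21.0], [30.0, 31.0]])
X_flat, y_flat = flatten_multi_target(X, Y)
```

The flattened pair could then be passed to a single-target fit, and the learned equation would contain the index column as one of its variables, as the TODO entry describes.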
hyperparamopt.py → benchmarks/hyperparamopt.py
RENAMED
@@ -34,58 +34,46 @@ def run_trial(args):
|
|
34 |
"""
|
35 |
|
36 |
print("Running on", args)
|
37 |
-
|
38 |
-
|
39 |
-
|
40 |
-
|
41 |
-
total_steps = 10*100*1000
|
42 |
-
niterations = args['niterations']
|
43 |
-
npop = args['npop']
|
44 |
-
if niterations == 0 or npop == 0:
|
45 |
-
print("Bad parameters")
|
46 |
-
return {'status': 'ok', 'loss': np.inf}
|
47 |
-
|
48 |
-
args['ncyclesperiteration'] = int(total_steps / (niterations * npop))
|
49 |
args['topn'] = 10
|
50 |
-
args['parsimony'] =
|
|
|
51 |
args['annealing'] = True
|
52 |
|
53 |
if args['npop'] < 20 or args['ncyclesperiteration'] < 3:
|
54 |
print("Bad parameters")
|
55 |
return {'status': 'ok', 'loss': np.inf}
|
56 |
|
57 |
-
|
58 |
args['weightDoNothing'] = 1.0
|
59 |
-
|
60 |
-
maxTime = 30
|
61 |
-
ntrials = 2
|
62 |
-
equation_file = f'.hall_of_fame_{np.random.rand():f}.csv'
|
63 |
|
64 |
with temp_seed(0):
|
65 |
-
X = np.random.randn(100,
|
66 |
|
67 |
-
eval_str = [
|
68 |
-
"np.sign(X[:, 2])*np.abs(X[:, 2])**
|
69 |
"np.exp(X[:, 0]/2) + 12.0 + np.log(np.abs(X[:, 0])*10 + 1)",
|
70 |
-
"
|
71 |
-
"
|
|
|
72 |
|
73 |
print(f"Starting", str(args))
|
74 |
try:
|
75 |
trials = []
|
76 |
-
for i in range(
|
77 |
print(f"Starting test {i}")
|
78 |
for j in range(ntrials):
|
79 |
print(f"Starting trial {j}")
|
80 |
-
|
81 |
-
|
82 |
procs=4,
|
|
|
83 |
binary_operators=["plus", "mult", "pow", "div"],
|
84 |
-
unary_operators=["cos", "exp", "sin", "
|
85 |
-
equation_file=equation_file,
|
86 |
-
timeout=maxTime,
|
87 |
maxsize=25,
|
88 |
-
|
89 |
**args)
|
90 |
if len(trial) == 0: raise ValueError
|
91 |
trials.append(
|
@@ -109,8 +97,6 @@ def run_trial(args):
|
|
109 |
|
110 |
|
111 |
space = {
|
112 |
-
'niterations': hp.qlognormal('niterations', np.log(10), 1.0, 1),
|
113 |
-
'npop': hp.qlognormal('npop', np.log(100), 1.0, 1),
|
114 |
'alpha': hp.lognormal('alpha', np.log(10.0), 1.0),
|
115 |
'fractionReplacedHof': hp.lognormal('fractionReplacedHof', np.log(0.1), 1.0),
|
116 |
'fractionReplaced': hp.lognormal('fractionReplaced', np.log(0.1), 1.0),
|
@@ -126,8 +112,6 @@ space = {
|
|
126 |
|
127 |
################################################################################
|
128 |
|
129 |
-
|
130 |
-
|
131 |
def merge_trials(trials1, trials2_slice):
|
132 |
"""Merge two hyperopt trials objects
|
133 |
|
|
|
34 |
"""
|
35 |
|
36 |
print("Running on", args)
|
37 |
+
args['niterations'] = 100
|
38 |
+
args['npop'] = 100
|
39 |
+
args['ncyclesperiteration'] = 1000
|
40 |
args['topn'] = 10
|
41 |
+
args['parsimony'] = 0.0
|
42 |
+
args['useFrequency'] = True
|
43 |
args['annealing'] = True
|
44 |
|
45 |
if args['npop'] < 20 or args['ncyclesperiteration'] < 3:
|
46 |
print("Bad parameters")
|
47 |
return {'status': 'ok', 'loss': np.inf}
|
48 |
|
|
|
49 |
args['weightDoNothing'] = 1.0
|
50 |
+
ntrials = 3
|
|
|
|
|
|
|
51 |
|
52 |
with temp_seed(0):
|
53 |
+
X = np.random.randn(100, 10)*3
|
54 |
|
55 |
+
eval_str = [
|
56 |
+
"np.sign(X[:, 2])*np.abs(X[:, 2])**2.5 + 5*np.cos(X[:, 3]) - 5",
|
57 |
"np.exp(X[:, 0]/2) + 12.0 + np.log(np.abs(X[:, 0])*10 + 1)",
|
58 |
+
"(np.exp(X[:, 3]) + 3)/(np.abs(X[:, 1]) + np.cos(X[:, 0]) + 1.1)",
|
59 |
+
"X[:, 0] * np.sin(2*np.pi * (X[:, 1] * X[:, 2] - X[:, 3] / X[:, 4])) + 3.0"
|
60 |
+
]
|
61 |
|
62 |
print(f"Starting", str(args))
|
63 |
try:
|
64 |
trials = []
|
65 |
+
for i in range(len(eval_str)):
|
66 |
print(f"Starting test {i}")
|
67 |
for j in range(ntrials):
|
68 |
print(f"Starting trial {j}")
|
69 |
+
y = eval(eval_str[i])
|
70 |
+
trial = pysr.pysr(X, y,
|
71 |
procs=4,
|
72 |
+
populations=20,
|
73 |
binary_operators=["plus", "mult", "pow", "div"],
|
74 |
+
unary_operators=["cos", "exp", "sin", "logm", "abs"],
|
|
|
|
|
75 |
maxsize=25,
|
76 |
+
constraints={'pow': (-1, 1)},
|
77 |
**args)
|
78 |
if len(trial) == 0: raise ValueError
|
79 |
trials.append(
|
|
|
97 |
|
98 |
|
99 |
space = {
|
|
|
|
|
100 |
'alpha': hp.lognormal('alpha', np.log(10.0), 1.0),
|
101 |
'fractionReplacedHof': hp.lognormal('fractionReplacedHof', np.log(0.1), 1.0),
|
102 |
'fractionReplaced': hp.lognormal('fractionReplaced', np.log(0.1), 1.0),
|
|
|
112 |
|
113 |
################################################################################
|
114 |
|
|
|
|
|
115 |
def merge_trials(trials1, trials2_slice):
|
116 |
"""Merge two hyperopt trials objects
|
117 |
|
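The benchmark in `benchmarks/hyperparamopt.py` builds one target vector per entry of `eval_str` by evaluating the expression string against `X`. The standalone sketch below reproduces only that step, without calling `pysr`. It uses `np.random.RandomState(0)` as a stand-in for the script's `temp_seed(0)` helper, whose definition is not shown in this diff, so the sampled values may differ from those in the real script.

```python
import numpy as np

# Stand-in for the script's temp_seed(0) context manager (an assumption).
rng = np.random.RandomState(0)
X = rng.randn(100, 10) * 3

# The four target expressions added in this commit.
eval_str = [
    "np.sign(X[:, 2])*np.abs(X[:, 2])**2.5 + 5*np.cos(X[:, 3]) - 5",
    "np.exp(X[:, 0]/2) + 12.0 + np.log(np.abs(X[:, 0])*10 + 1)",
    "(np.exp(X[:, 3]) + 3)/(np.abs(X[:, 1]) + np.cos(X[:, 0]) + 1.1)",
    "X[:, 0] * np.sin(2*np.pi * (X[:, 1] * X[:, 2] - X[:, 3] / X[:, 4])) + 3.0",
]

# Evaluate each expression with an explicit namespace, one target per test.
targets = [eval(expr, {"np": np, "X": X}) for expr in eval_str]

for i, y in enumerate(targets):
    print(f"test {i}: y.shape={y.shape}, mean={y.mean():.3f}")
```

Passing an explicit namespace to `eval` keeps the expressions from seeing unrelated local variables; the script itself calls `eval(eval_str[i])` directly.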
docs/options.md
CHANGED
@@ -22,7 +22,7 @@ These are described below
|
|
22 |
The program will output a pandas DataFrame containing the equations,
|
23 |
mean square error, and complexity. It will also dump to a csv
|
24 |
at the end of every iteration,
|
25 |
-
which is `
|
26 |
equations to stdout.
|
27 |
|
28 |
## Operators
|
|
|
22 |
The program will output a pandas DataFrame containing the equations,
|
23 |
mean square error, and complexity. It will also dump to a csv
|
24 |
at the end of every iteration,
|
25 |
+
which is `hall_of_fame_{date_time}.csv` by default. It also prints the
|
26 |
equations to stdout.
|
27 |
|
28 |
## Operators
|
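The default file name described above comes from the timestamp logic added to `pysr/sr.py` in this commit. A minimal sketch of that naming scheme follows; the wrapper function `default_equation_file` and its `now` parameter are hypothetical and exist only to make the example deterministic.

```python
from datetime import datetime


def default_equation_file(now=None):
    """Hypothetical wrapper around the timestamp logic in pysr/sr.py."""
    if now is None:
        now = datetime.now()
    # %f yields microseconds (6 digits); dropping the last 3 leaves milliseconds.
    date_time = now.strftime("%Y-%m-%d_%H%M%S.%f")[:-3]
    return "hall_of_fame_" + date_time + ".csv"


example = default_equation_file(datetime(2020, 10, 1, 12, 30, 45, 123456))
print(example)  # hall_of_fame_2020-10-01_123045.123.csv
```

Millisecond resolution makes name collisions between back-to-back runs unlikely, though not impossible.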
julia/sr.jl
CHANGED
@@ -1086,7 +1086,12 @@ function fullRun(niterations::Integer;
|
|
1086 |
end
|
1087 |
println("Started!")
|
1088 |
cycles_complete = npopulations * niterations
|
1089 |
-
|
1090 |
|
1091 |
last_print_time = time()
|
1092 |
num_equations = 0.0
|
@@ -1212,15 +1217,19 @@ function fullRun(niterations::Integer;
|
|
1212 |
deleteat!(equation_speed, 1)
|
1213 |
end
|
1214 |
average_speed = sum(equation_speed)/length(equation_speed)
|
1215 |
-
@printf("\n")
|
1216 |
-
@printf("Cycles per second: %.3e\n", round(average_speed, sigdigits=3))
|
1217 |
-
@printf("Hall of Fame:\n")
|
1218 |
-
@printf("-----------------------------------------\n")
|
1219 |
-
@printf("%-10s %-8s %-8s %-8s\n", "Complexity", "MSE", "Score", "Equation")
|
1220 |
curMSE = baselineMSE
|
1221 |
-
@printf("%-10d %-8.3e %-8.3e %-.f\n", 0, curMSE, 0f0, avgy)
|
1222 |
lastMSE = curMSE
|
1223 |
lastComplexity = 0
|
1224 |
|
1225 |
for size=1:actualMaxsize
|
1226 |
if hallOfFame.exists[size]
|
@@ -1246,7 +1255,9 @@ function fullRun(niterations::Integer;
|
|
1246 |
delta_c = size - lastComplexity
|
1247 |
delta_l_mse = log(curMSE/lastMSE)
|
1248 |
score = convert(Float32, -delta_l_mse/delta_c)
|
1249 |
-
|
|
|
|
|
1250 |
lastMSE = curMSE
|
1251 |
lastComplexity = size
|
1252 |
end
|
|
|
1086 |
end
|
1087 |
println("Started!")
|
1088 |
cycles_complete = npopulations * niterations
|
1089 |
+
if warmupMaxsize != 0
|
1090 |
+
curmaxsize += 1
|
1091 |
+
if curmaxsize > maxsize
|
1092 |
+
curmaxsize = maxsize
|
1093 |
+
end
|
1094 |
+
end
|
1095 |
|
1096 |
last_print_time = time()
|
1097 |
num_equations = 0.0
|
|
|
1217 |
deleteat!(equation_speed, 1)
|
1218 |
end
|
1219 |
average_speed = sum(equation_speed)/length(equation_speed)
|
|
|
|
|
|
|
|
|
|
|
1220 |
curMSE = baselineMSE
|
|
|
1221 |
lastMSE = curMSE
|
1222 |
lastComplexity = 0
|
1223 |
+
if verbosity > 0
|
1224 |
+
@printf("\n")
|
1225 |
+
@printf("Cycles per second: %.3e\n", round(average_speed, sigdigits=3))
|
1226 |
+
cycles_elapsed = npopulations * niterations - cycles_complete
|
1227 |
+
@printf("Progress: %d / %d total iterations (%.3f%%)\n", cycles_elapsed, npopulations * niterations, 100.0*cycles_elapsed/(npopulations*niterations))
|
1228 |
+
@printf("Hall of Fame:\n")
|
1229 |
+
@printf("-----------------------------------------\n")
|
1230 |
+
@printf("%-10s %-8s %-8s %-8s\n", "Complexity", "MSE", "Score", "Equation")
|
1231 |
+
@printf("%-10d %-8.3e %-8.3e %-.f\n", 0, curMSE, 0f0, avgy)
|
1232 |
+
end
|
1233 |
|
1234 |
for size=1:actualMaxsize
|
1235 |
if hallOfFame.exists[size]
|
|
|
1255 |
delta_c = size - lastComplexity
|
1256 |
delta_l_mse = log(curMSE/lastMSE)
|
1257 |
score = convert(Float32, -delta_l_mse/delta_c)
|
1258 |
+
if verbosity > 0
|
1259 |
+
@printf("%-10d %-8.3e %-8.3e %-s\n" , size, curMSE, score, stringTree(member.tree))
|
1260 |
+
end
|
1261 |
lastMSE = curMSE
|
1262 |
lastComplexity = size
|
1263 |
end
|
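The score printed in the Hall of Fame table is the negative change in log-MSE per unit of added complexity, `score = -log(curMSE/lastMSE) / (size - lastComplexity)`. The Python sketch below mirrors that arithmetic from the Julia hunk above. It assumes every listed entry is printed; the Julia loop may skip some entries under conditions that fall outside the lines shown here. The function name and sample numbers are illustrative.

```python
import math


def hall_of_fame_scores(baseline_mse, entries):
    """entries: list of (complexity, mse) pairs sorted by increasing complexity."""
    scores = []
    last_mse = baseline_mse
    last_complexity = 0
    for complexity, mse in entries:
        delta_c = complexity - last_complexity
        delta_l_mse = math.log(mse / last_mse)
        # Larger score means a bigger drop in log-MSE per unit of complexity.
        scores.append(-delta_l_mse / delta_c)
        last_mse = mse
        last_complexity = complexity
    return scores


# Halving the MSE with 1 extra node scores the same as quartering it with 2.
scores = hall_of_fame_scores(1.0, [(1, 0.5), (3, 0.125)])
print(scores)
```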
pysr/sr.py
CHANGED
@@ -7,6 +7,11 @@ import pandas as pd
|
|
7 |
import sympy
|
8 |
from sympy import sympify, Symbol, lambdify
|
9 |
import subprocess
|
|
|
|
|
|
|
|
|
|
|
10 |
|
11 |
global_equation_file = 'hall_of_fame.csv'
|
12 |
global_n_features = None
|
@@ -79,7 +84,7 @@ def pysr(X=None, y=None, weights=None,
|
|
79 |
nrestarts=3,
|
80 |
timeout=None,
|
81 |
extra_sympy_mappings={},
|
82 |
-
equation_file=
|
83 |
test='simple1',
|
84 |
verbosity=1e9,
|
85 |
maxsize=20,
|
@@ -92,6 +97,8 @@ def pysr(X=None, y=None, weights=None,
|
|
92 |
warmupMaxsize=0,
|
93 |
constraints={},
|
94 |
useFrequency=False,
|
|
|
|
|
95 |
limitPowComplexity=False, #deprecated
|
96 |
threads=None, #deprecated
|
97 |
julia_optimization=3,
|
@@ -178,6 +185,8 @@ def pysr(X=None, y=None, weights=None,
|
|
178 |
and use that instead of parsimony to explore equation space. Will
|
179 |
naturally find equations of all complexities.
|
180 |
:param julia_optimization: int, Optimization level (0, 1, 2, 3)
|
|
|
|
|
181 |
:returns: pd.DataFrame, Results dataframe, giving complexity, MSE, and equations
|
182 |
(as strings).
|
183 |
|
@@ -188,6 +197,9 @@ def pysr(X=None, y=None, weights=None,
|
|
188 |
raise ValueError("The limitPowComplexity kwarg is deprecated. Use constraints.")
|
189 |
if maxdepth is None:
|
190 |
maxdepth = maxsize
|
|
|
|
|
|
|
191 |
|
192 |
if isinstance(X, pd.DataFrame):
|
193 |
variable_names = list(X.columns)
|
@@ -215,13 +227,11 @@ def pysr(X=None, y=None, weights=None,
|
|
215 |
X = X[:, selection]
|
216 |
|
217 |
if use_custom_variable_names:
|
218 |
-
variable_names = variable_names[selection]
|
219 |
|
220 |
if populations is None:
|
221 |
populations = procs
|
222 |
|
223 |
-
rand_string = f'{"".join([str(np.random.rand())[2] for i in range(20)])}'
|
224 |
-
|
225 |
if isinstance(binary_operators, str): binary_operators = [binary_operators]
|
226 |
if isinstance(unary_operators, str): unary_operators = [unary_operators]
|
227 |
|
@@ -241,7 +251,18 @@ def pysr(X=None, y=None, weights=None,
|
|
241 |
y = eval(eval_str)
|
242 |
print("Running on", eval_str)
|
243 |
|
244 |
-
|
245 |
|
246 |
def_hyperparams = ""
|
247 |
|
@@ -273,7 +294,7 @@ def pysr(X=None, y=None, weights=None,
|
|
273 |
elif op == 'mult':
|
274 |
# Make sure the complex expression is in the left side.
|
275 |
if constraints[op][0] == -1:
|
276 |
-
continue
|
277 |
elif constraints[op][1] == -1 or constraints[op][0] < constraints[op][1]:
|
278 |
constraints[op][0], constraints[op][1] = constraints[op][1], constraints[op][0]
|
279 |
|
@@ -298,8 +319,7 @@ const bin_constraints = ["""
|
|
298 |
first = False
|
299 |
constraints_str += "]"
|
300 |
|
301 |
-
|
302 |
-
def_hyperparams += f"""include("{pkg_directory}/operators.jl")
|
303 |
{constraints_str}
|
304 |
const binops = {'[' + ', '.join(binary_operators) + ']'}
|
305 |
const unaops = {'[' + ', '.join(unary_operators) + ']'}
|
@@ -375,34 +395,35 @@ end"""
|
|
375 |
|
376 |
def_hyperparams += op_runner
|
377 |
|
378 |
-
|
379 |
-
X_str = 'transpose([' + str(X.tolist()).replace(']', '').replace(',', '').replace('[', '') + '])'
|
380 |
-
else:
|
381 |
-
X_str = str(X.tolist()).replace('],', '];').replace(',', '')
|
382 |
-
y_str = str(y.tolist())
|
383 |
|
384 |
-
|
385 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
386 |
|
387 |
if weights is not None:
|
388 |
-
|
389 |
-
|
390 |
-
const weights = convert(Array{Float32, 1}, """f"{weight_str})"
|
391 |
|
392 |
if use_custom_variable_names:
|
393 |
def_hyperparams += f"""
|
394 |
const varMap = {'["' + '", "'.join(variable_names) + '"]'}"""
|
395 |
|
396 |
-
with open(
|
397 |
print(def_hyperparams, file=f)
|
398 |
|
399 |
-
with open(
|
400 |
print(def_datasets, file=f)
|
401 |
|
402 |
-
with open(
|
403 |
-
print(f'@everywhere include("
|
404 |
-
print(f'@everywhere include("
|
405 |
-
print(f'@everywhere include("{
|
406 |
print(f'fullRun({niterations:d}, npop={npop:d}, ncyclesperiteration={ncyclesperiteration:d}, fractionReplaced={fractionReplaced:f}f0, verbosity=round(Int32, {verbosity:f}), topn={topn:d})', file=f)
|
407 |
print(f'rmprocs(nprocs)', file=f)
|
408 |
|
@@ -410,7 +431,7 @@ const varMap = {'["' + '", "'.join(variable_names) + '"]'}"""
|
|
410 |
command = [
|
411 |
f'julia', f'-O{julia_optimization:d}',
|
412 |
f'-p', f'{procs}',
|
413 |
-
|
414 |
]
|
415 |
if timeout is not None:
|
416 |
command = [f'timeout', f'{timeout}'] + command
|
@@ -439,6 +460,9 @@ const varMap = {'["' + '", "'.join(variable_names) + '"]'}"""
|
|
439 |
print("Killing process... will return when done.")
|
440 |
process.kill()
|
441 |
|
|
|
|
|
|
|
442 |
return get_hof()
|
443 |
|
444 |
|
@@ -550,4 +574,8 @@ def best_callable(equations=None):
|
|
550 |
if equations is None: equations = get_hof()
|
551 |
return best_row(equations)['lambda_format']
|
552 |
|
553 |
-
|
|
|
|
|
|
|
|
|
|
7 |
import sympy
|
8 |
from sympy import sympify, Symbol, lambdify
|
9 |
import subprocess
|
10 |
+
import tempfile
|
11 |
+
import shutil
|
12 |
+
from pathlib import Path
|
13 |
+
from datetime import datetime
|
14 |
+
|
15 |
|
16 |
global_equation_file = 'hall_of_fame.csv'
|
17 |
global_n_features = None
|
|
|
84 |
nrestarts=3,
|
85 |
timeout=None,
|
86 |
extra_sympy_mappings={},
|
87 |
+
equation_file=None,
|
88 |
test='simple1',
|
89 |
verbosity=1e9,
|
90 |
maxsize=20,
|
|
|
97 |
warmupMaxsize=0,
|
98 |
constraints={},
|
99 |
useFrequency=False,
|
100 |
+
tempdir=None,
|
101 |
+
delete_tempfiles=True,
|
102 |
limitPowComplexity=False, #deprecated
|
103 |
threads=None, #deprecated
|
104 |
julia_optimization=3,
|
|
|
185 |
and use that instead of parsimony to explore equation space. Will
|
186 |
naturally find equations of all complexities.
|
187 |
:param julia_optimization: int, Optimization level (0, 1, 2, 3)
|
188 |
+
:param tempdir: str or None, directory for the temporary files
|
189 |
+
:param delete_tempfiles: bool, whether to delete the temporary files after finishing
|
190 |
:returns: pd.DataFrame, Results dataframe, giving complexity, MSE, and equations
|
191 |
(as strings).
|
192 |
|
|
|
197 |
raise ValueError("The limitPowComplexity kwarg is deprecated. Use constraints.")
|
198 |
if maxdepth is None:
|
199 |
maxdepth = maxsize
|
200 |
+
if equation_file is None:
|
201 |
+
date_time = datetime.now().strftime("%Y-%m-%d_%H%M%S.%f")[:-3]
|
202 |
+
equation_file = 'hall_of_fame_' + date_time + '.csv'
|
203 |
|
204 |
if isinstance(X, pd.DataFrame):
|
205 |
variable_names = list(X.columns)
|
|
|
227 |
X = X[:, selection]
|
228 |
|
229 |
if use_custom_variable_names:
|
230 |
+
variable_names = [variable_names[selection[i]] for i in range(len(selection))]
|
231 |
|
232 |
if populations is None:
|
233 |
populations = procs
|
234 |
|
|
|
|
|
235 |
if isinstance(binary_operators, str): binary_operators = [binary_operators]
|
236 |
if isinstance(unary_operators, str): unary_operators = [unary_operators]
|
237 |
|
|
|
251 |
y = eval(eval_str)
|
252 |
print("Running on", eval_str)
|
253 |
|
254 |
+
# System-independent paths
|
255 |
+
pkg_directory = Path(__file__).parents[1] / 'julia'
|
256 |
+
pkg_filename = pkg_directory / "sr.jl"
|
257 |
+
operator_filename = pkg_directory / "operators.jl"
|
258 |
+
|
259 |
+
tmpdir = Path(tempfile.mkdtemp(dir=tempdir))
|
260 |
+
hyperparam_filename = tmpdir / f'hyperparams.jl'
|
261 |
+
dataset_filename = tmpdir / f'dataset.jl'
|
262 |
+
runfile_filename = tmpdir / f'runfile.jl'
|
263 |
+
X_filename = tmpdir / "X.csv"
|
264 |
+
y_filename = tmpdir / "y.csv"
|
265 |
+
weights_filename = tmpdir / "weights.csv"
|
266 |
|
267 |
def_hyperparams = ""
|
268 |
|
|
|
294 |
elif op == 'mult':
|
295 |
# Make sure the complex expression is in the left side.
|
296 |
if constraints[op][0] == -1:
|
297 |
+
continue
|
298 |
elif constraints[op][1] == -1 or constraints[op][0] < constraints[op][1]:
|
299 |
constraints[op][0], constraints[op][1] = constraints[op][1], constraints[op][0]
|
300 |
|
|
|
319 |
first = False
|
320 |
constraints_str += "]"
|
321 |
|
322 |
+
def_hyperparams += f"""include("{_escape_filename(operator_filename)}")
|
|
|
323 |
{constraints_str}
|
324 |
const binops = {'[' + ', '.join(binary_operators) + ']'}
|
325 |
const unaops = {'[' + ', '.join(unary_operators) + ']'}
|
|
|
395 |
|
396 |
def_hyperparams += op_runner
|
397 |
|
398 |
+
def_datasets = """using DelimitedFiles"""
|
|
|
|
|
|
|
|
|
399 |
|
400 |
+
np.savetxt(X_filename, X, delimiter=',')
|
401 |
+
np.savetxt(y_filename, y, delimiter=',')
|
402 |
+
if weights is not None:
|
403 |
+
np.savetxt(weights_filename, weights, delimiter=',')
|
404 |
+
|
405 |
+
def_datasets += f"""
|
406 |
+
const X = readdlm("{_escape_filename(X_filename)}", ',', Float32, '\\n')
|
407 |
+
const y = readdlm("{_escape_filename(y_filename)}", ',', Float32, '\\n')"""
|
408 |
|
409 |
if weights is not None:
|
410 |
+
def_datasets += f"""
|
411 |
+
const weights = readdlm("{_escape_filename(weights_filename)}", ',', Float32, '\\n')"""
|
|
|
412 |
|
413 |
if use_custom_variable_names:
|
414 |
def_hyperparams += f"""
|
415 |
const varMap = {'["' + '", "'.join(variable_names) + '"]'}"""
|
416 |
|
417 |
+
with open(hyperparam_filename, 'w') as f:
|
418 |
print(def_hyperparams, file=f)
|
419 |
|
420 |
+
with open(dataset_filename, 'w') as f:
|
421 |
print(def_datasets, file=f)
|
422 |
|
423 |
+
with open(runfile_filename, 'w') as f:
|
424 |
+
print(f'@everywhere include("{_escape_filename(hyperparam_filename)}")', file=f)
|
425 |
+
print(f'@everywhere include("{_escape_filename(dataset_filename)}")', file=f)
|
426 |
+
print(f'@everywhere include("{_escape_filename(pkg_filename)}")', file=f)
|
427 |
print(f'fullRun({niterations:d}, npop={npop:d}, ncyclesperiteration={ncyclesperiteration:d}, fractionReplaced={fractionReplaced:f}f0, verbosity=round(Int32, {verbosity:f}), topn={topn:d})', file=f)
|
428 |
print(f'rmprocs(nprocs)', file=f)
|
429 |
|
|
|
431 |
command = [
|
432 |
f'julia', f'-O{julia_optimization:d}',
|
433 |
f'-p', f'{procs}',
|
434 |
+
str(runfile_filename),
|
435 |
]
|
436 |
if timeout is not None:
|
437 |
command = [f'timeout', f'{timeout}'] + command
|
|
|
460 |
print("Killing process... will return when done.")
|
461 |
process.kill()
|
462 |
|
463 |
+
if delete_tempfiles:
|
464 |
+
shutil.rmtree(tmpdir)
|
465 |
+
|
466 |
return get_hof()
|
467 |
|
468 |
|
|
|
574 |
if equations is None: equations = get_hof()
|
575 |
return best_row(equations)['lambda_format']
|
576 |
|
577 |
+
def _escape_filename(filename):
|
578 |
+
"""Turns a file into a string representation with correctly escaped backslashes"""
|
579 |
+
repr = str(filename)
|
580 |
+
repr = repr.replace('\\', '\\\\')
|
581 |
+
return repr
|
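The changes to `pysr/sr.py` replace inline array literals with CSV files written to a temporary directory, and escape backslashes in paths before embedding them in generated Julia source. The sketch below walks through that flow in isolation, under stated assumptions: it writes only `X.csv`, reads it back with NumPy instead of launching Julia, and uses a made-up Windows path to show the escaping.

```python
import shutil
import tempfile
from pathlib import Path, PureWindowsPath

import numpy as np


def _escape_filename(filename):
    """Double every backslash so the path survives inside a Julia string literal."""
    return str(filename).replace("\\", "\\\\")


X = np.random.randn(5, 2)
tmpdir = Path(tempfile.mkdtemp())
try:
    X_filename = tmpdir / "X.csv"
    np.savetxt(X_filename, X, delimiter=",")
    # The line of Julia source that would be written into dataset.jl.
    julia_line = f"""const X = readdlm("{_escape_filename(X_filename)}", ',', Float32, '\\n')"""
    # Read the file back here in place of running Julia.
    X_back = np.loadtxt(X_filename, delimiter=",")
finally:
    # Mirrors delete_tempfiles=True: remove the whole temporary directory.
    shutil.rmtree(tmpdir)

# Illustrative Windows path: each single backslash becomes two.
windows_example = _escape_filename(PureWindowsPath(r"C:\Users\me\X.csv"))
print(julia_line)
print(windows_example)
```

Without the escaping step, a Windows path such as the one above would contain sequences like `\U` that Julia would try to interpret as escape codes inside the string literal.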
setup.py
CHANGED
@@ -5,7 +5,7 @@ with open("README.md", "r") as fh:
|
|
5 |
|
6 |
setuptools.setup(
|
7 |
name="pysr", # Replace with your own username
|
8 |
-
version="0.3.
|
9 |
author="Miles Cranmer",
|
10 |
author_email="[email protected]",
|
11 |
description="Simple and efficient symbolic regression",
|
|
|
5 |
|
6 |
setuptools.setup(
|
7 |
name="pysr", # Replace with your own username
|
8 |
+
version="0.3.36",
|
9 |
author="Miles Cranmer",
|
10 |
author_email="[email protected]",
|
11 |
description="Simple and efficient symbolic regression",
|
test/travis.sh
DELETED
@@ -1,5 +0,0 @@
|
|
1 |
-
#!/bin/bash
|
2 |
-
sudo python3 -m pip install numpy pandas &&
|
3 |
-
sudo python3 setup.py install &&
|
4 |
-
python3 test/test.py
|
5 |
-
|