Dhananjay Ashok committed on
Commit 7be8652 · unverified · 2 Parent(s): 70dedf9 1175632

Merge pull request #1 from MilesCranmer/master

.gitignore CHANGED
@@ -1,6 +1,7 @@
  .dataset*.jl
  .hyperparams*.jl
  *.csv
+ *.bkup
  performance*txt
  *.out
  trials*
.travis.yml CHANGED
@@ -1,20 +1,32 @@
  language: julia
- os: linux
- dist: bionic
-
  julia:
    - 1

- addons:
-   apt:
-     packages:
-       - python3-pip
-       - python3-setuptools
+ jobs:
+   include:
+     - name: "Linux"
+       os: linux
+       dist: bionic
+       before_install: sudo apt-get -y install python3-pip python3-setuptools
+       env: PY=python3 SETUPPREFIX="--user"
+     - name: "macOS"
+       os: osx
+       before_install: python3 --version; pip3 --version; sw_vers
+       env: PY=python3
+     - name: "Windows"
+       os: windows
+       before_install:
+         - choco install python --version 3.8.0
+         - python -m pip install --upgrade pip
+       env: PATH=/c/Python38:/c/Python38/Scripts:$PATH PY=python

+ install: pip3 install --upgrade pip

  before_script:
-   - export PATH=$HOME/.local/bin:$PATH
+   - julia --color=yes -e 'using Pkg; pkg"add Optim; add SpecialFunctions; precompile;"'

  script:
-   - julia --color=yes -e 'import Pkg; Pkg.add("Optim"); Pkg.add("SpecialFunctions")'
-   - ./test/travis.sh
+   - pip3 install numpy pandas
+   - $PY setup.py install $SETUPPREFIX
+   - PATH=$HOME/.local/bin:$PATH $PY test/test.py
+
README.md CHANGED
@@ -1,5 +1,7 @@
  # [PySR.jl](https://github.com/MilesCranmer/PySR)

+ (pronounced like *py* as in python, and then *sur* as in surface)
+
  [![Documentation Status](https://readthedocs.org/projects/pysr/badge/?version=latest)](https://pysr.readthedocs.io/en/latest/?badge=latest)
  [![PyPI version](https://badge.fury.io/py/pysr.svg)](https://badge.fury.io/py/pysr)
  [![Build Status](https://travis-ci.com/MilesCranmer/PySR.svg?branch=master)](https://travis-ci.com/MilesCranmer/PySR)
@@ -47,10 +49,11 @@ then instructions for [mac](https://julialang.org/downloads/platform/#macos)
  and [linux](https://julialang.org/downloads/platform/#linux_and_freebsd).
  (Don't use the `conda-forge` version; it doesn't seem to work properly.)
  Then, at the command line,
- install the `Optim` and `SpecialFunctions` packages via:
+ install and precompile the `Optim` and `SpecialFunctions`
+ packages via:

  ```bash
- julia -e 'import Pkg; Pkg.add("Optim"); Pkg.add("SpecialFunctions")'
+ julia -e 'using Pkg; pkg"add Optim; add SpecialFunctions; precompile;"'
  ```

  For python, you need to have Python 3, numpy, sympy, and pandas installed.
@@ -73,8 +76,10 @@ y = 2*np.cos(X[:, 3]) + X[:, 0]**2 - 2

  # Learn equations
  equations = pysr(X, y, niterations=5,
-         binary_operators=["plus", "mult"],
-         unary_operators=["cos", "exp", "sin"])
+         binary_operators=["plus", "mult"],
+         unary_operators=[
+           "cos", "exp", "sin", #Pre-defined library of operators (see https://pysr.readthedocs.io/en/latest/docs/operators/)
+           "inv(x) = 1/x"]) # Define your own operator! (Julia syntax)

  ...# (you can use ctl-c to exit early)

TODO.md CHANGED
@@ -58,19 +58,23 @@
  - [x] Consider printing output sorted by score, not by complexity.
  - [x] Increase max complexity slowly over time up to the actual max.
  - [x] Record density over complexity. Favor equations that have a density we have not explored yet. Want the final density to be evenly distributed.
+ - [x] Do printing from Python side. Then we can do simplification and pretty-printing.
+ - [x] Sympy printing
  - [ ] Sort these todo lists by priority

  ## Feature ideas

- - [ ] Do printing from Python side. Then we can do simplification and pretty-printing.
+ - [ ] Other default losses (e.g., abs, other likelihoods, or just allow user to pass this as a string).
+ - [ ] Other dtypes available
+ - [ ] NDSA-II
  - [ ] Cross-validation
- - [ ] Sympy printing
  - [ ] Hierarchical model, so can re-use functional forms. Output of one equation goes into second equation?
  - [ ] Add function to plot equations
  - [ ] Refresh screen rather than dumping to stdout?
  - [ ] Add ability to save state from python
  - [ ] Additional degree operators?
  - [ ] Multi targets (vector ops). Idea 1: Node struct contains argument for which registers it is applied to. Then, can work with multiple components simultaneously. Though this may be tricky to get right. Idea 2: each op is defined by input/output space. Some operators are flexible, and the spaces should be adjusted automatically. Otherwise, only consider ops that make a tree possible. But will need additional ops here to get it to work. Idea 3: define each equation in 2 parts: one part that is shared between all outputs, and one that is different between all outputs. Maybe this could be an array of nodes corresponding to each output. And those nodes would define their functions.
+     - Much easier option: simply flatten the output vector, and set the index as another input feature. The equation learned will be a single equation containing indices as a feature.
  - [ ] Tree crossover? I.e., can take as input a part of the same equation, so long as it is the same level or below?
  - [ ] Create flexible way of providing "simplification recipes." I.e., plus(plus(T, C), C) => plus(T, +(C, C)). The user could pass these.
  - [ ] Consider allowing multi-threading turned off, for faster testing (cache issue on travis). Or could simply fix the caching issue there.
@@ -100,6 +104,7 @@

  - [ ] How hard is it to turn the recursive array evaluation into a for loop?
  - [ ] Try defining a binary tree as an array, rather than a linked list. See https://stackoverflow.com/a/6384714/2689923
+     - in array branch
  - [ ] Add true multi-node processing, with MPI, or just file sharing. Multiple populations per core.
      - Ongoing in cluster branch
  - [ ] Performance: try inlining things?
hyperparamopt.py → benchmarks/hyperparamopt.py RENAMED
@@ -34,58 +34,46 @@ def run_trial(args):
34
  """
35
 
36
  print("Running on", args)
37
- for key in 'niterations npop'.split(' '):
38
- args[key] = int(args[key])
39
-
40
-
41
- total_steps = 10*100*1000
42
- niterations = args['niterations']
43
- npop = args['npop']
44
- if niterations == 0 or npop == 0:
45
- print("Bad parameters")
46
- return {'status': 'ok', 'loss': np.inf}
47
-
48
- args['ncyclesperiteration'] = int(total_steps / (niterations * npop))
49
  args['topn'] = 10
50
- args['parsimony'] = 1e-3
 
51
  args['annealing'] = True
52
 
53
  if args['npop'] < 20 or args['ncyclesperiteration'] < 3:
54
  print("Bad parameters")
55
  return {'status': 'ok', 'loss': np.inf}
56
 
57
-
58
  args['weightDoNothing'] = 1.0
59
-
60
- maxTime = 30
61
- ntrials = 2
62
- equation_file = f'.hall_of_fame_{np.random.rand():f}.csv'
63
 
64
  with temp_seed(0):
65
- X = np.random.randn(100, 5)*3
66
 
67
- eval_str = ["np.sign(X[:, 2])*np.abs(X[:, 2])**2.5 + 5*np.cos(X[:, 3]) - 5",
68
- "np.sign(X[:, 2])*np.abs(X[:, 2])**3.5 + 1/(np.abs(X[:, 0])+1)",
69
  "np.exp(X[:, 0]/2) + 12.0 + np.log(np.abs(X[:, 0])*10 + 1)",
70
- "1.0 + 3*X[:, 0]**2 - 0.5*X[:, 0]**3 + 0.1*X[:, 0]**4",
71
- "(np.exp(X[:, 3]) + 3)/(np.abs(X[:, 1]) + np.cos(X[:, 0]) + 1.1)"]
 
72
 
73
  print(f"Starting", str(args))
74
  try:
75
  trials = []
76
- for i in range(3, 6):
77
  print(f"Starting test {i}")
78
  for j in range(ntrials):
79
  print(f"Starting trial {j}")
80
- trial = pysr.pysr(
81
- test=f"simple{i}",
82
  procs=4,
 
83
  binary_operators=["plus", "mult", "pow", "div"],
84
- unary_operators=["cos", "exp", "sin", "loga", "abs"],
85
- equation_file=equation_file,
86
- timeout=maxTime,
87
  maxsize=25,
88
- verbosity=0,
89
  **args)
90
  if len(trial) == 0: raise ValueError
91
  trials.append(
@@ -109,8 +97,6 @@ def run_trial(args):
109
 
110
 
111
  space = {
112
- 'niterations': hp.qlognormal('niterations', np.log(10), 1.0, 1),
113
- 'npop': hp.qlognormal('npop', np.log(100), 1.0, 1),
114
  'alpha': hp.lognormal('alpha', np.log(10.0), 1.0),
115
  'fractionReplacedHof': hp.lognormal('fractionReplacedHof', np.log(0.1), 1.0),
116
  'fractionReplaced': hp.lognormal('fractionReplaced', np.log(0.1), 1.0),
@@ -126,8 +112,6 @@ space = {
126
 
127
  ################################################################################
128
 
129
-
130
-
131
  def merge_trials(trials1, trials2_slice):
132
  """Merge two hyperopt trials objects
133
 
 
34
  """
35
 
36
  print("Running on", args)
37
+ args['niterations'] = 100
38
+ args['npop'] = 100
39
+ args['ncyclesperiteration'] = 1000
 
 
 
 
 
 
 
 
 
40
  args['topn'] = 10
41
+ args['parsimony'] = 0.0
42
+ args['useFrequency'] = True
43
  args['annealing'] = True
44
 
45
  if args['npop'] < 20 or args['ncyclesperiteration'] < 3:
46
  print("Bad parameters")
47
  return {'status': 'ok', 'loss': np.inf}
48
 
 
49
  args['weightDoNothing'] = 1.0
50
+ ntrials = 3
 
 
 
51
 
52
  with temp_seed(0):
53
+ X = np.random.randn(100, 10)*3
54
 
55
+ eval_str = [
56
+ "np.sign(X[:, 2])*np.abs(X[:, 2])**2.5 + 5*np.cos(X[:, 3]) - 5",
57
  "np.exp(X[:, 0]/2) + 12.0 + np.log(np.abs(X[:, 0])*10 + 1)",
58
+ "(np.exp(X[:, 3]) + 3)/(np.abs(X[:, 1]) + np.cos(X[:, 0]) + 1.1)",
59
+ "X[:, 0] * np.sin(2*np.pi * (X[:, 1] * X[:, 2] - X[:, 3] / X[:, 4])) + 3.0"
60
+ ]
61
 
62
  print(f"Starting", str(args))
63
  try:
64
  trials = []
65
+ for i in range(len(eval_str)):
66
  print(f"Starting test {i}")
67
  for j in range(ntrials):
68
  print(f"Starting trial {j}")
69
+ y = eval(eval_str[i])
70
+ trial = pysr.pysr(X, y,
71
  procs=4,
72
+ populations=20,
73
  binary_operators=["plus", "mult", "pow", "div"],
74
+ unary_operators=["cos", "exp", "sin", "logm", "abs"],
 
 
75
  maxsize=25,
76
+ constraints={'pow': (-1, 1)},
77
  **args)
78
  if len(trial) == 0: raise ValueError
79
  trials.append(
 
97
 
98
 
99
  space = {
 
 
100
  'alpha': hp.lognormal('alpha', np.log(10.0), 1.0),
101
  'fractionReplacedHof': hp.lognormal('fractionReplacedHof', np.log(0.1), 1.0),
102
  'fractionReplaced': hp.lognormal('fractionReplaced', np.log(0.1), 1.0),
 
112
 
113
  ################################################################################
114
 
 
 
115
  def merge_trials(trials1, trials2_slice):
116
  """Merge two hyperopt trials objects
117
 
docs/options.md CHANGED
@@ -22,7 +22,7 @@ These are described below
  The program will output a pandas DataFrame containing the equations,
  mean square error, and complexity. It will also dump to a csv
  at the end of every iteration,
- which is `hall_of_fame.csv` by default. It also prints the
+ which is `hall_of_fame_{date_time}.csv` by default. It also prints the
  equations to stdout.

  ## Operators
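Note on the new default file name: the sketch below shows how a timestamped name of the form `hall_of_fame_{date_time}.csv` can be produced. It reuses the `strftime` pattern that this commit adds to `pysr/sr.py`; the helper name `default_equation_file` and its `now` parameter are hypothetical, introduced only so the result can be reproduced with a fixed time.

```python
from datetime import datetime


def default_equation_file(now=None):
    """Build a timestamped CSV name (hypothetical helper, not part of PySR)."""
    if now is None:
        now = datetime.now()
    # %f yields six microsecond digits; dropping the last three keeps milliseconds.
    date_time = now.strftime("%Y-%m-%d_%H%M%S.%f")[:-3]
    return "hall_of_fame_" + date_time + ".csv"


if __name__ == "__main__":
    # A fixed timestamp makes the output deterministic.
    print(default_equation_file(datetime(2020, 9, 28, 12, 34, 56, 789012)))
```

Because the name is resolved at call time, two runs started in the same directory should no longer overwrite each other's results unless they start within the same millisecond.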
julia/sr.jl CHANGED
@@ -1086,7 +1086,12 @@ function fullRun(niterations::Integer;
  end
  println("Started!")
  cycles_complete = npopulations * niterations
- curmaxsize += 1
+ if warmupMaxsize != 0
+     curmaxsize += 1
+     if curmaxsize > maxsize
+         curmaxsize = maxsize
+     end
+ end

  last_print_time = time()
  num_equations = 0.0
@@ -1212,15 +1217,19 @@ function fullRun(niterations::Integer;
  deleteat!(equation_speed, 1)
  end
  average_speed = sum(equation_speed)/length(equation_speed)
- @printf("\n")
- @printf("Cycles per second: %.3e\n", round(average_speed, sigdigits=3))
- @printf("Hall of Fame:\n")
- @printf("-----------------------------------------\n")
- @printf("%-10s %-8s %-8s %-8s\n", "Complexity", "MSE", "Score", "Equation")
  curMSE = baselineMSE
- @printf("%-10d %-8.3e %-8.3e %-.f\n", 0, curMSE, 0f0, avgy)
  lastMSE = curMSE
  lastComplexity = 0
+ if verbosity > 0
+     @printf("\n")
+     @printf("Cycles per second: %.3e\n", round(average_speed, sigdigits=3))
+     cycles_elapsed = npopulations * niterations - cycles_complete
+     @printf("Progress: %d / %d total iterations (%.3f%%)\n", cycles_elapsed, npopulations * niterations, 100.0*cycles_elapsed/(npopulations*niterations))
+     @printf("Hall of Fame:\n")
+     @printf("-----------------------------------------\n")
+     @printf("%-10s %-8s %-8s %-8s\n", "Complexity", "MSE", "Score", "Equation")
+     @printf("%-10d %-8.3e %-8.3e %-.f\n", 0, curMSE, 0f0, avgy)
+ end

  for size=1:actualMaxsize
  if hallOfFame.exists[size]
@@ -1246,7 +1255,9 @@ function fullRun(niterations::Integer;
  delta_c = size - lastComplexity
  delta_l_mse = log(curMSE/lastMSE)
  score = convert(Float32, -delta_l_mse/delta_c)
- @printf("%-10d %-8.3e %-8.3e %-s\n" , size, curMSE, score, stringTree(member.tree))
+ if verbosity > 0
+     @printf("%-10d %-8.3e %-8.3e %-s\n" , size, curMSE, score, stringTree(member.tree))
+ end
  lastMSE = curMSE
  lastComplexity = size
  end
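Note on the score column: the last hunk computes each hall-of-fame score as the drop in log-MSE per unit of added complexity, and now only prints the row when `verbosity > 0`. The following Python sketch restates just the lines visible in this diff. The function name `hall_of_fame_scores` and the `(complexity, mse)` input layout are assumptions made for illustration, and any filtering done in the Julia lines not shown in the hunk is omitted.

```python
import math


def hall_of_fame_scores(entries, baseline_mse, verbosity=0):
    """Return (complexity, mse, score) rows for entries sorted by complexity.

    `entries` is a list of (complexity, mse) pairs; this layout is an
    assumption of the sketch, not the Julia data structure.
    """
    rows = []
    last_mse = baseline_mse
    last_complexity = 0
    for complexity, mse in entries:
        delta_c = complexity - last_complexity
        delta_l_mse = math.log(mse / last_mse)
        score = -delta_l_mse / delta_c
        if verbosity > 0:  # printing is gated, the bookkeeping below is not
            print(f"{complexity:<10d} {mse:<8.3e} {score:<8.3e}")
        rows.append((complexity, mse, score))
        last_mse = mse
        last_complexity = complexity
    return rows


if __name__ == "__main__":
    hall_of_fame_scores([(1, 0.5), (3, 0.125)], baseline_mse=1.0, verbosity=1)
```

Keeping the `lastMSE` and `lastComplexity` updates outside the `verbosity` check matters: the scores depend on them whether or not anything is printed.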
pysr/sr.py CHANGED
@@ -7,6 +7,11 @@ import pandas as pd
7
  import sympy
8
  from sympy import sympify, Symbol, lambdify
9
  import subprocess
 
 
 
 
 
10
 
11
  global_equation_file = 'hall_of_fame.csv'
12
  global_n_features = None
@@ -79,7 +84,7 @@ def pysr(X=None, y=None, weights=None,
79
  nrestarts=3,
80
  timeout=None,
81
  extra_sympy_mappings={},
82
- equation_file='hall_of_fame.csv',
83
  test='simple1',
84
  verbosity=1e9,
85
  maxsize=20,
@@ -92,6 +97,8 @@ def pysr(X=None, y=None, weights=None,
92
  warmupMaxsize=0,
93
  constraints={},
94
  useFrequency=False,
 
 
95
  limitPowComplexity=False, #deprecated
96
  threads=None, #deprecated
97
  julia_optimization=3,
@@ -178,6 +185,8 @@ def pysr(X=None, y=None, weights=None,
178
  and use that instead of parsimony to explore equation space. Will
179
  naturally find equations of all complexities.
180
  :param julia_optimization: int, Optimization level (0, 1, 2, 3)
 
 
181
  :returns: pd.DataFrame, Results dataframe, giving complexity, MSE, and equations
182
  (as strings).
183
 
@@ -188,6 +197,9 @@ def pysr(X=None, y=None, weights=None,
188
  raise ValueError("The limitPowComplexity kwarg is deprecated. Use constraints.")
189
  if maxdepth is None:
190
  maxdepth = maxsize
 
 
 
191
 
192
  if isinstance(X, pd.DataFrame):
193
  variable_names = list(X.columns)
@@ -215,13 +227,11 @@ def pysr(X=None, y=None, weights=None,
215
  X = X[:, selection]
216
 
217
  if use_custom_variable_names:
218
- variable_names = variable_names[selection]
219
 
220
  if populations is None:
221
  populations = procs
222
 
223
- rand_string = f'{"".join([str(np.random.rand())[2] for i in range(20)])}'
224
-
225
  if isinstance(binary_operators, str): binary_operators = [binary_operators]
226
  if isinstance(unary_operators, str): unary_operators = [unary_operators]
227
 
@@ -241,7 +251,18 @@ def pysr(X=None, y=None, weights=None,
241
  y = eval(eval_str)
242
  print("Running on", eval_str)
243
 
244
- pkg_directory = '/'.join(__file__.split('/')[:-2] + ['julia'])
 
 
 
 
 
 
 
 
 
 
 
245
 
246
  def_hyperparams = ""
247
 
@@ -273,7 +294,7 @@ def pysr(X=None, y=None, weights=None,
273
  elif op == 'mult':
274
  # Make sure the complex expression is in the left side.
275
  if constraints[op][0] == -1:
276
- continue
277
  elif constraints[op][1] == -1 or constraints[op][0] < constraints[op][1]:
278
  constraints[op][0], constraints[op][1] = constraints[op][1], constraints[op][0]
279
 
@@ -298,8 +319,7 @@ const bin_constraints = ["""
298
  first = False
299
  constraints_str += "]"
300
 
301
-
302
- def_hyperparams += f"""include("{pkg_directory}/operators.jl")
303
  {constraints_str}
304
  const binops = {'[' + ', '.join(binary_operators) + ']'}
305
  const unaops = {'[' + ', '.join(unary_operators) + ']'}
@@ -375,34 +395,35 @@ end"""
375
 
376
  def_hyperparams += op_runner
377
 
378
- if X.shape[1] == 1:
379
- X_str = 'transpose([' + str(X.tolist()).replace(']', '').replace(',', '').replace('[', '') + '])'
380
- else:
381
- X_str = str(X.tolist()).replace('],', '];').replace(',', '')
382
- y_str = str(y.tolist())
383
 
384
- def_datasets = """const X = convert(Array{Float32, 2}, """f"{X_str})""""
385
- const y = convert(Array{Float32, 1}, """f"{y_str})"
 
 
 
 
 
 
386
 
387
  if weights is not None:
388
- weight_str = str(weights.tolist())
389
- def_datasets += """
390
- const weights = convert(Array{Float32, 1}, """f"{weight_str})"
391
 
392
  if use_custom_variable_names:
393
  def_hyperparams += f"""
394
  const varMap = {'["' + '", "'.join(variable_names) + '"]'}"""
395
 
396
- with open(f'/tmp/.hyperparams_{rand_string}.jl', 'w') as f:
397
  print(def_hyperparams, file=f)
398
 
399
- with open(f'/tmp/.dataset_{rand_string}.jl', 'w') as f:
400
  print(def_datasets, file=f)
401
 
402
- with open(f'/tmp/.runfile_{rand_string}.jl', 'w') as f:
403
- print(f'@everywhere include("/tmp/.hyperparams_{rand_string}.jl")', file=f)
404
- print(f'@everywhere include("/tmp/.dataset_{rand_string}.jl")', file=f)
405
- print(f'@everywhere include("{pkg_directory}/sr.jl")', file=f)
406
  print(f'fullRun({niterations:d}, npop={npop:d}, ncyclesperiteration={ncyclesperiteration:d}, fractionReplaced={fractionReplaced:f}f0, verbosity=round(Int32, {verbosity:f}), topn={topn:d})', file=f)
407
  print(f'rmprocs(nprocs)', file=f)
408
 
@@ -410,7 +431,7 @@ const varMap = {'["' + '", "'.join(variable_names) + '"]'}"""
410
  command = [
411
  f'julia', f'-O{julia_optimization:d}',
412
  f'-p', f'{procs}',
413
- f'/tmp/.runfile_{rand_string}.jl',
414
  ]
415
  if timeout is not None:
416
  command = [f'timeout', f'{timeout}'] + command
@@ -439,6 +460,9 @@ const varMap = {'["' + '", "'.join(variable_names) + '"]'}"""
439
  print("Killing process... will return when done.")
440
  process.kill()
441
 
 
 
 
442
  return get_hof()
443
 
444
 
@@ -550,4 +574,8 @@ def best_callable(equations=None):
550
  if equations is None: equations = get_hof()
551
  return best_row(equations)['lambda_format']
552
 
553
-
 
 
 
 
 
7
  import sympy
8
  from sympy import sympify, Symbol, lambdify
9
  import subprocess
10
+ import tempfile
11
+ import shutil
12
+ from pathlib import Path
13
+ from datetime import datetime
14
+
15
 
16
  global_equation_file = 'hall_of_fame.csv'
17
  global_n_features = None
 
84
  nrestarts=3,
85
  timeout=None,
86
  extra_sympy_mappings={},
87
+ equation_file=None,
88
  test='simple1',
89
  verbosity=1e9,
90
  maxsize=20,
 
97
  warmupMaxsize=0,
98
  constraints={},
99
  useFrequency=False,
100
+ tempdir=None,
101
+ delete_tempfiles=True,
102
  limitPowComplexity=False, #deprecated
103
  threads=None, #deprecated
104
  julia_optimization=3,
 
185
  and use that instead of parsimony to explore equation space. Will
186
  naturally find equations of all complexities.
187
  :param julia_optimization: int, Optimization level (0, 1, 2, 3)
188
+ :param tempdir: str or None, directory for the temporary files
189
+ :param delete_tempfiles: bool, whether to delete the temporary files after finishing
190
  :returns: pd.DataFrame, Results dataframe, giving complexity, MSE, and equations
191
  (as strings).
192
 
 
197
  raise ValueError("The limitPowComplexity kwarg is deprecated. Use constraints.")
198
  if maxdepth is None:
199
  maxdepth = maxsize
200
+ if equation_file is None:
201
+ date_time = datetime.now().strftime("%Y-%m-%d_%H%M%S.%f")[:-3]
202
+ equation_file = 'hall_of_fame_' + date_time + '.csv'
203
 
204
  if isinstance(X, pd.DataFrame):
205
  variable_names = list(X.columns)
 
227
  X = X[:, selection]
228
 
229
  if use_custom_variable_names:
230
+ variable_names = [variable_names[selection[i]] for i in range(len(selection))]
231
 
232
  if populations is None:
233
  populations = procs
234
 
 
 
235
  if isinstance(binary_operators, str): binary_operators = [binary_operators]
236
  if isinstance(unary_operators, str): unary_operators = [unary_operators]
237
 
 
251
  y = eval(eval_str)
252
  print("Running on", eval_str)
253
 
254
+ # System-independent paths
255
+ pkg_directory = Path(__file__).parents[1] / 'julia'
256
+ pkg_filename = pkg_directory / "sr.jl"
257
+ operator_filename = pkg_directory / "operators.jl"
258
+
259
+ tmpdir = Path(tempfile.mkdtemp(dir=tempdir))
260
+ hyperparam_filename = tmpdir / f'hyperparams.jl'
261
+ dataset_filename = tmpdir / f'dataset.jl'
262
+ runfile_filename = tmpdir / f'runfile.jl'
263
+ X_filename = tmpdir / "X.csv"
264
+ y_filename = tmpdir / "y.csv"
265
+ weights_filename = tmpdir / "weights.csv"
266
 
267
  def_hyperparams = ""
268
 
 
294
  elif op == 'mult':
295
  # Make sure the complex expression is in the left side.
296
  if constraints[op][0] == -1:
297
+ continue
298
  elif constraints[op][1] == -1 or constraints[op][0] < constraints[op][1]:
299
  constraints[op][0], constraints[op][1] = constraints[op][1], constraints[op][0]
300
 
 
319
  first = False
320
  constraints_str += "]"
321
 
322
+ def_hyperparams += f"""include("{_escape_filename(operator_filename)}")
 
323
  {constraints_str}
324
  const binops = {'[' + ', '.join(binary_operators) + ']'}
325
  const unaops = {'[' + ', '.join(unary_operators) + ']'}
 
395
 
396
  def_hyperparams += op_runner
397
 
398
+ def_datasets = """using DelimitedFiles"""
 
 
 
 
399
 
400
+ np.savetxt(X_filename, X, delimiter=',')
401
+ np.savetxt(y_filename, y, delimiter=',')
402
+ if weights is not None:
403
+ np.savetxt(weights_filename, weights, delimiter=',')
404
+
405
+ def_datasets += f"""
406
+ const X = readdlm("{_escape_filename(X_filename)}", ',', Float32, '\\n')
407
+ const y = readdlm("{_escape_filename(y_filename)}", ',', Float32, '\\n')"""
408
 
409
  if weights is not None:
410
+ def_datasets += f"""
411
+ const weights = readdlm("{_escape_filename(weights_filename)}", ',', Float32, '\\n')"""
 
412
 
413
  if use_custom_variable_names:
414
  def_hyperparams += f"""
415
  const varMap = {'["' + '", "'.join(variable_names) + '"]'}"""
416
 
417
+ with open(hyperparam_filename, 'w') as f:
418
  print(def_hyperparams, file=f)
419
 
420
+ with open(dataset_filename, 'w') as f:
421
  print(def_datasets, file=f)
422
 
423
+ with open(runfile_filename, 'w') as f:
424
+ print(f'@everywhere include("{_escape_filename(hyperparam_filename)}")', file=f)
425
+ print(f'@everywhere include("{_escape_filename(dataset_filename)}")', file=f)
426
+ print(f'@everywhere include("{_escape_filename(pkg_filename)}")', file=f)
427
  print(f'fullRun({niterations:d}, npop={npop:d}, ncyclesperiteration={ncyclesperiteration:d}, fractionReplaced={fractionReplaced:f}f0, verbosity=round(Int32, {verbosity:f}), topn={topn:d})', file=f)
428
  print(f'rmprocs(nprocs)', file=f)
429
 
 
431
  command = [
432
  f'julia', f'-O{julia_optimization:d}',
433
  f'-p', f'{procs}',
434
+ str(runfile_filename),
435
  ]
436
  if timeout is not None:
437
  command = [f'timeout', f'{timeout}'] + command
 
460
  print("Killing process... will return when done.")
461
  process.kill()
462
 
463
+ if delete_tempfiles:
464
+ shutil.rmtree(tmpdir)
465
+
466
  return get_hof()
467
 
468
 
 
574
  if equations is None: equations = get_hof()
575
  return best_row(equations)['lambda_format']
576
 
577
+ def _escape_filename(filename):
578
+ """Turns a file into a string representation with correctly escaped backslashes"""
579
+ repr = str(filename)
580
+ repr = repr.replace('\\', '\\\\')
581
+ return repr
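Note on the temporary-file handling: the changes above replace hard-coded `/tmp/...` paths with a per-run directory from `tempfile.mkdtemp`, escape backslashes before embedding paths in generated Julia source, and remove the directory when `delete_tempfiles` is set. The sketch below walks through that flow using only the standard library. The commit itself writes data with `np.savetxt`; the `csv` module stands in here to avoid a NumPy dependency, and the names `write_dataset` and `julia_include_line` are hypothetical.

```python
import csv
import shutil
import tempfile
from pathlib import Path, PureWindowsPath


def escape_filename(filename):
    """Double every backslash so the path survives inside a Julia string literal."""
    return str(filename).replace("\\", "\\\\")


def julia_include_line(path):
    """Render an include statement for a generated Julia run file."""
    return f'@everywhere include("{escape_filename(path)}")'


def write_dataset(rows, tempdir=None):
    """Write rows to X.csv in a fresh temporary directory; return both paths."""
    tmpdir = Path(tempfile.mkdtemp(dir=tempdir))
    x_filename = tmpdir / "X.csv"
    # Stand-in for np.savetxt(X_filename, X, delimiter=',') from the commit.
    with open(x_filename, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return tmpdir, x_filename


if __name__ == "__main__":
    tmpdir, x_filename = write_dataset([[1.0, 2.0], [3.0, 4.0]])
    print(julia_include_line(x_filename))
    # Windows-style paths are where the escaping matters.
    print(julia_include_line(PureWindowsPath("C:/Temp/run/hyperparams.jl")))
    shutil.rmtree(tmpdir)  # mirrors delete_tempfiles=True
```

On POSIX paths the escaping is a no-op, which is why the earlier `/tmp`-only code did not need it.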
setup.py CHANGED
@@ -5,7 +5,7 @@ with open("README.md", "r") as fh:

  setuptools.setup(
      name="pysr", # Replace with your own username
-     version="0.3.28",
+     version="0.3.36",
      author="Miles Cranmer",
      author_email="[email protected]",
      description="Simple and efficient symbolic regression",
test/travis.sh DELETED
@@ -1,5 +0,0 @@
- #!/bin/bash
- sudo python3 -m pip install numpy pandas &&
- sudo python3 setup.py install &&
- python3 test/test.py
-