MilesCranmer/PySR

Dhananjay Ashok committed on Jan 17, 2021

Commit 7be8652 (unverified) · 2 parents: 70dedf9, 1175632

Merge pull request #1 from MilesCranmer/master


Files changed (10)

.gitignore +1 -0
.travis.yml +23 -11
README.md +9 -4
TODO.md +7 -2
hyperparamopt.py → benchmarks/hyperparamopt.py +18 -34
docs/options.md +1 -1
julia/sr.jl +19 -8
pysr/sr.py +54 -26
setup.py +1 -1
test/travis.sh +0 -5

.gitignore CHANGED

@@ -1,6 +1,7 @@
 .dataset*.jl
 .hyperparams*.jl
 *.csv
 performance*txt
 *.out
 trials*

 .dataset*.jl
 .hyperparams*.jl
 *.csv
+*.bkup
 performance*txt
 *.out
 trials*

.travis.yml CHANGED

@@ -1,20 +1,32 @@
 language: julia
-os: linux
-dist: bionic
 julia:
     - 1
-addons:
-    apt:
-        packages:
-            - python3-pip
-            - python3-setuptools
 before_script:
-    - export PATH=$HOME/.local/bin:$PATH
 script:
-    - julia --color=yes -e 'import Pkg; Pkg.add("Optim"); Pkg.add("SpecialFunctions")'
-    - ./test/travis.sh

 language: julia
 julia:
     - 1
+jobs:
+  include:
+    - name: "Linux"
+      os: linux
+      dist: bionic
+      before_install: sudo apt-get -y install python3-pip python3-setuptools
+      env: PY=python3 SETUPPREFIX="--user"
+    - name: "macOS"
+      os: osx
+      before_install: python3 --version; pip3 --version; sw_vers
+      env: PY=python3
+    - name: "Windows"
+      os: windows
+      before_install:
+        - choco install python --version 3.8.0
+        - python -m pip install --upgrade pip
+      env: PATH=/c/Python38:/c/Python38/Scripts:$PATH PY=python
+install: pip3 install --upgrade pip
 before_script:
+    - julia --color=yes -e 'using Pkg; pkg"add Optim; add SpecialFunctions; precompile;"'
 script:
+    - pip3 install numpy pandas
+    - $PY setup.py install $SETUPPREFIX
+    - PATH=$HOME/.local/bin:$PATH $PY test/test.py

README.md CHANGED

@@ -1,5 +1,7 @@
 # [PySR.jl](https://github.com/MilesCranmer/PySR)
 [![Documentation Status](https://readthedocs.org/projects/pysr/badge/?version=latest)](https://pysr.readthedocs.io/en/latest/?badge=latest)
 [![PyPI version](https://badge.fury.io/py/pysr.svg)](https://badge.fury.io/py/pysr)
 [![Build Status](https://travis-ci.com/MilesCranmer/PySR.svg?branch=master)](https://travis-ci.com/MilesCranmer/PySR)
@@ -47,10 +49,11 @@ then instructions for [mac](https://julialang.org/downloads/platform/#macos)
 and [linux](https://julialang.org/downloads/platform/#linux_and_freebsd).
 (Don't use the `conda-forge` version; it doesn't seem to work properly.)
 Then, at the command line,
-install the `Optim` and `SpecialFunctions` packages via:
 ```bash
-julia -e 'import Pkg; Pkg.add("Optim"); Pkg.add("SpecialFunctions")'
 ```
 For python, you need to have Python 3, numpy, sympy, and pandas installed.
@@ -73,8 +76,10 @@ y = 2*np.cos(X[:, 3]) + X[:, 0]**2 - 2
 # Learn equations
 equations = pysr(X, y, niterations=5,
-        binary_operators=["plus", "mult"],
-        unary_operators=["cos", "exp", "sin"])
 ...# (you can use ctl-c to exit early)

 # [PySR.jl](https://github.com/MilesCranmer/PySR)
+(pronounced like *py* as in python, and then *sur* as in surface)
 [![Documentation Status](https://readthedocs.org/projects/pysr/badge/?version=latest)](https://pysr.readthedocs.io/en/latest/?badge=latest)
 [![PyPI version](https://badge.fury.io/py/pysr.svg)](https://badge.fury.io/py/pysr)
 [![Build Status](https://travis-ci.com/MilesCranmer/PySR.svg?branch=master)](https://travis-ci.com/MilesCranmer/PySR)
 and [linux](https://julialang.org/downloads/platform/#linux_and_freebsd).
 (Don't use the `conda-forge` version; it doesn't seem to work properly.)
 Then, at the command line,
+install and precompile the `Optim` and `SpecialFunctions`
+packages via:
 ```bash
+julia -e 'using Pkg; pkg"add Optim; add SpecialFunctions; precompile;"'
 ```
 For python, you need to have Python 3, numpy, sympy, and pandas installed.
 # Learn equations
 equations = pysr(X, y, niterations=5,
+    binary_operators=["plus", "mult"],
+    unary_operators=[
+      "cos", "exp", "sin", #Pre-defined library of operators (see https://pysr.readthedocs.io/en/latest/docs/operators/)
+      "inv(x) = 1/x"]) # Define your own operator! (Julia syntax)
 ...# (you can use ctl-c to exit early)

TODO.md CHANGED

@@ -58,19 +58,23 @@
 - [x] Consider printing output sorted by score, not by complexity.
 - [x] Increase max complexity slowly over time up to the actual max.
 - [x] Record density over complexity. Favor equations that have a density we have not explored yet. Want the final density to be evenly distributed.
 - [ ] Sort these todo lists by priority
 ## Feature ideas
-- [ ] Do printing from Python side. Then we can do simplification and pretty-printing.
 - [ ] Cross-validation
-- [ ] Sympy printing
 - [ ] Hierarchical model, so can re-use functional forms. Output of one equation goes into second equation?
 - [ ] Add function to plot equations
 - [ ] Refresh screen rather than dumping to stdout?
 - [ ] Add ability to save state from python
 - [ ] Additional degree operators?
 - [ ] Multi targets (vector ops). Idea 1: Node struct contains argument for which registers it is applied to. Then, can work with multiple components simultaneously. Though this may be tricky to get right. Idea 2: each op is defined by input/output space. Some operators are flexible, and the spaces should be adjusted automatically. Otherwise, only consider ops that make a tree possible. But will need additional ops here to get it to work. Idea 3: define each equation in 2 parts: one part that is shared between all outputs, and one that is different between all outputs. Maybe this could be an array of nodes corresponding to each output. And those nodes would define their functions.
 - [ ] Tree crossover? I.e., can take as input a part of the same equation, so long as it is the same level or below?
 - [ ] Create flexible way of providing "simplification recipes." I.e., plus(plus(T, C), C) => plus(T, +(C, C)). The user could pass these.
 - [ ] Consider allowing multi-threading turned off, for faster testing (cache issue on travis). Or could simply fix the caching issue there.
@@ -100,6 +104,7 @@
 - [ ] How hard is it to turn the recursive array evaluation into a for loop?
 - [ ] Try defining a binary tree as an array, rather than a linked list. See https://stackoverflow.com/a/6384714/2689923
 - [ ] Add true multi-node processing, with MPI, or just file sharing. Multiple populations per core.
     - Ongoing in cluster branch
 - [ ] Performance: try inling things?

 - [x] Consider printing output sorted by score, not by complexity.
 - [x] Increase max complexity slowly over time up to the actual max.
 - [x] Record density over complexity. Favor equations that have a density we have not explored yet. Want the final density to be evenly distributed.
+- [x] Do printing from Python side. Then we can do simplification and pretty-printing.
+- [x] Sympy printing
 - [ ] Sort these todo lists by priority
 ## Feature ideas
+- [ ] Other default losses (e.g., abs, other likelihoods, or just allow user to pass this as a string).
+- [ ] Other dtypes available
+- [ ] NDSA-II
 - [ ] Cross-validation
 - [ ] Hierarchical model, so can re-use functional forms. Output of one equation goes into second equation?
 - [ ] Add function to plot equations
 - [ ] Refresh screen rather than dumping to stdout?
 - [ ] Add ability to save state from python
 - [ ] Additional degree operators?
 - [ ] Multi targets (vector ops). Idea 1: Node struct contains argument for which registers it is applied to. Then, can work with multiple components simultaneously. Though this may be tricky to get right. Idea 2: each op is defined by input/output space. Some operators are flexible, and the spaces should be adjusted automatically. Otherwise, only consider ops that make a tree possible. But will need additional ops here to get it to work. Idea 3: define each equation in 2 parts: one part that is shared between all outputs, and one that is different between all outputs. Maybe this could be an array of nodes corresponding to each output. And those nodes would define their functions.
+    - Much easier option: simply flatten the output vector, and set the index as another input feature. The equation learned will be a single equation containing indices as a feature.
 - [ ] Tree crossover? I.e., can take as input a part of the same equation, so long as it is the same level or below?
 - [ ] Create flexible way of providing "simplification recipes." I.e., plus(plus(T, C), C) => plus(T, +(C, C)). The user could pass these.
 - [ ] Consider allowing multi-threading turned off, for faster testing (cache issue on travis). Or could simply fix the caching issue there.
 - [ ] How hard is it to turn the recursive array evaluation into a for loop?
 - [ ] Try defining a binary tree as an array, rather than a linked list. See https://stackoverflow.com/a/6384714/2689923
+    - in array branch
 - [ ] Add true multi-node processing, with MPI, or just file sharing. Multiple populations per core.
     - Ongoing in cluster branch
 - [ ] Performance: try inling things?
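Reviewer note: the new TODO entry under "Multi targets" proposes flattening the output vector and passing the output index as an extra input feature. A minimal sketch of that reshaping, assuming list-of-lists data; `flatten_targets` is a hypothetical helper and is not part of PySR:

```python
def flatten_targets(X, Y):
    """Turn n rows with k targets each into n*k single-target rows.

    The index j of each target is appended to the features as one more
    input column, so a single equation can be fit over all outputs.
    """
    if len(X) != len(Y):
        raise ValueError("X and Y must have the same number of rows")
    flat_X, flat_y = [], []
    for features, targets in zip(X, Y):
        for j, target in enumerate(targets):
            flat_X.append(list(features) + [float(j)])
            flat_y.append(target)
    return flat_X, flat_y


# Two samples, two input features, three targets per sample.
X = [[1.0, 2.0], [3.0, 4.0]]
Y = [[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]]
flat_X, flat_y = flatten_targets(X, Y)
```

Each flattened row keeps the original features and gains one index column, so the learned equation would treat the output index like any other variable.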

hyperparamopt.py → benchmarks/hyperparamopt.py RENAMED

@@ -34,58 +34,46 @@ def run_trial(args):
     """
     print("Running on", args)
-    for key in 'niterations npop'.split(' '):
-        args[key] = int(args[key])
-    total_steps = 10*100*1000
-    niterations = args['niterations']
-    npop = args['npop']
-    if niterations == 0 or npop == 0:
-        print("Bad parameters")
-        return {'status': 'ok', 'loss': np.inf}
-    args['ncyclesperiteration'] = int(total_steps / (niterations * npop))
     args['topn'] = 10
-    args['parsimony'] = 1e-3
     args['annealing'] = True
     if args['npop'] < 20 or args['ncyclesperiteration'] < 3:
         print("Bad parameters")
         return {'status': 'ok', 'loss': np.inf}
     args['weightDoNothing'] = 1.0
-    maxTime = 30
-    ntrials = 2
-    equation_file = f'.hall_of_fame_{np.random.rand():f}.csv'
     with temp_seed(0):
-        X = np.random.randn(100, 5)*3
-    eval_str = ["np.sign(X[:, 2])*np.abs(X[:, 2])**2.5 + 5*np.cos(X[:, 3]) - 5",
-    "np.sign(X[:, 2])*np.abs(X[:, 2])**3.5 + 1/(np.abs(X[:, 0])+1)",
     "np.exp(X[:, 0]/2) + 12.0 + np.log(np.abs(X[:, 0])*10 + 1)",
-    "1.0 + 3*X[:, 0]**2 - 0.5*X[:, 0]**3 + 0.1*X[:, 0]**4",
-    "(np.exp(X[:, 3]) + 3)/(np.abs(X[:, 1]) + np.cos(X[:, 0]) + 1.1)"]
     print(f"Starting", str(args))
     try:
         trials = []
-        for i in range(3, 6):
             print(f"Starting test {i}")
             for j in range(ntrials):
                 print(f"Starting trial {j}")
-                trial = pysr.pysr(
-                    test=f"simple{i}",
                     procs=4,
                     binary_operators=["plus", "mult", "pow", "div"],
-                    unary_operators=["cos", "exp", "sin", "loga", "abs"],
-                    equation_file=equation_file,
-                    timeout=maxTime,
                     maxsize=25,
-                    verbosity=0,
                     **args)
                 if len(trial) == 0: raise ValueError
                 trials.append(
@@ -109,8 +97,6 @@ def run_trial(args):
 space = {
-    'niterations': hp.qlognormal('niterations', np.log(10), 1.0, 1),
-    'npop': hp.qlognormal('npop', np.log(100), 1.0, 1),
     'alpha': hp.lognormal('alpha', np.log(10.0), 1.0),
     'fractionReplacedHof': hp.lognormal('fractionReplacedHof', np.log(0.1), 1.0),
     'fractionReplaced': hp.lognormal('fractionReplaced', np.log(0.1), 1.0),
@@ -126,8 +112,6 @@ space = {
 ################################################################################
 def merge_trials(trials1, trials2_slice):
     """Merge two hyperopt trials objects

     """
     print("Running on", args)
+    args['niterations'] = 100
+    args['npop'] = 100
+    args['ncyclesperiteration'] = 1000
     args['topn'] = 10
+    args['parsimony'] = 0.0
+    args['useFrequency'] = True
     args['annealing'] = True
     if args['npop'] < 20 or args['ncyclesperiteration'] < 3:
         print("Bad parameters")
         return {'status': 'ok', 'loss': np.inf}
     args['weightDoNothing'] = 1.0
+    ntrials = 3
     with temp_seed(0):
+        X = np.random.randn(100, 10)*3
+    eval_str = [
+    "np.sign(X[:, 2])*np.abs(X[:, 2])**2.5 + 5*np.cos(X[:, 3]) - 5",
     "np.exp(X[:, 0]/2) + 12.0 + np.log(np.abs(X[:, 0])*10 + 1)",
+    "(np.exp(X[:, 3]) + 3)/(np.abs(X[:, 1]) + np.cos(X[:, 0]) + 1.1)",
+    "X[:, 0] * np.sin(2*np.pi * (X[:, 1] * X[:, 2] - X[:, 3] / X[:, 4])) + 3.0"
+    ]
     print(f"Starting", str(args))
     try:
         trials = []
+        for i in range(len(eval_str)):
             print(f"Starting test {i}")
             for j in range(ntrials):
                 print(f"Starting trial {j}")
+                y = eval(eval_str[i])
+                trial = pysr.pysr(X, y,
                     procs=4,
+                    populations=20,
                     binary_operators=["plus", "mult", "pow", "div"],
+                    unary_operators=["cos", "exp", "sin", "logm", "abs"],
                     maxsize=25,
+                    constraints={'pow': (-1, 1)},
                     **args)
                 if len(trial) == 0: raise ValueError
                 trials.append(
 space = {
     'alpha': hp.lognormal('alpha', np.log(10.0), 1.0),
     'fractionReplacedHof': hp.lognormal('fractionReplacedHof', np.log(0.1), 1.0),
     'fractionReplaced': hp.lognormal('fractionReplaced', np.log(0.1), 1.0),
 ################################################################################
 def merge_trials(trials1, trials2_slice):
     """Merge two hyperopt trials objects

docs/options.md CHANGED

@@ -22,7 +22,7 @@ These are described below
 The program will output a pandas DataFrame containing the equations,
 mean square error, and complexity. It will also dump to a csv
 at the end of every iteration,
-which is `hall_of_fame.csv` by default. It also prints the
 equations to stdout.
 ## Operators

 The program will output a pandas DataFrame containing the equations,
 mean square error, and complexity. It will also dump to a csv
 at the end of every iteration,
+which is `hall_of_fame_{date_time}.csv` by default. It also prints the
 equations to stdout.
 ## Operators
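Reviewer note: the `{date_time}` placeholder corresponds to the timestamp built in the pysr/sr.py hunk of this commit. A small sketch of what that format string produces; the wrapper `default_equation_file` and its `now` parameter are hypothetical and exist only to make the example reproducible:

```python
from datetime import datetime


def default_equation_file(now=None):
    """Build the default CSV name using the format string from the commit."""
    if now is None:
        now = datetime.now()
    # %f yields six microsecond digits; dropping the last three leaves milliseconds.
    date_time = now.strftime("%Y-%m-%d_%H%M%S.%f")[:-3]
    return "hall_of_fame_" + date_time + ".csv"


example = default_equation_file(datetime(2021, 1, 17, 12, 34, 56, 789123))
print(example)  # hall_of_fame_2021-01-17_123456.789.csv
```

Millisecond precision makes it unlikely, though not impossible, that two runs started close together write to the same file.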

julia/sr.jl CHANGED

@@ -1086,7 +1086,12 @@ function fullRun(niterations::Integer;
     end
     println("Started!")
     cycles_complete = npopulations * niterations
-    curmaxsize += 1
     last_print_time = time()
     num_equations = 0.0
@@ -1212,15 +1217,19 @@ function fullRun(niterations::Integer;
                 deleteat!(equation_speed, 1)
             end
             average_speed = sum(equation_speed)/length(equation_speed)
-            @printf("\n")
-            @printf("Cycles per second: %.3e\n", round(average_speed, sigdigits=3))
-            @printf("Hall of Fame:\n")
-            @printf("-----------------------------------------\n")
-            @printf("%-10s  %-8s   %-8s  %-8s\n", "Complexity", "MSE", "Score", "Equation")
             curMSE = baselineMSE
-            @printf("%-10d  %-8.3e  %-8.3e  %-.f\n", 0, curMSE, 0f0, avgy)
             lastMSE = curMSE
             lastComplexity = 0
             for size=1:actualMaxsize
                 if hallOfFame.exists[size]
@@ -1246,7 +1255,9 @@ function fullRun(niterations::Integer;
                         delta_c = size - lastComplexity
                         delta_l_mse = log(curMSE/lastMSE)
                         score = convert(Float32, -delta_l_mse/delta_c)
-                        @printf("%-10d  %-8.3e  %-8.3e  %-s\n" , size, curMSE, score, stringTree(member.tree))
                         lastMSE = curMSE
                         lastComplexity = size
                     end

     end
     println("Started!")
     cycles_complete = npopulations * niterations
+    if warmupMaxsize != 0
+        curmaxsize += 1
+        if curmaxsize > maxsize
+            curmaxsize = maxsize
+        end
+    end
     last_print_time = time()
     num_equations = 0.0
                 deleteat!(equation_speed, 1)
             end
             average_speed = sum(equation_speed)/length(equation_speed)
             curMSE = baselineMSE
             lastMSE = curMSE
             lastComplexity = 0
+            if verbosity > 0
+                @printf("\n")
+                @printf("Cycles per second: %.3e\n", round(average_speed, sigdigits=3))
+                cycles_elapsed = npopulations * niterations - cycles_complete
+                @printf("Progress: %d / %d total iterations (%.3f%%)\n", cycles_elapsed, npopulations * niterations, 100.0*cycles_elapsed/(npopulations*niterations))
+                @printf("Hall of Fame:\n")
+                @printf("-----------------------------------------\n")
+                @printf("%-10s  %-8s   %-8s  %-8s\n", "Complexity", "MSE", "Score", "Equation")
+                @printf("%-10d  %-8.3e  %-8.3e  %-.f\n", 0, curMSE, 0f0, avgy)
+            end
             for size=1:actualMaxsize
                 if hallOfFame.exists[size]
                         delta_c = size - lastComplexity
                         delta_l_mse = log(curMSE/lastMSE)
                         score = convert(Float32, -delta_l_mse/delta_c)
+                        if verbosity > 0
+                            @printf("%-10d  %-8.3e  %-8.3e  %-s\n" , size, curMSE, score, stringTree(member.tree))
+                        end
                         lastMSE = curMSE
                         lastComplexity = size
                     end
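Reviewer note: the score printed in the hall of fame is the drop in log-MSE per unit of added complexity, and the new progress line counts elapsed cycles against `npopulations * niterations`. A Python sketch of both calculations as they appear in the visible hunks; the function names are hypothetical, and any filtering that happens in the elided Julia lines is not modeled:

```python
import math


def hall_of_fame_scores(baseline_mse, entries):
    """Return one score per (complexity, mse) entry, in increasing complexity.

    score = -log(mse / previous_mse) / (complexity - previous_complexity),
    starting from the baseline at complexity 0.
    """
    scores = []
    last_mse = baseline_mse
    last_complexity = 0
    for complexity, mse in entries:
        delta_c = complexity - last_complexity
        delta_l_mse = math.log(mse / last_mse)
        scores.append(-delta_l_mse / delta_c)
        last_mse, last_complexity = mse, complexity
    return scores


def progress_percent(total_cycles, cycles_remaining):
    """Percentage of cycles finished; mirrors the new 'Progress' printout."""
    cycles_elapsed = total_cycles - cycles_remaining
    return 100.0 * cycles_elapsed / total_cycles


scores = hall_of_fame_scores(1.0, [(1, 0.5), (3, 0.125)])
percent = progress_percent(200, 150)
```

In the Julia code `cycles_complete` starts at `npopulations * niterations` and is subtracted from the total, so it appears to act as a countdown of remaining cycles; the sketch names that argument `cycles_remaining` on that assumption.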

pysr/sr.py CHANGED

@@ -7,6 +7,11 @@ import pandas as pd
 import sympy
 from sympy import sympify, Symbol, lambdify
 import subprocess
 global_equation_file = 'hall_of_fame.csv'
 global_n_features = None
@@ -79,7 +84,7 @@ def pysr(X=None, y=None, weights=None,
             nrestarts=3,
             timeout=None,
             extra_sympy_mappings={},
-            equation_file='hall_of_fame.csv',
             test='simple1',
             verbosity=1e9,
             maxsize=20,
@@ -92,6 +97,8 @@ def pysr(X=None, y=None, weights=None,
             warmupMaxsize=0,
             constraints={},
             useFrequency=False,
             limitPowComplexity=False, #deprecated
             threads=None, #deprecated
             julia_optimization=3,
@@ -178,6 +185,8 @@ def pysr(X=None, y=None, weights=None,
         and use that instead of parsimony to explore equation space. Will
         naturally find equations of all complexities.
     :param julia_optimization: int, Optimization level (0, 1, 2, 3)
     :returns: pd.DataFrame, Results dataframe, giving complexity, MSE, and equations
         (as strings).
@@ -188,6 +197,9 @@ def pysr(X=None, y=None, weights=None,
         raise ValueError("The limitPowComplexity kwarg is deprecated. Use constraints.")
     if maxdepth is None:
         maxdepth = maxsize
     if isinstance(X, pd.DataFrame):
         variable_names = list(X.columns)
@@ -215,13 +227,11 @@ def pysr(X=None, y=None, weights=None,
         X = X[:, selection]
         if use_custom_variable_names:
-            variable_names = variable_names[selection]
     if populations is None:
         populations = procs
-    rand_string = f'{"".join([str(np.random.rand())[2] for i in range(20)])}'
     if isinstance(binary_operators, str): binary_operators = [binary_operators]
     if isinstance(unary_operators, str): unary_operators = [unary_operators]
@@ -241,7 +251,18 @@ def pysr(X=None, y=None, weights=None,
         y = eval(eval_str)
         print("Running on", eval_str)
-    pkg_directory = '/'.join(__file__.split('/')[:-2] + ['julia'])
     def_hyperparams = ""
@@ -273,7 +294,7 @@ def pysr(X=None, y=None, weights=None,
         elif op == 'mult':
             # Make sure the complex expression is in the left side.
             if constraints[op][0] == -1:
-                continue
             elif constraints[op][1] == -1 or constraints[op][0] < constraints[op][1]:
                 constraints[op][0], constraints[op][1] = constraints[op][1], constraints[op][0]
@@ -298,8 +319,7 @@ const bin_constraints = ["""
         first = False
     constraints_str += "]"
-    def_hyperparams += f"""include("{pkg_directory}/operators.jl")
 {constraints_str}
 const binops = {'[' + ', '.join(binary_operators) + ']'}
 const unaops = {'[' + ', '.join(unary_operators) + ']'}
@@ -375,34 +395,35 @@ end"""
     def_hyperparams += op_runner
-    if X.shape[1] == 1:
-        X_str = 'transpose([' + str(X.tolist()).replace(']', '').replace(',', '').replace('[', '') + '])'
-    else:
-        X_str = str(X.tolist()).replace('],', '];').replace(',', '')
-    y_str = str(y.tolist())
-    def_datasets = """const X = convert(Array{Float32, 2}, """f"{X_str})""""
-const y = convert(Array{Float32, 1}, """f"{y_str})"
     if weights is not None:
-        weight_str = str(weights.tolist())
-        def_datasets += """
-const weights = convert(Array{Float32, 1}, """f"{weight_str})"
     if use_custom_variable_names:
         def_hyperparams += f"""
 const varMap = {'["' + '", "'.join(variable_names) + '"]'}"""
-    with open(f'/tmp/.hyperparams_{rand_string}.jl', 'w') as f:
         print(def_hyperparams, file=f)
-    with open(f'/tmp/.dataset_{rand_string}.jl', 'w') as f:
         print(def_datasets, file=f)
-    with open(f'/tmp/.runfile_{rand_string}.jl', 'w') as f:
-        print(f'@everywhere include("/tmp/.hyperparams_{rand_string}.jl")', file=f)
-        print(f'@everywhere include("/tmp/.dataset_{rand_string}.jl")', file=f)
-        print(f'@everywhere include("{pkg_directory}/sr.jl")', file=f)
         print(f'fullRun({niterations:d}, npop={npop:d}, ncyclesperiteration={ncyclesperiteration:d}, fractionReplaced={fractionReplaced:f}f0, verbosity=round(Int32, {verbosity:f}), topn={topn:d})', file=f)
         print(f'rmprocs(nprocs)', file=f)
@@ -410,7 +431,7 @@ const varMap = {'["' + '", "'.join(variable_names) + '"]'}"""
     command = [
         f'julia', f'-O{julia_optimization:d}',
         f'-p', f'{procs}',
-        f'/tmp/.runfile_{rand_string}.jl',
         ]
     if timeout is not None:
         command = [f'timeout', f'{timeout}'] + command
@@ -439,6 +460,9 @@ const varMap = {'["' + '", "'.join(variable_names) + '"]'}"""
         print("Killing process... will return when done.")
         process.kill()
     return get_hof()
@@ -550,4 +574,8 @@ def best_callable(equations=None):
     if equations is None: equations = get_hof()
     return best_row(equations)['lambda_format']
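Reviewer note: the removed line `variable_names = variable_names[selection]` only works when `variable_names` supports fancy indexing. A short sketch of the failure and of the list-comprehension form used in the replacement, assuming `variable_names` is a plain Python list and `selection` is a sequence of integer indices (neither type is confirmed by the visible hunks):

```python
variable_names = ["x0", "x1", "x2", "x3"]
selection = [0, 3]  # indices of the features kept by feature selection

# Indexing a list with a sequence of indices raises TypeError.
try:
    variable_names[selection]
    failed = False
except TypeError:
    failed = True

# Picking the names one index at a time works for any integer sequence.
selected_names = [variable_names[i] for i in selection]
```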

 import sympy
 from sympy import sympify, Symbol, lambdify
 import subprocess
+import tempfile
+import shutil
+from pathlib import Path
+from datetime import datetime
 global_equation_file = 'hall_of_fame.csv'
 global_n_features = None
             nrestarts=3,
             timeout=None,
             extra_sympy_mappings={},
+            equation_file=None,
             test='simple1',
             verbosity=1e9,
             maxsize=20,
             warmupMaxsize=0,
             constraints={},
             useFrequency=False,
+            tempdir=None,
+            delete_tempfiles=True,
             limitPowComplexity=False, #deprecated
             threads=None, #deprecated
             julia_optimization=3,
         and use that instead of parsimony to explore equation space. Will
         naturally find equations of all complexities.
     :param julia_optimization: int, Optimization level (0, 1, 2, 3)
+    :param tempdir: str or None, directory for the temporary files
+    :param delete_tempfiles: bool, whether to delete the temporary files after finishing
     :returns: pd.DataFrame, Results dataframe, giving complexity, MSE, and equations
         (as strings).
         raise ValueError("The limitPowComplexity kwarg is deprecated. Use constraints.")
     if maxdepth is None:
         maxdepth = maxsize
+    if equation_file is None:
+        date_time = datetime.now().strftime("%Y-%m-%d_%H%M%S.%f")[:-3]
+        equation_file = 'hall_of_fame_' + date_time + '.csv'
     if isinstance(X, pd.DataFrame):
         variable_names = list(X.columns)
         X = X[:, selection]
         if use_custom_variable_names:
+            variable_names = [variable_names[selection[i]] for i in range(len(selection))]
     if populations is None:
         populations = procs
     if isinstance(binary_operators, str): binary_operators = [binary_operators]
     if isinstance(unary_operators, str): unary_operators = [unary_operators]
         y = eval(eval_str)
         print("Running on", eval_str)
+    # System-independent paths
+    pkg_directory = Path(__file__).parents[1] / 'julia'
+    pkg_filename = pkg_directory / "sr.jl"
+    operator_filename = pkg_directory / "operators.jl"
+    tmpdir = Path(tempfile.mkdtemp(dir=tempdir))
+    hyperparam_filename = tmpdir / f'hyperparams.jl'
+    dataset_filename = tmpdir / f'dataset.jl'
+    runfile_filename = tmpdir / f'runfile.jl'
+    X_filename = tmpdir / "X.csv"
+    y_filename = tmpdir / "y.csv"
+    weights_filename = tmpdir / "weights.csv"
     def_hyperparams = ""
         elif op == 'mult':
             # Make sure the complex expression is in the left side.
             if constraints[op][0] == -1:
+                continue
             elif constraints[op][1] == -1 or constraints[op][0] < constraints[op][1]:
                 constraints[op][0], constraints[op][1] = constraints[op][1], constraints[op][0]
         first = False
     constraints_str += "]"
+    def_hyperparams += f"""include("{_escape_filename(operator_filename)}")
 {constraints_str}
 const binops = {'[' + ', '.join(binary_operators) + ']'}
 const unaops = {'[' + ', '.join(unary_operators) + ']'}
     def_hyperparams += op_runner
+    def_datasets = """using DelimitedFiles"""
+    np.savetxt(X_filename, X, delimiter=',')
+    np.savetxt(y_filename, y, delimiter=',')
+    if weights is not None:
+        np.savetxt(weights_filename, weights, delimiter=',')
+    def_datasets += f"""
+const X = readdlm("{_escape_filename(X_filename)}", ',', Float32, '\\n')
+const y = readdlm("{_escape_filename(y_filename)}", ',', Float32, '\\n')"""
     if weights is not None:
+        def_datasets += f"""
+const weights = readdlm("{_escape_filename(weights_filename)}", ',', Float32, '\\n')"""
     if use_custom_variable_names:
         def_hyperparams += f"""
 const varMap = {'["' + '", "'.join(variable_names) + '"]'}"""
+    with open(hyperparam_filename, 'w') as f:
         print(def_hyperparams, file=f)
+    with open(dataset_filename, 'w') as f:
         print(def_datasets, file=f)
+    with open(runfile_filename, 'w') as f:
+        print(f'@everywhere include("{_escape_filename(hyperparam_filename)}")', file=f)
+        print(f'@everywhere include("{_escape_filename(dataset_filename)}")', file=f)
+        print(f'@everywhere include("{_escape_filename(pkg_filename)}")', file=f)
         print(f'fullRun({niterations:d}, npop={npop:d}, ncyclesperiteration={ncyclesperiteration:d}, fractionReplaced={fractionReplaced:f}f0, verbosity=round(Int32, {verbosity:f}), topn={topn:d})', file=f)
         print(f'rmprocs(nprocs)', file=f)
     command = [
         f'julia', f'-O{julia_optimization:d}',
         f'-p', f'{procs}',
+        str(runfile_filename),
         ]
     if timeout is not None:
         command = [f'timeout', f'{timeout}'] + command
         print("Killing process... will return when done.")
         process.kill()
+    if delete_tempfiles:
+        shutil.rmtree(tmpdir)
     return get_hof()
     if equations is None: equations = get_hof()
     return best_row(equations)['lambda_format']
+def _escape_filename(filename):
+    """Turns a file into a string representation with correctly escaped backslashes"""
+    repr = str(filename)
+    repr = repr.replace('\\', '\\\\')
+    return repr
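Reviewer note: the new code writes its Julia files into a directory from `tempfile.mkdtemp` and escapes backslashes before embedding paths in Julia string literals, which matters for Windows paths. A self-contained sketch of that pattern; `escape_filename` mirrors the committed `_escape_filename`, while the example path and file contents are made up for illustration:

```python
import shutil
import tempfile
from pathlib import Path, PureWindowsPath


def escape_filename(filename):
    """Double every backslash so the path survives inside a Julia string literal."""
    return str(filename).replace("\\", "\\\\")


# A Windows-style path, built with PureWindowsPath so this runs on any OS.
windows_path = PureWindowsPath(r"C:\Users\me\Temp\hyperparams.jl")
julia_line = f'@everywhere include("{escape_filename(windows_path)}")'

# Create a private working directory, use it, then remove it.
tmpdir = Path(tempfile.mkdtemp())
try:
    runfile = tmpdir / "runfile.jl"
    runfile.write_text(julia_line + "\n")
    written = runfile.read_text()
finally:
    shutil.rmtree(tmpdir)
```

The sketch removes the directory in a `finally` block so cleanup also happens on errors. The committed code calls `shutil.rmtree(tmpdir)` only after the run finishes and only when `delete_tempfiles` is true.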

setup.py CHANGED

@@ -5,7 +5,7 @@ with open("README.md", "r") as fh:
 setuptools.setup(
     name="pysr", # Replace with your own username
-    version="0.3.28",
     author="Miles Cranmer",
     author_email="[email protected]",
     description="Simple and efficient symbolic regression",

 setuptools.setup(
     name="pysr", # Replace with your own username
+    version="0.3.36",
     author="Miles Cranmer",
     author_email="[email protected]",
     description="Simple and efficient symbolic regression",

test/travis.sh DELETED

@@ -1,5 +0,0 @@
-#!/bin/bash
-sudo python3 -m pip install numpy pandas &&
-    sudo python3 setup.py install &&
-    python3 test/test.py