Eureqa.jl
Symbolic regression built on Julia and interfaced from Python. Uses regularized evolution and simulated annealing.
Installation
Install Julia. Then, at the command line, install the Optim package via:
julia -e 'import Pkg; Pkg.add("Optim")'
For Python, you need Python 3, numpy, and pandas installed.
Running:
You can either call the eureqa function from eureqa.py, or run the program from the command line with, for example:
python eureqa.py --threads 8 --binary-operators plus mult pow --npop 200
Here is the full list of arguments:
usage: eureqa.py [-h] [--threads THREADS] [--parsimony PARSIMONY]
[--alpha ALPHA] [--maxsize MAXSIZE]
[--niterations NITERATIONS] [--npop NPOP]
[--ncyclesperiteration NCYCLESPERITERATION] [--topn TOPN]
[--fractionReplacedHof FRACTIONREPLACEDHOF]
[--fractionReplaced FRACTIONREPLACED] [--migration MIGRATION]
[--hofMigration HOFMIGRATION]
[--shouldOptimizeConstants SHOULDOPTIMIZECONSTANTS]
[--annealing ANNEALING] [--equation_file EQUATION_FILE]
[--test TEST]
[--binary-operators BINARY_OPERATORS [BINARY_OPERATORS ...]]
[--unary-operators UNARY_OPERATORS]
optional arguments:
-h, --help show this help message and exit
--threads THREADS Number of threads (default: 4)
--parsimony PARSIMONY
How much to punish complexity (default: 0.001)
--alpha ALPHA Scaling of temperature (default: 10)
--maxsize MAXSIZE Max size of equation (default: 20)
--niterations NITERATIONS
Number of total migration periods (default: 20)
--npop NPOP Number of members per population (default: 100)
--ncyclesperiteration NCYCLESPERITERATION
Number of evolutionary cycles per migration (default:
5000)
--topn TOPN How many best species to distribute from each
population (default: 10)
--fractionReplacedHof FRACTIONREPLACEDHOF
Fraction of population to replace with hall of fame
(default: 0.1)
--fractionReplaced FRACTIONREPLACED
Fraction of population to replace with best from other
populations (default: 0.1)
--migration MIGRATION
Whether to migrate (default: True)
--hofMigration HOFMIGRATION
Whether to have hall of fame migration (default: True)
--shouldOptimizeConstants SHOULDOPTIMIZECONSTANTS
Whether to use classical optimization on constants
before every migration (doesn't impact performance
that much) (default: True)
--annealing ANNEALING
Whether to use simulated annealing (default: True)
--equation_file EQUATION_FILE
File to dump best equations to (default:
hall_of_fame.csv)
--test TEST Which test to run (default: simple1)
--binary-operators BINARY_OPERATORS [BINARY_OPERATORS ...]
Binary operators. Make sure they are defined in
operators.jl (default: ['plus', 'mult'])
--unary-operators UNARY_OPERATORS
Unary operators. Make sure they are defined in
operators.jl (default: ['exp', 'sin', 'cos'])
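The --parsimony, --alpha, and --annealing options interact when a mutated equation is accepted or rejected. The Python sketch below shows one plausible way to combine them; it is an assumption for illustration, not the Julia implementation. It assumes the score is the mean squared error plus parsimony times equation size, that a mutation worsening the score by delta is kept with probability exp(-delta / (alpha * T)), and that the temperature T falls linearly from 1 toward 0 over the cycles. A scalar x minimizing (x - 3)^2 stands in for an equation, and `score`, `accept`, and `anneal` are hypothetical names.

```python
import math
import random
from typing import Optional


def score(mse: float, size: int, parsimony: float = 0.001) -> float:
    """Assumed penalized loss: fit error plus a complexity penalty (lower is better)."""
    return mse + parsimony * size


def accept(old_score: float, new_score: float, temperature: float,
           alpha: float = 10.0, rng: Optional[random.Random] = None) -> bool:
    """Simulated-annealing acceptance test for a proposed mutation.

    Improvements (and ties) are always kept. A worse candidate is kept with
    probability exp(-delta / (alpha * temperature)); at temperature 0 it is
    always rejected.
    """
    rng = rng or random.Random()
    delta = new_score - old_score
    if delta <= 0:
        return True
    if temperature <= 0:
        return False
    return rng.random() < math.exp(-delta / (alpha * temperature))


def anneal(x0: float, ncycles: int = 5000, seed: int = 0) -> float:
    """Toy run: minimize (x - 3)^2 by random perturbation plus annealing."""
    rng = random.Random(seed)
    x = x0
    cur = score((x - 3.0) ** 2, size=1)
    for i in range(ncycles):
        temperature = 1.0 - i / ncycles  # assumed linear cooling from 1 toward 0
        cand = x + rng.gauss(0.0, 0.5)   # stand-in for "mutate the equation"
        new = score((cand - 3.0) ** 2, size=1)
        if accept(cur, new, temperature, rng=rng):
            x, cur = cand, new
    return x
```

Early on, the high temperature lets the search climb out of poor regions; as T shrinks, worse candidates are rejected more often, so anneal(-10.0) ends close to 3. A larger alpha keeps the search exploratory for longer.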
Modification
You can add more operators in operators.jl, or use default Julia ones. Make sure all operators are defined for scalar Float32. Then just specify the operator names in your call, as above.
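Operator names such as plus, mult, and cos are mapped to functions and combined into expression trees. The Python sketch below mirrors that idea with a hypothetical name-to-function table and a small `Node` tree; Eureqa.jl does this in Julia with the functions in operators.jl, so the tables, `Node`, and `evaluate` here are illustrative assumptions. Python floats are 64-bit, unlike the Float32 scalars the Julia operators must accept.

```python
import math
from dataclasses import dataclass
from typing import Optional, Union

# Hypothetical operator tables keyed by the names used on the command line.
BINARY_OPS = {
    "plus": lambda a, b: a + b,
    "mult": lambda a, b: a * b,
    "pow": lambda a, b: a ** b,
}
UNARY_OPS = {"exp": math.exp, "sin": math.sin, "cos": math.cos}


@dataclass
class Node:
    """Expression-tree node: a constant, an input feature, or an operator."""
    op: Optional[str] = None          # operator name, or None for a leaf
    value: Union[float, int] = 0.0    # constant value, or feature index if is_var
    is_var: bool = False
    left: Optional["Node"] = None     # sole child of a unary operator
    right: Optional["Node"] = None


def evaluate(node: Node, x: list) -> float:
    """Recursively evaluate the tree on one sample x (a list of features)."""
    if node.op is None:
        return x[node.value] if node.is_var else node.value
    if node.op in UNARY_OPS:
        return UNARY_OPS[node.op](evaluate(node.left, x))
    return BINARY_OPS[node.op](evaluate(node.left, x), evaluate(node.right, x))


# 2 * x0 + cos(x1)
expr = Node("plus",
            left=Node("mult", left=Node(value=2.0), right=Node(value=0, is_var=True)),
            right=Node("cos", left=Node(value=1, is_var=True)))
```

Here evaluate(expr, [3.0, 0.0]) gives 7.0. Operators deserve care at the edges of their domains: in Python, a negative float raised to a non-integer power with ** returns a complex number, while Julia's ^ raises a DomainError.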
You can also change the training dataset by passing X and y as numpy arrays to eureqa(...).
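A sketch of preparing such a dataset, assuming one row of X per sample and one column per feature, with y holding one target per row. The layout, the toy target function, and the commented-out call are assumptions; check eureqa.py for the actual signature and keyword arguments.

```python
import numpy as np

# Hypothetical toy problem: recover y = 2*x0 + cos(x1) from noise-free samples.
rng = np.random.default_rng(0)
n_samples, n_features = 100, 5
X = rng.standard_normal((n_samples, n_features))  # shape (100, 5): rows are samples
y = 2 * X[:, 0] + np.cos(X[:, 1])                 # shape (100,): one target per row

# eureqa(X, y)  # assumed call shape; verify against eureqa.py before use
```

Only the first two columns of X influence y, so a successful search should return an equation that ignores the other three features.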
You can also adjust the relative probability of each mutation type inside eureqa.jl:
weights = [8, 1, 1, 1, 0.1, 0.5, 2]
where the entries correspond, in order, to:
- Perturb constant
- Mutate operator
- Append a node
- Delete a subtree
- Simplify equation
- Randomize completely
- Do nothing
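The weights are relative, so they need not sum to one: each is divided by the total (13.6), which gives perturb constant about 59% of mutations and simplify about 0.7%. The Python sketch below shows the same weighted choice; Eureqa.jl does this in Julia, and the mutation labels here are hypothetical.

```python
import random
from collections import Counter

# Hypothetical labels, in the same order as the weights array in eureqa.jl.
MUTATIONS = [
    "perturb_constant", "mutate_operator", "append_node",
    "delete_subtree", "simplify", "randomize", "do_nothing",
]
WEIGHTS = [8, 1, 1, 1, 0.1, 0.5, 2]


def pick_mutation(rng: random.Random) -> str:
    """Sample one mutation with probability proportional to its weight."""
    return rng.choices(MUTATIONS, weights=WEIGHTS, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    counts = Counter(pick_mutation(rng) for _ in range(10_000))
    print(counts.most_common(3))
```

Defining the weights once at module level, rather than rebuilding the array on every iteration, avoids one of the overheads listed under Performance in the TODO list.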
TODO
- Hyperparameter tune
- Add mutation for constant<->variable
- Create a benchmark for accuracy
- Use a neural network to generate weights for all probability distributions, conditional on the error and the existing equation, and train it on some randomly generated equations
- Performance:
  - Use an enum for functions instead of storing them?
  - Current most expensive operations:
    - deepcopy() before the mutation, to see whether to accept or not.
      - Seems like it's necessary right now, but it is still by far the slowest operation.
    - Calculating the loss function: there are duplicate calculations happening.
    - Declaration of the weights array every iteration
- Add an interface for either defining an operation to learn, or loading in an arbitrary dataset.
  - Could just write out the dataset in Julia, or load it.
- Create a Python interface
- Explicit constant optimization on hall of fame
  - Create method to find and return all constants, from left to right
  - Create method to find and set all constants, in same order
  - Pull up some optimization algorithm and add it. Keep the package small!
- Create a benchmark for speed
- Simplify subtrees with only constants beneath them. Or should I? Maybe randomly simplify sometimes?
- Record hall of fame
- Optionally (with hyperparameter) migrate the hall of fame, rather than current bests
- Test performance of reduced precision integers
  - No effect
- Create a struct to pass through all hyperparameters, instead of treating them as constants
  - Make sure it doesn't affect performance
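The constant-optimization items in the list above (collect all constants from left to right, then write them back in the same order) can be sketched in Python as a pair of in-order traversals over a minimal expression tree. The `Node` type and the function names are hypothetical, and the optimizer step is left out: any routine that maps a list of floats to an improved list, such as one from Optim on the Julia side, would sit between the two calls.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Minimal expression-tree node for this sketch."""
    op: Optional[str] = None       # operator name, or None for a leaf
    value: float = 0.0             # constant value (ignored for variables)
    is_var: bool = False           # True if this leaf is an input feature
    left: Optional["Node"] = None
    right: Optional["Node"] = None  # None for unary operators


def get_constants(node: Optional[Node]) -> List[float]:
    """Return every constant leaf value, ordered left to right."""
    if node is None:
        return []
    if node.op is None:
        return [] if node.is_var else [node.value]
    return get_constants(node.left) + get_constants(node.right)


def set_constants(node: Optional[Node], values: List[float], start: int = 0) -> int:
    """Write values into constant leaves in the same left-to-right order.

    Returns the index of the next unused value, so recursive calls chain.
    """
    if node is None:
        return start
    if node.op is None:
        if node.is_var:
            return start
        node.value = values[start]
        return start + 1
    start = set_constants(node.left, values, start)
    return set_constants(node.right, values, start)
```

For the tree of 2.0 * x + 0.5, get_constants returns [2.0, 0.5]; after set_constants(tree, [3.0, -1.0]), it returns [3.0, -1.0]. Because both functions visit leaves in the same order, the flat list can be handed to any optimizer without it knowing about the tree.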