# Eureqa.jl

Symbolic regression built on Eureqa, and interfaced by Python. Uses regularized evolution and simulated annealing.
## Running
You can execute the program from the command line with, for example:

```bash
python eureqa.py --threads 8 --binary-operators plus mult
```
You can see all hyperparameters in the function `eureqa` inside `eureqa.py`. This function generates Julia code, which is then executed by `eureqa.jl` and `paralleleureqa.jl`.
## Modification
You can change the binary and unary operators in `hyperparams.jl` here:

```julia
const binops = [plus, mult]
const unaops = [sin, cos, exp];
```
E.g., you can add the function for powers with:

```julia
pow(x::Float32, y::Float32)::Float32 = sign(x)*abs(x)^y
const binops = [plus, mult, pow]
```
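A custom unary operator can be added to `unaops` the same way. As a sketch following the same `Float32` convention (`relu` here is just an illustrative choice, not an operator the package defines):

```julia
# Illustrative custom unary operator, using the same
# Float32-in, Float32-out convention as the built-in operators.
relu(x::Float32)::Float32 = max(x, 0f0)
const unaops = [sin, cos, exp, relu]
```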
You can change the dataset here:

```julia
const X = convert(Array{Float32, 2}, randn(100, 5)*2)
# Here is the function we want to learn (x2^2 + cos(x3) - 5)
const y = convert(Array{Float32, 1}, ((cx,)->cx^2).(X[:, 2]) + cos.(X[:, 3]) .- 5)
```
by either loading in a dataset, or modifying the definition of `y`. (The `.` operators are used to broadcast a scalar function over an array.)
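For instance, a dataset could be read from a delimited file using Julia's standard-library `DelimitedFiles`; the filename and column layout below are assumptions:

```julia
using DelimitedFiles

# Hypothetical file layout: 5 feature columns followed by the target.
data = readdlm("data.csv", ',', Float32)
const X = data[:, 1:5]
const y = data[:, 6]
```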
## Hyperparameters
Annealing allows each evolutionary cycle to turn down the exploration rate over time: at the end (temperature 0), it will only accept solutions better than existing solutions.
The following parameter, `parsimony`, is how much to punish complex solutions:

```julia
const parsimony = 0.01
```
Finally, the following determines how much to scale the temperature by (`T` is between 0 and 1):

```julia
const alpha = 10.0
```

Larger `alpha` means more exploration.
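For intuition, a Metropolis-style acceptance rule has this shape; this is a sketch of the general technique, not necessarily the exact rule implemented in `eureqa.jl`:

```julia
# Accept a mutation based on its change in loss (delta) and the
# current temperature T in [0, 1]. Improvements are always accepted;
# worse solutions are accepted with probability exp(-delta / (alpha*T)),
# which goes to zero as T -> 0 and grows with alpha (more exploration).
function accept_mutation(delta, T, alpha)
    delta <= 0 && return true
    return rand() < exp(-delta / (alpha * T))
end
```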
One can also adjust the relative probabilities of each operation here:

```julia
weights = [8, 1, 1, 1, 0.1, 0.5, 2]
```
for:
- Perturb constant
- Mutate operator
- Append a node
- Delete a subtree
- Simplify equation
- Randomize completely
- Do nothing
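Presumably these weights are normalized into a categorical distribution when a mutation is chosen; a minimal sketch of that sampling step (not necessarily how `eureqa.jl` implements it):

```julia
# Pick a mutation index with probability proportional to its weight.
function sample_mutation(weights)
    r = rand() * sum(weights)
    acc = zero(eltype(weights))
    for (i, w) in enumerate(weights)
        acc += w
        r <= acc && return i
    end
    return length(weights)
end
```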
## TODO
- Hyperparameter tune
- Add mutation for constant<->variable
- Create a Python interface
- Create a benchmark for accuracy
- Create a struct to pass through all hyperparameters, instead of treating them as constants
    - Make sure this doesn't affect performance
- Use an NN to generate weights over the full mutation probability distribution, conditioned on the error and the existing equation, and train it on some randomly generated equations
- Performance:
    - Use an enum for functions instead of storing them?
    - Current most expensive operations:
        - `deepcopy()` before the mutation, to see whether to accept it or not.
            - Seems like it's necessary right now, but it is still by far the slowest operation.
        - Calculating the loss function - there are duplicate calculations happening.
        - Declaration of the weights array every iteration
- Explicit constant optimization on hall-of-fame
    - Create a method to find and return all constants, from left to right
    - Create a method to find and set all constants, in the same order
    - Pull in some optimization algorithm and add it. Keep the package small!
- Create a benchmark for speed
- Simplify subtrees with only constants beneath them. Or should I? Maybe randomly simplify sometimes?
- Record hall of fame
- Optionally (with a hyperparameter) migrate the hall of fame, rather than the current bests
- Test performance of reduced-precision integers
    - No effect