Tuning and Workflow Tips
I give a short guide below on how I like to tune PySR for my applications.
First, my general tips would be to avoid using redundant operators: for example, `pow` can already express everything that `square` and `cube` can, so including all three is redundant. The fewer operators, the better; only use operators you need.
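For instance, a minimal operator set might look like the following sketch (the particular operators here are just placeholders for whatever your problem actually needs):

```python
from pysr import PySRRegressor

# A minimal operator set: since `pow` can already express squares and
# cubes, there is no need to also include `square` and `cube`.
model = PySRRegressor(
    binary_operators=["+", "-", "*", "/", "pow"],
    unary_operators=["exp", "log"],  # only the operators the problem needs
)
```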
When running PySR, I usually do the following:
I run from IPython on the head node of a slurm cluster. Passing `cluster_manager="slurm"` will make PySR set up a run over the entire allocation. I set `procs` equal to the total number of cores over my entire allocation.
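As a rough sketch, assuming a hypothetical allocation of 4 nodes with 32 cores each (distributed options have changed across PySR releases, so check the docs for your version):

```python
from pysr import PySRRegressor

# Hypothetical allocation: 4 nodes x 32 cores = 128 cores in total.
model = PySRRegressor(
    cluster_manager="slurm",  # spread the search over the slurm allocation
    procs=4 * 32,             # total number of cores over the allocation
)
```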
- Use the default parameters.
- Use only the operators I think it needs and no more.
- Set `niterations` to some very large value, so it just runs for a week until my job finishes. If the equation looks good, I quit the job early.
- Increase `populations` to `3*num_cores`.
- Set `ncyclesperiteration` to maybe `5000` or so, until the head node occupation is under `10%`.
- Set `constraints` and `nested_constraints` as strict as possible. These can help quite a bit with exploration.
- Set `maxdepth` as strict as possible.
- Set `maxsize` a bit larger than the final size you want. e.g., if you want a final equation of size `30`, you might set this to `35`, so that it has a bit of room to explore.
- Set `parsimony` equal to about the minimum loss you would expect, divided by 5-10. e.g., if you expect the final equation to have a loss of `0.001`, you might set `parsimony=0.0001`.
- Set `weight_optimize` to some larger value, maybe `0.001`. This is very important if `ncyclesperiteration` is large, so that optimization happens more frequently.
- Set `turbo` to `True`. This may or may not work; if there's an error, just turn it off (some operators are not SIMD-capable). If it does work, it should give you a nice 20% speedup. (A sketch combining these settings follows this list.)
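Putting these settings together, a configuration might look like the following sketch. All of the numbers, operators, and constraint choices here are illustrative placeholders rather than recommendations, and some parameter names have changed across PySR releases:

```python
from pysr import PySRRegressor

num_cores = 128  # hypothetical total core count of the allocation

model = PySRRegressor(
    binary_operators=["+", "-", "*", "/", "pow"],
    unary_operators=["exp"],
    niterations=1_000_000,     # effectively "run until I stop it"
    populations=3 * num_cores,
    ncyclesperiteration=5000,  # tune until head-node occupation is low
    # Strict constraints help exploration. Illustrative choices: limit the
    # exponent of `pow` to complexity 1, and forbid nesting pow/exp in pow.
    constraints={"pow": (-1, 1)},
    nested_constraints={"pow": {"pow": 0, "exp": 0}},
    maxdepth=10,          # as strict as the problem allows
    maxsize=35,           # a bit above the target final size of ~30
    parsimony=0.0001,     # ~expected final loss (0.001) divided by 10
    weight_optimize=0.001,  # optimize constants more often
    turbo=True,           # turn off if your operators aren't SIMD-capable
)
```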
Since I am running in IPython, I can just hit "q" to stop the job, tweak the hyperparameters, and then start the search again.
Some things I try out to see if they help:
- Play around with `complexity_of_operators`. Set operators you dislike (e.g., `pow`) to have a larger complexity.
- Try setting `adaptive_parsimony_scaling` a bit larger, maybe up to `1000`.
- Sometimes I try using `warmup_maxsize_by`. This is useful if you find that the search finds a very complex equation very quickly, and then gets stuck. It basically forces it to start at the simpler equations and build up complexity slowly.
- Play around with different losses (see the sketch after this list):
    i. I typically try `L2DistLoss()` and `L1DistLoss()`. L1 loss is more robust to outliers compared to L2, so is often a good choice for a noisy dataset.
    ii. I might also provide the `weights` parameter to `fit` if there is some reasonable choice of weighting. For example, maybe I know the signal-to-noise of a particular row of `y`; I would set that SNR equal to the weights. Or, perhaps I do some sort of importance sampling, and weight the rows by importance.
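For the losses, a sketch might look like the following. I'm assuming the `elementwise_loss` parameter here (older PySR versions call it `loss`), and the SNR-based weights are purely hypothetical:

```python
import numpy as np
from pysr import PySRRegressor

# Toy data standing in for a real problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=100)

# L1 loss is more robust to outliers than L2 loss.
model = PySRRegressor(elementwise_loss="L1DistLoss()")

# Hypothetical per-row weights, e.g. a known signal-to-noise ratio of y.
snr = np.full(len(y), 10.0)
model.fit(X, y, weights=snr)
```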
Very rarely I might also try tuning the mutation weights, the crossover probability, or the optimization parameters. I never use `denoise` or `select_k_features` as I find they aren't very useful.
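On the rare occasion I do touch those knobs, it might look like this sketch (the values are illustrative; check your version's docs for exact parameter names and defaults):

```python
from pysr import PySRRegressor

model = PySRRegressor(
    crossover_probability=0.1,    # nudge crossover above the default
    weight_mutate_constant=0.05,  # relative rate of constant mutations
    optimizer_iterations=10,      # effort spent optimizing constants
)
```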
For large datasets I usually just randomly sample ~1000 points or so. In case all the points matter, I might use `batching=True`.
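For example, a sketch of subsampling versus batching (with `X_full` and `y_full` standing in for a hypothetical large dataset):

```python
import numpy as np
from pysr import PySRRegressor

# Hypothetical large dataset.
rng = np.random.default_rng(0)
X_full = rng.normal(size=(1_000_000, 3))
y_full = X_full[:, 0] * X_full[:, 1]

# Option 1: randomly subsample ~1000 points.
idx = rng.choice(len(X_full), size=1000, replace=False)
X, y = X_full[idx], y_full[idx]

# Option 2: if every point matters, evaluate losses on minibatches instead.
model = PySRRegressor(batching=True, batch_size=50)
```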
If I find the equations get very complex and I'm not sure if they are numerically precise, I might set `precision=64`.
You might also wish to explore the discussions page for more tips, and to see if anyone else has had similar questions.