# PySR: High-Performance Symbolic Regression in Python and Julia

PySR searches for symbolic expressions which optimize a particular objective.

https://github.com/MilesCranmer/PySR/assets/7593028/c8511a49-b408-488f-8f18-b1749078268f
| **Docs** | **Forums** | **Paper** | **colab demo** |
|:---:|:---:|:---:|:---:|
|[![Documentation](https://github.com/MilesCranmer/PySR/actions/workflows/docs.yml/badge.svg)](https://astroautomata.com/PySR/)|[![Discussions](https://img.shields.io/badge/discussions-github-informational)](https://github.com/MilesCranmer/PySR/discussions)|[![Paper](https://img.shields.io/badge/arXiv-2305.01582-b31b1b)](https://arxiv.org/abs/2305.01582)|[![Colab](https://img.shields.io/badge/colab-notebook-yellow)](https://colab.research.google.com/github/MilesCranmer/PySR/blob/master/examples/pysr_demo.ipynb)|
| **pip** | **conda** | **Stats** |
| :---: | :---: | :---: |
|[![PyPI version](https://badge.fury.io/py/pysr.svg)](https://badge.fury.io/py/pysr)|[![Conda Version](https://img.shields.io/conda/vn/conda-forge/pysr.svg)](https://anaconda.org/conda-forge/pysr)|pip: [![Downloads](https://static.pepy.tech/badge/pysr)](https://pypi.org/project/pysr/)<br>conda: [![Anaconda-Server Badge](https://anaconda.org/conda-forge/pysr/badges/downloads.svg)](https://anaconda.org/conda-forge/pysr)|
If you find PySR useful, please cite the paper [arXiv:2305.01582](https://arxiv.org/abs/2305.01582).
If you've finished a project with PySR, please submit a PR to showcase your work on the [research showcase page](https://astroautomata.com/PySR/papers)!
**Contents**:
- [Contributors](#contributors-)
- [Why PySR?](#why-pysr)
- [Installation](#installation)
- [Quickstart](#quickstart)
- [Documentation](https://astroautomata.com/PySR)
### Test status
| **Linux** | **Windows** | **macOS** |
|---|---|---|
|[![Linux](https://github.com/MilesCranmer/PySR/actions/workflows/CI.yml/badge.svg)](https://github.com/MilesCranmer/PySR/actions/workflows/CI.yml)|[![Windows](https://github.com/MilesCranmer/PySR/actions/workflows/CI_Windows.yml/badge.svg)](https://github.com/MilesCranmer/PySR/actions/workflows/CI_Windows.yml)|[![macOS](https://github.com/MilesCranmer/PySR/actions/workflows/CI_mac.yml/badge.svg)](https://github.com/MilesCranmer/PySR/actions/workflows/CI_mac.yml)|
| **Docker** | **Conda** | **Coverage** |
|---|---|---|
|[![Docker](https://github.com/MilesCranmer/PySR/actions/workflows/CI_docker.yml/badge.svg)](https://github.com/MilesCranmer/PySR/actions/workflows/CI_docker.yml)|[![conda-forge](https://github.com/MilesCranmer/PySR/actions/workflows/CI_conda_forge.yml/badge.svg)](https://github.com/MilesCranmer/PySR/actions/workflows/CI_conda_forge.yml)|[![Coverage Status](https://coveralls.io/repos/github/MilesCranmer/PySR/badge.svg?branch=master&service=github)](https://coveralls.io/github/MilesCranmer/PySR)|
## Why PySR?
PySR is an open-source tool for *Symbolic Regression*: a machine learning
task where the goal is to find an interpretable symbolic expression that optimizes some objective.
Over a period of several years, PySR has been engineered from the ground up
to be (1) as high-performance as possible,
(2) as configurable as possible, and (3) easy to use.
PySR is developed alongside the Julia library [SymbolicRegression.jl](https://github.com/MilesCranmer/SymbolicRegression.jl),
which forms the powerful search engine of PySR.
The details of these algorithms are described in the [PySR paper](https://arxiv.org/abs/2305.01582).
Symbolic regression works best on low-dimensional datasets, but
one can also extend these approaches to higher-dimensional
spaces by using "*Symbolic Distillation*" of Neural Networks, as explained in
[2006.11287](https://arxiv.org/abs/2006.11287), where we apply
it to N-body problems. Here, one essentially uses
symbolic regression to convert a neural net
to an analytic equation. Thus, these tools offer an explicit
and powerful way to interpret deep neural networks.
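As a rough illustration of the idea (not the full pipeline of the paper, and assuming scikit-learn is available to provide the neural network), one could distill a small network like this:

```python
# Toy "symbolic distillation" sketch: fit a neural network to data, then fit
# symbolic regression to the *network's* predictions to recover an analytic
# approximation of what the network learned.
import numpy as np
from sklearn.neural_network import MLPRegressor
from pysr import PySRRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = np.sin(X[:, 0]) + X[:, 1] ** 2  # hypothetical ground truth

net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000).fit(X, y)
y_net = net.predict(X)  # the mapping the network actually learned

# Distill the network into an equation:
model = PySRRegressor(
    niterations=40,
    binary_operators=["+", "*"],
    unary_operators=["sin"],
)
model.fit(X, y_net)
```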
## Installation
### Pip
You can install PySR with pip:
```bash
pip install pysr
```
The Julia dependencies will be installed automatically on first import.
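If you would like this one-time Julia setup to happen ahead of time (for example, while building an environment), one simple approach is to import the package once:

```python
import pysr  # the first import installs and precompiles the Julia backend
```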
### Conda
Similarly, with conda:
```bash
conda install -c conda-forge pysr
```
### Docker
You can also use the `Dockerfile` to install PySR in a Docker container:
1. Clone this repo.
2. Within the repo's directory, build the docker container:
```bash
docker build -t pysr .
```
3. You can then start the container and launch an IPython session with:
```bash
docker run -it --rm pysr ipython
```
For more details, see the [docker section](#docker).
---
### Troubleshooting
One issue you might run into can result in a hard crash at import with
a message like "`GLIBCXX_...` not found". This is due to another one of the Python dependencies
loading an incorrect `libstdc++` library. To fix this, you should modify your
`LD_LIBRARY_PATH` variable to reference the Julia libraries. For example, if the Julia
version of `libstdc++.so` is located in `$HOME/.julia/juliaup/julia-1.10.0+0.x64.linux.gnu/lib/julia/`
(which likely differs on your system!), you could add:
```bash
export LD_LIBRARY_PATH=$HOME/.julia/juliaup/julia-1.10.0+0.x64.linux.gnu/lib/julia/:$LD_LIBRARY_PATH
```
to your `.bashrc` or `.zshrc` file.
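If you are not sure where Julia's copy of `libstdc++` lives on your machine, a quick way to look (assuming a juliaup-managed install under `~/.julia`; the glob pattern below is only a guess at the layout) is:

```python
# Locate Julia's bundled libstdc++ under a juliaup install (path layout assumed).
import glob
import os

pattern = os.path.expanduser("~/.julia/juliaup/*/lib/julia/libstdc++.so*")
for path in glob.glob(pattern):
    print(path)
```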
## Quickstart
You might wish to try the interactive tutorial [here](https://colab.research.google.com/github/MilesCranmer/PySR/blob/master/examples/pysr_demo.ipynb), which uses the notebook in `examples/pysr_demo.ipynb`.
In practice, I highly recommend using IPython rather than Jupyter, as the printing is much nicer.
Below is a quick demo which you can paste into a Python runtime.
First, let's import numpy to generate some test data:
```python
import numpy as np
X = 2 * np.random.randn(100, 5)
y = 2.5382 * np.cos(X[:, 3]) + X[:, 0] ** 2 - 0.5
```
We have created a dataset with 100 datapoints, with 5 features each.
The relation we wish to model is $2.5382 \cos(x_3) + x_0^2 - 0.5$.
Now, let's create a PySR model and train it.
PySR's main interface is in the style of scikit-learn:
```python
from pysr import PySRRegressor
model = PySRRegressor(
    niterations=40,  # < Increase me for better results
    binary_operators=["+", "*"],
    unary_operators=[
        "cos",
        "exp",
        "sin",
        "inv(x) = 1/x",
        # ^ Custom operator (julia syntax)
    ],
    extra_sympy_mappings={"inv": lambda x: 1 / x},
    # ^ Define operator for SymPy as well
    elementwise_loss="loss(prediction, target) = (prediction - target)^2",
    # ^ Custom loss function (julia syntax)
)
```
This will set up the model for 40 iterations of the search code, which contains hundreds of thousands of mutations and equation evaluations.
Let's train this model on our dataset:
```python
model.fit(X, y)
```
Internally, this launches a Julia process which will do a multithreaded search for equations to fit the dataset.
Equations will be printed during training, and once you are satisfied, you may
quit early by hitting 'q' and then \<enter\>.
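After fitting (or stopping early), you can inspect and reuse the discovered equations. A minimal sketch, assuming the scikit-learn-style `predict` method and the `sympy()` accessor:

```python
print(model)               # summary table of the discovered equations
y_pred = model.predict(X)  # evaluate the current best equation on data
expr = model.sympy()       # best equation as a SymPy expression
```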