MilesCranmer committed on
Commit 1e1bd80
1 Parent(s): 5527c70

Add docs on dimensional constraints

  1. docs/examples.md +89 -1
docs/examples.md CHANGED
@@ -433,9 +433,97 @@ equal to:
  $\frac{x_0^2 x_1 - 2.0000073}{x_2^2 - 1.0000019}$, which
  is nearly the same as the true equation!

- ## 10. Additional features

  For the many other features available in PySR, please
  read the [Options section](options.md).
 
  $\frac{x_0^2 x_1 - 2.0000073}{x_2^2 - 1.0000019}$, which
  is nearly the same as the true equation!

+ ## 10. Dimensional constraints

+ One other feature we can exploit is dimensional analysis.
+ Say that we know the physical units of each feature and output,
+ and we want to find an expression that is dimensionally consistent.

+ We can do this as follows, using `DynamicQuantities.jl` to assign units
+ by passing a string that specifies the units of each variable.
+ First, let's make some data on Newton's law of gravitation, using
+ astropy for units:
+
+ ```python
+ import numpy as np
+ from astropy import units as u, constants as const
+
+ # astropy constants are stored in SI, so M is held in kg and r in m:
+ M = (np.random.rand(100) + 0.1) * const.M_sun
+ m = 100 * (np.random.rand(100) + 0.1) * u.kg
+ r = (np.random.rand(100) + 0.1) * const.R_earth
+ G = const.G
+
+ # Newton's law of gravitation:
+ F = G * M * m / r**2
+ ```
+
+ We can see the units of `F` with `F.unit`.
+
+ Before creating our model, note that this data has a very large
+ dynamic range, so let's first define a custom loss function
+ that looks at the error in log-space:
+
+ ```python
+ loss = """function loss_fnc(prediction, target)
+     scatter_loss = abs(log((abs(prediction)+1e-20) / (abs(target)+1e-20)))
+     sign_loss = 10 * (sign(prediction) - sign(target))^2
+     return scatter_loss + sign_loss
+ end
+ """
+ ```
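
To get a feel for what this loss rewards, here is a rough pure-Python port of the Julia function above. It is only an illustration: the names `sign` and `log_space_loss` are ours, and PySR itself evaluates the Julia string, not this code. It compares a prediction that is off by a factor of two at a tiny scale and at a huge scale, plus a prediction with the wrong sign:

```python
import math


def sign(x: float) -> int:
    """Mirror Julia's sign(): returns -1, 0, or 1."""
    return (x > 0) - (x < 0)


def log_space_loss(prediction: float, target: float) -> float:
    # Absolute log-ratio of the magnitudes; the 1e-20 offsets guard
    # against log(0) and division by zero.
    scatter_loss = abs(math.log((abs(prediction) + 1e-20) / (abs(target) + 1e-20)))
    # Extra penalty when prediction and target disagree in sign.
    sign_loss = 10 * (sign(prediction) - sign(target)) ** 2
    return scatter_loss + sign_loss


small = log_space_loss(2e-5, 1e-5)      # factor-of-two error, tiny target
large = log_space_loss(2e25, 1e25)      # factor-of-two error, huge target
flipped = log_space_loss(-1e25, 1e25)   # right magnitude, wrong sign
```

Both `small` and `large` come out at roughly `log(2)`, so relative errors are weighted equally across the whole dynamic range, while `flipped` is dominated by the sign penalty.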
+
+ Now let's define our model:
+
+ ```python
+ from pysr import PySRRegressor
+
+ model = PySRRegressor(
+     binary_operators=["+", "-", "*", "/"],
+     unary_operators=["square"],
+     loss=loss,
+     complexity_of_constants=2,
+     maxsize=25,
+     niterations=100,
+     populations=50,
+     # Amount to penalize dimensional violations:
+     dimensional_constraint_penalty=10**5,
+ )
+ ```
+
+ Now we fit it, passing the unit information.
+ The unit strings must use the format of [DynamicQuantities.jl](https://symbolicml.org/DynamicQuantities.jl/dev/#Usage).
+
+ ```python
+ import pandas as pd
+
+ # Get numerical arrays to fit. `.value` returns numbers in whatever unit
+ # each Quantity is stored in (kg for `M`, m for `r`), so convert to the
+ # units declared in `X_units` and `y_units` first:
+ X = pd.DataFrame(dict(
+     M=M.to(u.M_sun).value,
+     m=m.to(u.kg).value,
+     r=r.to(u.R_earth).value,
+ ))
+ y = F.to(u.N).value  # N = kg m / s^2
+
+ model.fit(
+     X,
+     y,
+     X_units=["Constants.M_sun", "kg", "Constants.R_earth"],
+     y_units="kg * m / s^2"
+ )
+ ```
+
+ You can observe that all expressions with a loss under
+ our penalty are dimensionally consistent!
+ (The `"[⋅]"` indicates free units in a constant, which can cancel out other units in the expression.)
+ For example,
+
+ ```julia
+ "y[m s⁻² kg] = (M[kg] * 2.6353e-22[⋅])"
+ ```
+
+ would indicate that the expression is dimensionally consistent, with
+ a constant `"2.6353e-22[m s⁻²]"`.
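
The bookkeeping behind that statement can be sketched in a few lines of plain Python. This only illustrates the idea of dimensional analysis; it is not how PySR or DynamicQuantities.jl implement it, and the helper names (`dim`, `mul`, `div`) are made up for this example. Each dimension is stored as the exponents of (length, mass, time):

```python
from typing import Tuple

Dim = Tuple[int, ...]  # exponents of (length [m], mass [kg], time [s])


def dim(length: int = 0, mass: int = 0, time: int = 0) -> Dim:
    return (length, mass, time)


def mul(a: Dim, b: Dim) -> Dim:
    # Multiplying two quantities adds their exponents.
    return tuple(x + y for x, y in zip(a, b))


def div(a: Dim, b: Dim) -> Dim:
    # Dividing two quantities subtracts their exponents.
    return tuple(x - y for x, y in zip(a, b))


FORCE = dim(length=1, mass=1, time=-2)  # y: kg m s^-2
MASS = dim(mass=1)                      # M: kg
LENGTH = dim(length=1)                  # r: m

# Units the free constant must absorb for y = M * c to be consistent:
constant_dim = div(FORCE, MASS)

# The true law G * M * m / r^2, with G in m^3 kg^-1 s^-2:
G_DIM = dim(length=3, mass=-1, time=-2)
newton = div(mul(mul(G_DIM, MASS), MASS), mul(LENGTH, LENGTH))
```

Here `constant_dim` is `(1, 0, -2)`, i.e. m s⁻², matching the constant quoted above, and `newton` equals `FORCE`, so the full law of gravitation is consistent without any free units.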
+
+ Note that this expression has a large dynamic range, so it may be difficult to find. Consider searching with a larger `niterations` if needed.
+
+
+ ## 11. Additional features

  For the many other features available in PySR, please
  read the [Options section](options.md).