## **1. Sigmoid (Logistic)**

**Formula:** σ(x) = 1 / (1 + exp(-x))

**Strengths:** Maps any real-valued number to a value between 0 and 1, making it suitable for binary classification problems.

**Weaknesses:** Saturates (output approaches 0 or 1) for inputs of large magnitude, so gradients vanish during backpropagation.

**Usage:** Binary classification, logistic regression.
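
A minimal NumPy sketch (the function names are illustrative, not from any particular library) makes the saturation visible through the derivative:

```python
import numpy as np

def sigmoid(x):
    # Maps any real input into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)); it peaks at 0.25
    # at x = 0 and vanishes for inputs of large magnitude.
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid_grad(np.array([0.0, 5.0, 10.0])))  # ~[0.25, 0.0066, 0.000045]
```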

## **2. Hyperbolic Tangent (Tanh)**

**Formula:** tanh(x) = 2 / (1 + exp(-2x)) - 1

**Strengths:** Similar to sigmoid but maps to (-1, 1); the zero-centered output often makes optimization easier.

**Weaknesses:** Also saturates, leading to vanishing gradients.

**Usage:** Hidden layers where zero-centered outputs are preferred; gate and state computations in recurrent networks such as LSTMs and GRUs.
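
The formula above is just an algebraic rewriting of the standard tanh; a quick NumPy check (illustrative only):

```python
import numpy as np

x = np.linspace(-3.0, 3.0, 7)
tanh_from_formula = 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0
# Identical to the built-in tanh; output range is (-1, 1) and zero-centered.
assert np.allclose(tanh_from_formula, np.tanh(x))
```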

## **3. Rectified Linear Unit (ReLU)**

**Formula:** f(x) = max(0, x)

**Strengths:** Computationally efficient, non-saturating, and easy to compute.

**Weaknesses:** Not differentiable at x = 0 (a subgradient is used in practice), and units can "die": once a neuron's pre-activations are consistently negative, its gradient is zero and it stops updating.

**Usage:** Default activation function in many deep learning frameworks, suitable for most neural networks.
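
A minimal sketch, using the common convention of treating the gradient at x = 0 as 0 (names are illustrative):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # 0 for x < 0, 1 for x > 0; at exactly x = 0 frameworks typically
    # pick 0, a valid subgradient.
    return (x > 0).astype(float)

print(relu(np.array([-2.0, 0.0, 3.0])))       # [0. 0. 3.]
print(relu_grad(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 1.]
```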

## **4. Leaky ReLU**

**Formula:** f(x) = max(αx, x), where α is a small constant (e.g., 0.01)

**Strengths:** Like ReLU, but negative inputs are scaled by α instead of being zeroed, so gradients never vanish entirely and the dying-ReLU problem is mitigated.

**Weaknesses:** Still non-differentiable at x=0.

**Usage:** Alternative to ReLU, especially when dealing with dying neurons.
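
A minimal sketch with the commonly used default α = 0.01 (the value is a convention, not a requirement):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Negative inputs are scaled by alpha instead of zeroed, so the
    # gradient is alpha (not zero) on the negative side.
    return np.where(x > 0, x, alpha * x)

print(leaky_relu(np.array([-5.0, 0.0, 5.0])))  # [-0.05  0.    5.  ]
```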

## **5. Swish**

**Formula:** f(x) = x \* sigmoid(βx), where β is a fixed constant (often 1) or a learnable parameter

**Strengths:** Self-gated, adaptive, and non-saturating.

**Weaknesses:** More expensive to compute than ReLU; a learnable β adds extra parameters.

**Usage:** Can be used in place of ReLU or other activations, but may not always outperform them.
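
A sketch with the gate written out explicitly; setting β = 1 recovers SiLU (section 12), and β can also be made learnable:

```python
import numpy as np

def swish(x, beta=1.0):
    # Self-gated: the input is multiplied by sigmoid(beta * x).
    return x / (1.0 + np.exp(-beta * x))

print(swish(np.array([-2.0, 0.0, 2.0])))  # ~[-0.238, 0.0, 1.762]
```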

## **6. Softmax**

**Formula:** softmax(x)_i = exp(x_i) / Σ_j exp(x_j)

**Strengths:** Normalizes output to ensure probabilities sum to 1, making it suitable for multi-class classification.

**Weaknesses:** Rarely useful in hidden layers; naive implementations can overflow for large logits.

**Usage:** Output layer activation for multi-class classification problems.
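
A numerically stable sketch; subtracting the maximum logit before exponentiating avoids overflow and does not change the result:

```python
import numpy as np

def softmax(x):
    # Shift by the max for numerical stability; outputs are positive and sum to 1.
    z = np.exp(x - np.max(x))
    return z / np.sum(z)

p = softmax(np.array([2.0, 1.0, 0.1]))
print(p, p.sum())  # ~[0.659 0.242 0.099] 1.0
```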

## **7. Softsign**

**Formula:** f(x) = x / (1 + |x|)

**Strengths:** Similar to tanh (output in (-1, 1)), but it approaches its limits polynomially rather than exponentially, so it saturates more gently; a short sketch comparing it with ArcTan follows the next section.

**Weaknesses:** Not commonly used and may not provide significant benefits over sigmoid or tanh.

**Usage:** Alternative to sigmoid or tanh in certain situations.

## **8. ArcTan**

**Formula:** f(x) = arctan(x)

**Strengths:** Smooth, continuous, and zero-centered; bounded to (-π/2, π/2), saturating more slowly than tanh.

**Weaknesses:** Not commonly used and may not outperform other activations.

**Usage:** Experimental or niche applications.
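
A short sketch comparing Softsign and ArcTan (both bounded, smooth squashing functions; the names are illustrative):

```python
import numpy as np

def softsign(x):
    # Bounded to (-1, 1) like tanh, but approaches its limits only
    # polynomially, so it saturates more gently.
    return x / (1.0 + np.abs(x))

def arctan_activation(x):
    # Bounded to (-pi/2, pi/2); also saturates, but more slowly than tanh.
    return np.arctan(x)

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(softsign(x))           # [-0.909 -0.5  0.  0.5  0.909]
print(arctan_activation(x))  # [-1.471 -0.785  0.  0.785  1.471]
```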

## **9. SoftPlus**

**Formula:** f(x) = log(1 + exp(x))

**Strengths:** A smooth approximation of ReLU: everywhere differentiable, with gradient sigmoid(x).

**Weaknesses:** More expensive than ReLU and rarely beneficial as a hidden-layer activation.

**Usage:** Producing strictly positive outputs (e.g., variance or rate parameters); also appears as a building block of Mish (section 11).
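
A sketch using logaddexp, which computes log(1 + exp(x)) without overflowing for large x (illustrative name):

```python
import numpy as np

def softplus(x):
    # log(1 + exp(x)) via logaddexp(0, x) for numerical stability; behaves
    # like x for large x and decays smoothly toward 0 for negative x.
    return np.logaddexp(0.0, x)

print(softplus(np.array([-10.0, 0.0, 10.0])))  # ~[0.0000454, 0.693, 10.0000454]
```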

## **10. Gaussian Error Linear Unit (GELU)**

**Formula:** f(x) = x \* Φ(x), where Φ is the cumulative distribution function of the standard normal distribution

**Strengths:** Smooth and non-monotonic; weights inputs by their standard-normal CDF value instead of hard-gating them at zero.

**Weaknesses:** Somewhat more expensive than ReLU, since the exact form requires the error function (or the tanh approximation in section 13).

**Usage:** Default activation in many Transformer architectures (e.g., BERT, GPT).
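
A sketch of the exact form, assuming SciPy is available for the error function:

```python
import numpy as np
from scipy.special import erf  # assumes SciPy is installed

def gelu(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    phi = 0.5 * (1.0 + erf(x / np.sqrt(2.0)))
    return x * phi

print(gelu(np.array([-1.0, 0.0, 1.0])))  # ~[-0.159, 0.0, 0.841]
```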

## **11. Mish**

**Formula:** f(x) = x \* tanh(softplus(x))

**Strengths:** Smooth, non-monotonic, unbounded above and bounded below.

**Weaknesses:** More expensive to compute than ReLU; reported gains over ReLU are task-dependent.

**Usage:** Alternative to ReLU, especially in computer vision tasks.
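
Mish composes softplus and tanh; a minimal sketch:

```python
import numpy as np

def mish(x):
    # x * tanh(softplus(x)): slightly negative for small negative inputs,
    # approaching 0 for very negative inputs and x for large positive inputs.
    return x * np.tanh(np.logaddexp(0.0, x))

print(mish(np.array([-2.0, 0.0, 2.0])))  # ~[-0.252, 0.0, 1.944]
```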

## **12. SiLU (Sigmoid Linear Unit)**

**Formula:** f(x) = x \* sigmoid(x)

**Strengths:** Smooth and non-monotonic; identical to Swish with β = 1.

**Weaknesses:** Slightly more expensive to compute than ReLU.

**Usage:** Alternative to ReLU, used in modern convolutional architectures (e.g., EfficientNet) and other computer vision models.
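
A one-line sketch; the printout illustrates the non-monotonic dip that distinguishes SiLU (and Swish) from ReLU:

```python
import numpy as np

def silu(x):
    # Identical to Swish with beta = 1.
    return x / (1.0 + np.exp(-x))

# SiLU is non-monotonic: it dips to roughly -0.278 near x = -1.28 before
# rising, so negative inputs still carry a small, nonzero signal.
x = np.linspace(-5.0, 0.0, 501)
y = silu(x)
print(x[np.argmin(y)], y.min())  # ~ -1.28  ~ -0.278
```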

## **13. GELU Approximation (tanh-based)**

**Formula:** f(x) ≈ 0.5 \* x \* (1 + tanh(√(2/π) \* (x + 0.044715 \* x^3)))

**Strengths:** Fast, non-saturating, and smooth.

**Weaknesses:** An approximation; not exactly equal to GELU (though the difference is very small in practice).

**Usage:** Alternative to GELU, especially when computational efficiency is crucial.
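
A quick comparison of the approximation against the exact form (SciPy's erf assumed available):

```python
import numpy as np
from scipy.special import erf  # assumes SciPy is installed

def gelu_exact(x):
    return x * 0.5 * (1.0 + erf(x / np.sqrt(2.0)))

def gelu_tanh(x):
    # tanh-based approximation, cheaper than evaluating erf.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

x = np.linspace(-5.0, 5.0, 101)
print(np.max(np.abs(gelu_exact(x) - gelu_tanh(x))))  # small (well below 1e-2)
```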

## **14. SELU (Scaled Exponential Linear Unit)**

**Formula:** f(x) = λ { x if x > 0, α(e^x - 1) if x ≤ 0 }, with fixed constants λ ≈ 1.0507 and α ≈ 1.6733

**Strengths:** Self-normalizing, non-saturating for positive inputs, and computationally efficient.

**Weaknesses:** Self-normalization only holds under specific conditions (LeCun-normal initialization, AlphaDropout, and mostly plain feed-forward architectures); λ and α are fixed constants, not hyperparameters to tune.

**Usage:** Alternative to ReLU in deep fully connected (feed-forward) networks.
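
A sketch with the published constants (the digits below are the commonly quoted values):

```python
import numpy as np

SELU_LAMBDA = 1.0507009873554805
SELU_ALPHA = 1.6732632423543772

def selu(x):
    # Scaled ELU; with these constants (plus LeCun-normal initialization),
    # activations tend to keep zero mean and unit variance across layers.
    return SELU_LAMBDA * np.where(x > 0, x, SELU_ALPHA * (np.exp(x) - 1.0))

print(selu(np.array([-1.0, 0.0, 1.0])))  # ~[-1.111, 0.0, 1.051]
```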

When choosing an activation function, consider the following:

* **Non-saturation:** In deep hidden layers, avoid activations that saturate (e.g., sigmoid, tanh) to reduce the risk of vanishing gradients.

* **Computational efficiency:** Prefer cheap activations (e.g., ReLU, Leaky ReLU) for very large models or real-time applications; smooth alternatives such as GELU and Mish cost somewhat more per evaluation.

* **Smoothness:** Smooth activations (e.g., GELU, Mish) can help with optimization and convergence.

* **Domain knowledge:** Select activations based on the problem domain and desired output (e.g., softmax for multi-class classification).

* **Experimentation:** Try different activations and evaluate their performance on your specific task; see the sketch below for swapping them in a small model.
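
As a rough sketch of the experimentation point, most frameworks let the activation be swapped as a single component; for example, assuming PyTorch is available:

```python
import torch
from torch import nn

def make_mlp(activation: nn.Module) -> nn.Sequential:
    # Same architecture, different activation; each variant would then be
    # trained and compared on a validation set.
    return nn.Sequential(nn.Linear(32, 64), activation, nn.Linear(64, 10))

candidates = {"relu": nn.ReLU(), "gelu": nn.GELU(), "silu": nn.SiLU(), "selu": nn.SELU()}
x = torch.randn(8, 32)
for name, act in candidates.items():
    print(name, make_mlp(act)(x).shape)  # torch.Size([8, 10]) for each
```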