bayes_opt Optimizer in optimagic

This tutorial demonstrates how to use the "bayes_opt" optimizer in optimagic. To use it, you need to have the bayesian-optimization package installed. You can install it with the following command:

pip install bayesian-optimization

When to use Bayesian Optimization:

  • Function evaluations are expensive (e.g., simulations, experiments)

  • The function is a black box (it cannot be expressed in closed form)

  • You have a limited budget of function evaluations

  • Gradients are unavailable or computationally expensive to obtain

Key Concepts

Gaussian Processes (GP)

The GP serves as a probabilistic model of your objective function. It provides both a mean prediction and uncertainty estimates.
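
As a standalone sketch (not part of the optimagic API), the snippet below shows what a mean prediction and an uncertainty estimate look like, using scikit-learn’s GaussianProcessRegressor, which bayesian-optimization builds on. The training points are made up for illustration.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# A few (made-up) evaluations of an expensive objective function
X_train = np.array([[-2.0], [0.0], [1.5]])
y_train = np.array([4.0, 0.0, 2.25])

gp = GaussianProcessRegressor().fit(X_train, y_train)

# The GP returns both a mean prediction and a standard deviation (uncertainty)
mean, std = gp.predict(np.array([[0.5]]), return_std=True)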

Acquisition Functions

These functions use the GP’s predictions to decide where to evaluate next.

Common acquisition functions include:

  • Upper Confidence Bound (UCB): Balances mean prediction with uncertainty

  • Expected Improvement (EI): Expected improvement over the current best

  • Probability of Improvement (POI): Probability of improving over the current best
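
As a rough sketch, the formulas these acquisition functions are based on (written for a maximization problem, using the standard normal CDF and PDF) look roughly as follows. This is only a conceptual illustration; the implementations in bayesian-optimization add details such as exploration decay.

from scipy.stats import norm

def ucb(mean, std, kappa=2.576):
    # Upper Confidence Bound: mean plus kappa times the uncertainty
    return mean + kappa * std

def expected_improvement(mean, std, y_best, xi=0.01):
    # Expected Improvement over the current best value y_best
    z = (mean - y_best - xi) / std
    return (mean - y_best - xi) * norm.cdf(z) + std * norm.pdf(z)

def probability_of_improvement(mean, std, y_best, xi=0.01):
    # Probability of improving over the current best value y_best
    return norm.cdf((mean - y_best - xi) / std)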

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import optimagic as om
from bayes_opt import acquisition

Basic Usage of the bayes_opt Optimizer

Let’s start with a simple example using the sphere function:

def sphere(params):
    return params @ params
# Note: bayes_opt requires bounds for all parameters
bounds = om.Bounds(
    lower=np.full(2, -10.0),
    upper=np.full(2, 10.0)
)
bayesopt_res = om.minimize(
    fun=sphere,
    params=np.arange(2),
    algorithm="bayes_opt",
    bounds=bounds,
    algo_options={"seed": 1}
)

bayesopt_res.params
array([-0.01718043, -0.02854996])

Acquisition Functions in the bayes_opt Optimizer

In Bayesian optimization, the acquisition function decides where to sample next. It controls the trade-off between exploration (searching new areas) and exploitation (focusing on areas already known to be good).

optimagic lets you set the acquisition function in different ways:

1. Using No Acquisition Function (Default)

# Uses package defaults: UCB for unconstrained, EI for constrained
acquisition_function = None

2. Using a String (Built-in acquisition functions)

You can pass any of the following strings to select a standard acquisition function:

  • "ucb" / "upper_confidence_bound" – Upper Confidence Bound

  • "ei" / "expected_improvement" – Expected Improvement

  • "poi" / "probability_of_improvement" – Probability of Improvement

acquisition_function_str = "ucb"

3. Using a Class (Auto-Instantiated)

You can also pass the class directly; optimagic will create an instance of it:

from bayes_opt.acquisition import UpperConfidenceBound

acquisition_function_class = UpperConfidenceBound

4. Using an Instance

from bayes_opt.acquisition import ExpectedImprovement

acquisition_function_instance = ExpectedImprovement(
    xi=0.1,
    exploration_decay=0.95,
    exploration_decay_delay=5
)

Example Run with a Configured Acquisition Function

res = om.minimize(
    fun=sphere,
    params=np.arange(2),
    algorithm="bayes_opt",
    bounds=om.Bounds(lower=np.full(2, -5.0), upper=np.full(2, 5.0)),
    algo_options={"seed": 1, "acquisition_function": acquisition_function_str},
    # acquisition_function can be any of:
    #   acquisition_function_str        → e.g. "ucb", "ei", "poi"
    #   acquisition_function_class      → e.g. UpperConfidenceBound
    #   acquisition_function_instance   → e.g. ExpectedImprovement(xi=0.1)
    #   None                            → defaults to "ucb"
)

res.params
array([-0.01894659, -0.02173497])

Custom Acquisition Functions

bayesian-optimization also lets you write your own acquisition functions by subclassing its AcquisitionFunction class. This way you can define exploration/exploitation strategies tailored to your specific problem.

Implementation Requirements

When subclassing AcquisitionFunction, the following methods need to be implemented or considered:

  1. base_acq(self, mean, std) method (Required): This is the core method where you define the mathematical formula for your acquisition function. It takes the predicted mean and standard deviation from the Gaussian Process and returns the acquisition value(s).

  2. suggest method (Optional but often needed): The base class provides a default implementation, but you may need to override it if you need to set up internal state (like y_max for EI/PI) before base_acq is called.

  3. get_acquisition_params and set_acquisition_params methods (Optional but recommended): These are used for retrieving and setting the internal parameters of your acquisition function. Implementing them makes your acquisition function fully configurable and serializable.

from bayes_opt.acquisition import AcquisitionFunction
class CustomAcquisition(AcquisitionFunction):
    """
    A simple custom acquisition function.

    This acquisition returns half of the predicted mean.
    It ignores the uncertainty (std), making it purely
    exploitation-oriented.
    """
    def __init__(self):
        super().__init__()

    def base_acq(self, mean, std):
        return 0.5 * mean
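
The sketch below extends this idea to cover point 3 from the list above: it adds a tunable parameter together with the optional get_acquisition_params and set_acquisition_params methods. The UCB-style formula and the default kappa value are only illustrative choices, not part of the package.

class ScaledUCB(AcquisitionFunction):
    """A UCB-style custom acquisition with a configurable exploration weight."""

    def __init__(self, kappa=2.0):
        super().__init__()
        self.kappa = kappa

    def base_acq(self, mean, std):
        # Trade off the predicted mean against the uncertainty
        return mean + self.kappa * std

    def get_acquisition_params(self):
        # Expose internal parameters, making the acquisition configurable
        return {"kappa": self.kappa}

    def set_acquisition_params(self, params):
        self.kappa = params["kappa"]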

Using the Custom Acquisition Function

Once you have defined your custom acquisition function, you can use it in optimagic by passing an instance or a class to the acquisition_function parameter:

acquisition_function = CustomAcquisition()

res = om.minimize(
    fun=sphere,
    params=np.arange(2),
    algorithm="bayes_opt",
    bounds=om.Bounds(lower=np.full(2, -5.0), upper=np.full(2, 5.0)),
    algo_options={"seed": 1, "acquisition_function": acquisition_function},
)

res.params
array([-0.00687158,  0.02636662])

Meta Acquisition Functions

The bayesian-optimization package also provides meta acquisition functions that operate on other acquisition functions:

  1. GPHedge: Dynamically chooses the best acquisition function from a set of candidates based on their past performance.

  2. ConstantLiar: Used for parallelized optimization to discourage sampling near points that have already been suggested but not yet evaluated.

1. GPHedge

GPHedge dynamically chooses the best acquisition function from a set of candidates based on their past performance. Here’s how to use it with multiple base acquisition functions.

Let’s first define the Branin function to use with the meta acquisition functions.

def branin(params):
    """The Branin function - a classic optimization test function.

    Has three global minima at approximately:
    - (-π, 12.275)
    - (π, 2.275)
    - (9.42478, 2.475)

    Global minimum value: 0.397887
    """
    x1, x2 = params[0], params[1]

    a = 1
    b = 5.1 / (4 * np.pi**2)
    c = 5 / np.pi
    r = 6
    s = 10
    t = 1 / (8 * np.pi)

    term1 = a * (x2 - b * x1**2 + c * x1 - r)**2
    term2 = s * (1 - t) * np.cos(x1)
    term3 = s

    return term1 + term2 + term3
from bayes_opt.acquisition import GPHedge, UpperConfidenceBound, ExpectedImprovement

# Create a list of base acquisition functions
base_acquisitions = [
    UpperConfidenceBound(kappa=2.576),
    ExpectedImprovement(xi=0.01),
    # Add more as needed
]

gphedge_acq = GPHedge(base_acquisitions)

result = om.minimize(
    fun=branin,
    params=np.array([1.0, 1.0]),
    algorithm="bayes_opt",
    bounds=bounds,  # reuses the (-10, 10) bounds defined above
    algo_options={
        "acquisition_function": gphedge_acq,
        "seed": 42
    }
)

result.params, result.fun
(array([9.41613534, 2.42211064]), 0.4003261302890433)

2. ConstantLiar

ConstantLiar is used for parallelized optimization. It discourages sampling near points that have already been suggested but not yet evaluated.

from bayes_opt.acquisition import ConstantLiar, UpperConfidenceBound

base_acq = UpperConfidenceBound(kappa=2.576)

constant_liar_acq = ConstantLiar(base_acquisition=base_acq, strategy="max")

# Use in optimization (Note: ConstantLiar is primarily for async optimization)
result = om.minimize(
    fun=sphere,
    params=np.array([1.0, 1.0]),
    algorithm="bayes_opt",
    bounds=bounds,
    algo_options={
        "acquisition_function": constant_liar_acq,
        "seed": 42
    }
)

result.params
array([-0.05066015, -0.00433807])

Exploration vs Exploitation Trade-off

When using Bayesian optimization, the acquisition function decides where to sample next. It balances exploration (try new areas) vs exploitation (refine known good areas).

  • Exploration: Sampling in regions with high uncertainty

  • Exploitation: Sampling in regions where the model predicts good values
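
For example, with UCB the kappa parameter directly controls this trade-off: larger values put more weight on uncertainty (exploration), smaller values focus on the predicted mean (exploitation). The kappa values in the sketch below are only illustrative, not recommendations.

from bayes_opt.acquisition import UpperConfidenceBound

# Larger kappa -> more exploration
explorative_acq = UpperConfidenceBound(kappa=10.0)

# Smaller kappa -> more exploitation
exploitative_acq = UpperConfidenceBound(kappa=0.1)

res_explore = om.minimize(
    fun=sphere,
    params=np.arange(2),
    algorithm="bayes_opt",
    bounds=om.Bounds(lower=np.full(2, -5.0), upper=np.full(2, 5.0)),
    algo_options={"seed": 1, "acquisition_function": explorative_acq},
)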

Sequential Domain Reduction (SDR)

Sequential Domain Reduction (SDR) progressively narrows the search space around promising regions. This can significantly improve optimization, especially for high-dimensional problems.

SDR Parameters

  • enable_sdr: Enable/disable Sequential Domain Reduction

  • sdr_gamma_osc: Controls oscillation damping (default: 0.7)

  • sdr_gamma_pan: Controls panning behavior (default: 1.0)

  • sdr_eta: Zooming parameter for region shrinking (default: 0.9)

  • sdr_minimum_window: Minimum window size (default: 0.0)

SDR Example

def ackley(x):
    """Global minimum: f(x*) = 0 at x* = (0, 0)"""
    x0, x1 = x
    arg1 = -0.2 * np.sqrt(0.5 * (x0 ** 2 + x1 ** 2))
    arg2 = 0.5 * (np.cos(2 * np.pi * x0) + np.cos(2 * np.pi * x1))
    return -20. * np.exp(arg1) - np.exp(arg2) + 20. + np.e

start_params = np.array([2.0, 2.0])
bounds = om.Bounds(
    lower=np.array([-32.768, -32.768]),
    upper=np.array([32.768, 32.768])
)
# Standard Bayesian Optimization without SDR
result_standard = om.minimize(
    fun=ackley,
    params=start_params,
    algorithm="bayes_opt",
    bounds=bounds,
    algo_options={
        "enable_sdr": False,
        "n_iter": 50,
        "init_points": 2,
        "seed": 1,
        "acquisition_function": "ucb",
    }
)

print("Standard Bayesian Optimization:")
print("Best function value:", result_standard.fun)
print("Best parameters:", result_standard.x)
Standard Bayesian Optimization:
Best function value: 2.523271014697603
Best parameters: [-0.25964833 -0.20623602]
# Bayesian Optimization with SDR
result_sdr = om.minimize(
    fun=ackley,
    params=start_params,
    algorithm="bayes_opt",
    bounds=bounds,
    algo_options={
        "enable_sdr": True,
        "sdr_minimum_window": 0.5,
        "sdr_gamma_osc": 0.7,
        "sdr_gamma_pan": 1.0,
        "sdr_eta": 0.9,
        "n_iter": 50,
        "init_points": 2,
        "seed": 1,
        "acquisition_function": "ucb",
    }
)

print("Bayesian Optimization with SDR:")
print("Best function value:", result_sdr.fun)
print("Best parameters:", result_sdr.x)
Bayesian Optimization with SDR:
Best function value: 0.014415407122172486
Best parameters: [-0.00351133 -0.00337898]
# Compare convergence behavior
results = {
    "Standard BO": result_standard,
    "BO with SDR": result_sdr
}

# SDR typically converges faster than standard BO
fig = om.criterion_plot(results)
fig.show()

Gaussian Process Configuration

"bayesian-optimization" uses a Gaussian Process (GP) as the surrogate model. Its behavior can be tuned with these options via algo_options:

  • alpha: noise level in function evaluations

    • lower values (e.g.,1e-6): assumes nearly precise function evaluations

    • higher values (e.g., 1e-2): assumes noisy evaluations

  • n_restarts: Number of times to restart the optimization.

  • seed → ensures reproducible results.

algo_options = {
    "alpha": 1e-3,
    "n_restarts": 5,
    "seed": 42,
}

result_configured = om.minimize(
    fun=sphere,
    params=np.array([3.0, 3.0]),
    algorithm="bayes_opt",
    bounds=om.Bounds(lower=np.full(2, -5.0), upper=np.full(2, 5.0)),
    algo_options=algo_options
)

print("Configured GP results:")
print(f"  Best value: {result_configured.fun}")
print(f"  Function evaluations: {result_configured.n_fun_evals}")
Configured GP results:
  Best value: 0.0023458694358570046
  Function evaluations: 31

Summary

Bayesian optimization is a powerful tool for optimizing expensive black-box functions. Key takeaways:

  1. Choose the right acquisition function based on your exploration/exploitation needs

  2. Tune acquisition parameters like kappa (UCB) or xi (EI) to control the trade-off

  3. Use SDR for high-dimensional problems to focus the search

  4. Configure the GP properly with appropriate noise levels and restarts
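
As a closing sketch, the options discussed above can be combined in a single call. It reuses the ackley function and the UpperConfidenceBound import from earlier sections; the specific values are illustrative rather than recommended.

result = om.minimize(
    fun=ackley,
    params=np.array([2.0, 2.0]),
    algorithm="bayes_opt",
    bounds=om.Bounds(lower=np.full(2, -32.768), upper=np.full(2, 32.768)),
    algo_options={
        "acquisition_function": UpperConfidenceBound(kappa=2.576),  # takeaways 1 and 2
        "enable_sdr": True,                                         # takeaway 3
        "sdr_minimum_window": 0.5,
        "alpha": 1e-6,                                              # takeaway 4: noise level
        "n_restarts": 5,
        "n_iter": 50,
        "init_points": 2,
        "seed": 1,
    },
)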

For more detailed information, check out the bayesian-optimization documentation.