Natural Evolution Strategies

A family of numerical optimization algorithms for black-box problems that iteratively update a search distribution using an estimated gradient on its distribution parameters. Each iteration proceeds as follows:
- A parameterized search distribution is used to produce a batch of search points.
- The fitness function is evaluated at each search point.
- A search gradient is estimated from these points: the gradient of the expected fitness with respect to the distribution parameters.
- A gradient ascent step is taken along the natural gradient, a second-order update that renormalizes the step with respect to uncertainty, preventing oscillations, premature convergence, and undesirable local effects. (A minimal sketch of this loop follows the list.)
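A minimal sketch of this loop, assuming an isotropic Gaussian search distribution with a fixed standard deviation and a toy quadratic fitness (both are illustrative choices, not part of the method as stated above). Step 4 uses the plain search gradient here for brevity; the natural-gradient variant is sketched at the end of the section.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(x):
    # toy black-box objective (illustrative): maximized at x = (3, -2)
    return -np.sum((x - np.array([3.0, -2.0])) ** 2)

dim, lam, sigma, eta = 2, 100, 0.3, 0.05   # population size λ, fixed std, learning rate
mu = np.zeros(dim)                          # search-distribution parameter θ (the mean)

for _ in range(300):
    # 1. sample a batch of search points from π(·|θ) = N(mu, sigma² I)
    z = mu + sigma * rng.standard_normal((lam, dim))
    # 2. evaluate the fitness at each search point
    f = np.array([fitness(zk) for zk in z])
    # 3. Monte Carlo estimate of the search gradient via the score ∇μ log π = (z − μ)/σ²
    grad = np.mean(f[:, None] * (z - mu) / sigma**2, axis=0)
    # 4. plain gradient ascent step (replaced by the natural-gradient step later)
    mu = mu + eta * grad

print(mu)   # ≈ [3, -2]
```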
The core idea is to update the distribution parameters by following the search gradient, i.e. the sampled gradient of the expected fitness. Let θ be the parameters of the search distribution π(z|θ) and f(z) the fitness evaluated at a search point z. The objective is to maximize the expected fitness under the search distribution:
J(θ) = Eθ[f(z)] = ∫ f(z) π(z|θ) dz
Using the log-likelihood trick ∇θ π(z|θ) = π(z|θ) ∇θ log π(z|θ), the gradient can be written as:
∇θJ(θ)=Eθ[f(z) ∇θ log π(z|θ)]
From samples z1, …, zλ drawn from π(·|θ), the Monte Carlo estimate is:
∇θJ(θ) ≈ (1/λ) ∑k=1…λ f(zk) ∇θ log π(zk|θ)
The parameters are then updated with a standard ascent scheme:
θ ← θ + η ∇θJ(θ)
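As a concrete instance of the estimator and update above, the sketch below assumes a Gaussian search distribution with diagonal covariance, parameterized by its mean and per-dimension standard deviation; the helper names (gaussian_score, search_gradient) and the toy fitness are illustrative, not from the text.

```python
import numpy as np

def gaussian_score(z, mu, sigma):
    """∇θ log π(z|θ) for π = N(mu, diag(sigma²)), with θ = (mu, sigma)."""
    d_mu = (z - mu) / sigma**2                        # ∂ log π / ∂μ
    d_sigma = ((z - mu) ** 2 - sigma**2) / sigma**3   # ∂ log π / ∂σ
    return np.concatenate([d_mu, d_sigma])

def search_gradient(zs, fs, mu, sigma):
    """Monte Carlo estimate ∇θJ ≈ (1/λ) Σ f(zk) ∇θ log π(zk|θ)."""
    scores = np.stack([gaussian_score(z, mu, sigma) for z in zs])
    return fs @ scores / len(zs)

# minimal usage on a toy fitness (illustrative)
rng = np.random.default_rng(0)
mu, sigma, lam, eta = np.zeros(2), np.ones(2), 100, 0.01
zs = mu + sigma * rng.standard_normal((lam, 2))   # search points from π(·|θ)
fs = -np.sum((zs - 3.0) ** 2, axis=1)             # fitness at each point
grad = search_gradient(zs, fs, mu, sigma)
mu, sigma = mu + eta * grad[:2], sigma + eta * grad[2:]   # θ ← θ + η ∇θJ
```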
The natural gradient accounts for the uncertainty of the search distribution and removes the dependence on the particular parameterization: instead of measuring update steps by Euclidean distance in parameter space, it measures them by the KL divergence between the corresponding distributions, whose local curvature is given by the Fisher information matrix.
F = ∫ π(z|θ) ∇θ log π(z|θ) ∇θ log π(z|θ)⊤ dz ≈ (1/λ) ∑k=1…λ ∇θ log π(zk|θ) ∇θ log π(zk|θ)⊤
The ascent scheme then becomes:
θ ← θ + η F⁻¹ ∇θJ(θ)
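Putting the pieces together, the sketch below (same illustrative Gaussian parameterization as before) estimates the Fisher matrix from the sampled score vectors, as in the approximation above, and takes the natural-gradient step; a small ridge term is added before the solve to keep the estimated F invertible, which is a practical choice rather than part of the stated algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, lam, eta = 2, 200, 0.1
mu, sigma = np.zeros(dim), np.ones(dim)            # θ = (μ, σ)

for _ in range(300):
    zs = mu + sigma * rng.standard_normal((lam, dim))
    fs = -np.sum((zs - 3.0) ** 2, axis=1)          # toy fitness (illustrative)

    # score vectors ∇θ log π(zk|θ) for the Gaussian parameterization θ = (μ, σ)
    scores = np.hstack([(zs - mu) / sigma**2,
                        ((zs - mu) ** 2 - sigma**2) / sigma**3])

    grad = fs @ scores / lam                       # ∇θJ ≈ (1/λ) Σ f(zk) ∇θ log π(zk|θ)
    F = scores.T @ scores / lam                    # F ≈ (1/λ) Σ ∇θ log π ∇θ log πᵀ
    nat_grad = np.linalg.solve(F + 1e-8 * np.eye(2 * dim), grad)

    mu = mu + eta * nat_grad[:dim]                 # θ ← θ + η F⁻¹ ∇θJ
    sigma = np.maximum(sigma + eta * nat_grad[dim:], 1e-3)   # keep σ positive

print(mu, sigma)   # mean ≈ [3, 3]; sigma shrinks as the search converges
```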