r"""
This module implements a two-stage algorithm for mapping the boundary of the
:math:`(1-\alpha)` confidence region in the full parameter space of a
:class:`~spey.StatisticalModel`. It uses frequentist approach to find the iso-surface of
the profile likelihood ratio
.. math::
\Delta\chi^2(\mu) = -2\log\frac{\mathcal{L}(\mu, \hat{\theta})}{\mathcal{L}(\hat{\mu}, \hat{\theta})}
= \chi^2_{D,1-\alpha} \ ,
where :math:`\mu` are parameters of interest, :math:`\theta` are the nuisance parameters and
:math:`\chi^2_{D,1-\alpha}` is the :math:`\chi^2` distribution at :math:`1-\alpha` confidence
region with D degrees of freedom.
Mathematical Framework
======================
Let :math:`\theta \in \mathbb{R}^k` be the model parameter vector,
:math:`\hat\theta` the maximum-likelihood estimate (MLE), and
.. math::
\mathrm{NLL}(\theta) = -\log\mathcal{L}(\theta)
the negative log-likelihood. Under Wilks' theorem, the test statistic
.. math::
\chi^2(\theta) = 2\bigl[\mathrm{NLL}(\theta) - \mathrm{NLL}(\hat\theta)\bigr]
follows a :math:`\chi^2_k` distribution asymptotically. The
:math:`(1-\alpha)` **confidence region** is
.. math::
\mathcal{C}_\alpha = \bigl\{\theta : \chi^2(\theta) \le \Delta_\alpha\bigr\},
\qquad \Delta_\alpha = F^{-1}_{\chi^2_k}(1-\alpha),
and its :math:`(k-1)`-dimensional boundary — the **contour** — satisfies
.. math::
\mathrm{NLL}(\theta) = T,
\qquad T = \mathrm{NLL}(\hat\theta) + \tfrac{\Delta_\alpha}{2}.
Algorithm
=========
Stage 1 — Pre-whitening
-----------------------
The Hessian of the NLL at the MLE equals the observed Fisher information:
.. math::
G = \nabla^2 \mathrm{NLL}(\hat\theta)
= -\nabla^2 \log\mathcal{L}(\hat\theta).
:math:`G` is positive (semi-)definite at a proper minimum. After Cholesky
factorisation :math:`G = LL^T`, the **whitened coordinate**
.. math::
\varphi = L(\theta - \hat\theta),
\qquad \theta = \hat\theta + L^{-T}\varphi
makes the contour approximately a :math:`(k-1)`-sphere of radius
:math:`\sqrt{\Delta_\alpha}`:
.. math::
\mathrm{NLL}\!\bigl(\hat\theta + L^{-T}\varphi\bigr)
\approx \mathrm{NLL}(\hat\theta) + \tfrac{1}{2}|\varphi|^2 + O(|\varphi|^3).
Sampling uniform random directions on :math:`S^{k-1}` in :math:`\varphi`-space
therefore gives approximately uniform coverage of the contour, even when the
original parameter space is strongly anisotropic.
Stage 2 — Radial search
------------------------
For each unit vector :math:`\hat{e} \in S^{k-1}` (drawn by normalising a
standard Gaussian), define the one-dimensional profile
.. math::
f(r) = \mathrm{NLL}\!\bigl(\hat\theta + L^{-T}(r\hat{e})\bigr) - T.
:math:`f` is negative at :math:`r=0` (since :math:`\mathrm{NLL}(\hat\theta) < T`)
and positive for large :math:`r`. The root :math:`r^*` — found via Brent's
method — yields the contour point
.. math::
\theta^* = \hat\theta + L^{-T}(r^*\hat{e}).
Stage 3 — Gap detection
------------------------
After the radial search, :math:`M \gg N` candidate unit vectors are sampled
uniformly on :math:`S^{k-1}`. For each candidate the maximum cosine
similarity to any radially-found direction is computed. Candidates with the
smallest maximum similarity correspond to the largest angular gaps.
For each gap direction a **dedicated radial search** is run to locate the
exact contour crossing along that direction. This ensures every RATTLE chain
starts directly on the contour in the sparse region, rather than at the
nearest dense radial point which may be far away geodesically.
Stage 4 — Constrained Hamiltonian Monte Carlo (RATTLE)
-------------------------------------------------------
Let :math:`C(\theta) = \mathrm{NLL}(\theta) - T` be the constraint function.
Starting from a radial contour point, the RATTLE integrator
:cite:t:`ANDERSEN198324` walks along :math:`\partial\mathcal{C}_\alpha`
while preserving the constraint at each step. One leapfrog step with
step size :math:`\varepsilon` reads
.. math::
p_{1/2} &= p_0
- \tfrac{\varepsilon}{2}\,\nabla\mathrm{NLL}(\theta_0), \\[3pt]
\theta' &= \theta_0 + \varepsilon\, p_{1/2}, \\[3pt]
\theta_1 &= \theta' - \lambda\,\nabla\mathrm{NLL}(\theta'),
\quad \lambda = \frac{\mathrm{NLL}(\theta')-T}
{|\nabla\mathrm{NLL}(\theta')|^2}, \\[3pt]
p' &= p_{1/2} - \tfrac{\varepsilon}{2}\,\nabla\mathrm{NLL}(\theta_1), \\[3pt]
p_1 &= p' - \frac{p' \cdot \nabla\mathrm{NLL}(\theta_1)}
{|\nabla\mathrm{NLL}(\theta_1)|^2}
\,\nabla\mathrm{NLL}(\theta_1).
The third equation is the SHAKE projection onto the constraint surface,
iterated via Newton's method until
:math:`|\mathrm{NLL}(\theta_1) - T| < \varepsilon_\text{tol}`.
The fifth equation projects the momentum onto the tangent space of the
constraint, ensuring :math:`p_1 \perp \nabla C(\theta_1)`.
Stage 5 — Adaptive coverage improvement
-----------------------------------------
After Stage 4, all collected contour points (radial + RATTLE) are gathered
into a single point cloud. A k-d tree is used to compute the
:math:`k_\text{NN}`-th nearest-neighbour distance for every point. The
:math:`n_\text{chains}` most isolated points (those with the largest
:math:`k_\text{NN}` distance) seed a new round of RATTLE chains that explore
the sparsely covered neighbourhood. This pass is repeated
:math:`n_\text{passes} - 1` times, providing bias-free coverage of regions
that are geometrically sparse on the contour even when they are not easily
detected as angular gaps from the MLE.
.. currentmodule:: spey.multiparameter
Functions
---------
.. autosummary::
:toctree: ../_generated/
contour.find_contour
contour.ContourResult
"""
import logging
import warnings
from dataclasses import dataclass
from typing import List, Optional, Tuple
import autograd
import autograd.numpy as anp
import numpy as np
from joblib import Parallel, cpu_count, delayed
from scipy.optimize import minimize as _sp_minimize
from scipy.optimize import root_scalar
from scipy.stats import chi2 as chi2_dist
from tqdm import tqdm
log = logging.getLogger("Spey")
__all__ = ["ContourResult", "find_contour"]
# ---------------------------------------------------------------------------
# Public result container
# ---------------------------------------------------------------------------
[docs]
@dataclass
class ContourResult:
r"""
Container for the output of :func:`find_contour`.
Attributes:
theta_mle (``np.ndarray``): Maximum-likelihood estimate
:math:`\hat\theta`, shape ``(k,)``.
nll_min (``float``): Minimum negative log-likelihood
:math:`\mathrm{NLL}(\hat\theta)`.
threshold (``float``): NLL value that defines the contour boundary,
:math:`T = \mathrm{NLL}(\hat\theta) + \Delta_\alpha / 2`.
delta (``float``): Chi-squared quantile
:math:`\Delta_\alpha = F^{-1}_{\chi^2_k}(1-\alpha)`.
contour_points (``np.ndarray``): Points on the contour boundary,
shape ``(n_points, k)``. Every row :math:`\theta^*` satisfies
:math:`|\mathrm{NLL}(\theta^*) - T| \lesssim \varepsilon_\text{tol}`.
from_radial (``np.ndarray``): Boolean mask of shape ``(n_points,)``.
``True`` for points produced by the radial search; ``False`` for
points added by the constrained RATTLE walk.
parameter_names (``Optional[List[str]]``): Names of the :math:`k`
parameters in the same order as the columns of
:attr:`contour_points`, or ``None`` when the model does not
provide names.
confidence_level (``float``): The confidence level :math:`1-\alpha`,
e.g. ``0.95``.
dof (``int``): Degrees of freedom :math:`k` (number of model
parameters).
"""
theta_mle: np.ndarray
nll_min: float
threshold: float
delta: float
contour_points: np.ndarray
from_radial: np.ndarray
parameter_names: Optional[List[str]]
confidence_level: float
dof: int
[docs]
def find_contour(
stat_model,
confidence_level: float = 0.95,
poi_indices: list[int] = None,
n_radial: int = 300,
n_hmc_chains: int = 10,
n_hmc_steps: int = 500,
hmc_step_size: float = 0.05,
n_gap_candidates: int = 3000,
max_radial_bracket: float = 30.0,
newton_tol: float = 1e-9,
newton_max_iter: int = 100,
whitener_regularisation: float = 1e-8,
random_seed: Optional[int] = None,
n_jobs: int = cpu_count() - 1,
bounds: Optional[List[Tuple[Optional[float], Optional[float]]]] = None,
n_coverage_passes: int = 2,
coverage_knn: int = 5,
n_multistart: int = 20,
basin_dedup_tol: float = 1e-3,
) -> ContourResult:
r"""
Find the :math:`(1-\alpha)` chi-squared confidence contour of a
:class:`~spey.StatisticalModel` in its full parameter space.
The algorithm operates in four stages (see the module docstring for the
full mathematical derivation):
1. **Pre-whitening** — compute the Fisher information :math:`G` at the
MLE via the Hessian of the log-likelihood and Cholesky-factor it so
that :math:`\varphi = L(\theta-\hat\theta)` maps the contour to an
approximate sphere.
2. **Radial search** — shoot :math:`N` random rays from :math:`\hat\theta`
in whitened space and locate each ray's crossing of the NLL threshold
:math:`T` with Brent's method.
3. **Gap detection** — sample :math:`M \gg N` candidate directions and
select the :math:`n_\text{chains}` directions least covered by the
radial set as seeds for RATTLE chains.
4. **Constrained RATTLE HMC** — for each seed direction, start a
Hamiltonian Monte Carlo chain that walks along the constraint manifold
:math:`\mathrm{NLL}(\theta) = T`, projecting both position and momentum
at every step.
.. versionadded:: 0.2.7
Args:
stat_model: A :class:`~spey.StatisticalModel` whose backend is
``"default.multivariate_normal"``. The backend must expose
``get_logpdf_func()`` and ``get_hessian_logpdf_func()``; these are
used to build the NLL, its gradient (via ``autograd``), and the
Fisher information for whitening.
confidence_level (``float``, default ``0.95``): Confidence level
:math:`1-\alpha`. The chi-squared threshold is
:math:`\Delta_\alpha = F^{-1}_{\chi^2_k}(1-\alpha)`.
poi_indices (``list[int]``, default ``None``): Indices of the model
parameters (in the full parameter space, excluding :math:`\mu`)
to include in the contour search. :math:`\mu` itself is always
excluded and held fixed at **1**. Parameters *not* listed here
— and not equal to :math:`\mu` — are treated as **profile
parameters**: they are numerically minimised at every NLL
evaluation so that the algorithm maps the *marginal* confidence
boundary in the ``poi_indices`` subspace. When ``None`` all
non-:math:`\mu` parameters form the contour space and no
profiling is performed.
n_radial (``int``, default ``300``): Number of random radial
directions to explore in Stage 2.
n_hmc_chains (``int``, default ``10``): Number of RATTLE chains to
run in Stage 4, one per detected gap direction.
n_hmc_steps (``int``, default ``500``): Number of leapfrog steps per
RATTLE chain. Each step produces one contour point (after the
constraint projection).
hmc_step_size (``float``, default ``0.05``): Leapfrog step size
:math:`\varepsilon` in whitened-coordinate units.
n_gap_candidates (``int``, default ``3000``): Number of candidate
directions sampled for gap detection. Larger values give a more
thorough search at modest extra cost.
max_radial_bracket (``float``, default ``30.0``): Upper bracket for
the radial root-finder in whitened-space units. If the contour
lies further than this from the MLE in some direction, that
direction is skipped.
newton_tol (``float``, default ``1e-9``): Convergence tolerance
:math:`\varepsilon_\text{tol}` for the Newton projection step in
RATTLE (i.e. for :math:`|\mathrm{NLL}(\theta)-T|`).
newton_max_iter (``int``, default ``100``): Maximum Newton iterations
in the constraint projection step.
whitener_regularisation (``float``, default ``1e-8``): Minimum
eigenvalue enforced on the Fisher information matrix before
Cholesky factorisation, preventing singular whitening transforms
when the likelihood is locally flat.
random_seed (``Optional[int]``, default ``None``): Seed for the
:class:`numpy.random.Generator` used throughout, enabling
reproducible results.
n_jobs (``int``, default ``cpu_count() - 1``): Number of parallel
worker processes for the RATTLE HMC chains (Stage 4). Each
chain is independent and runs in a separate process via
``joblib.Parallel``. Pass ``1`` to disable parallelism.
Values :math:`\le 0` are clamped to ``1``. Note: parallel execution
requires the NLL and gradient closures to be picklable; if
pickling fails, fall back to ``n_jobs=1``.
bounds (default ``None``): Parameter bounds for each signal parameter
(i.e. every parameter accepted by ``signal_yields``, in the same
order). Each element is a ``(lower, upper)`` pair where either
value may be ``None`` to indicate no bound on that side. For
example, ``[(0.0, None), (-1.0, 1.0)]`` constrains the first
parameter to be non-negative and the second to ``[-1, 1]``.
When ``None`` the whole list is passed, no boundaries are applied.
The radial search caps its bracket so that roots beyond a bound
are never sought; the RATTLE walk terminates a chain the moment
it steps outside the bounded region.
n_coverage_passes (``int``, default ``2``): Total number of HMC
exploration passes. Pass 1 seeds chains from the angular gap
directions (Stage 3/4). Each subsequent pass (up to
``n_coverage_passes - 1`` additional rounds) uses
:math:`k`-nearest-neighbour distances among all collected
contour points to identify the most isolated regions and seeds
new RATTLE chains there (Stage 5). Set to ``1`` to disable
the adaptive coverage stage and reproduce the original
single-pass behaviour.
coverage_knn (``int``, default ``5``): The :math:`k` in the
:math:`k`-NN distance used by Stage 5 to measure local point
density. Larger values smooth the density estimate over a wider
neighbourhood; smaller values focus on very localised gaps.
Must satisfy ``coverage_knn >= 1``.
n_multistart (``int``, default ``20``): Total number of random starts
used to discover *disjoint* local minima of the NLL (basins)
before the per-basin contour search runs. When the NLL has more
than one local minimum below the confidence threshold — e.g. an
EFT likelihood with an interference term — each such minimum
contributes its own component to the confidence region, so all
of them must be mapped to recover the full contour. A single
start can only ever find one. Values :math:`\le 1` disable
multi-start and reproduce the legacy single-anchor behaviour.
basin_dedup_tol (``float``, default ``1e-3``): Euclidean tolerance in
contour-parameter space for clustering duplicate minima found by
the multi-start search. Two candidates within this distance are
treated as the same basin; only the one with the lower NLL is
kept.
Returns:
:class:`~spey.multiparameter.contour.ContourResult`
A dataclass containing
:attr:`~spey.multiparameter.contour.ContourResult.theta_mle`,
:attr:`~spey.multiparameter.contour.ContourResult.nll_min`,
:attr:`~spey.multiparameter.contour.ContourResult.threshold`,
:attr:`~spey.multiparameter.contour.ContourResult.contour_points`,
and associated metadata.
Raises:
ValueError: If the model has fewer than one parameter.
RuntimeError: If the MLE optimisation fails to converge.
Examples:
.. code:: python3
import numpy as np
import spey
from spey.multiparameter.contour import find_contour
base_signal = np.array([12.0, 15.0])
bkg = np.array([50.0, 48.0])
data = np.array([36.0, 33.0])
cov = np.array([[144.0, 13.0], [25.0, 256.0]])
def signal(pars):
return base_signal * (1.0 + pars[0])
pdf_wrapper = spey.get_backend('default.multivariate_normal')
stat_model = pdf_wrapper(
signal_yields=signal,
background_yields=bkg,
data=data,
covariance_matrix=cov,
n_signal_parameters=1,
analysis="demo",
)
result = find_contour(
stat_model,
confidence_level=0.95,
n_radial=200,
n_hmc_chains=5,
random_seed=42
)
print("MLE :", result.theta_mle)
print("NLL min :", result.nll_min)
print("Contour points:", result.contour_points.shape)
"""
if confidence_level <= 0.0 or confidence_level >= 1.0:
raise ValueError(f"confidence_level must be in (0, 1); got {confidence_level}.")
n_jobs = max(1, n_jobs)
rng = np.random.default_rng(random_seed)
# ------------------------------------------------------------------
# Introspect the model
# ------------------------------------------------------------------
cfg = stat_model.config()
k_total: int = cfg.npar # total number of model parameters
poi_idx: int = cfg.poi_index # index of the signal-strength parameter mu
cfg_bounds: list[tuple[float, float]] = cfg.suggested_bounds
# All non-mu parameter indices, sorted ascending.
all_signal_indices: List[int] = [i for i in range(k_total) if i != poi_idx]
# Contour parameter indices: the subspace whose boundary we map.
# Parameters listed in poi_indices (minus mu) are the contour axes.
if poi_indices is None:
contour_indices: List[int] = all_signal_indices
else:
contour_indices: List[int] = sorted(i for i in poi_indices if i != poi_idx)
# Profile (nuisance) parameter indices: non-mu parameters NOT in the
# contour subspace. These are minimised at every NLL evaluation so
# that the algorithm maps the *marginal* confidence boundary.
_contour_set = set(contour_indices)
profile_indices: List[int] = [i for i in all_signal_indices if i not in _contour_set]
k: int = len(contour_indices) # dimension of the contour search space
k_profile: int = len(profile_indices)
if k < 1:
raise ValueError(
"Model must have at least one non-POI (signal shape) parameter "
"for contour finding."
)
if bounds is not None and len(bounds) != k:
raise ValueError(
f"bounds must have one entry per signal parameter (expected {k}, "
f"got {len(bounds)})."
)
if bounds is None:
bounds = [cfg_bounds[idx] for idx in contour_indices]
# Parameter names for the contour subspace only (mu excluded)
_all_names: Optional[List[str]] = cfg.parameter_names
param_names: Optional[List[str]] = (
[_all_names[i] for i in contour_indices] if _all_names is not None else None
)
# ------------------------------------------------------------------
# Stage 1: find MLE over ALL signal parameters with mu fixed at 1.
# This gives the correct nll_min (threshold base) and the initial
# values for profile parameters.
# ------------------------------------------------------------------
logpdf_fn = stat_model.backend.get_logpdf_func()
theta_mle_all, nll_min = _find_mle(stat_model, all_signal_indices)
log.debug("MLE all signal params = %s, NLL_min = %.6f", theta_mle_all, nll_min)
# Map global indices → positions within all_signal_indices
_sig_pos = {idx: pos for pos, idx in enumerate(all_signal_indices)}
_contour_pos = [_sig_pos[i] for i in contour_indices]
_profile_pos = [_sig_pos[i] for i in profile_indices]
theta_mle = theta_mle_all[_contour_pos] # (k,)
theta_mle_profile = theta_mle_all[_profile_pos] if profile_indices else np.empty(0)
# ------------------------------------------------------------------
# Build NLL callables in the contour-parameter subspace (mu = 1).
#
# Two regimes:
# k_profile == 0 — all signal params are contour params; use
# autograd-differentiable NLL directly.
# k_profile > 0 — profile parameters are minimised at every NLL
# evaluation; gradient via the envelope theorem.
# ------------------------------------------------------------------
# Will be re-assigned in the k_profile > 0 branch; left as None otherwise
# so the name is resolvable from the per-basin loop's closure below
# (where it is only called when k_profile > 0).
_profile_opt = None
if k_profile == 0:
# All signal parameters are contour parameters (current behaviour).
# anp.concatenate keeps the computation graph autograd-differentiable.
def _nll(signal_pars: np.ndarray):
r"""
Autograd-compatible NLL as a function of all signal parameters.
:math:`\mu` is held fixed at **1** by inserting it at position
*poi_idx* of the full parameter vector.
"""
full_pars = anp.concatenate(
[
signal_pars[:poi_idx],
anp.array([1.0]),
signal_pars[poi_idx:],
]
)
return -logpdf_fn(full_pars)
_grad_nll_auto = autograd.grad(_nll) # pylint: disable=no-value-for-parameter
def nll_scalar(contour_pars: np.ndarray) -> float:
"""Plain-float NLL (mu=1 fixed) for scipy root-finders."""
return float(_nll(np.asarray(contour_pars, dtype=float)))
def grad_nll(contour_pars: np.ndarray) -> np.ndarray:
r"""Gradient :math:`\nabla_{\theta_c}\mathrm{NLL}`, mu=1 fixed."""
return np.array(
_grad_nll_auto(np.asarray(contour_pars, dtype=float)), dtype=float
)
else:
# Profile regime: minimise over profile_indices at each evaluation.
# Gradient uses the envelope theorem:
# ∇_{θ_c} NLL_profiled = [∇_{full} NLL]_{contour columns}
# evaluated at the profile-optimal point.
#
# The inner minimisation uses the profile-parameter subspace directly
# with bounds and constraints from prepare_for_fit, matching the
# SLSQP defaults used by the rest of spey.
_contour_arr = np.array(contour_indices, dtype=int)
_profile_arr = np.array(profile_indices, dtype=int)
_theta_profile_init = theta_mle_profile.copy()
def _assemble(cp: np.ndarray, pp: np.ndarray) -> np.ndarray:
"""Build full parameter vector: mu=1, contour=cp, profile=pp."""
full = np.empty(k_total, dtype=float)
full[poi_idx] = 1.0
full[_contour_arr] = cp
full[_profile_arr] = pp
return full
# Prepare the fit once to access the model configuration (bounds,
# suggested_init) and any backend constraints.
_spey_fit_args = stat_model.prepare_for_fit()
_model_cfg = _spey_fit_args["model_configuration"]
_backend_constraints = _spey_fit_args.get("constraints", [])
# Bounds for the profile parameters only (taken from model config).
_all_bounds = list(_model_cfg.suggested_bounds)
_profile_bounds = [_all_bounds[i] for i in profile_indices]
def _profile_opt(cp: np.ndarray) -> np.ndarray:
r"""
Minimise the NLL over profile parameters with :math:`\mu` and
the contour parameters fixed.
The optimisation is performed directly in the reduced
:math:`k_\text{profile}`-dimensional profile subspace — not
over the full parameter vector — so no equality constraints are
needed and SLSQP never encounters an over-constrained problem.
Bounds are taken from ``model_configuration.suggested_bounds``
for the profile-parameter slots. Any backend constraints (which
are typically defined on the full parameter vector) are adapted
by embedding the current contour values into the constraint
function via a closure.
The optimisation uses SLSQP with ``tol=1e-6`` and
``maxiter=10000``, matching the defaults in
:func:`~spey.optimizer.scipy_tools.minimize`. The initial
point for the profile parameters is the joint signal-MLE value
:math:`\hat\theta_p`.
"""
def obj(pp: np.ndarray) -> float:
return float(-logpdf_fn(_assemble(cp, pp)))
# Adapt backend constraints (full-param) to profile-param space.
constraints = []
for c in _backend_constraints:
orig_fun = c["fun"]
def _wrapped(pp, _f=orig_fun, _cp=cp):
return _f(_assemble(_cp, pp))
constraints.append({**c, "fun": _wrapped})
res = _sp_minimize(
obj,
x0=_theta_profile_init,
method="SLSQP",
bounds=_profile_bounds,
constraints=constraints,
tol=1e-6,
options={"maxiter": 10000, "disp": False},
)
return res.x if res.success else _theta_profile_init
_grad_logpdf_auto = autograd.grad( # pylint: disable = no-value-for-parameter
logpdf_fn
)
def nll_scalar(contour_pars: np.ndarray) -> float:
"""Profiled NLL: min over profile params with contour fixed."""
cp = np.asarray(contour_pars, dtype=float)
pp = _profile_opt(cp)
return float(-logpdf_fn(_assemble(cp, pp)))
def grad_nll(contour_pars: np.ndarray) -> np.ndarray:
r"""
Gradient of profiled NLL via the envelope theorem.
At the profile optimum :math:`\theta_p^*(\theta_c)`, the
gradient of :math:`\mathrm{NLL}` w.r.t. :math:`\theta_p` is
zero, so:
.. math::
\nabla_{\theta_c}\mathrm{NLL}_\text{prof}(\theta_c)
= \bigl[\nabla_\text{full}\,\mathrm{NLL}\bigr]_{\text{contour cols}}
\Big|_{\theta_p = \theta_p^*(\theta_c)}.
"""
cp = np.asarray(contour_pars, dtype=float)
pp = _profile_opt(cp)
full = _assemble(cp, pp)
full_grad_logpdf = np.array(_grad_logpdf_auto(full), dtype=float)
return -full_grad_logpdf[_contour_arr]
# ------------------------------------------------------------------
# Stage 1c: enumerate disjoint basins of the (profiled) NLL via
# multi-start local minimisation. A non-convex NLL can have more than
# one basin below the confidence threshold — e.g. an EFT likelihood
# with a quadratic-interference signal — and each such basin contributes
# its own component to the boundary. A single MLE can only discover
# one, so we enumerate them here and run the contour search from each.
# ------------------------------------------------------------------
basins = _find_basins_contour_subspace(
nll_scalar,
grad_nll,
bounds,
theta_primary=theta_mle,
nll_primary=nll_min,
k=k,
n_multistart=max(1, n_multistart),
rng=rng,
dedup_tol=basin_dedup_tol,
)
# The global minimum may now be at a different basin than the primary fit.
nll_min = basins[0][1]
theta_mle = basins[0][0].copy()
# ------------------------------------------------------------------
# Compute chi-squared threshold
# DoF = k = number of contour parameters (profiled params are not DoF)
# ------------------------------------------------------------------
delta = float(chi2_dist.ppf(confidence_level, df=k))
threshold = nll_min + delta
log.debug("delta(chi2, df=%d) = %.4f, NLL threshold = %.6f", k, delta, threshold)
# A basin contributes to the confidence region iff its NLL is below the
# threshold. A small slack absorbs numerical noise from the multi-start
# minimiser versus the threshold computation.
_basin_slack = 1e-6
active_basins: List[Tuple[np.ndarray, float]] = [
(t, n) for (t, n) in basins if n <= threshold + _basin_slack
]
if not active_basins:
active_basins = [(theta_mle, nll_min)]
log.debug(
"find_contour: %d basin(s) discovered, %d active (NLL ≤ %.6f).",
len(basins),
len(active_basins),
threshold,
)
hessian_fn = stat_model.backend.get_hessian_logpdf_func()
knn_k = max(1, coverage_knn)
_contour_idx_arr = np.array(contour_indices, dtype=int)
_profile_idx_arr = (
np.array(profile_indices, dtype=int)
if profile_indices
else np.empty(0, dtype=int)
)
def _profile_pp_at(cp: np.ndarray) -> np.ndarray:
"""Optimum profile-parameter vector at contour point ``cp``."""
if k_profile == 0:
return np.empty(0, dtype=float)
# _profile_opt is bound in the k_profile > 0 branch above.
return _profile_opt(cp) # noqa: E501
all_radial_pts_list: List[np.ndarray] = []
all_hmc_pts_list: List[np.ndarray] = []
for b_idx, (basin_theta, basin_nll) in enumerate(active_basins):
# ------------------------------------------------------------------
# Per-basin Stage 1: build whitener from the Fisher information at
# this basin's point. Uses Schur complement when profile parameters
# are present.
# ------------------------------------------------------------------
theta_basin_full = np.empty(k_total, dtype=float)
theta_basin_full[poi_idx] = 1.0
theta_basin_full[_contour_idx_arr] = basin_theta
if k_profile > 0:
theta_basin_full[_profile_idx_arr] = _profile_pp_at(basin_theta)
L_basin, L_inv_T_basin = _build_whitener(
hessian_fn,
theta_basin_full,
contour_indices,
profile_indices,
k,
whitener_regularisation,
)
# Whitening closures specific to this basin.
def _to_theta(
phi: np.ndarray,
_anchor: np.ndarray = basin_theta,
_M: np.ndarray = L_inv_T_basin,
) -> np.ndarray:
return _anchor + _M @ phi
def _max_r_for_dir_basin(
e: np.ndarray,
_anchor: np.ndarray = basin_theta,
_M: np.ndarray = L_inv_T_basin,
) -> float:
if bounds is None:
return np.inf
step = _M @ e
r_max = np.inf
for i, (lo, hi) in enumerate(bounds):
if hi is not None and step[i] > 1e-30:
r_max = min(r_max, (hi - _anchor[i]) / step[i])
if lo is not None and step[i] < -1e-30:
r_max = min(r_max, (lo - _anchor[i]) / step[i])
return float(r_max)
# Stage 2: radial search from this basin.
basin_radial_pts, basin_radial_dirs = _radial_search(
nll_scalar,
basin_theta,
threshold,
_to_theta,
k,
n_radial,
max_radial_bracket,
rng,
_max_r_for_dir_basin,
)
log.debug(
"Basin %d (NLL=%.6f): radial search %d/%d directions converged.",
b_idx,
basin_nll,
len(basin_radial_pts),
n_radial,
)
if len(basin_radial_pts) > 0:
all_radial_pts_list.append(basin_radial_pts)
# Stages 3–5 are only meaningful when the tangent space is non-trivial.
if k == 1 or len(basin_radial_pts) == 0:
continue
# Stage 3: angular gap detection relative to *this* basin's radial set.
gap_dirs = _detect_gap_directions(
basin_radial_dirs, k, n_gap_candidates, n_hmc_chains, rng
)
# Stage 4: targeted radial crossings at each gap direction → HMC seeds.
gap_starts = _find_gap_starts(
gap_dirs,
nll_scalar,
basin_theta,
threshold,
_to_theta,
max_radial_bracket,
_max_r_for_dir_basin,
basin_radial_pts,
basin_radial_dirs,
)
basin_hmc_pts: np.ndarray
if len(gap_starts) == 0:
basin_hmc_pts = np.empty((0, k), dtype=float)
else:
basin_hmc_pts = _run_hmc_chains(
nll_scalar,
grad_nll,
threshold,
np.array(gap_starts, dtype=float),
n_hmc_steps,
hmc_step_size,
newton_tol,
newton_max_iter,
rng,
n_jobs,
bounds,
)
log.debug(
"Basin %d: %d gap seeds → %d RATTLE points.",
b_idx,
len(gap_starts),
len(basin_hmc_pts),
)
# Stage 5: adaptive kNN-based reseeding, scoped to this basin's cloud.
_basin_pts_list: List[np.ndarray] = []
if len(basin_radial_pts) > 0:
_basin_pts_list.append(basin_radial_pts)
if len(basin_hmc_pts) > 0:
_basin_pts_list.append(basin_hmc_pts)
collected = (
np.vstack(_basin_pts_list)
if _basin_pts_list
else np.empty((0, k), dtype=float)
)
for pass_idx in range(n_coverage_passes - 1):
sparse_starts = _find_sparse_seeds(collected, n_hmc_chains, knn_k, rng)
if sparse_starts is None or len(sparse_starts) == 0:
break
log.debug(
"Basin %d, Stage 5 pass %d/%d: %d sparse seeds.",
b_idx,
pass_idx + 2,
n_coverage_passes,
len(sparse_starts),
)
new_pts = _run_hmc_chains(
nll_scalar,
grad_nll,
threshold,
sparse_starts,
n_hmc_steps,
hmc_step_size,
newton_tol,
newton_max_iter,
rng,
n_jobs,
bounds,
)
if len(new_pts) > 0:
basin_hmc_pts = (
np.vstack([basin_hmc_pts, new_pts])
if len(basin_hmc_pts) > 0
else new_pts
)
collected = np.vstack([collected, new_pts])
if len(basin_hmc_pts) > 0:
all_hmc_pts_list.append(basin_hmc_pts)
radial_points = (
np.vstack(all_radial_pts_list)
if all_radial_pts_list
else np.empty((0, k), dtype=float)
)
hmc_points = (
np.vstack(all_hmc_pts_list) if all_hmc_pts_list else np.empty((0, k), dtype=float)
)
log.debug(
"Across %d basin(s): %d radial + %d HMC contour points.",
len(active_basins),
len(radial_points),
len(hmc_points),
)
# ------------------------------------------------------------------
# Assemble result
# ------------------------------------------------------------------
if len(radial_points) > 0 and len(hmc_points) > 0:
all_points = np.vstack([radial_points, hmc_points])
from_radial = np.concatenate(
[
np.ones(len(radial_points), dtype=bool),
np.zeros(len(hmc_points), dtype=bool),
]
)
elif len(radial_points) > 0:
all_points = radial_points
from_radial = np.ones(len(radial_points), dtype=bool)
elif len(hmc_points) > 0:
all_points = hmc_points
from_radial = np.zeros(len(hmc_points), dtype=bool)
else:
warnings.warn(
"No contour points were found. The confidence region may be "
"unbounded or the NLL landscape is numerically ill-conditioned.",
RuntimeWarning,
stacklevel=2,
)
all_points = np.empty((0, k), dtype=float)
from_radial = np.empty(0, dtype=bool)
return ContourResult(
theta_mle=theta_mle,
nll_min=nll_min,
threshold=threshold,
delta=delta,
contour_points=all_points,
from_radial=from_radial,
parameter_names=param_names,
confidence_level=confidence_level,
dof=k,
)
# ---------------------------------------------------------------------------
# Private helpers
# ---------------------------------------------------------------------------
def _find_mle(
stat_model,
signal_indices: List[int],
) -> Tuple[np.ndarray, float]:
r"""
Find the MLE of the signal parameters with :math:`\mu` fixed at **1**.
Calls :meth:`~spey.StatisticalModel.maximize_likelihood` with
``fixed_poi_value=1.0`` (which fixes :math:`\mu` via the ``fit``
optimizer) and ``poi_indices=signal_indices`` to retrieve the fitted
values for the requested parameters.
Should be called with ``all_signal_indices`` (all non-:math:`\mu`
parameters) so that the returned ``nll_min`` is the global minimum
over the full signal-parameter space and profile parameters start
from their optimal values.
Args:
stat_model: A :class:`~spey.StatisticalModel` instance.
signal_indices (``List[int]``): Indices of the parameters to
optimise (typically all non-:math:`\mu` parameters).
Returns:
``Tuple[np.ndarray, float]``:
``(theta_mle, nll_min)`` — the MLE signal-parameter vector of
shape ``(k,)`` (where ``k = len(signal_indices)``) and the
minimum NLL value :math:`\mathrm{NLL}(\mu{=}1,\,\hat\theta_s)`.
Raises:
RuntimeError: If ``maximize_likelihood`` returns a non-finite NLL.
"""
signal_dict, nll_min = stat_model.maximize_likelihood(
poi_indices=signal_indices,
return_nll=True,
fixed_poi_value=1.0, # fixes pars[poi_idx]=1; flows to fit()
)
theta_mle = np.array([signal_dict[i] for i in signal_indices], dtype=float)
nll_min = float(nll_min)
if not np.isfinite(nll_min):
raise RuntimeError(
f"maximize_likelihood returned a non-finite NLL ({nll_min}). "
"The model may be ill-conditioned."
)
return theta_mle, nll_min
def _find_basins_contour_subspace(
nll_scalar,
grad_nll,
bounds: List[Tuple[Optional[float], Optional[float]]],
theta_primary: np.ndarray,
nll_primary: float,
k: int,
n_multistart: int,
rng: np.random.Generator,
fallback_span: float = 5.0,
dedup_tol: float = 1e-3,
) -> List[Tuple[np.ndarray, float]]:
r"""
Enumerate distinct local minima of the (possibly profiled) contour-space NLL.
Starting points are drawn as follows: slot ``j`` uses ``U(lo, hi)`` when
both bounds are finite, otherwise a finite fallback span around
``theta_primary[j]``. Each start is polished with L-BFGS-B using the
supplied analytic gradient. Results are merged with ``theta_primary`` and
deduplicated in :math:`\theta`-space at tolerance *dedup_tol*, keeping the
lowest-NLL representative of each cluster.
Used to locate *disjoint* basins whose NLL is below the confidence
threshold: a single-start MLE cannot discover these, so the radial search
anchored only at the primary fit systematically misses parts of the
contour.
Args:
nll_scalar: NLL function returning a plain ``float``.
grad_nll: Gradient of NLL returning ``np.ndarray``.
bounds: Contour-space parameter bounds (one ``(lower, upper)`` per
contour parameter; ``None`` entries mean unbounded).
theta_primary: Result of the primary single-start MLE. Always kept
in the output unless dominated by a lower-NLL duplicate.
nll_primary: NLL at *theta_primary*.
k: Number of contour parameters.
n_multistart: Total number of random starts (including *theta_primary*).
Values below 1 are clamped to 1.
rng: Random generator used for starting-point sampling.
fallback_span: Half-width used to sample around *theta_primary* when
a bound is ``None``.
dedup_tol: Euclidean tolerance in :math:`\theta`-space for clustering
duplicate minima.
Returns:
Sorted list of ``(theta, nll)`` tuples, ascending by NLL. The first
entry is the global minimum found.
"""
results: List[Tuple[np.ndarray, float]] = [
(np.asarray(theta_primary, dtype=float).copy(), float(nll_primary))
]
n_extra = max(0, int(n_multistart) - 1)
for _ in range(n_extra):
start = np.empty(k, dtype=float)
for j, (lo, hi) in enumerate(bounds):
lo_f = lo if (lo is not None and np.isfinite(lo)) else None
hi_f = hi if (hi is not None and np.isfinite(hi)) else None
if lo_f is not None and hi_f is not None:
start[j] = float(rng.uniform(lo_f, hi_f))
elif lo_f is not None:
start[j] = float(lo_f + rng.uniform(0.0, fallback_span))
elif hi_f is not None:
start[j] = float(hi_f - rng.uniform(0.0, fallback_span))
else:
start[j] = float(
theta_primary[j] + rng.uniform(-fallback_span, fallback_span)
)
sp_bounds = [
(
lo if (lo is not None and np.isfinite(lo)) else None,
hi if (hi is not None and np.isfinite(hi)) else None,
)
for (lo, hi) in bounds
]
try:
res = _sp_minimize(
nll_scalar,
x0=start,
jac=grad_nll,
method="L-BFGS-B",
bounds=sp_bounds,
options={"maxiter": 500, "ftol": 1e-10, "gtol": 1e-8},
)
except Exception as exc: # noqa: BLE001
log.debug("_find_basins: L-BFGS-B from start %s failed: %s", start, exc)
continue
if not np.isfinite(res.fun):
continue
results.append((np.asarray(res.x, dtype=float).copy(), float(res.fun)))
results.sort(key=lambda t: t[1])
unique: List[Tuple[np.ndarray, float]] = []
for theta, nll in results:
is_new = True
for ut, _ in unique:
if float(np.linalg.norm(theta - ut)) < dedup_tol:
is_new = False
break
if is_new:
unique.append((theta, nll))
log.debug(
"_find_basins: %d starts → %d raw results → %d unique basins " "(best NLL %.6f).",
1 + n_extra,
len(results),
len(unique),
unique[0][1] if unique else float("nan"),
)
return unique
def _build_whitener(
hessian_fn,
theta_mle_full: np.ndarray,
contour_indices: List[int],
profile_indices: List[int],
k: int,
reg: float,
) -> Tuple[np.ndarray, np.ndarray]:
r"""
Construct the whitening matrices :math:`L` and :math:`L^{-T}` from the
**marginal** Fisher information for the contour parameters at the MLE.
The full observed Fisher information is evaluated at
:math:`(\mu{=}1,\,\hat\theta_s)`:
.. math::
G_\text{full} = -\nabla^2 \log\mathcal{L}(\mu{=}1,\,\hat\theta_s).
When there are no profile parameters the contour block is used directly:
.. math::
G = \bigl[G_\text{full}\bigr]_{ij},
\quad i,j \in \text{contour\_indices}.
When profile parameters are present the **Schur complement** gives the
marginal precision matrix that matches the curvature of the profiled NLL:
.. math::
G = G_{cc} - G_{cp}\,G_{pp}^{-1}\,G_{pc},
where :math:`G_{cc}`, :math:`G_{cp}`, :math:`G_{pp}` are the
contour–contour, contour–profile, and profile–profile blocks of
:math:`G_\text{full}`, respectively.
After spectral regularisation and Cholesky factorisation
:math:`G = LL^T`, the whitened coordinate
:math:`\varphi = L(\theta_c - \hat\theta_c)` maps the contour to
approximately a :math:`(k-1)`-sphere.
If any computation fails, the identity is used as a fallback.
Args:
hessian_fn: Callable returned by
:meth:`~spey.base.BackendBase.get_hessian_logpdf_func`.
theta_mle_full (``np.ndarray``): Full MLE parameter vector with
:math:`\mu=1` inserted at the correct position.
contour_indices (``List[int]``): Indices of the contour parameters.
profile_indices (``List[int]``): Indices of the profile parameters
(empty list when all signal params are contour params).
k (``int``): Number of contour parameters (``len(contour_indices)``).
reg (``float``): Minimum eigenvalue for spectral regularisation.
Returns:
``Tuple[np.ndarray, np.ndarray]``:
``(L, L_inv_T)`` — lower-triangular Cholesky factor and its
transpose-inverse, each of shape ``(k, k)``.
"""
try:
H_logpdf_full = np.array(hessian_fn(theta_mle_full.astype(float)), dtype=float)
ci = np.array(contour_indices)
G_cc = -H_logpdf_full[np.ix_(ci, ci)]
if profile_indices:
# Schur complement: marginal precision for contour params,
# accounting for the profiled (free) parameters.
pi = np.array(profile_indices)
G_pp = -H_logpdf_full[np.ix_(pi, pi)]
G_cp = -H_logpdf_full[np.ix_(ci, pi)]
G_pc = -H_logpdf_full[np.ix_(pi, ci)]
try:
G = G_cc - G_cp @ np.linalg.solve(G_pp, G_pc)
except np.linalg.LinAlgError:
warnings.warn(
"Schur complement solve failed; falling back to "
"block-diagonal approximation for whitening.",
RuntimeWarning,
stacklevel=3,
)
G = G_cc
else:
G = G_cc
except Exception as exc: # noqa: BLE001
warnings.warn(
f"Hessian computation failed ({exc}); falling back to identity "
"whitening (equivalent to no pre-conditioning).",
RuntimeWarning,
stacklevel=3,
)
return np.eye(k), np.eye(k)
# Spectral regularisation: clip small / negative eigenvalues
w, V = np.linalg.eigh(G)
w = np.maximum(w, reg)
G_reg = (V * w) @ V.T
try:
L = np.linalg.cholesky(G_reg)
except np.linalg.LinAlgError:
warnings.warn(
"Cholesky factorisation failed after regularisation; "
"falling back to identity whitening.",
RuntimeWarning,
stacklevel=3,
)
return np.eye(k), np.eye(k)
L_inv = np.linalg.solve(L, np.eye(k)) # L^{-1}
L_inv_T = L_inv.T # L^{-T}
return L, L_inv_T
def _radial_search(
nll_scalar,
theta_mle: np.ndarray,
threshold: float,
to_theta,
k: int,
n_samples: int,
max_bracket: float,
rng: np.random.Generator,
max_r_for_dir=None,
) -> Tuple[np.ndarray, np.ndarray]:
r"""
Locate contour points by root-finding along random rays in whitened space.
For each sample a unit vector :math:`\hat{e}` is drawn uniformly on
:math:`S^{k-1}` by normalising a standard Gaussian vector. The
one-dimensional function
.. math::
f(r) = \mathrm{NLL}\!\bigl(\theta(r)\bigr) - T,
\qquad \theta(r) = \hat\theta + L^{-T}(r\hat{e}),
satisfies :math:`f(0) < 0` (MLE is inside the contour) and
:math:`f(r) \to +\infty` as :math:`r \to \infty`. The unique root
:math:`r^*` is found with Brent's method
(:func:`scipy.optimize.root_scalar`).
The upper bracket is initialised at :math:`r_\text{init} = \sqrt{\Delta}`
(the approximate whitened-space radius) and doubled until :math:`f > 0`
or the maximum :math:`r_\text{max}` is reached.
Args:
nll_scalar: NLL function returning a plain ``float``.
theta_mle (``np.ndarray``): MLE, shape ``(k,)``.
threshold (``float``): NLL threshold :math:`T`.
to_theta: Callable mapping whitened coordinate :math:`\varphi` to
:math:`\theta`, i.e. :math:`\theta = \hat\theta + L^{-T}\varphi`.
k (``int``): Parameter-space dimension.
n_samples (``int``): Number of ray directions to try.
max_bracket (``float``): Upper bound on the radial search in
whitened-space units.
rng (``np.random.Generator``): Random number generator.
max_r_for_dir: Optional callable ``e -> float`` that returns the
maximum :math:`r` allowed in direction *e* before a parameter
bound is violated. ``None`` means no bounds are active.
Returns:
``Tuple[np.ndarray, np.ndarray]``:
``(contour_points, directions)`` — arrays of shape
``(n_found, k)`` and ``(n_found, k)`` respectively, where
*n_found* ≤ *n_samples*.
"""
nll_at_mle = nll_scalar(theta_mle)
# Guard: if numerical NLL at MLE already exceeds threshold, something is wrong
if nll_at_mle >= threshold:
warnings.warn(
f"NLL at MLE ({nll_at_mle:.6f}) ≥ threshold ({threshold:.6f}). "
"The MLE may not be well-converged; radial search will be skipped.",
RuntimeWarning,
stacklevel=3,
)
return np.empty((0, k), dtype=float), np.empty((0, k), dtype=float)
# Approximate whitened-space radius for initial bracket
r_init = float(np.sqrt(max(threshold - nll_at_mle, 1.0) * 2.0))
contour_points: List[np.ndarray] = []
directions: List[np.ndarray] = []
for _ in range(n_samples):
# Sample uniform direction on S^{k-1}
z = rng.standard_normal(size=k)
norm = np.linalg.norm(z)
if norm < 1e-15:
continue
e = z / norm
def f(r: float) -> float: # noqa: ANN001
return nll_scalar(to_theta(r * e)) - threshold
# Effective upper bracket: limited by max_bracket AND parameter bounds
r_bounds_max: float = max_r_for_dir(e) if max_r_for_dir is not None else np.inf
effective_max: float = min(max_bracket, r_bounds_max)
if effective_max < 1e-10:
# MLE is at or past a boundary in this direction; skip
continue
r_lo: float = 1e-8
r_hi: float = min(r_init, effective_max)
try:
f_lo = f(r_lo)
if f_lo >= 0.0:
# MLE not inside contour along this direction — very unusual
continue
except Exception: # noqa: BLE001
continue
try:
while f(r_hi) <= 0.0 and r_hi < effective_max:
r_hi = min(r_hi * 2.0, effective_max)
if f(r_hi) <= 0.0:
# Contour not found within max_bracket / bounds in this direction
continue
except Exception: # noqa: BLE001
continue
try:
result = root_scalar(
f,
bracket=[r_lo, r_hi],
method="brentq",
xtol=1e-12,
rtol=1e-12,
)
except ValueError:
continue
if result.converged:
theta_star = to_theta(result.root * e)
contour_points.append(theta_star)
directions.append(e)
if not contour_points:
return np.empty((0, k), dtype=float), np.empty((0, k), dtype=float)
return np.array(contour_points, dtype=float), np.array(directions, dtype=float)
def _detect_gap_directions(
found_dirs: np.ndarray,
k: int,
n_candidates: int,
n_gaps: int,
rng: np.random.Generator,
) -> np.ndarray:
r"""
Identify directions in whitened space with sparse radial coverage.
A set of :math:`M` candidate unit vectors is drawn uniformly on
:math:`S^{k-1}`. For each candidate :math:`c` the *coverage score*
.. math::
s(c) = \max_{d \in \mathcal{D}} \langle c, d \rangle
measures its cosine similarity to the nearest already-found direction
:math:`d` in the set :math:`\mathcal{D}`. A small :math:`s(c)` (close
to :math:`-1`) indicates that :math:`c` lies in an angular gap. The
:math:`n_\text{gaps}` candidates with the smallest :math:`s` are returned
as RATTLE seeds.
If no radial directions have been found, :math:`n_\text{gaps}` random
directions are returned.
Args:
found_dirs (``np.ndarray``): Unit directions already explored by the
radial search, shape ``(n_found, k)``.
k (``int``): Parameter-space dimension.
n_candidates (``int``): Number of candidate directions to sample
(:math:`M` in the description above).
n_gaps (``int``): Number of gap directions to return.
rng (``np.random.Generator``): Random number generator.
Returns:
``np.ndarray``: Gap-direction unit vectors, shape
``(n_gaps, k)``.
"""
Z = rng.standard_normal(size=(n_candidates, k))
norms = np.linalg.norm(Z, axis=1, keepdims=True)
candidates = Z / np.maximum(norms, 1e-15)
if len(found_dirs) == 0:
return candidates[:n_gaps]
# cos_sim[i, j] = dot(candidates[i], found_dirs[j])
cos_sim = candidates @ found_dirs.T # (n_candidates, n_found)
max_cos = np.max(cos_sim, axis=1) # coverage score per candidate
# Least-covered = smallest max cosine
gap_idx = np.argsort(max_cos)[:n_gaps]
return candidates[gap_idx]
def _find_gap_starts(
gap_dirs: np.ndarray,
nll_scalar,
theta_mle: np.ndarray,
threshold: float,
to_theta,
max_bracket: float,
max_r_for_dir,
radial_points: np.ndarray,
radial_dirs: np.ndarray,
) -> List[np.ndarray]:
r"""
Find a contour starting point for each gap direction.
For each unit vector :math:`\hat{e}` in *gap_dirs* the function attempts
a targeted radial search — a Brent root-finding call along the ray
:math:`\theta(r) = \hat\theta + L^{-T}(r\hat{e})` — so that the
returned point lies directly on the contour in the gap region.
If the radial search fails (the contour is not crossed within the
bracket, or the NLL evaluation raises an exception), the function falls
back to the nearest already-found radial point measured by cosine
similarity of directions. If no radial points exist at all, that gap
direction is silently skipped.
This two-step strategy ensures chains start *inside* each detected gap
rather than at a nearby dense region, which is critical for non-spherical
or multi-modal confidence contours where angular proximity in whitened
space does not imply geometric proximity on the manifold.
Args:
gap_dirs (``np.ndarray``): Unit gap directions in whitened space,
shape ``(n_gaps, k)``.
nll_scalar: NLL function returning a plain ``float``.
theta_mle (``np.ndarray``): MLE in contour-parameter space, shape
``(k,)``.
threshold (``float``): NLL threshold :math:`T`.
to_theta: Callable mapping whitened coordinate :math:`\varphi`
to signal-parameter space :math:`\theta`.
max_bracket (``float``): Maximum radial bracket in whitened space.
max_r_for_dir: Optional callable ``e -> float`` returning the
maximum allowed :math:`r` in direction *e* before a parameter
bound is hit. ``None`` means no bounds.
radial_points (``np.ndarray``): Already-found radial contour points,
shape ``(n_radial, k)``. Used as fallback starts.
radial_dirs (``np.ndarray``): Corresponding whitened-space unit
directions, shape ``(n_radial, k)``.
Returns:
``List[np.ndarray]``: Starting points (one per successfully handled
gap direction), each of shape ``(k,)``.
"""
nll_at_mle = nll_scalar(theta_mle)
r_init = float(np.sqrt(max(threshold - nll_at_mle, 1.0) * 2.0))
has_radial = len(radial_points) > 0
starts: List[np.ndarray] = []
for e in gap_dirs:
r_max = max_r_for_dir(e) if max_r_for_dir is not None else np.inf
eff_max = min(max_bracket, r_max)
found = False
if eff_max > 1e-10:
r_lo: float = 1e-8
r_hi: float = min(r_init, eff_max)
# Use a default-argument capture of e so the closure is correct
# even if Python reuses the loop variable name.
def f(r: float, _e: np.ndarray = e) -> float: # noqa: ANN001
return nll_scalar(to_theta(r * _e)) - threshold
try:
if f(r_lo) < 0.0:
while f(r_hi) <= 0.0 and r_hi < eff_max:
r_hi = min(r_hi * 2.0, eff_max)
if f(r_hi) > 0.0:
res = root_scalar(
f,
bracket=[r_lo, r_hi],
method="brentq",
xtol=1e-12,
rtol=1e-12,
)
if res.converged:
starts.append(to_theta(res.root * e))
found = True
except Exception: # noqa: BLE001
pass
if not found:
if has_radial:
nearest = int(np.argmax(radial_dirs @ e))
starts.append(radial_points[nearest].copy())
# else: no fallback available — silently skip this gap direction
return starts
def _find_sparse_seeds(
all_points: np.ndarray,
n_seeds: int,
knn_k: int,
rng: np.random.Generator,
) -> Optional[np.ndarray]:
r"""
Identify the most sparsely covered contour points via :math:`k`-NN distance.
A k-d tree is built over *all_points* and the :math:`k`-th nearest-neighbour
distance :math:`d_k(\theta^*)` is computed for every point
:math:`\theta^* \in` *all_points*. Points with the largest
:math:`d_k` values lie in the sparsest regions of the point cloud and
are ideal seeds for new RATTLE chains.
This metric is unbiased with respect to the geometry of the contour:
it detects sparse regions purely from the distribution of existing points,
regardless of their direction from the MLE. It therefore complements
the angular gap detection of Stage 3, which can miss geometric gaps on
highly curved or non-Gaussian contours.
Args:
all_points (``np.ndarray``): All contour points collected so far,
shape ``(n, k)``.
n_seeds (``int``): Number of sparse starting points to return.
knn_k (``int``): Neighbour index used for the density estimate;
clamped to ``min(knn_k, n - 1)``.
rng (``np.random.Generator``): Random number generator (reserved
for future tie-breaking; currently unused).
Returns:
``Optional[np.ndarray]``: The *n_seeds* most isolated points, shape
``(n_seeds, k)``, or ``None`` when the point cloud is too small
(:math:`n < 2`).
"""
from scipy.spatial import cKDTree # noqa: PLC0415
n = len(all_points)
if n < 2:
return None
k_eff = min(knn_k, n - 1)
tree = cKDTree(all_points)
# query k_eff + 1 neighbours: index 0 is the point itself (distance 0)
dists, _ = tree.query(all_points, k=k_eff + 1)
knn_dists = dists[:, k_eff] # distance to the k_eff-th nearest neighbour
n_out = min(n_seeds, n)
sparse_idx = np.argsort(knn_dists)[-n_out:] # largest distances first
return all_points[sparse_idx]
def _project_theta(
nll_scalar,
grad_nll,
theta: np.ndarray,
threshold: float,
tol: float,
max_iter: int,
) -> np.ndarray:
r"""
Project :math:`\theta` onto the constraint surface
:math:`\mathrm{NLL}(\theta) = T` via Newton's method (SHAKE step).
The Newton correction is
.. math::
\lambda = \frac{\mathrm{NLL}(\theta) - T}{|\nabla\mathrm{NLL}(\theta)|^2},
\qquad
\theta \leftarrow \theta - \lambda\,\nabla\mathrm{NLL}(\theta),
repeated until :math:`|\mathrm{NLL}(\theta) - T| < \varepsilon_\text{tol}`
or *max_iter* iterations are exhausted.
This is a scalar version of the SHAKE constraint-correction used in
molecular dynamics :footcite:t:`andersen1983rattle`.
Args:
nll_scalar: NLL function returning a plain ``float``.
grad_nll: Gradient of NLL, returning a ``np.ndarray``.
theta (``np.ndarray``): Starting point, shape ``(k,)``.
threshold (``float``): Target NLL value :math:`T`.
tol (``float``): Convergence tolerance on the constraint residual.
max_iter (``int``): Maximum number of Newton iterations.
Returns:
``np.ndarray``: Projected point on (or very close to) the constraint
surface, shape ``(k,)``.
"""
for _ in range(max_iter):
residual = nll_scalar(theta) - threshold
if abs(residual) < tol:
break
g = grad_nll(theta)
g_sq = float(np.dot(g, g))
if g_sq < 1e-20:
break
theta = theta - (residual / g_sq) * g
return theta
def _rattle_step(
nll_scalar,
grad_nll,
theta: np.ndarray,
p: np.ndarray,
threshold: float,
step_size: float,
tol: float,
max_iter: int,
) -> Tuple[np.ndarray, np.ndarray]:
r"""
Perform one RATTLE leapfrog step on the constraint manifold
:math:`\mathrm{NLL}(\theta) = T`.
The RATTLE algorithm :footcite:t:`andersen1983rattle` integrates
constrained Hamiltonian dynamics with the Hamiltonian
.. math::
H(\theta, p) = \tfrac{1}{2}|p|^2 + \mathrm{NLL}(\theta),
subject to the holonomic constraint
:math:`C(\theta) = \mathrm{NLL}(\theta) - T = 0`.
The update equations (one leapfrog step, step size :math:`\varepsilon`)
are:
.. math::
p_{1/2} &= p_0
- \tfrac{\varepsilon}{2}\,\nabla\mathrm{NLL}(\theta_0), \\[3pt]
\theta' &= \theta_0 + \varepsilon\, p_{1/2}, \\[3pt]
\theta_1 &= \texttt{project}(\theta'),
\quad \text{(SHAKE, Newton)} \\[3pt]
p' &= p_{1/2}
- \tfrac{\varepsilon}{2}\,\nabla\mathrm{NLL}(\theta_1), \\[3pt]
p_1 &= p' - \frac{p' \cdot \nabla\mathrm{NLL}(\theta_1)}
{|\nabla\mathrm{NLL}(\theta_1)|^2}
\,\nabla\mathrm{NLL}(\theta_1).
The final line projects the momentum onto the tangent space of the
constraint at :math:`\theta_1`, ensuring
:math:`p_1 \perp \nabla C(\theta_1)` as required for the next step.
Args:
nll_scalar: NLL function returning a plain ``float``.
grad_nll: Gradient :math:`\nabla\mathrm{NLL}`, returning
``np.ndarray``.
theta (``np.ndarray``): Current position on the constraint, shape
``(k,)``.
p (``np.ndarray``): Current momentum in the tangent space, shape
``(k,)``.
threshold (``float``): NLL threshold :math:`T`.
step_size (``float``): Leapfrog step size :math:`\varepsilon`.
tol (``float``): Newton convergence tolerance for constraint
projection.
max_iter (``int``): Maximum Newton iterations in projection.
Returns:
``Tuple[np.ndarray, np.ndarray]``:
``(theta_new, p_new)`` — updated position on the constraint and
tangent-space momentum.
"""
eps = step_size
g0 = grad_nll(theta)
# Half-kick
p_half = p - (eps / 2.0) * g0
# Drift
theta_prime = theta + eps * p_half
# SHAKE: project theta_prime onto the constraint surface
theta_new = _project_theta(
nll_scalar, grad_nll, theta_prime, threshold, tol, max_iter
)
# Half-kick at new position
g1 = grad_nll(theta_new)
p_prime = p_half - (eps / 2.0) * g1
# Project momentum to tangent space of constraint at theta_new
g1_sq = float(np.dot(g1, g1))
if g1_sq > 1e-20:
p_new = p_prime - (float(np.dot(p_prime, g1)) / g1_sq) * g1
else:
p_new = p_prime
return theta_new, p_new
def _rattle_walk(
nll_scalar,
grad_nll,
start: np.ndarray,
threshold: float,
n_steps: int,
step_size: float,
tol: float,
max_iter: int,
rng: np.random.Generator,
bounds: Optional[List[Tuple[Optional[float], Optional[float]]]] = None,
) -> np.ndarray:
r"""
Walk along the constraint manifold from *start* using RATTLE.
The chain begins by projecting *start* onto
:math:`\mathrm{NLL}(\theta) = T` (in case it is slightly off-surface
due to numerical error) and drawing an initial momentum uniformly from
the tangent space. At each step :func:`_rattle_step` is called; failed
steps (numerical errors) terminate the chain early.
The momentum is periodically re-sampled (every ``max(1, n_steps // 5)``
steps) to prevent the chain from retracing its path and to improve
coverage of the manifold.
Args:
nll_scalar: NLL function returning a plain ``float``.
grad_nll: Gradient of NLL returning ``np.ndarray``.
start (``np.ndarray``): Starting point (near the constraint),
shape ``(k,)``.
threshold (``float``): NLL threshold :math:`T`.
n_steps (``int``): Total number of leapfrog steps.
step_size (``float``): Leapfrog step size :math:`\varepsilon`.
tol (``float``): Newton projection tolerance.
max_iter (``int``): Maximum Newton iterations per step.
rng (``np.random.Generator``): Random number generator.
bounds (``Optional[List[Tuple[Optional[float], Optional[float]]]]``,
default ``None``): Per-parameter ``(lower, upper)`` bounds in
signal-parameter space. ``None`` entries indicate no bound on
that side. If any element of ``theta`` violates a finite
bound after a RATTLE step the chain is terminated early and
the points collected so far are returned.
Returns:
``np.ndarray``: Trajectory of contour points, shape
``(n_collected, k)`` where *n_collected* ≤ *n_steps* + 1.
"""
k = len(start)
# Project start onto constraint (handles small numerical drift)
theta = _project_theta(nll_scalar, grad_nll, start.copy(), threshold, tol, max_iter)
def _tangent_momentum(theta_: np.ndarray) -> np.ndarray:
r"""
Draw a random unit vector from the tangent space of the constraint
at *theta_*:
.. math::
p \sim \mathcal{N}(0, I_k),
\qquad
p \leftarrow p - \frac{p \cdot \hat{n}}{\hat{n} \cdot \hat{n}} \hat{n},
\quad \hat{n} = \nabla\mathrm{NLL}(\theta_).
"""
g = grad_nll(theta_)
g_sq = float(np.dot(g, g))
raw = rng.standard_normal(size=k)
if g_sq > 1e-20:
raw = raw - (float(np.dot(raw, g)) / g_sq) * g
n = np.linalg.norm(raw)
return raw / n if n > 1e-15 else raw
p = _tangent_momentum(theta)
resample_every = max(1, n_steps // 5)
trajectory = [theta.copy()]
for step in range(n_steps):
# Periodic momentum re-sampling to improve manifold exploration
if step > 0 and step % resample_every == 0:
p = _tangent_momentum(theta)
try:
theta, p = _rattle_step(
nll_scalar,
grad_nll,
theta,
p,
threshold,
step_size,
tol,
max_iter,
)
except Exception: # noqa: BLE001
log.debug("RATTLE step failed at step %d; terminating chain.", step)
break
# Terminate if any parameter has left its allowed range
if bounds is not None:
if any(
(lo is not None and theta[i] < lo) or (hi is not None and theta[i] > hi)
for i, (lo, hi) in enumerate(bounds)
):
log.debug(
"RATTLE chain left parameter bounds at step %d; terminating.", step
)
break
trajectory.append(theta.copy())
return np.array(trajectory, dtype=float)
def _run_hmc_chains(
nll_scalar,
grad_nll,
threshold: float,
starts: np.ndarray,
n_steps: int,
step_size: float,
tol: float,
max_iter: int,
rng: np.random.Generator,
n_jobs: int = 1,
bounds: Optional[List[Tuple[Optional[float], Optional[float]]]] = None,
) -> np.ndarray:
r"""
Launch one RATTLE chain per provided starting point.
Chains are independent and dispatched to *n_jobs* worker processes via
``joblib.Parallel``. Each worker receives its own
:class:`numpy.random.Generator` seeded from *rng* before forking so
that results are reproducible while avoiding shared-state race conditions.
Args:
nll_scalar: NLL function returning a plain ``float``.
grad_nll: Gradient of NLL returning ``np.ndarray``.
threshold (``float``): NLL threshold :math:`T`.
starts (``np.ndarray``): Starting points for each chain, shape
``(n_chains, k)``. Every row should lie on (or very close to)
the contour :math:`\mathrm{NLL}(\theta) = T`; a Newton
projection step at the beginning of :func:`_rattle_walk` corrects
any small numerical drift.
n_steps (``int``): RATTLE steps per chain.
step_size (``float``): Leapfrog step size.
tol (``float``): Newton tolerance.
max_iter (``int``): Maximum Newton iterations.
rng (``np.random.Generator``): Random number generator used to
derive per-chain seeds.
n_jobs (``int``, default ``1``): Number of parallel worker processes.
bounds (``Optional[List[Tuple[Optional[float], Optional[float]]]]``,
default ``None``): Per-parameter ``(lower, upper)`` bounds
forwarded to :func:`_rattle_walk`.
Returns:
``np.ndarray``: All RATTLE trajectory points concatenated, shape
``(n_total, k)``. Empty ``(0, k)`` array if no chains produce
points.
"""
if len(starts) == 0:
k = starts.shape[1] if starts.ndim == 2 and starts.shape[1] > 0 else 1
return np.empty((0, k), dtype=float)
n, k = starts.shape
# Pre-generate per-chain seeds from the main rng *before* forking so
# the rng state advances deterministically regardless of n_jobs.
seeds: np.ndarray = np.array(
[int(rng.integers(2**63)) for _ in range(n)], dtype=int
)
def _one_chain(starts: np.ndarray, seeds: np.ndarray, idx: int) -> np.ndarray:
start = starts[idx]
seed = seeds[idx]
chain_rng = np.random.default_rng(seed)
return _rattle_walk(
nll_scalar,
grad_nll,
start,
threshold,
n_steps,
step_size,
tol,
max_iter,
chain_rng,
bounds,
)
loop = tqdm(
iterable=range(n),
total=n,
desc="Running HMC Chain",
bar_format="{l_bar}{bar:20}{r_bar}{bar:-20b}",
)
try:
trajs = Parallel(
n_jobs=n_jobs, prefer="processes", batch_size="auto", max_nbytes="10M"
)(delayed(_one_chain)(starts, seeds, idx) for idx in loop)
except Exception as err:
log.debug("An error has occured: %s", err)
trajs = [_one_chain(starts, seeds, idx) for idx in loop]
all_trajs = [t for t in trajs if len(t) > 0]
if not all_trajs:
return np.empty((0, k), dtype=float)
return np.vstack(all_trajs)