Source code for spey.backends.default_pdf.uncertainty_synthesizer

r"""
Theoretical Uncertainties
=========================

This module implements the **log-normal morphing** scheme used to propagate signal
systematic uncertainties through the simplified-likelihood backends.

Motivation
----------

Simplified likelihood backends parametrise background uncertainties through nuisance
parameters :math:`\boldsymbol{\theta}` that enter the Poisson mean :math:`\lambda_i`.
Signal yields, however, are often treated as fixed.  When the signal itself carries
non-negligible uncertainties (e.g. from theoretical scale variations, PDF
uncertainties, or detector modelling), those uncertainties must also be profiled.

The synthesizer introduces additional nuisance parameters per uncertainty *source*
and applies a multiplicative correction to the signal yields.  This keeps the signal
positive for all nuisance values and is consistent with the log-normal morphing
conventions widely used in HEP (e.g. HistFactory).

Two morphing modes are supported, selected via the ``"type"`` key of each modifier
configuration dictionary:

* **normalization** — one nuisance parameter :math:`\theta_k` shared across all bins.
  The factor :math:`f_{i,k}(\theta_k)` is the same for every bin modulo the
  per-bin fractional variation :math:`\Delta_{i,k}`.  Use this for *correlated*
  uncertainties such as PDF variations or luminosity, where a single degree of
  freedom simultaneously shifts all bins.

* **shape** — one independent nuisance parameter :math:`\alpha_{i,k}` per bin.
  Each bin is shifted independently.  Use this for *uncorrelated* uncertainties
  such as statistical uncertainties on theory predictions or scale uncertainties
  that affect each bin differently.

Log-normal morphing
-------------------

For each uncertainty source :math:`k` and bin :math:`i`, a fractional variation
:math:`\Delta_{i,k}` is defined as

.. math::

    \Delta_{i,k} = \frac{\sigma^{(s)}_{i,k}}{n^{(s)}_i},

where :math:`n^{(s)}_i` is the nominal signal yield and :math:`\sigma^{(s)}_{i,k}`
is the absolute uncertainty from source :math:`k` in bin :math:`i`.

**Normalization modifier** (``"type": "normalization"``)

A single nuisance parameter :math:`\theta_k` is shared across all bins.  The morphing
factor for source :math:`k` is

.. math::

    f_{i,k}(\theta_k)
    = \exp\!\left[\theta_k \ln\!\bigl(1 + \Delta_{i,k}(\theta_k)\bigr)\right],

where the effective variation depends on the sign of :math:`\theta_k`:

.. math::

    \Delta_{i,k}(\theta_k) =
    \begin{cases}
        \Delta_{i,k}^{+}, & \theta_k \geq 0 \\
        \Delta_{i,k}^{-}, & \theta_k < 0
    \end{cases}

(symmetric uncertainties use :math:`\Delta^+ = \Delta^-`).

**Shape modifier** (``"type": "shape"``)

Each bin :math:`i` has its own independent nuisance parameter :math:`\alpha_{i,k}`,
so the morphing factor is

.. math::

    f_{i,k}(\alpha_{i,k})
    = \exp\!\left[\alpha_{i,k}
      \ln\!\bigl(1 + \Delta_{i,k}(\alpha_{i,k})\bigr)\right],

with :math:`\Delta_{i,k}(\alpha_{i,k})` chosen by the sign of :math:`\alpha_{i,k}` as
above.  The per-bin parameters are uncorrelated; each is constrained by an independent
standard normal prior.

**Combined modifier**

When multiple sources are present the total signal modifier is the product of the
individual morphing factors:

.. math::

    f_i(\boldsymbol{\theta}_{\rm sig})
    = \prod_k f_{i,k}
    = \exp\!\left[\sum_k \ln f_{i,k}\right].

The effective signal yield in bin :math:`i` thus becomes
:math:`\mu\, n^{(s)}_i \cdot f_i(\boldsymbol{\theta}_{\rm sig})`.

Each nuisance parameter is constrained by a standard normal prior
:math:`\mathcal{N}(\theta \mid 0, 1)`, which is added to the constraint model
of the calling backend.

Parameter layout
----------------

Signal uncertainty parameters are appended to the end of the parameter vector after
the background nuisance parameters::

    pars = [μ, θ₁, …, θ_N,  θ_{sig,1}, …]

For a **normalization** modifier, one parameter is appended; for a **shape** modifier,
:math:`N_{\rm bins}` parameters are appended (one per bin).  The ``domain`` field in
each constraint dictionary records the index of the corresponding parameter in this
vector.

Bins with zero nominal signal yield (:math:`n^{(s)}_i = 0`) produce
:math:`\Delta_{i,k} = \mathrm{NaN}`, which is replaced by 1 (no modification)
so the exponential evaluates to 1 for those bins.

.. currentmodule:: spey.backends.default_pdf.uncertainty_synthesizer

.. autosummary::
    :toctree: ../_generated/

    signal_uncertainty_synthesizer

Why there are two choices of modifiers?
---------------------------------------

This module is designed with PDF and scale uncertainties in mind. Neither PDF
uncertainties nor QCD scale uncertainties are typically pure normalisation uncertainties.
Both originate from the theoretical calculation of the signal cross-section and can alter
the shape of any differential distribution used as a discriminating observable.

**PDF uncertainties** come from the parametric uncertainty in the parton distribution functions.
Because different parton flavours and momentum fractions dominate different kinematic regimes,
varying PDF eigenvectors or replicas (following the PDF4LHC prescription in :cite:`Butterworth:2015oua`)
changes both the inclusive cross-section and the relative contributions of different kinematic
regions. In a multi-bin fit over, say, invariant mass or MET, PDF variations therefore produce
correlated, bin-dependent rate changes that affect the shape of the template.

**QCD scale uncertainties** (:math:`\mu_R` and :math:`\mu_F` variations by conventional factors
of 2) are proxies for missing higher-order corrections. Because fixed-order corrections are
themselves kinematic-dependent, e.g., they grow with the hardness of the event, scale variations
tend to affect the high-end tail of a distribution differently from the bulk. This is well
documented: for example, ATLAS measurements of t-tbar differential distributions
:cite:`Harland-Lang:2016yfn` and inclusive cross-section analyses consistently show bin-dependent
scale-variation envelopes.

In both cases, the dominant effect is often a normalisation shift, but the shape change is
physically real and in a multi-bin analysis, it is generally not safe to discard it.

"""

import warnings
from functools import partial, reduce
import logging
from typing import Any, Dict, List, Optional

import autograd.numpy as np

from spey.system.exceptions import InvalidUncertaintyDefinition

# pylint: disable=E1101,E1120

log = logging.getLogger("Spey")

#: Allowed modifier type strings.
MODIFIER_TYPES = ("normalization", "shape")


[docs] def signal_uncertainty_synthesizer( signal_yields: List[float], modifiers: List[Dict[str, Any]], n_signal_parameters: int = 0, domain_start: Optional[int] = None, ) -> Dict[str, Any]: r""" Synthesize signal uncertainties from a list of modifier configuration dictionaries. .. versionchanged:: 0.2.7 ``modifiers`` is now a list of configuration dictionaries. Each dictionary specifies the modifier type (``"normalization"`` or ``"shape"``), the per-bin uncertainty values, and an optional name. The old list-of-arrays format is no longer accepted. Each modifier applies a multiplicative correction to the signal yields via log-normal morphing. Two morphing modes are supported: **Normalization modifier** (``"type": "normalization"``) A *single* nuisance parameter :math:`\theta_k` is shared across all bins. Suitable for uncertainties that rescale the entire distribution coherently, such as PDF variations, luminosity uncertainties, or cross-section normalisation errors. The morphing factor is .. math:: f_{i,k}(\theta_k) = \exp\!\left[\theta_k \ln\!\bigl(1 + \Delta_{i,k}(\theta_k)\bigr)\right], where :math:`\Delta_{i,k} = \sigma^{(s)}_{i,k} / n^{(s)}_i` and the sign of :math:`\theta_k` selects the up or down variation. **Shape modifier** (``"type": "shape"``) One *independent* nuisance parameter :math:`\alpha_{i,k}` per bin. Suitable for uncertainties whose effect differs bin-by-bin in an unknown way, such as statistical uncertainties on theory predictions or bin-by-bin scale uncertainties. The morphing factor is .. math:: f_{i,k}(\alpha_{i,k}) = \exp\!\left[\alpha_{i,k} \ln\!\bigl(1 + \Delta_{i,k}(\alpha_{i,k})\bigr)\right], with each :math:`\alpha_{i,k}` constrained by an independent standard normal prior :math:`\mathcal{N}(\alpha_{i,k} \mid 0, 1)`. For asymmetric uncertainties the variation depends on the sign of the nuisance parameter: .. math:: \Delta_{i,k}(\theta) = \begin{cases} \Delta_{i,k}^{+}, & \theta \ge 0 \\ \Delta_{i,k}^{-}, & \theta < 0 \end{cases} All nuisance parameters are constrained by standard normal priors. .. note:: Symmetric uncertainties must be provided as ``list[float]`` in ``"uncertainties"``. Asymmetric uncertainties must be provided as ``list[tuple[float, float]]``, where each tuple is ``(up, down)`` giving the absolute up and down variations. **Modifier configuration dictionary** Each element of ``modifiers`` is a :class:`dict` with the following keys: .. list-table:: :header-rows: 1 :widths: 20 15 65 * - Key - Required - Description * - ``"type"`` - yes - Morphing mode: ``"normalization"`` (one shared nuisance) or ``"shape"`` (one nuisance per bin). * - ``"uncertainties"`` - yes - Per-bin absolute uncertainties. Either a ``list[float]`` for symmetric uncertainties or a ``list[tuple[float, float]]`` for asymmetric ``(up, down)`` variations. Must have one entry per bin. * - ``"name"`` - no - Human-readable label used to construct parameter names (e.g. ``"pdf"`` → ``theta_sig_pdf``). Defaults to ``"mod0"``, ``"mod1"``, … when omitted. **Parameter layout with** ``n_signal_parameters`` When the calling backend uses a callable ``signal_yields`` that accepts ``n_signal_parameters`` additional free parameters, those parameters occupy indices ``1 … n_signal_parameters`` in the full parameter vector and push the background nuisance parameters (and therefore the signal-uncertainty parameters) to higher indices:: pars = [μ, sig_par_0, …, sig_par_{n-1}, θ_bkg_1, …, θ_bkg_N, θ_sig_1, …] Pass ``n_signal_parameters`` so that the domain indices are shifted accordingly. Alternatively, supply ``domain_start`` to override the auto-computed starting index. **Examples** *Normalization modifier* — a single nuisance scales the whole distribution: .. code:: python3 >>> signal_uncertainty_synthesizer( ... signal_yields=[3.0, 5.0], ... modifiers=[ ... {"type": "normalization", "name": "pdf", "uncertainties": [0.6, 1.0]}, ... ], ... ) # one nuisance parameter theta_sig_pdf at domain index 3 *Shape modifier* — independent nuisance per bin: .. code:: python3 >>> signal_uncertainty_synthesizer( ... signal_yields=[3.0, 5.0], ... modifiers=[ ... {"type": "shape", "name": "scale", "uncertainties": [0.3, 0.5]}, ... ], ... ) # two nuisance parameters theta_sig_scale_0, theta_sig_scale_1 at domain # indices 3 and 4 *Mixed modifiers* — normalization + shape in one model: .. code:: python3 >>> signal_uncertainty_synthesizer( ... signal_yields=[3.0, 5.0], ... modifiers=[ ... {"type": "normalization", "name": "pdf", ... "uncertainties": [0.6, 1.0]}, ... {"type": "shape", "name": "scale", ... "uncertainties": [(0.3, 0.4), (0.5, 0.6)]}, ... ], ... ) # 3 nuisance parameters total: theta_sig_pdf (index 3), # theta_sig_scale_0 (index 4), theta_sig_scale_1 (index 5) Args: signal_yields (``List[float]``): List of nominal signal yields per bin. Used to compute the fractional variation :math:`\Delta_{i,k}`. modifiers (``List[Dict[str, Any]]``): List of modifier configuration dictionaries. Each dictionary must contain ``"type"`` and ``"uncertainties"`` keys; ``"name"`` is optional. See table above. n_signal_parameters (``int``, default ``0``): Number of additional free parameters that a callable ``signal_yields`` function accepts. These parameters are placed *before* the background nuisance parameters in the parameter vector (immediately after :math:`\mu`), so the domain indices of the signal-uncertainty parameters are shifted by this amount. Has no effect when ``domain_start`` is supplied explicitly. domain_start (``int``, default ``None``): Explicit starting index in the parameter vector from which signal-uncertainty nuisance parameters are assigned. When ``None`` the index is computed automatically as ``1 + n_signal_parameters + N_bins``. Raises: :exc:`~spey.system.exceptions.InvalidUncertaintyDefinition`: If a modifier has an unknown ``"type"``, if the number of bins in ``"uncertainties"`` does not match ``signal_yields``, or if the uncertainty array has an unsupported number of dimensions. Returns: A dictionary with * ``"lambda"`` — callable ``(pars: np.ndarray) -> np.ndarray`` that computes the per-bin signal modifier :math:`f_i(\boldsymbol{\theta}_{\rm sig})`. * ``"constraint"`` — list of constraint dictionaries (one per nuisance parameter) for the constraint model. * ``"n_parameters"`` — total number of signal-uncertainty nuisance parameters introduced. This is 1 per normalization modifier and :math:`N_{\rm bins}` per shape modifier. * ``"parameter_names"`` — list of parameter name strings corresponding to the signal-uncertainty nuisance parameters, in parameter-vector order. """ n_bins = len(signal_yields) signal_yields_arr = np.array(signal_yields, dtype=float) lambdas = [] constraints = [] parameter_names = [] if domain_start is None: current_idx = 1 + n_signal_parameters + n_bins else: current_idx = int(domain_start) initial_idx = current_idx for mod_idx, modifier in enumerate(modifiers): mod_type = modifier.get("type") if mod_type not in MODIFIER_TYPES: raise InvalidUncertaintyDefinition( f"Unknown modifier type '{mod_type}'. " f"Expected one of {MODIFIER_TYPES}." ) values = np.array(modifier["uncertainties"]) mod_name = modifier.get("name", f"mod{mod_idx}") if len(values) != n_bins: raise InvalidUncertaintyDefinition( f"Modifier '{mod_name}': expected {n_bins} bins, got {len(values)}" ) with warnings.catch_warnings(record=True): if values.ndim == 1: delta_up = 1.0 + np.where( signal_yields_arr != 0.0, values / signal_yields_arr, 0.0 ) delta_dn = delta_up elif values.ndim == 2: delta_up = 1.0 + np.where( signal_yields_arr != 0.0, values[:, 0] / signal_yields_arr, 0.0 ) delta_dn = 1.0 + np.where( signal_yields_arr != 0.0, values[:, 1] / signal_yields_arr, 0.0 ) else: raise InvalidUncertaintyDefinition( f"Modifier '{mod_name}': unsupported uncertainty shape " f"({values.ndim}D array). " "Expected 1D for symmetric or 2D for asymmetric (up, down) uncertainties." ) # Clamp deltas to >= 1 so log() stays real and bins where the relative # variation would have driven the multiplicative factor non-positive # are treated as "no variation" (factor of 1) rather than NaN. delta_up = _validate_delta(delta_up, mod_name, "up") delta_dn = _validate_delta(delta_dn, mod_name, "dn") log_up = np.log(delta_up) log_dn = np.log(delta_dn) if mod_type == "normalization": # One nuisance parameter shared across all bins par_idx = current_idx current_idx += 1 parameter_names.append(f"theta_sig_{mod_name}") constraints.append( { "distribution_type": "normal", "args": [np.zeros(1), np.ones(1)], "kwargs": {"domain": np.r_[par_idx]}, } ) def _lam_norm( param: np.ndarray, lu: np.ndarray, ld: np.ndarray, pidx: int ) -> np.ndarray: alpha = param[pidx] return alpha * (lu if alpha > 0 else ld) lambdas.append(partial(_lam_norm, lu=log_up, ld=log_dn, pidx=par_idx)) else: # mod_type == "shape" # One independent nuisance parameter per bin bin_indices = np.arange(current_idx, current_idx + n_bins) current_idx += n_bins parameter_names.extend([f"theta_sig_{mod_name}_{i}" for i in range(n_bins)]) for par_idx in bin_indices: constraints.append( { "distribution_type": "normal", "args": [np.zeros(1), np.ones(1)], "kwargs": {"domain": np.r_[par_idx]}, } ) def _lam_shape( param: np.ndarray, lu: np.ndarray, ld: np.ndarray, indices: np.ndarray, ) -> np.ndarray: alphas = param[indices] chosen_log = np.where(alphas >= 0, lu, ld) return alphas * chosen_log lambdas.append(partial(_lam_shape, lu=log_up, ld=log_dn, indices=bin_indices)) n_parameters = current_idx - initial_idx if not lambdas: return { "lambda": lambda _: 1.0, "constraint": [], "n_parameters": 0, "parameter_names": [], } if len(lambdas) == 1: lam_fn = lambda param: np.exp(lambdas[0](param)) # noqa: E731 else: def lam_fn(param: np.ndarray) -> np.ndarray: return np.exp(reduce(lambda x, y: x + y, (lam(param) for lam in lambdas))) return { "lambda": lam_fn, "constraint": constraints, "n_parameters": n_parameters, "parameter_names": parameter_names, }
def _validate_delta(delta: np.ndarray, mod_name: str, tag: str) -> np.ndarray: """ Sanitise a per-bin multiplicative variation factor for log-normal morphing. The factor ``delta`` must satisfy ``delta > 0`` so that ``log(delta)`` is finite. Entries below 1 (which would drive the morphed signal negative at ``alpha = 1``) and any NaN entries are reported and replaced by 1 (i.e. "no variation" for that bin). Args: delta (``np.ndarray``): per-bin variation factor (up or down). mod_name (``str``): modifier name used in the warning message. tag (``str``): ``"up"`` or ``"dn"`` indicating which side. Returns: ``np.ndarray``: Sanitised delta array of the same shape, with non-finite or sub-1 entries replaced by 1. """ bad_finite = delta < 1 bad_nan = np.isnan(delta) bad = bad_finite | bad_nan if np.any(bad): log.warning( f"Modifier '{mod_name}': invalid delta_{tag} at bins " f"{np.where(bad)[0].tolist()}; clamping to 1.0." ) delta = np.where(bad_nan, 1.0, delta) return np.clip(delta, 1.0, None)