Statistical Model Combiner#

Uncorrelated Statistical Model Combiner#

This module provides UnCorrStatisticsCombiner, which combines an arbitrary collection of StatisticalModel objects under the assumption that they are statistically independent (uncorrelated). Because the analyses share no nuisance parameters, the joint likelihood factorises into a simple product, which makes the combination both exact and computationally efficient.

Mathematical Background#

Factorised likelihood

Given \(N\) independent statistical models indexed by \(i\), each with its own set of nuisance parameters \(\boldsymbol{\theta}_i\), the joint likelihood is

\[\mathcal{L}_{\rm comb}(\mu, \{\boldsymbol{\theta}_i\}) = \prod_{i=1}^{N} \mathcal{L}_i(\mu, \boldsymbol{\theta}_i),\]

where \(\mu\) is the single, shared parameter of interest (signal strength). Taking the negative logarithm converts the product into a sum,

\[-\ln\mathcal{L}_{\rm comb}(\mu, \{\boldsymbol{\theta}_i\}) = \sum_{i=1}^{N} \left[-\ln\mathcal{L}_i(\mu, \boldsymbol{\theta}_i)\right],\]

so the combined NLL is simply the sum of the individual NLLs evaluated at the same \(\mu\).

Profile likelihood ratio

The profile likelihood ratio is

\[\lambda(\mu) = \frac{\mathcal{L}_{\rm comb}(\mu,\,\hat{\hat{\boldsymbol{\theta}}}(\mu))} {\mathcal{L}_{\rm comb}(\hat\mu,\,\hat{\boldsymbol{\theta}})},\]

where \((\hat\mu, \hat{\boldsymbol{\theta}})\) denote the global maximum and \(\hat{\hat{\boldsymbol{\theta}}}(\mu)\) denotes the conditional maximum for fixed \(\mu\). Because the analyses are independent the nuisance parameters of each model are profiled separately, so

\[-2\ln\lambda_{\rm comb}(\mu) = \sum_{i=1}^{N} \left[-2\ln\lambda_i(\mu)\right].\]

This means the combined test statistic is the sum of the individual test statistics, and standard asymptotic formulae (Wald approximation, Asimov data) apply without modification.

Initialisation of \(\hat\mu\)

When no initial value is provided, UnCorrStatisticsCombiner estimates a starting point for the optimiser using the Gaussian (inverse-variance) weighted combination of the per-model best-fit signal strengths,

\[\mu_{\rm init} = \frac{\displaystyle\sum_i \hat\mu_i\,\sigma_{\hat\mu,i}^{-2}} {\displaystyle\sum_i \sigma_{\hat\mu,i}^{-2}},\]

where \(\hat\mu_i\) is the best-fit signal strength for model \(i\) and \(\sigma_{\hat\mu,i}\) is the corresponding uncertainty estimated via sigma_mu().

Asimov data

Asimov data are generated independently for each constituent model and stored in a dictionary keyed by analysis name. The combined Asimov likelihood is then computed by evaluating the factorised NLL at those synthetic observations.

References

  • G. Cowan, K. Cranmer, E. Gross, O. Vitells, Asymptotic formulae for likelihood-based tests of new physics, Eur. Phys. J. C 71 (2011) 1554, [arXiv:1007.1727].

class spey.UnCorrStatisticsCombiner(*args, ntoys: int = 1000)[source]#

Bases: HypothesisTestingBase

Combine uncorrelated (independent) statistical models.

This class accepts an arbitrary number of StatisticalModel instances and treats them as statistically independent analyses. Because the models share no nuisance parameters, the joint likelihood factorises into a product of individual likelihoods,

\[\mathcal{L}_{\rm comb}(\mu) = \prod_{i} \mathcal{L}_i(\mu),\]

and the combined negative log-likelihood (NLL) is therefore just the sum,

\[\mathrm{NLL}_{\rm comb}(\mu) = \sum_{i} \mathrm{NLL}_i(\mu).\]

The constituent models are not required to use the same backend — any mix of registered backends is supported.

The combined stack is mutable: models can be added with append() (or the @ operator, __matmul__()) and removed with remove(). Models are identified by their unique analysis string; duplicate analysis names are rejected.

Warning

UnCorrStatisticsCombiner assumes that none of the constituent statistical models share nuisance parameters. Violating this assumption leads to incorrect (over-confident) results because the nuisance parameters of each model are profiled independently rather than jointly.

Parameters:
  • *args (StatisticalModel) – One or more independent statistical models to include in the initial stack. Additional models can be added later via append().

  • ntoys (int, default 1000) – Number of pseudo-experiments used when a toy-based calculator is requested. This argument is passed through to the base class and is otherwise not used by the uncorrelated combiner itself (see is_toy_calculator_available).

Raises:

Examples

>>> import spey
>>> pdf_wrapper = spey.get_backend("default.poisson")
>>> model_A = pdf_wrapper(signal_yields=[3.0], background_yields=[50.0],
...                       data=[52], analysis="SR_A")
>>> model_B = pdf_wrapper(signal_yields=[1.5], background_yields=[20.0],
...                       data=[18], analysis="SR_B")
>>> combiner = spey.UnCorrStatisticsCombiner(model_A, model_B)
>>> combiner.exclusion_confidence_level()  # combined CLs
property analyses: List[str]#

Unique analysis identifiers of all models currently in the stack.

Returns:

Analysis names in the same order as statistical_models.

Return type:

List[str]

append(statistical_model: StatisticalModel) None[source]#

Append new independent StatisticalModel to the stack.

Parameters:

statistical_model (StatisticalModel) – new statistical model to be added to the stack.

Raises:
asimov_likelihood(poi_test: float | Dict[int | str, float] = 1.0, expected: ExpectationType = observed, return_nll: bool = True, test_statistics: str = 'qtilde', statistical_model_options: Dict[str, Dict] | None = None, **kwargs) float[source]#

Compute likelihood of the statistical model stack generated with the Asimov data.

Parameters:
  • poi_test (float, default 1.0) – parameter of interest, \(\mu\).

  • expected (ExpectationType) –

    Sets which values the fitting algorithm should focus and p-values to be computed.

    • observed: Computes the p-values with via post-fit prescriotion which means that the experimental data will be assumed to be the truth (default).

    • aposteriori: Computes the expected p-values with via post-fit prescriotion which means that the experimental data will be assumed to be the truth.

    • apriori: Computes the expected p-values with via pre-fit prescription which means that the SM will be assumed to be the truth.

  • return_nll (bool, default True) – If True, returns negative log-likelihood value. if False returns likelihood value.

  • test_statistics (Text, default "qtilde") –

    test statistics.

    • 'qtilde': (default) performs the calculation using the alternative test statistic, \(\tilde{q}_{\mu}\), see eq. (62) of [arXiv:1007.1727] (qmu_tilde()).

      Warning

      Note that this assumes that \(\hat\mu\geq0\), hence allow_negative_signal assumed to be False. If this function has been executed by user, spey assumes that this is taken care of throughout the external code consistently. Whilst computing p-values or upper limit on \(\mu\) through spey this is taken care of automatically in the backend.

    • 'q': performs the calculation using the test statistic \(q_{\mu}\), see eq. (54) of [arXiv:1007.1727] (qmu()).

    • 'q0': performs the calculation using the discovery test statistic, see eq. (47) of [arXiv:1007.1727] \(q_{0}\) (q0()).

  • statistical_model_options (Dict[Text, Dict], default None) –

    backend specific options. The dictionary key needs to be the backend name and the item needs to be the dictionary holding the keyword arguments specific to that particular backend.

    >>> statistical_model_options = {"default.uncorrelated_background" : {"init_pars" : [1., 3., 4.]}}
    

  • kwargs – keyword arguments for the optimiser.

Returns:

likelihood computed for asimov data

Return type:

float

chi2(poi_test: float | Dict[int | str, float] = 1.0, poi_test_denominator: float | Dict[int | str, float] | None = None, expected: ExpectationType = observed, allow_negative_signal: bool = False, init_pars: List[float] | None = None, par_bounds: List[Tuple[float, float]] | None = None, mle_kwargs=None, likelihood_kwargs=None) float#

Compute the profile likelihood ratio \(\chi^2\) test statistic.

When poi_test_denominator=None, evaluates the profile likelihood ratio against the unconditional maximum:

\[\chi^2 = -2\log\left(\frac{\mathcal{L}(\mu,\hat\theta_\mu)}{\mathcal{L}(\hat\mu,\hat\theta)}\right)\]

When poi_test_denominator is set, it replaces the denominator with a second fixed-\(\mu\) likelihood:

\[\chi^2 = -2\log\left(\frac{\mathcal{L}(\mu,\theta_\mu)}{\mathcal{L}(\mu_{\rm denom},\theta_{\mu_{\rm denom}})}\right)\]

where \(\mu_{\rm denom}\) is poi_test_denominator which is typically zero to compare signal model with the background only model.

Parameters:
  • poi_test (PoiTest or list[float], default 1.0) – Parameter of interest, \(\mu\). A plain float (or iterable of floats) fixes the primary POI — when iterable, \(\chi^2\) is computed for each element. Alternatively, a dict of {index_or_name: value} fixes multiple parameters simultaneously (iterating over dicts is not supported).

  • poi_test_denominator (PoiTest, default None) – Parameter of interest for the denominator. Accepts the same formats as poi_test. If None the maximum likelihood is computed instead.

  • expected (ExpectationType) –

    Sets which values the fitting algorithm should focus and p-values to be computed.

    • observed: Computes the p-values with via post-fit prescription which means that the experimental data will be assumed to be the truth (default).

    • aposteriori: Computes the expected p-values with via post-fit prescription which means that the experimental data will be assumed to be the truth.

    • apriori: Computes the expected p-values with via pre-fit prescription which means that the SM will be assumed to be the truth.

  • allow_negative_signal (bool, default True) – If True \(\hat\mu\) value will be allowed to be negative. Only valid when poi_test_denominator=None.

  • init_pars (List[float], default None) – initial parameters for the optimiser

  • par_bounds (List[Tuple[float, float]], default None) – parameter bounds for the optimiser.

  • mle_kwargs (dict, default None) –

    Keyword arguments forwarded to the denominator evaluation. When poi_test_denominator=None they are passed to maximize_likelihood() (free fit of \(\hat\mu,\hat\theta\)); otherwise they are passed to likelihood() at the fixed \(\mu_{\rm denom}\). If None, an empty dict is used. Accepted keys:

    Consumed by prepare_for_fit:

    • do_grad (bool, default True): Request the gradient of the objective function from the backend. Falls back to False automatically if the backend raises NotImplementedError.

    • constraints (List[Dict], default []): Additional scipy-style constraint dicts appended to any backend-defined constraints.

    • fixed_poi_value (Union[float, Dict[int, float]], default None): Fix one or more POIs during the maximisation while the remaining parameters are profiled freely. Only effective in the maximize_likelihood branch (i.e. when poi_test_denominator=None); ignored (with a warning) by likelihood, which takes the fixed POIs from poi_test_denominator.

    Consumed by fit (the core optimisation loop):

    • minimizer (str, default "scipy" or the value of the SPEY_OPTIMISER environment variable): Selects the numerical minimiser. Accepted values are "scipy" and "minuit" (requires iminuit).

    • hessian (Callable[[np.ndarray], np.ndarray], default None): Hessian of the objective function. Passed to scipy as the hess argument; ignored by minuit.

    Scipy-minimiser options (used when minimizer="scipy"):

    • method (str, default "SLSQP"): Scipy optimisation method (e.g. "SLSQP", "L-BFGS-B", "trust-constr").

    • maxiter (int, default 10000): Maximum number of iterations.

    • tol (float, default 1e-6): Convergence tolerance.

    • disp (bool, default False): Print convergence messages if True.

    • ntrials (int, default 1): Number of re-tries with progressively expanded parameter bounds when the minimiser does not converge.

    Minuit-minimiser options (used when minimizer="minuit"):

    • method (str, default "migrad"): Minuit algorithm ("migrad" or "simplex").

    • maxiter (int, default 10000): Maximum number of function calls.

    • tol (float, default 1e-6): Convergence tolerance.

    • disp (int, default 0): Minuit print level (0 = silent).

    • strategy (int, default 0): Minuit strategy (0 = fast, 1 = default, 2 = slow but more accurate).

    • errordef (float, default Minuit.LIKELIHOOD): Value by which Minuit defines a one-sigma interval (0.5 for NLL, 1.0 for \(\chi^2\)).

    Unknown keys are logged as a warning and silently discarded by the minimiser.

  • likelihood_kwargs (dict, default None) –

    Keyword arguments forwarded to likelihood() when evaluating the numerator at each requested poi_test value (including every element of an iterable or scan). If None, an empty dict is used. Accepts the same keys as mle_kwargs above, with one caveat:

    • fixed_poi_value is not supported here — the numerator’s fixed POIs are controlled entirely by poi_test. Passing fixed_poi_value is logged as a warning and discarded.

Returns:

value of the \(\chi^2\).

Return type:

float

chi2_test(expected: ExpectationType = observed, confidence_level: float = 0.95, limit_type: Literal['right', 'left', 'two-sided'] = 'two-sided', allow_negative_signal: bool = None, parameter: int | str | None = None, poi_value: float = 1.0, n_scan: int = 3, n_multistart: int = 2, **kwargs) List[float]#

Determine parameter value(s) that constrain the \(\chi^2\) distribution at a specified confidence level via 1D profiling.

When parameter=None (default), the method profiles the primary POI and finds the POI values where the profile \(\chi^2\) equals the threshold chi2.isf(alpha, df=1).

When parameter is set to a nuisance parameter index or name, the POI is fixed to poi_value (default 1.0) and the method profiles the chosen nuisance parameter instead, locating the nuisance value(s) at the same \(\chi^2\) threshold. This is useful for setting 1D confidence intervals on any model parameter.

Added in version 0.2.0.

Changed in version 0.2.7: The ability to profile any given nuisance parameter has been implemented. The 1D profile is now enumerated by coarse scan plus bracketed root refinement, so non-convex likelihoods with disjoint confidence regions return every crossing in ascending order. A small multi-start is performed before the root search to harden the NLL minimum used as the anchor of the \(\chi^2\) threshold.

Attention

The degrees of freedom are set to one, referring to the single profiled parameter (either the POI or the selected nuisance parameter).

Parameters:
  • expected (ExpectationType) –

    Specifies the type of expectation for the fitting algorithm and p-value computation.

    • observed: Computes p-values using post-fit prescription, assuming experimental data as the truth.

    • apriori: Computes expected p-values using pre-fit prescription, assuming the Standard Model (SM) as the truth.

  • confidence_level (float, default 0.95) – The confidence level for the interval. Must be between 0 and 1. This refers to the total inner area under the bell curve, noted as \(CL\) below.

  • limit_type ('right', 'left' or 'two-sided', default 'two-sided') – Specifies which side of the \(\chi^2\) distribution should be constrained.

  • allow_negative_signal (bool, default None) – Controls whether the POI can be negative during the global unconstrained maximisation. If None, it is set to True for two-sided and left limits, and False for right limits. Ignored when parameter is not None (the global fit is always unconstrained in that case).

  • parameter (int or str, default None) – Index or name of the nuisance parameter to profile. When None (default) the primary POI is profiled (existing behaviour). When set, the POI is fixed to poi_value and the selected nuisance parameter is scanned instead. String values are resolved via parameter_names.

  • poi_value (float, default 1.0) – Fixed value of the primary POI when profiling a nuisance parameter (i.e. when parameter is not None). Has no effect when parameter=None. If poi_value=None, primary POI will also be minimised during optimisation.

  • n_scan (int, default 121) – Number of uniformly-spaced grid points used by the coarse scan that enumerates sign changes of the profile \(\chi^2 - \text{threshold}\) function. Each sign-change interval is then refined to full precision with toms748(). Increasing this value improves detection of narrow features in non-convex profiles at the cost of additional NLL evaluations; values below 3 are clamped to 3.

  • n_multistart (int, default 9) – Number of evenly-spaced evaluations used by the internal multi-start scan that re-anchors the NLL minimum before the root search. A bounded scalar minimisation is then run around the best point found by the scan. Increasing this value reduces the risk of the anchor being trapped in a local minimum at the cost of additional NLL evaluations; values below 2 are clamped to 2.

Keyword Arguments:
  • xtol (float, default 2e-12) – Absolute tolerance passed to toms748(). The root-finder stops when the bracket width falls below this value.

  • rtol (float, default 1e-4) – Relative tolerance passed to toms748(). The root-finder stops when the bracket width is smaller than rtol * |root|.

  • maxiter (int, default 10000) – Maximum number of function evaluations allowed inside toms748().

Returns:

Parameter value(s) at which the profile \(\chi^2\) equals the threshold, in ascending order. For a convex profile this is one value for one-sided limits and two values for two-sided limits (backwards compatible); for a non-convex profile with disjoint confidence regions every crossing is returned, so users that rely on a fixed list length should check len(result) before unpacking.

Return type:

List[float]

Raises:

ValueError – If parameter refers to the POI index, if the parameter name is not found in the model config, if the parameter index is out of range, or if the model has only one parameter (no nuisance parameters to profile).

exclusion_confidence_level(poi_test: float | Dict[int | str, float] = 1.0, expected: ExpectationType = observed, allow_negative_signal: bool = False, calculator: Literal['asymptotic', 'toy', 'chi_square'] = 'asymptotic', **kwargs) List[float]#

Compute the exclusion confidence level \(CL_s\) at a given \(\mu\).

\(CL_s\) is defined as

\[CL_s = \frac{p_{s+b}}{1 - p_b}\]

and is returned as \(1 - p\text{-value}\). The number of returned values depends on the expected mode:

  • observed → one value (fitted to observed data).

  • aposteriori / apriori → five values representing \(-2\sigma,\,-1\sigma,\,\text{central},\,+1\sigma,\,+2\sigma\) fluctuations from the background.

Parameters:
  • poi_test (PoiTest, default 1.0) – Parameter of interest \(\mu\) at which to evaluate \(CL_s\).

  • expected (ExpectationType) –

    Selects the expectation mode.

    • observed: Post-fit, returns one value (default).

    • aposteriori: Post-fit nuisance treatment, returns five expected values.

    • apriori: Pre-fit / SM hypothesis, returns five expected values.

    Setting expected="all" returns both the observed and the five expected values simultaneously.

  • allow_negative_signal (bool, default False) – When True, \(\hat\mu\) is unconstrained, switching the test statistic from \(\tilde{q}_\mu\) to \(q_\mu\).

  • calculator ('asymptotic', 'toy' or 'chi_square', default 'asymptotic') –

    • "asymptotic": Asymptotic formulae from [arXiv:1007.1727].

    • "toy": Pseudo-experiment-based p-values (requires is_toy_calculator_available).

    • "chi_square": \(\chi^2\)-based p-values; uses \(\chi^2 = -2\log[\mathcal{L}(\mu,\hat\theta_\mu)/\mathcal{L}(0,\hat\theta_0)]\).

  • kwargs

    Additional keyword arguments forwarded to the optimiser, including:

    • init_pars (List[float], default None): Initial parameter values for the optimiser.

    • par_bounds (List[Tuple[float, float]], default None): Parameter bounds for the optimiser.

Raises:

CalculatorNotAvailable – If the requested calculator is not available.

Returns:

\(CL_s\) value(s). One value for observed; five values ordered \((-2\sigma,\,-1\sigma,\,\text{central},\,+1\sigma,\,+2\sigma)\) for expected modes.

Return type:

List[float]

find_most_sensitive() StatisticalModel[source]#

Return the constituent model with the smallest expected 95 % CL exclusion cross section.

“Most sensitive” is defined as the analysis that places the tightest expected upper limit on the signal cross section (s95exp), i.e. the model for which the expected excluded signal strength is smallest.

Note

Cross-section information must be attached to each model for this method to work. See s95exp for details.

Returns:

The model with the minimum value of s95exp.

Return type:

StatisticalModel

fixed_poi_sampler(poi_test: float | Dict[int | str, float], size: int | None = None, expected: ExpectationType = observed, init_pars: List[float] | None = None, par_bounds: List[Tuple[float, float]] | None = None, **kwargs) ndarray | Callable[[int], ndarray]#

Sample data from the statistical model with fixed parameter of interest.

Parameters:
  • poi_test (PoiTest) – parameter of interest or signal strength, \(\mu\). Either a single float or a dict mapping POI indices/names to values.

  • size (int, default None) – sample size. If None a callable function will be returned which takes sample size as input.

  • expected (ExpectationType) –

    Sets which values the fitting algorithm should focus and p-values to be computed.

    • observed: Computes the p-values with via post-fit prescription which means that the experimental data will be assumed to be the truth (default).

    • aposteriori: Computes the expected p-values with via post-fit prescription which means that the experimental data will be assumed to be the truth.

    • apriori: Computes the expected p-values with via pre-fit prescription which means that the SM will be assumed to be the truth.

  • init_pars (List[float], default None) – initial parameters for the optimiser

  • par_bounds (List[Tuple[float, float]], default None) – parameter bounds for the optimiser.

  • kwargs – keyword arguments for the optimiser.

Raises:

MethodNotAvailable – If the backend does not have a sampler implementation.

Returns:

Sampled data with shape of (size, number of bins) or callable function to sample from directly.

Return type:

Union[np.ndarray, Callable[[int], np.ndarray]]

generate_asimov_data(expected: ExpectationType = observed, test_statistic: str = 'qtilde', statistical_model_options: Dict[str, Dict] | None = None, **kwargs) Dict[str, List[float]][source]#

Generate Asimov data for the statistical model. This function calls generate_asimov_data() function for each statistical model with appropriate statistical_model_options and generates Asimov data for each statistical model independently.

Parameters:
  • expected (ExpectationType) –

    Sets which values the fitting algorithm should focus and p-values to be computed.

    • observed: Computes the p-values with via post-fit prescriotion which means that the experimental data will be assumed to be the truth (default).

    • aposteriori: Computes the expected p-values with via post-fit prescriotion which means that the experimental data will be assumed to be the truth.

    • apriori: Computes the expected p-values with via pre-fit prescription which means that the SM will be assumed to be the truth.

  • test_statistic (Text, default "qtilde") –

    test statistics.

    • 'qtilde': (default) performs the calculation using the alternative test statistic, \(\tilde{q}_{\mu}\), see eq. (62) of [arXiv:1007.1727] (qmu_tilde()).

      Warning

      Note that this assumes that \(\hat\mu\geq0\), hence allow_negative_signal assumed to be False. If this function has been executed by user, spey assumes that this is taken care of throughout the external code consistently. Whilst computing p-values or upper limit on \(\mu\) through spey this is taken care of automatically in the backend.

    • 'q': performs the calculation using the test statistic \(q_{\mu}\), see eq. (54) of [arXiv:1007.1727] (qmu()).

    • 'q0': performs the calculation using the discovery test statistic, see eq. (47) of [arXiv:1007.1727] \(q_{0}\) (q0()).

  • statistical_model_options (Dict[Text, Dict], default None) –

    backend specific options. The dictionary key needs to be the backend name and the item needs to be the dictionary holding the keyword arguments specific to that particular backend.

    >>> statistical_model_options = {"default.uncorrelated_background" : {"init_pars" : [1., 3., 4.]}}
    

  • kwargs – keyword arguments for the optimiser.

Returns:

Returns a dictionary for data specific to each analysis. keywords will be analysis names and the items are data.

Return type:

Dict[Text, List[float]]

property is_alive: bool#

Whether the combined model carries any non-zero signal.

Returns True if at least one model in the stack has at least one bin with a non-zero signal yield. A combiner that is not alive cannot produce a meaningful exclusion limit.

Returns:

True if any constituent model is alive, False otherwise.

Return type:

bool

property is_asymptotic_calculator_available: bool#

Whether the asymptotic calculator is available for the combined stack.

Returns True only when every constituent model supports the asymptotic calculator, because the combination must evaluate all individual likelihoods.

Returns:

bool

property is_chi_square_calculator_available: bool#

Whether the \(\chi^2\) calculator is available for the combined stack.

Returns True only when every constituent model supports the \(\chi^2\) calculator.

Returns:

bool

property is_toy_calculator_available: bool#

Whether a toy (pseudo-experiment) calculator is available.

Toy-based combination across independent analyses would require sampling from the joint distribution, which is not yet implemented. This property therefore always returns False.

Returns:

Always False.

Return type:

bool

items() Iterator[Tuple[str, StatisticalModel]][source]#

Iterate over (analysis_name, model) pairs, analogous to dict.items().

Yields:

Tuple[str, StatisticalModel] – The analysis identifier and the corresponding StatisticalModel for each entry in the stack.

likelihood(poi_test: float | Dict[int | str, float] = 1.0, expected: ExpectationType = observed, return_nll: bool = True, data: Dict[str, List[float]] | None = None, statistical_model_options: Dict[str, Dict] | None = None, **kwargs) float[source]#

Evaluate the combined (profile) likelihood at a fixed signal strength \(\mu\).

Under the uncorrelated assumption, the nuisance parameters of each model are profiled independently and the combined NLL is the sum of the individual NLLs:

\[\mathrm{NLL}_{\rm comb}(\mu) = \sum_{i=1}^{N} \mathrm{NLL}_i(\mu).\]

If any individual model raises NegativeExpectedYields (which can happen at very large or very small \(\mu\) values), a RuntimeWarning is issued and NaN is returned immediately without evaluating the remaining models.

Parameters:
  • poi_test (float, default 1.0) – Signal strength \(\mu\) at which to evaluate the combined likelihood.

  • expected (ExpectationType) –

    Controls which dataset is treated as observed when profiling nuisance parameters and computing p-values.

    • observed: Use the real experimental data (default).

    • aposteriori: Use the post-fit expected dataset (data-driven Asimov).

    • apriori: Use the pre-fit expected dataset (SM Asimov; nuisance parameters set to their nominal values).

  • return_nll (bool, default True) – If True, return the combined negative log-likelihood \(\mathrm{NLL}_{\rm comb}(\mu)\). If False, return the likelihood \(\mathcal{L}_{\rm comb}(\mu) = e^{-\mathrm{NLL}_{\rm comb}(\mu)}\).

  • data (Dict[str, List[float]], default None) – Per-analysis observed data to override the defaults stored in each model. Keys are analysis names; values are lists of bin counts or equivalent data understood by the respective backend. Analyses not present in the dictionary fall back to their internally stored data.

  • statistical_model_options (Dict[str, Dict], default None) –

    Backend- specific keyword arguments, keyed by backend type string. Each value is forwarded as **kwargs to the corresponding model’s likelihood() call.

    >>> statistical_model_options = {
    ...     "default.uncorrelated_background": {"init_pars": [1.0, 3.0, 4.0]}
    ... }
    

  • **kwargs – Additional keyword arguments forwarded to every individual model’s likelihood evaluation (e.g. optimiser settings).

Returns:

Combined NLL (if return_nll=True) or combined likelihood value. Returns NaN if any individual model cannot be evaluated at the requested \(\mu\).

Return type:

float

maximize_asimov_likelihood(return_nll: bool = True, expected: ExpectationType = observed, test_statistics: str = 'qtilde', initial_muhat_value: float | None = None, par_bounds: List[Tuple[float, float]] | None = None, statistical_model_options: Dict[str, Dict] | None = None, poi_indices: List[int | str] | None = None, **optimiser_options) Tuple[float | Dict[int | str, float], float][source]#

Find the maximum of the combined likelihood evaluated on Asimov data.

This method first generates per-model Asimov datasets via generate_asimov_data(), then maximises the combined NLL with respect to \(\mu\) using those synthetic observations. The result is used to compute the expected (median) test statistic under the signal or background hypothesis, which is the basis of the asymptotic CLs calculation.

The choice of test_statistics controls both the Asimov dataset generation and the signal-strength constraint applied during minimisation:

  • 'qtilde' / 'q0': require \(\hat\mu \geq 0\) (allow_negative_signal = False).

  • 'q' / 'qmu': allow \(\hat\mu < 0\) (allow_negative_signal = True).

Parameters:
  • return_nll (bool, default True) – If True, return the minimised NLL; if False, return the likelihood value at the maximum.

  • expected (ExpectationType) –

    Dataset prescription used when generating Asimov data and profiling nuisance parameters.

    • observed: Use the real data to set nuisance parameters before generating Asimov data (default).

    • aposteriori: Post-fit expected dataset.

    • apriori: Pre-fit SM Asimov data.

  • test_statistics (str, default "qtilde") –

    Test statistic used for Asimov data generation and to determine the \(\mu\) constraint.

    • 'qtilde': Alternative test statistic \(\tilde{q}_{\mu}\), see eq. (62) of [arXiv:1007.1727]. Assumes \(\hat\mu \geq 0\).

      Warning

      When test_statistics='qtilde' is used, allow_negative_signal is set to False internally. If you call this method directly, ensure that your external code handles this constraint consistently. When p-values or upper limits are computed through spey this is managed automatically.

    • 'q': Test statistic \(q_{\mu}\), see eq. (54) of [arXiv:1007.1727]. Allows \(\hat\mu < 0\).

    • 'q0': Discovery test statistic \(q_{0}\), see eq. (47) of [arXiv:1007.1727]. Assumes \(\hat\mu \geq 0\).

  • initial_muhat_value (float, default None) – Starting value for the optimiser. Defaults to 0.0 when None.

  • par_bounds (List[Tuple[float, float]], default None) – Explicit [(mu_min, mu_max)] bounds for the optimiser.

  • statistical_model_options (Dict[str, Dict], default None) –

    Backend- specific keyword arguments forwarded to each model.

    >>> statistical_model_options = {
    ...     "default.uncorrelated_background": {"init_pars": [1.0, 3.0, 4.0]}
    ... }
    

  • poi_indices (List[int | str], default None) – If provided, the returned \(\hat\mu\) is wrapped in a dictionary keyed by these values.

  • **optimiser_options – Additional keyword arguments forwarded to fit().

Returns:

A 2-tuple of (\(\hat\mu\), \(\mathrm{NLL}_{\rm min}\)), where \(\hat\mu\) is a float or a Dict depending on poi_indices.

Return type:

Tuple[float | Dict, float]

maximize_likelihood(return_nll: bool = True, expected: ExpectationType = observed, allow_negative_signal: bool = True, data: Dict[str, List[float]] | None = None, initial_muhat_value: float | None = None, par_bounds: List[Tuple[float, float]] | None = None, statistical_model_options: Dict[str, Dict] | None = None, poi_indices: List[int | str] | None = None, **optimiser_options) Tuple[float | Dict[int | str, float], float][source]#

Find the global maximum of the combined likelihood over \(\mu\).

The optimiser minimises \(-2\ln\mathcal{L}_{\rm comb}(\mu)\) with respect to the single shared signal strength \(\mu\), while the nuisance parameters of each constituent model are profiled out independently inside likelihood().

Initialisation heuristic

If initial_muhat_value is not provided, the starting point for the optimiser is estimated in the Gaussian limit using the inverse-variance weighted mean of the per-model best-fit values,

\[\mu_{\rm init} = \frac{\sum_i \hat\mu_i\,/\,\sigma_{\hat\mu,i}^2} {\sum_i 1\,/\,\sigma_{\hat\mu,i}^2},\]

where \(\hat\mu_i\) is obtained from maximize_likelihood() and \(\sigma_{\hat\mu,i}\) from sigma_mu(). If this estimation fails for any reason (e.g. a model is not yet converging), the initialisation falls back to \(\mu_{\rm init} = 0\).

Parameters:
  • return_nll (bool, default True) – If True, return the minimised NLL; if False, return the likelihood value at the maximum.

  • expected (ExpectationType) –

    Dataset prescription used when profiling nuisance parameters.

    • observed: Real data (default).

    • aposteriori: Post-fit expected data.

    • apriori: Pre-fit SM Asimov data.

  • allow_negative_signal (bool, default True) – If True, \(\hat\mu\) is unconstrained from below. If False, the lower bound is set to zero (i.e. \(\hat\mu \geq 0\)).

  • data (Dict[str, List[float]], default None) – Per-analysis data overrides; see likelihood() for the expected format.

  • initial_muhat_value (float, default None) – Explicit starting value for the optimiser. When None the weighted-mean heuristic described above is used.

  • par_bounds (List[Tuple[float, float]], default None) – Explicit bounds [(mu_min, mu_max)] for the optimiser. When None the bounds are derived from minimum_poi and initial_muhat_value.

  • statistical_model_options (Dict[str, Dict], default None) –

    Backend- specific keyword arguments forwarded to each model’s calls.

    >>> statistical_model_options = {
    ...     "default.uncorrelated_background": {"init_pars": [1.0, 3.0, 4.0]}
    ... }
    

  • poi_indices (List[int | str], default None) – If provided, the returned \(\hat\mu\) is wrapped in a dictionary keyed by the entries of this list (useful when downstream code expects a mapping of POI names to values).

  • **optimiser_options – Additional keyword arguments forwarded to the fit() routine.

Returns:

A 2-tuple of (\(\hat\mu\), \(\mathrm{NLL}_{\rm min}\)). \(\hat\mu\) is a plain float when poi_indices is None, or a Dict mapping each entry of poi_indices to the same \(\hat\mu\) value otherwise.

Return type:

Tuple[float | Dict, float]

property minimum_poi: float#

Lower bound on the parameter of interest \(\mu\) for the combined stack.

Because all models share a single \(\mu\), the combined lower bound is the maximum of the per-model lower bounds: the signal strength must simultaneously satisfy every individual model’s constraint.

Returns:

Maximum of the individual minimum-POI values across all models in the stack.

Return type:

float

ntoys#

Number of toy pseudo-experiments used by the toy-based calculator.

poi_upper_limit(expected: ExpectationType = observed, confidence_level: float = 0.95, low_init: float = 1.0, hig_init: float = 1.0, expected_pvalue: Literal['nominal', '1sigma', '2sigma'] = 'nominal', maxiter: int = 10000, optimiser_arguments: Dict[str, Any] | None = None) float | List[float]#

Compute the upper limit for the parameter of interest (POI), denoted as \(\mu\).

Parameters:
  • expected (ExpectationType, default observed) –

    Specifies the type of expectation for the fitting algorithm and p-value computation.

    • observed: Computes p-values using post-fit prescription, assuming experimental data as the truth (default).

    • aposteriori: Computes expected p-values using post-fit prescription, assuming experimental data as the truth.

    • apriori: Computes expected p-values using pre-fit prescription, assuming the Standard Model (SM) as the truth.

  • confidence_level (float, default 0.95) – Confidence level for the upper limit, representing \(1 - CL_s\). Must be between 0 and 1. Default is 0.95.

  • low_init (Optional[float], default 1.0) –

    Initial lower limit for the search algorithm. If None, it is determined by \(\hat\mu + 1.5\sigma_{\hat\mu}\). Default is 1.0.

    Note

    \(\sigma_{\hat\mu}\) is determined via sigma_mu() function.

  • hig_init (Optional[float], default 1.0) –

    Initial upper limit for the search algorithm. If None, it is determined by \(\hat\mu + 2.5\sigma_{\hat\mu}\). Default is 1.0.

    Note

    \(\sigma_{\hat\mu}\) is determined via sigma_mu() function.

  • expected_pvalue (Literal["nominal", "1sigma", "2sigma"], default "nominal") –

    In case of aposteriori and apriori expectation, specifies the type of expected p-value for upper limit calculation.

    • "nominal": Computes the upper limit for the central p-value. Returns a single value.

    • "1sigma": Computes the upper limit for the central p-value and \(1\sigma\) fluctuation from background. Returns 3 values.

    • "2sigma": Computes the upper limit for the central p-value and \(1\sigma\) and \(2\sigma\) fluctuation from background. Returns 5 values.

    Note

    For expected=spey.ExpectationType.observed, expected_pvalue argument will be overwritten to "nominal".

  • allow_negative_signal (bool, default True) – Allows for negative signal values, changing the computation of the test statistic. Default is False.

  • maxiter (int, default 10000) – Maximum number of iterations for the optimiser. Default is 10000.

  • optimiser_arguments (Dict, default None) – Additional arguments for the optimiser used to compute the likelihood and its maximum. Default is None.

Returns:

  • A single value representing the upper limit for the nominal case.

  • A list of values representing the upper limits for the central value and statistical deviations (for “1sigma” and “2sigma” cases). The order is: \(-2\sigma\), \(-1\sigma\), central value, \(1\sigma\), \(2\sigma\).

Return type:

Union[float, List[float]]

Raises:

AssertionError – If the confidence level is not between 0 and 1.

pull(poi_test: float | Dict[int | str, float] = 1.0, expected: ExpectationType = observed, allow_negative_signal: bool = True, **kwargs) float#

Pull: measures how many standard deviations the observation is away from the expectation.

\[\text{pull}(\mu) = \operatorname{sign}(\hat{\mu}-\mu) \sqrt{-2\log\frac{L(\mu,\hat{\hat{\theta}}(\mu))}{L(\hat{\mu},\hat{\theta})}}\]

the square of the pull is the likelihood-ratio test statistic.

Parameters:
  • poi_test (PoiTest or list[float], default 1.0) – Parameter of interest, \(\mu\). A plain float (or iterable of floats) fixes the primary POI — when iterable, \(\chi^2\) is computed for each element. Alternatively, a dict of {index_or_name: value} fixes multiple parameters simultaneously (iterating over dicts is not supported).

  • expected (ExpectationType) –

    Sets which values the fitting algorithm should focus and p-values to be computed.

    • observed: Computes the p-values with via post-fit prescription which means that the experimental data will be assumed to be the truth (default).

    • aposteriori: Computes the expected p-values with via post-fit prescription which means that the experimental data will be assumed to be the truth.

    • apriori: Computes the expected p-values with via pre-fit prescription which means that the SM will be assumed to be the truth.

  • allow_negative_signal (bool, default True) – If True \(\hat\mu\) value will be allowed to be negative. Only valid when poi_test_denominator=None.

  • kwargs – keyword arguments for the optimiser.

Returns:

value of pull.

Return type:

float

remove(analysis: str) None[source]#

Remove an analysis from the stack.

Parameters:

analysis (Text) – unique identifier of the analysis to be removed.

Raises:

AnalysisQueryError – If the unique identifier does not match any of the statistical models in the stack.

sigma_mu(poi_test: float | Dict[int | str, float], expected: ExpectationType = observed, test_statistics: Literal['qtilde', 'q', 'q0'] = 'qtilde', **kwargs) float#

Estimate the standard deviation of \(\hat\mu\) at a fixed \(\mu\).

Attempts the Hessian-based estimate first (via sigma_mu_from_hessian()) if that method exists on the subclass. When the Hessian is not available, falls back to the Asimov approximation from eq. (31) of [arXiv:1007.1727]:

\[\sigma_A = \frac{|\mu - \mu^\prime|}{\sqrt{q_{\mu,A}}}, \qquad q_{\mu,A} = -2\ln\lambda_A(\mu)\]

where \(\mu^\prime\) is the best-fit value on the Asimov dataset.

Parameters:
  • poi_test (PoiTest) – Parameter of interest value \(\mu\) at which to evaluate \(\sigma_\mu\).

  • expected (ExpectationType) –

    Selects which dataset to condition on.

    • observed: Use observed data (post-fit, default).

    • aposteriori: Use observed data with post-fit nuisance treatment.

    • apriori: Use background-only prediction (pre-fit / SM hypothesis).

  • test_statistics (str, default "qtilde") –

    Test statistic used for the Asimov approximation (ignored when the Hessian path is taken).

    • 'qtilde': \(\tilde{q}_\mu\), eq. (62) of [arXiv:1007.1727].

      Warning

      This assumes \(\hat\mu \geq 0\). spey’s public API enforces this automatically.

    • 'q': \(q_\mu\), eq. (54) of [arXiv:1007.1727].

    • 'q0': Discovery statistic \(q_0\), eq. (47) of [arXiv:1007.1727].

  • kwargs

    Additional keyword arguments forwarded to the optimiser, including:

    • init_pars (List[float], default None): Initial parameter values for the optimiser.

    • par_bounds (List[Tuple[float, float]], default None): Parameter bounds for the optimiser.

Returns:

Estimated standard deviation \(\sigma_\mu\) of the parameter of interest at the given \(\mu\).

Return type:

float

significance(expected: ExpectationType = observed, **kwargs) Tuple[float, float, List[float], List[float]]#

Compute the discovery significance of a positive signal.

Uses the discovery test statistic \(q_0\) (eq. 47 of [arXiv:1007.1727]) to quantify the evidence for a signal above the background-only hypothesis. The Asimov significance \(\sqrt{q_{0,A}}\) gives the median expected sensitivity, while \(\sqrt{q_0}\) is computed from the observed data. See sec. 5.1 of [arXiv:1007.1727] for details.

Note

aposteriori and observed both perform a post-fit computation and therefore return identical results. The only meaningful distinction is between post-fit (observed) and pre-fit (apriori) computations.

Parameters:
  • expected (ExpectationType) –

    Selects which dataset to condition on.

    • observed: Post-fit (default).

    • apriori: Pre-fit / SM hypothesis.

  • kwargs

    Additional keyword arguments forwarded to the optimiser, including:

    • init_pars (List[float], default None): Initial parameter values for the optimiser.

    • par_bounds (List[Tuple[float, float]], default None): Parameter bounds for the optimiser.

Returns:

A 4-tuple (sqrt_q0A, sqrt_q0, pvalues, expected_pvalues) where:

  • sqrt_q0A — Asimov discovery significance \(\sqrt{q_{0,A}}\).

  • sqrt_q0 — Observed discovery significance \(\sqrt{q_0}\).

  • pvalues — Observed p-value(s) for the \(q_0\) test.

  • expected_pvalues — Expected p-value(s) at \(-2\sigma,\,-1\sigma,\,\text{central},\,+1\sigma,\,+2\sigma\).

Return type:

Tuple[float, float, List[float], List[float]]

property statistical_models: Tuple[StatisticalModel]#

Immutable snapshot of the current model stack.

Returns:

All StatisticalModel instances currently registered in the combiner, in insertion order.

Return type:

Tuple[StatisticalModel]